DESCRIPTION JP2010263567

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
An external voice is extracted in a simultaneous call state. The device includes: a first adaptive filter (111, 114) and a second adaptive filter (112, 115) that set and update filter coefficients simulating the transfer system from a speaker to a microphone; subtraction units 12 and 13 that extract first and second residual signals, each being the difference between the microphone input speech signal and a simulation signal obtained by processing the audio signal input to the speaker with the first or second adaptive filter, respectively; a cancellation amount comparison unit 16 that monitors the difference amount between the microphone input speech signal and the first residual signal at the subtraction unit 12 and the difference amount between the microphone input speech signal and the second residual signal at the subtraction unit 13; and an extraction signal transmission unit that transmits the residual signal on the side with the larger difference amount. [Selected figure] Figure 1
Voice extraction device
[0001]
The present invention relates to a voice extraction apparatus that extracts voice while suppressing and preventing acoustic echo and howling.
[0002]
In a hands-free phone system, such as a conference system in which a call is made using a speaker and a microphone (FIG. 7), the transmitted voice signal of speaker A is reproduced from the loudspeaker on speaker B's side. At the same time, that sound is picked up by speaker B's microphone and is therefore reproduced from the loudspeaker on speaker A's side.
As a result, speaker A hears their own voice returned from the other side as an echo.
[0003]
Also, the echo reproduced from the speaker on the speaker A side is received by the microphone
on the speaker A side to form a closed loop of the signal, and when the gain exceeds 1, howling
occurs.
[0004]
As related technologies for suppressing and preventing such acoustic echo and howling based on adaptive signal processing, a speech communication system including an acoustic echo canceller (Patent Document 1) and a loudspeaker apparatus including a howling canceller (Patent Document 2) are disclosed.
Related techniques are also disclosed in “Sound system and digital processing” (co-authored by Toshiro Oga and Yoshio Yamazaki), as described below.
[0005]
In this acoustic echo canceller, as shown for example in FIG. 8 (speaker B side), the reception signal x(k), which is the voice of the other party (speaker A), is reproduced from the reception speaker, passes through the room acoustic transmission system, and is picked up by the transmitting microphone as an acoustic echo y'(k). When the acoustic impulse response of the room is denoted h'(k), y'(k) is the signal obtained by convolving x(k) with h'(k).
[0006]
The acoustic echo canceller obtains an estimate h(k) of the impulse response of the room acoustic transmission system and convolves it with the reception signal x(k) to synthesize an estimated echo signal y(k). The acoustic echo is canceled by subtracting the synthesized y(k) from the signal received by the microphone.
[0007]
Since the acoustic impulse response h'(k) of the room changes with the surrounding environment, for example when the speaker or the microphone is moved, an adaptive filter is usually used to estimate h'(k). An FIR filter is used as this adaptive filter because it allows stable real-time operation. The coefficients of the FIR filter then become the estimate h(k) of the impulse response of the room acoustic transmission system.
[0008]
Furthermore, the adaptive filter calculates the filter coefficients (impulse response estimate) h(k) so that the power of the error signal e(k) is minimized while the reception signal x(k) is present. Here, the error signal e(k) is given by Equation 1.
(Equation 1) e(k) = y'(k) + s(k) - y(k)
[0009]
At this time, if the transmission signal s(k) is 0, the error signal e(k) equals the echo cancellation error y'(k) - y(k), and the filter coefficients h(k) that minimize it are a good estimate of the impulse response of the echo path.
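The terms of Equation 1 can be illustrated with a minimal sketch. The signal values, the three-tap impulse response, and the helper names `convolve_echo` and `error_signal` are assumptions made for illustration only; they do not come from the cited documents.

```python
def convolve_echo(x, h):
    """Return y(k) = sum_i h[i] * x[k - i], treating x as zero before k = 0."""
    y = []
    for k in range(len(x)):
        acc = 0.0
        for i, h_i in enumerate(h):
            if k - i >= 0:
                acc += h_i * x[k - i]
        y.append(acc)
    return y


def error_signal(x, s, h_true, h_est):
    """Equation 1: e(k) = y'(k) + s(k) - y(k)."""
    y_echo = convolve_echo(x, h_true)  # y'(k): echo through the real room
    y_hat = convolve_echo(x, h_est)    # y(k): synthesized echo estimate
    return [ye + sk - yh for ye, sk, yh in zip(y_echo, s, y_hat)]


x = [1.0, -0.5, 0.25, 0.0, 0.75, -1.0]   # reception signal x(k) (made-up values)
h_true = [0.5, 0.25, -0.125]             # room impulse response h'(k) (made-up values)
silence = [0.0] * len(x)                 # s(k) = 0: far-end talker only
talk = [0.0, 0.0, 0.5, -0.5, 0.25, 0.0]  # s(k) != 0: double talk

e_single = error_signal(x, silence, h_true, h_true)             # perfect estimate, no near-end speech
e_double = error_signal(x, talk, h_true, h_true)                # perfect estimate, near-end speech present
e_mismatch = error_signal(x, silence, h_true, [0.5, 0.0, 0.0])  # imperfect estimate
```

With a perfect estimate and s(k) = 0 the error vanishes, and with s(k) present the error is s(k) itself. With an imperfect estimate, the error contains the residual echo y'(k) - y(k).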
[0010]
However, a two-way call includes double talk, in which the transmission signal s(k) is present.
When s(k) is present, e(k) is no longer the echo cancellation error, and estimating the impulse response in this state introduces estimation errors. Therefore, in the simultaneous call state, the adaptive operation of the adaptive filter is stopped or the adaptation speed is reduced (Patent Document 1).
[0011]
Next, a block diagram of an example of a loudspeaker system and a howling canceller is shown in FIG. In this loudspeaker system, s(k), which is the speech of a talker or the sound of an instrument, is picked up by a microphone, amplified by an amplifier, and reproduced from a speaker in the same space (room) as the talker.
[0012]
Further, the sound emitted from the speaker is picked up by the microphone through the indoor space transfer system h', forming a closed loop. If the gain of the amplifier is too large in this system, the gain of the closed loop becomes 1 or more and howling occurs.
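The effect of the closed-loop gain can be sketched with a toy model. The whole path (amplifier, speaker, room, microphone) is reduced here to a single gain and a pure delay, which is an assumption made only for illustration; the function name `closed_loop_peak` is hypothetical.

```python
def closed_loop_peak(loop_gain, delay=4, steps=200):
    """Feed a unit impulse into the loop mic -> amplifier -> speaker -> room -> mic
    and return the largest magnitude seen during the last round trip."""
    mic = [0.0] * steps
    for k in range(steps):
        source = 1.0 if k == 0 else 0.0  # s(k): a single click
        feedback = loop_gain * mic[k - delay] if k >= delay else 0.0
        mic[k] = source + feedback
    return max(abs(v) for v in mic[-delay:])


decaying = closed_loop_peak(0.5)   # loop gain below 1: the click dies away
growing = closed_loop_peak(1.25)   # loop gain above 1: the click keeps growing
```

In this model the click is multiplied by the loop gain on every round trip, so it decays when the gain is below 1 and grows without bound when the gain is above 1.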
[0013]
The howling canceller that suppresses this howling estimates the transfer function between the speaker and the microphone and subtracts the synthesized signal y(k) from the microphone reception signal, as in the acoustic echo canceller described above. The feedback signal y'(k) is thereby removed.
[0014]
However, whenever the signal x(k) needed to estimate the transfer function is present, the talker's speech s(k), which acts as an interfering signal, is also input to the microphone.
This state corresponds to the simultaneous call state in the acoustic echo canceller.
In addition, the feedback signal y'(k) is strongly correlated with the interfering signal s(k). The howling canceller must therefore estimate the space transfer system under worse conditions than the acoustic echo canceller.
[0015]
For this reason, when an adaptive algorithm is used, a method for coping with a poor SN ratio, namely making the step size sufficiently small to ensure estimation accuracy, is disclosed (Patent Document 2).
[0016]
JP 2006-270147 A
JP 2006-19707 A
[0017]
However, the related art disclosed in Patent Document 1 has the disadvantage that it cannot accurately detect the simultaneous call state or its onset.
In the related art disclosed in Patent Document 2, estimating the indoor transfer system takes time, so variations in the transfer system cannot be followed sufficiently.
Furthermore, in the related art disclosed in Patent Documents 1 and 2, the adaptive operation of the adaptive filter must be stopped, or its convergence speed reduced, during the simultaneous call state, which reduces the ability to track changes.
[0018]
In addition, when the error signal e(k) is used to detect the simultaneous call state, e(k) equals the transmission signal s(k) as long as the adaptive filter is adapting well; if the adaptation contains an estimation error, however, e(k) cannot be used reliably.
Furthermore, the error signal e(k) is the final transmission signal after the echo or feedback signal has been canceled, so if detection of the simultaneous call state fails and an error occurs in the impulse response estimate, the transmission signal is degraded.
The quality of the transmission signal may also be degraded when the adaptive operation is kept updating constantly without being stopped. This can be a serious problem when the extracted transmission signal must be of high quality, for example when it is used as the input to speech recognition processing.
[0019]
The object of the present invention is to remedy the disadvantages of the above related art and to provide a voice extraction apparatus capable of effectively extracting an external voice during a simultaneous call state, in which both the feedback sound emitted from the speaker and the external sound from a sound source other than the speaker are picked up by the microphone.
[0020]
In order to achieve the above object, a voice extraction apparatus according to the present invention includes an adaptive signal processing unit that is connected to a microphone and extracts, as an extraction signal, an external audio signal input to the microphone from an external sound source other than a preset speaker. The adaptive signal processing unit includes: first and second adaptive filters that set and update filter coefficients simulating the transfer system from the speaker to the microphone, based on the audio signal input to the speaker and the microphone input audio signal input from the microphone; a first subtraction unit that extracts, as a first residual signal, the difference between the microphone input audio signal and a simulation signal obtained by processing the audio signal input to the speaker with the first adaptive filter, and sends the first residual signal to the first adaptive filter; a second subtraction unit that extracts, as a second residual signal, the difference between the microphone input audio signal and a simulation signal obtained by processing the input audio signal with the second adaptive filter, and sends the second residual signal to the second adaptive filter; a subtraction amount monitoring unit that monitors the difference amount between the microphone input audio signal and the first residual signal in the first subtraction unit and the difference amount between the microphone input audio signal and the second residual signal in the second subtraction unit; and an extraction signal transmission unit that transmits the residual signal on the side with the larger difference amount as the extraction signal.
[0021]
Since the present invention is configured and functions as described above, it provides two different adaptive filters that set and update filter coefficients, two subtraction units that generate residual signals based on the simulation signals from the respective adaptive filters, and a subtraction amount monitoring unit that monitors the amount subtracted in each subtraction unit. By transmitting, as the extraction signal, the residual signal generated by the subtraction with the larger subtraction amount, the invention provides a voice extraction apparatus capable of effectively extracting the external voice even in the simultaneous call state.
[0022]
FIG. 1 is a schematic block diagram illustrating an embodiment including a voice input device
according to the present invention.
FIG. 1 is a schematic block diagram illustrating an embodiment including a voice input device
according to the present invention.
FIG. 1 is a schematic block diagram illustrating an embodiment including a voice input device
(acoustic echo cancellation system) according to the present invention.
A flowchart showing the overall operation steps during learning in the voice input device disclosed in FIG.
A flowchart showing the overall operation steps at learning completion in the voice input device disclosed in FIG.
A flowchart showing the overall operation steps during relearning in the voice input device disclosed in FIG.
A schematic block diagram showing a structural example of a hands-free phone, which is a call amplification system.
A block diagram showing an example of the acoustic echo canceller in the hands-free phone shown in FIG.
FIG. 2 is a schematic block diagram showing an example of a loudspeaker system.
[0023]
First Embodiment
The basic configuration of the first embodiment of the present invention will be described.
[0024]
The first embodiment is, as shown in FIG. 1, a voice input device 1 that inputs a user's uttered voice to a car navigation system 5 installed in a car.
The voice input device 1 includes an adaptive filter unit 11 that acquires an audio signal from a car audio system 4 installed in the car, and a microphone 3 that picks up the user's speech.
[0025]
The car audio system 4 is assumed to emit music or radio broadcast as an audio signal. Further,
the car audio system 4 is provided with a speaker 2 for transmitting the same audio signal as the
audio signal acquired by the adaptive filter unit 11 (hereinafter referred to as "input signal x (k)").
[0026]
The car navigation system 5 specifies addresses by means of a voice recognition function. It includes a voice recognition unit 6 and is assumed to have a function for specifying an address in the map information preset in the car navigation system 5, based on the speech signal input to the voice recognition unit 6. For this reason, the transmission signal input to the voice recognition unit 6 should be of high quality when an address is specified.
[0027]
In addition, the adaptive filter unit 11 of the voice input device 1 itself sets filter coefficients that simulate the indoor transfer system (feedback transfer system) 100 from the speaker 2 to the microphone 3. The voice input device 1 is a computer provided with a processor, and implements the functions of the units and means described below by executing a preset program.
[0028]
The speaker 2 emits an analog audio signal from the car audio system 4. The analog audio signal is generated by D/A (digital/analog) conversion of the input signal x(k) input to the delay buffer 113, and this audio signal is assumed to be amplified through
[0029]
The microphone 3 is installed in the car in which the car audio system 4 is installed, and inputs sound from outside the voice input device 1 into the voice input device 1 as a microphone input signal. This microphone input signal includes the sound output (reproduced) from the speaker 2 and received by the microphone 3 via the feedback transfer system 100. The microphone input signal is A/D converted by an A/D (analog/digital) converter (not shown) and, as shown in FIG. 1, is input as the feedback sound signal d(k) to the addition units 12 and 13 and to the cancellation amount calculation units 14 and 15.
[0030]
Here, assume that the user utters “Tokyo Hachioji” to the microphone 3 in order to specify an address to the car navigation system 5. In this case, the sound input to the microphone 3 is in a simultaneous call state (double talk state) that contains both the feedback sound reaching the microphone from the speaker 2 via the feedback transfer system 100 and the addressing voice uttered by the user (the voice “Tokyo Hachioji”: transmission signal s(k)).
[0031]
Further, in the voice input device 1, the audio from the car audio system 4 (identical to the audio supplied to the speaker 2) is A/D converted by an A/D (analog/digital) converter (not shown) and input as a digital signal (hereinafter referred to as the "input signal x(k)").
This input signal x(k) is stored in the delay buffer 113.
[0032]
The adaptive filter unit 11 includes: delay buffer means 113 that acquires and temporarily stores the input signal x(k); filter coefficient calculation means 111 and 112 that calculate filter coefficients based on the reference signals output from the addition units 12 and 13 described later; and inner product calculation means (adaptive filters) 114 and 115 that perform inner product calculation (convolution) using the filter coefficients determined by the filter coefficient calculation means 111 and 112, respectively. In the adaptive filter unit 11, adaptive signal processing is performed by the pair of the filter coefficient calculation means 111 and the inner product calculation means 114, and by the pair of the filter coefficient calculation means 112 and the inner product calculation means 115.
[0033]
Here, the delay buffer means 113 simulates the delay time τ of the feedback sound signal d(k) through the feedback transfer system 100, and the inner product calculation means 114 and 115 simulate the sound propagation characteristics (transfer function) of the feedback transfer system 100. In the embodiment according to the present invention, as described above, the input signal x(k) is supplied to the delay buffer means 113 in parallel with its output from the speaker 2, so that the simulation signals yf(k) and yb(k) output from the inner product calculation means 114 and 115 can approximate the feedback sound signal d(k).
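The division of labor between the delay buffer and the inner product calculation can be sketched as follows. This is a minimal illustration under assumed values: the class name `DelayBuffer`, the delay of three samples, and the two filter coefficients are hypothetical and are not taken from the embodiment.

```python
from collections import deque


class DelayBuffer:
    """Holds recent input samples so that x(k - tau - i) can be read back."""

    def __init__(self, delay, taps):
        self.delay = delay
        size = delay + taps
        self.samples = deque([0.0] * size, maxlen=size)

    def push(self, x_k):
        self.samples.appendleft(x_k)  # newest sample sits at index 0

    def delayed_window(self, taps):
        """Return [x(k - tau), x(k - tau - 1), ..., x(k - tau - taps + 1)]."""
        return [self.samples[self.delay + i] for i in range(taps)]


def inner_product(coeffs, window):
    """FIR output: sum of coefficient times delayed sample."""
    return sum(c * w for c, w in zip(coeffs, window))


x = [1.0, 0.0, -2.0, 4.0, 0.5, 0.0, 0.0, 0.0]  # input signal x(k) (made-up values)
coeffs = [0.5, 0.25]                           # assumed filter coefficients
buf = DelayBuffer(delay=3, taps=2)             # assumed delay tau = 3 samples

y = []
for x_k in x:
    buf.push(x_k)
    y.append(inner_product(coeffs, buf.delayed_window(2)))
```

Here each output sample is y(k) = 0.5·x(k-3) + 0.25·x(k-4). The buffer accounts for the bulk propagation delay, so the filter coefficients only have to model the shape of the response after that delay.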
[0034]
The filter coefficient calculation unit 111 estimates the transfer function of the indoor transfer system 100 based on the residual signal ef(k) output from the addition unit 12 and the delayed audio signal x(k-τ) from the delay buffer unit 113, and calculates the filter coefficients of the inner product calculating means 114 so as to simulate this transfer function (filter coefficient calculation function). The filter coefficient calculation unit 111 also updates the calculated filter coefficients and notifies the inner product calculation unit 114 of them (filter coefficient update setting function). The filter coefficients of the inner product calculation means 114 are thereby set.
[0035]
The filter coefficient update setting function is performed so that the residual signal ef (k) is as
small as possible. In addition, the filter coefficient update setting function may be set to be
performed every preset time interval (for example, every several microseconds to several
hundreds of microseconds).
[0036]
Hereinafter, a state in which the filter coefficient is updated in each of the filter coefficient
calculation units 111 and 112 is referred to as a “learning state”.
[0037]
Also, the filter coefficient calculation unit 111 has a learning stop execution function of stopping
the updating of the filter coefficient in accordance with the control signal from the cancellation
amount comparison unit 16 described below.
Thus, the filter coefficient calculation unit 111 completes learning (learning completion state)
when a certain amount of cancellation is obtained, and the filter coefficient is fixed at this point.
[0038]
Further, when the filter coefficients are rewritten by the coefficient copying unit 116 described below, the filter coefficient calculation unit 111 notifies the inner product calculation unit 114 of the rewritten filter coefficients. As a result, of the filter coefficients updated (calculated) by the filter coefficient calculation means 111 and 112, the set with the higher cancellation amount, that is, the set that identifies the indoor transfer system 100 more accurately, can be set in the inner product calculation means 114.
[0039]
The filter coefficient calculation means 112 estimates the transfer function of the indoor transfer system 100 based on the residual signal eb(k) output from the addition unit 13 described below and the delayed audio signal x(k-τ) from the delay buffer means 113, and calculates the filter coefficients of the inner product calculating means 115 so as to simulate this transfer function (filter coefficient calculation function). The filter coefficient calculation unit 112 also updates the calculated filter coefficients and notifies the inner product calculation unit 115 of them (filter coefficient update setting function). The filter coefficients of the inner product calculation means 115 are thereby set.
[0040]
The filter coefficient update setting function is performed so that the residual signal eb (k) is as
small as possible. In addition, the filter coefficient update setting function may be set to be
performed every preset time interval (for example, every several microseconds to several
hundreds of microseconds).
[0041]
During learning of the filter coefficient calculation unit 111, the filter coefficient calculation unit
112 simultaneously updates the filter coefficients.
[0042]
Further, at least two parameter values for controlling the convergence speed (convergence speed parameters) can be set in each of the filter coefficient calculation means 111 and 112: a parameter value v1 with a high convergence speed and a parameter value v2 with a low convergence speed.
[0043]
Here, when learning in the filter coefficient calculation unit 111 is completed, that is, when the impulse response identified by the filter coefficient calculation unit 111 is stable, the filter coefficient calculation unit 112 calculates and updates its filter coefficients (adaptive control) at a reduced convergence speed (parameter v2), that is, with a reduced degree of identification.
[0044]
As a result, the adaptive filter unit 11 can reduce disturbances such as corruption of the filter coefficients and estimation errors caused by events such as double talk at the microphone 3, which may occur suddenly.
[0045]
During learning in the filter coefficient calculation unit 111, the filter coefficients are updated with the fast-converging parameter (v1), and the filter coefficient calculation unit 112 simultaneously updates its filter coefficients with the fast-converging parameter (v1) as well.
[0046]
In the embodiment according to the present invention, the learning states (learning in progress, learning stopped, learning started (relearning)) of the filter coefficient calculation means 111 and 112 are controlled by the cancellation amount comparison unit 16 described below.
[0047]
The cancellation amount comparison unit 16 determines that learning is complete when, for example, the cancellation amount from the cancellation amount calculation unit 14 exceeds can1 dB (for example, 24 dB), and controls the filter coefficient calculation means 111 to stop updating the filter coefficients (learning stop).
[0048]
When the cancellation amount from the cancellation amount calculation unit 14 falls below can2 dB (for example, 9 dB), the cancellation amount comparison unit 16 determines that relearning is necessary and controls the filter coefficient calculation means 111 to resume updating the filter coefficients (relearning start).
At this time, the filter coefficient calculation means 111 and 112 start updating simultaneously.
[0049]
Thereby, for example, when a change in the indoor transmission system 100 occurs due to a
change in the positions of the microphone 3 and the speaker 2, it is possible to perform adaptive
signal processing that is quickly adapted to the change.
[0050]
The estimation of the transfer function of the indoor (feedback) transfer system 100 and the
calculation and updating of the filter coefficients in the filter coefficient updating means 111 and
112 are performed using an adaptive algorithm.
Here, as an adaptive algorithm, for example, a learning identification method, an LMS method, a
projection method, an RLS method, or the like can be applied.
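As one illustration of such an adaptive algorithm, the following is a minimal sketch of a normalized LMS update, which is what the "learning identification method" is commonly understood to mean. The step size `mu`, the three-tap response, and the random test signal are assumptions chosen for illustration and are not values from the embodiment.

```python
import random


def nlms_identify(x, d, taps, mu=0.5, eps=1e-8):
    """Adapt FIR coefficients h so that the residual e(k) = d(k) - y(k) shrinks."""
    h = [0.0] * taps
    window = [0.0] * taps  # [x(k), x(k-1), ..., x(k-taps+1)]
    residual = []
    for x_k, d_k in zip(x, d):
        window = [x_k] + window[:-1]
        y_k = sum(h_i * w_i for h_i, w_i in zip(h, window))  # simulation signal
        e_k = d_k - y_k                                      # residual signal
        norm = sum(w_i * w_i for w_i in window) + eps        # input power normalization
        h = [h_i + mu * e_k * w_i / norm for h_i, w_i in zip(h, window)]
        residual.append(e_k)
    return h, residual


rng = random.Random(0)
h_true = [0.6, -0.3, 0.1]  # assumed transfer system to be identified
x = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
d = [sum(h_true[i] * x[k - i] for i in range(len(h_true)) if k - i >= 0)
     for k in range(len(x))]

h_est, residual = nlms_identify(x, d, taps=3)
```

In this sketch `mu` plays the role of a convergence speed parameter: a larger value adapts faster, while a smaller value adapts more slowly but is less disturbed by signals that do not come from the speaker.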
[0051]
The delay buffer means 113 delays the input signal x(k) from the car audio system 4 by the delay time τ, and supplies the delayed signal x(k-τ) to the inner product calculating means 114 and 115 and to the filter coefficient calculation means 111 and 112.
[0052]
Specifically, the inner product calculation means 114 and 115 are digital filters (typically FIR: finite impulse response filters) and are connected to the filter coefficient calculation means 111 and 112, which determine their respective filter coefficients.
The inner product calculation means 114 and 115 convolve the input delayed signal x(k-τ) with the filter coefficients calculated by the filter coefficient calculation means 111 and 112, respectively.
The inner product calculating means 114 thereby generates a simulation signal yf(k) and outputs it to the addition unit 12.
Likewise, the inner product calculating means 115 generates a simulation signal yb(k) and outputs it to the addition unit 13.
[0053]
In the embodiment according to the present invention, adaptive signal processing in the adaptive filter unit 11 is performed using the high-speed H∞ filter (FHF) disclosed in Japanese Patent No. 4067269.
By calculating the adaptive coefficients at high speed at fixed time intervals using the FHF, the adaptive filter unit 11 can accurately and quickly identify the characteristics of the feedback transfer system (indoor space transfer system) 100 from the speaker 2 to the microphone 3.
[0054]
Further, this high-speed H∞ filter can control the convergence speed of adaptive signal
processing by the parameter γf.
The parameter γf has a value of 0 <γf <100, and the larger the value, the slower the
convergence speed.
Here, in this high-speed H∞ filter, for example, γf1 as the parameter v1 having a high
convergence speed and γf2 as a parameter v2 having a low convergence speed (where γf1
<γf2) are set in advance.
[0055]
By using this high-speed H∞ filter, corruption of the filter coefficients (estimation error) is less likely to occur even in the simultaneous call (double talk) state of the voice input device 1, and estimation errors caused by following abrupt or minute fluctuations in the feedback transfer system 100 can be effectively reduced.
[0056]
The coefficient copying unit 116 has a filter coefficient rewriting function: in response to a request from the cancellation amount comparison unit 16, it duplicates the filter coefficients calculated by the filter coefficient calculation unit 112 and overwrites the filter coefficients of the filter coefficient calculation unit 111 with them.
The coefficient copying unit 116 may be implemented as a function of the cancellation amount comparison unit 16.
[0057]
The simulation signal yf(k) and the feedback sound signal d(k) are input to the addition unit 12.
The addition unit 12 adds the simulation signal yf(k) (as a negative component) and the feedback sound signal d(k) (as a positive component), thereby removing the simulation signal yf(k) from the feedback sound signal d(k); that is, ef(k) = d(k) - yf(k). The resulting residual signal ef(k) is output to the cancellation amount calculator 14 and to the filter coefficient calculator 111. The residual signal ef(k) output here is also input to the voice recognition unit 6 of the car navigation system 5 as the transmission signal (Sout).
[0058]
Here, when the microphone 3 is in the simultaneous call state, in which the addressing voice from the user is input to the microphone 3, and the adaptive signal processing in the adaptive filter unit 11 is working effectively, the residual signal ef(k) (that is, the transmission signal (Sout)) sent out from the addition unit 12 contains only the transmission signal s(k), which is the addressing voice from the user. A high-quality transmission signal can therefore be input to the voice recognition unit 6.
[0059]
Similar to the adding unit 12, the adding unit 13 receives the simulation signal yb (k) and the
feedback sound signal d (k).
The addition unit 13 adds the simulation signal yb (k) (minus component) and the feedback
sound signal d (k) (plus component) and removes the simulation signal yb (k) from the feedback
sound signal d (k). The residual signal eb (k) is output to the cancellation amount calculator 15
and is also output to the filter coefficient calculator 112 as a reference signal.
[0060]
The feedback sound signal d(k) and the residual signal ef(k) are input to the cancellation amount calculator 14. The cancellation amount calculation unit 14 calculates the difference between these input signals, that is, the value of d(k)/ef(k) (in decibel terms, d(k) - ef(k)).
[0061]
The feedback sound signal d(k) and the residual signal eb(k) are input to the cancellation amount calculator 15. Like the cancellation amount calculation unit 14, the cancellation amount calculation unit 15 calculates the difference between these input signals, that is, the value of d(k)/eb(k) (in decibel terms, d(k) - eb(k)).
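A cancellation amount of this kind could be computed as sketched below. The text does not state how the signal levels are measured, so the use of mean power over a short frame, the `floor` guard against division by zero, and the function name `cancellation_db` are assumptions made for illustration.

```python
import math


def cancellation_db(d_frame, e_frame, floor=1e-12):
    """Return 10*log10(power(d) / power(e)) for one frame of samples."""
    p_d = sum(v * v for v in d_frame) / len(d_frame)  # mean power of d(k)
    p_e = sum(v * v for v in e_frame) / len(e_frame)  # mean power of the residual
    return 10.0 * math.log10(max(p_d, floor) / max(p_e, floor))


d_frame = [1.0, -1.0, 1.0, -1.0]   # feedback sound signal d(k) (made-up values)
e_frame = [0.1, -0.1, 0.1, -0.1]   # residual after cancellation (made-up values)
amount = cancellation_db(d_frame, e_frame)
```

A residual at one tenth of the amplitude of d(k) corresponds to a cancellation amount of about 20 dB, which lies between the example thresholds of 9 dB and 24 dB.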
[0062]
The cancellation amount comparison unit 16 has a cancellation amount monitoring function that constantly monitors the cancellation amounts in the cancellation amount calculation units 14 and 15. It also has a learning stop control function: when the cancellation amount of the cancellation amount calculation unit 14 (referred to as the foreground cancellation amount) exceeds a preset cancellation amount threshold (can1 dB: for example, 24 dB), it determines that learning in the filter coefficient calculation means 111 is complete and stops the coefficient calculation and update function of the filter coefficient calculation means 111. The filter coefficient calculation means 111 thereby stops calculating and updating the filter coefficients. Meanwhile, the filter coefficient calculation means 112 continues to calculate and update its filter coefficients.
[0063]
Furthermore, when the cancellation amount comparison unit 16 executes the learning stop control function, it also reduces the convergence speed (the degree of identification) of the filter coefficient calculation and update in the filter coefficient calculation means 112 (step size control function). Specifically, the cancellation amount comparison unit 16 sets the convergence speed parameter of the filter coefficient calculation means 112 to the preset slower (step size) parameter v2. When the filter coefficient calculation means 112 is a high-speed H∞ filter, this parameter is set to γf2 as described above. The filter coefficient calculation means 112 thereby continues to calculate and update the filter coefficients at the reduced convergence speed.
[0064]
Thereby, when the surrounding environment of the voice input device 1 and the feedback transfer system 100 are stable (at learning completion), it is possible to effectively suppress unintended fluctuations of the filter coefficients of the adaptive filter unit 11 in the simultaneous call (double talk) state, as well as coefficient corruption and estimation errors in adaptive signal processing caused by following minute changes.
[0065]
When the cancellation amount comparison unit 16 determines that the cancellation amount of the
cancellation amount calculation unit 14 (foreground cancellation amount) has fallen below a
predetermined cancellation amount threshold (can2 dB: for example, 9 dB), it determines that
relearning is necessary in the filter coefficient calculation means 111 and 112, and performs
control to set the convergence speed parameter in the filter coefficient calculation means 111
and 112 to the preset step size parameter v1 with the faster convergence speed (learning start
function).
As a result, learning resumes simultaneously in the filter coefficient calculation means 111 and
112, and calculation and updating of the filter coefficients start.
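The learning stop, step size control, and learning start functions together form a two-threshold
(hysteresis) controller. The following is a minimal sketch of that logic, not the patent's
implementation: the class name LearningController, its attribute names, and the step size values
0.5 and 0.05 are hypothetical, while the 24 dB and 9 dB thresholds are the example values given
in the text.

```python
class LearningController:
    """Hypothetical sketch of the control logic of comparison unit 16."""

    def __init__(self, v1=0.5, v2=0.05, can1_db=24.0, can2_db=9.0):
        self.v1 = v1              # fast step size (illustrative value)
        self.v2 = v2              # slow step size (illustrative value)
        self.can1_db = can1_db    # learning-complete threshold
        self.can2_db = can2_db    # relearning threshold
        self.fg_updating = True   # state of coefficient calculation means 111
        self.bg_step = v1         # step size of coefficient calculation means 112

    def update(self, fg_cancel_db):
        """Apply one monitoring step to the foreground cancellation amount."""
        if self.fg_updating and fg_cancel_db >= self.can1_db:
            # Learning stop control: freeze 111 and slow down 112.
            self.fg_updating = False
            self.bg_step = self.v2
        elif not self.fg_updating and fg_cancel_db < self.can2_db:
            # Learning start: 111 and 112 both resume at the fast step size.
            self.fg_updating = True
            self.bg_step = self.v1
        return self.fg_updating, self.bg_step
```

Because the two thresholds differ, a foreground cancellation amount that drifts between 9 dB
and 24 dB after learning has stopped does not change the state in this sketch. Relearning is
triggered only when the value falls below the lower threshold.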
[0066]
Furthermore, once learning by the filter coefficient calculation means 111 is complete, the
cancellation amount comparison unit 16 acquires the cancellation amount calculated by the
cancellation amount calculation unit 14 (foreground cancellation amount) and the cancellation
amount calculated by the cancellation amount calculation unit 15 (background cancellation
amount) and compares their magnitudes (cancellation amount comparison function).
[0067]
At this time, when the background cancellation amount is larger than the foreground cancellation
amount, the cancellation amount comparison unit 16 instructs the coefficient copying unit 116
to copy the filter coefficient of the filter coefficient calculation means 112 and to replace
the filter coefficient of the filter coefficient calculation means 111 with it (filter
coefficient replacement control function).
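The replacement rule can be sketched in a few lines. This is an illustrative assumption about
how the comparison and copy might be coded. The function name maybe_copy_coefficients and its
parameters are hypothetical, and the coefficients are represented as plain Python lists.

```python
def maybe_copy_coefficients(fg_coeffs, bg_coeffs, fg_cancel_db, bg_cancel_db,
                            fg_learning_complete):
    """Return the coefficients the foreground filter (111) should use next."""
    if fg_learning_complete and bg_cancel_db > fg_cancel_db:
        # Copy rather than share the list, so that later background updates
        # in 112 do not silently change the foreground coefficients.
        return list(bg_coeffs)
    return fg_coeffs
```

Copying the list mirrors the separate coefficient copying unit 116: the foreground filter keeps
a frozen snapshot while the background filter continues to adapt.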
[0068]
As described above, in the embodiment according to the present invention, the voice input device
1 performs adaptive signal processing that quickly follows fluctuations in the indoor (feedback)
transfer system 100 even in the simultaneous communication (double talk) state. Therefore, for
example, only audio such as music or radio in a car can be effectively removed, and a
transmission signal (Sout) containing, for example, a spoken address can be input to the car
navigation system 5. The voice recognition function of the car navigation system can thus be
used effectively even while audio is being played in the car (double talk state).
[0069]
[Description of Operation of First Embodiment] Next, an operation of the voice input device 1
according to the first embodiment at the time of learning will be described based on a flowchart
of FIG.
[0070]
(At the time of learning) First, the filter coefficient calculation means 111 and 112
simultaneously perform calculation and update processing of the filter coefficients (step S1).
At this time, the filter coefficient calculation means 111 and 112 calculate and update the
filter coefficients at high speed based on the preset parameter v1 with a fast convergence speed
(parameter γf1 in the case of an H∞ filter).
When the cancellation amount comparison unit 16 detects that the cancellation amount in the
cancellation amount calculation unit 14 has exceeded can1 dB (for example, 24 dB) (step S2), it
stops the calculation and update operation (learning operation) of the filter coefficient
calculation means 111 (step S3) and controls the learning operation of the filter coefficient
calculation means 112 so that it is performed based on the parameter v2 with a slow convergence
speed (parameter γf2 in the case of an H∞ filter). That is, the filter coefficient calculation
means 112 continues to calculate and update the filter coefficient at the reduced convergence
speed (step S4).
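Steps S1 to S4 can be illustrated with a small simulation. This is a sketch under stated
assumptions, not the method of the source: it substitutes a plain NLMS update for the H∞ filter,
uses a made-up four-tap echo path and white-noise input with no near-end speech, and assumes the
cancellation amount is the block power ratio of microphone input to foreground residual in dB.
All function and variable names, the step sizes 0.5 and 0.05, and the 200-sample block length are
hypothetical. Only the 24 dB threshold is taken from the text.

```python
import math
import random


def nlms_step(w, x_hist, mic, step, eps=1e-8):
    """One NLMS update of w in place; returns the residual before the update."""
    echo_est = sum(wi * xi for wi, xi in zip(w, x_hist))   # inner product
    residual = mic - echo_est
    norm = sum(xi * xi for xi in x_hist) + eps
    for i, xi in enumerate(x_hist):
        w[i] += step * residual * xi / norm
    return residual


def run_learning(n_samples=4000, v1=0.5, v2=0.05, can1_db=24.0, block=200):
    rng = random.Random(0)                 # fixed seed for repeatability
    echo_path = [0.6, -0.3, 0.2, 0.1]      # made-up feedback transfer system
    taps = len(echo_path)
    w_fg = [0.0] * taps                    # coefficients of means 111
    w_bg = [0.0] * taps                    # coefficients of means 112
    x_hist = [0.0] * taps
    fg_updating, bg_step = True, v1
    mic_pow = res_pow = 0.0
    fg_cancel_db = 0.0
    for k in range(1, n_samples + 1):
        x_hist = [rng.uniform(-1.0, 1.0)] + x_hist[:-1]
        mic = sum(h * x for h, x in zip(echo_path, x_hist))   # echo only
        if fg_updating:
            e_fg = nlms_step(w_fg, x_hist, mic, v1)           # step S1
        else:
            e_fg = mic - sum(wi * xi for wi, xi in zip(w_fg, x_hist))
        nlms_step(w_bg, x_hist, mic, bg_step)                 # step S1 or S4
        mic_pow += mic * mic
        res_pow += e_fg * e_fg
        if k % block == 0:
            fg_cancel_db = 10.0 * math.log10((mic_pow + 1e-12) / (res_pow + 1e-12))
            mic_pow = res_pow = 0.0
            if fg_updating and fg_cancel_db >= can1_db:       # step S2
                fg_updating = False                           # step S3
                bg_step = v2                                  # step S4
    return fg_updating, bg_step, fg_cancel_db
```

In this noise-free setting the foreground filter converges quickly, so the run ends with the
foreground update stopped and the background filter still adapting at the slow step size.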
[0071]
Next, the operation of the voice input device 1 at the time of learning completion (state) in the
adaptive filter unit 11 will be described based on the flowchart of FIG.
[0072]
(When Learning Stops) First, the cancellation amount comparison unit 16 constantly monitors
the cancellation amounts of the cancellation amount calculation units 14 and 15 (step S11).
When the background cancellation amount exceeds the foreground cancellation amount (step S12),
the cancellation amount comparison unit 16 instructs the coefficient copying unit 116 to execute
the coefficient copying function (step S13).
The coefficient copying unit 116 acquires the filter coefficient calculated by the filter
coefficient calculation means 112 and overwrites the filter coefficient in the filter coefficient
calculation means 111 with it (step S14).
Thus, the filter coefficient calculated (updated) by the filter coefficient calculation means 112
is copied by the coefficient copying unit 116 over the filter coefficient of the filter
coefficient calculation means 111, and the inner product operation (convolution operation) is
performed using the rewritten filter coefficient.
[0073]
(Re-Learning Start) Next, the operation of the voice input device 1 when the re-learning operation
in the adaptive filter unit 11 is started in the first embodiment will be described based on the
flowchart of FIG.
[0074]
First, the cancellation amount comparison unit 16 constantly monitors the foreground
cancellation amount and the background cancellation amount in the cancellation amount
calculation units 14 and 15 (step S21).
When it detects that the foreground cancellation amount has fallen below the preset can2 dB (for
example, 9 dB) (step S22), the cancellation amount comparison unit 16 determines that the
feedback transfer system 100 has changed and instructs the adaptive filter unit 11 to start the
relearning operation (step S23). In response to this instruction, the filter coefficient
calculation means 111 and 112 simultaneously start the relearning operation (step S24). At this
time, the filter coefficient calculation means 111 and 112 both calculate and update the filter
coefficients at high speed based on the parameter v1 (γf1) with a fast convergence speed.
[0075]
As described above, the voice input device (acoustic echo cancellation device) according to the
present embodiment has a simple configuration that includes means for performing the adaptive
operation of the adaptive filter in parallel (specifically, the filter coefficient calculation
means and the inner product calculation means) and a unit that monitors the cancellation amount
of the adaptive operation (the cancellation amount comparison unit). With this configuration,
adaptive signal processing in the simultaneous communication state can be performed with high
accuracy without requiring high-accuracy detection of the simultaneous communication state.
Further, it is possible to effectively suppress the deterioration of the transmission signal
(Sout) processed and output by the voice input device (acoustic echo cancellation device).
[0076]
Furthermore, as described above, by performing the adaptive operation of the adaptive filter in
the first, second, and third embodiments using the H∞ filter (corresponding to the “fast
calculation filter”), the adaptive operation of the adaptive filter can be performed quickly
even in the simultaneous communication (double talk) state, coefficient destruction (estimation
error) of the filter coefficient can be suppressed, and estimation errors and the like caused by
rapid or minute fluctuations in the feedback transfer system can be effectively reduced.
[0077]
Second Embodiment Next, a second embodiment according to the present invention will be
described.
The device configuration of the voice input device 1 in the second embodiment is the same as
that of the first embodiment described above, as shown in FIG. Further, in place of the car
audio system 4 and the car navigation system 5 of the first embodiment described above, a
karaoke apparatus 7 is provided, which is installed in a predetermined room and reproduces and
outputs a karaoke accompaniment sound signal.
[0078]
The karaoke apparatus 7 includes a reproduction unit that reproduces and outputs a karaoke
accompaniment sound signal, and a mixer 8 that mixes the karaoke accompaniment sound signal from
the reproduction unit with the transmission signal (Sout) processed by the voice input device 1
(mixing processing) and supplies the resulting synthesized audio signal to the speaker 2. Here,
the same parts as in the first embodiment described above are given the same reference numerals.
[0079]
Further, in the second embodiment, the voice input device 1 acquires the synthesized audio signal
from the karaoke apparatus 7 (mixer 8) as the input signal x(k), and also inputs the
transmission signal (Sout) to the mixer 8.
[0080]
Thus, even when the audio signal input from the microphone 3 includes both the karaoke
accompaniment sound signal and the voice uttered by the speaker (corresponding to the
simultaneous call state), the voice input device 1 can effectively remove the karaoke
accompaniment sound signal, which is the feedback sound signal, and input only the speech signal
of the speaker (user) to the mixer 8 as the transmission signal (Sout). The occurrence of
howling in the karaoke apparatus 7 can thereby be effectively suppressed.
[0081]
Third Embodiment Next, a third embodiment according to the present invention will be
described.
In this third embodiment, as shown in FIG. 3, acoustic echo cancellation devices (voice input
devices) 31 and 32 are installed on the speaker A side and the speaker B side, respectively, and
speakers A and B hold a two-way call using the loudspeaker and the microphone installed on their
own sides.
The internal components of the acoustic echo cancellation devices 31 and 32 have the same
configuration as the voice input device 1 of the first and second embodiments described above.
[0082]
Here, the acoustic echo cancellation device 31 functions to suppress the acoustic echo generated
from the loudspeaker on the speaker A side, and the acoustic echo cancellation device 32
functions to suppress the acoustic echo generated from the loudspeaker on the speaker B side.
[0083]
Further, the transmission signal (Sout) from the acoustic echo cancellation device 32 is input
via the transmission line 30 as the input signal x(k) to the adaptive filter unit of the acoustic
echo cancellation device 31 (here, xa(k)).
On the other hand, the transmission signal (Sout) from the acoustic echo cancellation device 31
is input via the transmission line 30 as the input signal x(k) to the adaptive filter unit of the
acoustic echo cancellation device 32 (here, xb(k)).
[0084]
As a result, in the third embodiment, on the speaker B side, even when the audio signal input
from the microphone B contains both the speech signal of the other party (speaker A) output from
the loudspeaker B and the speech uttered by the speaker B (simultaneous communication state:
double talk state), the acoustic echo cancellation device 32 effectively removes the speech
signal of the speaker A as the feedback sound signal, so that only the speech signal of the
speaker B is transmitted to the speaker A side (transmission path) as the transmission signal
(Sout). Likewise, on the speaker A side, even when the audio signal input from the microphone A
contains both the speech signal of the other party (speaker B) output from the loudspeaker A and
the speech uttered by the speaker A (simultaneous communication state: double talk state), the
acoustic echo cancellation device 31 effectively removes the speech signal of the speaker B as
the feedback sound signal, so that only the speech signal of the speaker A is transmitted to the
speaker B side (transmission path) as the transmission signal (Sout).
[0085]
Thereby, in the third embodiment, the generation of acoustic echo can be effectively suppressed.
Furthermore, it is possible to effectively suppress the formation of a closed loop of the audio
signal, in which the echo reproduced from the loudspeaker A on the speaker A side is picked up by
the microphone A on the speaker A side (and likewise on the speaker B side), and thus the
occurrence of howling can be effectively prevented.
[0086]
As described above in the first, second, and third embodiments, the voice input device
(acoustic echo cancellation device) of the present invention has a simple configuration that
includes means for performing the adaptive operation of the adaptive filter in parallel
(specifically, the filter coefficient calculation means and the inner product calculation means)
and a means that monitors the cancellation amount of the adaptive operation (the cancellation
amount comparison unit). With this configuration, adaptive signal processing in the simultaneous
call state can be performed with high accuracy without requiring high-accuracy detection of the
simultaneous call state.
Further, it is possible to effectively suppress the deterioration of the transmission signal (Sout)
processed and output by the voice input device (acoustic echo cancellation device).
[0087]
Furthermore, as described above, by performing the adaptive operation of the adaptive filter in
the first, second, and third embodiments using the H∞ filter (corresponding to the “fast
calculation filter”), the adaptive operation of the adaptive filter can be performed quickly
even in the simultaneous communication (double talk) state, coefficient destruction (estimation
error) of the filter coefficient can be suppressed, and estimation errors and the like caused by
rapid or minute fluctuations in the feedback transfer system can be effectively reduced.
[0088]
The present invention can be usefully applied to an echo cancellation system in a conference
system, a mobile phone, or the like, or to a howling cancellation system in a sound amplification
device such as a karaoke system.
[0089]
DESCRIPTION OF SYMBOLS
1 Voice input (sound collection) device
2 Speaker
3 Microphone
4 Car audio
5 Car navigation system
6 Voice recognition unit
7 Karaoke sound source
8 Mixer
11 Adaptive filter unit
12, 13 Addition unit
14, 15 Cancellation amount calculation unit
16 Cancellation amount comparison unit
100 Feedback transfer system
111, 112 Filter coefficient calculation means
113 Delay buffer means
114, 115 Inner product calculation means