DESCRIPTION JP2011254420

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
The present invention provides an echo cancellation technology that changes the magnitude of
the echo suppression gain according to the situation, reducing voice distortion while still
suppressing the echo sufficiently. The echo cancellation technology according to the present
invention obtains an echo suppression gain Gb^(f, k) using a signal D(f, k) obtained from a
collected sound signal and a reception signal X(f, k). Using a signal D'(f, k) obtained by
removing the echo component from the signal D(f, k), it determines whether the signal to be
suppressed is a vowel or a consonant. If the signal is determined to be a vowel, γ2 is used as
the relaxation coefficient β(k); otherwise γ1 (γ1 < γ2) is used. The product of the signal
D(f, k) and the relaxation coefficient β(k) is then subtracted from the product of the signal
D(f, k), the echo suppression gain Gb^(f, k), and the relaxation coefficient β(k), and the
result is added to D(f, k). [Selected figure] Figure 3
Echo cancellation method, echo cancellation apparatus and echo cancellation program
[0001]
The present invention relates to an echo cancellation technology that suppresses an echo
component, caused by a reception signal reproduced by a speaker, in a collected sound signal
collected by a microphone by multiplying the collected sound signal by a gain for each
frequency.
[0002]
The echo canceler has a two-stage configuration of linear echo cancellation by an adaptive filter
15-04-2019
1
and non-linear echo suppression by amplitude spectrum control.
The echo canceler 10 described in Non-Patent Document 1 is known as a prior-art two-stage
echo canceler. The outline of the echo canceler 10 will be described with reference to FIG. 1.
[0003]
The reception signal x(n) reproduced by the speaker 2 passes through the echo path 5 and is
picked up by the microphone 3. The echo canceler 10 suppresses the echo component caused by
the reception signal x(n) reproduced by the speaker 2 in the collected sound signal y(n)
collected by the microphone 3. Here, n is an integer representing time.
[0004]
In this configuration, the adaptive filter unit 11 uses the reception signal x(n) input from the
reception end 1 to cancel the echo component from the collected sound signal y(n) by linear
processing, and obtains the residual echo signal d1(n). The frequency domain conversion unit 13
then treats the L residual echo samples d1(n), d1(n−1), ..., d1(n−L+1) ending at the current
time n as one frame and converts them into a frequency-domain signal D1(f, k). D1(f, k) is the
Fourier transform of the residual echo signal d1(n), f is a discrete angular frequency, and k is
the frame time; when the Fourier transform length is F, f is an integer from 1 to F.
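The framing and frequency-domain conversion described above can be illustrated with a minimal sketch. It assumes non-overlapping frames and a naive DFT, neither of which is specified in the text; the function names `split_frames` and `dft` are hypothetical, and the frequency index runs from 0 to L−1 here rather than from 1 to F.

```python
import cmath

def split_frames(d1, L):
    """Split the samples d1(n) into consecutive, non-overlapping frames of L samples.
    Non-overlapping framing is an assumption made for this sketch."""
    return [d1[i:i + L] for i in range(0, len(d1) - L + 1, L)]

def dft(frame):
    """Naive discrete Fourier transform of one frame (O(L^2), for illustration only)."""
    size = len(frame)
    return [
        sum(frame[n] * cmath.exp(-2j * cmath.pi * f * n / size) for n in range(size))
        for f in range(size)
    ]

# Two frames of four samples each; D1[k][f] plays the role of D1(f, k).
d1 = [1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0, 0.0]
D1 = [dft(frame) for frame in split_frames(d1, 4)]
```

A practical implementation would normally use an FFT with windowing and overlap; the sketch only shows how a time signal becomes a per-frame, per-frequency quantity.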
[0005]
The noise suppression unit 15 suppresses noise components included in the residual echo signal
D1(f, k) to obtain a noise removal signal D2(f, k). The frequency domain conversion unit 17
converts the reception signal x(n) into a frequency-domain signal X(f, k). The residual echo
suppression unit 18 then uses the signal X(f, k) to suppress the residual echo component
included in the noise removal signal D2(f, k) and obtains the transmission signal D3(f, k). The
time domain conversion unit 19 converts the transmission signal D3(f, k) into the time-domain
transmission signal d3(n) and outputs it to the transmission end 4.
[0006]
Here, attention is focused on the echo suppression processing in the residual echo suppression
unit 18. The residual echo suppression unit 18 obtains an echo suppression gain G(f, k) and
suppresses the echo by multiplying its input signal D2(f, k) by G(f, k) in the frequency
domain. Specifically, the echo suppression gain G(f, k) is calculated as
$$G(f,k) = \frac{|D_2(f,k)|^2 - |\hat{Y}(f,k)|^2}{|D_2(f,k)|^2} \quad (1)$$
where |·| denotes the absolute value. The transmission signal D3(f, k) is calculated as
$$D_3(f,k) = G(f,k)\,D_2(f,k) \quad (2)$$
In equation (1), Y^(f, k) is a pseudo residual echo, which Non-Patent Document 1 obtains from
$$E[|\hat{Y}(f,k)|^2] = E[|H(f,k)|^2]\,|X(f,k)|^2 + \beta\,E[|\hat{Y}(f,k-1)|^2] \quad (3)$$
H(f, k) represents a pseudo residual echo path and is determined using, for example, the
minimum value of the ratio between E[|X(f, k)|^2] and E[|D2(f, k)|^2]. E[·] denotes the
ensemble average. β is a forgetting factor set according to the reverberation time.
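Equations (1) and (3) can be sketched per frequency bin as follows. This is an illustrative sketch, not the implementation of Non-Patent Document 1: the function names are hypothetical, and clamping the gain to the range [floor, 1] is an added assumption to keep the result usable when the estimated echo exceeds the input power.

```python
def update_echo_level(prev_level, h_pow, x_pow, beta):
    """Equation (3): E[|Y^(f,k)|^2] = E[|H|^2] * |X|^2 + beta * E[|Y^(f,k-1)|^2]."""
    return h_pow * x_pow + beta * prev_level

def echo_suppression_gain(d2_pow, echo_pow, floor=0.0):
    """Equation (1): G = (|D2|^2 - |Y^|^2) / |D2|^2.
    Clamping to [floor, 1] is an assumption added for this sketch."""
    if d2_pow <= 0.0:
        return floor
    gain = (d2_pow - echo_pow) / d2_pow
    return min(1.0, max(floor, gain))

# One bin, one frame: coupling 0.25, |X|^2 = 4, no echo history, |D2|^2 = 4.
level = update_echo_level(prev_level=0.0, h_pow=0.25, x_pow=4.0, beta=0.5)
gain = echo_suppression_gain(d2_pow=4.0, echo_pow=level)
```

With these values the estimated echo power is a quarter of the input power, so three quarters of the power is kept.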
[0007]
The amplitude spectrum control in the residual echo suppression unit 18 can remove the
residual echo component that remains when the adaptive filter unit 11 cannot completely cancel
the echo. However, unlike the adaptive filter unit 11, it also suppresses part of the
transmission voice unrelated to the echo, in proportion to the amount of echo suppression. As a
result, there is a problem that the transmission voice is distorted and difficult to hear.
[0008]
Therefore, Non-Patent Document 1 proposes setting an original sound addition rate 1−α as a
method of reducing voice distortion. That is, instead of equation (2), the transmission signal
is obtained as
$$D_3(f,k) = (1-\alpha)\,D_2(f,k) + \alpha\,G(f,k)\,D_2(f,k) \quad (4)$$
which reduces the influence of the echo suppression gain G(f, k). Here, α is a real number
from 0 to 1.
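The effect of the original sound addition can be seen in a short sketch. It assumes that the signal added back is the unsuppressed input D2(f, k), weighted by 1−α; the function name `blend_original` is hypothetical.

```python
def blend_original(d2, gain, alpha):
    """Mix the unsuppressed bin d2 (weight 1 - alpha) with the suppressed bin
    gain * d2 (weight alpha). alpha = 1 gives plain suppression, alpha = 0 none."""
    return (1.0 - alpha) * d2 + alpha * gain * d2

d2 = 2.0 + 0.0j
full = blend_original(d2, gain=0.5, alpha=1.0)   # suppression only
half = blend_original(d2, gain=0.5, alpha=0.5)   # half original, half suppressed
none = blend_original(d2, gain=0.5, alpha=0.0)   # original only
```

A smaller α keeps more of the original sound, which lowers distortion but also lets more residual echo through.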
[0009]
S. Hanuchi, Y. Haneda, M. Tanaka, J. Sasaki, A. Kataoka, "Acoustic Echo Canceller with Noise
Suppression and Echo Suppression Function," Transactions of the Institute of Electronics,
Information and Communication Engineers, A, 2004, Vol. J-87-A, No. 4, pp. 448-457
[0010]
If the original sound addition rate is increased, which weakens the effect of the echo
suppression gain, voice distortion is reduced but echo cancellation performance drops
correspondingly; the two are in a trade-off relationship. Although the optimal original sound
addition rate varies depending on the signal to be suppressed, the original sound addition rate
of the prior art is fixed, so there is a problem that an optimal value cannot be set according
to the situation.
[0011]
In the echo canceler, when the original sound addition rate is optimized for the signal of the
vowel part, the signal of the consonant part, whose amplitude is small to begin with,
additionally has the characteristics of its frequency spectrum changed by the suppression, so
it is likely to be misheard as another consonant. This will be described below with reference
to FIG. 2. When the transmission voice is a vowel and the residual echo is superimposed on it
(see FIG. 2A), the outline of the spectrum does not differ greatly from the original even if
part of the transmission voice is lost through the residual echo suppression processing (see
FIG. 2B). When the transmission voice is a consonant and the same original sound addition rate
is used, the original amplitude is small in the signal in which the residual echo is
superimposed on the transmission voice (see FIG. 2C). If part of the transmission voice is lost
through the residual echo suppression processing, the characteristics of the frequency
spectrum also change (see FIG. 2D), so the result differs greatly from the original spectrum
and the consonant may be misheard as another consonant.
[0012]
Conversely, when the original sound addition rate is optimized for the signal of the consonant
part, there arises a problem that sufficient echo cancellation performance can not be obtained
with the vowel part.
[0013]
In order to solve the above-mentioned problems, the echo cancellation technology according to
the present invention converts a signal d(n) obtained from a collected sound signal and a
reception signal x(n) into frequency-domain signals D(f, k) and X(f, k), respectively, for each
frame, and obtains the echo suppression gain Gb^(f, k) using the signals D(f, k) and X(f, k).
Using a signal D'(f, k) obtained by removing the echo component from the signal D(f, k), it
determines whether the signal to be suppressed is a vowel or a consonant. When the signal to
be suppressed is determined to be a vowel, γ2 is used as the relaxation coefficient β(k);
otherwise γ1 is used. The product of the signal D(f, k) and the relaxation coefficient β(k) is
subtracted from the product of the signal D(f, k), the echo suppression gain Gb^(f, k), and the
relaxation coefficient β(k), and the result is added to D(f, k) to obtain the second residual
echo suppression signal D3(f, k). The second residual echo suppression signal D3(f, k) is then
converted into the time-domain signal d3(n).
Here, n represents time, f = 1, 2, ..., F represents the discrete angular frequency, k
represents the frame time, and γ1 < γ2.
[0014]
The present invention has the effect of changing the magnitude of the echo suppression gain
according to the situation to reduce the voice distortion simultaneously while sufficiently
suppressing the echo.
[0015]
FIG. 1 is a block diagram for explaining the conventional echo canceler 10.
FIG. 2A shows a signal in which the residual echo is superimposed on the transmission voice
when the transmission voice is a vowel. FIG. 2B shows the signal of FIG. 2A after residual echo
suppression processing. FIG. 2C shows a signal in which the residual echo is superimposed on
the transmission voice when the transmission voice is a consonant. FIG. 2D shows the signal of
FIG. 2C after residual echo suppression processing.
FIG. 3 is a block diagram for explaining the echo canceler 100 according to the first
embodiment.
FIG. 4 is a diagram for explaining the process flow of the echo canceler 100 of the first
embodiment.
FIG. 5 is a block diagram for explaining the adaptive filter unit 11 of the echo canceler 100
according to the first embodiment.
FIG. 6 is a block diagram for explaining the noise suppression unit 15 of the echo canceler 100
of the first embodiment.
FIG. 7 is a block diagram for explaining the first residual echo suppression unit 130, the
vowel/consonant determination unit 140, the relaxation coefficient determination unit 150, and
the second residual echo suppression unit 160 of the echo canceler 100 according to the first
embodiment.
FIG. 8 is a diagram for explaining the process flow of the first residual echo suppression unit
130 of the echo canceler 100 of the first embodiment.
FIG. 9 is a diagram for explaining the process flow of the vowel/consonant determination unit
140, the relaxation coefficient determination unit 150, and the second residual echo
suppression unit 160 of the echo canceler 100 of the first embodiment.
FIG. 10 is a block diagram for explaining the relaxation coefficient determination unit 150 of
the echo canceler 100 according to the first embodiment.
FIG. 11A is a block diagram for explaining the second residual echo suppression unit 160a,
which calculates D3(f, k) = {1 − β(k)(1 − Gb^(f, k))} D2(f, k). FIG. 11B is a block diagram for
explaining the second residual echo suppression unit 160b, which calculates
D3(f, k) = (1 − β(k)) D2(f, k) + β(k) D'3(f, k).
FIG. 12 is a block diagram for explaining the relaxation coefficient determination unit 250 of
the echo canceler 200 of the second embodiment.
FIG. 13 is a diagram for explaining the process flow of the relaxation coefficient
determination unit 250 of the echo canceler 200 of the second embodiment.
FIG. 14 is a diagram for explaining the relaxation coefficient determination unit 250 of the
echo canceler 200 of the second embodiment.
FIG. 15 is a block diagram for explaining the relaxation coefficient determination unit 350 of
the echo canceler 300 of the third embodiment.
FIG. 16 is a diagram for explaining the process flow of the relaxation coefficient
determination unit 350 of the echo canceler 300 of the third embodiment.
[0016]
Hereinafter, embodiments of the present invention will be described in detail.
[0017]
<Echo Canceler 100> The echo canceler 100 suppresses the echo component caused by the
reception signal x(n) reproduced by the speaker 2 in the collected sound signal y(n) collected
by the microphone 3 by multiplying by an echo suppression gain for each frequency.
[0018]
For example, as shown in FIG. 3, the echo canceler 100 includes the adaptive filter unit 11, the
frequency domain conversion units 13 and 17, the noise suppression unit 15, the time domain
conversion unit 19, the first residual echo suppression unit 130, the vowel/consonant
determination unit 140, the relaxation coefficient determination unit 150, and the second
residual echo suppression unit 160.
The echo canceler 100 according to the first embodiment will be described with reference to
FIGS. 3 and 4.
In FIG. 3, parts corresponding to those in FIG. 1 are given the same reference numerals. The
same applies to the following figures.
<Adaptive Filter Unit 11> The adaptive filter unit 11 uses the reception signal x(n) input from
the reception end 1 to cancel the echo component by linear processing from the collected
sound signal y(n) input from the microphone 3, obtains the residual echo signal d1(n) (s11),
and outputs it to the frequency domain conversion unit 13. For example, as shown in FIG. 5, the
adaptive filter unit 11 includes an echo prediction unit 11a, a subtraction unit 11b, and an
echo path estimation unit 11c.
[0019]
The echo prediction unit 11a receives the filter coefficient vector H'(n) and the reception
signal x(n), convolves them as in the following equation to obtain a pseudo echo signal y'(n),
and sends it to the subtraction unit 11b.
[0020]
$$y'(n) = H'^{T}(n)\,X(n)$$
where H'(n) = [h'(n, 0), ..., h'(n, L−1)]^T, X(n) = [x(n), ..., x(n−L+1)]^T, [·]^T denotes the
transpose of a vector, L is the filter length, and h'(n, l) is each filter coefficient.
[0021]
The subtraction unit 11b receives the collected sound signal y(n) and the pseudo echo signal
y'(n), subtracts the pseudo echo signal y'(n) from the collected sound signal y(n) to obtain the
residual echo signal d1(n) (= y(n) − y'(n)), and sends it to the frequency domain conversion
unit 13 and the echo path estimation unit 11c.
[0022]
The echo path estimation unit 11c receives the residual echo signal d1(n) and the reception
signal x(n) and, based on them, updates the filter coefficient vector H'(n) so that the error
between the collected sound signal y(n) and the pseudo echo signal y'(n) decreases, and sends
the updated vector to the echo prediction unit 11a.
For example, using the normalized least mean square (NLMS) algorithm, the filter coefficient
vector H'(n+1) is obtained as in the following equation.
[0023]
$$H'(n+1) = H'(n) + \frac{\mu\, d_1(n)\, X(n)}{X^{T}(n)\, X(n)}$$
where μ is a step size set to stabilize the estimation.
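The echo prediction, subtraction, and NLMS update of the adaptive filter unit 11 can be sketched as a single step function. This is a minimal sketch: the function name `nlms_step`, the default step size, the small regularization constant `eps` (added to avoid division by zero), and the two-tap echo path used in the demonstration are all assumptions, not values from the text.

```python
def nlms_step(h, x_vec, y, mu=0.5, eps=1e-8):
    """One NLMS iteration.
    h:     current filter coefficients H'(n)
    x_vec: [x(n), x(n-1), ..., x(n-L+1)]
    y:     collected sound sample y(n)
    Returns (d1, h_next): residual echo d1(n) and updated coefficients H'(n+1)."""
    y_hat = sum(hi * xi for hi, xi in zip(h, x_vec))     # y'(n) = H'^T(n) X(n)
    d1 = y - y_hat                                        # d1(n) = y(n) - y'(n)
    norm = sum(xi * xi for xi in x_vec) + eps             # X^T(n) X(n); eps is an assumption
    h_next = [hi + mu * d1 * xi / norm for hi, xi in zip(h, x_vec)]
    return d1, h_next

# Demonstration: identify a hypothetical two-tap echo path from an echo-only signal.
true_path = [0.5, -0.2]
x = [((i * 7) % 5) - 2 for i in range(200)]               # deterministic test input
h = [0.0, 0.0]
for n in range(len(x)):
    x_vec = [float(x[n]), float(x[n - 1]) if n > 0 else 0.0]
    y = sum(t * xi for t, xi in zip(true_path, x_vec))    # microphone sees only the echo
    d1, h = nlms_step(h, x_vec, y)
```

In this noise-free demonstration the coefficients converge to the echo path and the residual d1(n) approaches zero; with a near-end talker present, the residual also contains the transmission voice.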
<Frequency Domain Conversion Units 13 and 17> The frequency domain conversion unit 13
receives, for example, the residual echo signal d1(n), treats the L samples d1(n), d1(n−1), ...,
d1(n−L+1) ending at the current time n as one frame, converts each frame into a
frequency-domain signal D1(f, k) (s13), and sends it to the noise suppression unit 15.
When the adaptive filter unit 11 is not provided in the echo canceler 100, the frequency domain
conversion unit 13 may be configured to receive the collected sound signal y(n). L is usually
set to the number of samples corresponding to 10 ms or 20 ms.
[0024]
The frequency domain conversion unit 17 receives the reception signal x(n), converts it into the
frequency-domain signal X(f, k) for each frame (s17), and sends it to the first residual echo
suppression unit 130. Conversion methods include the discrete Fourier transform (DFT) and the
short-time Fourier transform (STFT).
<Noise Suppression Unit 15> The noise suppression unit 15 receives the frequency-domain
residual echo signal D1(f, k), suppresses the noise component N(f, k) contained in it to obtain
the noise removal signal D2(f, k) (s15), and sends it to the first residual echo suppression
unit 130 and the second residual echo suppression unit 160. For example, as shown in FIG. 6,
the noise suppression unit 15 includes a noise level estimation unit 15a, a noise suppression
gain calculation unit 15b, and a multiplication unit 15c.
[0025]
The noise level estimation unit 15a receives the signal D1(f, k) and obtains the ensemble
average E[|N(f, k)|^2] from the input signal D1(f, k) in sections where there is no speech.
Here, N(f, k) is the noise component included in the residual echo signal D1(f, k).
[0026]
The noise suppression gain calculation unit 15b receives the signal D1(f, k) and the ensemble
average E[|N(f, k)|^2], and obtains the noise suppression gain Ga^(f, k).
[0027]
[0028]
The multiplication unit 15c multiplies the residual echo signal D1(f, k) by the noise suppression
gain Ga^(f, k) to obtain the noise removal signal D2(f, k).
At that time, the residual echo signal D1(f, k) (the original sound) may be added at an
appropriate ratio 1−α, as in the following equation, to mask the voice distortion and suppress
audible degradation of the noise removal signal D2(f, k).
$$D_2(f,k) = (1-\alpha)\,D_1(f,k) + \alpha\,\hat{G}_a(f,k)\,D_1(f,k)$$
[0029]
<First Residual Echo Suppression Unit 130> The first residual echo suppression unit 130
receives the noise removal signal D2(f, k) and the reception signal X(f, k), uses them to obtain
the echo suppression gain Gb^(f, k), and multiplies the signal D2(f, k) by it to obtain the
first residual echo suppression signal D'3(f, k) (s130). The first residual echo suppression
unit 130 sends the first residual echo suppression signal D'3(f, k) to the vowel/consonant
determination unit 140 and the echo suppression gain Gb^(f, k) to the second residual echo
suppression unit 160.
[0030]
For example, as shown in FIG. 7, the first residual echo suppression unit 130 includes an echo
suppression gain calculation unit 131 and a multiplication unit 135. Furthermore, the echo
suppression gain calculation unit 131 includes an acoustic coupling amount estimation unit 132,
an echo level estimation unit 133, and a gain calculation unit 134. The processing of each part
will be described using FIGS. 7 and 8.
[0031]
The acoustic coupling amount estimation unit 132 receives the noise removal signal D2(f, k) and
the reception signal X(f, k). It calculates the ensemble averages E[|D2(f, k)|^2] and
E[|X(f, k)|^2] of the two signals, updates the minimum value of the ratio between
E[|D2(f, k)|^2] and E[|X(f, k)|^2] to determine the frequency characteristic E[|H(f, k)|^2] of
the acoustic coupling amount (s132), and sends it to the echo level estimation unit 133.
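The minimum tracking in the acoustic coupling amount estimation unit 132 can be sketched for one frequency bin as follows. Several points are assumptions made for illustration: the ensemble averages are approximated by exponential smoothing, the ratio is taken as output power over reception power, and the class name `CouplingEstimator` and its `smoothing` parameter are hypothetical.

```python
class CouplingEstimator:
    """Tracks the minimum of E[|D2|^2] / E[|X|^2] for one frequency bin."""

    def __init__(self, smoothing=0.9):
        self.smoothing = smoothing        # exponential averaging factor (assumption)
        self.d_pow = 0.0                  # running estimate of E[|D2(f,k)|^2]
        self.x_pow = 0.0                  # running estimate of E[|X(f,k)|^2]
        self.h_pow = float("inf")         # running minimum, used as E[|H(f,k)|^2]

    def update(self, d2, x):
        a = self.smoothing
        self.d_pow = a * self.d_pow + (1.0 - a) * abs(d2) ** 2
        self.x_pow = a * self.x_pow + (1.0 - a) * abs(x) ** 2
        if self.x_pow > 0.0:
            self.h_pow = min(self.h_pow, self.d_pow / self.x_pow)
        return self.h_pow

est = CouplingEstimator()
first = est.update(d2=1.0, x=2.0)    # echo only: ratio 0.25
second = est.update(d2=2.0, x=2.0)   # near-end voice raises D2; the minimum is kept
```

Keeping the minimum means that frames in which the transmission voice adds energy to D2(f, k) do not inflate the coupling estimate.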
[0032]
The echo level estimation unit 133 receives the frequency characteristic E[|H(f, k)|^2] of the
acoustic coupling amount and the reception signal X(f, k), obtains the ensemble average
E[|Y^(f, k)|^2] of the pseudo residual echo Y^(f, k) according to equation (3) (s133), and
sends it to the gain calculation unit 134.
[0033]
$$E[|\hat{Y}(f,k)|^2] = E[|H(f,k)|^2]\,|X(f,k)|^2 + \beta\,E[|\hat{Y}(f,k-1)|^2] \quad (3)$$
The gain calculation unit 134 receives the ensemble average E[|Y^(f, k)|^2] of the pseudo
residual echo and the noise removal signal D2(f, k), obtains the echo suppression gain
Gb^(f, k) according to equation (1) (s131, s134), and sends it to the multiplication unit 135
and the second residual echo suppression unit 160.
[0034]
$$\hat{G}_b(f,k) = \frac{|D_2(f,k)|^2 - |\hat{Y}(f,k)|^2}{|D_2(f,k)|^2} \quad (1)$$
The multiplication unit 135 multiplies the noise removal signal D2(f, k) by the echo
suppression gain Gb^(f, k) according to equation (2) to obtain the first residual echo
suppression signal D'3(f, k) (s135), and sends it to the vowel/consonant determination unit 140.
$$D'_3(f,k) = \hat{G}_b(f,k)\,D_2(f,k) \quad (2)$$
[0035]
<Vowel/Consonant Determination Unit 140> The vowel/consonant determination unit 140 receives
the first residual echo suppression signal D'3(f, k) and uses it to determine whether the
signal D2(f, k) to be suppressed is a vowel or a consonant (s140).
For example, as shown in FIG. 7, the vowel/consonant determination unit 140 includes a
determination evaluation value calculation unit 141 and a determination unit 143. The
processing of each part will be described with reference to FIGS. 7 and 9.
[0036]
The determination evaluation value calculation unit 141 receives the first residual echo
suppression signal D'3(f, k), calculates a value S(D'3(k)) indicating the sparsity of its
spectrum (s141), and sends it to the determination unit 143.
[0037]
[0038]
Here, D'3(k) is the vector representation of D'3(f, k), D'3(k) = {D'3(0, k), D'3(1, k), ...,
D'3(F, k)}, fh is the highest frequency considered, and fl is the lowest frequency considered.
For example, fl and fh are set to 300 Hz and 3 kHz (the band used in voice communication) or
to 20 Hz and 20 kHz (the audible range).
In equation (5), the term
[0039]
[0040]
computed from |D'3(f, k)| over fl ≦ f ≦ fh takes the value 1 when the spectrum is sparsest
(one frequency component has a value and all others are 0), and √(fh − fl + 1) when it is
least sparse (all frequency components have the same value).
Therefore 0 ≦ S(D'3(k)) ≦ 1. When D'3(f, k) is a vowel spectrum, S(D'3(k)) takes a value
close to 1 (see FIG. 2B); for a consonant, it takes a value close to 0 (see FIG. 2D).
[0041]
The determination unit 143 therefore receives the value S(D'3(k)) indicating sparsity and
determines whether S(D'3(k)) is equal to or greater than a predetermined threshold T. When it
is equal to or greater than T, the signal is determined to be a vowel; when it is less than T,
the signal is determined to be a consonant (s143). The determination unit 143 sends the
determination result j(k) to the relaxation coefficient determination unit 150. The threshold T
satisfies 0 ≦ T ≦ 1 and is set in advance, by experiments or the like, so that vowels and
consonants can be distinguished (for example, T = 0.5). The determination result j(k) may be,
for example, 0 to indicate a consonant and 1 to indicate a vowel.
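The sparsity-based decision can be sketched as follows. Because the sparsity formula itself is not reproduced in this text, the sketch assumes a Hoyer-style measure built on the ratio of the L1 norm to the L2 norm, which matches the stated bounds (the ratio is 1 for a single nonzero component and √N for a flat spectrum of N components, and S lies between 0 and 1). The function names and the choice of measure are assumptions, not the patent's verbatim equation.

```python
import math

def sparsity(mags):
    """Hoyer-style sparseness of the magnitudes |D'3(f,k)| for fl <= f <= fh.
    Returns 1.0 for a single nonzero bin and 0.0 for a perfectly flat spectrum.
    The exact formula is an assumption made for this sketch."""
    n = len(mags)
    l1 = sum(mags)
    l2 = math.sqrt(sum(m * m for m in mags))
    if n < 2 or l2 == 0.0:
        return 0.0
    return (math.sqrt(n) - l1 / l2) / (math.sqrt(n) - 1.0)

def judge(mags, threshold=0.5):
    """Determination result j(k): 1 for a vowel (S >= T), 0 for a consonant."""
    return 1 if sparsity(mags) >= threshold else 0

peaky = [0.0, 0.0, 3.0, 0.0]   # energy concentrated in one bin, vowel-like
flat = [1.0, 1.0, 1.0, 1.0]    # energy spread evenly, consonant-like
```

Real vowel spectra have several harmonic peaks rather than a single bin, so the threshold T would need to be tuned on recorded speech, as the text notes.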
[0042]
Note that the first residual echo suppression signal D'3(f, k) is used for the vowel/consonant
determination because, if an echo component derived from the reception signal remains in the
signal used for the determination, the nature of the signal to be suppressed may be judged
incorrectly. Therefore, any signal from which the echo component has been removed can be used
for the vowel/consonant determination. Such a signal is one that has undergone at least one of
the following: cancellation of the echo component by linear processing in the adaptive filter
unit 11, or nonlinear echo suppression in the first residual echo suppression unit 130.
Accordingly, as shown by the long broken line in FIG. 7, the noise removal signal D2(f, k) may
be sent to the vowel/consonant determination unit 140 instead. However, since that signal still
contains a residual echo component, the accuracy of the determination is lower.
[0043]
<Relaxation Coefficient Determination Unit 150> The relaxation coefficient determination unit
150 sets the relaxation coefficient β(k) to 1 when the signal to be suppressed is determined to
be a vowel, and to γ otherwise (s150). Here, γ satisfies 0 ≦ γ < 1, and an appropriate value
is determined in advance by experiments or the like. For example, as shown in FIG. 10, the
relaxation coefficient determination unit 150 includes storage units 151 and 153 and a
switching unit 155. The processing of each part will be described using FIGS. 9 and 10. The
relaxation coefficient determination unit 150 receives the determination result j(k). When j(k)
indicates a vowel, the switching unit 155 is connected to the storage unit 151; the relaxation
coefficient determination unit 150 reads 1 from the storage unit 151, sets β(k) = 1, and
outputs it (s150, s151). When j(k) indicates a consonant, the switching unit 155 is connected
to the storage unit 153; the relaxation coefficient determination unit 150 reads γ from the
storage unit 153, sets β(k) = γ, and outputs it (s150, s153).
[0044]
The processing of the determination unit 143 of the vowel/consonant determination unit 140
and of the relaxation coefficient determination unit 150 can be expressed by the following
equation.
[0045]
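The equation referred to in the preceding sentence did not survive in this text. Based only on the rules stated in paragraphs [0041] and [0043], a plausible reconstruction is given below. It is an assumption rather than the patent's verbatim formula, and no equation number is attached because none is given here.

```latex
\beta(k) =
\begin{cases}
1 & \text{if } S(D'_3(k)) \ge T \quad \text{(vowel)} \\
\gamma & \text{otherwise} \quad \text{(consonant)}
\end{cases}
```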
[0046]
<Second Residual Echo Suppressor 160> The second residual echo suppressor 160 receives, for
example, D2 (f, k), Gb ^ (f, k), and β (k). The second residual echo suppression signal D3 (f, k) is
determined by the following equation (s160), and is sent to the time domain conversion unit 19.
[0047]
$$D_3(f,k) = \{1 - \beta(k)\,(1 - \hat{G}_b(f,k))\}\,D_2(f,k) \quad (7)$$
A configuration example of the second residual echo suppression unit 160 for this case is
shown in FIG. 11A.
The process will be briefly described below.
The subtraction unit 162a subtracts the received echo suppression gain Gb^(f, k) from the
value 1 read from the storage unit 161a to obtain (1 − Gb^(f, k)).
The multiplication unit 163a multiplies this value by the relaxation coefficient β(k) to obtain
β(k)(1 − Gb^(f, k)). The subtraction unit 165a subtracts β(k)(1 − Gb^(f, k)) from the value 1
read from the storage unit 164a to obtain {1 − β(k)(1 − Gb^(f, k))}. The multiplication unit
166a multiplies this value by the noise removal signal D2(f, k) to obtain and output the
second residual echo suppression signal D3(f, k). With such a configuration, when the
transmission voice is determined to be a consonant, the effect of the echo suppression gain is
weakened and the loss of frequency components of the consonant transmission voice is
alleviated.
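The relaxation step can be sketched for a single frequency bin as follows. The function names and the value γ = 0.5 are hypothetical; the text only requires 0 ≦ γ < 1.

```python
def relaxation_coefficient(j, gamma=0.5):
    """beta(k): 1 when j(k) indicates a vowel (1), gamma when it indicates a
    consonant (0). gamma = 0.5 is an illustrative value only."""
    return 1.0 if j == 1 else gamma

def second_residual_echo_suppression(d2, gb, beta):
    """D3 = {1 - beta * (1 - Gb)} * D2 for one frequency bin."""
    return (1.0 - beta * (1.0 - gb)) * d2

d2, gb = 1.0, 0.2
vowel_out = second_residual_echo_suppression(d2, gb, relaxation_coefficient(1))
consonant_out = second_residual_echo_suppression(d2, gb, relaxation_coefficient(0))
```

With Gb^ = 0.2, a vowel frame is scaled by 0.2 (full suppression), whereas a consonant frame is scaled by 0.6, so more of the original consonant spectrum is kept.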
[0048]
<Time Domain Conversion Unit 19> The time domain conversion unit 19 receives the second
residual echo suppression signal D3(f, k), converts it into the time-domain signal d3(n) (s19),
and sends it to the transmission end 4. The conversion method may be an inverse Fourier
transform or the like corresponding to the conversion method of the frequency domain
conversion units 13 and 17.
[Program and Recording Medium] The above-described echo canceler can also be realized by a
computer. In this case, a program for causing the computer to function as the target device (a
device having the functional configuration shown in the embodiments), or a program for causing
the computer to execute each step of the processing procedure (shown in the embodiments), may
be loaded into the computer from a recording medium such as a CD-ROM, a magnetic disk, or a
semiconductor storage device, or downloaded via a communication line, and then executed.
<Effect> With such a configuration, the relaxation coefficient (original sound addition rate)
can be changed according to the situation, which reduces voice distortion while still
suppressing the echo sufficiently. The voice is therefore easier to hear than with the prior
art.
[0049]
Since it is determined whether the signal to be suppressed is a consonant or a vowel and the
relaxation coefficient (original sound addition rate) is changed according to the result, when
the signal to be suppressed is a consonant, the echo suppression is relaxed, voice distortion
is reduced, and mishearing is prevented. When the signal to be suppressed is a vowel, the echo
suppression can be applied strongly to obtain sufficient echo cancellation performance.
[0050]
That is, in the present embodiment, an appropriate echo suppression gain can be set at each
time according to the nature of the voice, so that the amount of echo cancellation and the
ease of hearing the voice are well balanced. As a result, voices become easier to hear in
hands-free calls and the like.
[0051]
Note that this relaxation of the echo suppression gain is effective for the nonlinear
suppression processing. If it were introduced on the adaptive filter unit 11 side, where there
is no voice distortion to begin with, it would only reduce the amount of echo cancellation and
would be counterproductive. It could also be introduced into the noise suppression, but since
noise often has a broad spectrum close to that of a speech consonant, the noise would be
determined to be a consonant and the noise suppression performance would be degraded, so the
advantage of the present invention could not be obtained.
[0052]
[Modification] If the reception signal and the collected sound signal input to the echo
canceler 100 are analog signals, the echo canceler 100 may have an A/D conversion unit (not
shown) that converts an analog signal into a digital signal. Likewise, when outputting an
analog signal to the transmission end 4, the echo canceler 100 may have a D/A conversion unit
(not shown) that converts a digital signal into an analog signal.
[0053]
In the adaptive filter unit 11, the echo component may be eliminated using the reception signal X
(f, k) and the collected signal Y (f, k) in the frequency domain. In that case, the frequency domain
conversion unit 13 is provided in the front stage of the adaptive filter unit 11. The adaptive filter
unit 11 receives the output signals X (f, k) and Y (f, k) of the frequency domain conversion units
13 and 17.
[0054]
As indicated by a long broken line in the drawing, the second residual echo suppression unit
160 may receive D'3(f, k) instead of Gb^(f, k) and determine the second residual echo
suppression signal D3(f, k) by the following equation.
[0055]
$$D_3(f,k) = (1-\beta(k))\,D_2(f,k) + \beta(k)\,D'_3(f,k) \quad (8)$$
Here, from equation (2), D'3(f, k) = Gb^(f, k) D2(f, k).
The configuration of the second residual echo suppression unit 160 in this case is shown in
FIG. 11B. The subtraction unit 162b subtracts the received relaxation coefficient β(k) from the
value 1 read from the storage unit 161b to obtain (1 − β(k)). The multiplication unit 163b
multiplies the received noise removal signal D2(f, k) by this value to obtain
(1 − β(k)) D2(f, k). The multiplication unit 164b multiplies the received D'3(f, k) by the
relaxation coefficient β(k) to obtain β(k) D'3(f, k). The addition unit 165b adds
(1 − β(k)) D2(f, k) and β(k) D'3(f, k) to obtain and output the second residual echo
suppression signal D3(f, k).
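As a quick check, substituting D'3(f, k) = Gb^(f, k) D2(f, k) into equation (8) gives back equation (7), so the configurations of FIGS. 11A and 11B compute the same output. The derivation below is added for illustration:

```latex
\begin{aligned}
D_3(f,k) &= (1-\beta(k))\,D_2(f,k) + \beta(k)\,\hat{G}_b(f,k)\,D_2(f,k) \\
         &= \{1 - \beta(k) + \beta(k)\,\hat{G}_b(f,k)\}\,D_2(f,k) \\
         &= \{1 - \beta(k)\,(1 - \hat{G}_b(f,k))\}\,D_2(f,k)
\end{aligned}
```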
[0056]
The configuration of the second residual echo suppression unit 160 is not limited to the
configurations of FIGS. 11A and 11B. Any configuration may be used as long as the second residual
echo suppression signal D3(f, k) can be obtained by a process in which the product of the noise
removal signal D2(f, k) and the relaxation coefficient β(k) is subtracted from the product of the
noise removal signal D2(f, k), the echo suppression gain Gb^(f, k), and the relaxation coefficient
β(k), and the result of the subtraction is added to D2(f, k).
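As a minimal sketch of the blending described above, the following Python functions compute D3(f, k) for one frame in both forms: equation (8), and the subtract-then-add form of paragraph [0056]. The function names, the use of plain Python lists for the spectrum, and the sample values are assumptions made for illustration only; they do not appear in the specification.

```python
def second_residual_echo_suppression(d2, gb, beta):
    """Equation (8): D3 = (1 - beta) * D2 + beta * D'3, where D'3 = Gb^ * D2.

    d2   : list of complex spectrum values D2(f, k) for one frame k
    gb   : list of echo suppression gains Gb^(f, k), one per frequency bin
    beta : relaxation coefficient beta(k) for this frame
    """
    d3_prime = [g * d for g, d in zip(gb, d2)]  # equation (2)
    return [(1.0 - beta) * d + beta * dp for d, dp in zip(d2, d3_prime)]


def second_residual_echo_suppression_alt(d2, gb, beta):
    """Equivalent form: D3 = D2 + (beta * Gb^ * D2 - beta * D2)."""
    return [d + (beta * g * d - beta * d) for g, d in zip(gb, d2)]


if __name__ == "__main__":
    d2 = [1 + 1j, 2 + 0j, -0.5j]   # illustrative values only
    gb = [0.2, 0.5, 1.0]
    print(second_residual_echo_suppression(d2, gb, 0.25))
    print(second_residual_echo_suppression_alt(d2, gb, 0.25))
```

With beta = 1 the output equals Gb^(f, k) D2(f, k), that is, full suppression; with beta = 0 the output equals D2(f, k), that is, no additional suppression.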
[0057]
The point of the present invention is that the vowel / consonant determination unit 140
determines whether the signal to be suppressed is a vowel or a consonant, and the
relaxation coefficient β(k) is changed using the determination result.
Therefore, as indicated by the broken lines in FIG. 4, the linear echo cancellation processing (s11) in
the adaptive filter unit 11 and the noise suppression processing (s15) in the noise suppression
unit 15 need not necessarily be performed, and the corresponding units need not be provided.
In addition, when a signal other than the first residual echo suppression signal D′3(f, k) is sent
to the vowel / consonant determination unit 140, it is sufficient that, of the first residual echo
suppression processing (s130) in the first residual echo suppression unit 130, at least the echo
suppression gain calculation unit 131 obtains the echo suppression gain (s131). As indicated by the
broken line in FIG. 8, the multiplication processing (s135) in the multiplication unit 135 need not
be performed, and the multiplication unit 135 need not be provided. Note that the
processes in the adaptive filter unit 11, the noise suppression unit 15, the first residual echo
suppression unit 130, and the vowel / consonant determination unit 140 are examples, and other
conventional techniques may be used.
[0058]
For example, the determination evaluation value calculation unit 141 of the vowel / consonant
determination unit 140 may obtain the sparsity of the spectrum of the first residual
echo suppression signal D′3(f, k) by the method described in Reference 1.
[Reference 1] Akiko Araki, Nakatani Tomohiro, Sawada Hiroshi, "Source number estimation and source
separation based on voice sparsity using Dirichlet a priori distribution", The 2009 JSAP Conference
on Acoustics, 2009
In Reference 1, if the value of φ is smaller than 1, the Dirichlet distribution takes a larger
value as the vector α becomes sparser.
[0059]
In addition, the vowel / consonant determination unit 140 may determine whether the signal
D2(f, k) to be suppressed is a vowel or a consonant without using the value indicating
the sparsity of the spectrum, for example, by the method described in Reference 2 or 3.
[Reference 2] Hideyuki Sawada, Atsushi Ohkado, "Sensing specific speakers for voice interface
construction under noisy environment", Transactions of the Institute of Electrical Engineers of
Japan, 2006, Vol. 126, No. 11, pp. 1446-1453
[Reference 3] Niyada Katsuyuki, Hoshimi Masakatsu, "Consonant recognition method for unspecified
speakers using time series of band power and LPC cepstrum coefficients", Transactions of the
Institute of Electronics, Information and Communication Engineers D, 1986, Vol. J69-D, No. 6,
pp. 949-957
In Reference 2, vowels and consonants are distinguished by the magnitude of the time average of the
absolute value of the waveform. In Reference 3, the power is examined to extract power dips
(consonant parts).
[0060]
When the adaptive filter unit 11 or the like is not provided, the signal received by the frequency
domain conversion unit 13 may be a signal obtained based on the sound collection signal y(n) other
than the residual echo signal d1(n) (for example, the sound collection signal y(n) itself).
[0061]
Also, the signals received by the first residual echo suppression unit 130 and the second residual
echo suppression unit 160 may be frequency-domain signals other than the noise removal signal
D2(f, k), such as Y(f, k) or D1(f, k), and may be changed appropriately according to the
configuration of the echo canceler.
[0062]
The relaxation coefficient determination unit 150 sets β(k) = 1 or γ, but the invention is not
limited thereto. The relaxation coefficient may be multiplied by a constant α (where 0 < α < 1),
so that β(k) = γ1 (= αγ) or γ2 (= α).
The values of α and αγ are determined in advance by experiments etc. as the relaxation coefficient
appropriate for vowels and the relaxation coefficient appropriate for consonants, respectively
(for example, α = 0.5, γ = 0.5, γ1 = 0.25, γ2 = 0.5 etc.).
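The selection rule above can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the relaxation coefficient determination unit 150: the function name, the boolean flag standing in for the determination result j(k), and the use of the example values as defaults are all hypothetical.

```python
def relaxation_coefficient(is_vowel, alpha=0.5, gamma=0.5):
    """Return beta(k) for one frame.

    is_vowel : True when the determination result j(k) indicates a vowel
    alpha    : constant with 0 < alpha < 1 (0.5 is the example value in the text)
    gamma    : consonant relaxation value before scaling (example value 0.5)
    """
    if not (0.0 < alpha < 1.0):
        raise ValueError("alpha must satisfy 0 < alpha < 1")
    gamma2 = alpha          # coefficient used for vowels
    gamma1 = alpha * gamma  # coefficient used for consonants
    return gamma2 if is_vowel else gamma1


if __name__ == "__main__":
    print(relaxation_coefficient(True))   # vowel frame
    print(relaxation_coefficient(False))  # consonant frame
```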
[0063]
In addition, γ1, γ2, and the relaxation coefficient β(k) may be configured to take different
values for each frequency.
In this case, γ1 = {γ1(0), γ1(1), ..., γ1(F)}, γ2 = {γ2(0), γ2(1), ..., γ2(F)}, and
β(k) = {β(0, k), β(1, k), ..., β(F, k)}, where γ1(f) ≦ γ2(f), and it is sufficient that
γ1(f′) < γ2(f′) for at least some of the discrete angular frequencies f′. With such a
configuration, an appropriate relaxation coefficient can be set for each frequency. For example,
since consonant components increase as the frequency increases, a configuration in which the
relaxation coefficient is set smaller at higher frequencies can be considered.
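A per-frequency variant can be sketched in the same way. The constraint check follows the conditions stated above (γ1(f) ≦ γ2(f) for every bin, strict inequality for at least one bin). The linearly falling profile is only one hypothetical way to make the coefficient smaller at higher frequencies; the specification does not prescribe a particular shape, and all names below are assumptions.

```python
def falling_profile(num_bins, start, end):
    """Hypothetical profile: values fall linearly from start (bin 0) to end (last bin)."""
    if num_bins < 2:
        raise ValueError("num_bins must be at least 2")
    return [start + (end - start) * f / (num_bins - 1) for f in range(num_bins)]


def per_frequency_beta(is_vowel, gamma1, gamma2):
    """Return {beta(0, k), ..., beta(F, k)} for one frame.

    gamma1 : per-bin coefficients used for consonant frames
    gamma2 : per-bin coefficients used for vowel frames
    """
    if len(gamma1) != len(gamma2):
        raise ValueError("gamma1 and gamma2 must have the same length")
    if any(g1 > g2 for g1, g2 in zip(gamma1, gamma2)):
        raise ValueError("gamma1(f) must not exceed gamma2(f)")
    if not any(g1 < g2 for g1, g2 in zip(gamma1, gamma2)):
        raise ValueError("gamma1(f) must be smaller than gamma2(f) for some f")
    return list(gamma2) if is_vowel else list(gamma1)


if __name__ == "__main__":
    gamma2 = [0.5] * 5
    gamma1 = falling_profile(5, 0.5, 0.1)
    print(per_frequency_beta(False, gamma1, gamma2))
```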
[0064]
<Echo Canceler 200> The echo canceler 200 according to the second embodiment will be
described with reference to FIGS. Only differences from the first embodiment will be described.
The configuration and processing contents of the relaxation coefficient determination unit 250
are different from those of the first embodiment.
[0065]
In addition to the determination result j(k), the vowel / consonant determination unit 140 also
outputs the value S(D′3(k)) indicating the sparsity, obtained by the determination evaluation value
calculation unit 141, to the relaxation coefficient determination unit 250, as indicated by the
one-dot chain line in FIG.
<Relaxation Coefficient Determination Unit 250> The relaxation coefficient determination unit 250
receives the determination result j(k) and the value S(D′3(k)) indicating sparsity. When j(k)
indicates a vowel, the switching unit 258 is connected to the storage unit 251. The relaxation
coefficient determination unit 250 takes the value 1 out of the storage unit 251, determines the
relaxation coefficient as β(k) = 1, and outputs it (s250, s251).
[0066]
When j(k) indicates a consonant, the switching unit 258 is connected to the adding unit 257. The
relaxation coefficient determination unit 250 receives γ1(k) = 1 − κ(T − S(D′3(k))) from the
adding unit 257, determines the relaxation coefficient as β(k) = γ1(k), and outputs it
(s250, s257). Note that 0 ≦ κ ≦ 1/T. FIG. 13 shows the relationship between S(D′3(k)) and β(k).
[0067]
The subtracting unit 254 subtracts the received S(D′3(k)) from the threshold T extracted from the
storage unit 254 to obtain (T − S(D′3(k))). The multiplication unit 256 multiplies
(T − S(D′3(k))) by the value κ retrieved from the storage unit 255 to obtain κ(T − S(D′3(k))).
The adding unit 257 subtracts κ(T − S(D′3(k))) from the value 1 extracted from the storage unit
251 to obtain γ1(k) and stores it.
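The computation performed by the units above can be sketched as a single function. This is an illustration only: the function and parameter names are hypothetical, the vowel decision j(k) is taken as an input rather than recomputed, and the check T > 0 is an assumption implied by the stated bound 0 ≦ κ ≦ 1/T.

```python
def relaxation_coefficient_2nd(is_vowel, sparsity, threshold_t, kappa):
    """Return beta(k) as described for the second embodiment.

    is_vowel    : True when the determination result j(k) indicates a vowel
    sparsity    : S(D'3(k)), the sparsity value of the current frame
    threshold_t : threshold T (assumed positive so that 1/T is defined)
    kappa       : slope with 0 <= kappa <= 1/T
    """
    if threshold_t <= 0.0:
        raise ValueError("threshold T must be positive")
    if not (0.0 <= kappa <= 1.0 / threshold_t):
        raise ValueError("kappa must satisfy 0 <= kappa <= 1/T")
    if is_vowel:
        return 1.0                                    # value taken from storage unit 251
    return 1.0 - kappa * (threshold_t - sparsity)     # gamma1(k)


if __name__ == "__main__":
    for s in (0.0, 0.1, 0.25, 0.4):
        print(s, relaxation_coefficient_2nd(False, s, threshold_t=0.5, kappa=2.0))
```

In the consonant branch the coefficient grows linearly with the sparsity value, so frames with very low sparsity receive the weakest suppression.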
[0068]
The processes of the determination unit 143 and the relaxation coefficient determination unit
250 of the vowel and consonant determination unit 140 can be expressed by the following
equation.
[0069]
[0070]
<Effects> With this configuration, the same effects as in the first embodiment can be obtained.
Furthermore, even within the range S(D′3(k)) < T, finer settings become possible, such as applying
weaker suppression to a signal with very low sparsity and stronger suppression to a signal with
some sparsity.
[0071]
[Modification] In the second embodiment, the relaxation coefficient β(k) is obtained by
distinguishing cases according to the relationship between the threshold T and S(D′3(k)).
However, β(k) may instead be a value that increases monotonically as S(D′3(k)) increases.
[0072]
As described above, since 0 ≦ S(D′3(k)) ≦ 1, such a configuration can be realized by setting
the threshold T = 1.
Furthermore, the processing of the vowel / consonant determination unit can be omitted and
simplified.
That is, in FIG. 7, the vowel / consonant determination unit 140 includes only the
determination evaluation value calculation unit 141 and outputs only S(D′3(k)). In FIG. 12, the
storage unit 251 and the switching unit 258 are not provided, and the relaxation coefficient
determination unit 250 calculates and outputs β(k) = 1 − κ(T − S(D′3(k))) for each frame.
Even in such a configuration, the magnitude of the echo suppression gain can be changed
according to the situation, and flexible settings are possible, such as applying weaker
suppression to a signal with very low sparsity and stronger suppression to a signal with some
sparsity.
[0073]
In addition, κ may be configured to take a different value for each frequency. In this case,
κ = {κ(0), κ(1), ..., κ(F)}, and it is sufficient that 1 − κ(f′)(T − S(D′3(k))) < γ2(f′) for
at least some of the discrete angular frequencies f′. With such a configuration, β(k) can be set
to a different value for each frequency, enabling finer setting of the relaxation coefficient.
[0074]
<Echo Canceler 300> The echo canceler 300 according to the third embodiment will be described
with reference to FIGS. 3, 4, 7, 15, and 16. Only differences from the first embodiment will be
described. The configuration and processing contents of the relaxation coefficient determination
unit 350 are different from those of the first embodiment.
<Relaxation Coefficient Determination Unit 350> The relaxation coefficient determination unit 350
receives the determination result j(k), the reception signal X(k), and the first residual echo
suppression signal D′3(k). When j(k) indicates a vowel, the switching unit 356 is connected to the
storage unit 354.
[0075]
The transmission voice detection unit 351 and the determination unit 352 each receive the
determination result j(k), and perform the following processing when j(k) indicates a consonant.
[0076]
First, the transmission voice detection unit 351 obtains || D'3 (k) || / || X (k) ||.
Note that || • || represents taking a norm, and X (k) = {X (0, k), X (1, k),..., X (F, k)}.
[0077]
The determination unit 352 receives this value ||D′3(k)|| / ||X(k)||, determines whether or not it
is smaller than a threshold Tr, and outputs the determination result j2(k) to the switching unit
356. When j2(k) indicates that the value is smaller than the threshold Tr, the switching
unit 356 is connected to the storage unit 354 regardless of the value of the determination result
j(k). The relaxation coefficient determination unit 350 takes the value 1 out of the storage unit
354, determines the relaxation coefficient as β(k) = 1, and outputs it (s350, s354). Here, Tr is
a predetermined positive real number used to set the relaxation coefficient to 1 when the
consonant portion of the transmission voice becomes sufficiently smaller than the reception
signal; an appropriate value is obtained experimentally and set in advance. For example,
Tr = 0.01.
[0078]
In cases other than the above, that is, when the switching unit 356 receives information
indicating that the determination result j(k) is a consonant and information indicating that the
value in the determination result j2(k) is greater than the threshold Tr, it is connected to the
storage unit 355. The relaxation coefficient determination unit 350 takes out γ1 (0 ≦ γ1 < 1)
from the storage unit 355, determines the relaxation coefficient as β(k) = γ1, and outputs it
(s350, s355).
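The decision logic of the third embodiment can be sketched as follows. Several points are assumptions made only for this illustration: the Euclidean norm is used because the text says only that || • || denotes a norm, the default γ1 = 0.25 is borrowed from the earlier example values, the case in which the ratio equals Tr is grouped with "cases other than the above", and a frame with ||X(k)|| = 0 is treated as having no echo to suppress. All function and parameter names are hypothetical.

```python
import math


def euclidean_norm(spectrum):
    """Euclidean norm over frequency bins (the choice of norm is an assumption)."""
    return math.sqrt(sum(abs(v) ** 2 for v in spectrum))


def relaxation_coefficient_3rd(is_vowel, d3_prime, x, tr=0.01, gamma1=0.25):
    """Return beta(k) as described for the third embodiment.

    is_vowel : True when the determination result j(k) indicates a vowel
    d3_prime : first residual echo suppression signal D'3(k), one value per bin
    x        : reception signal X(k), one value per bin
    tr       : threshold Tr (positive; 0.01 is the example value in the text)
    gamma1   : relaxation coefficient for consonant frames, 0 <= gamma1 < 1
    """
    if is_vowel:
        return 1.0
    x_norm = euclidean_norm(x)
    if x_norm == 0.0:
        # Assumption: nothing was received, so the consonant relaxation is kept.
        return gamma1
    ratio = euclidean_norm(d3_prime) / x_norm
    if ratio < tr:
        return 1.0   # transmission voice absent or very small: no relaxation
    return gamma1    # consonant with significant transmission voice


if __name__ == "__main__":
    print(relaxation_coefficient_3rd(False, [0.001], [1.0]))
    print(relaxation_coefficient_3rd(False, [0.5, 0.5j], [1.0, 1.0]))
```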
[0079]
The processing of the determination unit 143 of the vowel / consonant determination unit 140
and the processing of the relaxation coefficient determination unit 350 can be expressed by the
following equation.
[0080]
[0081]
<Effects> With this configuration, the same effects as those of the first embodiment can be
obtained.
Furthermore, when there is no transmission voice or the transmission voice is very small, the
relaxation coefficient is set to 1 regardless of the sparsity of the first residual echo
suppression signal D′3(f, k), so that sufficient echo cancellation is possible without reducing
the suppression gain.
As described above, by relaxing the gain using both the determination of the sparsity and the
determination of the call state, it is possible to sufficiently suppress the echo in sections
where reducing distortion of the transmission speech is unnecessary.
[0082]
The echo cancellation method of the present invention can be used for hands-free speech communication, hands-free speech recognition, and the like.
[0083]
100, 200, 300 echo cancellation apparatus
11 adaptive filter unit
13, 17 frequency domain conversion unit
15 noise suppression unit
19 time domain conversion unit
130 first residual echo suppression unit
140 vowel / consonant determination unit
150, 250, 350 relaxation coefficient determination unit
160 second residual echo suppression unit