Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2009302599
An acoustic echo canceller is provided that can effectively remove echoes and noise that cannot be eliminated by an echo canceller using an adaptive filter alone. An adaptive filter generates a pseudo echo sound signal FE't based on the sound emission signal FEt. The adder 60 subtracts the pseudo echo sound signal FE't from the collected sound signal NEt to generate a first corrected sound signal NE't. The echo spectrum estimation unit 301 estimates the frequency spectrum S(FE''n) of the current reverberation echo from the spectrum S(FE'n) of the current pseudo echo sound signal and the previously estimated reverberation echo spectrum, and corrects it with a correction coefficient F that gives a frequency characteristic according to the sound emission and collection state. The adder 70 subtracts the frequency spectrum S(FECn) of the corrected reverberation echo and the frequency spectrum S(NE''n) of stationary noise from the spectrum S(NE'n) of the first corrected speech signal. [Selected figure] Figure 1
Acoustic echo canceller
[0001]
The present invention relates to an acoustic echo canceller that removes, from a collected speech signal, acoustic echoes including reverberant echoes and stationary noise arising from the installation environment.
[0002]
15-04-2019
Conventionally, in an audio conference apparatus or the like in which a speaker and a microphone are installed in one case, acoustic echo is likely to occur because the speaker and the microphone are close to each other.
For this reason, various types of echo cancellation devices have been devised to remove such acoustic echoes. For example, Patent Document 1 discloses an echo cancellation apparatus including an echo canceller having an adaptive filter and an echo suppression unit that suppresses echo by calculation in the frequency domain.
[Patent Document 1] Japanese Patent No. 3420705
[0003]
In general, the level of the low-frequency range of the original sound is higher than that of the high-frequency range, so the level of the low-frequency range of the wraparound sound is also high. Consequently, in the echo spectrum that an echo suppressor such as that of Patent Document 1 estimates from the wraparound speech, the bass level is high.
[0004]
On the other hand, human speech is composed mainly of low-frequency components.
For this reason, in the so-called W-talk state, in which the voice of the speaker on the own-device side is picked up while sound is being emitted from the sound emission signal received from the other party's device, even the low-frequency range of the speaker's voice is suppressed by the echo suppressor. As a result, the voice of the own-device speaker, as emitted by the other party's device, becomes distorted.
[0005]
Therefore, an object of the present invention is to realize an acoustic echo canceller that does not unnecessarily suppress the bass region while effectively removing echoes and noise that cannot be eliminated by an echo canceller using an adaptive filter alone.
[0006]
The present invention relates to an acoustic echo canceller that removes sounds other than the target voice included in the collected voice signal.
This acoustic echo canceller comprises an adaptive filter, a first difference means, a reverberation
echo spectrum estimation means, and a second difference means. The adaptive filter generates a
pseudo echo sound signal based on the sound emission sound signal. The first difference means
subtracts the pseudo echo sound signal from the collected sound signal to generate a first
corrected sound signal. The reverberation echo spectrum estimation means estimates the reverberation echo spectrum included in the first corrected speech signal using the spectrum of the pseudo echo sound signal, and sets a corrected reverberation echo spectrum by weighting each frequency component of the reverberation echo spectrum differently. The second difference means subtracts the corrected reverberation echo spectrum from the frequency spectrum of the first corrected audio signal and outputs the result.
[0007]
In this configuration, a pseudo echo sound signal is generated by the adaptive filter, and the frequency spectrum of the reverberation echo that the adaptive filter cannot handle is estimated. The acoustic echo canceller according to the present invention performs first-stage echo cancellation of the linear component by subtracting the pseudo echo sound signal from the collected sound signal, and then performs second-stage echo cancellation by subtracting the frequency spectrum of the reverberation echo from the frequency spectrum of the first corrected speech signal obtained after the first stage. At this time, the frequency spectrum of the reverberation echo is weighted differently for each frequency component, so a different echo removal level is set for each frequency component.
[0008]
Further, the acoustic echo canceller according to the present invention further comprises state determination means that determines the state of the collected sound based on the sound emission signal and the first corrected sound signal, and gives the determination result to the reverberation echo spectrum estimation means. The reverberation echo spectrum estimation means then changes the weighting in accordance with the state determination result.
[0009]
In this configuration, appropriate weighting is set in accordance with the situation of sound
emission and sound collection.
[0010]
Further, the state determination means of the acoustic echo canceller according to the present invention distinguishes between a sound-emission-only state and a state in which collected speech is present together with the sound emission.
The reverberation echo spectrum estimation means sets the estimated level of the low-frequency range lower in the latter state than in the sound-emission-only state.
[0011]
In this configuration, when the voice of the speaker is included in the collected voice signal, as in the case of W-talk, the level of the bass component of the reverberation echo spectrum, which corresponds to the range of the main component of the voice, is kept low. Therefore, the low-frequency component of the speaker's voice is not subtracted excessively.
[0012]
Further, the acoustic echo canceller according to the present invention further comprises band dividing means for separating the collected voice signal into a low-frequency component and a high-frequency component and outputting the low-frequency component to the first difference means, and an attenuator for attenuating the high-frequency component output from the band dividing means in accordance with the state determination result.
[0013]
In this configuration, the low-frequency and high-frequency components of the collected voice signal are separated, and the processing described above is performed only on the low-frequency component, which reduces the calculation load of the echo cancellation and echo removal processing and also allows faster operation.
Here, the level of the high-frequency component of human speech is smaller than that of the low-frequency component.
Furthermore, the level of the high-frequency component of the sound that wraps around to the microphone and is collected is smaller than that of the low-frequency component. Therefore, even if the high-frequency component is merely attenuated by a simple attenuator, the influence on sound quality is small. In addition, by changing the processing according to the state determination result, appropriate echo removal according to the sound emission and collection state can be performed in the high-frequency range as well. In this way, effective echo cancellation and echo removal as described above are performed efficiently while maintaining a predetermined sound quality.
[0014]
Further, the reverberation echo spectrum estimation means of the acoustic echo canceller
according to the present invention estimates the reverberation echo spectrum by performing a
correction operation according to the acoustic environment parameter based on the installation
environment on the spectrum of the pseudo echo sound signal.
[0015]
In this configuration, in addition to the optimization of the reverberation echo spectrum by the bass-range correction described above, the reverberation echo spectrum is corrected according to the acoustic environment.
Thereby, a more appropriate reverberation echo spectrum is estimated according to differences in the acoustic environment, such as the size of the room in which the device is installed.
[0016]
The acoustic echo canceller according to the present invention further includes noise spectrum
estimation means for estimating a stationary noise spectrum based on the frequency spectrum of
the first corrected speech signal.
[0017]
In this configuration, a stationary noise spectrum, such as background noise existing together with the reverberation echo described above, is estimated.
Thereby, a more effective echo removal process is performed.
[0018]
According to the present invention, reverberation echo that cannot be eliminated by an echo canceller using an adaptive filter alone can be removed with high accuracy, and excessive removal of specific frequency components during that removal can be prevented. Thereby, when the first correction by the adaptive filter and the second correction by reverberation echo removal are performed, the voice of the speaker on the own-device side can be output in a state close to the original sound, without excessive deletion of its specific frequency components.
[0019]
An acoustic echo canceller according to a first embodiment of the present invention will be described with reference to the drawings. In the following description, signals in the time domain are denoted by a trailing t, and signals in the frequency domain by a trailing n. FIG. 1 is a block diagram showing the schematic configuration of the main elements of the acoustic echo canceller of this embodiment. As shown in FIG. 1, the acoustic echo canceller 1 includes a speaker SP, a microphone MIC, a state determination unit 10, an adaptive filter 20, a disturbance spectrum estimation unit 30, an adder 60 corresponding to the first difference means of the present invention, and an adder 70 corresponding to the second difference means of the present invention.
[0020]
Based on the signal levels of the sound emission signal FEt, the collected sound signal NEt, and the first corrected sound signal NE't, the state determination unit 10 detects which of the following states holds: "both sound emission and sound collection (W-talk state)", "sound emission only", "collected sound signal only", or "neither sound emission nor sound collection (silence)". The detected state is given to the adaptive filter 20 and the disturbance spectrum estimation unit 30. FIG. 2 is a diagram showing the state determination of the state determination unit 10 shown in FIG. 1, the learning process, and the concept of selecting the frequency correction pattern.
[0021]
Specifically, when the state determination unit 10 detects that the sound emission signal FEt, the collected sound signal NEt, and the first corrected sound signal NE't are all at or above a preset threshold level, it determines that both sound emission and speech by the speaker are taking place, i.e. the "W talk" state. If the sound emission signal FEt is at or above the threshold and the first corrected sound signal NE't is below the threshold, the state determination unit 10 determines the "sound emission only" state. If the collected sound signal NEt and the first corrected sound signal NE't are at or above the threshold and the sound emission signal FEt is below the threshold, it determines the "collected sound signal only" state. Furthermore, when the sound emission signal FEt, the collected sound signal NEt, and the first corrected sound signal NE't are all below the threshold, the state determination unit 10 determines the "silence" state.
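As an illustration, the four-way decision above can be sketched as follows. The threshold value and the level measure (mean absolute amplitude over a frame) are assumptions made for the sketch; the patent only says the levels are compared against a preset threshold.

```python
import numpy as np

# Hypothetical sketch of the state determination unit's decision.
# The 0.01 threshold and the frame-level measure (mean absolute
# amplitude) are illustrative assumptions, not values from the patent.
def determine_state(FEt, NEt, NE1t, thresh=0.01):
    fe  = np.mean(np.abs(FEt)) >= thresh   # sound emission present?
    ne  = np.mean(np.abs(NEt)) >= thresh   # collected sound present?
    ne1 = np.mean(np.abs(NE1t)) >= thresh  # residual after first correction?
    if fe and ne and ne1:
        return "W talk"                    # emission and near-end speech
    if fe and not ne1:
        return "sound emission only"
    if ne and ne1 and not fe:
        return "collected sound signal only"
    return "silence"
```

The "sound emission only" branch deliberately ignores NEt: echo may still reach the microphone, but if the first corrected signal NE't is below the threshold, no near-end speech is assumed.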
[0022]
The speaker SP emits sound based on the sound emission audio signal FEt input from the
outside. This sound emission voice signal (far end signal) FEt is also input to the FFT 911.
[0023]
The FFT 911 is a fast Fourier transform circuit; it converts the sound emission signal FEt, a function in the time domain, into the sound emission signal FEn, a function in the frequency domain, and gives it to the adaptive filter 20.
[0024]
The adaptive filter 20 includes a pseudo echo sound signal generation unit 201 and a pseudo
echo sound signal estimation unit 202.
The pseudo echo sound signal generation unit 201 is, for example, an FIR filter with a predetermined number of taps, whose coefficients are given by the pseudo echo sound signal estimation unit 202. The pseudo echo sound signal generation unit 201 generates a pseudo echo sound signal FE'n based on the sound emission signal FEn. The generated pseudo echo sound signal FE'n is input to the IFFT 921 and to the echo spectrum estimation unit 301 of the disturbance spectrum estimation unit 30.
[0025]
The pseudo echo sound signal estimation unit 202 estimates the pseudo echo sound signal FE'n from the frequency spectrum S(NE'n) of the first corrected signal NE'n after echo cancellation, described later, using an adaptive algorithm such as LMS. The pseudo echo sound signal estimation unit 202 estimates the coefficients that cause the pseudo echo sound signal generation unit 201 to generate the pseudo echo sound signal FE'n, and supplies these coefficients to the pseudo echo sound signal generation unit 201. At this time, the pseudo echo sound signal estimation unit 202 performs learning based on this estimation only when the "sound emission only" state is reported by the state determination unit 10. Such estimation, generation of the pseudo echo sound signal FE'n, and learning are repeated throughout the operation of the acoustic echo canceller 1.
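As a rough sketch of this kind of adaptation, here is a time-domain normalized LMS update standing in for the frequency-domain LMS form described in the text; the step size, filter length, and toy echo path are all assumptions for illustration.

```python
import numpy as np

# Hedged sketch: a time-domain NLMS update in place of the
# frequency-domain LMS adaptation described above. The step size mu
# and the 3-tap filter length are illustrative assumptions.
def nlms_step(w, x_buf, d, mu=0.5, eps=1e-8):
    y = np.dot(w, x_buf)   # pseudo echo sample (FE't)
    e = d - y              # first corrected sample (NE't)
    w = w + mu * e * x_buf / (np.dot(x_buf, x_buf) + eps)
    return w, e

# Toy echo path: the taps converge toward it when only the far-end
# signal is present, mirroring the "sound emission only" learning
# condition in the text.
rng = np.random.default_rng(0)
x = rng.standard_normal(4000)        # far-end (emission) signal
h = np.array([0.5, -0.3, 0.1])       # hypothetical room response
w = np.zeros(3)
for i in range(3, 4000):
    x_buf = x[i:i-3:-1]              # newest sample first
    w, _ = nlms_step(w, x_buf, x_buf @ h)
```

After the loop, `w` approximates `h`, so the pseudo echo subtracted by the adder 60 matches the linear part of the real echo.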
[0026]
The IFFT 921 is an inverse fast Fourier transform circuit, converts the pseudo echo sound signal
FE'n which is a function of the frequency domain into the pseudo echo sound signal FE't which is
a function of the time domain, and outputs the pseudo echo sound signal FE't to the adder 60.
[0027]
The microphone MIC picks up sound from the surroundings in which the acoustic echo canceller 1 is installed, and generates a collected voice signal (near end signal) NEt.
If sound is being emitted from the speaker SP, the collected sound signal NEt includes a reverberation component (reverberation echo) of the emitted sound, which depends on the installation environment. If a speaker near the microphone MIC talks, the collected voice signal NEt contains the component of the speaker's voice. In the W-talk state, in which the sound emitted from the speaker SP and the voice of the speaker on the own-device side are both present, the collected signal includes both the speaker-voice component and the reverberation echo. Furthermore, if there is stationary noise specific to the installation environment, such as that of a conference room, the collected speech signal NEt also includes a stationary noise component.
[0028]
The adder 60 subtracts the pseudo echo sound signal FE't from the collected sound signal NEt to
generate and output a first corrected sound signal NE't. Thereby, as the first stage correction,
adaptive echo cancellation processing by the pseudo echo sound signal is executed.
[0029]
The FFT 912 is a fast Fourier transform circuit; it converts the first corrected speech signal NE't, a function in the time domain, into the first corrected speech signal NE'n, a function in the frequency domain. The frequency spectrum S(NE'n) of the first corrected speech signal NE'n is input to the pseudo echo sound signal estimation unit 202 described above and to the noise spectrum estimation unit 302 of the disturbance spectrum estimation unit 30.
[0030]
The disturbance spectrum estimation unit 30 includes an echo spectrum estimation unit 301 and a noise spectrum estimation unit 302. Briefly, the echo spectrum estimation unit 301 is an operation unit that estimates the reverberation echo that cannot be removed by the pseudo echo sound signal FE'n alone, and the noise spectrum estimation unit 302 is an operation unit that estimates stationary noise.
[0031]
The echo spectrum estimation unit 301 sequentially acquires and temporarily stores the frequency spectrum S(FE'n) of the pseudo echo sound signal FE'n at each sampling timing. Based on the stored spectrum S(FE'n), the reverberation echo spectrum S(FE''n) estimated at the previous timing, and a preset update coefficient β, the echo spectrum estimation unit 301 estimates the current reverberation echo spectrum S(FE''n) and stores it.
[0032]
For example, let the reverberation echo spectrum at a sampling timing N be S(FE''n(N)), let the frequency spectrum of the pseudo echo sound signal at timing N be S(FE'n(N)), and let the reverberation echo spectrum at the preceding timing N-1 be S(FE''n(N-1)). Also, let β be the update coefficient.
[0033]
Then, in this setting, the reverberation echo spectrum S(FE''n(N)) is expressed by the following equation, based on which the echo spectrum estimation unit 301 calculates it.
[0034]
S(FE''n(N)) = (1 - β)·S(FE''n(N-1)) + β·S(FE'n(N)) --- equation (1)

The echo spectrum estimation unit 301 calculates the corrected reverberation echo spectrum S(FECn(N)) by multiplying each frequency component of the calculated reverberation echo spectrum by the correction coefficient F.
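Equation (1) and the subsequent per-bin correction can be written compactly as follows, treating the magnitude spectra as numpy arrays; the default β is one of the example values given later in the text.

```python
import numpy as np

def update_reverb_echo(S_prev, S_pseudo, beta=0.6):
    """Equation (1): S(FE''n(N)) = (1-beta)*S(FE''n(N-1)) + beta*S(FE'n(N)).
    S_prev is the previous estimate, S_pseudo the current pseudo-echo
    spectrum; beta = 0.6 matches one example value from FIG. 5."""
    return (1.0 - beta) * S_prev + beta * S_pseudo

def corrected_reverb_echo(S_echo, F):
    """S(FECn(N)) = F * S(FE''n(N)), applied per frequency bin;
    F holds samples of the selected correction function Fk(omega)."""
    return F * S_echo
```

With β = 1 the smoother tracks the pseudo-echo spectrum instantly; smaller β keeps a longer memory, which models a longer reverberation tail.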
[0035]
S(FECn(N)) = F·S(FE''n(N))

Here, the correction coefficient F is represented by a plurality of correction functions Fk(ω) having different frequency characteristics, where k is a unique identifier of each correction function.
[0036]
That is, letting FE''n,ω(N) denote each frequency component of the reverberation echo spectrum S(FE''n(N)) and FECn,ω(N) denote each frequency component of the corrected reverberation echo spectrum S(FECn(N)), the echo spectrum estimation unit 301 executes the operation FECn,ω(N) = Fk(ω)·FE''n,ω(N) for each frequency component. Thus, the corrected reverberation echo spectrum S(FECn(N)) is calculated.
[0037]
At this time, the echo spectrum estimation unit 301 selects the correction function Fk(ω) based on the state determination result. For example, as shown in FIG. 2, when the state determination result "W talk" is acquired, the echo spectrum estimation unit 301 selects the correction function F1(ω); when the result "sound emission only" is acquired, it selects the correction function F2(ω).
[0038]
FIG. 3 is a diagram showing the frequency characteristics of F1(ω) and F2(ω) as examples of the correction function Fk(ω). As shown in FIG. 3, the correction function F1(ω) is set such that the value of the correction coefficient F in the bass range is lower than that of the correction function F2(ω). Further, for the correction function F1(ω), the frequency at which the correction coefficient F reaches "1" is set higher than for the correction function F2(ω).
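One way to realize curves with the two properties just stated is a ramp that rises from a bass floor to 1. The floors and knee frequencies below are invented for illustration; the patent's actual curves are only shown graphically in FIG. 3.

```python
import numpy as np

# Hypothetical Fk(omega) shapes: F1 has a lower bass value than F2,
# and F1 reaches 1 at a higher frequency, matching the properties
# described for FIG. 3. All numeric values are assumptions.
def correction_function(freqs_hz, bass_floor, knee_hz):
    ramp = np.clip(freqs_hz / knee_hz, 0.0, 1.0)
    return bass_floor + (1.0 - bass_floor) * ramp

freqs = np.linspace(0.0, 4000.0, 129)
F1 = correction_function(freqs, bass_floor=0.2, knee_hz=2000.0)  # "W talk"
F2 = correction_function(freqs, bass_floor=0.7, knee_hz=1000.0)  # emission only
```

During "W talk", multiplying the estimated echo spectrum by F1 leaves most of the bass echo estimate unsubtracted, protecting the near-end speaker's voice.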
[0039]
With this setting, in the "W talk" state a corrected reverberation echo spectrum S(FECn) in which the level of the bass component is suppressed is calculated, while in the "sound emission only" state a corrected reverberation echo spectrum S(FECn) whose bass level is hardly suppressed is calculated. The echo spectrum estimation unit 301 outputs the calculated corrected reverberation echo spectrum S(FECn) to the adder 70.
[0040]
Thus, by estimating the reverberation echo spectrum S(FE''n) based on the frequency spectrum S(FE'n) of the pseudo echo sound signal FE'n, a frequency spectrum of the reverberation echo that cannot be eliminated by the adaptive filter 20 can be obtained. That is, the adaptive filter 20 is composed of an FIR filter or the like, and the pseudo echo sound signal FE'n it can represent is limited by specifications such as the number of taps. As a result, when restored on the time axis, a difference arises between the pseudo echo sound signal FE't and the real wraparound sound. However, by estimating the reverberation echo from the pseudo echo sound signal FE'n in the frequency domain, this limitation on the time axis can be removed, and the reverberation echo that cannot be eliminated by the pseudo echo sound signal FE't can be estimated.
[0041]
Furthermore, by using the correction coefficient F, a corrected reverberation echo spectrum S(FECn) with frequency characteristics matched to the state determination result can be set. That is, during "W talk", when the speaker's voice on the own-device side is included in the collected voice signal, the bass component of the estimated reverberation echo to be subtracted from the first corrected voice signal can be suppressed. As a result, the speaker's voice can be output close to the original sound, without excessive removal of the low-frequency components that form its main part. On the other hand, when there is no speaker voice on the own-device side and only the emitted sound from the other device is present ("sound emission only"), the bass component of the estimated reverberation echo to be subtracted from the first corrected sound signal can be left largely unsuppressed. As a result, the low-frequency component of the wraparound sound can be removed effectively in the "sound emission only" state.
[0042]
In addition, the echo spectrum estimation unit 301 repeats the above-described estimation learning at the timing of switching between "W talk" and "sound emission only", or at a timing matched to the pseudo echo sound signal estimation unit 202 of the adaptive filter 20.
[0043]
Further, in the present embodiment, the correction coefficient F in the "collected sound signal only" and "silence" states is not specified.
This is because in those states there is no wraparound voice: the pseudo echo sound signal FE't becomes almost zero and the reverberation echo also becomes almost zero, so they are hardly affected by the correction coefficient F. Therefore, in the "collected sound signal only" and "silence" states, processing such as maintaining the immediately preceding correction coefficient F may be performed, for example.
[0044]
The noise spectrum estimation unit 302 sequentially acquires and temporarily stores the
frequency spectrum S (NE'n) of the first corrected speech signal NE'n. The noise spectrum
estimation unit 302 estimates the noise spectrum S (NE ′ ′ n) based on the frequency
spectrum S (NE ′ n) of the plurality of first corrected speech signals NE ′ n acquired and
stored.
[0045]
For example, let the noise spectrum at a sampling timing N be S(NE''n(N)), let the frequency spectrum of the first corrected speech signal at the same timing N be S(NE'n(N)), and let the frequency spectrum of the first corrected speech signal at the preceding timing N-1 be S(NE'n(N-1)). Also, let α and γ be constants.
[0046]
Then, in this setting, the noise spectrum S(NE''n(N)) is expressed and calculated by the following equation.
[0047]
S(NE''n(N)) = α·S(NE'n(N-1)) + γ·S(NE'n(N))

Thus, by estimating the noise spectrum S(NE''n) based on the frequency spectrum S(NE'n) of the first corrected voice signal NE'n, which is the signal after echo cancellation, stationary noise such as background noise other than echo can be calculated.
At this time, the noise spectrum estimation unit 302 performs learning based on the above estimation only when the "silence" state is reported by the state determination unit 10. Such estimation and learning are also repeated throughout the operation of the acoustic echo canceller 1.
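The smoothing formula above can be sketched as follows; the text only names α and γ as constants, so the default values here are assumptions.

```python
import numpy as np

def update_noise_spectrum(S_ne1_prev, S_ne1_curr, alpha=0.9, gamma=0.1):
    """S(NE''n(N)) = alpha*S(NE'n(N-1)) + gamma*S(NE'n(N)), as the text
    states it, combining two consecutive first-corrected spectra.
    alpha = 0.9 and gamma = 0.1 are illustrative values only."""
    return alpha * S_ne1_prev + gamma * S_ne1_curr
```

Because the unit learns only in the "silence" state, the spectra it averages contain (almost) nothing but the stationary background noise.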
[0048]
The adder 70 is an adder that performs calculation in the frequency domain; it generates and outputs the second corrected audio signal S(NOn) by subtracting the corrected reverberation echo spectrum S(FECn) and the noise spectrum S(NE''n) from the frequency spectrum S(NE'n) of the first corrected speech signal NE'n. This process is performed so that the spectra are synchronized; that is, the calculation uses the spectra formed at the same sampling timing. For example, at sampling timing N, the arithmetic processing S(NOn(N)) = S(NE'n(N)) - S(FECn(N)) - S(NE''n(N)) is performed. Thereby, as the second-stage correction, reverberation echo and stationary noise are removed by a method different from adaptive echo cancellation.
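A minimal per-bin sketch of this synchronized subtraction follows; the zero floor is a common practical safeguard against negative magnitudes and is an addition of the sketch, not something the text specifies.

```python
import numpy as np

def second_correction(S_ne1, S_fec, S_noise):
    """S(NOn(N)) = S(NE'n(N)) - S(FECn(N)) - S(NE''n(N)), using the
    three spectra of the same sampling timing N. Flooring at zero is
    an added safeguard, not part of the patent text."""
    return np.maximum(S_ne1 - S_fec - S_noise, 0.0)
```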
[0049]
The IFFT 922 is an inverse fast Fourier transform circuit that converts the second corrected
speech signal NOn, which is a function of the frequency domain, into a second corrected speech
signal NOt, which is a function of the time domain, and outputs it.
[0050]
With the configuration and processing described above, echo removal processing can be performed according to the state of the emitted and collected sound.
Specifically, during "W talk", excessive removal of the low-frequency component of the speaker's voice is prevented, so the speaker's voice can be output without changing its sound quality; during "sound emission only", the wraparound speech can be effectively eliminated.
[0051]
Next, an acoustic echo canceller according to a second embodiment will be described with reference to the drawings. FIG. 4 is a block diagram showing the main configuration of the acoustic echo canceller 1' of this embodiment. The acoustic echo canceller 1' of this embodiment is the acoustic echo canceller 1 of the first embodiment with an added configuration and processing for estimating reverberation echo according to the acoustic environment, such as the installation environment. Therefore, in the following description, only the portions that differ from the acoustic echo canceller 1 of the first embodiment are described.
[0052]
The acoustic echo canceller 1' includes, in addition to the components of the acoustic echo canceller 1 shown in FIG. 1 of the first embodiment, an acoustic environment update unit 11, an operation unit 12, a display unit 13, and an acoustic environment detection unit 14. The state determination unit 10 also gives the state determination result to the acoustic environment detection unit 14. Further, in the present embodiment, the update coefficient β is not a constant value but is selected according to the acoustic environment.
[0053]
When the acoustic environment update unit 11 receives an acoustic environment parameter setting instruction from the operation unit 12 or the acoustic environment detection unit 14, it gives the echo spectrum estimation unit 301 of the disturbance spectrum estimation unit 30 the update coefficient β corresponding to the designated acoustic environment parameter. FIG. 5 is a diagram showing an example of the concept of setting the update coefficient β. For example, as shown in FIG. 5, the acoustic environment update unit 11 gives β = 1 to the echo spectrum estimation unit 301 when "echo minimal" is obtained as the acoustic environment parameter, β = 0.6 when "echoing" is obtained, and β = 0.2 when "echoic" is obtained. The values of the update coefficient β shown here are examples; they may be set appropriately according to the device specifications and environment, and the update coefficient β may be set in more stages.
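The FIG. 5 mapping reduces to a small lookup table; the values follow the example in the text, and the environment labels are the normalized forms used in this translation.

```python
# Update coefficient beta per acoustic environment parameter,
# following the example values of FIG. 5: a dead room trusts the
# instantaneous pseudo-echo spectrum (beta = 1), a live room smooths
# over a longer history (small beta).
BETA_BY_ENVIRONMENT = {
    "echo minimal": 1.0,
    "echoing": 0.6,
    "echoic": 0.2,
}

def beta_for(environment):
    return BETA_BY_ENVIRONMENT[environment]
```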
[0054]
The operation unit 12 is a user interface for users, including the speaker, and has various operators (not shown). When the operation unit 12 receives an operation input for setting the acoustic environment from the user, it outputs a corresponding acoustic environment parameter setting instruction to the acoustic environment update unit 11.
[0055]
The display unit 13 includes a display element such as a liquid crystal display, and displays an
operation menu and the like according to display control from the acoustic environment update
unit 11.
[0056]
The user manually sets the acoustic environment parameter by means of the operation unit 12 and the display unit 13.
That is, when an instruction to change the acoustic environment parameter setting is received from the operation unit 12, a screen for setting the acoustic environment parameter, for example one showing the room sizes "large", "medium", and "small", is displayed on the display unit 13. Following the displayed screen, the user inputs the size of the room in which the device having the acoustic echo canceller 1' is installed. Based on the operation input, the operation unit 12 provides the acoustic environment update unit 11 with an acoustic environment parameter setting instruction (for example, "echo minimal", "echoing", or "echoic" in FIG. 5). The acoustic environment update unit 11 provides the echo spectrum estimation unit 301 with the update coefficient β according to the acoustic environment parameter, as described above.
[0057]
The acoustic environment detection unit 14 acquires the impulse response signal formed by performing an inverse Fourier transform, with the IFFT 141, on the impulse response corresponding to the tap coefficients of the pseudo echo sound signal estimation unit 202, and detects its envelope characteristic. The acoustic environment detection unit 14 obtains the reverberation echo time by detecting the amplitude and attenuation characteristics of the envelope waveform, and gives an acoustic environment parameter setting instruction to the acoustic environment update unit 11 based on that reverberation echo time. For example, if the time constant of the envelope waveform is extremely short and the echo time is "substantially absent", the acoustic environment parameter "echo minimum" is given to the acoustic environment update unit 11. If the time constant of the envelope waveform is short and the echo time is "short", the acoustic environment parameter "echoing" is given to the acoustic environment update unit 11. If the time constant of the envelope waveform is long and the echo time is "long", the acoustic environment parameter "echo large" is given to the acoustic environment update unit 11. By performing such processing, the acoustic environment parameter can be set automatically without manual input. Furthermore, by performing this processing each time silence is detected, the acoustic environment parameter can be changed dynamically, taking into account acoustic environment changes caused by, for example, a change in the number of users or in a user's position.
[0058]
The echo spectrum estimation unit 301 calculates the reverberation echo spectrum by arithmetic expression (1) using the given update coefficient β. Then, as in the first embodiment, correction according to the sound emitting and collecting state is performed.
[0059]
As described above, by using the update coefficient β, the estimation algorithm (arithmetic expression (1) described above) can be adjusted more optimally according to the installation environment of the device provided with the acoustic echo canceller 1′. As a result, the reverberation echo spectrum S (FE′′n (N)) can be accurately estimated according to the acoustic environment.
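Arithmetic expression (1) itself is not reproduced in this section, so the sketch below assumes a plausible first-order recursive form in which β weights the previous reverberation echo estimate against the current pseudo echo spectrum. The function name and the exact form are hypothetical.

```python
import numpy as np

def estimate_reverb_spectrum(prev_echo_spec, pseudo_echo_spec, beta):
    """One step of an assumed first-order recursive reverberation estimate.

    Hypothetical stand-in for arithmetic expression (1): the current
    reverberation echo spectrum S(FE''n(N)) is modeled as the decayed
    previous estimate plus a contribution from the current pseudo echo
    spectrum S(FE'n(N)).  A larger beta (0 < beta < 1) makes the estimate
    decay more slowly, modeling a more reverberant room.
    """
    return beta * prev_echo_spec + (1.0 - beta) * pseudo_echo_spec
```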
[0060]
As described above, by using the configuration and processing of the present embodiment, the reverberation echo that cannot be eliminated by the adaptive filter can be estimated optimally in consideration of both the acoustic environment and the sound emitting and collecting state, so that echo removal can be performed more effectively.
[0061]
Next, the acoustic echo canceller 1′′ according to the third embodiment will be described with reference to the drawings.
[0062]
FIG. 6 is a block diagram showing the main configuration of the acoustic echo canceller 1′′ of the present embodiment.
The acoustic echo canceller 1′′ of this embodiment further includes an echo suppressor 40, a band dividing unit 50, and an adder 80 in addition to the acoustic echo canceller 1′ shown in FIG. 4 of the second embodiment.
The state determination unit 10 also gives the state determination result to the echo suppressor 40. The other configuration is the same as that of the second embodiment, but the content of the signal processing differs from the first and second embodiments, so only the differing parts are described below.
[0063]
The state determination unit 10 of the acoustic echo canceller 1′′ determines, based on the signal levels of the sound emission voice signal FEt, the collected sound signal bass range component NLEt described later, and the bass range first corrected voice signal NLE′t, and in the same manner as in the first and second embodiments, which of the states "both sound emitting and collecting (W-talk state)", "sound emission voice signal only", "silence", and "collected sound signal only" applies, and supplies the determination result to the adaptive filter 20, the disturbance spectrum estimation unit 30, the acoustic environment detection unit 14, and the echo suppressor 40. Here, the collected sound signal bass range component NLEt (NLEn) of this embodiment corresponds to the collected voice signal NEt (NEn) in the first and second embodiments, and the bass range first corrected voice signal NLE′t (NLE′n) of this embodiment corresponds to the first corrected voice signal NE′t (NE′n) in the first and second embodiments.
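A level-based state determination of this kind can be sketched as below. The threshold, the use of the corrected signal NLE't as the near-end activity cue, and the state labels are assumptions for illustration; the actual unit may combine the levels differently.

```python
def determine_state(fe_level, nle_level, nle_corr_level, thresh=0.01):
    """Classify the sound emitting/collecting state from signal levels.

    Hypothetical sketch: fe_level is the level of the sound emission
    signal FEt, nle_level that of the collected bass component NLEt,
    and nle_corr_level that of the first corrected signal NLE't.  A
    large residual after echo cancellation while the far end is active
    suggests that near-end speech is also present (W-talk).
    """
    fe_active = fe_level > thresh
    near_active = nle_corr_level > thresh  # residual after echo cancellation
    if fe_active and near_active:
        return "W-talk"
    if fe_active:
        return "emission only"
    if nle_level > thresh:
        return "collection only"
    return "silence"
```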
[0064]
The adaptive filter 20 of the acoustic echo canceller 1′′ generates the pseudo echo sound signal FE′n from the frequency spectrum of the bass range first corrected voice signal NLE′n using the above-mentioned adaptive algorithm.
[0065]
A band dividing unit 50 is provided between the microphone MIC and the adder 60.
The band dividing unit 50 separates the collected sound signal NEt into a bass range component NLEt and a treble range component NHEt. Here, the threshold frequency dividing the low range from the high range is set to, for example, 8 kHz; the low range component of 8 kHz or less, which contains the main components of human voice, is given to the adder 60, and the high range component above 8 kHz is given to the echo suppressor 40.
[0066]
The adder 60 subtracts the pseudo echo sound signal FE′t from the collected sound signal bass range component NLEt, thereby generating and outputting the bass range first corrected voice signal NLE′t.
[0067]
The FFT 912 is a fast Fourier transform circuit, which converts the bass range first corrected voice signal NLE′t, a function of the time domain, into the bass range first corrected voice signal NLE′n, a function of the frequency domain, and outputs it.
The frequency spectrum S (NLE′n) of the bass range first corrected voice signal NLE′n is input to the above-described pseudo echo sound signal estimation unit 202 and to the noise spectrum estimation unit 302 of the disturbance spectrum estimation unit 30.
[0068]
The noise spectrum estimation unit 302 of the disturbance spectrum estimation unit 30 sequentially acquires and temporarily stores the frequency spectrum S (NLE′n) of the bass range first corrected voice signal NLE′n. The noise spectrum estimation unit 302 estimates the noise spectrum S (NLE′′n) based on the stored frequency spectra S (NLE′n) of the plurality of acquired bass range first corrected voice signals NLE′n.
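One common way to estimate a stationary noise spectrum from successive frames is a slow exponential average of the magnitude spectra, which converges to the noise floor when speech is absent. The sketch below assumes this form; the patent does not specify the estimator actually used by the noise spectrum estimation unit 302.

```python
import numpy as np

class NoiseSpectrumEstimator:
    """Estimate a stationary noise spectrum from successive frames.

    Hypothetical sketch: the estimate is a slow exponential average of
    the magnitude spectra S(NLE'n) of recent frames.  alpha close to 1
    makes the estimate track only slowly varying (stationary) noise.
    """

    def __init__(self, n_bins, alpha=0.9):
        self.alpha = alpha
        self.noise = np.zeros(n_bins)

    def update(self, frame_spectrum):
        mag = np.abs(frame_spectrum)
        self.noise = self.alpha * self.noise + (1.0 - self.alpha) * mag
        return self.noise
```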
[0069]
The adder 70 is an adder that performs calculation in the frequency domain; it subtracts the corrected reverberation echo spectrum S (FECn) and the noise spectrum S (NLE′′n) from the frequency spectrum S (NLE′n) of the bass range first corrected voice signal NLE′n, thereby generating and outputting the bass range second corrected voice signal NLOn. This processing is performed with the spectra synchronized with one another. The synchronization processing by the adder 70 is the same as that performed in the first and second embodiments.
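The subtraction performed by the adder 70 can be sketched as magnitude spectral subtraction with clamping. The spectral floor is an assumption added here to keep the result non-negative; it is not described in the text.

```python
import numpy as np

def spectral_subtract(signal_spec, echo_spec, noise_spec, floor=0.0):
    """Subtract estimated echo and noise magnitude spectra.

    Sketch of the adder 70: the corrected reverberation echo spectrum
    S(FECn) and the noise spectrum S(NLE''n) are subtracted from the
    first corrected signal spectrum S(NLE'n); negative results are
    clamped to a fraction `floor` of the input magnitude.
    """
    out = np.abs(signal_spec) - np.abs(echo_spec) - np.abs(noise_spec)
    return np.maximum(out, floor * np.abs(signal_spec))
```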
[0070]
The IFFT 922 is an inverse fast Fourier transform circuit that converts the bass range second corrected voice signal NLOn, a function of the frequency domain, into the bass range second corrected voice signal NLOt, a function of the time domain, and gives it to the adder 80.
[0071]
The echo suppressor 40 comprises an attenuator 401 and a delay circuit 402.
The attenuator 401 adjusts the attenuation amount of the treble range component NHEt of the collected sound signal NEt based on the state determination result from the state determination unit 10, and outputs the attenuated treble range component NHE′t.
[0072]
FIG. 7 is a diagram showing the attenuation amount of the attenuator 401 of the echo suppressor 40. When information on the "sound emission voice signal only" state or the "silence" state is acquired, the attenuator 401 sets the attenuation amount to infinity, that is, it cuts off the treble range component NHEt (NHE′t = 0). This is because, in the case of only the sound emission voice signal or in the case of silence, no speaker voice is included, so the echo and stationary noise can be removed more reliably by blocking the treble range component.
[0073]
In addition, when the attenuator 401 acquires information on the "no sound emission and collected sound signal present" state, it sets the attenuation amount to 0, that is, it passes the treble range component NHEt without attenuation (NHE′t = NHEt). As described above, in the case of only the collected voice signal, the treble range component is dominated by the speaker's voice, so the speaker's voice can be output more accurately by not attenuating the treble range component.
[0074]
Furthermore, when acquiring information on the "W-talk" state, the attenuator 401 sets the attenuation amount to a preset predetermined value. A fixed attenuation amount is used during W-talk because the component of the sound emission voice signal FEt to be removed and the speaker voice component are mixed. With this setting, the speaker voice is slightly attenuated in the treble range. However, the treble range of the speaker voice picked up directly by the microphone has a higher signal level than the treble range echo component caused by the sound emission voice signal FEt, which wraps around only weakly, and is therefore little affected by some attenuation. Consequently, the echo component due to the sound emission voice signal FEt can be attenuated while sacrificing the speaker voice only slightly.
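The three attenuation rules of FIG. 7 can be summarized in a small function. The W-talk gain value and the state labels used here are illustrative assumptions, not values taken from the patent.

```python
def attenuate_high_band(nhe, state, wtalk_gain=0.5):
    """State-dependent attenuation of the high range component NHEt.

    Following FIG. 7: infinite attenuation (output 0) for "emission
    only" and "silence", no attenuation for "collection only", and a
    preset fixed attenuation (the hypothetical gain wtalk_gain) during
    W-talk.
    """
    if state in ("emission only", "silence"):
        return 0.0 * nhe          # cut off: NHE't = 0
    if state == "collection only":
        return nhe                # pass through: NHE't = NHEt
    return wtalk_gain * nhe       # W-talk: preset attenuation
```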
[0075]
The attenuated treble range component NHE′t output from the attenuator 401 is input to the delay circuit 402. The delay circuit 402 delays the treble range component NHE′t, whose processing is simple and fast, so that it is time-synchronized with the bass range second corrected voice signal NLOt, which has undergone the echo cancellation and echo removal processing described above. By this delay processing, the delayed attenuated treble range component NHE′′t is generated and given to the adder 80. The adder 80 adds the bass range second corrected voice signal NLOt and the time-synchronized delayed attenuated treble range component NHE′′t to generate the output voice signal NO′t, and outputs it to the outside.
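The delay circuit 402 and the adder 80 amount to aligning the fast high-band path with the slower low-band path and summing the two; a minimal sketch, with the delay expressed in samples (the actual delay value would depend on the low-band processing latency):

```python
import numpy as np

def recombine(nlo, nhe_att, delay):
    """Delay the attenuated high band and add it to the low band.

    Sketch of delay circuit 402 and adder 80: the high range path is
    delayed by `delay` samples so that it is time-aligned with the low
    range path, whose echo-removal processing takes longer, and the two
    are summed into the output signal NO't.
    """
    delayed = np.concatenate([np.zeros(delay), nhe_att])[: len(nhe_att)]
    return nlo + delayed
```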
[0076]
As described above, by performing the echo cancellation and echo removal processing only on the bass range component and only attenuation processing on the treble range component, the amount of processing computation can be reduced. Even though the echo cancellation and echo removal processing described above is not performed on the treble range component, deterioration of sound quality can be suppressed, because the main components of human voice lie on the bass range side and the treble range component is small compared with the bass range component. Furthermore, by changing the attenuation amount for each sound emitting and collecting state as described above, treble range echo is removed more effectively together with the bass range processing shown in the above embodiments, while deterioration of the quality of the speaker voice is suppressed.
[0077]
In the above description, the disturbance spectrum estimation unit includes the echo spectrum estimation unit and the noise spectrum estimation unit, but reverberation echoes can be removed with high accuracy even with the echo spectrum estimation unit alone. Also, in the above description, an example is shown in which the adaptive filter is realized by frequency domain arithmetic, but an adaptive filter in the time domain may be used.
Furthermore, in the above description, an example is shown in which the state determination unit 10 performs the state determination based only on the signal levels, but the state determination may be performed based on the correlation of the respective signals.
[0078]
In the above description, an acoustic echo canceller having a speaker and a microphone is shown as an example, but an output terminal to a sound emitting element such as a speaker and an input terminal from a sound collecting element such as a microphone may be provided instead, with the sound emitting element and the sound collecting element separate from the device.
[0079]
In the above description, an example is shown in which the state determination unit 10 gives the state determination result to each unit. However, the state determination unit 10 may instead store the learning timing conditions of each unit and give each unit its learning timing.
[0080]
FIG. 1 is a block diagram showing the main configuration of the acoustic echo canceller of the first embodiment.
FIG. 2 is a diagram showing the concepts of the state determination, the learning processing, and the frequency correction pattern selection of the acoustic echo canceller shown in FIG. 1.
FIG. 3 is a diagram showing the frequency characteristics of F1(ω) and F2(ω) as examples of the correction function Fk(ω). FIG. 4 is a block diagram showing the main configuration of the acoustic echo canceller of the second embodiment. FIG. 5 is a diagram showing an example of the concept of the setting parameters of the update coefficient β. FIG. 6 is a block diagram showing the main configuration of the acoustic echo canceller of the third embodiment. FIG. 7 is a diagram showing the attenuation amount of the attenuator 401 of the echo suppressor 40.
Explanation of Reference Numerals
[0081]
1: acoustic echo canceller, 10: state determination unit, 11: acoustic environment update unit, 12: operation unit, 13: display unit, 14: acoustic environment detection unit, 20: adaptive filter, 201: pseudo echo sound signal generation unit, 202: pseudo echo sound signal estimation unit, 30: disturbance spectrum estimation unit, 301: echo spectrum estimation unit, 302: noise spectrum estimation unit, 40: echo suppressor, 401: attenuator, 402: delay circuit, 50: band dividing unit, 60, 70, 80: adders, 900: frequency domain operation unit, 911, 912: FFT operation units, 921, 922: IFFT operation units, SP: speaker, MIC: microphone