DESCRIPTION JPH10150343

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
[0001]
BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to an
echo cancellation method and apparatus for cancelling or suppressing echo signals, which cause
howling and degrade audibility in, for example, two-wire/four-wire conversion systems and
loudspeaking speech communication systems.
[0002]
2. Description of the Related Art First, the echo signal that causes such howling and degraded
audibility will be described with reference to the loudspeaking speech communication system
shown in FIG. 8.
[0003]
In FIG. 8, 1 and 3 are transmitting microphones, 2 and 4 are receiving speakers, 5 and 7 are
transmitting signal amplifiers, 6 and 8 are receiving signal amplifiers, 9 is a transmission line, 10
is a talker, and 11 is a listener.
The speech uttered by the talker 10 reaches the listener 11 through the transmitting microphone 1,
the transmitting signal amplifier 5, the transmission line 9, the receiving signal amplifier 8, and
the receiving speaker 4. Since this loudspeaking system does not require the user to hold a
handset as a conventional telephone does, it has the advantages
15-04-2019
1
of allowing calls to be made while working and of enabling natural, face-to-face style
communication. It is widely used in teleconferencing, videophones, and loudspeaking telephones.
[0004]
However, echo is a problem in this communication system. In FIG. 8, the speech reproduced from
the speaker 4 on the receiving side is picked up by the microphone 3 and reproduced on the
transmitting side via the transmitting signal amplifier 7, the transmission line 9, the receiving
signal amplifier 6, and the speaker 2. For the talker 10, this means that his or her own voice is
heard again from the speaker 2; the phenomenon is called acoustic echo. Acoustic echo disturbs
conversation and causes discomfort in a loudspeaking communication system. Furthermore, the
sound reproduced from the speaker 2 is picked up by the microphone 1, which closes a signal
loop. When the loop gain exceeds 1, howling occurs and conversation becomes impossible.
[0005]
To overcome these problems of the speech communication system, an echo canceller is used. Two
typical configurations of the echo canceller are known: the full-band method and the subband
method.
[0006]
FIG. 9 is a block diagram showing an example of a conventional full-band echo canceller. In this
figure, 21 is an echo canceller, 22 is a pseudo echo path, 23 is an echo path estimation circuit,
and 24 is a subtractor. Further, x(n) 25 is the received signal, h(n) 26 is the echo transfer
characteristic (impulse response) between the receiving speaker 4 and the transmitting
microphone 3, y(n) 27 is the echo signal, y^(n) 28 is the pseudo echo signal, h^(n) 29 is the
estimate of the echo path impulse response, e(n) 30 is the error signal, s(n) 31 is the speech
signal of the near-end talker, and z(n) 32 is the microphone output signal.
[0007]
In the echo canceller 21, the echo path estimation circuit 23 first estimates the impulse response
of the echo path and transfers the estimate h^(n) 29 to the pseudo echo path 22. Next, the pseudo
echo path 22 convolves h^(n) 29 with the received signal x(n) 25 to synthesize the pseudo echo
signal y^(n) 28. The subtractor 24 then subtracts the pseudo echo signal y^(n) 28 from the output
signal z(n) 32 of the microphone 3. If the echo path impulse response h(n) 26 is estimated well,
the echo signal y(n) 27 and the pseudo echo signal y^(n) 28 become substantially equal, and the
subtraction cancels the echo signal y(n) 27 contained in the microphone output.
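The synthesis and subtraction steps of paragraph [0007] can be sketched in Python with NumPy. This is a minimal illustration rather than the circuit of FIG. 9: the function name `cancel_echo`, the 4-tap impulse response, and the random test signal are hypothetical, and the sketch assumes a perfect estimate (h^ equal to h) and no near-end speech.

```python
import numpy as np

def cancel_echo(x, z, h_hat):
    """Return e(n) = z(n) - y_hat(n), where y_hat is x convolved with h_hat."""
    # Pseudo echo path 22: synthesize the pseudo echo by convolution.
    y_hat = np.convolve(x, h_hat)[: len(z)]
    # Subtractor 24: remove the pseudo echo from the microphone signal.
    return z - y_hat

# Hypothetical test data: white received signal and a short echo path.
rng = np.random.default_rng(0)
x = rng.standard_normal(1000)            # received signal x(n)
h = np.array([0.5, 0.3, -0.2, 0.1])      # true echo path h(n), assumed
z = np.convolve(x, h)[: len(x)]          # microphone output with s(n) = 0
e = cancel_echo(x, z, h)                 # perfect estimate: h_hat = h
```

With a perfect estimate the residual `e` vanishes; in practice h^(n) has to be learned, which is the role of the echo path estimation circuit 23.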
[0008]
Here, the pseudo echo path 22 must track time variations of the echo path impulse response
h(n) 26. The echo path estimation circuit 23 therefore estimates the echo path impulse response
with an adaptive algorithm. The estimation is performed in the receiving state, that is, when
s(n) ≒ 0 and hence z(n) ≒ y(n) can be assumed. In the receiving state, the error signal e(n) 30
can be regarded as the cancellation residual y(n) - y^(n) of the echo signal. The receiving state
is assumed in the following description. The adaptive algorithm uses the received signal x(n) 25
and the error signal e(n) 30 to determine the estimate h^(n) of the impulse response so that the
power of the error signal is minimized. Known examples include the LMS method, the learning
identification method (normalized LMS), and the ES method. The state in which the pseudo echo
path 22 is close to the true echo path, so that the pseudo echo signal y^(n) 28 is substantially
equal to the echo signal y(n) 27, is called convergence. The pseudo echo path 22 and the echo
path estimation circuit 23 are collectively referred to here as an adaptive filter.
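As one concrete instance of such an adaptive algorithm, the following sketch implements the learning identification method (normalized LMS) for the receiving state. The text does not say which algorithm the apparatus actually uses, so this choice, the step size `mu`, the regularization constant `eps`, and the test signals are all assumptions made for illustration.

```python
import numpy as np

def nlms(x, z, taps, mu=0.5, eps=1e-8):
    """Estimate h_hat from x(n) and z(n) with normalized LMS (illustrative)."""
    h_hat = np.zeros(taps)      # estimate h^(n), starts at zero
    x_buf = np.zeros(taps)      # most recent `taps` samples of x, newest first
    e = np.zeros(len(x))
    for n in range(len(x)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = x[n]
        y_hat = h_hat @ x_buf               # pseudo echo y^(n)
        e[n] = z[n] - y_hat                 # error signal e(n)
        # Correct the coefficients along x_buf, normalized by its power.
        h_hat += mu * e[n] * x_buf / (x_buf @ x_buf + eps)
    return h_hat, e

# Hypothetical receiving-state data: s(n) = 0, so z(n) = y(n).
rng = np.random.default_rng(0)
x = rng.standard_normal(5000)
h = np.array([0.5, 0.3, -0.2, 0.1])      # assumed true echo path
z = np.convolve(x, h)[: len(x)]
h_hat, e = nlms(x, z, taps=4)
```

In this noise-free toy case the tap length equals the true impulse response length, so the estimate can converge to h; when the filter is shorter than the echo path, the uncovered tail remains as residual echo, which is why the choice of tap length matters in the paragraphs that follow.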
[0009]
FIG. 10 is a block diagram showing an example of a conventional subband echo canceller; parts
that also appear in the preceding figures are given the same reference numerals. The received
signal x(n) 25 and the microphone output signal z(n) 32 are each divided into N frequency bands.
41 and 42 are frequency band division circuits, 43 is a frequency band synthesis circuit, 44-1 to
44-N are adaptive filters, x1(m) 45-1 to xN(m) 45-N are the received signals after band division,
z1(m) 46-1 to zN(m) 46-N are the microphone output signals after band division, and e1(m) 47-1
to eN(m) 47-N are the error signals after band division. Further, m is the discrete time index
after decimation by the band division circuits; when the decimation ratio is R, n = R × m. Thus,
the subband echo canceller suppresses the echo signal with a separate adaptive filter for each
frequency band.
[0010]
In general, an adaptive filter can achieve complete echo cancellation when its tap length, that is,
the number of filter coefficients, covers the full length of the (true) impulse response of the
echo path. However, the impulse response of an acoustic echo typically lasts several hundred
milliseconds, corresponding to the reverberation time. Cancelling this reverberant signal
completely therefore requires a very large number of taps and increases the hardware scale. It is
therefore conceivable to determine the required echo suppression amount from human tolerance
for echo, and to provide only the tap length needed to achieve that suppression given the impulse
response length of the echo path. In the conventional full-band echo canceller, the number of taps
is determined from the average reverberation time of the room. The tap length LFullBand is given
by equation (1) in terms of the required echo suppression amount (desired loss) DL, the average
reverberation time TR (s), and the sampling interval TS (s).
[0011]
As equation (1) shows, the tap length has been determined from the required echo suppression
amount obtained over the entire frequency band and the room reverberation time likewise
obtained over the entire frequency band.
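The body of equation (1) does not appear in this text. As an illustration only, one relation that is consistent with the quantities named above follows from two assumptions that are not stated in this description: that TR is the time over which the reverberant energy decays by 60 dB, and that the decay is exponential (linear in dB). Under those assumptions, covering DL dB of the decay requires the fraction DL/60 of TR:

```latex
% Illustrative form only; assumes TR is the 60 dB decay time and DL is in dB.
L_{\mathrm{FullBand}} = \frac{DL}{60} \cdot \frac{T_R}{T_S}
% Worked example with hypothetical values:
% DL = 30 dB, T_R = 0.2 s, T_S = 1/8000 s
% L_FullBand = (30/60) * 0.2 * 8000 = 800 taps
```

The worked values (30 dB, 0.2 s, 8 kHz sampling) are hypothetical and chosen only to make the arithmetic easy to follow.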
[0012]
However, in view of human auditory characteristics, the required echo suppression amount differs
from band to band because the audible level differs from band to band. It is also known that room
reverberation time is longer at low frequencies and shorter at high frequencies than the value
averaged over the entire frequency band. As described above, the conventional determination of
the tap length does not take into account that the required echo suppression amount differs for
each frequency band. Consequently, if the values obtained over the entire band are applied as
they are to a subband echo canceller, which divides the signal into a plurality of bands and
suppresses echo band by band, taps are allocated wastefully.
[0013]
In addition, a returning echo is detected more easily as the transmission delay increases. When
the transmission delay is small, the echo is masked by the talker's own utterance and is as hard
to notice as sidetone. As the transmission delay increases, however, the echo falls outside the
temporal masking range of the talker's own utterance and becomes more audible. The required
amount of echo cancellation therefore changes with the magnitude of the transmission delay.
Moreover, how this delay changes the required echo suppression amount along the frequency axis
has not been studied so far.
[0014]
Thus, the conventional tap length is determined from the required echo suppression amount and
reverberation time over the entire band, without considering auditory characteristics along the
frequency axis. In addition, it does not respond to the change in the required echo suppression
amount caused by the magnitude of the transmission delay.
[0015]
As described above, in the conventional echo canceller, the tap length of the adaptive filter is
determined from the required echo suppression amount over the entire band and the
reverberation time averaged over the entire band.
For this reason, a subband echo canceller allocates tap lengths to the individual bands
wastefully.
[0016]
Further, the required echo suppression amount changes with the magnitude of the transmission
delay. Therefore, when an echo canceller is used on a line whose transmission delay differs
significantly from the assumed value, the echo cannot be cancelled sufficiently and speech
quality deteriorates.
[0017]
The present invention has been made in view of the above, and its object is to provide an echo
cancellation method and apparatus capable of cancelling the echo signal sufficiently, without
deterioration in speech quality, regardless of the magnitude of the transmission delay.
[0018]
In order to achieve the above object, the present invention according to claim 1 is an echo
cancellation method that divides the transmission signal sent to the echo path into a plurality of
frequency bands, divides the echo signal obtained after the transmission signal passes through
the echo path into a plurality of frequency bands, generates a pseudo echo path for each
frequency band, and cancels the echo signal by subtracting, from the echo signals of the plurality
of frequency bands, the pseudo echo signals obtained by inputting the transmission signals of the
respective frequency bands to the pseudo echo paths of the respective frequency bands. The gist
is that the pseudo echo path of each frequency band is configured by an adaptive filter whose
filter coefficients are successively corrected by an algorithm that operates to minimize the
cancellation error of the echo signal, and that the tap length indicating the number of filter
coefficients of each adaptive filter is determined on the basis of the necessary and sufficient
echo suppression amount, which is determined from the average power level of the echo speech
taking into account the masking effect of the talker's speech in each frequency band and from the
human audible level in each frequency band.
[0019]
According to the present invention as set forth in claim 1, the filter coefficients of the adaptive
filters constituting the pseudo echo paths of the respective frequency bands are successively
corrected so as to minimize the cancellation error of the echo signal, and the tap length
indicating the number of filter coefficients of each adaptive filter is determined on the basis of
the necessary and sufficient echo suppression amount determined from the average power level
of the echo speech taking into account the masking effect of the talker's speech and from the
human audible level.
[0020]
Also, in the present invention according to claim 2, in the invention according to claim 1, the
gist is that the step of determining the tap length measures the magnitude of the transmission
delay; determines the necessary and sufficient echo suppression amount from the measured
transmission delay, the average power level of the echo speech taking into account the masking
effect of the talker's speech in each frequency band, and the human audible level in each
frequency band; and calculates the tap length from the reverberation time of each frequency
band and the determined required echo suppression amount.
[0021]
In the present invention according to claim 2, the necessary and sufficient echo suppression
amount is determined from the transmission delay, the average power level of the echo speech
taking into account the masking effect of the talker's speech, and the human audible level, and
the tap length is calculated from the determined required echo suppression amount and the
reverberation time.
[0022]
Furthermore, in the present invention according to claim 3, in the invention according to claim 2,
the gist is that, when the measured transmission delay is equal to or less than a predetermined
value, the step of determining the required echo suppression amount uses the difference between
the average power level of speech and the average masking level of the talker's speech in each
frequency band as the average power level of the echo speech taking the masking effect into
account, and determines the required echo suppression amount from this difference and the
human audible level in each frequency band.
[0023]
In the present invention according to claim 3, when the transmission delay is equal to or less
than a predetermined value, the difference between the average power level of speech and the
average masking level of the talker's speech is used as the average power level of the echo
speech taking the masking effect into account, and the required echo suppression amount is
determined from this difference and the human audible level.
[0024]
According to the present invention as set forth in claim 4, in the invention according to claim 2,
the gist is that, when the measured transmission delay is equal to or greater than a
predetermined value, the step of determining the required echo suppression amount uses the
average power level of speech in each frequency band as the average power level of the echo
speech taking the masking effect into account, and determines the required echo suppression
amount from this average power level and the human audible level in each frequency band.
[0025]
In the present invention according to claim 4, when the transmission delay is equal to or greater
than a predetermined value, the average power level of speech is used as the average power level
of the echo speech taking the masking effect of the talker's speech into account, and the required
echo suppression amount is determined from this average power level and the human audible
level.
[0026]
Also, in the present invention according to claim 5, in the invention according to claim 3 or 4, the
predetermined value of the transmission delay is 60 ms.
[0027]
Furthermore, the present invention according to claim 6 is an echo canceller comprising a first
frequency band dividing circuit that divides the transmission signal sent to the echo path into a
plurality of frequency bands, and a second frequency band dividing circuit that divides the echo
signal obtained after the transmission signal passes through the echo path into a plurality of
frequency bands, the echo canceller generating a pseudo echo path for each of the divided
frequency bands and cancelling the echo signal by subtracting, from the echo signals of the
plurality of frequency bands, the pseudo echo signals obtained by inputting the transmission
signals of the respective frequency bands to the corresponding pseudo echo paths. The gist is that
the echo canceller has adaptive filters, each of which constitutes the pseudo echo path of one
frequency band and has filter coefficients successively corrected by an algorithm that operates to
minimize the cancellation error of the echo signal, and tap length assigning means for
determining the tap length indicating the number of filter coefficients of the adaptive filter of
each frequency band on the basis of the necessary and sufficient echo suppression amount
determined from the average power level of the echo speech taking into account the masking
effect of the talker's speech in each frequency band and from the human audible level in each
frequency band.
[0028]
According to the present invention as set forth in claim 6, the filter coefficients of the adaptive
filters constituting the pseudo echo paths of the respective frequency bands are successively
corrected so as to minimize the cancellation error of the echo signal, and the tap length
indicating the number of filter coefficients of each adaptive filter is determined on the basis of
the necessary and sufficient echo suppression amount determined from the average power level
of the echo speech taking into account the masking effect of the talker's speech and from the
human audible level.
[0029]
The present invention according to claim 7 is the invention according to claim 6, wherein the
gist is that the tap length assigning means comprises: transmission delay determination means
for measuring the magnitude of the transmission delay; required echo suppression amount
determining means for determining the necessary and sufficient echo suppression amount from
the measured transmission delay, the average power level of the echo speech taking into account
the masking effect of the talker's speech in each frequency band, and the human audible level in
each frequency band; reverberation time storage means for storing the room reverberation time
for each frequency band; and tap length calculation means for calculating the tap length from the
stored reverberation time of each frequency band and the determined required echo suppression
amount.
[0030]
In the present invention according to claim 7, the necessary and sufficient echo suppression
amount is determined from the transmission delay, the average power level of the echo speech
taking into account the masking effect of the talker's speech, and the human audible level, and
the tap length is calculated from the determined required echo suppression amount and the
reverberation time.
[0031]
Further, in the present invention according to claim 8, in the invention according to claim 7, the
gist is that, when the transmission delay measured by the transmission delay determination
means is equal to or less than a predetermined value, the required echo suppression amount
determining means uses the difference between the average power level of speech and the
average masking level of the talker's speech in each frequency band as the average power level
of the echo speech taking the masking effect into account, and determines the required echo
suppression amount from this difference and the human audible level in each frequency band.
[0032]
In the present invention according to claim 8, when the transmission delay is equal to or less
than a predetermined value, the difference between the average power level of speech and the
average masking level of the talker's speech is used as the average power level of the echo
speech taking the masking effect into account, and the required echo suppression amount is
determined from this difference and the human audible level.
[0033]
Furthermore, in the present invention according to claim 9, in the invention according to claim 7,
the gist is that, when the transmission delay measured by the transmission delay determination
means is equal to or greater than a predetermined value, the required echo suppression amount
determining means uses the average power level of speech in each frequency band as the average
power level of the echo speech taking the masking effect of the talker's speech into account, and
determines the required echo suppression amount from this average power level and the human
audible level in each frequency band.
[0034]
In the present invention according to claim 9, when the transmission delay is equal to or greater
than a predetermined value, the average power level of speech is used as the average power level
of the echo speech taking the masking effect of the talker's speech into account, and the required
echo suppression amount is determined from this average power level and the human audible
level.
[0035]
According to a tenth aspect of the present invention, in the invention according to the eighth or
ninth aspect, the predetermined value of the transmission delay is 60 ms.
[0036]
The present invention according to claim 11 is the invention according to claim 3, wherein, in the
step of determining the required echo suppression amount, the average masking level of the
talker's speech, which is used to obtain the average power level of the echo speech taking the
masking effect into account in each frequency band, is set to different values according to the
magnitude of the reverberation time of the room that forms the echo path.
[0037]
According to the present invention as set forth in claim 11, the average masking level of the
talker's speech, which is used to obtain the average power level of the echo speech taking the
masking effect into account, is set to different values according to the magnitude of the
reverberation time of the echo path.
[0038]
Further, the present invention according to claim 12 is the invention according to claim 8,
wherein, in the required echo suppression amount determining means, the average masking level
of the talker's speech, which is used to obtain the average power level of the echo speech taking
the masking effect into account in each frequency band, is set to different values according to
the magnitude of the reverberation time of the room that forms the echo path.
[0039]
According to the present invention as set forth in claim 12, the average masking level of the
talker's speech, which is used to obtain the average power level of the echo speech taking the
masking effect into account, is set to different values according to the magnitude of the
reverberation time of the room that forms the echo path.
[0040]
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS Before describing the
embodiments of the present invention, changes in the required echo suppression amount
depending on the magnitude of transmission delay will be described using experiments.
[0041]
Heretofore, the required echo suppression amount on which the tap length is based has been
determined uniformly over the entire frequency band.
It has also been known that echo becomes easier to detect (that is, the required echo
suppression amount increases) as the transmission delay increases, but how this change depends
on the frequency band has not been clarified.
[0042]
Therefore, how the required echo suppression amount in each frequency band changes with the
magnitude of the transmission delay was investigated by subjective evaluation using a
simulation system.
[0043]
The simulation system assumed a point-to-point voice communication system with a 4-wire
circuit configuration and a 7 kHz bandwidth.
FIG. 3 shows a schematic diagram of the system.
[0044]
In this evaluation experiment, speech uttered on the evaluation side passes through a real-time
convolution device that simulates the room transfer characteristic on the far side, and the
required echo suppression amount is determined for the echo that returns.
The echo signal used for the evaluation passes through a band-pass filter similar to that used in
the subband echo canceller, so that it is already limited to a single band when it is reproduced
from the loudspeaker.
A loss is inserted into the echo of each band with a variable resistor to simulate echo
suppression, and the required echo suppression amount is determined.
The transmission and reception sensitivities are set in accordance with ITU-T Recommendation
P.34, and a required loss of 0 dB corresponds to an insertion loss of 0 dB.
[0045]
The evaluation category used to determine the required echo suppression amount was chosen
with the design of actual equipment in mind, so as to yield the most appropriate required amount:
"when listening carefully for the echo, some residual echo is perceived."
This category lies midway between the detection limit and the allowable limit used in
conventional evaluations.
[0046]
The evaluation parameters are 32 bands (250 Hz bandwidth) corresponding to the subband echo
canceller to be designed, and transmission delay time.
Note that the acoustic coupling in the real-time convolution device was -2 dB.
[0047]
In this way, using a statistically significant number of assessors, we examined how the required
suppression amount in each band is affected by the magnitude of the transmission delay.
[0048]
For the delay time, which is an evaluation parameter, two conditions were simulated: no
transmission delay and a transmission delay of about 200 ms.
Even in the no-delay condition, the round-trip processing delay (about 28 ms) of the band
division and synthesis filters of the subband echo canceller is included.
In addition, the echo path impulse response duration was 200 ms on both the evaluation side and
the far side.
The required echo suppression amount obtained for each frequency band is shown in FIG. 4.
[0049]
The horizontal axis in FIG. 4 represents the frequency (Hz), the vertical axis represents the
required echo suppression amount (dB), and the solid line represents the result when the
transmission delay is small, and the broken line represents the result when the transmission
delay is large.
[0050]
From this experimental result, it can be seen that the required echo suppression amount in the
low band increases as the transmission delay increases.
The change of the frequency characteristic of the required echo suppression amount according
to the magnitude of the transmission delay can be qualitatively understood as follows.
[0051]
When the transmission delay is small, the talker's speech and its echo are heard almost
simultaneously, so the echo is masked uniformly along the frequency axis by the talker's speech,
which has substantially the same frequency characteristics.
That is, in the low band, where the echo power is large, the masking level due to the talker's
speech is also large, and toward the high band, where the echo power is smaller, the masking
level decreases accordingly.
As a result, the average power level of the masked echo speech becomes approximately linear
along the frequency axis.
The required echo suppression amount therefore peaks in the middle range (2 to 4 kHz), where
hearing sensitivity is high.
[0052]
On the other hand, when the transmission delay is large, the talker's speech and its echo are
shifted in time, and the echo is no longer masked uniformly along the frequency axis.
In some cases, an echo that returns during a silent interval is not masked at all.
The average power level of the echo speech therefore follows the frequency distribution of the
average power level of speech, and the required echo suppression amount is larger in the low
band.
The validity of these interpretations was examined by quantitative analysis.
[0053]
FIG. 5 shows an example of determining the required echo suppression amount when the
transmission delay is large. Xf 61 is the average power spectrum of speech at the transmission
and reception sensitivities of Recommendation P.34, and Zf 62 is the audible level in the
evaluation room, where the experiment was performed with ambient noise below 30 dBA.
FIG. 6 shows an example of determining the required echo suppression amount when the
transmission delay is small. Yf 63 is the difference between the average power level of the echo
speech and the masking level of the talker's speech, that is, the average power spectrum of the
echo speech after masking by the talker's speech.
[0054]
First, according to the interpretation in the case where the transmission delay is large, the
required echo suppression amount DL can be determined from the average power spectrum Xf
61 of the voice of FIG. 5 and the audible level Zf 62 as in the following equation.
[0055]
DL = A · (Xf - Zf) (2)
Here, the weighting factor A in equation (2) is normalized to A = 1 for the above evaluation
conditions, with the average speech power spectrum Xf 61 and the audible level Zf 62 of FIG. 5.
The required echo suppression amount actually calculated in this way is shown as DLlong 71 in
FIG. 7.
This result agrees very well with the subjective evaluation result for a large transmission delay
in FIG. 4, which shows that the above interpretation is correct and that the required echo
suppression amount can be obtained from equation (2).
[0056]
Next, according to the interpretation for a small transmission delay, the required echo
suppression amount DL can be determined from the average power spectrum Yf 63 of the echo
speech masked by the talker's speech in FIG. 6 and the audible level Zf 62, as in the following
equation.
[0057]
DL = A · (Yf - Zf) (3)
As in the case of a large delay, the required echo suppression amount actually calculated is
shown as DLshort 72 in FIG. 7.
This result agrees very well with the subjective evaluation result for a small transmission delay
in FIG. 4, which shows that the above interpretation is correct and that the required echo
suppression amount can be obtained from equation (3).
[0058]
The above results can be summarized as follows.
[0059]
(1) When the transmission delay is small: the required echo suppression amount is approximated
from the frequency characteristics of the average power level of the echo speech masked by the
talker's speech and of the audible level.
[0060]
(2) When the transmission delay is large: the required echo suppression amount is approximated
from the frequency characteristics of the average power level of speech and of the audible
level.
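The two cases, together with equations (2) and (3), can be sketched as a small per-band function. The 60 ms threshold is the predetermined value given in claims 5 and 10; everything else here is an assumption for illustration: the function and parameter names, the choice to treat a delay of exactly 60 ms as "small", the clamping of negative results to 0 dB, and the three-band example levels, which are invented and are not data from FIGS. 5 to 7.

```python
import numpy as np

DELAY_THRESHOLD_S = 0.060  # 60 ms, the predetermined value in claims 5 and 10

def required_suppression_db(delay_s, xf_db, zf_db, mask_db, a=1.0):
    """Per-band required echo suppression DL in dB (illustrative sketch).

    xf_db:   average power level of speech per band (Xf)
    zf_db:   audible level per band (Zf)
    mask_db: average masking level of the talker's speech per band (assumed input)
    a:       weighting factor A, normalized to 1 in the description
    """
    xf, zf, mask = (np.asarray(v, dtype=float) for v in (xf_db, zf_db, mask_db))
    if delay_s <= DELAY_THRESHOLD_S:
        level = xf - mask        # small delay: masked echo level Yf, equation (3)
    else:
        level = xf               # large delay: unmasked speech level, equation (2)
    # Assumption: a band whose echo is already below the audible level needs 0 dB.
    return np.maximum(a * (level - zf), 0.0)

# Hypothetical three-band example (low, middle, high), all values in dB.
xf = [60.0, 50.0, 40.0]
zf = [20.0, 10.0, 15.0]
mask = [30.0, 20.0, 10.0]
dl_long = required_suppression_db(0.200, xf, zf, mask)
dl_short = required_suppression_db(0.028, xf, zf, mask)
```

With these invented levels the low band needs far more suppression in the long-delay case than in the short-delay case, which mirrors the qualitative trend described for FIG. 4.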
[0061]
The method for determining, from the required echo suppression amount thus obtained, the tap
length of the adaptive filter of the subband echo canceller for each frequency band is shown
below.
The tap length LSubBand for each band is obtained from the required echo suppression amount
DLf for each band and the room impulse response duration TRf (s) for each band.
Here, M is the decimation ratio.
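The equation for LSubBand is not present in this text, so the following is only a sketch of how such a per-band tap length might be computed. It assumes that TRf is the 60 dB decay time of band f, that the decay is exponential, and that decimation by M lengthens the effective sampling interval to M × TS; none of these is stated in the description. The function name, the 16 kHz sampling rate, M = 16, and the band values are hypothetical.

```python
import numpy as np

def subband_tap_lengths(dl_db, tr_s, ts_s, m):
    """Per-band tap lengths under an assumed 60 dB exponential decay model.

    dl_db: required echo suppression per band, DLf (dB)
    tr_s:  reverberation time per band, TRf (s), assumed to be the 60 dB decay time
    ts_s:  sampling interval before decimation, TS (s)
    m:     decimation ratio M
    """
    dl = np.asarray(dl_db, dtype=float)
    tr = np.asarray(tr_s, dtype=float)
    raw = dl / 60.0 * tr / (m * ts_s)
    # Round away floating-point noise first, then round up to whole taps.
    return np.ceil(np.round(raw, 6)).astype(int)

# Hypothetical example: three bands, 16 kHz sampling, decimation by 16.
taps = subband_tap_lengths(
    dl_db=[40.0, 40.0, 25.0],
    tr_s=[0.30, 0.20, 0.10],
    ts_s=1 / 16000,
    m=16,
)
```

Under these assumptions a band with a long reverberation time and a large required suppression receives more taps, and a band with a short reverberation time and a small required suppression receives fewer, which is the kind of non-uniform allocation the description argues for.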
[0062]
As described above, by distinguishing cases for the frequency characteristic of the required echo
suppression amount, which is based on auditory characteristics, according to the magnitude of
the transmission delay, and by taking the reverberation time of each band into account, an
optimal tap length allocation that is not possible with the full-band method can be determined.
[0063]
In the present invention, the required echo suppression amount is thus obtained from auditory
characteristics, and the tap length allocated to each frequency band is determined from it.
For this reason, compared with the conventional method in which the tap length is set uniformly
for all frequency bands, the echo signal is perceptually harder to hear.
In addition, because the frequency characteristic of the required echo suppression amount
changes with the magnitude of the transmission delay, the tap assignment for each band is
switched accordingly.
Therefore, it is possible to prevent deterioration of speech quality even under conditions of use
with different transmission delays, and to assign tap lengths efficiently, which is the object of
the present invention.
[0064]
Next, an echo canceller according to an embodiment of the present invention will be described
with reference to FIG.
In FIG. 1, the same parts as those in FIG. 10 will be assigned the same reference numerals and
descriptions thereof will be omitted.
[0065]
In FIG. 1, 51 is a tap length allocation circuit, 52 is a tap length calculation circuit, 53 is a
required echo suppression amount determination circuit, 54 is a transmission delay
determination circuit, and 55 is a reverberation time storage circuit.
[0066]
The tap length for each frequency band of the adaptive filters 44-1 to 44-N is determined by the
tap length allocation circuit 51 and transferred to the filters.
The allocation is calculated by the tap length calculation circuit 52 according to equation (1),
from the required echo suppression amount DL for each band obtained by the required echo
suppression amount determination circuit 53 and the reverberation time TRf for each band stored
in the reverberation time storage circuit 55.
[0067]
First, the determination of the required echo suppression amount will be described.
[0068]
From the above experimental results, the frequency characteristic of the required echo
suppression amount must be switched according to the magnitude of the transmission delay.
Therefore, the transmission delay determination circuit 54 determines whether the transmission
delay is greater than or equal to the threshold value DTth, and the required echo suppression
amount determination circuit 53 determines the required echo suppression amount DL according to
the result.
[0069]
Here, the transmission delay threshold value DTth is set to DTth = 60 ms because the maximum
duration of successive masking is about 60 ms.
When the transmission delay of the line on which the echo canceller is used is known in advance,
the average power spectrum of the echo speech used to determine the required echo suppression
amount is fixed from the outset as either Xf 61 or Yf 63.
[0070]
In the required echo suppression amount determination circuit 53 in the tap length allocation
circuit 51, the required echo suppression amount DL used in equation (4) is calculated from the
average power spectrum Xf or Yf of the echo speech and the audible level Zf.
The required echo suppression amount determination circuit 53 stores the values of the echo
speech power spectra Xf and Yf and of the audible level Zf.
[0071]
First, when the transmission delay determination circuit 54 determines that the transmission
delay is larger than the threshold value, the required echo suppression amount determination
circuit 53 determines the required echo suppression amount DLf from the average power spectrum
Xf 61 of the speech and the audible level Zf 62, shown in the figure for the large-delay case,
using equation (2).
Thus, when the transmission delay is large, a required echo suppression amount DLf with the
frequency characteristic shown as DLlong 71 in FIG. 7 is obtained.
[0072]
The weighting coefficient A in equation (2) takes the transmission/reception sensitivity as its
reference value.
Therefore, by applying a linear weighting in dB according to the transmission/reception
sensitivity of the speech system in use, the method can accommodate the usage conditions of
various echo cancellers.
[0073]
Next, when the transmission delay determination circuit 54 determines that the transmission
delay is smaller than the threshold value, the required echo suppression amount determination
circuit 53 determines the required echo suppression amount DLf from the average power spectrum
Yf 63 of the echo speech masked by the uttered speech and the audible level Zf 62, shown in the
figure for the small-delay case, using equation (3).
Thus, when the transmission delay is small, a required echo suppression amount DLf with the
frequency characteristic shown as DLshort 72 in FIG. 7 is obtained. The above is the procedure for
determining the required echo suppression amount DLf.
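The determination procedure above can be illustrated with a minimal sketch. It follows the text in comparing the delay with DTth = 60 ms and applying DLf = A · (S - Zf) per band, with S = Xf for a large delay and S = Yf for a small delay. The function name, the list-based representation of the per-band levels in dB, and the default A = 1.0 are assumptions made for illustration only.

```python
DT_TH_MS = 60.0  # transmission delay threshold DTth given in the text


def required_suppression(delay_ms, xf, yf, zf, a=1.0, dt_th_ms=DT_TH_MS):
    """Per-band required echo suppression amount DLf in dB.

    xf : average power spectrum of the speech per band (large-delay case)
    yf : average power spectrum of the masked echo speech per band
         (small-delay case)
    zf : audible level per band
    a  : weighting coefficient A (default 1.0 is an assumption)
    """
    # Delay at or above the threshold: masking no longer helps, use Xf.
    spectrum = xf if delay_ms >= dt_th_ms else yf
    if len(spectrum) != len(zf):
        raise ValueError("spectrum and audible level need one value per band")
    return [a * (s - z) for s, z in zip(spectrum, zf)]


# Hypothetical two-band levels in dB, not taken from the figures.
xf = [60.0, 50.0]
yf = [40.0, 45.0]
zf = [20.0, 10.0]
print(required_suppression(300.0, xf, yf, zf))  # large delay: uses Xf
print(required_suppression(28.0, xf, yf, zf))   # small delay: uses Yf
```

When the delay of the line is known in advance, the same effect is obtained by fixing the spectrum argument once instead of testing the delay on every call.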
[0074]
Meanwhile, the reverberation time TRf for each band in equation (4) is transferred from the
reverberation time storage circuit 55 to the tap length calculation circuit 52. The reverberation
time of the room for each frequency band is assumed to be stored in the reverberation time
storage circuit 55 in advance.
[0075]
Using the required echo suppression amount DLf for each band and the reverberation time TRf
for each band, the tap length calculation circuit 52 calculates the tap length for each band
according to the equation (4).
[0076]
An example of the required tap length of each band obtained by such a procedure is shown in
FIG.
Here, for the impulse response duration of each band in equation (4), values measured in a
laboratory with a room volume of 87 m³ and a reverberation time of 300 ms were used. The black
columns show the tap length allocation when the transmission delay is small, and the white
columns show the allocation when it is large.
[0077]
As shown in FIG. 2, when the transmission delay is small, more taps may be allocated to the
midrange, where auditory sensitivity is high, and fewer toward the low and high bands. When the
transmission delay is large, more taps must be allocated to the low band, where the sound
pressure is large; the allocation is somewhat larger in the mid band as well, but may be reduced
toward the high band.
[0078]
Note that FIG. 2 displays two values per frequency as a pair of columns, so the x-axis shows the
frequency of each pair of black and white columns. That is, the frequency values on the x-axis
are stepwise. For example, at x = 4500 (Hz), the number of taps y takes two values, yL ≒ 33.3
for small transmission delay and yH ≒ 56.3 for large transmission delay.
[0079]
The tap length obtained for each band is transferred from the tap length allocation circuit 51 to
the adaptive filters 44-1 to 44-N for each band, as described above.
[0080]
As described above, in this echo canceller the required echo suppression amount is determined
in consideration of auditory characteristics along the frequency axis, a necessary and sufficient
tap length is determined for each frequency band, and the taps are allocated efficiently. In
addition, the tap length allocation is switched according to the required echo suppression
amount, which changes with the magnitude of the transmission delay, so that deterioration of
speech quality is prevented with only the necessary amount of computation.
[0081]
FIG. 11 is a block diagram showing the configuration of an echo canceller according to another
embodiment of the present invention.
In the echo canceller shown in FIG. 11, the required echo suppression amount determination
circuit 53 of the echo canceller shown in FIG. 1 is made up of a required echo suppression
amount calculation circuit 56 and a reverberation component calculation circuit 57. It differs
from the echo canceller of FIG. 1 in that the frequency characteristic of the required echo
suppression amount is switched according to the magnitude of the reverberation time even when
the transmission delay is small. Components identical to those in FIG. 1 are given the same
reference numerals.
[0082]
That is, in FIG. 11, the reverberation component calculation circuit 57 of the required echo
suppression amount determination circuit 53 calculates the reverberation component Wf of the
speech from the reverberation time of the echo path supplied from the reverberation time
storage circuit 55. The required echo suppression amount calculation circuit 56 calculates the
required echo suppression amount DL from this value and the audible level Zf, and supplies the
calculated DL to the tap length calculation circuit 52.
[0083]
More specifically, in addition to the influence of the transmission delay on the required echo
suppression amount, a subjective evaluation experiment was conducted to investigate the
influence of the reverberation time in the other party's room, which forms the echo path. The
experimental results were analyzed so that tap lengths can be assigned more efficiently.
[0084]
The experimental system for this subjective evaluation was similar to the one described above
that was used to investigate the influence of the transmission delay.
The evaluation parameters are the 32 frequency bands described above and the reverberation time
of the echo path (the other party's room).
[0085]
The reverberation time used as an evaluation parameter simulates two cases: small (about
110 ms) and large (about 450 ms).
Actual measured values are used for these.
For each reverberation time, the required echo suppression amount was obtained for a small
(28 ms) and a large (300 ms) transmission delay, and the experimental results are shown in the
figure described below.
[0086]
In FIG. 12, the black squares and triangles show the results for delays of 28 ms and 300 ms when
the reverberation time is small (about 110 ms), and the white circles and inverted triangles
show the results for delays of 28 ms and 300 ms when the reverberation time is large (about
450 ms).
[0087]
From the results of FIG. 12, it can be seen that the influence of the reverberation time is
generally smaller than that of the transmission delay.
In particular, when the transmission delay is large, the reverberation time hardly affects the
required echo suppression amount. This is considered to be because, at a delay of around 300 ms,
the masking effect of the speech disappears completely, so the required echo suppression amount
is determined only by the average power level of the speech and the audible level, and the
effect of reverberation becomes less noticeable.
[0088]
On the other hand, even when the transmission delay is small, the required echo suppression
amount in the low band increases as the reverberation time increases. This is considered to be
because, even with a small delay, when reverberation is added to the echo returned to the
evaluating side, any reverberation component that extends beyond the range of successive
masking by the speech is detected as an annoying echo. The frequency characteristic of the
required echo suppression amount in that case can be interpreted as similar to the frequency
characteristic obtained when the transmission delay exceeds the range of successive masking.
The reason the required echo suppression amount is lower overall is considered to be that the
reverberation component of the echo is smaller than the direct sound component.
[0089]
Here, the validity of this interpretation of the required echo suppression amount for the case
where the transmission delay is small but the reverberation time is large was examined by
quantitative analysis.
[0090]
FIG. 13 is a diagram showing an example of the determination of the required echo suppression
amount when the transmission delay is small and the reverberation time is large.
Wf 64 in the figure indicates the average power spectrum of the reverberation component of the
voice Xf 61 described above. Also, Zf 62 indicates the above-mentioned audible level.
[0091]
The required echo suppression amount DL when the transmission delay is small but the
reverberation time is large can be determined from the average power spectrum Wf 64 of the
reverberation component in FIG. 13 and the audible level Zf 62, as in the following equation.
[0092]
DL = A · (Wf - Zf)   (5)
The value of Wf is determined from the reverberation characteristics of the room used in the
evaluation experiment described above, using the general property that the reverberation
component of speech decays exponentially, and has the following relationship with the speech
Xf 61 described above:
[0093]
Here, TRf is the reverberation time of the echo path (the partner's side room) for each frequency
band, and Tthf is the influence range (time) of successive masking for each frequency band.
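The relationship between Wf and Xf is not reproduced in this text, so the following is only a sketch of one relation consistent with the exponential decay property mentioned above, not the equation of the specification. It assumes that the energy of the speech decays by 60 dB over the reverberation time TRf, in which case the part of the decay that lies beyond the masking range Tthf is lower than Xf by 60 · Tthf / TRf dB. The function name and the list-based inputs are hypothetical.

```python
def reverberation_component(xf_db, tr_s, tth_s):
    """Per-band Wf in dB, ASSUMING exponential decay of 60 dB per TRf.

    xf_db : average power spectrum Xf of the speech per band (dB)
    tr_s  : reverberation time TRf of the echo path per band (seconds)
    tth_s : range of successive masking Tthf per band (seconds)
    """
    if not (len(xf_db) == len(tr_s) == len(tth_s)):
        raise ValueError("all inputs need one value per band")
    wf = []
    for x, tr, tth in zip(xf_db, tr_s, tth_s):
        if tr <= 0:
            raise ValueError("reverberation time must be positive")
        # Level of the decay tail that falls outside the masking range.
        wf.append(x - 60.0 * tth / tr)
    return wf


# Hypothetical single-band example: Xf = 60 dB, TRf = 450 ms, Tthf = 60 ms.
print(reverberation_component([60.0], [0.45], [0.06]))
```

Under this assumption a short reverberation time pushes Wf far below Xf, which is in line with the observation above that the reverberation component of the echo is smaller than the direct sound component.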
[0094]
The result of actually calculating the required echo suppression amount is shown as DLRT 73 in
the figure.
This result is in good agreement with the subjective evaluation result in FIG. 12 for the case
where the transmission delay is small and the reverberation time is large, which shows that the
above interpretation was correct and that the required echo suppression amount can be
determined by equations (5) and (6).
[0095]
From the above results, when the transmission delay is small and the reverberation time is large,
the required echo suppression amount is approximated by the frequency characteristics of the
average power spectrum of the reverberation component of the speech and of the audible level.
[0096]
From the required echo suppression amount obtained in this manner, the tap length can be
obtained from equation (4) above.
In the present invention, even when the transmission delay is small, the required echo
suppression amount is thus divided into cases according to the magnitude of the reverberation
time of the echo path (the other party's room), and the tap length assignment for each frequency
band is switched accordingly.
Therefore, it is possible to prevent deterioration of speech quality even under conditions of use
with different reverberation times, and to assign tap lengths more efficiently, which is the
object of the present invention.
[0097]
As described above, according to the present invention, in the subband echo canceller the
required echo suppression amount is determined from auditory characteristics, and the tap
lengths of the adaptive filters can be allocated efficiently from that value.
For this reason, the amount of computation can be reduced compared with the conventional method
in which taps are allocated uniformly to all frequency bands, and the apparatus can be made more
economical by eliminating wasted tap length. Also, by making the tap length assignment for each
frequency band variable according to the magnitude of the transmission delay, the echo signal
can be suppressed regardless of the channel condition, and speech quality can be improved.
[0098]
Further, according to the present invention, the tap length assignment for each frequency band
is made variable according to the magnitude of the reverberation time on the other party's side,
so that the echo signal can be suppressed effectively, and speech quality improved, regardless
of the acoustic environment of the speech system in which the echo canceller is used.
[0099]
Brief description of the drawings
[0100]
FIG. 1 is a diagram showing the configuration of an echo canceller according to an embodiment of
the present invention.
[0101]
FIG. 2 is a diagram showing an example of tap length assignment, which varies with the
transmission delay, in the echo canceller shown in FIG. 1.
[0102]
FIG. 3 is a diagram showing the configuration of a simulation system for determining the
required echo suppression amount.
[0103]
FIG. 4 is a graph of experimental results showing an example of the influence of the
transmission delay time on the required echo suppression amount.
[0104]
FIG. 5 is a graph showing an example of the determination of the required echo suppression
amount when the transmission delay is large.
[0105]
FIG. 6 is a graph showing an example of the determination of the required echo suppression
amount when the transmission delay is small.
[0106]
FIG. 7 is a graph showing an example of the difference in the required echo suppression amount
according to the magnitude of the transmission delay.
[0107]
FIG. 8 is a diagram showing the configuration of a speech communication system.
[0108]
FIG. 9 is a block diagram showing an example of a conventional echo canceller.
[0109]
FIG. 10 is a block diagram showing an example of a conventional subband echo canceller.
[0110]
FIG. 11 is a block diagram showing the configuration of an echo canceller according to another
embodiment of the present invention.
[0111]
FIG. 12 is a graph of experimental results showing an example of the influence of the
reverberation time on the required echo suppression amount.
[0112]
FIG. 13 is a graph showing an example of the determination of the required echo suppression
amount when the delay time is short and the reverberation time is long.
[0113]
FIG. 14 is a graph showing an example of the required echo suppression amount when the delay
time is short and the reverberation time is long.
[0114]
Explanation of reference signs
[0115]
Reference Signs List
3 microphone
4 speaker
41, 42 frequency band division circuit
43 frequency band synthesis circuit
44-1 to 44-N adaptive filter
51 tap length allocation circuit
53 required echo suppression amount determination circuit
54 transmission delay determination circuit
55 reverberation time storage circuit
56 required echo suppression amount calculation circuit
57 reverberation component calculation circuit