DESCRIPTION JP2014075756

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
Abstract: To provide a delay estimation method that achieves highly accurate delay estimation
with a small amount of computation. SOLUTION: Let N be the number of samples taken from a
pre-delay signal over the time width for which a cross-correlation is to be found. The pre-delay
signal and the post-delay signal are each converted into frequency-domain signals by frequency
analysis of 2N/M points, which is fewer than N, and are stored in first and second buffer units,
respectively. A cross spectrum calculation unit multiplies, for each frequency, the conjugate of
the signal stored in the first buffer unit by the signal stored in the second buffer unit, and
calculates a cross spectrum by adding the M multiplication results. A time shift addition unit
converts the whitened cross spectrum into the time domain, adds the results while shifting them
in time, and stores the sum. A maximum peak position search unit obtains the maximum peak
position of the signal stored by the time shift addition unit as the estimated delay amount.
[Selected figure] Figure 1
Delay estimation method and echo cancellation method using the method, devices and programs
therefor and recording media therefor
[0001]
The present invention relates to a delay estimation method that can be used, for example, in an
echo cancellation apparatus, to an echo cancellation method using the method, to apparatuses
and programs therefor, and to a recording medium therefor.
[0002]
15-04-2019
1
An echo canceler outputs a reception signal as an acoustic signal from a speaker, uses an
adaptive filter to estimate the echo signal component that leaks into the transmission signal
picked up by a microphone, and cancels that echo signal component from the transmission signal.
In devices such as digital televisions and smartphones, buffers may be inserted in each of the
reception signal and the transmission signal in order to prevent interruption of sound. Buffers
are also known that are used in a lip sync function for adjusting the offset between video and
audio caused by the video codec of a digital television.
[0003]
FIG. 20 shows a schematic functional configuration example of an echo canceller 900 in which
buffers are inserted in both a reception signal and a transmission signal. The echo canceller 900
comprises an adaptive filter 901, an adder circuit 902, a received signal buffer 903 and a
transmission signal buffer 904. The reception signal buffer 903 stores the reception signal from
the far-end talker for several tens of ms to several hundreds of ms, and outputs the reception
signal to the speaker in the order of storage. The transmission signal buffer 904 temporarily
stores the transmission signal from the near-end talker picked up by the microphone, and outputs
the transmission signal to the adder circuit 902 in the order of storage. In FIG. 20, functional
units such as an A/D converter for converting an analog signal into a digital signal and a D/A
converter for converting a digital signal into an analog signal are omitted.
[0004]
The reception signal buffer 903 and the transmission signal buffer 904 are provided, for
example, so that voice is not interrupted even when the CPU load becomes heavy because several
applications run in parallel and the echo cancellation processing cannot keep up. A delay
therefore occurs in these buffers. In general, the adaptive filter of an echo canceler has a
length of about 100 ms, and it cannot cancel echo components whose delay exceeds the length of
the adaptive filter.
[0005]
Therefore, conventionally, an echo canceler 950 using delay estimation as shown in FIG. 21 has
been considered. The echo canceler 950 differs from the above-described echo canceler 900 in
that it includes a delay estimation unit 951 and a delay insertion unit 952. The delay estimation
unit 951 estimates a delay time between a pre-delay signal which is a received signal before
being input to the received signal buffer 903 and a delayed signal which is a transmission signal
output from the transmission signal buffer 904. By applying the estimated delay time to the
delay insertion unit 952 before the adaptive filter, the adaptive filter 901 enables cancellation of
the echo signal component even if there is a large delay time.
[0006]
FIG. 22 shows a functional block diagram of a conventional delay estimating device 960
disclosed in Patent Document 1, and its operation is briefly described here. The FFT units
111_1 to 111_M convert the collected sound signals from the microphones 11_1 to 11_M into the
frequency domain. The whitening units 112_1 to 112_M whiten (flatten) the frequency spectrum of
the collected sound signals converted into the frequency domain. Next, the microphone pair
selection unit 113 selects two of the output signals of the whitening units 112_1 to 112_M.
The multiplication unit 114 takes the conjugate of only one of the signals selected by the
microphone pair selection unit 113, multiplies the two signals for each frequency component,
and obtains a cross spectrum. The output signal of the multiplication unit 114 is converted to
the time domain by the IFFT unit 115 to obtain a whitened cross-correlation. Next, the maximum
peak detection unit 116 detects the maximum peak of the cross-correlation output by the IFFT
unit 115, and outputs the position of the maximum peak as the delay time difference between the
collected signals.
[0007]
When the idea of the delay estimation device 960 is applied to the above-described delay
estimation unit 951, the pre-delay signal is input to, for example, the FFT unit 111_1, and the
delayed signal is input to, for example, the FFT unit 111_2. The operating principle of the
delay estimation device 960 is described further below using formulas.
[0008]
First, let x(t) be the time-domain pre-delay signal, y(t) the time-domain delayed signal, and
h(t) the impulse response of the delay path to be estimated. Then the following relationship
holds.
[0009]
$$y(t) = h(t) * x(t) \qquad (1)$$
[0010]
Here, * represents a convolution operation.
[0011]
Once the impulse response h (t) of the delay path is determined, the delay can be estimated from
the position of the peak.
To obtain h(t), equation (1) is solved for h(t) using the observable time-domain pre-delay
signal x(t) and the time-domain delayed signal y(t).
However, since equation (1) is a convolution, solving it requires a number of operations on the
order of the square of the number of signal points.
The delay estimation device 960 therefore reduces the amount of calculation by converting the
signals of equation (1) into the frequency domain and solving there.
[0012]
Let the frequency-domain signal of the pre-delay signal be X(ω), the frequency-domain signal of
the delayed signal be Y(ω), and the frequency response of the delay path to be estimated be
H(ω). Equation (1) is then transformed into equation (2).
[0013]
$$Y(\omega) = H(\omega) X(\omega) \qquad (2)$$
[0014]
If equation (2) is solved for H (ω), equation (3) is obtained.
[0015]
$$H(\omega) = \frac{X^{*}(\omega) Y(\omega)}{X^{*}(\omega) X(\omega)} \qquad (3)$$
[0016]
If H(ω) is obtained by equation (3) and an inverse FFT is applied to H(ω), h(t) is obtained.
Since the amount of calculation of an N-point FFT is on the order of N log2 N, the amount of
calculation is smaller than when solving in the time domain.
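The gap between the two orders of growth can be made concrete with a short calculation. The following is only an illustrative sketch: the function names are hypothetical, constant factors are ignored, and the length 4096 is chosen as an example (it corresponds to 512 ms at 8 kHz).

```python
import math


def time_domain_ops(n: int) -> int:
    """Order-of-magnitude cost of solving the convolution directly: n squared."""
    return n * n


def fft_ops(n: int) -> int:
    """Order-of-magnitude cost of one n-point FFT: n * log2(n), n a power of two."""
    return n * int(math.log2(n))


n = 4096  # illustrative signal length, an assumption for this example
direct = time_domain_ops(n)
via_fft = fft_ops(n)
print(direct, via_fft)
```

For this length the direct approach is on the order of tens of millions of operations, while a single FFT is on the order of tens of thousands.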
[0017]
A specific calculation method is as follows.
Here, N is the number of data points used for FFT processing of the pre-delay signal and the
delayed signal, and is the signal length over which the whitened cross-correlation is obtained.
The multiplication unit 114 multiplies, for each frequency bin, the conjugate of the
frequency-domain signal X(ω) of the pre-delay signal by the frequency-domain signal Y(ω) of the
delayed signal to calculate the numerator of equation (3) (the cross spectrum).
The cross spectrum is then multiplied by the reciprocal of the square of the norm of X(ω) (the
denominator of equation (3)), which is calculated by the whitening unit 112. When the output
signal of the multiplication unit 114 is subjected to an N-point inverse FFT in the IFFT unit
115, a whitened cross-correlation is obtained, which is the impulse response h(t) of the delay
path. The maximum peak detection unit 116 obtains the maximum peak position of the impulse
response h(t) of the delay path, and outputs it as the estimated delay amount.
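The conventional N-point procedure just described can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the implementation of Patent Document 1: the function name `estimate_delay_full_fft`, the small constant `eps` that guards the division, the white-noise test signal, and the use of a circular delay (`np.roll`) to keep the toy example exact are all assumptions.

```python
import numpy as np


def estimate_delay_full_fft(x, y, eps=1e-12):
    """Whitened cross-correlation using one N-point FFT per signal."""
    X = np.fft.fft(x)
    Y = np.fft.fft(y)
    cross = np.conj(X) * Y               # cross spectrum, numerator of equation (3)
    H = cross / (np.abs(X) ** 2 + eps)   # whitening; eps is an assumption to avoid 0/0
    h = np.fft.ifft(H).real              # estimated impulse response of the delay path
    return int(np.argmax(h))             # position of the maximum peak


rng = np.random.default_rng(0)
x = rng.standard_normal(4096)            # stand-in for the pre-delay signal
true_delay = 300
y = np.roll(x, true_delay)               # circularly delayed copy (assumption)
estimated = estimate_delay_full_fft(x, y)
print(estimated)
```

With real signals the delay is not circular and noise is present, so the peak is less sharp than in this idealized case.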
[0018]
As described above, the conventional delay estimation device 960 can reduce the amount of
operation by calculating using N-point FFT and inverse FFT, compared with calculation in the
time domain.
[0019]
Patent Document 1: JP 2007-81455 A
[0020]
However, in order to estimate the delay accurately, it is necessary to calculate the
cross-correlation over a signal length of several hundred ms or more, and even when an FFT is
used, a large amount of calculation is required (the specific amount is described later).
[0021]
The present invention has been made in view of this problem, and is intended to provide a delay
estimation method that realizes accurate delay estimation with a small amount of computation,
an echo cancellation method using the method, and devices, programs, and recording media
therefor.
[0022]
The delay estimation method of the present invention comprises a first frequency analysis
process, a second frequency analysis process, a first buffer process, a second buffer process, a
cross spectrum calculation process, a whitening coefficient calculation process, a whitening
process, a time domain conversion process, a time shift addition process, and a maximum peak
position search process.
In the first frequency analysis process, where N is the number of samples taken from the
pre-delay signal over the time width for which the whitened cross-correlation is determined, the
process of converting the pre-delay signal into the frequency domain by frequency analysis of
2N/M points, which is fewer than N points, is repeated M times.
In the second frequency analysis process, the process of converting the delayed signal, obtained
by delaying the pre-delay signal, into the frequency domain by frequency analysis of 2N/M points
is repeated M + K - 1 times, where K is the number of frames in the range over which the delay
is estimated.
In the first buffer process, the M pre-delay signals converted into the frequency domain in the
first frequency analysis process are stored.
In the second buffer process, the M + K - 1 delayed signals converted into the frequency domain
in the second frequency analysis process are stored. In the cross spectrum calculation process,
the conjugate of the pre-delay signal stored in the first buffer process is multiplied, for each
frequency, by the delayed signal stored in the second buffer process, and the multiplication
results are added over the frames to calculate the cross spectrum up to the K-th element. The
whitening coefficient calculation process calculates, as the whitening coefficient, the
reciprocal of the square of the norm of the pre-delay signal stored in the first buffer process.
The whitening process multiplies the cross spectrum by the whitening coefficient. The time
domain conversion process converts the cross spectrum whitened in the whitening process into the
time domain. In the time shift addition process, the outputs of the time domain conversion
process for the first to K-th elements are added while being shifted in time, and the result is
stored. In the maximum peak position search process, the maximum peak position of the signal
stored in the time shift addition process is obtained as the estimated delay amount.
[0023]
Further, the echo cancellation method of the present invention includes a delay insertion process,
a frequency domain adaptive filter process, and an addition process in addition to the delay
estimation method shown above. In the delay insertion process, the reception signal converted
into the frequency domain signal in the first frequency analysis process described above is
delayed in time by an estimated delay amount estimated by the delay estimation method. The
frequency domain adaptive filter process generates a pseudo echo signal, taking as input the
reception signal into which the delay was inserted in the delay insertion process. The addition
process removes the pseudo echo signal from the delayed signal in the frequency domain.
[0024]
According to the delay estimation method of the present invention, the estimated delay amount
can be determined using frequency analysis and time domain conversion of 2N/M points, which is
fewer than the number N of samples obtained from the time length for determining the whitened
cross-correlation, that is, with a smaller amount of calculation than the prior art.
[0025]
Also, according to the echo cancellation method of the present invention, echo components can
be canceled even when the delay is long, by applying the delay time estimated by the delay
estimation method of the present invention before the adaptive filter.
In addition, by sharing the frequency analysis process, the amount of computation can be
significantly reduced.
[0026]
The drawings, in order, show the following:
An example of the functional configuration of the delay estimation apparatus 100 of the present invention.
The operation flow of the delay estimation apparatus 100.
The relationship between the number of samples N and M.
A more concrete operation flow of the delay estimation apparatus 100.
An example of the functional configuration of the whitening coefficient calculation unit 40.
The calculation of the first element of the cross spectrum $X''^{H}(\omega) Y''(\omega)$.
The calculation of the second element of the cross spectrum $X''^{H}(\omega) Y''(\omega)$.
The operation of the time shift addition unit 70.
An example of the functional configuration of the whitening coefficient calculation unit 240.
The operation flow of the whitening coefficient calculation unit 240.
An example of the functional configuration of the delay estimation apparatus 300.
An example of the functional configuration of the estimation interval setting unit 301.
The operation flow of the estimation interval setting unit 301.
A schematic representation of the level of the sound signal picked up by a microphone.
An example of the estimation result of the conventional delay estimation apparatus (4096 FFT points).
An example of the estimation result of the conventional delay estimation apparatus (1024 FFT points).
An example of the estimation result of the delay estimation apparatus of the present invention.
A comparison of the correct answer rates of delay estimation.
An example of the functional configuration of the echo cancellation apparatus 400 according to the present invention.
The functional configuration of the conventional echo cancellation apparatus 900.
The functional configuration of the conventional echo cancellation apparatus 950.
The functional configuration of the conventional delay estimation apparatus 960.
[0027]
Hereinafter, embodiments of the present invention will be described with reference to the
drawings. The same reference numerals are given to the same components in the drawings, and
the description will not be repeated.
[0028]
FIG. 1 shows an example of the functional configuration of the delay estimation apparatus 100 of
the present invention. The operation flow is shown in FIG. 2. The delay estimation apparatus 100
includes a first frequency analysis unit 10, a second frequency analysis unit 12, a first buffer
unit 20, a second buffer unit 22, a cross spectrum calculation unit 30, a whitening coefficient
calculation unit 40, a whitening unit 50, a time domain conversion unit 60, a time shift addition
unit 70, a maximum peak position search unit 80, and a control unit 90. The delay estimation
apparatus 100 is realized by, for example, reading a predetermined program into a computer
including a ROM, a RAM, a CPU, and the like, and having the CPU execute the program.
[0029]
Let N be the number of sample points in the time width for determining the whitened
cross-correlation. The first frequency analysis unit 10 repeats, M times, the process of
converting the pre-delay signal into the frequency domain by frequency analysis of 2N/M points,
which is fewer than N points (step S10). The first buffer unit 20 stores the M pre-delay signals
converted to the frequency domain by the first frequency analysis unit 10 (step S20).
[0030]
The second frequency analysis unit 12 repeats the process of converting the delayed signal,
obtained by delaying the pre-delay signal, into the frequency domain by frequency analysis of
2N/M points. The process is repeated M + K - 1 times, that is, the M frames plus the K frames
that form the range over which the delay is estimated (step S12). The second buffer unit 22
stores the M + K - 1 delayed signals that have been converted to the frequency domain by the
second frequency analysis unit 12 (step S22).
[0031]
The cross spectrum calculation unit 30 multiplies, for each frequency, the conjugate of the
pre-delay signal stored in the first buffer unit 20 by the delayed signal stored in the second
buffer unit 22, and calculates the cross spectrum up to the K-th element by adding the
multiplication results (step S30). The whitening coefficient calculation unit 40 calculates, as
the whitening coefficient, the reciprocal of the square of the norm of the pre-delay signal
stored in the first buffer unit 20 (step S40).
[0032]
The whitening unit 50 multiplies the cross spectrum by the whitening coefficient (step S50). The
time domain conversion unit 60 converts the cross spectrum whitened by the whitening unit 50
into the time domain (step S60). The time shift addition unit 70 adds the output signals of the
time domain conversion unit 60 for the first to K-th elements while shifting them in time, and
stores the result (step S70). The maximum peak position search unit 80 obtains the maximum peak
position of the signal stored by the time shift addition unit 70 as the estimated delay amount
(step S80). The
control unit 90 controls the operation of each unit so as to repeat the processing of steps S10 to
S80 until the delay estimation operation is finished or the pre-delay signal disappears.
[0033]
Here, the numbers N and M that determine the frequency analysis processing in the first
frequency analysis unit 10 and the second frequency analysis unit 12 are described with
reference to FIG. 3. Any known frequency analysis method can be used, such as the FFT (Fast
Fourier Transform), the DFT (Discrete Fourier Transform), or a wavelet transform. The following
description uses the FFT as an example.
[0034]
Assume, for example, that the time width of the signal used for the whitened cross-correlation
is 512 ms and that the sampling frequency at which the continuous signal is discretized is
8 kHz. The number of samples obtained in a time width of 512 ms is 0.512 × 8000 = 4096. Under
that assumption, the delay estimation method of the prior art performs FFT processing with
N = 4096 samples.
[0035]
In contrast to the prior art, the present invention estimates the delay amount in units of
frames of N/M samples, obtained by dividing N by M. That is, it differs from the prior art in
that FFT processing of 2N/M samples is repeated M times (M = 64 in this example) with a shift
amount of N/M samples.
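The frame parameters in this example follow directly from the stated time width, sampling frequency, and M. A small sketch of that arithmetic is given below; the function name `frame_parameters` and the use of integer milliseconds are assumptions made for illustration.

```python
def frame_parameters(width_ms: int, fs_hz: int, m: int):
    """Return (N, FFT size 2N/M, shift N/M) for the given analysis width."""
    n = width_ms * fs_hz // 1000      # number of samples in the time width
    if n % m != 0:
        raise ValueError("N must be divisible by M")
    hop = n // m                      # shift amount N/M
    fft_size = 2 * hop                # FFT length 2N/M
    return n, fft_size, hop


# 512 ms at 8 kHz with M = 64, as in the example above
n, fft_size, hop = frame_parameters(512, 8000, 64)
print(n, fft_size, hop)
```

With these values each FFT covers 128 samples and consecutive frames overlap by half.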
[0036]
As described above, the delay estimation apparatus 100 converts the pre-delay signal and the
delayed signal into frequency-domain signals by computing, M times, an FFT of 2N/M points, which
is shorter than the signal length N. Here, M is an integer, and N and M are set such that 2N/M
equals the number of FFT points used in the echo canceller that incorporates the delay
estimation apparatus 100. Let the pre-delay signal converted to the frequency domain at the m-th
iteration (m is an arbitrary integer no larger than M) be $X'_m(\omega)$, the delayed signal
converted to the frequency domain at the m-th iteration be $Y'_m(\omega)$, and the impulse
response of the delay path, divided into segments of N/M samples and converted into the
frequency domain, be $H'_m(\omega)$. The following relationship then holds.
[0037]
[0038]
Here, K is an integer that determines a frame range for estimating the delay, and (KN / M) is the
number of samples of the estimated range of the delay.
If each matrix of Formula (4) is expressed by Formula (5)-Formula (7),
[0039]
[0040]
Equation (4) can be expressed by equation (8).
[0041]
$$Y''(\omega) = X''(\omega) H''(\omega) \qquad (8)$$
[0042]
If equation (8) is solved for $H''(\omega)$, equation (9) is obtained.
[0043]
$$H''(\omega) = \{X''^{H}(\omega) X''(\omega)\}^{-1} X''^{H}(\omega) Y''(\omega) \qquad (9)$$
[0044]
Here, the superscript H denotes the conjugate transpose of a matrix.
[0045]
$H''(\omega)$ can be obtained using equation (9) from the M pre-delay signals
$X'_1(\omega), \ldots, X'_M(\omega)$ converted to the frequency domain and the M + K - 1 delayed
signals $Y'_1(\omega), \ldots, Y'_{M+K-1}(\omega)$ converted to the frequency domain.
However, calculating $\{X''^{H}(\omega) X''(\omega)\}^{-1}$ requires the inverse of a matrix of
M rows and M columns, and the amount of calculation is large.
The calculation is therefore simplified by an approximation.
When each element of $X''^{H}(\omega) X''(\omega)$ is calculated, equation (10) is obtained.
[0046]
[0047]
Here, all the diagonal components of equation (10) have the same value,
$X'^{*}_1(\omega) X'_1(\omega) + \cdots + X'^{*}_M(\omega) X'_M(\omega)$.
This value is equal to the square of the norm of the vector $\{X'_1(\omega), \ldots, X'_M(\omega)\}$.
The off-diagonal components, on the other hand, are products of signals taken at different
times.
[0048]
Assuming that the autocorrelation of the pre-delay signal x(t) at lags of N/M samples or more is
sufficiently smaller than the signal power, the off-diagonal components of equation (10) are
sufficiently smaller than the diagonal components. The matrix can therefore be approximated by
its diagonal components alone, as shown in equation (11).
In speech, the autocorrelation generally becomes small within a few tens of ms, so this
approximation is applicable.
[0049]
[0050]
Using equation (11), the inverse matrix of equation (9) can be represented by a product of
scalars and approximated by equation (12).
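[0051]
$$H''(\omega) \approx \frac{X''^{H}(\omega) Y''(\omega)}{\sum_{m=1}^{M} X'^{*}_m(\omega) X'_m(\omega)} \qquad (12)$$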
[0052]
By using equation (12), the response $H''(\omega)$ of the delay path can be obtained with a
greatly reduced amount of calculation.
To return the response $H''(\omega)$ of the delay path to the time domain, it is sufficient to
apply an inverse FFT of 2N/M points to each of the K elements of $H''(\omega)$ and to add the
results while shifting each by N/M points.
[0053]
A concrete method of calculating the response $H''(\omega)$ of the delay path of equation (12)
and of converting it into the time domain is described below, step by step, with reference to
FIG. 4.
The first frequency analysis unit 10 converts the pre-delay signal into the frequency domain
with an FFT of 2N/M points (step S10, FIG. 4).
The shift amount of the signal is N/M samples, and the FFT is performed M times.
The first buffer unit 20 stores the M frames of the pre-delay signal converted to the frequency
domain (step S20). Through these processes, the pre-delay signals
$X'_1(\omega), \ldots, X'_M(\omega)$ converted to the frequency domain are stored in the first
buffer unit 20.
[0054]
The second frequency analysis unit 12 converts the delayed signal into the frequency domain with
an FFT of 2N/M points (step S12). The shift amount of the signal is N/M samples, and the FFT is
performed M + K - 1 times. The second buffer unit 22 stores the M + K - 1 frames of the delayed
signal converted to the frequency domain (step S22). Through these processes, the delayed
signals $Y'_1(\omega), \ldots, Y'_{M+K-1}(\omega)$ converted to the frequency domain are stored
in the second buffer unit 22.
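Steps S10 to S22 amount to taking overlapping frames and transforming each one. The sketch below shows one way this could look in NumPy; the function name `frame_spectra`, the array layout (one row per frame), the small parameter values, and the placeholder delayed signal are assumptions made for illustration.

```python
import numpy as np


def frame_spectra(signal, n_frames, hop):
    """FFT of n_frames frames of length 2*hop, each advanced by hop samples."""
    fft_size = 2 * hop
    needed = (n_frames - 1) * hop + fft_size
    if len(signal) < needed:
        raise ValueError(f"need at least {needed} samples, got {len(signal)}")
    frames = np.stack(
        [signal[i * hop : i * hop + fft_size] for i in range(n_frames)]
    )
    return np.fft.fft(frames, axis=1)   # one spectrum per row


N, M, K = 1024, 16, 8                   # small illustrative values (assumption)
hop = N // M                            # shift amount N/M
rng = np.random.default_rng(0)
pre = rng.standard_normal((M + K) * hop)
post = np.roll(pre, 100)                # placeholder for the delayed signal

first_buffer = frame_spectra(pre, M, hop)             # X'_1 ... X'_M
second_buffer = frame_spectra(post, M + K - 1, hop)   # Y'_1 ... Y'_{M+K-1}
print(first_buffer.shape, second_buffer.shape)
```

Note that M + K - 1 frames with a shift of N/M and a length of 2N/M span (M + K)·N/M samples of the delayed signal.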
[0055]
The whitening coefficient calculation unit 40 calculates a whitening coefficient from the
pre-delay signals $X'_1(\omega), \ldots, X'_M(\omega)$ stored in the first buffer unit 20 (step
S40). A more specific functional configuration example of the whitening coefficient calculation
unit 40 is shown in FIG. 5.
The whitening coefficient calculation unit 40 includes norm calculation means 41 and reciprocal
calculation means 42. The norm calculation means 41 calculates the square of the norm of the
pre-delay signals $X'_1(\omega), \ldots, X'_M(\omega)$, that is,
$X'^{*}_1(\omega) X'_1(\omega) + \cdots + X'^{*}_M(\omega) X'_M(\omega)$. The reciprocal
calculation means 42 calculates the reciprocal of the square of the norm and outputs it as the
whitening coefficient.
[0056]
The cross spectrum calculation unit 30 calculates the cross spectrum
$X''^{H}(\omega) Y''(\omega)$ from the pre-delay signals $X'_1(\omega), \ldots, X'_M(\omega)$
stored in the first buffer unit 20 and the delayed signals
$Y'_1(\omega), \ldots, Y'_{M+K-1}(\omega)$ stored in the second buffer unit 22.
[0057]
The method of calculating the cross spectrum is described with reference to FIGS. 6 and 7.
FIG. 6 shows the calculation of the first element of the cross spectrum
$X''^{H}(\omega) Y''(\omega)$. For the first element, the read positions of the first buffer
unit 20 and the second buffer unit 22 are both set to the beginning (step S31). The signals with
the same frame number in the first buffer unit 20 and the second buffer unit 22 are multiplied
for each frequency bin (step S32), and the products are added over the frame numbers (step S33).
The data in the first buffer unit 20 are conjugated before being multiplied. The frame number
indicates the position of an FFT result in the sequence of FFT results. FIG. 7 shows the
calculation of the second element of the cross spectrum $X''^{H}(\omega) Y''(\omega)$. For the
second element, the read position of the second buffer unit 22 is shifted by one frame while the
read position of the first buffer unit 20 is left unchanged, multiplication is performed for
each frequency bin, and the products are added over the frame numbers. The process of shifting
the read position of the second buffer unit 22 by one frame is repeated until frame M + K - 1 of
the second buffer unit 22 has been read.
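The sliding read position can be sketched as a loop over K offsets into the second buffer. This is an illustrative sketch only: the function name `cross_spectrum`, the zero-based indexing, and the tiny hand-made buffers are assumptions, with rows standing for frames and columns for frequency bins.

```python
import numpy as np


def cross_spectrum(first_buffer, second_buffer, k_elems):
    """Element k: sum over frames m of conj(X'_m) * Y'_(m+k), per frequency bin."""
    m_frames, n_bins = first_buffer.shape
    if second_buffer.shape[0] < m_frames + k_elems - 1:
        raise ValueError("second buffer must hold at least M + K - 1 frames")
    out = np.empty((k_elems, n_bins), dtype=complex)
    for k in range(k_elems):
        window = second_buffer[k : k + m_frames]   # read position shifted by k frames
        out[k] = np.sum(np.conj(first_buffer) * window, axis=0)
    return out


# Tiny example: M = 2 frames, 2 frequency bins, K = 2 elements
X = np.array([[1 + 1j, 2], [0, 1j]])
Y = np.array([[1, 1], [2, 2], [3, 3]], dtype=complex)   # M + K - 1 = 3 frames
C = cross_spectrum(X, Y, k_elems=2)
print(C)
```

The first buffer is read from the same position for every element; only the window into the second buffer moves.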
[0058]
The whitening unit 50 multiplies the cross spectrum $X''^{H}(\omega) Y''(\omega)$ output from
the cross spectrum calculation unit 30 by the whitening coefficient
$1 / \{X'^{*}_1(\omega) X'_1(\omega) + \cdots + X'^{*}_M(\omega) X'_M(\omega)\}$ to calculate
the response $H''(\omega)$ of the delay path (step S50).
[0059]
The time domain conversion unit 60 converts each element of the response $H''(\omega)$ of the
delay path into a time-domain signal by an inverse FFT of 2N/M points (step S60).
The time shift addition unit 70 adds the output signals of the time domain conversion unit 60
while shifting them in time, and stores the result (step S70). FIG. 8 shows the operation of the
time shift addition unit 70. The time shift addition unit 70 shifts the first to K-th elements
output from the time domain conversion unit 60 by N/M samples each and adds them, thereby
obtaining the whitened cross-correlation h(t) in the time domain (the estimated impulse response
of the delay path).
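The time shift and addition of step S70 can be pictured as an overlap-add of K short pieces. The sketch below is a minimal illustration; the function name `shift_and_add` and the all-ones test pieces are assumptions chosen so that the overlap pattern is easy to see.

```python
import numpy as np


def shift_and_add(pieces, hop):
    """Add K time-domain pieces, the k-th one shifted by k * hop samples."""
    k_elems, piece_len = pieces.shape
    h = np.zeros((k_elems - 1) * hop + piece_len)
    for k in range(k_elems):
        h[k * hop : k * hop + piece_len] += pieces[k]
    return h


# K = 3 pieces of length 4 (2N/M) with a shift of 2 (N/M)
pieces = np.ones((3, 4))
h = shift_and_add(pieces, hop=2)
print(h)
```

Because each piece is twice as long as the shift, every interior sample receives contributions from two neighbouring pieces.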
[0060]
The processing of steps S32 to S70 is repeated while advancing the read position of the second
buffer unit 22 by one frame (step S34) until it has been completed for the K frames that form
the range over which the delay is estimated (NO in step S90). The calculation of the cross
spectrum $X''^{H}(\omega) Y''(\omega)$ ends when frame M + K - 1 of the second buffer unit 22
has been used (YES in step S90).
[0061]
The maximum peak position search unit 80 searches for the maximum peak position of the
whitening cross correlation h (t) in the time domain, and outputs the position as an estimated
delay amount (step S80).
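Steps S10 to S80 can be put together in one short NumPy sketch. It is a literal reading of the description under stated assumptions, not a reference implementation: the function name `estimate_delay`, the white-noise test signal, the zero-filled start of the delayed signal, and the small values of N, M, and K are all assumptions. In this literal reading each inverse FFT is a circular correlation, so a weaker wrapped image of the peak can appear 2N/M samples away from the true position; the sketch does not address that, and the demo delay was chosen so that the true peak dominates.

```python
import numpy as np


def estimate_delay(pre, post, n, m, k):
    """Delay estimate from M short FFTs of pre and M + K - 1 short FFTs of post."""
    hop = n // m                  # shift amount N/M
    size = 2 * hop                # FFT length 2N/M
    # Steps S10/S20 and S12/S22: frame-wise FFTs stored in two buffers
    X = np.stack([np.fft.fft(pre[i * hop : i * hop + size]) for i in range(m)])
    Y = np.stack(
        [np.fft.fft(post[i * hop : i * hop + size]) for i in range(m + k - 1)]
    )
    beta = 1.0 / np.sum(np.abs(X) ** 2, axis=0)               # step S40
    h = np.zeros((k + 1) * hop)
    for e in range(k):
        cross = np.sum(np.conj(X) * Y[e : e + m], axis=0)     # step S30
        piece = np.fft.ifft(beta * cross).real                # steps S50, S60
        h[e * hop : e * hop + size] += piece                  # step S70
    return int(np.argmax(h)), h                               # step S80


rng = np.random.default_rng(0)
n, m, k = 1024, 16, 8
pre = rng.standard_normal((m + k) * n // m)
true_delay = 200
post = np.concatenate([np.zeros(true_delay), pre])[: len(pre)]
estimated, h = estimate_delay(pre, post, n, m, k)
print(estimated)
```

The searchable range in this sketch is bounded by K·N/M samples, in line with the estimated range of the delay described earlier.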
[0062]
As described above, in the delay estimation method of the present invention, the estimated
delay amount can be obtained with the product-sum operations of M FFTs of 2N/M sample points in
the first frequency analysis unit 10, M + K - 1 FFTs of 2N/M sample points in the second
frequency analysis unit 12, and K inverse FFTs of 2N/M sample points in the time domain
conversion unit 60.
For example, when N = 4096, M = 128, and K = 16, the number of product-sum operations is 49152
in the first frequency analysis unit 10, 54912 in the second frequency analysis unit 12, and
6144 in the time domain conversion unit 60, for a total of 110208 product-sum operations.
[0063]
On the other hand, the delay estimation method of the prior art requires two N-point FFTs and
one N-point inverse FFT, so the number of product-sum operations is 147456 (3N log2 N). Thus,
the delay estimation method of the present invention can reduce the number of product-sum
operations by about 25%.
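The figures above can be reproduced with a few lines of arithmetic. The sketch below assumes, as the text does, a cost of P log2 P product-sum operations per P-point FFT or inverse FFT; the function names are hypothetical.

```python
import math


def fft_cost(points: int) -> int:
    """Product-sum count assumed for one FFT or inverse FFT: P * log2(P)."""
    return points * int(math.log2(points))


def proposed_cost(n: int, m: int, k: int):
    """Costs of the first analysis unit, second analysis unit, and inverse FFTs."""
    per_fft = fft_cost(2 * n // m)
    first = m * per_fft               # M FFTs of the pre-delay signal
    second = (m + k - 1) * per_fft    # M + K - 1 FFTs of the delayed signal
    inverse = k * per_fft             # K inverse FFTs
    return first, second, inverse


def conventional_cost(n: int) -> int:
    """Two N-point FFTs and one N-point inverse FFT."""
    return 3 * fft_cost(n)


first, second, inverse = proposed_cost(4096, 128, 16)
total = first + second + inverse
conventional = conventional_cost(4096)
reduction = 1 - total / conventional
print(first, second, inverse, total, conventional, round(reduction, 3))
```

This count covers only the transforms, as in the text; the per-bin multiplications of the cross spectrum are not included.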
[0064]
FIG. 9 shows a functional configuration of the whitening coefficient calculation unit 240 of the
delay estimation apparatus 200 according to the second embodiment of the present invention.
The delay estimation apparatus 200 is different from the delay estimation apparatus 100 only in
the configuration of the whitening coefficient calculation unit 240, and has improved tolerance
to noise.
[0065]
Among the frequency components of the pre-delay signal, components with small power are
susceptible to noise, which reduces the accuracy of delay amount estimation.
The whitening coefficient calculation unit 240 improves the resistance to noise by keeping the
whitening coefficient small for such frequency components with small power.
[0066]
The whitening coefficient calculation unit 240 includes norm calculation means 41, full band
power calculation means 241, coefficient multiplying means 242, addition means 243, and
reciprocal calculation means 42. FIG. 10 shows the operation steps of each means. As in the
first embodiment, the norm calculation means 41 calculates the square of the norm of the
pre-delay signals $X'_1(\omega), \ldots, X'_M(\omega)$ converted to the frequency domain, that
is, $X'^{*}_1(\omega) X'_1(\omega) + \cdots + X'^{*}_M(\omega) X'_M(\omega)$ (step S41).
[0067]
The full band power calculation means 241 calculates the full band average power by averaging
the square of the norm over frequency (step S241). The coefficient multiplying means 242
multiplies the output of the full band power calculation means 241 by a preset constant α
(step S242). When the constant α is made smaller, whitening is performed more strictly, and when
the constant α is made larger, the resistance to noise improves. The constant α needs to be set
to a value that balances the effect of whitening against noise immunity.
[0068]
The addition means 243 adds the output of the norm calculation means 41 and the output of the
coefficient multiplying means 242 (step S243). The reciprocal calculation means 42 calculates
the reciprocal of the output of the addition means 243 and outputs it as the whitening
coefficient (step S42).
[0069]
The whitening coefficient β (ω) calculated by the whitening coefficient calculation unit 240 can
be expressed by equation (13).
[0070]
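Equation (13) is not reproduced in this text. Based on steps S41 to S42 described above, it presumably takes the following form; the frequency index range 1 to W and the 1/W normalization of the full band average are assumptions.

```latex
\beta(\omega) =
  \frac{1}{\displaystyle
    \sum_{m=1}^{M} X'^{*}_{m}(\omega)\, X'_{m}(\omega)
    \;+\; \alpha \cdot \frac{1}{W} \sum_{\omega'=1}^{W} \sum_{m=1}^{M}
          X'^{*}_{m}(\omega')\, X'_{m}(\omega')}
\tag{13}
```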
[0071]
Here, W is the maximum value of the frequency to be used.
[0072]
In equation (13), by adding the average power of the full band to the denominator, the
denominator does not become an extremely small value even at a frequency of small power.
As a result, it is possible to prevent the whitening coefficient from becoming very large for
frequency components with low power, and to improve noise immunity.
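The calculation in steps S41, S241, S242, S243, and S42 can be sketched in a few lines of Python. This is only an illustrative sketch: the function name, the list-of-lists representation of the M frequency-domain frames, and the assumption that the total power is nonzero are not taken from the specification.

```python
def whitening_coefficients(frames, alpha):
    """Illustrative whitening coefficient calculation (second embodiment).

    frames: list of M frames, each a list of complex spectrum values X'_m(w)
            (hypothetical data layout).
    alpha:  preset constant that trades whitening strictness for noise immunity.
    """
    num_bins = len(frames[0])
    # Step S41: squared norm per frequency, summed over the M frames.
    norm_sq = [sum(abs(frame[w]) ** 2 for frame in frames)
               for w in range(num_bins)]
    # Step S241: full band average power (mean of the squared norms).
    full_band_power = sum(norm_sq) / num_bins
    # Step S242: scale the full band average power by alpha.
    offset = alpha * full_band_power
    # Steps S243 and S42: add the offset, then take the reciprocal.
    # Assumes the total power is nonzero so the denominator is positive.
    return [1.0 / (p + offset) for p in norm_sq]


# Two frames, two frequencies: the second frequency carries no power.
beta = whitening_coefficients([[1 + 0j, 0j], [1 + 0j, 0j]], alpha=0.5)
print(beta)
```

In this example the zero-power frequency receives a finite coefficient because the denominator is bounded below by the scaled full band power.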
[0073]
FIG. 11 shows an example of the functional configuration of the delay estimation apparatus 300
according to the third embodiment of the present invention.
The delay estimation device 300 is obtained by adding an estimation interval setting unit 301 to
the configuration of the delay estimation devices 100 and 200.
[0074]
The estimation interval setting unit 301 receives a pre-delay signal and sets a signal section
suitable for delay amount estimation.
By estimating the delay amount in a signal section suitable for delay amount estimation,
estimation errors can be reduced and estimation accuracy can be increased.
[0075]
First, it will be described which signal section is a section suitable for estimation of the delay
amount. In the estimation of the delay amount, factors that reduce the estimation accuracy
include ambient noise collected by the microphone and reverberation in the room. To reduce these
influences, it is sufficient to select a section in which the output sound volume from the speaker
is large and as little reverberant sound as possible is picked up by the microphone.
[0076]
FIG. 12 schematically shows the power of the sound collection signal collected by the
microphone. The horizontal axis in FIG. 12 is elapsed time, and the vertical axis is power.
Ambient noise is picked up by the microphone at a nearly constant power level (filled black
area). To find a portion where the influence of ambient noise is small, it is sufficient to detect
where the amplitude is large. A section that is less susceptible to ambient noise can therefore be
obtained by estimating the noise level and extracting the section in which the signal exceeds a
threshold equal to the noise level multiplied by a coefficient.
[0077]
The reverberant sound (fine broken line) arrives slightly later than the direct sound (thin solid
line), which comes out of the speaker and reaches the microphone directly. A section with few
reverberation components therefore exists at the rising portion of the voice. This section can be
found by estimating the level of the reverberation sound from the amplitude of the pre-delay
signal and taking the section whose level is larger than that estimated level.
[0078]
A section with few reverberation components and less susceptible to ambient noise, ie, a section
with high delay estimation accuracy, is a section in which the power of the collected signal is
larger than the noise level and larger than the power of the reverberation signal. The estimation
interval setting unit 301 sets this section.
[0079]
FIG. 13 shows an example of the functional configuration of the estimation interval setting unit
301. The operation flow is shown in FIG. The estimation interval setting unit 301 includes a noise
level estimation unit 3010, a reverberation level estimation unit 3011, a threshold setting unit
3012, and an interval detection unit 3013.
[0080]
The noise level estimation unit 3010 receives the pre-delay signal and estimates the noise level
N(t) by dip-holding the level P(t), that is, the amplitude, of the pre-delay signal (step S3010).
This is done, for example, using equation (14).
[0081]
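Equation (14) is not reproduced in this text. From the description of the dip-hold operation in paragraph [0082], it presumably takes the following form; assigning the case N(t-1) = P(t) to the second branch is an assumption.

```latex
N(t) =
\begin{cases}
  P(t),       & N(t-1) > P(t) \\
  u\, N(t-1), & \text{otherwise}
\end{cases}
\qquad (u \ge 1)
\tag{14}
```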
[0082]
If the past estimated noise level N(t-1) is larger than the level P(t) of the pre-delay signal, the
level of the pre-delay signal is substituted for the estimated noise level.
If N(t-1) is smaller than P(t), the estimated noise level is raised by multiplying N(t-1) by a
constant u of 1 or more.
Here, u is set in advance and is the rising coefficient of the estimated noise level; the closer u
is to 1, the more gently the estimate rises, which yields the dip-hold effect.
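The dip-hold update described above can be sketched as follows. This is a minimal illustration, not the specification's implementation: the function name, the default value of u, the initialization with the first sample, and the handling of the case N(t-1) = P(t) are assumptions.

```python
def dip_hold(levels, u=1.01):
    """Illustrative dip-hold noise level estimate N(t) from levels P(t).

    levels: sequence of non-negative signal levels P(t).
    u:      rising coefficient, a constant of 1 or more (assumed default).
    """
    if not levels:
        return []
    # Assumption: the estimate starts at the first observed level.
    noise = [levels[0]]
    for p in levels[1:]:
        prev = noise[-1]
        if prev > p:
            # The signal dipped below the estimate: follow it down at once.
            noise.append(p)
        else:
            # Otherwise let the estimate rise slowly by the factor u.
            noise.append(u * prev)
    return noise


print(dip_hold([4.0, 2.0, 3.0, 3.0], u=1.5))  # [4.0, 2.0, 3.0, 4.5]
```

With u close to 1 the estimate tracks minima quickly and recovers slowly, so short bursts of speech barely raise the estimated noise level.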
[0083]
The reverberation level estimation unit 3011 receives the pre-delay signal and estimates the
reverberation sound level R(t) by peak-holding the level P(t), that is, the amplitude, of the
pre-delay signal (step S3011). This is done, for example, using equation (15).
[0084]
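Equation (15) is not reproduced in this text. From the description of the peak-hold operation in paragraph [0085], it presumably takes the following form; assigning the case R(t-1) = P(t) to the second branch is an assumption.

```latex
R(t) =
\begin{cases}
  v\, R(t-1), & R(t-1) > P(t) \\
  P(t),       & \text{otherwise}
\end{cases}
\qquad (0 < v < 1)
\tag{15}
```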
[0085]
If the past estimated reverberation sound level R(t-1) is larger than the level P(t) of the
pre-delay signal, the estimated reverberation sound level is lowered by multiplying R(t-1) by a
constant v less than 1.
If R(t-1) is smaller than P(t), the level of the pre-delay signal is substituted for the
reverberation sound level.
Here, v is the attenuation coefficient of the estimated reverberation level and is set to a value
corresponding to the reverberation time.
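The peak-hold update described above can be sketched in the same way. The function name, the default value of v, the initialization with the first sample, and the handling of the case R(t-1) = P(t) are assumptions made for illustration.

```python
def peak_hold(levels, v=0.9):
    """Illustrative peak-hold reverberation level estimate R(t) from P(t).

    levels: sequence of non-negative signal levels P(t).
    v:      attenuation coefficient, a constant less than 1 (assumed default).
    """
    if not levels:
        return []
    # Assumption: the estimate starts at the first observed level.
    reverb = [levels[0]]
    for p in levels[1:]:
        prev = reverb[-1]
        if prev > p:
            # The signal fell below the estimate: decay by the factor v,
            # mimicking the reverberation tail.
            reverb.append(v * prev)
        else:
            # The signal reached or exceeded the estimate: follow it up.
            reverb.append(p)
    return reverb


print(peak_hold([1.0, 4.0, 1.0, 1.0], v=0.5))  # [1.0, 4.0, 2.0, 1.0]
```

A value of v closer to 1 models a longer reverberation time, since the held peak then decays more slowly.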
[0086]
The threshold setting unit 3012 compares the estimated noise level N(t) multiplied by a constant
of 1 or more with the estimated reverberation sound level R(t) multiplied by a constant less than
1, and sets the threshold to the larger of the two values (step S3012). The section detection unit
3013 compares the threshold with the amplitude of the pre-delay signal and detects a section in
which the amplitude of the pre-delay signal is larger than the threshold as a section for delay
estimation (step S3013).
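Steps S3012 and S3013 can be sketched as below. Taking the larger of the two scaled levels as the threshold is an interpretation consistent with paragraph [0078], where the selected section must exceed both the noise level and the reverberation level. The function name and the scale factors a and b are hypothetical.

```python
def detect_sections(levels, noise, reverb, a=2.0, b=0.5):
    """Illustrative threshold setting and section detection.

    levels: pre-delay signal levels P(t).
    noise:  estimated noise levels N(t), same length as levels.
    reverb: estimated reverberation levels R(t), same length as levels.
    a:      constant of 1 or more applied to N(t) (assumed value).
    b:      constant less than 1 applied to R(t) (assumed value).
    Returns one boolean per sample: True where P(t) exceeds the threshold.
    """
    flags = []
    for p, n, r in zip(levels, noise, reverb):
        # Step S3012: threshold is the larger of the two scaled estimates
        # (assumption based on paragraph [0078]).
        threshold = max(a * n, b * r)
        # Step S3013: the sample belongs to an estimation section if the
        # signal level exceeds the threshold.
        flags.append(p > threshold)
    return flags


print(detect_sections([1.0, 5.0, 3.0], [1.0, 1.0, 1.0], [1.0, 5.0, 8.0]))
# [False, True, False]
```

Only the middle sample is selected here: the first is too close to the noise level, and the third is dominated by the decaying reverberation estimate.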
[0087]
The delay estimation devices 100 and 200 perform delay estimation on the section output from
the estimation interval setting unit 301. Because the delay estimation apparatus 300 thus
performs delay estimation on the portion of the pre-delay signal that is suitable for delay
estimation, it can perform delay estimation with high accuracy without being affected by ambient
noise and reverberation.
[Evaluation Experiment] An evaluation experiment was conducted to confirm the effect of the
delay estimation method of the present invention. FIG. 15 shows the result of delay estimation
with 4096 FFT points in the conventional delay estimation apparatus. The horizontal axis is time
(ms), and the vertical axis is the delay estimation result (ms). The actual delay in this experiment
is 65 ms. Many delay estimation results (+) are concentrated around 65 ms, but there are also
many estimation errors.
[0088]
FIG. 16 shows the result of delay estimation by the conventional delay estimation method with
1024 FFT points. The delay estimation results (+) are spread uniformly, which shows that
estimation fails entirely. The cause is that the signal length used for the correlation calculation
is too short relative to the reverberation time. Thus, the conventional delay estimation method
requires an FFT with a large number of points, and hence many operations, to perform accurate
estimation.
[0089]
FIG. 17 shows the result of delay estimation by the method described in the third embodiment of
the present invention, with the correlation calculated over 35 frames using a 128-point FFT. All
delay estimation results lie around 65 ms, the actual delay, which shows that the delay is
estimated correctly.
[0090]
FIG. 18 compares the accuracy rate of delay estimation with that of the prior art.
The vertical axis represents the accuracy rate, and the horizontal axis represents the delay
estimation method. The delay estimation methods are, from the left: the conventional method
with a 4096-point FFT, the conventional method with a 1024-point FFT, and then the first, second,
and third embodiments, each using a 128-point FFT over 35 frames.
[0091]
Even the delay estimation method described in the first embodiment of the present invention
achieves an accuracy rate of 69%, exceeding the 65% of the conventional method with a
4096-point FFT. The delay estimation method of the second embodiment achieves 75%, and that
of the third embodiment about 100%. As described above, according to the delay estimation
method of the present invention, accurate delay estimation can be performed even if the number
of data points for frequency analysis is small.
[0092]
[Application Example] The delay estimation devices 100, 200, and 300 of the present invention
can be used for an echo cancellation device. FIG. 19 shows an example of the functional
configuration of an echo cancellation apparatus 400 using the delay estimation apparatus of the
present invention. The echo cancellation apparatus 400 includes a delay estimation apparatus
100, a delay insertion unit 952, a frequency domain adaptive filter 401, and an addition circuit
902. The delay estimation apparatus 100 may be the delay estimation apparatus 200 or 300
described above.
[0093]
The delay estimation apparatus 100 receives as inputs a time-domain pre-delay signal, which is
the reception signal, and a time-domain delayed signal, which is the transmission signal, and
obtains an estimated delay amount between the two signals. The delay insertion unit 952 takes
the pre-delay signal that has been converted into a frequency-domain signal by the first
frequency analysis unit 10 inside the delay estimation apparatus 100, using a frequency analysis
of 2N/M points (fewer than N, where N is the number of sample points), and delays it in time by
the delay amount estimated by the delay estimation apparatus.
[0094]
The frequency domain adaptive filter unit 401 receives the pre-delay signal whose delay has
been inserted by the delay insertion unit 952 and generates a pseudo echo signal. The addition
unit 902 removes the pseudo echo signal from the delayed signal that has been converted into
the frequency domain signal by the frequency analysis of 2N / M points in the second frequency
analysis unit 12 inside the delay estimation apparatus 100. The delayed signal in the frequency
domain from which the pseudo echo signal has been removed is converted into a transmission
signal in the time domain by performing 2N / M points inverse FFT K times in the 2N / M point
IFFT unit 402.
[0095]
The echo cancellation apparatus 400 is characterized in that the frequency analysis units are
shared between the echo canceller and the delay estimation. By contrast, an echo cancellation
apparatus using the conventional delay estimation apparatus requires two N-point FFTs and one
N-point inverse FFT. This portion amounts to 147456 product-sum operations.
[0096]
On the other hand, in the echo canceller 400 of the present invention, the only additional
computation required for signal conversion is performing the 2N/M-point inverse FFT K times.
Under the condition N = 4096, M = 64, and K = 16, this additional FFT computation in the echo
canceller 400 of the present invention amounts to 14336 product-sum operations. As described
above, the echo canceller 400 of the present invention can reduce the amount of calculation for
frequency-domain signal conversion processing to about 1/10.
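The two operation counts can be checked with a short calculation. The sketch below assumes that one P-point FFT or inverse FFT is counted as P × log2(P) product-sum operations; this counting rule is not stated in the text, but it reproduces both figures given above. The function name is hypothetical.

```python
def fft_ops(points):
    """Product-sum count for one FFT, assumed to be points * log2(points).

    points must be a power of two.
    """
    log2_points = points.bit_length() - 1  # exact log2 for powers of two
    return points * log2_points


N, M, K = 4096, 64, 16

# Conventional: two N-point FFTs plus one N-point inverse FFT.
conventional = 3 * fft_ops(N)

# Proposed: K inverse FFTs of 2N/M points each.
proposed = K * fft_ops(2 * N // M)

print(conventional)             # 147456
print(proposed)                 # 14336
print(proposed / conventional)  # roughly 0.097, i.e. about 1/10
```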
[0097]
When the processing means in the above apparatus is realized by a computer, the processing
content of the function that each apparatus should have is described by a program. Then, by
executing this program on a computer, the processing means in each device is realized on the
computer.
[0098]
The program describing the processing content can be recorded in a computer readable
recording medium. As the computer readable recording medium, any medium such as a magnetic
recording device, an optical disc, a magneto-optical recording medium, a semiconductor memory,
etc. may be used. Specifically, for example, a hard disk drive, a flexible disk, a magnetic tape, or
the like can be used as the magnetic recording device; a DVD (Digital Versatile Disc), a DVD-RAM
(Random Access Memory), a CD-ROM (Compact Disc Read Only Memory), a CD-R (Recordable) /
RW (Rewritable), or the like as the optical disc; an MO (Magneto-Optical disc) or the like as the
magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and
Programmable-Read Only Memory) or the like as the semiconductor memory.
[0099]
Further, the distribution of this program is carried out, for example, by selling, transferring,
lending, etc. a portable recording medium such as a DVD, a CD-ROM, etc. in which the program is
recorded. Furthermore, the program may be stored in a storage device of a server computer, and
the program may be distributed by transferring the program from the server computer to
another computer via a network.
[0100]
Further, each means may be configured by executing a predetermined program on a computer,
or at least a part of the processing content may be realized as hardware.