Patent Translate, powered by EPO and Google
Notice: This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate, complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2014075756
Abstract: To provide a delay estimation method that performs highly accurate delay estimation with a small amount of computation. SOLUTION: Let N be the number of samples taken from the pre-delay signal over the time width for which the cross-correlation is determined. The pre-delay signal and the delayed signal are each converted into frequency-domain signals by a frequency analysis of 2N/M points, fewer than N, and stored in the first and second buffer units, respectively. The cross spectrum calculation unit multiplies, frequency by frequency, the conjugate of the signal stored in the first buffer unit by the signal stored in the second buffer unit, and sums the M multiplication results to calculate the cross spectrum. The time shift addition unit converts the whitened cross spectrum into the time domain, adds the results while shifting them in time, and stores the sum. The maximum peak position search unit obtains the maximum peak position of the signal stored by the time shift addition unit as the estimated delay amount. [Selected figure] Figure 1
Delay estimation method, echo cancellation method using the method, devices and programs therefor, and recording medium therefor
[0001] The present invention relates to a delay estimation method that can be used, for example, in an echo cancellation apparatus, to an echo cancellation method using that method, to the corresponding apparatuses and programs, and to a recording medium therefor.
[0002] An echo canceller outputs a reception signal as an acoustic signal from a speaker, uses an adaptive filter to estimate the echo signal component that leaks into the transmission signal picked up by a microphone, and cancels that echo signal component from the transmission signal. In devices such as digital televisions and smartphones, buffers may be inserted in both the reception signal path and the transmission signal path to prevent interruption of sound. Buffers are also used in the lip-sync function, which corrects the offset between video and audio caused by the video codec of a digital television.
[0003] FIG. 20 shows a schematic functional configuration example of an echo canceller 900 in which buffers are inserted in both the reception signal and the transmission signal. The echo canceller 900 comprises an adaptive filter 901, an adder circuit 902, a reception signal buffer 903 and a transmission signal buffer 904. The reception signal buffer 903 stores the reception signal from the far-end talker for several tens to several hundreds of milliseconds and outputs it to the speaker in the order of storage. The transmission signal buffer 904 temporarily stores the transmission signal from the near-end talker picked up by the microphone and outputs it to the adder circuit 902 in the order of storage. In FIG. 20, functional units such as the A/D converter that converts an analog signal into a digital signal and the D/A converter that converts a digital signal into an analog signal are omitted.
[0004] The reception signal buffer 903 and the transmission signal buffer 904 are provided, for example, to prevent interruption of voice even when several applications run in parallel, the CPU load becomes heavy, and the echo cancellation processing cannot keep up. A delay therefore arises in these buffers.
In general, the adaptive filter of an echo canceller has a length of about 100 ms and cannot cancel echo components whose delay time exceeds that length.
[0005] For this reason, an echo canceller 950 using delay estimation, shown in FIG. 21, has conventionally been considered. The echo canceller 950 differs from the echo canceller 900 described above in that it includes a delay estimation unit 951 and a delay insertion unit 952. The delay estimation unit 951 estimates the delay time between the pre-delay signal, which is the reception signal before it enters the reception signal buffer 903, and the delayed signal, which is the transmission signal output from the transmission signal buffer 904. By giving the estimated delay time to the delay insertion unit 952 placed before the adaptive filter 901, the echo signal component can be cancelled even when the delay time is large.
[0006] FIG. 22 shows a functional block diagram of a conventional delay estimation device 960 disclosed in Patent Document 1; its operation is described briefly. The FFT units 111_1 to 111_M convert the sound signals collected by the microphones 11_1 to 11_M into the frequency domain. The whitening units 112_1 to 112_M whiten (flatten) the frequency spectrum of the collected signals converted into the frequency domain. Next, the microphone pair selection unit 113 selects two of the output signals of the whitening units 112_1 to 112_M. The multiplication unit 114 takes the conjugate of one of the two signals selected by the microphone pair selection unit 113 and multiplies the two signals for each frequency component to obtain a cross spectrum. The output signal of the multiplication unit 114 is converted into the time domain by the IFFT unit 115 to obtain a whitening cross-correlation.
Next, the maximum peak detection unit 116 detects the maximum peak of the cross-correlation output by the IFFT unit 115 and outputs the position of that peak as the delay time difference between the collected signals.
[0007] When the idea of the delay estimation device 960 is applied to the delay estimation unit 951 described above, the pre-delay signal is input to, for example, the FFT unit 111_1, and the delayed signal is input to, for example, the FFT unit 111_2. The operating principle of the delay estimation device 960 is described further below using formulas.
[0008] First, let x(t) be the time-domain pre-delay signal, y(t) the time-domain delayed signal, and h(t) the impulse response of the delay path to be estimated. Then equation (1) holds.
[0009] $$y(t) = h(t) * x(t) \tag{1}$$
[0010] Here, * represents the convolution operation.
[0011] Once the impulse response h(t) of the delay path is determined, the delay can be estimated from the position of its peak. To obtain h(t), it suffices to solve equation (1) for h(t) using the observable time-domain pre-delay signal x(t) and the time-domain delayed signal y(t). However, because equation (1) is a convolution, solving it requires a number of operations on the order of the square of the number of signal points. The delay estimation device 960 therefore reduces the amount of computation by converting the signals of equation (1) into the frequency domain and solving there.
[0012] Let X(ω) be the frequency-domain pre-delay signal, Y(ω) the frequency-domain delayed signal, and H(ω) the frequency response of the delay path to be estimated. Equation (1) is then transformed into equation (2).
[0013] $$Y(\omega) = H(\omega)\,X(\omega) \tag{2}$$
[0014] Solving equation (2) for H(ω), and multiplying numerator and denominator by the conjugate of X(ω), gives equation (3).
[0015] $$H(\omega) = \frac{X^{*}(\omega)\,Y(\omega)}{|X(\omega)|^{2}} \tag{3}$$
[0016] If H(ω) is obtained by equation (3) and then subjected to an inverse FFT, h(t) is obtained. Because the amount of computation of an FFT is on the order of N log2 N, it is smaller than that of solving in the time domain.
[0017] A specific calculation method is as follows. N is the number of data points of the pre-delay signal and the delayed signal that are subjected to FFT processing, that is, the signal length over which the whitening cross-correlation is obtained. The multiplication unit 114 multiplies, for each frequency bin, the conjugate of the frequency-domain pre-delay signal X(ω) by the frequency-domain delayed signal Y(ω) to calculate the numerator of equation (3) (the cross spectrum). The cross spectrum is multiplied by the reciprocal of the square of the norm of the frequency-domain pre-delay signal X(ω) (the denominator of equation (3)), which is calculated by the whitening unit 112. When the output signal of the multiplication unit 114 is subjected to an N-point inverse FFT in the IFFT unit 115, the whitening cross-correlation is obtained; this is the impulse response h(t) of the delay path. The maximum peak detection unit 116 finds the maximum peak position of the impulse response h(t) of the delay path and outputs it as the estimated delay amount.
[0018] As described above, by calculating with N-point FFTs and an inverse FFT, the conventional delay estimation device 960 requires less computation than a calculation in the time domain.
[0019] Patent Document 1: JP 2007-81455 A
[0020] However, to estimate the delay accurately, the cross-correlation must be calculated over a signal length of several hundred milliseconds or more, and even with an FFT a large amount of computation is required (the specific amount is described later).
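
As a rough illustration of the conventional procedure of paragraph [0017], the following Python sketch whitens the cross spectrum as in equation (3), transforms it back to the time domain, and picks the maximum peak. The names `dft` and `conventional_delay` are hypothetical, a naive O(N^2) DFT stands in for the N-point FFT, and the small constant `eps` that guards the division is an added assumption rather than part of the source.

```python
import cmath
import random

def dft(values, inverse=False):
    """Naive DFT of len(values) points; a real system would use an FFT."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def conventional_delay(x, y, eps=1e-12):
    """Estimate the delay of y relative to x from N samples of each."""
    X, Y = dft(x), dft(y)
    # Cross spectrum X*(w) Y(w), whitened by 1 / |X(w)|^2 as in equation (3).
    # eps is an assumption added here to avoid division by zero.
    H = [xc.conjugate() * yc / (abs(xc) ** 2 + eps) for xc, yc in zip(X, Y)]
    h = dft(H, inverse=True)  # estimated impulse response h(t)
    return max(range(len(h)), key=lambda t: h[t].real)

if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(64)]
    d = 10
    y = x[-d:] + x[:-d]  # circular delay, chosen so the toy case is exact
    print(conventional_delay(x, y))
```

The toy input uses a circular delay because the N-point transform then models the delay path exactly. With a real, non-circular delay the samples that wrap around act as noise and the peak is only approximate, which is one reason a long N is needed for accuracy.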
[0021] The present invention has been made in view of this problem, and its object is to provide a delay estimation method that achieves accurate delay estimation with a small amount of computation, an echo cancellation method using that method, and devices, programs and a recording medium therefor.
[0022] The delay estimation method of the present invention comprises a first frequency analysis process, a second frequency analysis process, a first buffer process, a second buffer process, a cross spectrum calculation process, a whitening coefficient calculation process, a whitening process, a time domain conversion process, a time shift addition process, and a maximum peak position search process. In the first frequency analysis process, where N is the number of samples taken from the pre-delay signal over the time width for which the whitening cross-correlation is determined, the process of converting the pre-delay signal into the frequency domain by a frequency analysis of 2N/M points, fewer than N, is repeated M times. In the second frequency analysis process, the process of converting the delayed signal, which is the pre-delay signal after being delayed, into the frequency domain by a frequency analysis of 2N/M points is repeated M+K-1 times, that is, for the M frames extended by the K frames that form the range over which the delay is estimated. In the first buffer process, the M pre-delay signals converted into frequency-domain signals in the first frequency analysis process are stored. In the second buffer process, the M+K-1 delayed signals converted into frequency-domain signals in the second frequency analysis process are stored. In the cross spectrum calculation process, the conjugate of the pre-delay signal stored in the first buffer process is multiplied, frequency by frequency, by the delayed signal stored in the second buffer process, and the multiplication results are summed over the frames to calculate the cross spectrum up to the K-th element. In the whitening coefficient calculation process, the reciprocal of the square of the norm of the pre-delay signal stored in the first buffer process is calculated as the whitening coefficient. In the whitening process, the cross spectrum is multiplied by the whitening coefficient. In the time domain conversion process, the cross spectrum whitened in the whitening process is converted into the time domain. In the time shift addition process, the outputs of the time domain conversion process for the first to K-th elements are added while being shifted in time, and the sum is stored. In the maximum peak position search process, the maximum peak position of the signal stored in the time shift addition process is obtained as the estimated delay amount.
[0023] The echo cancellation method of the present invention includes, in addition to the delay estimation method described above, a delay insertion process, a frequency domain adaptive filter process, and an addition process. In the delay insertion process, the reception signal converted into a frequency-domain signal in the first frequency analysis process is delayed by the delay amount estimated by the delay estimation method. The frequency domain adaptive filter process takes the reception signal delayed in the delay insertion process as input and generates a pseudo echo signal. The addition process removes the pseudo echo signal from the delayed signal in the frequency domain.
[0024] According to the delay estimation method of the present invention, the estimated delay amount can be obtained using frequency analyses and time domain conversions of 2N/M points, fewer than the number of samples N in the time length over which the whitening cross-correlation is determined, that is, with less computation than the prior art.
[0025] Also, according to the echo cancellation method of the present invention, echo components can be cancelled even when the delay is long, because the delay time estimated by the delay estimation method of the present invention is applied before the adaptive filter. In addition, sharing the frequency analysis process significantly reduces the amount of computation.
[0026] The drawings show the following.
A diagram showing a functional configuration example of the delay estimation apparatus 100 of the present invention.
A diagram showing the operation flow of the delay estimation apparatus 100.
A diagram explaining the relationship between the number of samples N and M.
A diagram showing a more detailed operation flow of the delay estimation apparatus 100.
A diagram showing a functional configuration example of the whitening coefficient calculation unit 40.
A diagram showing the calculation of the first element of the cross spectrum X''^H(ω)Y''(ω).
A diagram showing the calculation of the second element of the cross spectrum X''^H(ω)Y''(ω).
A diagram showing the operation of the time shift addition unit 70.
A diagram showing a functional configuration example of the whitening coefficient calculation unit 240.
A diagram showing the operation flow of the whitening coefficient calculation unit 240.
A diagram showing a functional configuration example of the delay estimation apparatus 300.
A diagram showing a functional configuration example of the estimation interval setting unit 301.
A diagram showing the operation flow of the estimation interval setting unit 301.
A diagram schematically showing the level of the signal picked up by the microphone.
A diagram showing an example of the estimation result of the conventional delay estimation apparatus (4096 FFT points).
A diagram showing an example of the estimation result of the conventional delay estimation apparatus (1024 FFT points).
A diagram showing an example of the estimation result of the delay estimation apparatus of the present invention.
A diagram comparing the correct-answer rates of delay estimation.
A diagram showing a functional configuration example of the echo cancellation apparatus 400 of the present invention.
A diagram showing the functional configuration of the conventional echo cancellation apparatus 900.
A diagram showing the functional configuration of the conventional echo cancellation apparatus 950.
A diagram showing the functional configuration of the conventional delay estimation apparatus 960.
[0027] Hereinafter, embodiments of the present invention are described with reference to the drawings. The same reference numerals are given to the same components throughout the drawings, and their description is not repeated.
[0028] FIG. 1 shows an example of the functional configuration of the delay estimation apparatus 100 of the present invention; its operation flow is shown in a separate flow diagram. The delay estimation apparatus 100 includes a first frequency analysis unit 10, a second frequency analysis unit 12, a first buffer unit 20, a second buffer unit 22, a cross spectrum calculation unit 30, a whitening coefficient calculation unit 40, a whitening unit 50, a time domain conversion unit 60, a time shift addition unit 70, a maximum peak position search unit 80, and a control unit 90. The delay estimation apparatus 100 is realized, for example, by loading a predetermined program into a computer comprising a ROM, a RAM, a CPU and the like, and having the CPU execute the program.
[0029] The first frequency analysis unit 10 repeats M times the process of converting the pre-delay signal into the frequency domain by a frequency analysis of 2N/M points, fewer than N, where N is the number of samples in the time width over which the whitening cross-correlation is determined (step S10). The first buffer unit 20 stores the M pre-delay signals converted into the frequency domain by the first frequency analysis unit 10 (step S20).
[0030] The second frequency analysis unit 12 repeats M+K-1 times the process of converting the delayed signal, which is the pre-delay signal after being delayed, into the frequency domain by a frequency analysis of 2N/M points, where M+K-1 corresponds to the M frames extended by the K frames that form the range over which the delay is estimated (step S12). The second buffer unit 22 stores the M+K-1 delayed signals converted into the frequency domain by the second frequency analysis unit 12 (step S22).
[0031] The cross spectrum calculation unit 30 multiplies, for each identical frequency, the conjugate of the pre-delay signal stored in the first buffer unit 20 by the delayed signal stored in the second buffer unit 22, and sums the multiplication results over the frames to calculate the cross spectrum up to the K-th element (step S30). The whitening coefficient calculation unit 40 calculates, as the whitening coefficient, the reciprocal of the square of the norm of the pre-delay signal stored in the first buffer unit 20 (step S40).
[0032] The whitening unit 50 multiplies the cross spectrum by the whitening coefficient (step S50). The time domain conversion unit 60 converts the cross spectrum whitened by the whitening unit 50 into the time domain (step S60). The time shift addition unit 70 adds the first to K-th elements of the output signal of the time domain conversion unit 60 while shifting them in time, and stores the sum (step S70).
The maximum peak position search unit 80 obtains the maximum peak position of the signal stored by the time shift addition unit 70 as the estimated delay amount (step S80). The control unit 90 controls the operation of each unit so that the processing of steps S10 to S80 is repeated until the delay estimation operation ends or the pre-delay signal runs out.
[0033] Here, the numbers of samples N and M used in the frequency analysis processing of the first frequency analysis unit 10 and the second frequency analysis unit 12 are described with reference to the figure that illustrates their relationship. Any known frequency analysis method can be used, such as the FFT (fast Fourier transform), the DFT (discrete Fourier transform) or the wavelet transform. The following description uses the FFT as an example.
[0034] Assume, for example, that the time width of the whitening cross-correlation is 512 ms and that the sampling frequency at which the continuous signal is discretized is 8 kHz. The number of samples in a time width of 512 ms is then 0.512 × 8000 = 4096. Under this assumption, the prior-art delay estimation method performs FFT processing with N = 4096 samples.
[0035] In contrast to the prior art, the present invention estimates the delay amount in units of frames of N/M samples, obtained by dividing N by M. That is, it differs from the prior art in that an FFT of 2N/M samples is repeated M times (M = 64 in this example) with a shift amount of N/M samples.
[0036] As described above, the delay estimation apparatus 100 converts the pre-delay signal and the delayed signal into frequency-domain signals by computing FFTs of 2N/M points, shorter than the signal length N, M times. Here, M is an integer, and N and M are set so that 2N/M equals the number of FFT points used in the echo canceller that incorporates the delay estimation apparatus 100.
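
The frame arithmetic of paragraphs [0034] and [0035] can be checked with a short Python sketch. The helper names `frame_layout` and `frame_starts` are hypothetical, and requiring M to divide N exactly is an assumption made here for simplicity.

```python
def frame_layout(duration_s, sample_rate_hz, M):
    """Return (N, transform size 2N/M, shift N/M) for the short-frame analysis."""
    N = round(duration_s * sample_rate_hz)  # samples in the correlation window
    if N % M:
        raise ValueError("this sketch assumes that M divides N")
    hop = N // M                            # shift between frames: N/M samples
    return N, 2 * hop, hop

def frame_starts(hop, count):
    """Start index of each of `count` half-overlapping frames."""
    return [i * hop for i in range(count)]

if __name__ == "__main__":
    N, size, hop = frame_layout(0.512, 8000, 64)
    print(N, size, hop)
    print(frame_starts(hop, 4))
```

With the values of the example (512 ms, 8 kHz, M = 64), each transform covers 128 samples and successive frames start 64 samples apart, so neighbouring frames overlap by half.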
Let X'_m(ω) denote the pre-delay signal converted into the frequency domain at the m-th repetition (m is an integer not exceeding M), Y'_m(ω) the delayed signal converted into the frequency domain at the m-th repetition, and H'_m(ω) the impulse response of the delay path divided into blocks of N/M samples and converted into the frequency domain. The following relationship then holds.
[0037]
[0038] Here, K is an integer that determines the frame range over which the delay is estimated, and KN/M is the number of samples in the estimation range of the delay. If the matrices of equation (4) are written as in equations (5) to (7),
[0039]
[0040] equation (4) can be expressed as equation (8).
[0041] $$Y''(\omega) = X''(\omega)\,H''(\omega) \tag{8}$$
[0042] Solving equation (8) for H''(ω) gives equation (9).
[0043] $$H''(\omega) = \left\{X''^{H}(\omega)\,X''(\omega)\right\}^{-1} X''^{H}(\omega)\,Y''(\omega) \tag{9}$$
[0044] The superscript H denotes the conjugate transpose of a matrix.
[0045] Using equation (9), H''(ω) can be obtained from the M pre-delay signals X'_1(ω), ..., X'_M(ω) converted into the frequency domain and the M+K-1 delayed signals Y'_1(ω), ..., Y'_{M+K-1}(ω) converted into the frequency domain. However, calculating {X''^H(ω)X''(ω)}^{-1} requires the inverse of a matrix of M rows and M columns, which is computationally expensive. The calculation is therefore simplified by an approximation. Calculating each element of X''^H(ω)X''(ω) gives equation (10).
[0046]
[0047] Here, all the diagonal components of equation (10) have the same value, X'*_1(ω)X'_1(ω) + ... + X'*_M(ω)X'_M(ω). This value is equal to the square of the norm of the vector {X'_1(ω), ..., X'_M(ω)}. The off-diagonal components, on the other hand, are products of signals taken at different times.
[0048] Assuming that the autocorrelation of the pre-delay signal x(t) at lags of N/M samples or more is sufficiently smaller than the signal power, the off-diagonal values in equation (10) are sufficiently smaller than the diagonal components.
Thus, as shown in equation (11), the matrix can be approximated by its diagonal components alone. In speech, the autocorrelation generally becomes small within a few tens of milliseconds, so the approximation is valid.
[0049] $$X''^{H}(\omega)\,X''(\omega) \approx \left\{\sum_{m=1}^{M} X'^{*}_{m}(\omega)\,X'_{m}(\omega)\right\} I \tag{11}$$
[0050] Using equation (11), the inverse matrix in equation (9) can be replaced by multiplication by a scalar, which gives the approximation of equation (12).
[0051] $$H''(\omega) \approx \frac{X''^{H}(\omega)\,Y''(\omega)}{\sum_{m=1}^{M} X'^{*}_{m}(\omega)\,X'_{m}(\omega)} \tag{12}$$
[0052] Using equation (12), the response H''(ω) of the delay path can be obtained with a greatly reduced amount of computation. To return the response H''(ω) of the delay path to the time domain, it suffices to perform an inverse FFT of 2N/M points on each of the K elements of H''(ω) and to add the results while shifting them by N/M points each.
[0053] The concrete method of calculating the response H''(ω) of the delay path of equation (12), and the method of converting it into the time domain, are explained below in correspondence with the operation flow of FIG. 4. The first frequency analysis unit 10 converts the pre-delay signal into the frequency domain with an FFT of 2N/M points (step S10, FIG. 4). The shift amount of the signal is N/M samples, and the FFT is performed M times. The first buffer unit 20 stores the pre-delay signal converted into the frequency domain for these M repetitions (step S20). Through these processes, the pre-delay signals X'_1(ω), ..., X'_M(ω) converted into the frequency domain are stored in the first buffer unit 20.
[0054] The second frequency analysis unit 12 converts the delayed signal into a frequency-domain signal with an FFT of 2N/M points (step S12). The shift amount of the signal is N/M samples, and the FFT is performed M+K-1 times. The second buffer unit 22 stores the delayed signal converted into the frequency domain for these M+K-1 repetitions (step S22).
Through these processes, the delayed signals Y'_1(ω), ..., Y'_{M+K-1}(ω) converted into the frequency domain are stored in the second buffer unit 22.
[0055] The whitening coefficient calculation unit 40 calculates the whitening coefficient from the pre-delay signals X'_1(ω), ..., X'_M(ω) stored in the first buffer unit 20 (step S40). A more specific functional configuration example of the whitening coefficient calculation unit 40 is shown in a separate figure. The whitening coefficient calculation unit 40 includes a norm calculation means 41 and a reciprocal calculation means 42. The norm calculation means 41 calculates the square of the norm of the pre-delay signals X'_1(ω), ..., X'_M(ω), namely X'*_1(ω)X'_1(ω) + ... + X'*_M(ω)X'_M(ω). The reciprocal calculation means 42 calculates the reciprocal of the square of the norm and outputs it as the whitening coefficient.
[0056] The cross spectrum calculation unit 30 calculates the cross spectrum X''^H(ω)Y''(ω) from the pre-delay signals X'_1(ω), ..., X'_M(ω) stored in the first buffer unit 20 and the delayed signals Y'_1(ω), ..., Y'_{M+K-1}(ω) stored in the second buffer unit 22.
[0057] The method of calculating the cross spectrum is described with reference to FIGS. 6 and 7. FIG. 6 shows the calculation of the first element of the cross spectrum X''^H(ω)Y''(ω). The first element corresponds to the case where the read positions of the first buffer unit 20 and the second buffer unit 22 are both set at the beginning (step S31). The signals with the same frame number in the first buffer unit 20 and the second buffer unit 22 are multiplied for each frequency bin (step S32), and the products are summed over the frame numbers (step S33). The data in the first buffer unit 20 are conjugated before the multiplication. The frame number indicates the ordinal position of the FFT result. FIG. 7 shows the calculation of the second element of the cross spectrum X''^H(ω)Y''(ω). For the second element, the read position of the second buffer unit 22 is shifted by one frame while the read position of the first buffer unit 20 is left unchanged; the signals are multiplied for each frequency bin and the products are summed over the frame numbers. The process of shifting the read position of the second buffer unit 22 by one frame is repeated until the (M+K-1)-th frame is reached.
[0058] The whitening unit 50 multiplies the cross spectrum X''^H(ω)Y''(ω) output by the cross spectrum calculation unit 30 by the whitening coefficient 1/{X'*_1(ω)X'_1(ω) + ... + X'*_M(ω)X'_M(ω)} to calculate the response H''(ω) of the delay path (step S50).
[0059] The time domain conversion unit 60 converts each element of the response H''(ω) of the delay path into a time-domain signal with an inverse FFT of 2N/M points (step S60). The time shift addition unit 70 adds the output signals of the time domain conversion unit 60 while shifting them in time, and stores the sum (step S70). FIG. 8 shows the operation of the time shift addition unit 70. The time shift addition unit 70 shifts the first to K-th elements output by the time domain conversion unit 60 by N/M samples each and adds them, which yields the whitening cross-correlation h(t) in the time domain (the estimated impulse response of the delay path).
[0060] The processing of steps S32 to S70 is repeated while the read position of the second buffer unit 22 is advanced by one frame at a time (step S34), until it has been completed for the K frames that form the range over which the delay is estimated (NO in step S90). The calculation of the cross spectrum X''^H(ω)Y''(ω) is thus repeated until the (M+K-1)-th frame has been used (YES in step S90).
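
The steps S10 to S80 described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `dft`, `spectra` and `estimate_delay` are hypothetical, a naive DFT stands in for the 2N/M-point FFT, `eps` is an added guard against division by zero, and the upper half of each inverse transform is interpreted as negative circular lags before the shifted addition, which is one possible reading of the time shift addition.

```python
import cmath
import random

def dft(values, inverse=False):
    """Naive DFT; stands in for the FFT of 2N/M points."""
    n = len(values)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t, v in enumerate(values):
            acc += v * cmath.exp(sign * 2j * cmath.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out

def spectra(signal, hop, count):
    """Transform `count` frames of 2*hop samples taken every `hop` samples."""
    if len(signal) < (count + 1) * hop:
        raise ValueError("signal too short for the requested frames")
    return [dft(signal[i * hop:i * hop + 2 * hop]) for i in range(count)]

def estimate_delay(x, y, N, M, K, eps=1e-12):
    hop = N // M                      # shift of N/M samples
    size = 2 * hop                    # transform size 2N/M
    X = spectra(x, hop, M)            # first buffer: M frames (S10, S20)
    Y = spectra(y, hop, M + K - 1)    # second buffer: M+K-1 frames (S12, S22)
    # Whitening coefficient: reciprocal of the summed squared norm (S40).
    beta = [1.0 / (sum(abs(X[m][w]) ** 2 for m in range(M)) + eps)
            for w in range(size)]
    h = [0.0] * (K * hop)             # whitening cross-correlation h(t)
    for k in range(K):
        # k-th element of the cross spectrum: frames of Y read k frames later (S30).
        cross = [sum(X[m][w].conjugate() * Y[m + k][w] for m in range(M))
                 for w in range(size)]
        block = dft([c * b for c, b in zip(cross, beta)], inverse=True)  # S50, S60
        for t, v in enumerate(block):
            lag = t if t < hop else t - size  # assumption: upper half = negative lags
            pos = k * hop + lag               # shift by N/M per element (S70)
            if 0 <= pos < len(h):
                h[pos] += v.real
    return max(range(len(h)), key=lambda i: h[i])  # maximum peak position (S80)

if __name__ == "__main__":
    rng = random.Random(0)
    x = [rng.gauss(0.0, 1.0) for _ in range(200)]
    d = 13
    y = [0.0] * d + x[:len(x) - d]    # y is x delayed by d samples
    print(estimate_delay(x, y, N=128, M=16, K=4))
```

The search range of this sketch is K·N/M samples (32 in the usage example), so only delays inside that range can be found. Keeping only the first N/M lags of each element would be a simpler alternative to the negative-lag mapping, at the cost of not combining the contributions of neighbouring elements.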
[0061] The maximum peak position search unit 80 searches for the maximum peak position of the whitening cross-correlation h(t) in the time domain and outputs that position as the estimated delay amount (step S80).
[0062] As described above, in the delay estimation method of the present invention, the estimated delay amount is obtained with the product-sum operations of M FFTs of 2N/M points in the first frequency analysis unit 10, M+K-1 FFTs of 2N/M points in the second frequency analysis unit 12, and K inverse FFTs of 2N/M points in the time domain conversion unit 60. For example, when N = 4096, M = 128 and K = 16, each transform has 64 points and costs 64 log2 64 = 384 product-sum operations, so the count is 49152 in the first frequency analysis unit 10, 54912 in the second frequency analysis unit 12, and 6144 in the time domain conversion unit 60, for a total of 110208 product-sum operations.
[0063] The prior-art delay estimation method, on the other hand, requires two N-point FFTs and one N-point inverse FFT, so its number of product-sum operations is 3N log2 N = 147456. The delay estimation method of the present invention can thus reduce the number of product-sum operations by about 25%.
[0064] FIG. 9 shows the functional configuration of the whitening coefficient calculation unit 240 of the delay estimation apparatus 200 according to the second embodiment of the present invention. The delay estimation apparatus 200 differs from the delay estimation apparatus 100 only in the configuration of the whitening coefficient calculation unit 240, which improves its tolerance to noise.
[0065] Among the frequency components of the pre-delay signal, those with small power are easily affected by noise, which lowers the accuracy of the delay amount estimation.
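
The operation counts quoted in paragraphs [0062] and [0063] can be reproduced with the following Python sketch. The function names are hypothetical, and the cost model of n log2 n product-sum operations per n-point transform is the one used in the text; power-of-two sizes are assumed.

```python
import math

def fft_cost(points):
    """Product-sum count of one transform, modelled as points * log2(points)."""
    return points * int(math.log2(points))

def proposed_cost(N, M, K):
    size = 2 * N // M                       # 2N/M-point transforms
    first = M * fft_cost(size)              # first frequency analysis unit 10
    second = (M + K - 1) * fft_cost(size)   # second frequency analysis unit 12
    inverse = K * fft_cost(size)            # time domain conversion unit 60
    return first, second, inverse

def conventional_cost(N):
    return 3 * fft_cost(N)                  # two N-point FFTs and one inverse FFT

if __name__ == "__main__":
    parts = proposed_cost(4096, 128, 16)
    total, reference = sum(parts), conventional_cost(4096)
    print(parts, total, reference, 1 - total / reference)
```

For N = 4096, M = 128 and K = 16 the ratio works out to a saving of roughly a quarter, in line with the figure of about 25% stated above.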
The whitening coefficient calculation unit 240 improves the resistance to noise by keeping the whitening coefficient small for such low-power frequency components.
[0066] The whitening coefficient calculation unit 240 includes a norm calculation means 41, a full band power calculation means 241, a coefficient multiplication means 242, an addition means 243, and a reciprocal calculation means 42. FIG. 10 shows the operation steps of each means. As in the first embodiment, the norm calculation means 41 calculates the square of the norm of the pre-delay signals X'_1(ω), ..., X'_M(ω) converted into the frequency domain, namely X'*_1(ω)X'_1(ω) + ... + X'*_M(ω)X'_M(ω) (step S41).
[0067] The full band power calculation means 241 calculates the full-band average power by averaging the squares of the norm over the frequencies (step S241). The coefficient multiplication means 242 multiplies the output of the full band power calculation means 241 by a preset constant α (step S242). A smaller α makes the whitening stricter, and a larger α improves the resistance to noise. The constant α must therefore be set to a value that balances the effect of whitening against noise immunity.
[0068] The addition means 243 adds the output of the norm calculation means 41 and the output of the coefficient multiplication means 242 (step S243). The reciprocal calculation means 42 calculates the reciprocal of the output of the addition means 243 and outputs it as the whitening coefficient (step S42).
[0069] The whitening coefficient β(ω) calculated by the whitening coefficient calculation unit 240 can be expressed by equation (13).
[0070]
[0071] Here, W is the maximum value of the frequency to be used.
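
Steps S41 to S42 of the second embodiment amount to adding α times the full-band average power to the denominator of the whitening coefficient. The Python sketch below illustrates this; the function name `whitening_coefficients` is hypothetical, and averaging over every entry of the input list is an assumption standing in for the average over the frequencies up to W.

```python
def whitening_coefficients(power, alpha):
    """Regularised whitening coefficient per frequency bin.

    power[w] is the squared norm for bin w, i.e. the sum over frames of
    |X'_m(w)|^2 (output of the norm calculation means 41).
    """
    average = sum(power) / len(power)   # full-band average power (S241)
    offset = alpha * average            # multiplied by the constant alpha (S242)
    # Add the offset to each bin (S243) and take the reciprocal (S42).
    return [1.0 / (p + offset) for p in power]

if __name__ == "__main__":
    power = [4.0, 0.0, 4.0, 0.0]
    print(whitening_coefficients(power, alpha=0.5))
```

With α = 0 the sketch reduces to the plain reciprocal of the first embodiment and fails on a bin whose power is exactly zero, which illustrates why the added term helps at low-power frequencies.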
[0072] In equation (13), adding the full band average power to the denominator keeps the denominator from becoming extremely small even at frequencies with small power. As a result, the whitening coefficient is prevented from becoming very large for low-power frequency components, and noise immunity improves.
[0073] FIG. 11 shows an example of the functional configuration of the delay estimation apparatus 300 according to the third embodiment of the present invention. The delay estimation apparatus 300 is obtained by adding an estimation interval setting unit 301 to the configuration of the delay estimation apparatus 100 or 200.
[0074] The estimation interval setting unit 301 receives the pre-delay signal and sets a signal section suitable for delay amount estimation. Estimating the delay amount in such a section reduces estimation errors and increases estimation accuracy.
[0075] First, it will be described which signal sections are suitable for estimating the delay amount. Factors that reduce the estimation accuracy include ambient noise picked up by the microphone and reverberation in the room. To reduce these influences, it is sufficient to select a section in which the output volume from the speaker is large and as little reverberant sound as possible is picked up by the microphone.
[0076] FIG. 12 schematically shows the power of the signal picked up by the microphone. The horizontal axis in FIG. 12 is elapsed time, and the vertical axis is power. Ambient noise is picked up by the microphone at a nearly constant power level (filled in black). To find a portion where the influence of ambient noise is small, it is sufficient to detect a portion where the amplitude is large.
Therefore, a section that is less susceptible to ambient noise can be obtained by estimating the noise level and extracting the section in which the signal exceeds a threshold equal to the noise level multiplied by a coefficient.
[0077] The reverberant sound (fine broken line) arrives slightly later than the direct sound (thin solid line), which comes out of the speaker and reaches the microphone directly. A section with few reverberation components therefore exists at the rising portion of the voice. This section can be found by estimating the level of the reverberant sound from the amplitude of the pre-delay signal and taking the section whose level is larger than that estimated level.
[0078] A section with few reverberation components that is also less susceptible to ambient noise, that is, a section with high delay estimation accuracy, is a section in which the power of the picked-up signal is larger than both the noise level and the power of the reverberant signal. The estimation interval setting unit 301 sets this section.
[0079] FIG. 13 shows an example of the functional configuration of the estimation interval setting unit 301. The operation flow is shown in FIG. The estimation interval setting unit 301 includes a noise level estimation unit 3010, a reverberation level estimation unit 3011, a threshold setting unit 3012, and a section detection unit 3013.
[0080] The noise level estimation unit 3010 receives the pre-delay signal and estimates the noise level N(t) by dip-holding the level P(t) of the pre-delay signal (step S3010). This is done, for example, using equation (14).
[0081]
[0082] If the past estimated noise level N(t-1) is larger than the level P(t) of the pre-delay signal, the level of the pre-delay signal is substituted for the estimated noise level.
If the past estimated noise level N(t-1) is smaller than the level P(t) of the pre-delay signal, the past estimated noise level N(t-1) is multiplied by a constant u of 1 or more to raise the noise level. Here, u is a preset rising coefficient of the estimated noise level; the closer u is to 1, the more gently the estimate rises, which gives the dip-hold effect.
[0083] The reverberation level estimation unit 3011 receives the pre-delay signal and estimates the reverberation sound level R(t) by peak-holding the level P(t) of the pre-delay signal (step S3011, equation (15)).
[0084]
[0085] If the past estimated reverberation sound level R(t-1) is larger than the level P(t) of the pre-delay signal, the estimated reverberation sound level R(t-1) is multiplied by a constant v of less than 1 to lower the reverberation sound level. If the past estimated reverberation sound level R(t-1) is smaller than the level P(t) of the pre-delay signal, the level of the pre-delay signal is substituted for the reverberation sound level. Here, v is an attenuation coefficient of the estimated reverberation level and is set to a value corresponding to the reverberation time.
[0086] The threshold setting unit 3012 compares the estimated noise level N(t) multiplied by a constant of 1 or more with the estimated reverberation sound level R(t) multiplied by a constant of less than 1, and sets the threshold based on that comparison (step S3012). The section detection unit 3013 compares the threshold with the amplitude of the pre-delay signal and detects a section in which the amplitude of the pre-delay signal is larger than the threshold as a section for delay estimation (step S3013).
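Steps S3010 to S3013 can be illustrated with the following Python sketch. It follows the verbal rules in [0082], [0085], and [0086] rather than equations (14) and (15) themselves. Several details are assumptions made for illustration: the function and parameter names, the default constants, initialising both trackers with the first level, treating the equal case like the "smaller" case, and taking the larger of the two scaled levels as the threshold (chosen because [0078] requires the signal to exceed both levels).

```python
def detect_estimation_sections(levels, u=1.01, v=0.99,
                               noise_gain=2.0, reverb_gain=0.5):
    """Flag the samples that lie in a section suitable for delay estimation.

    levels:      sequence of positive levels P(t) of the pre-delay signal.
    u:           rising coefficient of the noise estimate (1 or more).
    v:           attenuation coefficient of the reverberation estimate (< 1).
    noise_gain:  hypothetical constant (1 or more) applied to N(t).
    reverb_gain: hypothetical constant (< 1) applied to R(t).
    """
    noise = reverb = levels[0]  # assumed initial values for N and R
    flags = []
    for p in levels:
        # Step S3010, dip hold: follow dips at once, rise slowly otherwise.
        noise = p if noise > p else noise * u

        # Step S3011, peak hold: follow peaks at once, decay slowly otherwise.
        reverb = reverb * v if reverb > p else p

        # Step S3012: assumed rule, the larger of the two scaled levels.
        threshold = max(noise_gain * noise, reverb_gain * reverb)

        # Step S3013: the sample belongs to an estimation section
        # when its level exceeds the threshold.
        flags.append(p > threshold)
    return flags
```

Because the peak-hold estimate jumps to the signal level at an onset and then decays, the flag tends to be set at the rising portion of the voice and to drop again once the level falls below the scaled, decaying estimate, which matches the behaviour described for FIG. 12.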
[0087] The delay estimation apparatuses 100 and 200 perform delay estimation on the section output from the estimation interval setting unit 301. Because the delay estimation apparatus 300 thus estimates the delay from a portion of the pre-delay signal suitable for delay estimation, it can estimate the delay with high accuracy without being affected by ambient noise and reverberation.
[Evaluation Experiment] An evaluation experiment was conducted to confirm the effect of the delay estimation method of the present invention. FIG. 15 shows the result of delay estimation with 4096 FFT points in the conventional delay estimation apparatus. The horizontal axis is time (ms), and the vertical axis is the delay estimation result (ms). The actual delay is 65 ms; many delay estimation results (+) are concentrated around 65 ms, but there are also many estimation errors.
[0088] FIG. 16 shows the result of delay estimation by the conventional delay estimation method with 1024 FFT points. The delay estimation results (+) are distributed uniformly, showing that estimation fails entirely. The cause is that the signal length used for the correlation calculation is too short relative to the reverberation time. Thus, the conventional delay estimation method needs a long FFT, and hence many operations, to estimate accurately.
[0089] FIG. 17 shows the result of delay estimation by the method described in the third embodiment of the present invention, with the correlation calculated over 35 frames of 128-point FFTs. All delay estimation results are around the actual delay of 65 ms, showing that the delay is estimated correctly.
[0090] FIG. 18 shows the result of comparing the accuracy rate of delay estimation with the prior art.
The vertical axis represents the accuracy rate, and the horizontal axis represents the delay estimation method. From the left, the methods are the conventional method with 4096 FFT points, the conventional method with 1024 FFT points, and then the first, second, and third embodiments, each using 35 frames of 128-point FFTs.
[0091] Even the delay estimation method described in the first embodiment of the present invention achieves an accuracy rate of 69%, exceeding the 65% of the prior-art 4096-point FFT. The delay estimation method of the second embodiment achieves 75%, and that of the third embodiment about 100%. As described above, the delay estimation method of the present invention can estimate the delay accurately even when the number of data points for frequency analysis is small.
[0092] [Application Example] The delay estimation apparatuses 100, 200, and 300 of the present invention can be used in an echo cancellation apparatus. FIG. 19 shows an example of the functional configuration of an echo cancellation apparatus 400 using the delay estimation apparatus of the present invention. The echo cancellation apparatus 400 includes a delay estimation apparatus 100, a delay insertion unit 952, a frequency domain adaptive filter 401, and an addition circuit 902. The delay estimation apparatus 100 may be replaced by the delay estimation apparatus 200 or 300 described above.
[0093] The delay estimation apparatus 100 receives as inputs a pre-delay signal in the time domain, which is the reception signal, and a post-delay signal in the time domain, which is the transmission signal, and obtains the estimated delay amount between the two signals.
The delay insertion unit 952 delays, by the estimated delay amount obtained by the delay estimation apparatus, the pre-delay signal that has been converted into a frequency domain signal by the frequency analysis of 2N/M points, fewer than the N sample points, in the first frequency analysis unit 10 inside the delay estimation apparatus 100.
[0094] The frequency domain adaptive filter 401 receives the pre-delay signal into which the delay has been inserted by the delay insertion unit 952 and generates a pseudo echo signal. The addition circuit 902 removes the pseudo echo signal from the post-delay signal that has been converted into a frequency domain signal by the frequency analysis of 2N/M points in the second frequency analysis unit 12 inside the delay estimation apparatus 100. The frequency domain post-delay signal from which the pseudo echo signal has been removed is converted into a transmission signal in the time domain by performing the 2N/M-point inverse FFT K times in the 2N/M-point IFFT unit 402.
[0095] The echo cancellation apparatus 400 is characterized in that the frequency analysis unit for the echo canceller and the frequency analysis unit for delay estimation are shared. In an echo cancellation apparatus using the conventional delay estimation apparatus, two N-point FFTs and one N-point inverse FFT are required. The number of product-sum operations for this portion is 147456.
[0096] On the other hand, in the echo cancellation apparatus 400 of the present invention, the only computation required for signal conversion is K inverse FFTs of 2N/M points. Under the conditions N = 4096, M = 64, and K = 16, the FFT computation added in the echo cancellation apparatus 400 of the present invention amounts to 14336 product-sum operations.
As described above, the echo cancellation apparatus 400 of the present invention can reduce the amount of calculation for signal conversion to the frequency domain to about 1/10.
[0097] When the processing units in the above apparatuses are realized by a computer, the processing content of the functions that each apparatus should have is described by a program. By executing this program on the computer, the processing units in each apparatus are realized on the computer.
[0098] The program describing the processing content can be recorded on a computer-readable recording medium. Any computer-readable recording medium may be used, such as a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk drive, a flexible disk, or a magnetic tape can be used as the magnetic recording device; a DVD (Digital Versatile Disc), a DVD-RAM (Random Access Memory), a CD-ROM (Compact Disc Read Only Memory), or a CD-R (Recordable) / RW (Rewritable) as the optical disc; an MO (Magneto-Optical disc) as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable Read Only Memory) as the semiconductor memory.
[0099] The program is distributed, for example, by selling, transferring, or lending a portable recording medium, such as a DVD or a CD-ROM, on which the program is recorded. The program may also be stored in a storage device of a server computer and distributed by transferring it from the server computer to another computer via a network.
[0100] Each unit may be configured by executing a predetermined program on a computer, or at least part of the processing content may be realized in hardware.
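The operation counts quoted in paragraphs [0062], [0063], [0095], and [0096] can be reproduced with a short Python check. The sketch assumes a cost of P log2 P product-sum operations for a P-point FFT or inverse FFT (the model implied by the quoted figure 3N log2 N), and it assumes that the first frequency analysis unit performs M transforms and the second M + K - 1, which is what the quoted figures 49152 and 54912 imply. The function names are hypothetical.

```python
from math import log2


def fft_ops(points):
    """Product-sum operations for one FFT (assumed model: P * log2 P)."""
    return int(points * log2(points))


def proposed_ops(n, m, k):
    """Operation counts per unit for the short-frame delay estimation."""
    points = 2 * n // m                       # frame length 2N/M
    first = m * fft_ops(points)               # first frequency analysis unit
    second = (m + k - 1) * fft_ops(points)    # second frequency analysis unit
    inverse = k * fft_ops(points)             # time domain transform unit
    return first, second, inverse


def conventional_ops(n):
    """Two N-point FFTs plus one N-point inverse FFT (prior art)."""
    return 3 * fft_ops(n)


def echo_canceller_extra_ops(n, m, k):
    """K inverse FFTs of 2N/M points added by the echo cancellation apparatus."""
    return k * fft_ops(2 * n // m)
```

For N = 4096, M = 128, K = 16 the three counts sum to 110208 against 147456 for the conventional method, a reduction of about 25%. For N = 4096, M = 64, K = 16 the added cost in the echo cancellation apparatus is 14336, roughly one tenth of 147456.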
