Patent Translate Powered by EPO and Google

Notice: This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate, complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or financial decisions, should not be based on machine-translation output.

DESCRIPTION JP2018032931

Abstract: The present invention provides a sensitivity calibration gain calculation method that can accurately calculate a sensitivity calibration gain even in the presence of an interfering sound, and that allows threshold values to be set more easily. An acoustic signal processing apparatus 10 converts a plurality of input signals s1(n) and s2(n), obtained from a plurality of microphones m_1 and m_2, from the time domain into a plurality of frequency domain signals. The apparatus comprises a front suppression signal generation unit 12 that generates a front suppression signal having a blind spot toward the front, a coherence calculation unit 13 that calculates coherence based on signals obtained from the plurality of input signals, a feature amount calculation unit 14 that calculates a feature amount representing the relationship between the front suppression signal and the coherence, a calibration gain calculation unit 15 that detects, based on the feature amount, a target sound section unaffected by the interfering sound and calculates a calibration gain for each input signal in that section, and calibration gain multiplication units 16 and 17 that calibrate the corresponding input signals with the respective calibration gains. [Selected figure] Figure 1

Acoustic signal processing apparatus, program and method

[0001] The present invention relates to an acoustic signal processing apparatus, program, and method, and can be applied, for example, to acoustic signal processing in communication devices or communication software used for telephones, video phones, and the like, or to preprocessing for speech recognition processing.
03-05-2019 1

[0002] BACKGROUND In recent years, devices equipped with various voice processing functions, such as the voice call and voice recognition functions of smartphones and car navigation systems, have become widespread. With the spread of these devices, however, voice processing functions are being used in harsher noise environments than before, such as on busy streets or in moving cars. There is therefore a growing demand for signal processing technology that can maintain call quality and speech recognition performance even in noisy environments.

[0003] JP 2014-68052 A

[0004] Kazuyuki Hiraoka and Gen Hori, "Probability and Statistics for Programming", Ohmsha, Ltd., October 23, 2009, pp. 178-179

[0005] Acoustic signal processing technology using multi-channel microphones has recently been put to practical use, but even microphones of the same model number differ in sensitivity, and accurate acoustic feature values cannot be calculated unless this sensitivity difference is calibrated. Until now this has been addressed either by measuring the sensitivity of each microphone in advance and setting a correction gain according to the sensitivity difference, or by comparing the input level of each channel and automatically setting a correction gain that matches the levels to their average. However, the former is time-consuming, and the latter cancels not only the sensitivity difference of the microphones but also genuine differences between the acquired input signals, so the accuracy of the acoustic feature values calculated in later stages cannot be ensured.

[0006] One way to solve this problem is to calculate the calibration gain by comparing input levels only in sections of the input signals that contain signal components arriving from the front of the microphones.
This is based on the premise that, for a signal arriving from the front, the distance between each microphone and the sound source is the same, so the acoustic difference between the signal components reaching the microphones is small, and any characteristic difference between them can be expected to be due only to the microphone sensitivities. One solution based on this premise is the method described in Patent Document 1. It focuses on the fact that the magnitude of the coherence feature fluctuates depending on whether the voice of the target speaker arrives from the front, and calculates the microphone sensitivity difference calibration gain in signal sections where the voice arrives from the front. Even if there is a difference in microphone sensitivity, the behavior in which the coherence magnitude fluctuates depending on whether the voice arrives from the front is preserved, so the sensitivity difference can be calibrated by this method. (Supplement: for the calculation of the coherence, refer to Equation 7 of Patent Document 1.)

[0007] However, with the method of Patent Document 1, the coherence also takes a large value when the target voice arriving from the front of the microphone array is received simultaneously with the voice of another speaker arriving from the left or right (an interfering sound), so speech components not arriving from the front are also reflected in the calibration gain. In addition, since the sensitivity difference differs randomly from one microphone array to another, it is difficult to optimize the threshold for detecting signal sections arriving from the front, and target voice sections may be erroneously determined.
[0008] Accordingly, to improve on the above two problems, there is a need for a sensitivity calibration gain calculation method that can accurately calculate the sensitivity calibration gain even in the presence of an interfering sound and that allows the threshold to be set more easily.

[0009] In order to solve the above problems, an acoustic signal processing apparatus according to a first aspect of the present invention comprises: (1) a front suppression signal generation unit that generates a front suppression signal having a blind spot toward the front, based on the difference between a plurality of frequency domain signals obtained by converting a plurality of input signals, obtained from each of a plurality of microphones, from the time domain to the frequency domain; (2) a coherence calculation unit that calculates coherence based on signals obtained from the plurality of input signals; (3) a feature amount calculation unit that calculates a feature amount representing the relationship between the front suppression signal and the coherence; (4) a calibration gain calculation unit that detects, based on the feature amount, a target sound section unaffected by the interfering sound and calculates a calibration gain for each input signal in that section; and (5) a calibration unit that calibrates each corresponding input signal with the respective calibration gain.

[0010] An acoustic signal processing program according to a second aspect of the present invention causes a computer to function as: (1) a front suppression signal generation unit that generates a front suppression signal having a blind spot toward the front, based on the difference between a plurality of frequency domain signals obtained by converting a plurality of input signals, obtained from each of a plurality of microphones, from the time domain to the frequency domain; (2) a coherence calculation unit that calculates coherence based on signals obtained from the plurality of input signals; (3) a feature amount calculation unit that calculates a feature amount representing the relationship between the front suppression signal and the coherence; (4) a calibration gain calculation unit that detects, based on the feature amount, a target sound section unaffected by the interfering sound and calculates a calibration gain for each input signal in that section; and (5) a calibration unit that calibrates each corresponding input signal with the respective calibration gain.

[0011] According to a third aspect of the present invention, there is provided an acoustic signal processing method in which: (1) a front suppression signal generation unit generates a front suppression signal having a blind spot toward the front, based on the difference between a plurality of frequency domain signals obtained by converting a plurality of input signals, obtained from each of a plurality of microphones, from the time domain to the frequency domain; (2) a coherence calculation unit calculates coherence based on signals obtained from the plurality of input signals; (3) a feature amount calculation unit calculates a feature amount representing the relationship between the front suppression signal and the coherence; (4) a calibration gain calculation unit detects, based on the feature amount, a target sound section unaffected by the interfering sound and calculates a calibration gain for each input signal in that section; and (5) a calibration unit calibrates each corresponding input signal with the respective calibration gain.

[0012] According to the present invention, the sensitivity calibration gain can be accurately calculated even in the presence of an interfering sound, and the threshold can be set more easily.
[0013] BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a block diagram showing the overall configuration of an acoustic signal processing apparatus according to the embodiment. FIG. 2 is an explanatory diagram showing the directivity characteristic formed by the front suppression signal generation unit according to the embodiment. FIG. 3 is a block diagram showing the configuration of the correlation calculation unit according to the embodiment. FIG. 4 is a block diagram showing the configuration of the calibration gain calculation unit according to the embodiment. FIG. 5 is a flowchart showing the processing operation of the calibration gain calculation unit according to the embodiment.

[0014] (A) Main Embodiment In the following, embodiments of an acoustic signal processing apparatus, program and method according to the present invention will be described in detail with reference to the drawings.

[0015] (A-1) Configuration of Embodiment FIG. 1 is a block diagram showing the overall configuration of an acoustic signal processing apparatus 10 according to this embodiment.

[0016] In FIG. 1, the acoustic signal processing apparatus 10 includes a plurality of microphones m_1 and m_2 (two are illustrated in FIG. 1), an FFT unit 11, a front suppression signal generation unit 12, a coherence calculation unit 13, a correlation calculation unit 14, a calibration gain calculation unit 15, a first calibration gain multiplication unit 16, and a second calibration gain multiplication unit 17.

[0017] The "feature amount calculation unit" recited in the claims corresponds to the correlation calculation unit 14, the "calibration gain calculation unit" to the calibration gain calculation unit 15, and the "calibration unit" to the first calibration gain multiplication unit 16 and the second calibration gain multiplication unit 17.

[0018] In the acoustic signal processing apparatus 10 illustrated in FIG. 1, the components other than the microphones m_1 and m_2 can be realized as software (an acoustic signal processing program) executed by a CPU, and the functions of the acoustic signal processing program can likewise be represented by FIG. 1.

[0019] The microphones m_1 and m_2 are disposed a predetermined (or arbitrary) distance apart, and each captures the surrounding sound. Each acoustic signal captured by the microphones m_1 and m_2 is converted by an analog-to-digital (A/D) converter (not shown), and the resulting input signals s1(n) and s2(n) are supplied to the FFT unit 11, the calibration gain calculation unit 15, the first calibration gain multiplication unit 16 and the second calibration gain multiplication unit 17. Here, n is an index representing the input order of samples and is expressed as a positive integer; the smaller the value of n, the older the input sample, and the larger the value of n, the newer the input sample.

[0020] The FFT unit 11 receives the input signals s1(n) and s2(n) from the microphones m_1 and m_2 and performs a fast Fourier transform (or discrete Fourier transform) on them, thereby converting the input signals s1(n) and s2(n) from the time domain to the frequency domain. When performing the fast Fourier transform, the FFT unit 11 constructs analysis frames FRAME1(K) and FRAME2(K) of N samples (N being an arbitrary integer) from the input signals s1(n) and s2(n).

[0021] Equation (1) illustrates how FRAME1 is constructed from the input signal s1. In equation (1), K is an index indicating the order of frames and is expressed as a positive integer; the smaller the value of K, the older the analysis frame, and the larger the value of K, the newer the analysis frame.
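The framing and transform performed by the FFT unit 11 can be sketched as follows. This is a minimal illustration in Python with NumPy; since equation (1) is not reproduced in the translation, the convention of non-overlapping N-sample frames is an assumption, and the function names are illustrative, not from the patent.

```python
import numpy as np

def analysis_frame(s, K, N):
    """Extract the K-th analysis frame of N samples from input signal s
    (a sketch of equation (1); the non-overlapping framing is an assumption)."""
    return s[K * N:(K + 1) * N]

def to_frequency_domain(frame):
    """Convert a time-domain analysis frame into the frequency domain signal
    X(f, K), as done by the FFT unit 11."""
    return np.fft.fft(frame)

# Example: a synthetic 16-sample input signal, frame index K = 0, N = 8.
s1 = np.arange(16, dtype=float)
X1 = to_frequency_domain(analysis_frame(s1, K=0, N=8))
assert X1.shape == (8,)               # one spectral component per bin f1..fm
assert np.isclose(X1[0].real, 28.0)   # DC bin = sum of the frame's samples
```

Each bin of `X1` is a complex number with a real and an imaginary part, matching the description of equation (2) below.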
In the following description of the operation, it is assumed that the index K represents the latest analysis frame to be analyzed, unless otherwise noted.

[0022] The FFT unit 11 performs fast Fourier transform processing for each analysis frame, obtaining a frequency domain signal X1(f, K) by Fourier-transforming the analysis frame FRAME1(K) constructed from the input signal s1, and a frequency domain signal X2(f, K) by Fourier-transforming the analysis frame FRAME2(K) constructed from the input signal s2. The FFT unit 11 supplies the frequency domain signals X1(f, K) and X2(f, K) to the front suppression signal generation unit 12 and to the coherence calculation unit 13.

[0023] Here, f is an index representing a frequency. The frequency domain signals X1(f, K) and X2(f, K) are not single values but consist of the spectral components of m frequencies f1 to fm (m being an arbitrary integer), as shown in equation (2).

[0024] In equation (2), X1(f, K) is a complex number composed of a real part and an imaginary part. The same applies to X2(f, K) and to the front suppression signal N(f, K) appearing in the description of the front suppression signal generation unit 12 below.

[0025] The front suppression signal generation unit 12 processes the signals from the FFT unit 11 so as to suppress, for each frequency component, the signal components arriving from the front direction. In other words, the front suppression signal generation unit 12 functions as a directional filter that suppresses components in the front direction.

[0026] For example, as illustrated in FIG. 2, the front suppression signal generation unit 12 forms a figure-eight bidirectional filter having a blind spot in the front direction, and uses it to suppress the front-direction component of the signals from the FFT unit 11.
[0027] Specifically, the front suppression signal generation unit 12 performs the calculation shown in equation (3) on the signals X1(f, K) and X2(f, K) from the FFT unit 11 to generate a front suppression signal N(f, K) for each frequency component. The calculation of equation (3) corresponds to forming a figure-eight bidirectional filter having a blind spot in the front direction, as shown in FIG. 2.

N(f, K) = X1(f, K) - X2(f, K) ... (3)

[0028] As described above, the front suppression signal generation unit 12 obtains each frequency component of the frequencies f1 to fm (the power of one frame in each frequency band).

[0029] Further, the front suppression signal generation unit 12 calculates an average front suppression signal AVE_N(K) by averaging the front suppression signal N(f, K) over all the frequencies f1 to fm according to equation (4).

[0030] The coherence calculation unit 13 calculates the coherence COH(K) by forming signals with strong directivity in specific directions from the frequency domain signals X1(f, K) and X2(f, K) supplied by the FFT unit 11.

[0031] The process by which the coherence calculation unit 13 calculates the coherence COH(K) is as follows.

[0032] The coherence calculation unit 13 generates, from the frequency domain signals X1(f, K) and X2(f, K), a signal B1(f, K) filtered with strong directivity in a first direction (for example, the left direction), and a signal B2(f, K) filtered with strong directivity in a second direction (for example, the right direction). Any existing method can be applied to form the signals B1(f) and B2(f) having strong directivity in a specific direction.
Here, as an example, the signal B1 with strong directivity in the first direction is formed by applying equation (5), and the signal B2 with strong directivity in the second direction is formed by applying equation (6).

[0033] In equations (5) and (6), S represents the sampling frequency, N the FFT analysis frame length, τ the difference in sound wave arrival time between the microphones m_1 and m_2, i the imaginary unit, and f the frequency.

[0034] Next, the coherence calculation unit 13 applies equations (7) and (8) to the signals B1(f) and B2(f) obtained as described above to obtain the coherence COH(K). Here, B2(f, K)<*> in equation (7) denotes the complex conjugate of B2(f, K).

[0035] coef(f, K) is the coherence of the component at an arbitrary frequency f (any of the frequencies f1 to fm) of the frame with index K constituting the analysis frames FRAME1(K) and FRAME2(K).

[0036] When determining coef(f, K), as long as the directivity of the signal B1(f) and the directivity of the signal B2(f) differ from each other, the directivity directions of the signals B1(f) and B2(f) may be any directions other than the front direction. The method of calculating coef(f, K) is also not limited to the calculation above; for example, the calculation method described in Patent Document 1 can be applied.

[0037] The correlation calculation unit 14 acquires the average front suppression signal AVE_N(K) from the front suppression signal generation unit 12 and the coherence COH(K) from the coherence calculation unit 13, and calculates the correlation coefficient cor(K) between the average front suppression signal AVE_N(K) and the coherence COH(K).
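The front suppression signal of equations (3) and (4) and the coherence of equations (5) to (8) can be sketched as follows. Note that equations (4) to (8) are not reproduced in the translation, so the magnitude averaging in `average_front_suppression`, the delay-based steering terms in `coherence`, and the per-bin normalization of coef(f, K) are all assumptions modeled on common forms of these operations, not the patent's exact formulas.

```python
import numpy as np

def front_suppression(X1, X2):
    """Equation (3): a figure-eight filter with a null toward the front,
    N(f, K) = X1(f, K) - X2(f, K)."""
    return X1 - X2

def average_front_suppression(Nf):
    """Equation (4): average the front suppression signal over f1..fm.
    Averaging the magnitude |N(f, K)| is an assumption."""
    return np.mean(np.abs(Nf))

def coherence(X1, X2, tau, S, Nfft):
    """A sketch of equations (5)-(8): steer B1 and B2 toward opposite sides
    with a delay term (assumed form), compute a per-bin coefficient
    coef(f, K) using B2's complex conjugate as in equation (7), and average
    over all bins as in equation (8)."""
    f = np.arange(len(X1))
    steer = np.exp(-2j * np.pi * f * tau * S / Nfft)  # bin frequency = f*S/Nfft
    B1 = X1 + X2 * steer        # strong directivity toward one side (assumed)
    B2 = X1 * steer + X2        # strong directivity toward the other side
    coef = np.real(B1 * np.conj(B2)) / (
        0.5 * (np.abs(B1) ** 2 + np.abs(B2) ** 2) + 1e-12)
    return np.mean(coef)        # COH(K)

# Identical spectra, as for a source exactly in front: the front suppression
# signal vanishes and the coherence is maximal.
X = np.exp(1j * np.arange(8))   # illustrative spectrum with nonzero bins
assert average_front_suppression(front_suppression(X, X)) == 0.0
assert np.isclose(coherence(X, X, tau=0.0, S=8000, Nfft=16), 1.0)
```

The example illustrates the behavior the following paragraphs rely on: a frontal source drives N(f, K) toward zero while COH(K) stays large.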
[0038] The significance of the correlation calculation unit 14 calculating the correlation coefficient between the front suppression signal (average front suppression signal), which has directivity in directions other than the front, and the coherence is as follows.

[0039] Assume that a sound source emitting the target sound is present in the front direction of the microphones m_1 and m_2, and that the interfering sound arrives from a direction other than the front (for example, from the left or right of the microphones m_1 and m_2).

[0040] For example, when there is no interfering sound and the target sound is present, the front suppression signal has a value proportional to the magnitude of the target sound component; however, as shown in FIG. 2, the gain in the front direction is smaller than the gain in the lateral directions, so this value is smaller than when an interfering sound is present.

[0041] The coherence COH(K) is a feature deeply related to the arrival direction of the input signal, and can also be interpreted as the correlation between two signal components: equation (7) calculates the correlation for a given frequency component, and equation (8) calculates the average of the correlation values over all frequency components. Therefore, when the coherence COH(K) is small, the correlation between the two signal components is small, and conversely, when the coherence COH(K) is large, the correlation between the two signal components is large. An input signal for which the coherence COH(K) is small is one whose arrival direction is strongly biased to either the left or the right, i.e. a signal arriving from a direction other than the front.
Conversely, an input signal for which the coherence COH(K) is large has little bias in its arrival direction and can be regarded as a signal that has arrived from the front.

[0042] Accordingly, the coherence COH(K) takes a large value when there is no interfering sound and the target sound is present, and a small value when an interfering sound is present together with the target sound.

[0043] Organizing the above behavior by the presence or absence of the interfering sound gives the following relationships: when there is no interfering sound and the target sound is present, the coherence COH(K) takes a large value and the front suppression signal takes a value proportional to the magnitude of the target sound component; when there is an interfering sound, the coherence COH(K) takes a small value and the front suppression signal takes a large value.

[0044] Given this behavior, if the correlation coefficient between the front suppression signal and the coherence COH(K) is introduced, the following can be said: when there is no interfering sound, the correlation coefficient is positive; when there is an interfering sound, the correlation coefficient is negative. Therefore, the presence or absence of the interfering sound can be determined merely by observing the sign of the correlation coefficient between the front suppression signal and the coherence. Using this behavior, when the correlation coefficient between the front suppression signal and the coherence is positive, the section can be judged to contain only the target sound from the front direction, and the calibration gain for the sensitivity difference of the microphones m_1 and m_2 can therefore be calculated without being affected by the interfering sound.
In addition, since the target voice section can be detected merely by observing the sign of the correlation coefficient, threshold setting is easier than in the prior art.

[0045] The process by which the correlation calculation unit 14 calculates the correlation coefficient between the front suppression signal and the coherence will now be described in detail with reference to the drawings.

[0046] FIG. 3 is a block diagram showing the configuration of the correlation calculation unit 14 according to the embodiment.

[0047] In FIG. 3, the correlation calculation unit 14 includes a front suppression signal / coherence acquisition unit 31, a correlation coefficient calculation unit 32, and a correlation coefficient output unit 33.

[0048] The front suppression signal / coherence acquisition unit 31 acquires the average front suppression signal AVE_N(K) and the coherence COH(K); the correlation coefficient calculation unit 32 calculates the correlation coefficient cor(K) between the average front suppression signal AVE_N(K) and the coherence COH(K); and the correlation coefficient output unit 33 outputs the calculated correlation coefficient cor(K) to the calibration gain calculation unit 15.

[0049] The method of calculating the correlation coefficient cor(K) is not limited; for example, the calculation method described in Non-Patent Document 1 can be applied. For example, the correlation coefficient cor(K) is determined for each frame using equation (9), in which Cov[AVE_N(K), COH(K)] denotes the covariance of the average front suppression signal AVE_N(K) and the coherence COH(K), σAVE_N(K) denotes the standard deviation of the average front suppression signal AVE_N(K), and σCOH(K) denotes the standard deviation of the coherence COH(K).
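Equation (9) can be sketched as follows. This is a minimal illustration; how many past frames enter the covariance and standard deviation statistics is not stated in the translation, so the use of an explicit history window here is an assumption.

```python
import numpy as np

def correlation_coefficient(ave_n_hist, coh_hist):
    """Equation (9): cor(K) = Cov[AVE_N(K), COH(K)] / (sigma_AVE_N * sigma_COH),
    computed over a history of recent frames (window length is an assumption)."""
    ave_n = np.asarray(ave_n_hist, dtype=float)
    coh = np.asarray(coh_hist, dtype=float)
    cov = np.mean((ave_n - ave_n.mean()) * (coh - coh.mean()))
    denom = ave_n.std() * coh.std()
    return cov / denom if denom > 0 else 0.0

# When the front-suppression level rises while the coherence falls (the
# interfering-sound behavior of [0043]), the coefficient is negative; when
# both move together it is positive.
assert np.isclose(correlation_coefficient([1, 2, 3], [3, 2, 1]), -1.0)
assert np.isclose(correlation_coefficient([1, 2, 3], [1, 2, 3]), 1.0)
```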
The correlation coefficient cor(K) obtained in this way takes a value from -1.0 to 1.0.

[0050] The calibration gain calculation unit 15 acquires the correlation coefficient cor(K) from the correlation calculation unit 14, observes its sign, and calculates the calibration gains of the microphones m_1 and m_2 using only the input signal in sections where the correlation coefficient cor(K) is positive.

[0051] FIG. 4 is a block diagram showing the configuration of the calibration gain calculation unit 15 according to the embodiment.

[0052] In FIG. 4, the calibration gain calculation unit 15 includes a correlation coefficient and input signal acquisition unit 41, a calibration gain calculation execution determination unit 42, a calibration gain calculation unit 43, a calibration gain storage unit 44, and a calibration gain output unit 45.

[0053] The correlation coefficient and input signal acquisition unit 41 acquires the correlation coefficient cor(K) from the correlation calculation unit 14, and acquires the analysis frames FRAME1(K) and FRAME2(K) of the input signals.

[0054] The calibration gain calculation execution determination unit 42 determines whether the value of the correlation coefficient cor(K) is positive or negative in order to decide whether to calculate the calibration gain. When the value of the correlation coefficient cor(K) is positive, the calibration gain calculation execution determination unit 42 determines that the input signal is in a target sound section containing no interfering sound, and that it is a section in which the calibration gain calculation should be executed.
On the other hand, when the value of the correlation coefficient cor(K) is negative, the calibration gain calculation execution determination unit 42 determines that the input signal is in a section containing an interfering sound, and that it is a section in which the calibration gain calculation should not be executed.

[0055] The calibration gain calculation unit 43 calculates the calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH for the sensitivity difference between the microphones m_1 and m_2 according to the determination result of the calibration gain calculation execution determination unit 42.

[0056] When the calibration gain calculation execution determination unit 42 determines that the correlation coefficient cor(K) is positive, the calibration gain calculation unit 43 calculates the calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH. On the other hand, when the determination unit 42 determines that the correlation coefficient cor(K) is negative, the calibration gain calculation unit 43 does not calculate new calibration gains, and the values stored in the calibration gain storage unit 44 are used as the calibration gains.

[0057] The method by which the calibration gain calculation unit 43 calculates the calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH is as follows.

[0058] The calibration gain calculation unit 43 uses equations (10,1), (10,2), (11), (12,1) and (12,2) to calculate the calibration gain CALIB_GAIN_1CH for the input signal s1 and the calibration gain CALIB_GAIN_2CH for the input signal s2.

[0059] Equation (10,1) calculates the average LEVEL_1CH of the absolute values of all the components of the current frame (the K-th frame) of the input signal s1(n) captured by the microphone m_1; the value LEVEL_1CH can be regarded as reflecting the sensitivity of the microphone m_1.
Likewise, equation (10,2) calculates the average LEVEL_2CH of the absolute values of all the components of the current frame (the K-th frame) of the input signal s2(n) captured by the microphone m_2; the value LEVEL_2CH can be regarded as reflecting the sensitivity of the microphone m_2.

[0060] Note that, for example, the total of the absolute values of the components of each frame over a predetermined number of frames may instead be used as the values LEVEL_1CH and LEVEL_2CH reflecting the microphone sensitivities. Alternatively, for example, the average of the absolute values of all the elements (signal components) constituting the latest P (P ≤ K) frames for which the correlation coefficient cor(K) is positive may be used as the values LEVEL_1CH and LEVEL_2CH. In the latter case, by storing the sums of the absolute values of the components of the latest P−1 frames for which the correlation coefficient cor(K) was positive, the values LEVEL_1CH and LEVEL_2CH can be calculated easily once the information of FRAME1(K) and FRAME2(K) of the current frame (the K-th frame) is given. By calculating the average or the total of the absolute values of the signal components over a long period in this way, values reflecting the microphone sensitivities can be calculated while suppressing the influence of instantaneous fluctuations of the input signals.

[0061] Equations (10,1) and (10,2) are one example of formulas for calculating values reflecting the microphone sensitivities, and various other formulas can be applied as described above. It is required, however, that the formula for the value LEVEL_1CH reflecting the sensitivity of the microphone m_1 and the formula for the value LEVEL_2CH reflecting the sensitivity of the microphone m_2 be the same.
[0062] Equation (11) calculates the average AVE_LEVEL of the sensitivities LEVEL_1CH and LEVEL_2CH of the two microphones m_1 and m_2 as the target sensitivity of the microphones after calibration. The larger or the smaller of the two sensitivities LEVEL_1CH and LEVEL_2CH may instead be set as the target sensitivity.

[0063] Equation (12,1), as can be understood by transposing the denominator LEVEL_1CH on its right side to the left side, defines the calibration gain CALIB_GAIN_1CH so that the sensitivity LEVEL_1CH of the microphone m_1 multiplied by the calibration gain CALIB_GAIN_1CH equals the target sensitivity AVE_LEVEL. Similarly, equation (12,2), as can be understood by transposing the denominator LEVEL_2CH on its right side to the left side, defines the calibration gain CALIB_GAIN_2CH so that the sensitivity LEVEL_2CH of the microphone m_2 multiplied by the calibration gain CALIB_GAIN_2CH equals the target sensitivity AVE_LEVEL.

[0064] The calibration gain storage unit 44 stores the calibration gains CALIB_GAIN_1CH (= INIT_GAIN_1CH) and CALIB_GAIN_2CH (= INIT_GAIN_2CH) that are applied when the calibration gain calculation unit 43 does not calculate new calibration gains. As these calibration gains INIT_GAIN_1CH and INIT_GAIN_2CH, the value 1.0, which performs no calibration, may be applied, or the latest values calculated by the calibration gain calculation unit 43 may be applied.

[0065] The calibration gain output unit 45 supplies the calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH calculated by the calibration gain calculation unit 43, or the calibration gains INIT_GAIN_1CH and INIT_GAIN_2CH read from the calibration gain storage unit 44, to the corresponding calibration gain multiplication units 16 and 17, respectively.
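The gain calculation of equations (10,1) through (12,2) can be sketched as follows; the exact equations are not reproduced in the translation, so the code follows the textual description above (mean absolute value per frame, averaged target sensitivity, gain = target / level), with illustrative function names.

```python
import numpy as np

def calibration_gains(frame1, frame2):
    """Equations (10,1)-(12,2): per-channel level, target sensitivity, and
    gains, so that LEVEL_iCH * CALIB_GAIN_iCH = AVE_LEVEL for each channel."""
    level_1ch = np.mean(np.abs(frame1))         # equation (10,1)
    level_2ch = np.mean(np.abs(frame2))         # equation (10,2)
    ave_level = 0.5 * (level_1ch + level_2ch)   # equation (11)
    calib_gain_1ch = ave_level / level_1ch      # equation (12,1)
    calib_gain_2ch = ave_level / level_2ch      # equation (12,2)
    return calib_gain_1ch, calib_gain_2ch

# A microphone that is twice as sensitive receives a gain below 1, and the
# quieter microphone a gain above 1, pulling both toward the average level.
g1, g2 = calibration_gains(np.array([2.0, -2.0]), np.array([1.0, -1.0]))
assert np.isclose(g1, 0.75) and np.isclose(g2, 1.5)
```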
It is given to [0066] The first calibration gain multiplication unit 16 outputs a post-calibration signal y1 (n) obtained by multiplying the input signal s1 (n) from the microphone m_1 by the calibration gain CALIB_GAIN_1CH. [0067] The second calibration gain multiplication unit 17 outputs a post-calibration signal y2 (n) obtained by multiplying the input signal s2 (n) from the microphone m_2 by the calibration gain CALIB_GAIN_2CH. [0068] (A-2) Operation of Embodiment Next, the operation of the overall processing and calculation processing of the calibration gain in the acoustic signal processing device 10 according to the embodiment will be described in detail with reference to the drawings. 03-05-2019 18 [0069] Input signals s1 (n) and s2 (n) for one frame are input from the microphones m_1 and m_2 to the FFT unit 11 via AD converters (not shown). [0070] The FFT unit 11 performs Fourier transform on analysis frames FRAME1 (K) and FRAME2 (K) based on input signals s1 (n) and s2 (n) for one frame, and a signal X1 (f, K) indicated in the frequency domain And obtain X 2 (f, K). The signals X 1 (f, K) and X 2 (f, K) generated by the FFT unit 11 are supplied to the front suppression signal generation unit 12 and the coherence calculation unit 13. [0071] The front suppression signal generation unit 12 calculates a front suppression signal N (f, K) having directivity in directions other than the front direction based on the signals X1 (f, K) and X2 (f, K). Then, the front suppression signal generation unit 12 generates an average front suppression signal AVE_N (f, K) obtained by averaging the front suppression signals N (f, K) over all frequencies, and this average front suppression signal AVE_N (K). ) To the correlation calculation unit 14. [0072] On the other hand, the coherence calculation unit 13 calculates the coherence COH based on the signals X1 (f, K) and X2 (f, K), and gives the coherence COH to the correlation calculation unit 14. 
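The gain computation of equations (11), (12, 1) and (12, 2) and the multiplications performed by units 16 and 17 can be sketched as follows. This is an illustrative Python rendering of the formulas as described in the text; the function names are not from the patent.

```python
import numpy as np

def calibration_gains(level_1ch, level_2ch):
    """Equations (11), (12,1), (12,2): the target sensitivity is the mean of
    the two channel levels, and each gain maps its channel level onto it."""
    ave_level = (level_1ch + level_2ch) / 2.0   # equation (11)
    calib_gain_1ch = ave_level / level_1ch      # equation (12,1)
    calib_gain_2ch = ave_level / level_2ch      # equation (12,2)
    return calib_gain_1ch, calib_gain_2ch

def apply_gains(s1, s2, g1, g2):
    """Calibration gain multiplication units 16 and 17: y(n) = gain * s(n)."""
    return g1 * np.asarray(s1), g2 * np.asarray(s2)
```

By construction, multiplying each channel level by its gain yields the same target level AVE_LEVEL, which is exactly the calibration property the equations define.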
[0073] The correlation calculation unit 14 obtains the average front suppression signal AVE_N (K) and the coherence COH, calculates the correlation coefficient cor (K) between them, and gives the correlation coefficient cor (K) to the calibration gain calculation unit 15. [0074] The calibration gain calculation unit 15 acquires the correlation coefficient cor (K), determines whether cor (K) is positive or negative, and, according to the determination result, calculates the calibration gain for each of the signals s1 (n) and s2 (n). The calibration gain calculation unit 15 then outputs the calibration gain CALIB_GAIN_1CH for the signal s1 (n) to the first calibration gain multiplication unit 16, and the calibration gain CALIB_GAIN_2CH for the signal s2 (n) to the second calibration gain multiplication unit 17. [0075] FIG. 5 is a flowchart showing the processing operation of the calibration gain calculation unit 15. [0076] The correlation coefficient and input signal acquisition unit 41 acquires the correlation coefficient cor (K) from the correlation calculation unit 14, and acquires the analysis frames FRAME1 (K) and FRAME2 (K) of the input signals s1 (n) and s2 (n) (S51). [0077] The calibration gain calculation execution determination unit 42 then determines whether the value of the correlation coefficient cor (K) is positive or negative (S52).
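The patent defines cor (K) in an earlier section not reproduced here; as a stand-in sketch, the standard Pearson correlation coefficient (as in the cited statistics textbook) between a buffer of recent AVE_N values and the corresponding COH values can be computed as follows. This is an assumption for illustration, not the patent's exact formula.

```python
import numpy as np

def correlation_coefficient(ave_n_history, coh_history):
    """Pearson correlation between recent frames of the averaged front
    suppression signal AVE_N(K) and the coherence COH. Negative when the
    two quantities move in opposite directions (interference present),
    positive when they move together (target sound only)."""
    a = np.asarray(ave_n_history, dtype=float)
    c = np.asarray(coh_history, dtype=float)
    return float(np.corrcoef(a, c)[0, 1])
```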
[0078] When the correlation coefficient cor (K) is positive, it is judged that there is no interference sound arriving from directions other than the front, and the frame is regarded as a target sound section from the front direction. The calibration gain calculation unit 43 then calculates the calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH for the signals s1 (n) and s2 (n) from the correlation coefficient cor (K), FRAME1 (K) and FRAME2 (K), according to equations (10, 1), (10, 2), (11), (12, 1) and (12, 2) (S53). At this time, the calibration gain calculation unit 43 stores the calculated calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH in the calibration gain storage unit 44, updating the gains held there. [0079] When the correlation coefficient cor (K) is negative, it is considered that an interference sound is arriving from a direction other than the front, and the calibration gain calculation unit 43 sets the values stored in the calibration gain storage unit 44 as the calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH (S54). [0080] That is, when the initial values INIT_GAIN_1CH and INIT_GAIN_2CH are stored in the calibration gain storage unit 44, INIT_GAIN_1CH becomes CALIB_GAIN_1CH and INIT_GAIN_2CH becomes CALIB_GAIN_2CH. Alternatively, when the most recently calculated calibration gains are stored in the calibration gain storage unit 44, those latest gains are set as the present calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH. [0081] The calibration gain output unit 45 then outputs the calibration gain CALIB_GAIN_1CH to the first calibration gain multiplication unit 16, and the calibration gain CALIB_GAIN_2CH to the second calibration gain multiplication unit 17 (S55).
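The branch logic of steps S52–S55 can be sketched as a single update function. A dict plays the role of the calibration gain storage unit 44, and the per-frame level computation is simplified to the current frame only (a variant [0060] permits); names are illustrative.

```python
import numpy as np

def update_calibration_gains(cor_k, frame1, frame2, store):
    """S52: inspect the sign of cor(K).
    S53: positive -> target sound section; recompute and store fresh gains.
    S54: otherwise -> reuse the stored gains (initial or latest values)."""
    if cor_k > 0:
        level_1ch = np.mean(np.abs(frame1))          # simplified eq. (10,1)
        level_2ch = np.mean(np.abs(frame2))          # simplified eq. (10,2)
        ave_level = (level_1ch + level_2ch) / 2.0    # eq. (11)
        store["g1"] = ave_level / level_1ch          # eq. (12,1)
        store["g2"] = ave_level / level_2ch          # eq. (12,2)
    # S55: output whichever gains are now current
    return store["g1"], store["g2"]
```

Initializing the store with `{"g1": 1.0, "g2": 1.0}` corresponds to the uncalibrated initial values INIT_GAIN_1CH = INIT_GAIN_2CH = 1.0 described in [0064].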
Then, the calibration gain calculation unit 15 updates the index K (S56), returns to S51, and performs the calibration gain calculation processing for the next index. [0082] Here, since the calibration gain does not need to change once it has been calculated, continually recalculating it at regular intervals wastes computation, and the updating of the calibration gain may therefore be stopped partway through. That is, in the environment in which the acoustic signal processing device 10 having the microphones m_1 and m_2 is used, there is no need to update the calibration gains regularly once gains for the microphones m_1 and m_2 have been acquired in the initial stage; the calculation may be performed only when a new calibration gain is needed. [0083] Then, the first calibration gain multiplication unit 16 multiplies the signal s1 (n) by the calibration gain CALIB_GAIN_1CH and outputs the calibrated signal y1 (n), and the second calibration gain multiplication unit 17 multiplies the signal s2 (n) by the calibration gain CALIB_GAIN_2CH and outputs the calibrated signal y2 (n). [0084] (A-3) Effects of the Embodiment As described above, according to this embodiment, the correlation coefficient between the front suppression signal and the coherence COH is negative when an interference sound arrives from a direction other than the front, and positive when no interference sound is present. By exploiting this characteristic behavior, a microphone sensitivity calibration method can be realized that is unaffected by interfering speech and that makes threshold setting easy for the designer.
[0085] As a result, by applying this calibration gain calculation for the microphones m_1 and m_2 as preprocessing for various signal processing methods that use a microphone array, an improvement in the subsequent speech processing performance can be expected. [0086] (B) Other Embodiments Various modified embodiments were mentioned in the description above, and the present invention can further be applied to the following modified embodiments. [0087] (B-1) In the above-described embodiment, the correlation calculation unit calculates the correlation coefficient as the feature amount representing the relationship between the front suppression signal and the coherence. However, the same effect as in the above-described embodiment can be obtained by calculating the covariance as that feature amount instead. [0088] (B-2) The acoustic signal processing device according to the present invention can be applied to any of various devices having a voice processing function (for example, voice recognition processing) and provided with a plurality of microphones. For example, the present invention can be widely applied to smartphones, tablet terminals, video conference terminals, car navigation systems, call center terminals, robots, devices using acoustic signals as sensor signals, and the like. [0089] Also, for example, the acoustic signal processing device of the present invention may be mounted on an apparatus having a communication function, and the apparatus may transmit the calibrated signals over a network to a server having a predetermined audio processing function.
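The covariance variant of (B-1) works because the covariance differs from the correlation coefficient only by a positive factor (the product of the two standard deviations), so its sign is the same and the positive/negative decision in S52 carries over unchanged. A minimal sketch, with illustrative names:

```python
import numpy as np

def covariance_feature(ave_n_history, coh_history):
    """Variant (B-1): use the covariance of AVE_N(K) and COH as the feature
    amount. Its sign matches that of the Pearson correlation coefficient,
    so the sign test of step S52 applies without modification."""
    return float(np.cov(ave_n_history, coh_history)[0, 1])
```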
[0090] Furthermore, for example, an apparatus having a communication function and including a plurality of microphones may transmit the input signals of the respective microphones over a network to a server on which the acoustic signal processing device of the present invention is mounted. In this case, the server equipped with the acoustic signal processing device can calculate the calibration gain for each input signal according to the correlation coefficient between the front suppression signal and the coherence, as in the above-described embodiment. [0091] (B-3) Although the case of two microphones is illustrated in the embodiment described above, the present invention can also be applied to a device that acquires an input signal from each of three or more microphones. [0092] Reference Signs List: 10 ... acoustic signal processing device, m_1, m_2 ... microphones, 11 ... FFT unit, 12 ... front suppression signal generation unit, 13 ... coherence calculation unit, 14 ... correlation calculation unit, 31 ... front suppression signal and coherence acquisition unit, 32 ... correlation coefficient calculation unit, 33 ... correlation coefficient output unit, 15 ... calibration gain calculation unit, 41 ... correlation coefficient and input signal acquisition unit, 42 ... calibration gain calculation execution determination unit, 43 ... calibration gain calculation unit, 44 ... calibration gain storage unit, 45 ... calibration gain output unit, 16 ... first calibration gain multiplication unit, 17 ... second calibration gain multiplication unit.
