Patent Translate Powered by EPO and Google
Notice: This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate, complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2018142826
Abstract: The aim is to improve the sound quality of the target sound when a non-target sound is suppressed or subtracted from an input signal, and to control the suppression coefficient or subtraction coefficient with a low processing load. A non-target sound suppression device according to the present invention comprises: a front suppression signal generation unit that generates a front suppression signal having a blind spot toward the front, based on the difference between a plurality of frequency domain input signals obtained by converting each input signal from the time domain to the frequency domain; a coherence calculation unit that calculates coherence based on signals obtained from the plurality of input signals; a feature amount calculation unit that calculates a feature amount indicating the relationship between the front suppression signal and the coherence; and a non-target sound suppression processing unit that sets a coefficient relating to suppression of the non-target sound included in the input signal using that feature amount, and uses the coefficient to obtain a post-suppression signal in which the non-target sound included in the input signal is suppressed.
[Selected figure] Figure 1
Non-target sound suppression device, method and program
[0001] The present invention relates to a non-target sound suppression device, method, and program, and can be applied to, for example, communication devices or communication software that use speech, such as telephones and teleconference systems, and to acoustic signal processing used as preprocessing for speech recognition.
[0002] 03-05-2019 1 In recent years, devices equipped with various voice processing functions, such as voice calling and voice recognition, have become widespread; smartphones and car navigation systems are typical examples. As these devices have spread, voice processing functions have come to be used in harsher noise environments than before, such as on a busy street or in a moving car. There is therefore a growing demand for signal processing technology that can maintain call quality and speech recognition performance even in noisy environments.
[0003] Noise that interferes with the performance of a voice processing function can be divided into background noise, such as the bustle of a crowded street or the running noise of a car, and interfering sound (for example, interfering voices such as those of people other than the user of the voice processing function). Various effective suppression methods have been proposed on the premise that the background noise has stationary frequency characteristics and power (see Patent Documents 1 to 3 and Non-patent Document 1).
[0004] Patent Document 1: JP-A-2010-532879
Patent Document 2: JP-A-2014-106337
Patent Document 3: JP-A-2014-164191
[0005] Non-patent Document 1: Hiraoka Kazuyuki, Hori Gen, "Probability Statistics for Programming", Ohmsha, published October 23, 2009
[0006] However, as described above, with the rapid expansion of the environments in which voice signal processing functions are used, cases in which the background noise is non-stationary are also increasing.
Therefore, there is a need for a background noise suppression method that can quickly follow fluctuations in the characteristics of background noise. However, when background noise is suppressed in a signal section in which an interfering sound is present, the signal component of the target sound may also be removed, degrading the sound quality.
[0007] Further, Patent Document 3 discloses a technique for suppressing interfering sound arriving from the surroundings by subtracting, from the input signal, a signal in which the component arriving from the front has been suppressed (referred to as a front suppression signal). At the time of subtraction, the strength of the subtraction is often controlled by multiplying the front suppression signal by a subtraction coefficient. When the coefficient is too large, suppression is excessive and distortion of the target sound increases; when it is too small, suppression of the interfering sound is insufficient. The coefficient therefore greatly affects the sound quality. However, it is difficult to determine whether an interfering sound is superimposed on the target sound, and hence difficult to set the subtraction coefficient to an appropriate value.
[0008] Therefore, in view of the above problems, there is a need for a non-target sound suppression device, method, and program that, when suppressing or subtracting a non-target sound from an input signal, can improve the sound quality of the target sound and control the suppression coefficient or subtraction coefficient with a low processing load.
[0009] In order to solve such problems, the non-target sound suppression device according to the first aspect of the present invention comprises: (1) a front suppression signal generation unit that generates a front suppression signal having a blind spot toward the front, based on the difference between a plurality of frequency domain input signals obtained by converting the input signals from each of a plurality of microphones from the time domain to the frequency domain; (2) a coherence calculation unit that calculates coherence based on signals obtained from the plurality of input signals; (3) a feature amount calculation unit that calculates a feature amount indicating the relationship between the front suppression signal and the coherence; and (4) a non-target sound suppression processing unit that sets a coefficient relating to suppression of the non-target sound included in the input signal using that feature amount, and uses the coefficient to suppress the non-target sound included in the input signal and obtain a post-suppression signal.
[0010] According to the non-target sound suppression method of the second aspect of the present invention, (1) a front suppression signal generation unit generates a front suppression signal having a blind spot toward the front, based on the difference between a plurality of frequency domain input signals obtained by converting the input signals from each of a plurality of microphones from the time domain to the frequency domain; (2) a coherence calculation unit calculates coherence based on signals obtained from the plurality of input signals; (3) a feature amount calculation unit calculates a feature amount indicating the relationship between the front suppression signal and the coherence; and (4) a non-target sound suppression processing unit sets a coefficient relating to suppression of the non-target sound included in the input signal using that feature amount, and uses the coefficient to obtain a post-suppression signal in which the non-target sound included in the input signal is suppressed.
[0011] A non-target sound suppression program according to a third aspect of the present invention causes a computer to function as: (1) a front suppression signal generation unit that generates a front suppression signal having a blind spot toward the front, based on the difference between a plurality of frequency domain input signals obtained by converting the input signals from each of a plurality of microphones from the time domain to the frequency domain; (2) a coherence calculation unit that calculates coherence based on signals obtained from the plurality of input signals; (3) a feature amount calculation unit that calculates a feature amount indicating the relationship between the front suppression signal and the coherence; and (4) a non-target sound suppression processing unit that sets a coefficient relating to suppression of the non-target sound included in the input signal using that feature amount, and uses the coefficient to obtain a post-suppression signal in which the non-target sound included in the input signal is suppressed.
[0012] According to the present invention, when suppressing or subtracting a non-target sound from an input signal, it is possible to improve the sound quality of the target sound and to control the suppression coefficient or subtraction coefficient with a low processing load.
[0013] FIG. 1 is a block diagram showing the overall configuration of the non-target sound suppression device according to the first embodiment. FIG. 2 is an explanatory diagram showing an example arrangement of the microphones according to the embodiment. FIG. 3 is a diagram showing the characteristics of the directional signal applied in the acoustic signal processing device according to the embodiment. FIG. 4 is a block diagram showing the configuration of the WF unit according to the first embodiment. FIG. 5 is a flowchart showing the processing in the time constant control unit of the WF unit according to the first embodiment. FIG. 6 is a block diagram showing the overall configuration of the non-target sound suppression device according to the second embodiment.
FIG. 7 is a block diagram showing the configuration of the frequency subtraction processing unit according to the second embodiment. FIG. 8 is a flowchart showing the processing in the time constant control unit 23 of the frequency subtraction processing unit according to the second embodiment.
[0014] (A) First Embodiment
Hereinafter, a first embodiment of the non-target sound suppression device, method, and program according to the present invention will be described in detail with reference to the drawings.
[0015] In the first embodiment, the present invention (the non-target sound suppression device and method) is used to realize a background noise suppression device and method that quickly follow fluctuations in the characteristics of non-stationary background noise, which has become more common with the rapid expansion of the environments in which voice signal processing functions are used.
[0016] Here, when the background noise suppression function is used in an environment where interfering sound occurs in the surroundings, the coefficient adaptation operation may be performed erroneously in a signal section in which the interfering sound is present. In that case, the characteristics of the interfering sound, which is a human voice, are also reflected in the background noise suppression coefficient (hereinafter referred to as the "suppression coefficient"). When suppression processing is performed using this coefficient, the signal component of the target sound is also removed, and the sound quality may be degraded.
[0017] Therefore, to prevent this phenomenon, the first embodiment realizes a non-target sound suppression device and method that continuously monitor the variation of the background noise while suppressing the influence of the target sound and the interfering sound, and that can control the adaptive operation of the background noise suppression coefficient based on the result.
[0018] (A-1) Configuration of First Embodiment FIG.
1 is a block diagram showing the overall configuration of a non-target sound suppression device 1 according to the first embodiment.
[0019] As shown in FIG. 1, the non-target sound suppression device 1 receives input signals s1(n) and s2(n) from a plurality of microphones (two in FIG. 1, microphones m_1 and m_2). Here, n is an index indicating the input order of the samples and is a positive integer. In the following, a smaller n denotes an older input sample and a larger n a newer one.
[0020] Based on the input signals obtained from the microphones m_1 and m_2, the non-target sound suppression device 1 sets parameters (variables) for suppressing background noise so as to follow variations in the characteristics of the background noise, suppresses the background noise, and supplies the post-suppression signal to the voice processing device 2 in the subsequent stage.
[0021] The voice processing device 2 performs predetermined voice processing using the post-suppression signal from the non-target sound suppression device 1. The processing performed by the voice processing device 2 is not particularly limited, and various kinds of processing can be applied; for example, it may perform voice communication processing or voice recognition processing in a telephone terminal or a video conference system. The non-target sound suppression device 1 and the voice processing device 2 need only be connected so that signals can be exchanged: they may be connected by circuit wiring, or they may exchange signals by network communication over, for example, a wired or wireless line.
[0022] FIG. 2 is an explanatory diagram showing an example arrangement of the microphones m_1 and m_2.
[0023] As shown in FIG.
2, the microphones m_1 and m_2 are assumed to be arranged such that the plane containing the two microphones is perpendicular to the direction from which the target sound arrives (the direction of the sound source of the target sound). In the following, as shown in FIG. 2, the arrival direction of the target sound as seen from the position between the two microphones m_1 and m_2 is referred to as the forward direction or the front direction. Likewise, as shown in FIG. 2, the terms rightward, leftward, and backward refer to the respective directions as seen when facing the arrival direction of the target sound from the position between the two microphones m_1 and m_2. In this embodiment, it is assumed that the target sound arrives from the front direction of the microphones m_1 and m_2, and that the non-target sound, including the interfering sound, arrives from the left and right (lateral) directions.
[0024] As shown in FIG. 1, the non-target sound suppression device 1 includes an FFT unit 11, a front suppression signal generation unit 12, a coherence calculation unit 13, a correlation and modGI calculation unit 14, a WF (Wiener filter) unit 15, and an IFFT unit 16.
[0025] The non-target sound suppression device 1 may be realized by installing a program (for example, a non-target sound suppression program) on a computer having a processor, a memory, and the like. In this case as well, the non-target sound suppression device 1 can be represented functionally by the same block diagram. Part or all of the non-target sound suppression device 1 may instead be realized as hardware.
[0026] The FFT unit 11 receives the input signals s1 and s2 from the microphones m_1 and m_2 through AD converters (not shown) and performs a fast Fourier transform (or discrete Fourier transform) on them. The input signals s1 and s2 are thereby represented in the frequency domain.
[0027] In performing the fast Fourier transform, the FFT unit 11 constructs analysis frames FRAME1(K) and FRAME2(K), each consisting of a predetermined number N of samples (N is an arbitrary integer), from the input signals s1(n) and s2(n). An example of constructing FRAME1 from the input signal s1 is shown in the following equation (1).
[0028] In equation (1), K is an index representing the order of the frames and is a positive integer. In the following, a smaller K denotes an older analysis frame and a larger K a newer one. Unless otherwise specified, the index of the latest analysis frame to be analyzed is K.
[0029]
[0030] The FFT unit 11 performs the fast Fourier transform for each analysis frame, and supplies the frequency domain signal X1(f, K), obtained by Fourier transforming the analysis frame FRAME1(K) constructed from the input signal s1, and the frequency domain signal X2(f, K), obtained by Fourier transforming the analysis frame FRAME2(K) constructed from the input signal s2, to the front suppression signal generation unit 12 and the coherence calculation unit 13.
[0031] Here, f is an index representing frequency. The frequency domain signal X1(f, K) is not a single value; as shown in equation (2), it consists of the spectral components of a plurality of frequencies f1 to fm (m is an arbitrary integer).
[0032]
[0033] In equation (2), X1(f, K) is a complex number consisting of a real part and an imaginary part. The same applies to X2(f, K) and to the front suppression signal N(f, K) described later for the front suppression signal generation unit 12.
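The framing and transform steps of paragraphs [0027] to [0031] can be illustrated with a minimal Python sketch. Equation (1) is not reproduced in this text, so the non-overlapping frame layout, the zero-based frame index, and the function names make_frame and dft are assumptions made for illustration only. A plain discrete Fourier transform stands in for the FFT.

```python
import cmath


def make_frame(samples, frame_index, frame_len):
    """Return analysis frame K as frame_len consecutive samples.

    Assumption: frames do not overlap and frame_index counts from 0.
    """
    start = frame_index * frame_len
    frame = samples[start:start + frame_len]
    if len(frame) < frame_len:
        raise ValueError("not enough samples for this frame")
    return frame


def dft(frame):
    """Plain discrete Fourier transform; returns one complex value per bin."""
    n = len(frame)
    return [
        sum(x * cmath.exp(-2j * cmath.pi * f * t / n) for t, x in enumerate(frame))
        for f in range(n)
    ]


# Hypothetical input: eight samples split into two frames of four.
s1 = [1, 0, 0, 0, 1, 1, 1, 1]
x1_second_frame = dft(make_frame(s1, 1, 4))
```

Each element of the returned list is complex, matching the remark in paragraph [0033] that X1(f, K) has a real and an imaginary part.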
[0034] The front suppression signal generation unit 12 suppresses, for each frequency, the signal component arriving from the front direction in the signal supplied from the FFT unit 11. In other words, the front suppression signal generation unit 12 functions as a directional filter that suppresses components in the front direction.
[0035] For example, as illustrated in FIG. 3, the front suppression signal generation unit 12 forms a directional filter that suppresses the front component of the signal supplied from the FFT unit 11 by using a figure-eight bidirectional filter having a blind spot in the front direction.
[0036] Specifically, the front suppression signal generation unit 12 performs the calculation shown in the following equation (3) on the signals X1(f, K) and X2(f, K) supplied from the FFT unit 11, and generates a front suppression signal N(f, K) for each frequency. The calculation of equation (3) corresponds to forming a figure-eight bidirectional filter having a blind spot in the front direction, as shown in FIG. 3.
N(f, K) = X1(f, K) - X2(f, K) ... (3)
[0037] In this way, the front suppression signal generation unit 12 obtains the component at each of the frequencies f1 to fm (the power of each frequency band for one frame).
[0038] Further, the front suppression signal generation unit 12 calculates an average front suppression signal AVE_N(K) by averaging the front suppression signal N(f, K) over all the frequencies f1 to fm according to equation (4).
[0039]
[0040] The coherence calculation unit 13 forms signals having strong directivity in specific directions from the frequency domain signals X1(f, K) and X2(f, K) supplied from the FFT unit 11, and calculates the coherence COH(K) from them.
[0041] The process by which the coherence calculation unit 13 calculates the coherence COH(K) is described below.
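Equation (3) and the averaging step of paragraph [0038] can be sketched as follows. Equation (4) is not reproduced in this text, so averaging the magnitudes |N(f, K)| over the bins is an assumption, as are the function names front_suppression and average_front_suppression.

```python
def front_suppression(x1, x2):
    """Equation (3): N(f, K) = X1(f, K) - X2(f, K) for every frequency bin."""
    if len(x1) != len(x2):
        raise ValueError("spectra must have the same number of bins")
    return [a - b for a, b in zip(x1, x2)]


def average_front_suppression(n_spec):
    """Assumed form of AVE_N(K): mean magnitude of N(f, K) over all bins."""
    return sum(abs(v) for v in n_spec) / len(n_spec)


# A frontal source reaches both microphones at the same time, so the two
# spectra are identical and the difference cancels.
frontal = average_front_suppression(front_suppression([1 + 0j, 2j], [1 + 0j, 2j]))

# A lateral source produces a phase difference between the microphones,
# so part of the signal survives the subtraction.
lateral = average_front_suppression(front_suppression([1 + 0j, 2j], [1 + 0j, -2j]))
```

With these hypothetical spectra the frontal case averages to zero while the lateral case does not, which is the blind-spot behavior described in paragraph [0035].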
[0042] From the frequency domain signals X1(f, K) and X2(f, K), the coherence calculation unit 13 forms a signal B1(f, K) processed by a filter having strong directivity in a first direction (for example, the left direction). It likewise forms, from the same frequency domain signals, a signal B2(f, K) processed by a filter having strong directivity in a second direction. Existing methods can be applied to form the signals B1(f) and B2(f) having strong directivity in specific directions. Here, as an example, the signal B1 having strong directivity in the first direction is formed by applying the following equation (5), and the signal B2 having strong directivity in the second direction is formed by applying the following equation (6).
[0043]
[0044] In equations (5) and (6), S represents the sampling frequency, N the FFT analysis frame length, τ the difference in sound wave arrival time between the microphone m_1 and the microphone m_2, i the imaginary unit, and f the frequency.
[0045] Next, the coherence calculation unit 13 applies the following equations (7) and (8) to the signals B1(f) and B2(f) obtained as described above to obtain the coherence COH(K). Here, B2(f, K)* in equation (7) is the complex conjugate of B2(f, K).
[0046]
[0047] coef(f, K) is the coherence at an arbitrary frequency component f (any of the frequencies f1 to fm) of the analysis frames FRAME1(K) and FRAME2(K) with index K.
[0048] When coef(f, K) is determined, the directivity directions of the signals B1(f) and B2(f) may be any directions other than the front direction, as long as the directivity of B1(f) and the directivity of B2(f) differ from each other. Further, the method of calculating coef(f, K) is not limited to the calculation method described above.
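The coherence calculation can be illustrated with the sketch below. Equations (5) to (8) are not reproduced in this text, so every formula in the code is an assumption: the directional signals are formed by a delay-and-subtract operation using the symbols listed in paragraph [0044], coef(f, K) is taken as a normalized cross term of B1 and the complex conjugate of B2, and COH(K) is taken as the mean of coef(f, K) over the bins. The function names and the small constant eps are also hypothetical.

```python
import cmath


def directional_pair(x1, x2, sample_rate, frame_len, tau):
    """Form B1 and B2 by delay-and-subtract (assumed form of equations (5) and (6))."""
    b1, b2 = [], []
    for f, (a, b) in enumerate(zip(x1, x2)):
        # Phase shift corresponding to an arrival time difference of tau seconds.
        phase = cmath.exp(-2j * cmath.pi * f * sample_rate * tau / frame_len)
        b1.append(a - b * phase)
        b2.append(b - a * phase)
    return b1, b2


def coherence(b1, b2, eps=1e-12):
    """Assumed form of equations (7) and (8): mean normalized cross term."""
    coefs = []
    for u, v in zip(b1, b2):
        num = abs(u * v.conjugate())
        den = 0.5 * (abs(u) ** 2 + abs(v) ** 2)
        coefs.append(num / (den + eps))  # eps avoids division by zero
    return sum(coefs) / len(coefs)


# Hypothetical settings: 8 kHz sampling, 4-point frames, tau of 1/16000 s.
S, N, TAU = 8000, 4, 1 / 16000
x1 = [1 + 0j] * N

# Frontal source: both microphones observe the same spectrum.
coh_front = coherence(*directional_pair(x1, x1, S, N, TAU))

# Lateral source: microphone 2 observes the spectrum delayed by tau.
x2_side = [a * cmath.exp(-2j * cmath.pi * f * S * TAU / N) for f, a in enumerate(x1)]
coh_side = coherence(*directional_pair(x1, x2_side, S, N, TAU))
```

Under these assumptions the frontal case yields a larger coherence than the lateral case, in line with the behavior described for COH(K) in the paragraphs that follow.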
[0049] The correlation and modGI calculation unit 14 acquires the front suppression signal N(f, K) (the average front suppression signal AVE_N(K)), which has directivity in directions other than the front, and the coherence COH(K), and calculates a correlation coefficient cor(K), a feature amount indicating the relationship between the average front suppression signal AVE_N(K) and the coherence COH(K).
[0050] In addition, the correlation and modGI calculation unit 14 uses the correlation coefficient cor(K) to calculate a feature amount cor_modGI(K) that represents the magnitude of the positive and negative fluctuation of the slope of the amplitude of the correlation coefficient cor(K), and outputs the feature amount cor_modGI(K) to the WF unit 15.
[0051] First, the principle by which the correlation and modGI calculation unit 14 detects a signal section containing an interfering sound, based on the correlation coefficient cor(K) between the average front suppression signal AVE_N(K) and the coherence COH(K), will be described.
[0052] Here, it is assumed that a sound source emitting the target sound is present in the front direction of the microphone m_1 and the microphone m_2, and that interfering sound arrives from directions other than the front (for example, from the sides of the microphones m_1 and m_2, that is, from the left or the right).
[0053] For example, when no interfering sound is present and the target sound is present, the front suppression signal N(f, K) has a value proportional to the magnitude of the target sound component. However, as shown in FIG. 2, the gain in the front direction is smaller than the gain in the lateral direction, so this value is smaller than when an interfering sound is present.
[0054] Further, the coherence COH(K) is a feature amount closely related to the arrival direction of the input signal, and can be described as the correlation between two signal components. This is because equation (7) calculates the correlation for a given frequency component, and equation (8) calculates the average of the correlation values over all frequency components. When the coherence COH(K) is small, the correlation between the two signal components is small; conversely, when the coherence COH(K) is large, the correlation between the two signal components is large. An input signal for which the coherence COH(K) is small is a signal whose arrival direction is strongly biased to the right or to the left, that is, a signal that has arrived from a direction other than the front. An input signal for which the coherence COH(K) is large, on the other hand, has little bias in its arrival direction and can be regarded as having arrived from the front.
[0055] Accordingly, the coherence COH(K) has a large value when no interfering sound is present and the target sound is present, and a small value when an interfering sound is present together with the target sound.
[0056] Arranging the above behavior according to the presence or absence of an interfering sound gives the following relationships.
- When no interfering sound is present and the target sound is present, the coherence COH(K) takes a large value, and the front suppression signal N(f, K) (average front suppression signal AVE_N(K)) takes a value proportional to the magnitude of the target sound component.
- When an interfering sound is present, the coherence COH(K) takes a small value, and the front suppression signal N(f, K) (average front suppression signal AVE_N(K)) takes a large value.
[0057] Given the above behavior, introducing the correlation coefficient cor(K) between the front suppression signal N(f, K) (average front suppression signal AVE_N(K)) and the coherence COH(K) leads to the following. The correlation coefficient cor(K) is a positive value (cor(K) > 0) when no interfering sound is present. The correlation coefficient cor(K) is a negative value (cor(K) ≤ 0) when an interfering sound is present.
[0058] Therefore, the correlation and modGI calculation unit 14 observes the sign of the correlation coefficient cor(K) between the average front suppression signal AVE_N(K) and the coherence COH(K). If cor(K) is positive, it can determine that no interfering sound is present; if cor(K) is negative, it can determine that an interfering sound is present.
[0059] The method of calculating the correlation coefficient cor(K) is not limited; for example, cor(K) can be calculated for each frame using the following equation (9).
[0060] In equation (9), cov[AVE_N(K), COH(K)] denotes the covariance of the average front suppression signal AVE_N(K) and the coherence COH(K). Further, σAVE_N(K) denotes the standard deviation of the average front suppression signal AVE_N(K), and σCOH(K) denotes the standard deviation of the coherence COH(K). When the correlation coefficient cor(K) is obtained by equation (9), the standard deviations and the covariance may be determined using the values of AVE_N(K) and COH(K) from a predetermined number i of the most recently processed frames.
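The windowed calculation described above can be sketched as follows. Equation (9) is not reproduced in this text; the code assumes the standard correlation coefficient, that is, the covariance divided by the product of the two standard deviations, evaluated over the most recent values of AVE_N and COH. The function name correlation and the choice to return 0.0 when either series is constant are assumptions.

```python
import math


def correlation(ave_n_hist, coh_hist):
    """Correlation coefficient of the recent AVE_N and COH values, in [-1, 1]."""
    n = len(ave_n_hist)
    if n != len(coh_hist) or n < 2:
        raise ValueError("need two histories of equal length, at least 2 frames")
    mean_a = sum(ave_n_hist) / n
    mean_c = sum(coh_hist) / n
    cov = sum((a - mean_a) * (c - mean_c) for a, c in zip(ave_n_hist, coh_hist)) / n
    std_a = math.sqrt(sum((a - mean_a) ** 2 for a in ave_n_hist) / n)
    std_c = math.sqrt(sum((c - mean_c) ** 2 for c in coh_hist) / n)
    if std_a == 0.0 or std_c == 0.0:
        return 0.0  # assumption: treat a constant series as uncorrelated
    return cov / (std_a * std_c)


# Hypothetical histories over the last four frames.
rising_together = correlation([1.0, 2.0, 3.0, 4.0], [0.2, 0.4, 0.6, 0.8])
moving_apart = correlation([1.0, 2.0, 3.0, 4.0], [0.8, 0.6, 0.4, 0.2])
```

In the terms of paragraph [0058], a positive result such as rising_together would be read as "no interfering sound", and a negative result such as moving_apart as "interfering sound present".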
Specifically, in obtaining the correlation coefficient cor(K) by equation (9), the standard deviations (σAVE_N(K) and σCOH(K)) and the covariance (cov[AVE_N(K), COH(K)]) may be determined using the COH(K) and AVE_N(K) of, for example, the i most recently processed frames (the (K-i)-th frame, the (K-(i-1))-th frame, and so on). In other words, the standard deviations and the covariance in equation (9) may be determined using the i most recently obtained values of AVE_N and COH as samples. The correlation coefficient cor(K) obtained in this way takes a value from -1.0 to 1.0.
[0061]
[0062] Next, the correlation and modGI calculation unit 14 uses the correlation coefficient cor(K) to calculate a feature amount representing the magnitude of the positive and negative fluctuation of the slope of the amplitude of cor(K).
[0063] When background noise is present in the input signal, the behavior of the correlation coefficient cor(K) changes as follows.
[0064] ・The macroscopic behavior, in which cor(K) is negative when an interfering sound is present and positive when no interfering sound is present, is maintained to some extent.
[0065] ・Under the influence of background noise, the amplitude of the front suppression signal (average front suppression signal AVE_N(K)) fluctuates more strongly, whereas the coherence COH(K) only loses dynamic range and the irregularity of its amplitude does not change greatly. As a result, the increases and decreases of the front suppression signal (average front suppression signal AVE_N(K)) and of the coherence COH(K) fall out of step, and the correlation coefficient cor(K) rises and falls more sharply. In addition, the sign of cor(K) changes more frequently.
[0066] That is, as the influence of the background noise increases, both the rise and fall of the value of the correlation coefficient cor(K) and the frequency with which its sign changes increase.
[0067] Thus, when background noise is present, the value of the correlation coefficient cor(K) rises and falls more often and its sign changes more often, and as the influence of the background noise increases, these fluctuations (that is, the rise and fall of the value of cor(K) and the changes of its sign) become larger. This behavior derives only from background noise. Therefore, by observing the fluctuation of the value of the correlation coefficient cor(K), the influence of background noise on the target sound and the fluctuation of its characteristics can be estimated without being affected by the target sound or the interfering sound.
[0068] Therefore, in the first embodiment, the correlation and modGI calculation unit 14 calculates a feature amount called modGI (GI: Gradient Index) in order to observe the rise and fall and the sign changes of the value of the correlation coefficient cor(K).
[0069] Here, modGI is an index that measures how often the direction of the slope of a signal waveform changes and how large those changes are (see Patent Document 2). For an arbitrary signal subject to the feature amount calculation, modGI is defined as the power of the second-order difference of that signal normalized by the power of the signal.
[0070] In the first embodiment, the correlation and modGI calculation unit 14 calculates modGI according to the calculation method described in Patent Document 2.
As an example of a calculation formula for modGI as defined above, the correlation and modGI calculation unit 14 uses the following equation (10) to calculate the feature amount cor_modGI(K), which indicates the degree of fluctuation of the correlation coefficient cor(K).
[0071]
[0072] Equation (10) represents how frequently the slope of the correlation coefficient cor(K) changes between positive and negative. The value of cor_modGI decreases as the positive and negative fluctuation of the signal slope decreases, and increases as that fluctuation increases. In other words, the larger the value of cor_modGI, the larger the influence of background noise, and the smaller the value of cor_modGI, the smaller the influence of background noise.
[0073] The WF unit 15 sets the value of the time constant λ, which controls the adaptation speed of the suppression coefficient wf_coef(f, K), based on the value of cor_modGI(K) from the correlation and modGI calculation unit 14, and calculates the suppression coefficient wf_coef(f, K) using this value of the time constant.
[0074] In addition, the WF unit 15 multiplies the frequency domain signal X1(f, K) of the input signal by the suppression coefficient wf_coef(f, K) to calculate the post-suppression signal Y(f, K), and outputs it to the IFFT unit 16.
[0075] FIG. 4 is a block diagram showing the configuration of the WF unit 15 according to the first embodiment.
[0076] As shown in FIG. 4, the WF unit 15 according to the first embodiment includes an input signal acquisition unit 21, a time constant control unit 23, a suppression coefficient adaptation unit 24, a background noise suppression processing unit 25, and a post-suppression signal output unit 26.
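The cor_modGI feature described in paragraphs [0069] to [0072] can be sketched as follows. Equation (10) is not reproduced in this text, so the code follows only the verbal definition in paragraph [0069]: the power of the second-order difference of the signal, normalized by the power of the signal. The window of recent cor values, the function name mod_gi, and the constant eps are assumptions.

```python
def mod_gi(values, eps=1e-12):
    """Second-difference power of a series divided by the power of the series."""
    if len(values) < 3:
        raise ValueError("need at least three values for a second-order difference")
    second_diff_power = sum(
        (values[k] - 2 * values[k - 1] + values[k - 2]) ** 2
        for k in range(2, len(values))
    )
    power = sum(v * v for v in values)
    return second_diff_power / (power + eps)  # eps avoids division by zero


# Hypothetical histories of cor(K) over the last six frames.
steady = mod_gi([0.5, 0.5, 0.5, 0.5, 0.5, 0.5])       # slope never changes
flipping = mod_gi([0.5, -0.5, 0.5, -0.5, 0.5, -0.5])  # slope flips every frame
```

A series whose slope keeps changing sign gives a much larger value than a steady one, which matches the reading in paragraph [0072] that a large cor_modGI indicates a strong influence of background noise.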
[0077] The input signal acquisition unit 21 acquires the frequency domain signal X1(f, K) of the input signal and cor_modGI(K) from the correlation and modGI calculation unit 14.
[0078] The time constant control unit 23 sets the value of the time constant λ, which controls the adaptation speed of the suppression coefficient wf_coef(f, K), based on the value of cor_modGI(K) from the correlation and modGI calculation unit 14.
[0079] Here, the role of the time constant λ will be briefly described. In the WF unit 15, the suppression coefficient adaptation unit 24 described later calculates the suppression coefficient wf_coef(f, K), but before that, the background noise characteristic must be estimated for each frequency. The background noise is estimated, for example, according to Equation 1 of Patent Document 1, and the parameter (time constant) λ enters at this point.
[0080] The time constant λ takes a value from 0.0 to 1.0 and controls how strongly the instantaneous input value is reflected in the background noise characteristic. The larger the value of λ, the stronger the influence of the instantaneous input; the smaller the value, the weaker that influence. Therefore, when λ is large, the input at that moment is strongly reflected in the value of the suppression coefficient wf_coef(f, K) and fast coefficient adaptation can be achieved, but because the influence of the instantaneous input is strong, the coefficient value fluctuates more, which may reduce the naturalness of the sound quality.
On the other hand, when the value of the time constant λ is small, the adaptation speed is slow, but the resulting suppression coefficient wf_coef(f, K) is not strongly influenced by the instantaneous characteristics and reflects the past noise characteristics on average, so the naturalness of the sound quality is less likely to be lost.
[0081] Therefore, when the value of cor_modGI(K) is large (for example, when cor_modGI(K) is greater than or equal to a threshold Θ), the influence of the background noise is large, so the time constant control unit 23 sets the time constant λ to a large value. On the other hand, when the value of cor_modGI(K) is small (for example, when cor_modGI(K) is less than the threshold Θ), the time constant control unit 23 sets the time constant λ to a small value. This makes it possible to adapt the coefficient according to the characteristics of the background noise without being influenced by the target sound or the interference sound.
[0082] Although the case of a single threshold Θ for determining the value of the time constant λ has been described here, two or more thresholds may be set, and the time constant λ may be set more finely for each interval to which cor_modGI(K) belongs.
[0083] The suppression coefficient adaptation unit 24 uses the time constant λ set by the time constant control unit 23 to calculate the suppression coefficient wf_coef(f, K). The suppression coefficient wf_coef(f, K) can be obtained, for example, using Equation 3 of Patent Document 1.
[0084] The background noise suppression processing unit 25 multiplies the frequency domain signal X1(f, K) of the input signal by the suppression coefficient wf_coef(f, K) calculated by the suppression coefficient adaptation unit 24, using the following equation (11), to calculate the post-suppression signal Y(f, K).
Y(f, K) = X1(f, K) × wf_coef(f, K) ... (11)
[0085] The post-suppression signal output unit 26 outputs the post-suppression signal Y(f, K) to the IFFT unit 16.
[0086] The IFFT unit 16 converts the frequency domain signal Y(f, K) into a time domain signal y(n). Note that the IFFT unit 16 may be omitted if the subsequent-stage circuit can process the frequency domain signal Y(f, K) as it is.
[0087] (A-2) Operation of First Embodiment Next, the operation of the non-target sound suppression processing in the non-target sound suppression device 1 according to the first embodiment will be described in detail with reference to the drawings.
[0088] First, input signals s1(n) and s2(n) for one frame (one processing unit) are supplied from the microphones m_1 and m_2 to the FFT unit 11 via an AD converter (not shown). The FFT unit 11 applies a Fourier transform to the analysis frames FRAME1(K) and FRAME2(K), which are based on the input signals s1(n) and s2(n) for one frame, and obtains the signals X1(f, K) and X2(f, K) expressed in the frequency domain. The signals X1(f, K) and X2(f, K) generated by the FFT unit 11 are supplied to the front suppression signal generation unit 12 and the coherence calculation unit 13.
[0089] The front suppression signal generation unit 12 calculates the front suppression signal N(f, K) based on the signals X1(f, K) and X2(f, K) from the FFT unit 11. The front suppression signal generation unit 12 then calculates the average front suppression signal AVE_N(K) from the front suppression signal N(f, K) and supplies it to the correlation and modGI calculation unit 14.
[0090] The coherence calculation unit 13 calculates the coherence COH(K) based on the signals X1(f, K) and X2(f, K) from the FFT unit 11 and supplies it to the correlation and modGI calculation unit 14.
[0091] The correlation and modGI calculation unit 14 calculates the correlation coefficient cor(K), a feature amount indicating the relationship between the average front suppression signal AVE_N(K) and the coherence COH(K), using, for example, equation (9).
[0092] In addition, the correlation and modGI calculation unit 14 uses the correlation coefficient cor(K) to calculate cor_modGI(K), a feature amount representing the magnitude of the positive and negative fluctuation of the slope of the amplitude of the correlation coefficient cor(K), and supplies this cor_modGI(K) to the WF unit 15.
[0093] The WF unit 15 receives cor_modGI(K) from the correlation and modGI calculation unit 14 and the frequency domain signal X1(f, K) of the input signal.
[0094] FIG. 5 is a flowchart showing processing in the time constant control unit 23 of the WF unit 15 according to the first embodiment.
[0095] First, the time constant control unit 23 compares the value of cor_modGI(K) from the correlation and modGI calculation unit 14 with the threshold Θ (S101). When the value of cor_modGI(K) is greater than or equal to the threshold Θ, it sets the time constant λ to a large value (S102); when the value of cor_modGI(K) is less than the threshold Θ, it sets the time constant λ to a small value (S103).
[0096] The time constant λ takes a value in the range 0.0 < λ < 1.0. As λ approaches 1.0, the signal input at that moment has a stronger influence; as λ approaches 0.0, its influence becomes weaker. The value of the time constant λ therefore only needs to be set relatively larger or smaller according to the result of comparing cor_modGI(K) with the threshold Θ.
That is, if the value of the time constant λ when cor_modGI(K) is less than the threshold Θ is λ1, and the value when cor_modGI(K) is greater than or equal to the threshold Θ is λ2, it is sufficient that the relationship λ1 < λ2 holds.
[0097] Then, the suppression coefficient adaptation unit 24 uses the time constant λ set by the time constant control unit 23 to calculate the suppression coefficient wf_coef(f, K).
[0098] That is, the larger the value of the time constant λ, the more strongly the instantaneous input is reflected, so a fast-adapting suppression coefficient wf_coef(f, K) can be calculated. On the other hand, when the value of the time constant λ is small, the influence of the instantaneous input is weaker; the adaptation of the suppression coefficient wf_coef(f, K) is slower, but the resulting coefficient is less affected by instantaneous characteristics and reflects the past noise characteristics on average. In this case, therefore, the naturalness of the sound quality is less likely to be lost.
[0099] In addition, the background noise suppression processing unit 25 multiplies the frequency domain signal X1(f, K) of the input signal by the suppression coefficient wf_coef(f, K) calculated by the suppression coefficient adaptation unit 24, using equation (11), to calculate the post-suppression signal Y(f, K), and the post-suppression signal output unit 26 outputs the post-suppression signal Y(f, K) to the IFFT unit 16.
[0100] The IFFT unit 16 converts the frequency domain signal Y(f, K) into a time domain signal y(n) and outputs the time domain signal y(n) to the speech processing device 2 in the subsequent stage.
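The per-frame flow of the WF unit 15 described above (threshold comparison, choice of λ, noise estimate update, suppression coefficient, and the multiplication of equation (11)) can be sketched in Python as follows. Equations 1 and 3 of Patent Document 1 are not reproduced in this text, so the first-order recursive noise update and the floored gain used here are stand-in assumptions, and all function names, the values of λ1 and λ2, and the `floor` parameter are hypothetical.

```python
def select_time_constant(cor_modgi, threshold, lam_small=0.05, lam_large=0.5):
    """Time constant control: lambda2 (large) at or above the threshold,
    lambda1 (small) below it. The only requirement is lambda1 < lambda2."""
    if not 0.0 < lam_small < lam_large < 1.0:
        raise ValueError("expected 0 < lam_small < lam_large < 1")
    return lam_large if cor_modgi >= threshold else lam_small


def update_noise_power(noise_power, frame, lam):
    """Assumed recursive estimate: larger lam weights the current frame more."""
    return [(1.0 - lam) * n + lam * abs(x) ** 2
            for n, x in zip(noise_power, frame)]


def suppression_coefficient(frame, noise_power, floor=0.05):
    """Assumed Wiener-like gain per frequency bin, clipped to a spectral floor."""
    gains = []
    for x, n in zip(frame, noise_power):
        power = abs(x) ** 2
        gain = 1.0 - n / power if power > 0.0 else floor
        gains.append(max(floor, gain))
    return gains


def apply_suppression(frame, gains):
    """Equation (11): Y(f, K) = X1(f, K) * wf_coef(f, K)."""
    return [x * g for x, g in zip(frame, gains)]


# One frame with two frequency bins (made-up values).
x1 = [2 + 0j, 3 + 4j]
noise = [1.0, 5.0]
lam = select_time_constant(cor_modgi=0.9, threshold=0.5)
noise = update_noise_power(noise, x1, lam)
y = apply_suppression(x1, suppression_coefficient(x1, noise))
```

A practical noise estimator would normally avoid updating on frames dominated by speech; that refinement is omitted here to keep the role of λ visible.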
[0101] (A-3) Effects of the First Embodiment As described above, according to the first embodiment, the time constant of the Wiener filter (WF) can be controlled based on the characteristic behavior that the modGI of the correlation between the front suppression signal and the coherence becomes larger as the influence of the background noise increases and smaller as that influence decreases. This enables appropriate coefficient adaptation according to the influence of the background noise and can improve the accuracy of the background noise suppression processing.
[0102] As a result, an improvement in performance can be expected when the present invention is applied to a communication apparatus such as a video conference system or a mobile telephone, or to the preprocessing of a speech recognition function.
[0103] (B) Second Embodiment Next, a second embodiment of the non-target sound suppression device, method, and program according to the present invention will be described with reference to the drawings.
[0104] The second embodiment illustrates a case in which the present invention is applied to a non-target sound suppression device and method (an interference sound suppression device and method) that reduce interference sound arriving from the surroundings by subtracting the front suppression signal from the input signal.
[0105] When the front suppression signal is subtracted from the input signal, it is usually first multiplied by a subtraction coefficient that controls the strength of the subtraction. If the subtraction coefficient is too large, the suppression is excessive and distortion of the target speech increases; if it is too small, suppression of the interfering speech is insufficient. The subtraction coefficient thus greatly affects the sound quality. However, it is difficult to determine whether interfering speech is superimposed on the target speech, and therefore difficult to set the subtraction coefficient to an appropriate value.
[0106] Therefore, the second embodiment describes a non-target sound suppression device and method (an interference sound suppression device and method) that estimate the degree to which interference sound contributes to the input signal, control the subtraction coefficient of the frequency subtraction according to the result, and suppress the interference sound without excess or deficiency.
[0107] (B-1) Configuration of Second Embodiment FIG. 6 is a block diagram showing the overall configuration of a non-target sound suppression device 1A according to the second embodiment.
[0108] The non-target sound suppression device 1A according to the second embodiment acquires the input signals s1(n) and s2(n) from a plurality of microphones (two are shown in FIG. 1: the microphones m_1 and m_2), estimates the degree to which interference sound contributes to the input signal, and controls the subtraction coefficient of the frequency subtraction according to the result. The post-suppression signal, in which the interference sound has been suppressed, is supplied to the speech processing device 2 in the subsequent stage.
[0109] As in the first embodiment, the speech processing device 2 performs predetermined speech processing using the post-suppression signal from the non-target sound suppression device 1A.
[0110] As shown in FIG. 6, the non-target sound suppression device 1A includes an FFT unit 11, a front suppression signal generation unit 12, a coherence calculation unit 13, a correlation calculation unit 54, a frequency subtraction processing unit 55, and an IFFT unit 16.
[0111] The FFT unit 11, the front suppression signal generation unit 12, the coherence calculation unit 13, and the IFFT unit 16 are basically the same as or correspond to the components described in the first embodiment, and a detailed description of them is omitted.
[0112] The non-target sound suppression device 1A may be realized by installing a program (for example, a non-target sound suppression program) on a computer having a processor, a memory, and the like; in this case as well, the non-target sound suppression device 1A can be represented functionally by its block diagram. A part or all of the non-target sound suppression device 1A may also be realized as hardware.
[0113] The correlation calculation unit 54 acquires the front suppression signal (the average front suppression signal AVE_N(K)) from the front suppression signal generation unit 12 and the coherence COH(K) from the coherence calculation unit 13, and calculates the correlation coefficient cor(K) between the average front suppression signal AVE_N(K) and the coherence COH(K). The correlation calculation unit 54 then outputs the calculated correlation coefficient cor(K) to the frequency subtraction processing unit 55. The correlation coefficient cor(K) can be calculated by the same method as in the first embodiment; for example, equation (9) can be used.
[0114] The frequency subtraction processing unit 55 acquires the input signal X1(f, K), the correlation coefficient cor(K) from the correlation calculation unit 54, and the front suppression signal N(f, K) from the front suppression signal generation unit 12. It sets the subtraction coefficient α based on the correlation coefficient cor(K), multiplies the front suppression signal N(f, K) by the subtraction coefficient α, and subtracts the result from the input signal X1(f, K) to obtain the post-suppression signal Y(f, K).
[0115] FIG. 7 is a block diagram showing the configuration of the frequency subtraction processing unit 55 according to the second embodiment.
[0116] As shown in FIG.
7, the frequency subtraction processing unit 55 includes an input signal acquisition unit 31, a subtraction coefficient control unit 32, a subtraction unit 33, and a post-subtraction signal output unit 34.
[0117] The input signal acquisition unit 31 acquires the input signal X1(f, K), the correlation coefficient cor(K) from the correlation calculation unit 54, and the front suppression signal N(f, K) from the front suppression signal generation unit 12.
[0118] The subtraction coefficient control unit 32 sets the subtraction coefficient α based on the correlation coefficient cor(K).
[0119] Here, the principle for estimating the degree of contribution of the interference sound (here, interfering speech) is described. First, it is assumed that the target sound arrives from the front of the microphones m_1 and m_2 and that the interference sound arrives from the lateral directions (right or left) of the microphones m_1 and m_2.
[0120] Under this assumption, when no interference sound is present and only the target sound is present, the front suppression signal N(f, K) captures the signal component arriving from the front and therefore has a value proportional to the magnitude of the target sound component. However, as shown in FIG. 2, the pickup level in the front direction is lower than in the lateral directions, so this value is smaller than when interference sound is present.
[0121] The coherence COH is a feature amount closely related to the arrival direction of the input signal. It therefore has a large value when there is no interference sound and only the target sound is present, and a small value when interference sound is present.
[0122] The above behavior can be summarized as follows, focusing on the presence or absence of the interference sound.
[0123] When no interference sound is present and only the target sound is present, the coherence COH has a large value, and the front suppression signal has a value proportional to the magnitude of the target sound component.
[0124] When interference sound is present, the coherence COH has a small value and the front suppression signal has a large value.
[0125] Expressed in terms of the correlation coefficient cor(K) between the front suppression signal N(f, K) and the coherence COH, this behavior is as follows.
[0126] The correlation coefficient cor(K) has a positive value when no interference sound is present.
[0127] The correlation coefficient cor(K) has a negative value when interference sound is present.
[0128] From the viewpoint of avoiding both excessive and insufficient suppression of the interference sound, it is desirable for the subtraction coefficient α to take a smaller value as the influence of the interference sound becomes smaller and a larger value as that influence becomes larger (see the equation described later).
[0129] As described above, the sign of the correlation coefficient cor(K) changes depending on whether interference sound is present. Therefore, by decreasing the subtraction coefficient α when the correlation coefficient cor(K) is positive and increasing it when the correlation coefficient cor(K) is negative, the subtraction coefficient can be controlled according to the degree of influence of the interference sound.
[0130] Therefore, in the second embodiment, the subtraction coefficient control unit 32 controls the subtraction coefficient used in the frequency subtraction processing based on this characteristic behavior of the correlation coefficient cor(K) between the front suppression signal N(f, K) and the coherence COH.
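The sign behavior summarized in paragraphs [0123] to [0127] can be illustrated with a small worked example in Python. Equation (9) is not reproduced in this part of the text, so an ordinary Pearson correlation over a short window of frames is assumed here; the function name `correlation_coefficient` and all numeric values are invented purely for illustration and are not measurements.

```python
import math


def correlation_coefficient(ave_n, coh):
    """Assumed form of cor(K): Pearson correlation between recent values of
    AVE_N(K) and COH(K). Returns 0.0 if either series has no variation."""
    mean_n = sum(ave_n) / len(ave_n)
    mean_c = sum(coh) / len(coh)
    dev_n = [v - mean_n for v in ave_n]
    dev_c = [v - mean_c for v in coh]
    denom = math.sqrt(sum(d * d for d in dev_n) * sum(d * d for d in dev_c))
    if denom == 0.0:
        return 0.0
    return sum(a * b for a, b in zip(dev_n, dev_c)) / denom


# Target sound only: louder target frames raise both COH and AVE_N.
coh_target_only = [0.60, 0.80, 0.90, 0.70, 0.85]
ave_n_target_only = [0.10, 0.20, 0.25, 0.15, 0.22]

# Interference present in some frames: COH drops while AVE_N rises.
coh_with_interference = [0.80, 0.40, 0.30, 0.70, 0.35]
ave_n_with_interference = [0.20, 0.90, 1.10, 0.30, 1.00]

cor_target_only = correlation_coefficient(ave_n_target_only, coh_target_only)
cor_with_interference = correlation_coefficient(
    ave_n_with_interference, coh_with_interference)
```

With these toy series the first coefficient is positive and the second is negative, which is the distinction the subtraction coefficient control unit 32 relies on.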
[0131] More specifically, the subtraction coefficient control unit 32 sets the subtraction coefficient α to a large value to strengthen the suppression effect when interfering speech is present, and sets the subtraction coefficient α to a small value to weaken the suppression effect when no interference sound is present.
[0132] The subtraction coefficient control unit 32 may, for example, include a subtraction coefficient storage unit (not shown) that records the correspondence between values of the correlation coefficient and setting values of the subtraction coefficient α, and may set the subtraction coefficient α by referring to this storage unit.
[0133] The subtraction unit 33 performs the subtraction shown in equation (12) using the subtraction coefficient α obtained from the subtraction coefficient control unit 32.
Y(f, K) = X1(f, K) - α × N(f, K) ... (12)
[0134] The post-subtraction signal output unit 34 outputs the post-suppression signal (post-subtraction signal) Y(f, K) calculated by the subtraction unit 33 to the IFFT unit 16.
[0135] (B-2) Operation of Second Embodiment Next, the operation of the non-target sound suppression processing in the non-target sound suppression device 1A according to the second embodiment will be described in detail with reference to the drawings.
[0136] Input signals s1(n) and s2(n) for one frame (one processing unit) are supplied from the microphones m_1 and m_2 to the FFT unit 11 via an AD converter (not shown). The FFT unit 11 applies a Fourier transform to the analysis frames FRAME1(K) and FRAME2(K), which are based on the input signals s1(n) and s2(n) for one frame, and obtains the signals X1(f, K) and X2(f, K) expressed in the frequency domain. The signals X1(f, K) and X2(f, K) generated by the FFT unit 11 are supplied to the front suppression signal generation unit 12 and the coherence calculation unit 13.
[0137] The front suppression signal generation unit 12 calculates the front suppression signal N(f, K) based on the signals X1(f, K) and X2(f, K) from the FFT unit 11. The front suppression signal generation unit 12 then calculates the average front suppression signal AVE_N(K) from the front suppression signal N(f, K) and supplies it to the correlation calculation unit 54.
[0138] The coherence calculation unit 13 calculates the coherence COH(K) based on the signals X1(f, K) and X2(f, K) from the FFT unit 11 and supplies it to the correlation calculation unit 54.
[0139] The correlation calculation unit 54 calculates the correlation coefficient cor(K), a feature amount indicating the relationship between the average front suppression signal AVE_N(K) and the coherence COH(K), using, for example, equation (9).
[0140] The frequency subtraction processing unit 55 receives the input signal X1(f, K), the correlation coefficient cor(K) from the correlation calculation unit 54, and the front suppression signal N(f, K) from the front suppression signal generation unit 12.
[0141] FIG. 8 is a flowchart showing processing in the subtraction coefficient control unit 32 of the frequency subtraction processing unit 55 according to the second embodiment.
[0142] First, the subtraction coefficient control unit 32 determines whether the value of the correlation coefficient cor(K) from the correlation calculation unit 54 is negative (S201). When the value of the correlation coefficient cor(K) is negative (that is, when interfering speech is present), it sets the subtraction coefficient α to a large value to strengthen the suppression effect (S202). On the other hand, when the value of the correlation coefficient cor(K) is not negative (that is, when no interfering speech is present), it sets the subtraction coefficient α to a small value to weaken the suppression effect.
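The sign-based control of the subtraction coefficient in paragraph [0142], followed by the subtraction of equation (12), can be sketched in Python as below. The function names and the two coefficient values `alpha_small` and `alpha_large` are hypothetical; the text only requires that the coefficient be large when cor(K) is negative and small otherwise.

```python
def select_subtraction_coefficient(cor, alpha_small=0.2, alpha_large=1.0):
    """S201/S202: a negative cor(K) is taken to mean interfering speech is
    present, so the larger coefficient is chosen. Values are illustrative."""
    return alpha_large if cor < 0.0 else alpha_small


def frequency_subtract(x1, n, alpha):
    """Equation (12): Y(f, K) = X1(f, K) - alpha * N(f, K), per frequency bin."""
    return [x - alpha * v for x, v in zip(x1, n)]


# One frame with two frequency bins (made-up values).
x1 = [1 + 1j, 2 + 0j]
n = [0.5 + 0.5j, 1 + 0j]

alpha = select_subtraction_coefficient(cor=-0.3)
y = frequency_subtract(x1, n, alpha)
```

A table lookup such as the subtraction coefficient storage unit mentioned in paragraph [0132] could replace the two fixed values without changing the structure of this sketch.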
[0143] Then, using the subtraction coefficient α obtained by the subtraction coefficient control unit 32, the subtraction unit 33 obtains the post-subtraction signal Y(f, K) by equation (12), and the post-subtraction signal output unit 34 outputs the post-suppression signal (post-subtraction signal) Y(f, K) to the IFFT unit 16.
[0144] The IFFT unit 16 converts the frequency domain signal Y(f, K) into a time domain signal y(n) and outputs the time domain signal y(n) to the speech processing device 2 in the subsequent stage.
[0145] (B-3) Effects of Second Embodiment As described above, according to the second embodiment, the presence of interfering speech superimposed on the target speech is detected based on the characteristic behavior that the correlation coefficient between the front suppression signal and the coherence is negative when interfering speech is present and positive when it is not, and the result is used to control the subtraction coefficient used in the frequency subtraction processing. This can improve the accuracy of the interfering speech suppression processing.
[0146] As a result, an improvement in performance can be expected when the present invention is applied to a communication apparatus such as a video conference system or a mobile telephone, or to the preprocessing of a speech recognition function.
[0147] (C) Other Embodiments Although various modified embodiments have been mentioned in the first and second embodiments described above, the present invention can also be applied to the following modified embodiments.
[0148] (C-1) In the first or second embodiment described above, the suppression coefficient or the subtraction coefficient may be calculated for each frequency bin. This can be realized by also calculating the correlation coefficient for each frequency bin.
[0149] (C-2) In the second embodiment, the presence or absence of interference sound is determined from the sign of the correlation coefficient, but the magnitude of the influence of the interference sound can also be determined from the absolute value of the correlation coefficient. If the correlation coefficient is negative and its absolute value is small, the influence of the interference sound is small; if the correlation coefficient is negative and its absolute value is large, the influence of the interference sound is large. Therefore, an arbitrary function whose output is small for a small input and large for a large input (for example, a quadratic function) may be prepared, and the value obtained by inputting the absolute value of the correlation coefficient into this function may be set as the subtraction coefficient. In this way, the subtraction coefficient can be set according to the degree of influence of the interference sound (the magnitude of the absolute value of the correlation coefficient).
[0150] 1, 1A: non-target sound suppression device; 11: FFT unit; 12: front suppression signal generation unit; 13: coherence calculation unit; 14: correlation and modGI calculation unit; 15: WF (Wiener filter) unit; 54: correlation calculation unit; 55: frequency subtraction processing unit; 16: IFFT unit.
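Modification (C-2) in paragraph [0149] can be sketched in Python as a quadratic mapping from the absolute value of a negative correlation coefficient to the subtraction coefficient. The function name, the bounds `alpha_min` and `alpha_max`, and the choice to return the minimum coefficient when the correlation coefficient is not negative are assumptions made for illustration; the text only asks for a function that grows with its input.

```python
def subtraction_coefficient_from_magnitude(cor, alpha_min=0.2, alpha_max=1.0):
    """Hypothetical (C-2) mapping: for negative cor(K), alpha grows
    quadratically with |cor(K)| from alpha_min up to alpha_max."""
    if cor >= 0.0:
        return alpha_min  # assumed handling when no interference is indicated
    magnitude = min(abs(cor), 1.0)  # a correlation magnitude is at most 1
    return alpha_min + (alpha_max - alpha_min) * magnitude * magnitude


weak = subtraction_coefficient_from_magnitude(-0.3)    # mild interference
strong = subtraction_coefficient_from_magnitude(-0.9)  # strong interference
```

Any other monotonically increasing function could be substituted for the quadratic term; the quadratic form simply keeps the coefficient close to its minimum while the correlation is only weakly negative.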
