JP2018142826

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2018142826
Abstract: To improve the sound quality of the target sound when suppressing or subtracting a
non-target sound from an input signal, and to control the suppression coefficient or subtraction
coefficient with a low processing load. A non-target sound suppression device according to the
present invention comprises: a front suppression signal generation unit that generates a front
suppression signal having a blind spot toward the front, based on the difference between a
plurality of frequency-domain input signals obtained by converting each input signal from the
time domain to the frequency domain; a coherence calculation unit that calculates coherence
based on signals obtained from the plurality of input signals; a feature amount calculation unit
that calculates a feature amount indicating the relationship between the front suppression signal
and the coherence; and a non-target sound suppression processing unit that sets a coefficient
relating to suppression of the non-target sound included in the input signal, using the feature
amount indicating the relationship between the front suppression signal and the coherence, and
obtains a post-suppression signal in which the non-target sound included in the input signal is
suppressed. [Selected figure] Figure 1
Non-target sound suppression device, method and program
[0001]
The present invention relates to a non-target sound suppression apparatus, method, and program,
and can be applied to, for example, acoustic signal processing used in communication apparatuses
or communication software that handle speech, such as telephones and teleconference systems, or
in preprocessing for speech recognition.
[0002]
03-05-2019
1
In recent years, devices equipped with various voice processing functions such as a voice call
function and a voice recognition function, such as smartphones and car navigation systems, have
become widespread.
However, with the widespread use of these devices, voice processing functions are being used
under harsher noise environments than before, such as in a busy city or in a moving car.
Therefore, there is a growing demand for signal processing technology that can maintain call
sound quality and speech recognition performance even in a noisy environment.
[0003]
Noises that interfere with the performance of voice processing functions can be divided into
background noise, such as the bustle of a crowded street or the running noise of cars, and
interference sounds (for example, interfering voices such as the voices of people other than the
user of the voice processing function). For background noise, various effective suppression
methods have been proposed on the premise that it has stationary frequency characteristics and
power (see Patent Documents 1 to 3 and Non-patent Document 1).
[0004]
Patent Document 1: JP-A-2010-532879
Patent Document 2: JP-A-2014-106337
Patent Document 3: JP-A-2014-164191
[0005]
Non-patent Document 1: Hiraoka Kazuyuki, Hori Gen, "Probability Statistics for Programming",
Ohmsha, published on October 23, 2009
[0006]
However, as described above, with the rapid expansion of the environments in which audio signal
processing functions are used, cases in which the background noise is not stationary are also
increasing. There is therefore a need for a background noise suppression method that can quickly
follow fluctuations in the characteristics of background noise. However, when background noise is
suppressed in a signal section in which an interference sound is present, signal components of
the target sound may also be removed, and the sound quality may deteriorate.
[0007]
Further, Patent Document 3 discloses a technique for suppressing interference sounds arriving from
the surroundings by subtracting, from the input signal, a signal in which the component arriving
from the front has been suppressed (referred to as a front suppression signal). In this
subtraction, the strength of the subtraction is often controlled by multiplying the front
suppression signal by a subtraction coefficient. When the coefficient is too large, suppression is
excessive and distortion of the target sound increases; when it is too small, suppression of the
interference sound is insufficient. The coefficient therefore greatly affects the sound quality.
However, it is difficult to determine the presence of an interference sound superimposed on the
target sound, and hence difficult to set the subtraction coefficient to an appropriate value.
[0008]
Therefore, in view of the above problems, there is a need for a non-target sound suppression
device, method, and program that, when suppressing or subtracting a non-target sound from an input
signal, can improve the sound quality of the target sound and control the suppression coefficient
or the subtraction coefficient with a low processing load.
[0009]
In order to solve these problems, a non-target sound suppression apparatus according to a first
aspect of the present invention comprises:
(1) a front suppression signal generation unit that generates a front suppression signal having a
blind spot toward the front, based on the difference between a plurality of frequency-domain input
signals obtained by converting the input signal from each of a plurality of microphones from the
time domain to the frequency domain;
(2) a coherence calculation unit that calculates coherence based on signals obtained from the
plurality of input signals;
(3) a feature amount calculation unit that calculates a feature amount indicating the relationship
between the front suppression signal and the coherence; and
(4) a non-target sound suppression processing unit that uses the feature amount indicating the
relationship between the front suppression signal and the coherence to set a coefficient relating
to suppression of the non-target sound included in the input signal, and uses the coefficient to
obtain a post-suppression signal in which the non-target sound included in the input signal is
suppressed.
[0010]
In a non-target sound suppression method according to a second aspect of the present invention:
(1) a front suppression signal generation unit generates a front suppression signal having a blind
spot toward the front, based on the difference between a plurality of frequency-domain input
signals obtained by converting the input signal from each of a plurality of microphones from the
time domain to the frequency domain;
(2) a coherence calculation unit calculates coherence based on signals obtained from the plurality
of input signals;
(3) a feature amount calculation unit calculates a feature amount indicating the relationship
between the front suppression signal and the coherence; and
(4) a non-target sound suppression processing unit uses the feature amount indicating the
relationship between the front suppression signal and the coherence to set a coefficient relating
to suppression of the non-target sound included in the input signal, and uses the coefficient to
obtain a post-suppression signal in which the non-target sound included in the input signal is
suppressed.
[0011]
A non-target sound suppression program according to a third aspect of the present invention causes
a computer to function as:
(1) a front suppression signal generation unit that generates a front suppression signal having a
blind spot toward the front, based on the difference between a plurality of frequency-domain input
signals obtained by converting the input signal from each of a plurality of microphones from the
time domain to the frequency domain;
(2) a coherence calculation unit that calculates coherence based on signals obtained from the
plurality of input signals;
(3) a feature amount calculation unit that calculates a feature amount indicating the relationship
between the front suppression signal and the coherence; and
(4) a non-target sound suppression processing unit that uses the feature amount indicating the
relationship between the front suppression signal and the coherence to set a coefficient relating
to suppression of the non-target sound included in the input signal, and uses the coefficient to
obtain a post-suppression signal in which the non-target sound included in the input signal is
suppressed.
[0012]
According to the present invention, when suppressing or subtracting a non-target sound from an
input signal, the sound quality of the target sound can be improved and the suppression
coefficient or the subtraction coefficient can be controlled with a low processing load.
[0013]
FIG. 1 is a block diagram showing the overall configuration of a non-target sound suppression
device according to a first embodiment.
FIG. 2 is an explanatory diagram illustrating an example arrangement of microphones according to
the embodiments.
FIG. 3 is a diagram showing the characteristics of the directivity signal applied in the acoustic
signal processing apparatus according to the embodiments.
FIG. 4 is a block diagram showing the configuration of the WF unit according to the first
embodiment.
FIG. 5 is a flowchart showing the processing in the time constant control unit of the WF unit
according to the first embodiment.
FIG. 6 is a block diagram showing the overall configuration of a non-target sound suppression
device according to a second embodiment.
FIG. 7 is a block diagram showing the configuration of the frequency subtraction processing unit
according to the second embodiment.
FIG. 8 is a flowchart showing the processing in the time constant control unit 23 of the frequency
subtraction processing unit according to the second embodiment.
[0014]
(A) First Embodiment
Hereinafter, a first embodiment of the non-target sound suppression apparatus, method, and program
according to the present invention will be described in detail with reference to the drawings.
[0015]
In the first embodiment, the present invention (non-target sound suppression apparatus and
method) is used to realize a background noise suppression apparatus and method that quickly
follow fluctuations in the characteristics of background noise, which has become increasingly
non-stationary with the rapid expansion of the environments in which audio signal processing
functions are used.
[0016]
Here, when the background noise suppression function is used in an environment where interference
sounds occur in the surroundings, the coefficient adaptation operation may be erroneously
performed in a signal section in which an interference sound is present.
In that case, the characteristics of the human voice that constitutes the interference sound are
also reflected in the background noise suppression coefficient (hereinafter referred to as the
"suppression coefficient").
When suppression processing is performed using such a coefficient, signal components of the target
sound are also removed, and the sound quality may be degraded.
[0017]
Therefore, to prevent the above phenomenon, the first embodiment describes a non-target sound
suppression apparatus and method that continuously monitor variations in background noise while
suppressing the influence of the target sound and the interference sound, and that can control the
adaptive operation of the background noise suppression coefficient based on the result.
[0018]
(A-1) Configuration of First Embodiment
FIG. 1 is a block diagram showing the overall configuration of a non-target sound suppression
device 1 according to the first embodiment.
[0019]
As shown in FIG. 1, the non-target sound suppression device 1 receives input signals s1(n) and
s2(n) from a plurality of (two in FIG. 1) microphones m_1 and m_2.
Here, n is an index indicating the input order of the samples and is a positive integer. In the
following, a smaller n denotes an older input sample and a larger n denotes a newer input sample.
[0020]
Based on the input signals obtained from the microphones m_1 and m_2, the non-target sound
suppression device 1 sets parameters (variables) for suppressing background noise so as to follow
variations in the characteristics of the background noise, and suppresses the background noise.
The post-suppression signal is supplied to the voice processing device 2 in the subsequent stage.
[0021]
The voice processing device 2 performs predetermined voice processing using the post-suppression
signal from the non-target sound suppression device 1.
The processing performed by the voice processing device 2 is not particularly limited, and various
kinds of processing can be applied. For example, it may perform voice communication processing or
voice recognition processing in a telephone terminal or a video conference system. The non-target
sound suppression device 1 and the voice processing device 2 need only be connected so that
signals can be exchanged between them: they may be connected by circuit wiring, or may exchange
signals by network communication over, for example, a wired or wireless line.
[0022]
FIG. 2 is an explanatory view for explaining an arrangement example of the microphones m_1
and m_2.
[0023]
As shown in FIG. 2, the microphones m_1 and m_2 are assumed to be arranged such that the plane
containing the two microphones is perpendicular to the direction from which the target sound
arrives (the direction of the sound source of the target sound).
In the following, as shown in FIG. 2, the arrival direction of the target sound as viewed from the
position between the two microphones m_1 and m_2 is referred to as the forward direction or the
front direction. Likewise, the terms rightward, leftward, and backward refer to the respective
directions when facing the arrival direction of the target sound from the position between the two
microphones m_1 and m_2. In this embodiment, it is assumed that the target sound arrives from the
front of the microphones m_1 and m_2 and that non-target sound, including the interference sound,
arrives from the left and right (lateral) directions.
[0024]
As shown in FIG. 1, the non-target sound suppression device 1 includes an FFT unit 11, a front
suppression signal generation unit 12, a coherence calculation unit 13, a correlation and modGI
calculation unit 14, a WF (Wiener filter) unit 15, and an IFFT unit 16.
[0025]
The non-target sound suppression device 1 may be realized by installing a program (for example,
a non-target sound suppression program) on a computer having a processor, a memory, and the like.
In this case as well, the functional configuration of the non-target sound suppression device 1
can be represented by FIG. 1.
Note that part or all of the non-target sound suppression device 1 may be realized as hardware.
[0026]
The FFT unit 11 receives the input signals s1 and s2 from the microphones m_1 and m_2 through
AD converters (not shown) and performs a fast Fourier transform (or discrete Fourier transform)
on them. The input signals s1 and s2 are thereby represented in the frequency domain.
[0027]
Note that, in performing the fast Fourier transform, the FFT unit 11 constructs analysis frames
FRAME1(K) and FRAME2(K), each consisting of a predetermined N samples (N is an arbitrary integer),
from the input signals s1(n) and s2(n). An example of constructing FRAME1 from the input signal s1
is shown in the following equation (1).
[0028]
In the equation (1), K is an index representing the order of frames, and is represented by a
positive integer. In the following, the smaller the value of K, the older the analysis frame, and the
larger the value of K, the newer the analysis frame. Further, in the following description, it is
assumed that the index representing the latest analysis frame to be analyzed is K unless
otherwise specified.
[0029]
[0030]
The FFT unit 11 performs fast Fourier transform processing on each analysis frame. It supplies the
frequency-domain signal X1(f, K), obtained by Fourier transforming the analysis frame FRAME1(K)
constructed from the input signal s1, and the frequency-domain signal X2(f, K), obtained by
Fourier transforming the analysis frame FRAME2(K) constructed from the input signal s2, to the
front suppression signal generation unit 12 and the coherence calculation unit 13.
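The framing and transform steps described above can be sketched as follows. This is only an
illustration: the function names make_frames and dft are hypothetical, the frames are assumed to
be consecutive and non-overlapping (equation (1) is not reproduced in this text), and a naive
discrete Fourier transform stands in for the FFT for readability.

```python
import cmath

def make_frames(samples, frame_len):
    """Split a sample sequence into consecutive analysis frames of frame_len samples.

    Assumption: frames do not overlap; the source does not state the framing rule.
    """
    n_frames = len(samples) // frame_len
    return [samples[k * frame_len:(k + 1) * frame_len] for k in range(n_frames)]

def dft(frame):
    """Naive discrete Fourier transform of one analysis frame (O(N^2), for clarity)."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * cmath.pi * f * t / n) for t in range(n))
        for f in range(n)
    ]

# Toy input signal s1(n): a sinusoid with a period of four samples.
s1 = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]
frames = make_frames(s1, 4)              # FRAME1(K) for K = 0, 1
X1 = [dft(frame) for frame in frames]    # X1(f, K): one complex spectrum per frame
```

Each entry of X1 is a list of complex spectral components, which matches the description of
X1(f, K) as a set of complex values rather than a single value.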
[0031]
Here, f is an index representing a frequency.
Also, the frequency-domain signal X1(f, K) is not a single value but, as in equation (2), consists
of spectral components at a plurality of (m, where m is an arbitrary integer) frequencies f1 to
fm.
[0032]
[0033]
In the above equation (2), X1 (f, K) is a complex number, and is composed of a real part and an
imaginary part.
The same applies to X2(f, K) and to the front suppression signal N(f, K) described later in
connection with the front suppression signal generation unit 12.
[0034]
The front suppression signal generation unit 12 performs processing of suppressing the signal
component in the front direction for each frequency with respect to the signal supplied from the
FFT unit 11.
In other words, the front suppression signal generation unit 12 functions as a directional filter
that suppresses components in the front direction.
[0035]
For example, as illustrated in FIG. 3, the front suppression signal generation unit 12 forms a
figure-8 bidirectional filter having a blind spot in the front direction, that is, a directional
filter that suppresses the front-direction component of the signal supplied from the FFT unit 11.
[0036]
Specifically, the front suppression signal generation unit 12 performs the calculation shown in
the following equation (3) based on the signals X1(f, K) and X2(f, K) supplied from the FFT unit
11, and generates a front suppression signal N(f, K) for each frequency.
The calculation of equation (3) corresponds to forming a figure-8 bidirectional filter having a
blind spot in the front direction, as shown in FIG. 3.
N(f,K)=X1(f,K)−X2(f,K) …(3)
[0037]
As described above, the front suppression signal generation unit 12 acquires each frequency
component of the frequencies f1 to fm (power for one frame of each frequency band).
[0038]
Further, the front suppression signal generation unit 12 calculates, according to equation (4), an
average front suppression signal AVE_N(K) obtained by averaging the front suppression signal
N(f, K) over all the frequencies f1 to fm.
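A minimal sketch of the front suppression signal of equation (3) and its frequency average is
given below. Equation (4) is not reproduced in this text, so the averaging rule used here (the
mean of the magnitudes |N(f, K)|) is an assumption, and the function names are hypothetical.

```python
import cmath

def front_suppression(X1, X2):
    """Equation (3): per-frequency difference N(f, K) = X1(f, K) - X2(f, K)."""
    return [a - b for a, b in zip(X1, X2)]

def average_front_suppression(N):
    """Assumed form of AVE_N(K): mean magnitude of N(f, K) over all frequency bins."""
    return sum(abs(v) for v in N) / len(N)

# Front source: both microphones observe the same spectrum, so the difference cancels.
X_front = [1 + 0j, 0.5j, -0.3 + 0.2j]
N_front = front_suppression(X_front, X_front)

# Lateral source: microphone 2 observes a phase-shifted copy, so the difference survives.
phase = [cmath.exp(-1j * 0.5 * (f + 1)) for f in range(3)]
X2_side = [x * p for x, p in zip(X_front, phase)]
N_side = front_suppression(X_front, X2_side)

ave_front = average_front_suppression(N_front)
ave_side = average_front_suppression(N_side)
```

The front-arriving component vanishes in the difference while the lateral component does not,
which is the blind spot toward the front that the text attributes to the figure-8 directivity.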
[0039]
[0040]
The coherence calculation unit 13 forms signals having strong directivity in specific directions
from the frequency-domain signals X1(f, K) and X2(f, K) supplied by the FFT unit 11, and
calculates the coherence COH(K) from them.
[0041]
Here, the process of calculating the coherence COH (K) in the coherence calculation unit 13 will
be described.
[0042]
The coherence calculation unit 13 forms, from the frequency-domain signals X1(f, K) and X2(f, K),
a signal B1(f, K) processed by a filter having strong directivity in a first direction (for
example, the left direction). The coherence calculation unit 13 likewise forms, from the
frequency-domain signals X1(f, K) and X2(f, K), a signal B2(f, K) processed by a filter having
strong directivity in a second direction.
An existing method can be applied to form the signals B1(f) and B2(f) having strong directivity in
specific directions. Here, as an example, the signal B1 having strong directivity in the first
direction is formed by applying the following equation (5), and the signal B2 having strong
directivity in the second direction is formed by applying the following equation (6).
[0043]
[0044]
In the above equations (5) and (6), S represents the sampling frequency, N the FFT analysis frame
length, τ the difference in sound wave arrival time between the microphone m_1 and the microphone
m_2, i the imaginary unit, and f the frequency.
[0045]
Next, the coherence calculation unit 13 applies the operations of the following equations (7) and
(8) to the signals B1(f) and B2(f) obtained as described above, and obtains the coherence COH(K).
Here, B2(f, K)* in equation (7) is the complex conjugate of B2(f, K).
[0046]
[0047]
coef(f, K) is the coherence for the component at an arbitrary frequency f (any of the frequencies
f1 to fm) in the frame with an arbitrary index K (the analysis frames FRAME1(K) and FRAME2(K)).
[0048]
When coef(f, K) is determined, the directivity directions of the signals B1(f) and B2(f) may be
any directions other than the front direction, as long as the directivity of the signal B1(f) and
the directivity of the signal B2(f) differ from each other.
Further, the method of calculating coef(f, K) is not limited to the calculation method described
above.
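The coherence calculation can be sketched as follows. Equations (5) to (8) are not reproduced in
this text, so the forms below are assumptions chosen to be consistent with the symbols S, N, τ, i,
and f defined above and with the complex conjugate of B2 mentioned for equation (7): a
delay-and-subtract pair for B1 and B2, a per-frequency value coef(f) = |B1·B2*| / (0.5·(|B1|² +
|B2|²)), and COH as the mean of coef(f) over frequency. All function names and numeric values are
hypothetical.

```python
import cmath

def delay_factors(bins, S, N, tau):
    """Phase factor for a delay of tau seconds at each FFT bin index (assumed form)."""
    return [cmath.exp(-2j * cmath.pi * f * S * tau / N) for f in bins]

def directional_pair(X1, X2, delays):
    """Assumed delay-and-subtract beams: B1 and B2 each cancel sound from one side."""
    B1 = [x1 - x2 * d for x1, x2, d in zip(X1, X2, delays)]
    B2 = [x2 - x1 * d for x1, x2, d in zip(X1, X2, delays)]
    return B1, B2

def coherence(B1, B2):
    """Mean over bins of coef(f) = |B1 * conj(B2)| / (0.5 * (|B1|^2 + |B2|^2))."""
    coefs = []
    for b1, b2 in zip(B1, B2):
        denom = 0.5 * (abs(b1) ** 2 + abs(b2) ** 2)
        coefs.append(abs(b1 * b2.conjugate()) / denom if denom > 0.0 else 0.0)
    return sum(coefs) / len(coefs)

# Illustrative values only: 16 kHz sampling, 512-point frames, 0.1 ms arrival time difference.
bins = [1, 2, 3]
delays = delay_factors(bins, S=16000, N=512, tau=1e-4)
X1 = [1 + 0j, 0.5j, -0.3 + 0.2j]

# Front source: both microphones observe the same spectrum.
coh_front = coherence(*directional_pair(X1, X1, delays))

# Lateral source: microphone 2 leads microphone 1 by exactly tau.
X2_side = [x / d for x, d in zip(X1, delays)]
coh_side = coherence(*directional_pair(X1, X2_side, delays))
```

Under these assumptions a front-arriving signal yields a coherence close to 1 and a purely lateral
signal yields a coherence close to 0, in line with the behavior the description attributes to
COH(K).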
[0049]
The correlation and modGI calculation unit 14 acquires the front suppression signal N(f, K)
(average front suppression signal AVE_N(K)), which has directivity in directions other than the
front, and the coherence COH(K), and calculates a correlation coefficient cor(K), which is a
feature amount indicating the relationship between AVE_N(K) and the coherence COH(K).
[0050]
In addition, the correlation and modGI calculation unit 14 uses the correlation coefficient cor(K)
to calculate a feature amount cor_modGI(K) that indicates the magnitude of the positive/negative
fluctuation of the slope of the amplitude of cor(K), and outputs this feature amount cor_modGI(K)
to the WF unit 15.
[0051]
First, the principle by which the correlation and modGI calculation unit 14 detects signal
sections in which an interference sound is present, based on the correlation coefficient cor(K)
between the average front suppression signal AVE_N(K) and the coherence COH(K), will be described.
[0052]
Here, it is assumed that a sound source emitting the target sound is present in front of the
microphones m_1 and m_2, and that interference sound arrives from directions other than the front
(for example, from the lateral directions of the microphones m_1 and m_2, that is, from the left
or the right).
[0053]
For example, when no interference sound is present and the target sound is present, the front
suppression signal N(f, K) has a value proportional to the magnitude of the target sound
component.
However, as shown in FIG. 2, the gain in the front direction is smaller than the gain in the
lateral directions, so the value is smaller than when an interference sound is present.
[0054]
Further, the coherence COH(K) is a feature amount closely related to the arrival direction of the
input signal, and can be rephrased as the correlation between two signal components.
This is because equation (7) calculates the correlation for a given frequency component, and
equation (8) calculates the average of the correlation values over all frequency components.
Accordingly, when the coherence COH(K) is small, the correlation between the two signal components
is small, and conversely, when the coherence COH(K) is large, the correlation between the two
signal components is large.
An input signal for which the coherence COH(K) is small can be regarded as a signal whose arrival
direction is strongly biased toward either the right or the left, that is, a signal arriving from
a direction other than the front.
On the other hand, an input signal for which the coherence COH(K) is large has little bias in its
arrival direction and can be regarded as a signal that has arrived from the front.
[0055]
In this case, the coherence COH(K) has a large value when no interference sound is present and the
target sound is present, and a small value when an interference sound is present together with the
target sound.
[0056]
Organizing the above behavior by the presence or absence of an interference sound yields the
following relationships.
- When no interference sound is present and the target sound is present, the coherence COH(K)
takes a large value, and the front suppression signal N(f, K) (average front suppression signal
AVE_N(K)) takes a value proportional to the magnitude of the target sound component.
- When an interference sound is present, the coherence COH(K) takes a small value, and the front
suppression signal N(f, K) (average front suppression signal AVE_N(K)) takes a large value.
[0057]
Given the above behavior, introducing the correlation coefficient cor(K) between the front
suppression signal N(f, K) (average front suppression signal AVE_N(K)) and the coherence COH(K)
allows the following to be said.
- The correlation coefficient cor(K) is a positive value (cor(K) > 0) when no interference sound
is present.
- The correlation coefficient cor(K) is a negative value (cor(K) ≦ 0) when an interference sound
is present.
[0058]
Therefore, the correlation and modGI calculation unit 14 can observe the sign of the correlation
coefficient cor(K) between the average front suppression signal AVE_N(K) and the coherence COH(K),
and determine that no interference sound is present if cor(K) is positive and that an interference
sound is present if cor(K) is negative.
[0059]
Here, although the calculation method of the correlation coefficient cor (K) is not limited, for
example, the correlation coefficient cor (K) can be calculated for each frame using the following
equation (9) .
[0060]
In the following equation (9), cov[AVE_N(K), COH(K)] denotes the covariance of the average front
suppression signal AVE_N(K) and the coherence COH(K).
Further, σ AVE_N(K) denotes the standard deviation of the average front suppression signal
AVE_N(K), and σ COH(K) denotes the standard deviation of the coherence COH(K).
When the correlation coefficient cor(K) is obtained by equation (9), the standard deviations and
the covariance may be determined using the results of a predetermined number i of the most
recently processed frames for AVE_N(K) and COH(K).
Specifically, the standard deviations (σ AVE_N(K) and σ COH(K)) and the covariance
(cov[AVE_N(K), COH(K)]) may be determined using the values of COH and AVE_N for each of the i most
recently processed frames (the (K-i)-th frame, the (K-(i-1))-th frame, and so on).
In other words, the standard deviations and the covariance in equation (9) may be determined using
the i most recently obtained values of AVE_N and COH as samples.
The correlation coefficient cor(K) thus obtained takes a value of −1.0 to 1.0.
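From the definitions above, equation (9) appears to be the usual correlation coefficient, that is,
cor(K) = cov[AVE_N(K), COH(K)] / (σ AVE_N(K) · σ COH(K)). A minimal sketch over the i most recent
frames is shown below. The function names, the sample values, and the choice to return 0.0 when a
standard deviation is zero are assumptions made for illustration, not part of the source.

```python
import math

def correlation(ave_n_hist, coh_hist):
    """Correlation coefficient of the most recent AVE_N and COH values (equal-length lists)."""
    n = len(ave_n_hist)
    mean_a = sum(ave_n_hist) / n
    mean_c = sum(coh_hist) / n
    cov = sum((a - mean_a) * (c - mean_c) for a, c in zip(ave_n_hist, coh_hist)) / n
    sd_a = math.sqrt(sum((a - mean_a) ** 2 for a in ave_n_hist) / n)
    sd_c = math.sqrt(sum((c - mean_c) ** 2 for c in coh_hist) / n)
    if sd_a == 0.0 or sd_c == 0.0:
        return 0.0  # assumption: treat a constant history as uncorrelated
    return cov / (sd_a * sd_c)

def interference_present(cor):
    """Sign test from the description: cor(K) <= 0 is treated as interference present."""
    return cor <= 0.0

# Target sound only: AVE_N and COH rise and fall together.
cor_target = correlation([0.1, 0.2, 0.3, 0.2], [0.6, 0.7, 0.8, 0.7])

# Interference sound present: AVE_N is large when COH is small.
cor_interf = correlation([0.2, 0.9, 0.8, 0.3], [0.8, 0.2, 0.3, 0.7])
```

Only the sign of the result is needed for the interference decision, so the window length i
mainly trades responsiveness against the stability of the estimate.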
[0061]
[0062]
Next, the correlation and modGI calculation unit 14 uses the correlation coefficient cor(K) to
calculate a feature amount representing the magnitude of the positive/negative fluctuation of the
slope of the amplitude of cor(K).
[0063]
When background noise is present in the input signal, the behavior of the correlation coefficient
cor (K) changes as follows.
[0064]
・ If an interference sound is present, the value of the correlation coefficient cor(K) becomes
negative, and if no interference sound is present, the value of cor(K) becomes positive. This
macroscopic behavior is maintained to some extent.
[0065]
・ The degree of fluctuation of the amplitude of the front suppression signal (average front
suppression signal AVE_N(K)) increases due to the influence of background noise, whereas the
dynamic range of the coherence COH(K) decreases, so the irregularity of its amplitude does not
change greatly.
For this reason, the synchronization between the increases and decreases of the front suppression
signal (average front suppression signal AVE_N(K)) and those of the coherence COH(K) is lost, and
the correlation (correlation coefficient cor(K)) fluctuates up and down.
In addition, the frequency of positive and negative fluctuations of the correlation coefficient
cor(K) increases.
[0066]
That is, as the influence of the background noise increases, the increase and decrease of the
value of the correlation coefficient cor (K) and the positive and negative change frequency of the
value of the correlation coefficient cor (K) increase.
[0067]
Thus, when background noise is present, the frequency of increases and decreases in the value of
the correlation coefficient cor(K) and the frequency of its positive/negative fluctuations
increase, and as the influence of the background noise grows, these fluctuations (that is, the
increases and decreases in the value of cor(K) and its positive/negative fluctuations) become
larger.
This behavior derives only from background noise.
Therefore, by observing the fluctuation in the value of the correlation coefficient cor(K), the
influence of background noise on the target sound and the fluctuation of its characteristics can
be estimated without being affected by the target sound or the interference sound.
[0068]
Therefore, in the first embodiment, the correlation and modGI calculation unit 14 calculates a
feature amount called modGI (GI: Gradient Index) in order to observe the increases and decreases
and the positive/negative fluctuations of the value of the correlation coefficient cor(K).
[0069]
Here, modGI is an index for measuring the number of times the inclination direction of the signal
waveform changes and its magnitude (see Patent Document 2).
For an arbitrary signal subject to feature amount calculation, modGI is defined as the power of
the second-order difference of that signal normalized by the power of the signal.
[0070]
In the first embodiment, the correlation and modGI calculation unit 14 calculates modGI according
to the calculation method described in Patent Document 2.
As an example of a calculation formula for modGI as defined above, the correlation and modGI
calculation unit 14 uses the following equation (10) to calculate cor_modGI(K), a feature amount
indicating the degree of fluctuation of the correlation coefficient cor(K).
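Equation (10) is not reproduced in this text. The sketch below follows only the verbal definition
given above (power of the second-order difference normalized by the power of the signal), applied
to a short history of cor(K) values. The function name, the window contents, and the handling of
an all-zero history are assumptions, and the exact formula in Patent Document 2 may differ.

```python
def cor_modgi(cor_hist):
    """Power of the second-order difference of cor_hist divided by the power of cor_hist."""
    second_diff = [
        cor_hist[k] - 2.0 * cor_hist[k - 1] + cor_hist[k - 2]
        for k in range(2, len(cor_hist))
    ]
    power = sum(c * c for c in cor_hist)
    if power == 0.0:
        return 0.0  # assumption: define the feature as 0 for an all-zero history
    return sum(d * d for d in second_diff) / power

# Slope keeps the same sign: the second-order difference is (numerically) zero.
smooth = cor_modgi([0.1, 0.2, 0.3, 0.4, 0.5])

# Slope flips sign at every frame: the second-order difference is large.
jittery = cor_modgi([0.5, -0.5, 0.5, -0.5, 0.5])
```

A sequence whose slope rarely changes sign gives a small value, and one whose slope flips
frequently gives a large value, which is the property the description relies on to gauge the
influence of background noise.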
[0071]
[0072]
Equation (10) represents how frequently the sign of the slope of the correlation coefficient
cor(K) changes.
Equation (10) is characterized in that the value of cor_modGI decreases as the positive / negative
fluctuation of the signal slope decreases, while the value of cor_modGI increases as the positive /
negative fluctuation of the slope increases. In other words, the larger the value of cor_modGI, the
larger the influence of background noise, and the smaller the value of cor_modGI, the smaller the
influence of background noise.
[0073]
The WF unit 15 sets the value of the time constant λ, which controls the adaptation speed of the
suppression coefficient wf_coef(f, K), based on the value of cor_modGI(K) from the correlation and
modGI calculation unit 14, and calculates the suppression coefficient wf_coef(f, K) using this
time constant.
[0074]
In addition, the WF unit 15 multiplies the frequency-domain signal X1(f, K) of the input signal by
the suppression coefficient wf_coef(f, K) to calculate the post-suppression signal Y(f, K), and
outputs it to the IFFT unit 16.
[0075]
FIG. 4 is a block diagram showing the configuration of the WF unit 15 according to the first
embodiment.
[0076]
As shown in FIG. 4, the WF unit 15 according to the first embodiment includes an input signal
acquisition unit 21, a time constant control unit 23, a suppression coefficient adaptation unit
24, a background noise suppression processing unit 25, and a post-suppression signal output
unit 26.
[0077]
The input signal acquisition unit 21 acquires the frequency domain signal X 1 (f, K) of the input
signal and the cor_modGI (K) from the correlation and modGI calculation unit 14.
[0078]
The time constant control unit 23 sets the value of the time constant λ that controls the
adaptation speed of the suppression coefficient wf_coef(f, K) based on the value of
cor_modGI(K) from the correlation and modGI calculation unit 14.
[0079]
Here, the role of the time constant λ will be briefly described.
In the WF unit 15, the suppression coefficient adaptation unit 24 described later calculates the
suppression coefficient wf_coef (f, K), but prior to this, it is necessary to calculate the background
noise characteristic for each frequency.
The estimation of background noise is performed, for example, according to Equation 1 of Patent
Document 1, and a parameter (time constant) λ is involved here.
[0080]
The time constant λ has a value of 0.0 to 1.0, and has a role of controlling how much the
instantaneous input value is reflected on the background noise characteristic.
As the value of the time constant λ is larger, the influence of the instantaneous input becomes
stronger, and as the value of the time constant λ is smaller, the influence of the instantaneous
input becomes less.
Therefore, if the value of the time constant λ is large, the input at that moment is strongly
reflected in the suppression coefficient wf_coef(f, K) and high-speed coefficient adaptation
can be realized. However, because the influence of the instantaneous input is strong, the
coefficient value fluctuates widely, which may reduce the naturalness of the sound quality.
On the other hand, when the value of the time constant λ is small, the adaptation speed is
slow, but the resulting suppression coefficient wf_coef(f, K) is not strongly influenced by
instantaneous characteristics and reflects the past noise characteristics on average, so the
naturalness of the sound quality is less likely to be lost.
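The role of λ described above can be sketched as a recursive (exponentially smoothed) update of a per-frequency background noise estimate. Equation 1 of Patent Document 1 is not reproduced in this text, so the update rule below is an assumed common form chosen only because it behaves as described: a larger λ weights the instantaneous input more heavily. The name `update_noise_estimate` and its arguments are hypothetical.

```python
def update_noise_estimate(prev_noise, input_power, lam):
    """One frame of a smoothed background-noise update (assumed form).

    prev_noise  : per-bin noise estimate from the previous frame
    input_power : per-bin power of the current frame, e.g. |X1(f, K)|^2
    lam         : time constant, 0.0 <= lam <= 1.0; larger values follow
                  the instantaneous input more closely
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be within [0.0, 1.0]")
    if len(prev_noise) != len(input_power):
        raise ValueError("bin counts differ")
    return [(1.0 - lam) * n + lam * p for n, p in zip(prev_noise, input_power)]
```

With `lam = 0.0` the estimate never changes, and with `lam = 1.0` it simply copies the current frame; values in between trade adaptation speed against stability, which is the trade-off the text describes.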
[0081]
Therefore, when the value of cor_modGI(K) is larger than the threshold Θ (for example, when
cor_modGI(K) is greater than or equal to the threshold), the influence of background noise is
large, so the time constant control unit 23 sets the time constant λ to a large value.
On the other hand, when the value of cor_modGI(K) is smaller than the threshold Θ, the time
constant control unit 23 sets the time constant λ to a small value. This makes it possible to
realize coefficient adaptation that follows the characteristics of the background noise without
being influenced by the target sound or the interference sound.
[0082]
Although a single threshold Θ for determining the value of the time constant λ is used as an
example here, two or more thresholds may be set, and the time constant λ may be set more finely
for each interval to which cor_modGI belongs.
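The threshold-based selection of λ, including the multi-threshold variant, can be sketched as a lookup over sorted thresholds. The function name `select_time_constant` and all numeric values in the usage note are hypothetical; the source only requires that a larger cor_modGI map to a larger λ.

```python
from bisect import bisect_right


def select_time_constant(cor_mod_gi, thresholds, lambdas):
    """Pick a time constant from the interval that cor_mod_gi falls into.

    thresholds : ascending list of thresholds (one entry gives the
                 single-threshold case)
    lambdas    : len(thresholds) + 1 values in non-decreasing order, each
                 within 0.0 < lambda < 1.0 (illustrative values only)
    A value equal to a threshold is treated as "threshold or more".
    """
    if len(lambdas) != len(thresholds) + 1:
        raise ValueError("need exactly one more lambda than thresholds")
    return lambdas[bisect_right(thresholds, cor_mod_gi)]
```

For example, `select_time_constant(0.7, [0.5], [0.05, 0.3])` selects the larger constant because 0.7 is above the single threshold, while a two-threshold call such as `select_time_constant(0.4, [0.3, 0.6], [0.05, 0.15, 0.3])` selects the middle value.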
[0083]
The suppression coefficient adaptation unit 24 uses the time constant λ set by the time constant
control unit 23 to calculate the suppression coefficient wf_coef (f, K).
The suppression coefficient wf_coef (f, K) can be obtained, for example, using Equation 3 of
Patent Document 1.
[0084]
The background noise suppression processing unit 25 multiplies the frequency domain signal
X1(f, K) of the input signal by the suppression coefficient wf_coef(f, K) calculated by the
suppression coefficient adaptation unit 24, using the following equation (11), to calculate the
post-suppression signal Y(f, K).
Y(f,K)=X1(f,K)×wf_coef(f,K) …(11)
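Equation (11) is a per-bin multiplication, which can be sketched as follows. The function name `apply_suppression` is hypothetical, and the sketch assumes that `x1` holds the complex spectrum of one frame and `wf_coef` holds one real gain per frequency bin; how those gains are derived (Equation 3 of Patent Document 1) is outside this sketch.

```python
def apply_suppression(x1, wf_coef):
    """Equation (11): Y(f, K) = X1(f, K) * wf_coef(f, K) for every bin f."""
    if len(x1) != len(wf_coef):
        raise ValueError("spectrum and coefficient lengths differ")
    return [x * g for x, g in zip(x1, wf_coef)]
```

A gain of 1.0 leaves a bin untouched and a gain near 0.0 removes it, so the phase of X1(f, K) is preserved and only the magnitude is scaled.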
[0085]
The post-suppression signal output unit 26 outputs the post-suppression signal Y(f, K) to the
IFFT unit 16.
[0086]
The IFFT unit 16 converts the signal Y (f, K) which is a frequency domain signal into a time
domain signal y (n).
Note that the IFFT unit 16 may be omitted as long as the subsequent stage circuit can be
configured to process the frequency domain signal Y (f, K) as it is.
[0087]
(A-2) Operation of First Embodiment
Next, the operation of non-target sound suppression processing in the non-target sound
suppression device 1 according to the first embodiment will be described in detail with
reference to the drawings.
[0088]
First, input signals s1 (n) and s2 (n) for one frame (one processing unit) are supplied to the FFT
unit 11 from the microphones m_1 and m_2 via an AD converter (not shown).
The FFT unit 11 performs a Fourier transform on the analysis frames FRAME1(K) and FRAME2(K),
which are based on the input signals s1(n) and s2(n) for one frame, and obtains the signals
X1(f, K) and X2(f, K) expressed in the frequency domain. The signals X1(f, K) and X2(f, K)
generated by the FFT unit 11 are supplied to the front suppression signal generation unit 12
and the coherence calculation unit 13.
[0089]
The front suppression signal generator 12 calculates the front suppression signal N (f, K) based
on the signals X1 (f, K) and X2 (f, K) from the FFT unit 11. Then, the front suppression signal
generation unit 12 calculates an average front suppression signal AVE_N (K) based on the front
suppression signal N (f, K), and supplies it to the correlation and modGI calculation unit 14.
[0090]
The coherence calculator 13 generates the coherence COH (K) based on the signals X1 (f, K) and
X2 (f, K) from the FFT unit 11, and supplies the coherence COH (K) to the correlation and modGI
calculator 14.
[0091]
The correlation and modGI calculation unit 14 calculates the correlation coefficient cor(K),
which is a feature indicating the relationship between the average front suppression signal
AVE_N(K) and the coherence COH(K), using, for example, equation (9).
[0092]
In addition, the correlation and modGI calculation unit 14 uses the correlation coefficient
cor(K) to calculate cor_modGI(K), a feature representing the magnitude of the positive/negative
fluctuation of the slope of the correlation coefficient cor(K), and supplies this cor_modGI(K)
to the WF unit 15.
[0093]
The WF unit 15 receives cor_modGI (K) from the correlation and modGI calculation unit 14 and
the frequency domain signal X1 (f, K) of the input signal.
[0094]
FIG. 5 is a flowchart showing processing in the time constant control unit 23 of the WF unit 15
according to the first embodiment.
[0095]
First, the time constant control unit 23 compares the value of cor_modGI(K) from the
correlation and modGI calculation unit 14 with the threshold Θ (S101). When the value of
cor_modGI(K) is larger than the threshold Θ, the time constant λ is set to a large value
(S102); when the value of cor_modGI(K) is less than the threshold Θ, the time constant λ is set
to a small value (S102).
[0096]
The time constant λ takes a value in the range 0.0 < λ < 1.0. As the value of the time constant
λ approaches 1.0, the result is more strongly influenced by the signal input at that moment, and
as the value approaches 0.0, the influence of the signal input at that moment becomes weaker.
Therefore, the value of the time constant λ only needs to be set to a relatively large or small
value according to the result of comparing cor_modGI(K) with the threshold Θ.
That is, if λ1 is the value of the time constant when cor_modGI(K) is less than the threshold Θ
and λ2 is the value when cor_modGI(K) is greater than or equal to the threshold Θ, it is
sufficient that the relationship λ1 < λ2 holds.
[0097]
Then, the suppression coefficient adaptation unit 24 uses the time constant λ set by the time
constant control unit 23 to calculate the suppression coefficient wf_coef (f, K).
[0098]
That is, the larger the value of the time constant λ, the faster the suppression coefficient
wf_coef(f, K) adapts, and the more strongly it reflects the instantaneous input.
On the other hand, if the value of the time constant λ is small, the influence of the
instantaneous input diminishes. The adaptation of the suppression coefficient wf_coef(f, K) is
slower, but the resulting coefficient is less affected by instantaneous characteristics and
reflects the past noise characteristics on average.
Therefore, in this case, the naturalness of the sound quality is not easily lost.
[0099]
In addition, the background noise suppression processing unit 25 multiplies the frequency
domain signal X1(f, K) of the input signal by the suppression coefficient wf_coef(f, K)
calculated by the suppression coefficient adaptation unit 24, using equation (11), to calculate
the post-suppression signal Y(f, K), and the post-suppression signal output unit 26 outputs the
post-suppression signal Y(f, K) to the IFFT unit 16.
[0100]
The IFFT unit 16 converts the signal Y (f, K), which is a frequency domain signal, into a time
domain signal y (n), and outputs the time domain signal y (n) to the speech processing device 2
in the subsequent stage.
[0101]
(A-3) Effects of the First Embodiment
As described above, according to the first embodiment, the time constant of the Wiener filter
(WF) can be controlled based on the characteristic behavior that the modGI of the correlation
between the front suppression signal and the coherence becomes larger as the influence of
background noise increases and smaller as that influence decreases.
This enables appropriate coefficient adaptation based on the influence of background noise, and
can improve the accuracy of background noise suppression processing.
[0102]
As a result, improved performance can be expected when the present invention is applied to a
communication apparatus such as a video conference system or a mobile telephone, or to the
preprocessing of a speech recognition function.
[0103]
(B) Second Embodiment
Next, a second embodiment of the non-target sound suppression apparatus, method and program
according to the present invention will be described with reference to the drawings.
[0104]
The second embodiment uses the present invention to realize a non-target sound suppression
apparatus and method (interference sound suppression apparatus and method) that reduces
interference sound arriving from the surroundings by, for example, subtracting a front
suppression signal from an input signal.
[0105]
When a front suppression signal is subtracted from an input signal, the front suppression
signal is often multiplied by a subtraction coefficient to control the strength of the
subtraction. If the subtraction coefficient is too large, suppression becomes excessive and
distortion of the target voice increases. If the subtraction coefficient is too small,
suppression of the interfering voice is insufficient. In either case the sound quality is
greatly affected.
However, it is difficult to determine the presence of an interfering voice superimposed on the
target voice, and therefore difficult to set the subtraction coefficient to an appropriate
value.
[0106]
Therefore, the second embodiment describes a non-target sound suppression apparatus and method
(interference sound suppression apparatus and method) that estimates the degree of contribution
of interference sound to an input signal, controls the subtraction coefficient of frequency
subtraction according to the result, and suppresses the interference sound without excess or
deficiency.
[0107]
(B-1) Configuration of Second Embodiment
FIG. 6 is a block diagram showing an overall configuration of a non-target sound suppression
apparatus 1A according to the second embodiment.
[0108]
The non-target sound suppression apparatus 1A according to the second embodiment acquires the
input signals s1(n) and s2(n) from a plurality of microphones (two are shown in FIG. 1:
microphones m_1 and m_2), estimates the degree of contribution of the interference sound to the
input signal, and controls the subtraction coefficient of the frequency subtraction according
to the result. The post-suppression signal, in which the interference sound has been
suppressed, is supplied to the speech processing device 2 in the subsequent stage.
[0109]
The speech processing device 2 performs predetermined speech processing using the signal after
suppression from the non-target sound suppression device 1A, as in the first embodiment.
[0110]
As shown in FIG. 6, the unintended sound suppression device 1A includes an FFT unit 11, a front
suppression signal generation unit 12, a coherence calculation unit 13, a correlation calculation
unit 54, a frequency subtraction processing unit 55, and an IFFT unit 16.
[0111]
The FFT unit 11, the front suppression signal generation unit 12, the coherence calculation
unit 13, and the IFFT unit 16 are basically the same as or correspond to the components
described in the first embodiment, and thus their detailed description is omitted.
[0112]
The non-target sound suppression apparatus 1A may be realized by installing a program (for
example, a non-target sound suppression program) in a computer having a processor, a memory,
and the like. In this case as well, the non-target sound suppression apparatus 1A can be
represented functionally using FIG.
A part or all of the non-target sound suppression apparatus 1A may be realized as hardware.
[0113]
The correlation calculation unit 54 acquires the front suppression signal (average front
suppression signal AVE_N(K)) from the front suppression signal generation unit 12 and the
coherence COH(K) from the coherence calculation unit 13, and calculates the correlation
coefficient cor(K) between the average front suppression signal AVE_N(K) and the coherence
COH(K).
Further, the correlation calculation unit 54 outputs the calculated correlation coefficient
cor(K) to the frequency subtraction processing unit 55.
The correlation coefficient cor(K) can be calculated by the same method as in the first
embodiment; for example, equation (9) can be used.
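Equation (9) is not reproduced in this part of the text, so its exact form is not known here. As a rough illustration only, the sketch below assumes that cor(K) is a Pearson correlation coefficient computed over the most recent frames of AVE_N and COH. The function name `correlation` and the use of a frame window are assumptions, not the method of the source.

```python
import math


def correlation(ave_n_history, coh_history):
    """Pearson correlation between recent AVE_N and COH values (assumed form).

    Both arguments are equal-length lists holding the values of the most
    recent frames. Returns 0.0 when either sequence is constant, because
    the correlation is undefined in that case.
    """
    n = len(ave_n_history)
    if n != len(coh_history) or n < 2:
        raise ValueError("need two equal-length histories of 2+ frames")
    mean_x = sum(ave_n_history) / n
    mean_y = sum(coh_history) / n
    cov = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(ave_n_history, coh_history))
    var_x = sum((x - mean_x) ** 2 for x in ave_n_history)
    var_y = sum((y - mean_y) ** 2 for y in coh_history)
    if var_x == 0.0 or var_y == 0.0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)
```

With this form, frames in which AVE_N rises while COH falls produce a negative value, and frames in which both rise together produce a positive value.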
[0114]
The frequency subtraction processing unit 55 acquires the input signal X1(f, K), the
correlation coefficient cor(K) from the correlation calculation unit 54, and the front
suppression signal N(f, K) from the front suppression signal generation unit 12. It sets the
subtraction coefficient α based on the correlation coefficient cor(K), multiplies the front
suppression signal N(f, K) by the subtraction coefficient α, and subtracts the result from the
input signal X1(f, K) to obtain the post-suppression signal Y(f, K).
[0115]
FIG. 7 is a block diagram showing the configuration of the frequency subtraction processing unit
55 according to the second embodiment.
[0116]
As shown in FIG. 7, the frequency subtraction processing unit 55 includes an input signal
acquisition unit 31, a subtraction coefficient control unit 32, a subtraction unit 33, and a
post-subtraction signal output unit 34.
[0117]
The input signal acquisition unit 31 acquires the input signal X1(f, K), the correlation
coefficient cor(K) from the correlation calculation unit 54, and the front suppression signal
N(f, K) from the front suppression signal generation unit 12.
[0118]
The subtraction coefficient control unit 32 sets the subtraction coefficient α based on the
correlation coefficient cor (K).
[0119]
Here, the principle for estimating the degree of contribution of the interference sound (an
interfering voice is assumed here) is described below.
First, it is assumed that the target sound arrives from the front of the microphones m_1 and
m_2, and the interference sound arrives from the lateral direction (right or left) of the
microphones m_1 and m_2.
[0120]
In this situation, when "no interference sound is present" and "the target sound is present",
the front suppression signal N(f, K) captures the signal component arriving from the front, and
therefore has a value proportional to the magnitude of the target sound component.
However, as shown in FIG. 2, the sound collection level in the front direction is smaller than
that in the lateral direction, so this value is smaller than in the case where "interference
sound is present".
[0121]
Further, the coherence COH is a feature amount closely related to the arrival direction of the
input signal.
Therefore, it has a large value when there is no interference sound and only the target sound
is present, and has a small value when interference sound is present.
[0122]
The above behavior is summarized as follows, focusing on the presence or absence of the
disturbance sound.
[0123]
In the case of “no disturbance sound” and “only the target sound exists”, the coherence COH
is a large value, and the front suppression signal has a value proportional to the size of the target
sound component.
[0124]
The coherence COH is a small value when there is an interference sound, and the front
suppression signal is a large value.
[0125]
This behavior is as follows when the correlation coefficient cor (K) between the front suppression
signal N (f, K) and the coherence COH is introduced.
[0126]
The correlation coefficient cor(K) has a positive value when "interference sound does not exist".
[0127]
The correlation coefficient cor(K) has a negative value when "interference sound exists".
[0128]
From the viewpoint of suppressing the interference sound without excess or deficiency, it is
desirable that the subtraction coefficient α take a smaller value as the influence of the
interference sound becomes smaller and a larger value as the influence of the interference
sound becomes larger (see the equation described later).
[0129]
As described above, the sign of the correlation coefficient cor(K) changes depending on the
presence or absence of interference sound. Therefore, if the correlation coefficient cor(K) is
positive, the subtraction coefficient α is decreased, and if the correlation coefficient cor(K)
is negative, the subtraction coefficient α is increased. Such processing realizes control of
the subtraction coefficient according to the degree of influence of the interference sound.
[0130]
Therefore, in the second embodiment, the subtraction coefficient control unit 32 controls the
subtraction coefficient used in the frequency subtraction processing based on the
characteristic behavior of the correlation coefficient cor(K) between the front suppression
signal N(f, K) and the coherence COH.
[0131]
More specifically, the subtraction coefficient control unit 32 sets the subtraction coefficient
α to a large value in order to strengthen the suppression effect when an interfering voice is
present, and sets the subtraction coefficient α to a small value in order to weaken the
suppression effect when no interfering voice is present.
[0132]
The subtraction coefficient control unit 32 may include, for example, a subtraction coefficient
storage unit (not shown) that records the correspondence between the value of the correlation
coefficient and the setting value of the subtraction coefficient α, and may set the subtraction
coefficient α by referring to the subtraction coefficient storage unit.
[0133]
The subtraction unit 33 performs subtraction processing as shown in equation (12) using the
subtraction coefficient α obtained from the subtraction coefficient control unit 32.
Y (f, K) = X1 (f, K) -α × N (f, K) (12)
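Equation (12) can be sketched as a per-bin subtraction. The function name `frequency_subtraction` is hypothetical, and the sketch assumes that `x1` and `n` hold the complex spectra X1(f, K) and N(f, K) of one frame and that a single real α applies to all bins of that frame.

```python
def frequency_subtraction(x1, n, alpha):
    """Equation (12): Y(f, K) = X1(f, K) - alpha * N(f, K) for every bin f."""
    if len(x1) != len(n):
        raise ValueError("spectra have different numbers of bins")
    return [x - alpha * v for x, v in zip(x1, n)]
```

Setting `alpha` to 0.0 returns the input spectrum unchanged, and larger values remove more of the front suppression signal, which is why the choice of α governs the balance between residual interference and distortion of the target sound.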
[0134]
The post-subtraction signal output unit 34 outputs the post-suppression signal (post-subtraction
signal) Y(f, K) calculated by the subtraction unit 33 to the IFFT unit 16.
[0135]
(B-2) Operation of Second Embodiment
Next, the operation of non-target sound suppression processing in the non-target sound
suppression apparatus 1A according to the second embodiment will be described in detail with
reference to the drawings.
[0136]
Input signals s1 (n) and s2 (n) for one frame (one processing unit) are supplied to the FFT unit 11
from the microphones m_1 and m_2 via an AD converter (not shown).
The FFT unit 11 performs a Fourier transform on the analysis frames FRAME1(K) and FRAME2(K),
which are based on the input signals s1(n) and s2(n) for one frame, and obtains the signals
X1(f, K) and X2(f, K) expressed in the frequency domain.
The signals X 1 (f, K) and X 2 (f, K) generated by the FFT unit 11 are supplied to the front
suppression signal generation unit 12 and the coherence calculation unit 13.
[0137]
The front suppression signal generator 12 calculates the front suppression signal N (f, K) based
on the signals X1 (f, K) and X2 (f, K) from the FFT unit 11.
Then, the front suppression signal generator 12 calculates the average front suppression signal
AVE_N (K) based on the front suppression signal N (f, K), and supplies the average front
suppression signal AVE_N (K) to the correlation calculator 54.
[0138]
The coherence calculator 13 generates the coherence COH (K) based on the signals X1 (f, K) and
X2 (f, K) from the FFT unit 11 and supplies the coherence COH (K) to the correlation calculator
54.
[0139]
The correlation calculation unit 54 calculates a correlation coefficient cor (K) which is a feature
indicating the relationship between the average front suppression signal AVE_N (K) and the
coherence COH (K) using, for example, the equation (9).
[0140]
The frequency subtraction processing unit 55 receives the input signal X1(f, K), the
correlation coefficient cor(K) from the correlation calculation unit 54, and the front
suppression signal N(f, K) from the front suppression signal generation unit 12.
[0141]
FIG. 8 is a flowchart showing processing in the subtraction coefficient control unit 32 of the
frequency subtraction processing unit 55 according to the second embodiment.
[0142]
First, the subtraction coefficient control unit 32 determines whether the value of the correlation
coefficient cor (K) from the correlation calculation unit 54 is negative (S201).
Then, when the value of the correlation coefficient cor (K) is negative (that is, when there is a
disturbing voice), a large value is set to the subtraction coefficient α in order to enhance the
suppression effect (S202).
On the other hand, when the value of the correlation coefficient cor (K) is not negative (ie, when
there is no disturbing sound), the subtraction coefficient α is set to a small value in order to
weaken the suppression effect.
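The decision in step S201 and the two resulting settings can be sketched as a sign test on cor(K). The function name `select_subtraction_coefficient` and the default values 1.0 and 0.2 are hypothetical; the source only states that a "large" value is used when cor(K) is negative and a "small" value otherwise.

```python
def select_subtraction_coefficient(cor, alpha_large=1.0, alpha_small=0.2):
    """Choose alpha from the sign of the correlation coefficient cor(K).

    cor < 0  : an interfering voice is judged to be present, so the larger
               coefficient is used to strengthen suppression
    cor >= 0 : no interfering voice is judged to be present, so the smaller
               coefficient is used to weaken suppression
    The default coefficient values are illustrative assumptions.
    """
    return alpha_large if cor < 0.0 else alpha_small
```

A value of exactly zero is treated as "not negative", matching the wording of the flowchart description.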
[0143]
Then, using the subtraction coefficient α obtained by the subtraction coefficient control unit
32, the subtraction unit 33 obtains the post-subtraction signal Y(f, K) by equation (12), and
the post-subtraction signal output unit 34 outputs the post-suppression signal (post-subtraction
signal) Y(f, K) to the IFFT unit 16.
[0144]
The IFFT unit 16 converts the signal Y (f, K), which is a frequency domain signal, into a time
domain signal y (n), and outputs the time domain signal y (n) to the speech processing device 2
in the subsequent stage.
[0145]
(B-3) Effects of Second Embodiment
As described above, according to the second embodiment, the presence of an interfering voice
superimposed on the target voice is detected based on the characteristic behavior that the
correlation coefficient between the front suppression signal and the coherence is negative when
an interfering voice is present and positive when it is not. By using this result to control
the subtraction coefficient used in the frequency subtraction processing, the accuracy of the
interfering voice suppression processing can be increased.
[0146]
As a result, improved performance can be expected when the present invention is applied to a
communication apparatus such as a video conference system or a mobile telephone, or to the
preprocessing of a speech recognition function.
[0147]
(C) Other Embodiments
Although various modified embodiments are mentioned in the first and second embodiments
described above, the present invention can also be applied to the following modified
embodiments.
[0148]
(C-1) In the first or second embodiment described above, the suppression coefficient or the
subtraction coefficient may be calculated for each frequency bin.
This can be realized by also calculating the correlation coefficient for each frequency bin.
[0149]
(C-2) In the second embodiment, the presence or absence of the interference sound is determined
by focusing on the sign of the correlation coefficient, but the magnitude of the influence of
the interference sound can be determined by focusing on the absolute value of the correlation
coefficient.
If the correlation coefficient is negative and its absolute value is small, the influence of
the interference sound is small. If the correlation coefficient is negative and its absolute
value is large, the influence of the interference sound is large.
Therefore, an arbitrary function whose output is small when the input is small and large when
the input is large (for example, a quadratic function) is prepared, and the absolute value of
the correlation coefficient is input to it. By setting the output as the subtraction
coefficient, the subtraction coefficient can be set according to the degree of influence of the
interference sound (the magnitude of the absolute value of the correlation).
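The mapping described in (C-2) can be sketched with a quadratic function of the absolute value of the correlation coefficient. The function name `subtraction_coefficient_from_correlation`, the scale `alpha_max`, and the choice to return 0.0 for non-negative correlation are all assumptions for illustration; the source only asks for some monotonically increasing function and names a quadratic as one example.

```python
def subtraction_coefficient_from_correlation(cor, alpha_max=1.0):
    """Map cor(K) to a subtraction coefficient via a quadratic (assumed form).

    cor >= 0 : no interference is assumed, so 0.0 is returned here; the
               source only says the coefficient should be small.
    cor < 0  : alpha grows with |cor| as alpha_max * |cor|^2, with |cor|
               clipped to 1.0.
    """
    if cor >= 0.0:
        return 0.0
    magnitude = min(abs(cor), 1.0)
    return alpha_max * magnitude ** 2
```

Compared with a two-level switch, this gives weak subtraction for slightly negative correlation and strong subtraction only when the correlation is strongly negative.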
[0150]
1 and 1A: non-target sound suppression device, 11: FFT unit, 12: front suppression signal
generation unit, 13: coherence calculation unit, 14: correlation and modGI calculation unit,
15: WF (Wiener filter) unit, 54: correlation calculation unit, 55: frequency subtraction
processing unit, 16: IFFT unit.