DESCRIPTION JP2016122131

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
Abstract: PROBLEM TO BE SOLVED: To provide a target sound component detection device
capable of correctly detecting the frequency components of a target sound that includes only one
or a few frequency components. SOLUTION: A device according to the present invention
comprises: a directivity forming unit that forms, from an input sound signal, a plurality of
directivity signals whose dead angles lie in different predetermined directions; a coherence
coefficient calculation unit that obtains a coherence coefficient using the plurality of directivity
signals formed; a coherence coefficient feature amount calculation unit that regards the
coherence coefficient as a time-varying signal and obtains a feature amount representing the
number of times the slope direction of the signal waveform changes and the magnitude of those
changes; and a target sound component determination unit that determines the frequency
components specific to the target sound based on the magnitude of the range of the feature
amount and the degree of difference between the coherence coefficients of adjacent frequency
components. [Selected figure] Figure 1
Target sound component detection device and program, and target sound extraction device and
program
[0001]
The present invention relates to a target sound component detection device and program, and a
target sound extraction device and program, which can be applied to, for example, the extraction
of a target sound including only one or a few frequency components from an input sound signal.
[0002]
11-04-2019
1
In recent years, sound source separation processing and noise suppression processing have come
to be used for sounds other than voice as well, such as the sounds emitted by machines, for
purposes such as the inspection of machinery.
The sound emitted by a machine (mechanical sound) is, for example, a sound that includes only
one or a few frequency components; in other words, a stationary sound whose energy is
concentrated at specific frequency components. (A signal including a single frequency component
is a sine wave signal of that frequency, and is referred to as a sine wave signal where appropriate
in this specification.) Such mechanical sound is often buried in ambient background noise,
depending on the measurement environment. Under such circumstances, in order to accurately
extract the target mechanical sound, the mechanical sound to be extracted must not be
suppressed together with the ambient noise when that noise is suppressed.
[0003]
The sound targeted by the present invention is not limited to mechanical sound, but is a sound
containing only one or a few frequency components, like mechanical sound, and such sound is
hereinafter referred to as abnormal sound.
[0004]
Patent Document 1 has already proposed a Wiener filter method that, based on the arrival
direction of an input sound signal, detects sections containing a target sound arriving from the
front (target sound sections) and all other sections (noise sections) and, based on the detection
result, applies Wiener filter coefficients to suppress stationary noise (see Patent Document 2 for
a method of controlling the Wiener filter coefficients).
[0005]
Patent Document 1: JP 2013-61421 A
Patent Document 2: JP 2010-532879 A
Patent Document 3: JP 2014-106337 A
[0006]
The Wiener filter method described in Patent Document 1 determines the target sound section
based on the arrival direction of the input sound signal. Unlike the conventional method, which
estimates the target voice section from the difference between the long-term average level of
the noise and the instantaneous level, it is excellent in that it can accurately detect the target
voice section even under large noise.
[0007]
However, when, for example, a sine wave arrives from the front, the target voice section
detection method described in Patent Document 1 may erroneously determine that the sine wave
abnormal sound is not the target sound, even though it arrives from the front.
[0008]
The method of detecting a target voice section in Patent Document 1 uses the coherence
coefficient coef (f, K) for each frequency bin shown in equation (4) of Patent Document 1 and the
coherence COH (K) shown in equation (5) of that document. These quantities reflect the power
and the degree of correlation, at each frequency component, of two processed signals obtained
by giving specific directivities to the signals captured by two microphones (f is an index
representing a frequency bin, and K is an index representing the input frame).
The sine wave abnormal sound is misjudged because the components contained in the input
sound signal are concentrated at a specific frequency: the coherence coefficient coef (f, K) is
large in the frequency bin at the frequency of the sine wave but small in all other frequency
bins, so the coherence COH (K), obtained by averaging the coherence coefficients coef over all
bands, becomes small and does not reach the determination threshold Θ of the target sound
section.
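The dilution effect described above can be illustrated with a minimal sketch. It assumes that COH (K) is a plain arithmetic mean of the per-bin coefficients; the number of bins, the coefficient values, and the threshold value are hypothetical and are not taken from Patent Document 1.

```python
def coherence(coefs):
    """Average the per-bin coherence coefficients of one frame (assumed form of COH(K))."""
    return sum(coefs) / len(coefs)


def is_target_section(coefs, theta):
    """Frame-level decision: the averaged coherence must reach the threshold."""
    return coherence(coefs) >= theta


NUM_BINS = 64   # hypothetical number of frequency bins
THETA = 0.5     # hypothetical value for the threshold written as Θ in the text

# Broadband sound from the front: every bin is highly coherent.
broadband = [0.9] * NUM_BINS

# Sine wave from the front: one coherent bin, all others close to zero.
sine = [0.05] * NUM_BINS
sine[10] = 0.95

print(coherence(broadband), is_target_section(broadband, THETA))
print(coherence(sine), is_target_section(sine, THETA))
```

The single large coefficient of the sine wave is averaged away by the 63 small ones, so the frame-level test misses it.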
[0009]
If the target sound section can be accurately detected, the frequency component of the target
sound can be accurately detected, and various processes in the subsequent stage can be
appropriately performed.
However, due to the above-mentioned causes, when the target sound is a sine wave abnormal
noise, the detection accuracy is low, and there is a high possibility that various kinds of
processing in the subsequent stage may be inappropriate.
[0010]
Therefore, when the abnormal sound including only one or a few frequency components is the
target sound, a target sound component detection device and program capable of correctly
detecting the component of the target sound are desired.
There is also a need for a target sound extraction apparatus and program that can appropriately
extract the target sound when the abnormal sound containing only one or a few frequency
components is the target sound.
[0011]
According to a first aspect of the present invention, there is provided a target sound component
detection device that treats a sound including one or a small number of frequency components
as a target sound and detects, from an input sound signal, the frequency components specific to
the target sound included in that signal, the device comprising: (1) directivity forming means for
forming, by performing delay subtraction processing on the input sound signal, a plurality of
directivity signals each given a directivity characteristic having a dead angle in a predetermined
direction, the predetermined directions of the dead angles differing among the directivity
signals; (2) coherence coefficient calculation means for obtaining a coherence coefficient using
the plurality of directivity signals formed; (3) coherence coefficient feature quantity calculation
means for regarding the obtained coherence coefficient as a time-varying signal and obtaining a
coherence coefficient feature quantity representing the number of times the slope direction of
the signal waveform changes and the magnitude of those changes; and (4) target sound
component determination means for determining the frequency components specific to the
target sound based on the magnitude of the range of the obtained coherence coefficient feature
quantity and on the degree of difference between the coherence coefficients of adjacent
frequency components.
[0012]
The second aspect of the present invention is a target sound component detection program that
treats a sound including one or a small number of frequency components as a target sound and
is applied to detecting, from an input sound signal, the frequency components specific to the
target sound included in that signal, the program causing a computer to function as: (1)
directivity forming means for forming, by performing delay subtraction processing on the input
sound signal, a plurality of directivity signals each given a directivity characteristic having a dead
angle in a predetermined direction, the predetermined directions of the dead angles differing
among the directivity signals; (2) coherence coefficient calculation means for obtaining a
coherence coefficient using the plurality of directivity signals formed; (3) coherence coefficient
feature quantity calculation means for regarding the obtained coherence coefficient as a
time-varying signal and obtaining a coherence coefficient feature quantity representing the
number of times the slope direction of the signal waveform changes and the magnitude of those
changes; and (4) target sound component determination means for determining the frequency
components specific to the target sound based on the magnitude of the range of the obtained
coherence coefficient feature quantity and on the degree of difference between the coherence
coefficients of adjacent frequency components.
[0013]
A third aspect of the present invention is a target sound extraction device that treats a sound
including one or a few frequency components as a target sound and extracts the target sound
included in an input sound signal, the device comprising: (1) the target sound component
detection device of the first aspect of the present invention; and (2) target sound extraction
means for extracting the target sound included in the input sound signal by suppressing the
frequency components in the input sound signal other than those specific to the target sound,
or by emphasizing the frequency components in the input sound signal specific to the target
sound, based on information on the frequency components specific to the target sound
determined by the target sound component determination means of the target sound
component detection device and on the other frequency components.
[0014]
A fourth aspect of the present invention is a target sound extraction program that treats a sound
including one or a few frequency components as a target sound and is applied to extracting the
target sound included in an input sound signal, the program (1) functioning as the target sound
component detection program according to the second aspect of the present invention, and (2)
causing a computer to function as target sound extraction means for extracting the target sound
included in the input sound signal by suppressing the frequency components in the input sound
signal other than those specific to the target sound, or by emphasizing the frequency
components in the input sound signal specific to the target sound, based on information on the
frequency components specific to the target sound determined by the target sound component
determination means of the target sound component detection program and on the other
frequency components.
[0015]
According to the present invention, it is possible to realize a target sound component detection
device and program capable of correctly detecting the component that the target sound has
when the abnormal sound containing only one or a few frequency components is the target
sound.
[0016]
Further, according to another aspect of the present invention, it is possible to realize a target
sound extraction device and program capable of appropriately extracting the target sound when
the abnormal sound including only one or a few frequency components is the target sound.
[0017]
FIG. 1 is a block diagram showing the configuration of the target sound extraction device
according to the first embodiment.
FIG. 2 is an explanatory diagram showing the value of the coherence coefficient for each
frequency component (frequency bin) when the abnormal sound includes one frequency
component (when the abnormal sound is a sine wave abnormal sound).
FIG. 3 is a block diagram showing the detailed configuration of the target sound / target sound
component detection unit and the target sound extraction unit in the target sound extraction
device of the first embodiment.
FIG. 4 is a flowchart showing the case where the functions of the front abnormal sound detection
unit and the abnormal-sound-containing frequency component detection unit of FIG. 3 are
implemented in software.
[0018]
(A) First Embodiment A target sound component detection apparatus and program, and a target
sound extraction apparatus and program according to a first embodiment of the present
invention will be described with reference to the drawings.
[0019]
The target sound component detection device according to the first embodiment treats an
abnormal sound including only one or a few frequency components as the target sound and
detects the components of that target sound.
The target sound extraction device according to the first embodiment incorporates the target
sound component detection device according to the first embodiment and uses the detection
result of the target sound components to reduce the non-target sound components in the input
sound signal, thereby extracting (the signal of) the target sound from the input sound signal.
In the first embodiment, the Wiener filter method is applied as the method of suppressing
non-target sound components in the input sound signal.
[0020]
(A-1) Parameters applied and reasons for their application In the target sound extraction
device according to the first embodiment, in addition to the coherence coefficient coef (f, K) used
in the technique described in Patent Document 1, the modGI value modGI (f, K) of the coherence
coefficient coef (f, K), given by equation (1), is applied.
In equation (1), the coherence coefficient coef (f, K) is represented by s (K), and the modGI
value modGI (f, K) is represented by modGI.
[0021]
The modGI value will be briefly described (see Patent Document 3 for details).
modGI stands for modified gradient index (the gradient index is hereinafter referred to as GI).
[0022]
For the GI before modification, see the reference document: Naofumi Aoki, "A Band Extension
Technique for Narrow Band Telephony Speech Based on Full Wave Rectification", IEICE Trans.
Commun., Vol. E93-B (3), pp. 729-731, 2010.
[0023]
GI is an index that measures the number of times the slope direction of the signal waveform
changes and the magnitude of those changes. GI is obtained by summing the absolute differences
between successive samples at the points where the slope direction changes, and dividing that
sum by the square root of the power of the frame. GI therefore tends to increase as the number
of slope changes in one frame increases, and as the amount of change at each slope change
increases. Because of this property, GI can be said to be directly linked to the amount of
high-frequency components contained in the input waveform.
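The verbal definition of GI above can be sketched as follows. This is only an illustration under stated assumptions: the "power of the frame" is taken to be the sum of squared samples, and a slope change is detected by comparing the signs of successive differences. The helper names and the test signals are hypothetical.

```python
import math


def sign(x):
    """Return -1, 0 or 1 according to the sign of x."""
    return (x > 0) - (x < 0)


def gradient_index(frame):
    """Sum |s(n) - s(n-1)| where the slope direction changes, over sqrt(frame power)."""
    power = sum(s * s for s in frame)  # assumption: power = sum of squares
    if power == 0.0:
        return 0.0
    total = 0.0
    for n in range(2, len(frame)):
        prev_slope = sign(frame[n - 1] - frame[n - 2])
        slope = sign(frame[n] - frame[n - 1])
        # |slope - prev_slope| is 2 on a full reversal and 0 when the slope is
        # unchanged, so the weight is 1 or 0 (0.5 if one of the slopes is flat).
        weight = 0.5 * abs(slope - prev_slope)
        total += weight * abs(frame[n] - frame[n - 1])
    return total / math.sqrt(power)


N = 64
slow = [math.sin(2 * math.pi * 1 * n / N) for n in range(N)]    # 1 cycle per frame
fast = [math.sin(2 * math.pi * 16 * n / N) for n in range(N)]   # 16 cycles per frame

print(gradient_index(slow), gradient_index(fast))
```

The faster sine reverses its slope far more often and by larger steps, so it yields the larger GI, in line with the link to high-frequency content noted above.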
[0024]
However, because GI uses the variable Δn (n), a parameter that takes only the two values 0 and
2, large jumps over time occur frequently, and the GI value is unstable, becoming irregularly
large or small.
[0025]
In view of this unstable behavior of the GI value (its large jumps), modGI has been proposed as
a new feature quantity that is highly correlated with GI while stabilizing the changes with large
jumps.
For an arbitrary signal subject to feature quantity calculation (the coherence coefficient in the
present application), modGI is the power of the second-order difference of that signal
normalized by the power of the signal itself (a constant multiple of this value is also included).
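The definition of modGI given above can be sketched as below. Equation (1) itself is not reproduced in this text, so the code is an assumption-laden illustration: the second-order difference is taken as s(k) − 2·s(k−1) + s(k−2), "power" is the sum of squares, the constant multiplier is 1, and the window length (how many past values of the signal are used) is a hypothetical choice.

```python
def mod_gi(s):
    """Power of the second-order difference of s, normalized by the power of s."""
    power = sum(x * x for x in s)  # assumption: power = sum of squares over the window
    if power == 0.0:
        return 0.0
    second_diff = [s[k] - 2.0 * s[k - 1] + s[k - 2] for k in range(2, len(s))]
    return sum(d * d for d in second_diff) / power


# Hypothetical time series of one bin's coherence coefficient over 16 frames.
steady_large = [0.9] * 16        # large but constant, close to a DC signal
steady_small = [0.05] * 16       # small and constant
fluctuating = [0.2, 0.8] * 8     # changes direction every frame

print(mod_gi(steady_large), mod_gi(steady_small), mod_gi(fluctuating))
```

A constant signal gives a value near zero regardless of its level, while a signal whose slope keeps reversing gives a large value.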
[0026]
Since modGI is highly correlated with GI, it functions as a stable index for measuring the number
of times the slope direction of the signal waveform changes and the magnitude of those changes,
and also as an index reflecting the amount of high-frequency components included in the input
waveform.
[0027]
Here, the behavior of the modGI value when a sine wave abnormal sound arrives from the front
is examined.
When the modGI value modGI (f, K) is calculated for the coherence coefficient coef (f, K) of each
frequency component, the coherence coefficient coef (f, K) at the frequency component of the
sine wave is substantially constant (although its value is large), so its characteristic is close to
that of a direct current signal and the modGI value modGI (f, K) is minute. At frequency
components other than that of the sine wave there is no input, so the coherence coefficient
coef (f, K) is minute and almost constant; it again behaves like a DC signal, and the modGI value
modGI (f, K) is minute.
[0028]
Comparing the behavior of the coherence coefficient coef (f, K) and the modGI value modGI (f, K)
between the frequency components included in the incoming sine wave abnormal sound and the
other frequency components gives the following: (a1) at the frequency components included in
the sine wave abnormal sound, the coherence coefficient coef (f, K) is a large steady value and
the modGI value modGI (f, K) is minute; (a2) at the other frequency components, the coherence
coefficient coef (f, K) is an extremely small steady value and the modGI value modGI (f, K) is
minute.
[0029]
Therefore, by utilizing these behaviors (a1) and (a2), the following determinations can be made:
(b1) when the modGI values modGI (f, K) of all frequency bins in a frame fall within a certain
range, it can be determined that a sine wave abnormal sound, as the target sound, has arrived
from the front; and (b2) when the difference in coherence coefficient between adjacent
frequency bins in a frame is large, it can be determined that one of the two frequency bins holds
a frequency component included in the sine wave abnormal sound, and when the difference is
small, that neither frequency bin holds a frequency component of the sine wave abnormal sound.
[0030]
Although the above description concerns a sine wave abnormal sound having one frequency
component, the same holds for an abnormal sound that includes a few frequency components.
[0031]
As described above, if the frequency components of the abnormal sound that is the target sound
coming from the front can be accurately detected, the non-target sound components in the input
sound signal can be appropriately suppressed, and the target sound can be extracted.
[0032]
The target sound extraction apparatus according to the first embodiment is one to which the
above-described idea is applied, and is intended to accurately extract abnormal noise coming
from the front.
[0033]
(A-2) Configuration of First Embodiment FIG. 1 is a block diagram showing a configuration of a
target sound extraction device 10 according to the first embodiment.
[0034]
The target sound extraction device 10 of the first embodiment may be constructed by connecting
various hardware components, or some of its components (for example, the parts other than the
microphones and the analog / digital (A / D) conversion units) may be constructed so as to
realize their functions by applying a program execution configuration such as a CPU, a ROM,
and a RAM.
Whichever construction method is applied, the detailed functional configuration of the target
sound extraction device 10 is the configuration shown in FIG. 1.
[0035]
In FIG. 1, the target sound extraction device 10 according to the first embodiment includes a pair
of microphones m1 and m2, an FFT unit 11, a first directivity forming unit 12, a second directivity
forming unit 13, a coherence coefficient calculation unit 14, a modGI calculation unit 15, a target
sound / target sound component detection unit 16, a target sound extraction unit 17, and an
IFFT (Inverse Fast Fourier Transform) unit 18.
[0036]
The pair of microphones m1 and m2 are disposed a predetermined distance (or an arbitrary
distance) apart, and each captures the surrounding sound.
Each of the microphones m1 and m2 is non-directional (or has very slight directivity in the front
direction).
The acoustic signals (input sound signals) captured by the microphones m1 and m2 are
converted into digital signals s1 (n) and s2 (n) by corresponding A / D conversion units (not
shown) and supplied to the FFT unit 11.
Here, n is an index representing the input order of the samples, and is a positive integer.
In this text, a smaller n denotes an older input sample, and a larger n denotes a newer input
sample.
[0037]
The FFT unit 11 receives the input sound signal sequences s1 (n) and s2 (n) from the
microphones m1 and m2, and performs a fast Fourier transform (or a discrete Fourier transform)
on the input sound signals s1 and s2.
The input sound signals s1 and s2 are thereby expressed in the frequency domain.
When the fast Fourier transform is performed, analysis frames FRAME1 (K) and FRAME2 (K),
each consisting of a predetermined number N of samples, are constructed from the input sound
signals s1 (n) and s2 (n) and used. Equation (2) below shows an example of constructing the
analysis frame FRAME1 (K) from the input sound signal s1 (n); the analysis frame FRAME2 (K)
is constructed in the same way.
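As an illustration of constructing analysis frames of N samples, a minimal sketch follows. It assumes consecutive, non-overlapping frames, which is one possible reading; equation (2) is not reproduced in this text, so the exact indexing, and whether frames overlap, are assumptions. The function name is hypothetical, and the frame index here starts at 0 rather than at 1.

```python
def make_frames(samples, frame_len):
    """Split a sample sequence into consecutive, non-overlapping frames of frame_len samples.

    Trailing samples that do not fill a whole frame are dropped.
    """
    count = len(samples) // frame_len
    return [samples[k * frame_len:(k + 1) * frame_len] for k in range(count)]


s1 = list(range(10))        # stand-in for the digitized signal s1(n)
frames1 = make_frames(s1, 4)
print(frames1)
```

Older samples end up in frames with a smaller index, matching the convention that a smaller frame index denotes an older frame.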
[0038]
Here, K is an index representing the order of frames, and is expressed by a positive integer. In the
text, the smaller K is the older analysis frame, and the larger K is the newer analysis frame.
Further, in the following description, it is assumed that the index representing the latest analysis
frame to be analyzed is K, unless otherwise specified.
[0039]
The FFT unit 11 performs fast Fourier transform processing on each analysis frame to convert it
into the frequency domain signals X1 (f, K) and X2 (f, K), and supplies the obtained frequency
domain signals X1 (f, K) and X2 (f, K) to the first directivity forming unit 12 and the second
directivity forming unit 13. The FFT unit 11 also supplies one of the frequency domain signals,
X1 (f, K), to the target sound extraction unit 17. Here, f is an index representing a frequency.
X1 (f, K) is not a single value; as shown in equation (3), it is composed of the spectrum
components of a plurality of frequencies f1 to fm. The same applies to X2 (f, K) and to B1 (f, K)
and B2 (f, K) described later.
X1 (f, K) = {(f1, K), (f2, K), …, (fm, K)} … (3)
[0040]
The first directivity forming unit 12 forms, from the two frequency domain signals X1 (f, K) and
X2 (f, K), a signal B1 (f, K) having strong directivity in a specific direction, and the second
directivity forming unit 13 forms, from the same two frequency domain signals, a signal B2 (f, K)
having strong directivity in another specific direction (different from the one above). An existing
method can be applied to form the signals B1 (f, K) and B2 (f, K) having strong directivity in a
specific direction. For example, applying equation (4) forms B1 (f, K) with strong directivity in
the right direction, and applying equation (5) forms B2 (f, K) with strong directivity in the left
direction. In equations (4) and (5), the frame index K is omitted because it is not involved in the
operation.
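One existing way to form such a pair of directional signals is frequency-domain delay-and-subtract. The sketch below is offered under that assumption only: equations (4) and (5) are not reproduced in this text, so the exact expressions, the delay value, and which output corresponds to "right" or "left" (which depends on the microphone layout) are all hypothetical.

```python
import cmath
import math


def form_directivity(x1, x2, delay_samples, fft_len):
    """Delay-and-subtract in the frequency domain.

    x1, x2: complex spectra of one frame, indexed by bin f.
    Returns (b1, b2); each has a dead angle toward the opposite side.
    """
    b1, b2 = [], []
    for f, (a, b) in enumerate(zip(x1, x2)):
        # Phase rotation equivalent to delaying a signal by delay_samples.
        phase = cmath.exp(-2j * math.pi * f * delay_samples / fft_len)
        b1.append(b - a * phase)
        b2.append(a - b * phase)
    return b1, b2


FFT_LEN = 8
DELAY = 1   # hypothetical inter-microphone delay, in samples

# Sound from the front reaches both microphones at the same time.
front = [1 + 0j] * FFT_LEN
b1_front, b2_front = form_directivity(front, front, DELAY, FFT_LEN)

# Sound from one side reaches microphone 2 exactly DELAY samples late.
side1 = [1 + 0j] * FFT_LEN
side2 = [a * cmath.exp(-2j * math.pi * f * DELAY / FFT_LEN) for f, a in enumerate(side1)]
b1_side, b2_side = form_directivity(side1, side2, DELAY, FFT_LEN)

print(abs(b1_front[1]), abs(b2_front[1]), abs(b1_side[1]), abs(b2_side[1]))
```

For frontal sound the two outputs are identical, while sound from the direction of the dead angle cancels in one output only. The coherence coefficient exploits exactly this difference.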
[0041]
The coherence coefficient calculation unit 14 obtains the coherence coefficient coef (f, K) by
performing the operation shown in equation (6) on the two directional signals B1 (f) and B2 (f)
described above. Note that B2 (f)* in equation (6) is the complex conjugate of B2 (f).
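A per-bin coherence coefficient of this kind can be sketched as a normalized cross-spectrum of the two directional signals. Equation (6) is not reproduced in this text, so the normalization by the mean power of the two signals, the use of the real part, and the small constant `eps` that guards against division by zero are assumptions made for illustration.

```python
def coherence_coef(b1, b2, eps=1e-12):
    """Per-bin normalized cross-spectrum of two directional spectra.

    Assumed form: Re{B1(f) * conj(B2(f))} / (0.5 * (|B1(f)|^2 + |B2(f)|^2)).
    """
    coefs = []
    for u, v in zip(b1, b2):
        cross = u * v.conjugate()
        norm = 0.5 * (abs(u) ** 2 + abs(v) ** 2)
        coefs.append(cross.real / (norm + eps))
    return coefs


# Identical directional signals (sound from the front): coefficient close to 1.
same = coherence_coef([1 + 1j, 2 - 1j], [1 + 1j, 2 - 1j])

# One directional signal cancelled (sound from a dead-angle direction): coefficient 0.
one_sided = coherence_coef([0j, 0j], [1 + 1j, 2 - 1j])

print(same, one_sided)
```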
[0042]
The coherence coefficient coef (f, K) conceptually represents the correlation, at a given
frequency component, between the signal coming from the right and the signal coming from the
left. A small coherence coefficient coef (f, K) therefore means that the correlation between the
frequency components of the two directional signals B1 and B2 is small, and conversely a large
coherence coefficient coef (f, K) means that the correlation is large. A small correlation occurs
either when the arrival direction of the frequency component in the input sound signal is
strongly biased to the right or to the left, or when, even without such a bias, the component is
one, such as noise, in which correlation is unlikely to appear. Accordingly, when the value of the
coherence coefficient coef (f, K) is large, there is no bias in the arrival direction, and the
component in the input sound signal can be said to come from the front.
[0043]
However, an abnormal sound includes only one or a small number of frequency components, so
when an abnormal sound comes from the front, only the coherence coefficient coef (f, K) of the
included frequency components becomes large. FIG. 2 shows, as a solid line, the value of the
coherence coefficient coef (f, K) for each frequency component (frequency bin) when the
abnormal sound includes one frequency component (when the abnormal sound is a sine wave
abnormal sound). For reference, FIG. 2 also shows, as a broken line, the value of the coherence
coefficient coef (f, K) for each frequency component (frequency bin) of an audio signal.
[0044]
The coherence coefficient calculation unit 14 supplies the obtained coherence coefficient coef (f,
K) to the modGI calculation unit 15 and the target sound / target sound component detection
unit 16.
[0045]
The modGI calculation unit 15 calculates the modGI value modGI (f, K) of the coherence
coefficient coef (f, K) for each frequency component, and gives the obtained modGI values
modGI (f, K) to the target sound / target sound component detection unit 16.
Equation (1) above is applied as the formula for calculating the modGI value modGI (f, K): the
coherence coefficient coef (f, K) is substituted for the calculation target signal s (K) of equation
(1) to calculate the modGI value modGI (f, K). Equation (1) is the same calculation formula as
equation (13) of Patent Document 3, but the modGI value modGI (f, K) may instead be
calculated by applying equations (5) and (10) to (12) described in Patent Document 3.
[0046]
The target sound / target sound component detection unit 16 determines, based on the
coherence coefficient coef (f, K) and the modGI value modGI (f, K), whether the input sound
signal is the target sound (abnormal sound). If it is the target sound, the unit determines for each
frequency component (frequency bin) whether it is a frequency component specific to the target
sound, and gives the determination result ctrl_WF (f, K), which indicates whether the component
is such a specific frequency component, to the target sound extraction unit 17. The
determinations made by the target sound / target sound component detection unit 16 are the
determinations (b1) and (b2) described above.
[0047]
The target sound extraction unit 17 suppresses the non-target sound components in the given
frequency domain signal X1 (f, K) based on the determination result ctrl_WF (f, K), thereby
emphasizing the frequency components of the target sound (abnormal sound), that is,
extracting the target sound, and gives the obtained target sound extraction signal Y (f, K) to the
IFFT unit 18. The target sound extraction unit 17 applies the Wiener filter method to extract the
target sound, and a known method can be applied as the extraction method. The control method
of Patent Document 2 is applicable as the method of controlling the Wiener filter coefficients.
[0048]
The IFFT unit 18 converts the target sound extraction signal Y (f, K), which is a frequency domain
signal, into a time domain signal y (n). The IFFT unit 18 can be omitted if the subsequent circuit
is configured to process the frequency domain signal Y (f, K) as it is.
[0049]
FIG. 3 is a block diagram showing the detailed configurations of the target sound / target sound
component detection unit 16 and the target sound extraction unit 17.
Hereinafter, the portion comprising the target sound / target sound component detection unit 16
and the target sound extraction unit 17 is referred to as the WF control / target sound extraction
processing unit 20.
[0050]
In FIG. 3, the WF control / target sound extraction processing unit 20 includes an input signal
reception unit 21, a front abnormal sound detection unit 22, an abnormal-sound-containing
frequency component detection unit 23, a WF coefficient adaptation unit 24, a WF coefficient
multiplication unit 25, and a target sound extraction signal transmission unit 26.
[0051]
The input signal reception unit 21 takes in the frequency domain signal X1 (f, K) from the FFT
unit 11, the coherence coefficient coef (f, K) from the coherence coefficient calculation unit 14,
and the modGI value modGI (f, K) from the modGI calculation unit 15.
[0052]
The front abnormal sound detection unit 22 determines whether the current input sound signal
lies in a front abnormal sound section.
The front abnormal sound detection unit 22 performs determination (b1) described above; that
is, it determines whether the modGI values modGI (f, K) of all frequency bins in a frame fall
within a predetermined range.
When they fall within that range, the front abnormal sound detection unit 22 determines that
the target sound (abnormal sound) is coming from the front.
[0053]
As long as determination (b1) described above can be performed, the specific determination
method used by the front abnormal sound detection unit 22 is not limited. An example of a
specific determination method is described below. The front abnormal sound detection unit 22
searches the modGI values modGI (START, K) to modGI (END, K) of all frequency components
(START represents the first (lowest) frequency bin, and END represents the last (highest)
frequency bin) for the minimum value and the maximum value using a known search algorithm,
and compares their difference range_modGI (K) (= maximum value − minimum value) with the
threshold Ψ. When the difference range_modGI (K) is smaller than the threshold Ψ, it
determines that the current frame K is a frame within a front abnormal sound section.
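The range test described above can be sketched in a few lines. The function name, the example modGI values, and the numeric threshold standing in for Ψ are hypothetical; Python's built-in `max` and `min` play the role of the known search algorithm.

```python
def is_front_abnormal_frame(mod_gi_values, psi):
    """Return True when max - min of the per-bin modGI values is below psi."""
    range_mod_gi = max(mod_gi_values) - min(mod_gi_values)
    return range_mod_gi < psi


PSI = 0.1  # hypothetical value for the threshold written as Ψ in the text

# Frame with a sine wave abnormal sound from the front: every bin's modGI is minute.
quiet_frame = [0.010, 0.020, 0.015, 0.012]
# Frame in which some bins fluctuate strongly over time.
busy_frame = [0.010, 0.900, 0.300, 0.020]

print(is_front_abnormal_frame(quiet_frame, PSI), is_front_abnormal_frame(busy_frame, PSI))
```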
[0054]
As another determination method, briefly, the standard deviation, variance, or coefficient of
variation of the modGI values modGI (START, K) to modGI (END, K) may be calculated and
compared with a threshold.
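The spread-based alternatives can be sketched with the Python standard library. The use of population (rather than sample) statistics, the function names, and the threshold are assumptions made for illustration.

```python
import statistics


def modgi_spread(values, measure="stdev"):
    """Spread of the per-bin modGI values by the chosen measure."""
    if measure == "stdev":
        return statistics.pstdev(values)
    if measure == "variance":
        return statistics.pvariance(values)
    if measure == "cv":
        mean = statistics.mean(values)
        # Coefficient of variation: standard deviation relative to the mean.
        return statistics.pstdev(values) / mean if mean != 0 else float("inf")
    raise ValueError("unknown measure: " + measure)


def is_front_abnormal_frame_by_spread(values, threshold, measure="stdev"):
    """Return True when the spread of the modGI values is below the threshold."""
    return modgi_spread(values, measure) < threshold


flat = [0.010, 0.012, 0.011, 0.009]
spiky = [0.010, 0.900, 0.300, 0.020]

print(modgi_spread(flat), modgi_spread(spiky))
```

Unlike the maximum-minus-minimum range, these measures use every bin, so a single outlying bin affects them less.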
[0055]
When the target sound (abnormal sound) is coming from the front, the
abnormal-sound-containing frequency component detection unit 23 identifies the frequency
components specific to the target sound (abnormal sound) and marks them so that they can be
distinguished from the other frequency components.
The abnormal-sound-containing frequency component detection unit 23 performs determination
(b2) described above; that is, it determines whether the difference in coherence coefficient
between adjacent frequency bins in a frame is large. When it determines that the difference is
large, it identifies the frequency bin (frequency component) having the larger coherence
coefficient as a frequency component included in the abnormal sound.
[0056]
As long as determination (b2) described above can be performed, the specific determination
method used by the abnormal-sound-containing frequency component detection unit 23 is not
limited. An example of a specific determination method is described below. The
abnormal-sound-containing frequency component detection unit 23 calculates the ratio
rate_coef of the coherence coefficients coef (f−1, K) and coef (f, K) of adjacent frequency bins
(where f−1 denotes the frequency bin immediately before the frequency bin f) and compares it
with the threshold Φ. If the ratio rate_coef exceeds the threshold Φ, it determines that the
frequency bin f contains a frequency component of the target sound. The
abnormal-sound-containing frequency component detection unit 23 supplies the determination
result ctrl_WF (f, K) for each frequency bin to the WF coefficient adaptation unit 24 as a
parameter indicating whether the bin is a frequency component to be suppressed by the Wiener
filter.
[0057]
To briefly describe another determination method, when the difference coef (f, K) − coef (f−1, K)
between the coherence coefficients of adjacent frequency bins is equal to or greater than a
threshold, the frequency bin f may be determined to contain a frequency component of the target
sound.
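
The difference-based alternative can be sketched as below. The function name detect_by_difference and the argument coef_before_start (a preset value standing in for the non-existent bin before START) are assumptions made for illustration; the text does not specify how the first bin is handled in this variant.

```python
import numpy as np

def detect_by_difference(coef, threshold, coef_before_start):
    """Mark bins whose coherence coefficient rises sharply over the previous bin.

    coef              : coef (START, K) .. coef (END, K) for one frame K
    threshold         : decision threshold for coef (f, K) - coef (f-1, K)
    coef_before_start : assumed preset value used as coef (START-1, K)
    Returns an integer array, 1 where the bin is judged to contain the
    target sound component and 0 elsewhere.
    """
    coef = np.asarray(coef, dtype=float)
    # coherence coefficient of the immediately preceding bin for every f
    prev = np.concatenate(([coef_before_start], coef[:-1]))
    return (coef - prev >= threshold).astype(int)
```

Only the upward step is tested, so the bin following a peak is not marked even though the absolute difference there is also large.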
[0058]
The WF coefficient adaptation unit 24 and the WF coefficient multiplication unit 25, together
with the configuration that forms the determination result ctrl_WF (f, K), constitute a Wiener
filter.
[0059]
The WF coefficient adaptation unit 24 adaptively controls the Wiener filter coefficient for the
frequency component to be suppressed.
As described above, as the control method of the Wiener filter coefficient, the control method
described in Patent Document 2 can be applied.
[0060]
The WF coefficient multiplication unit 25 suppresses the frequency component to be suppressed in
the frequency domain signal X1 (f, K) from the first directivity formation unit 12 by multiplying
it by the Wiener filter coefficient adaptively controlled by the WF coefficient adaptation unit
24.
In other words, the WF coefficient multiplication unit 25 emphasizes (extracts) the frequency
component of the target sound (noise) in the frequency domain signal X1 (f, K).
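
The multiplication step can be illustrated with the following sketch. How the Wiener filter coefficients are adapted follows Patent Document 2 and is not reproduced here; wf_coef is therefore a hypothetical input holding already-adapted coefficients, and the function name apply_wf is likewise an assumption.

```python
import numpy as np

def apply_wf(x1, ctrl_wf, wf_coef):
    """Multiply only the bins to be suppressed by their Wiener filter coefficient.

    x1      : complex spectrum X1 (f, K) for one frame K
    ctrl_wf : 1 = bin holds the target sound component (left untouched),
              0 = bin to be suppressed
    wf_coef : hypothetical, already-adapted Wiener filter coefficient per bin
    Returns the target sound extraction signal Y (f, K).
    """
    x1 = np.asarray(x1, dtype=complex)
    ctrl_wf = np.asarray(ctrl_wf)
    # gain is 1 for target bins and the filter coefficient for the others
    gain = np.where(ctrl_wf == 1, 1.0, np.asarray(wf_coef, dtype=float))
    return gain * x1
```

Because the bins marked with 1 pass through unchanged while the others are attenuated, the target sound component is emphasized relative to the rest of the spectrum.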
[0061]
The target sound extraction signal transmission unit 26 supplies the target sound extraction
signal Y (f, K) output from the WF coefficient multiplication unit 25 to the IFFT unit 18.
[0062]
(A-3) Operation of the First Embodiment
Next, the operation of the target sound extraction apparatus 10 according to the first
embodiment will be described in order with reference to the drawings.
[0063]
Signals s1 (n) and s2 (n) input from the pair of microphones m1 and m2 are converted from time
domain to frequency domain signals X1 (f, K) and X2 (f, K) by the FFT unit 11, respectively. After
that, directional signals B1 (f, K) and B2 (f, K) having a dead angle in a predetermined direction
are generated by the first and second directivity forming units 12 and 13, respectively.
Then, in the coherence coefficient calculation unit 14, the calculation of the equation (6) is
executed on the directional signals B1 (f, K) and B2 (f, K), and the resulting coherence
coefficient coef (f, K) is given to the modGI calculation unit 15 and the target sound / target
sound component detection unit 16.
In the modGI calculation unit 15, the modGI value modGI (f, K) for the coherence coefficient coef
(f, K) is calculated according to, for example, the equation (1), and is given to the target
sound / target sound component detection unit 16.
[0064]
The target sound / target sound component detection unit 16 determines whether the input
sound signal is the target sound (noise) based on the coherence coefficient coef (f, K) and the
modGI value modGI (f, K). If it is determined to be the target sound, it is further determined
whether each frequency component (frequency bin) is a frequency component unique to the target
sound, and a determination result ctrl_WF (f, K), which indicates whether the frequency
component is such a unique frequency component, is given to the target sound extraction unit 17.
In the target sound extraction unit 17, the non-target sound component in the frequency domain
signal X1 (f, K) is suppressed based on the determination result ctrl_WF (f, K), and the
frequency component of the target sound (noise) is emphasized. The obtained target sound
extraction signal Y (f, K) is converted to a time domain signal y (n) by the IFFT unit 18 and
supplied to the subsequent stage circuit.
[0065]
The following operations are executed in the WF control / target sound extraction processing
unit 20 to which the target sound / target sound component detection unit 16 and the target
sound extraction unit 17 correspond.
[0066]
In the front abnormal noise detection unit 22, based on the modGI value modGI (f, K) acquired by
the input signal reception unit 21, it is determined whether the modGI values modGI (START, K)
to modGI (END, K) of all frequency bins of the current frame fall within a certain range, and if
they do, it is determined that the target sound (noise) is coming from the front.
[0067]
In the abnormal noise-containing frequency component detection unit 23, when the front
abnormal noise detection unit 22 determines that the target sound (noise) is coming from the
front, it is determined, based on the coherence coefficient coef (f, K) taken in by the input
signal reception unit 21, whether or not the difference in coherence coefficient is large between
adjacent frequency bins in the current frame. If the difference is large, the frequency bin
(frequency component) with the larger coherence coefficient is identified as a frequency
component included in the abnormal noise, and the other frequency components are determined to
be suppressed.
[0068]
The WF coefficient adaptation unit 24 adaptively controls the Wiener filter coefficient for the
frequency component to be suppressed, and the WF coefficient multiplication unit 25 multiplies
the frequency component to be suppressed in the frequency domain signal X1 (f, K) captured by
the input signal reception unit 21 by the adaptively controlled Wiener filter coefficient,
thereby suppressing that frequency component.
As a result, the frequency component of the target sound (noise) in the frequency domain signal
X1 (f, K) is emphasized (extracted), and the resulting target sound extraction signal Y (f, K) is
given to the IFFT unit 18.
[0069]
FIG. 4 is a flowchart mainly showing a case where the functions of the front abnormal noise
detection unit 22 and the abnormal noise containing frequency component detection unit 23 are
realized by software.
The process shown in FIG. 4 is repeatedly executed each time the process target frame K is
switched to a new frame.
[0070]
When the new frame becomes the processing target frame K, first, the coherence coefficient coef
(f, K) and the modGI value modGI (f, K) for the frame K are fetched (step S100).
[0071]
First, the maximum value MAX (modGI (f, K)) and the minimum value MIN (modGI (f, K)) among
the modGI values modGI (START, K) to modGI (END, K) of all frequency components are searched
for (step S101), and the difference range_modGI (K) between the maximum value MAX
(modGI (f, K)) and the minimum value MIN (modGI (f, K)) obtained by the search is calculated
(step S102).
[0072]
Then, the calculated difference range_modGI (K) is compared with the threshold Ψ (step S103).
Here, the threshold Ψ is predetermined by simulation or the like, and is selected as a value that
can separate the difference range_modGI (K) for the target sound (noise) from the difference
range_modGI (K) for non-target sounds.
[0073]
When the calculated difference range_modGI (K) is smaller than the threshold value Ψ, it is
assumed that the current process target frame K is a frame within the front abnormal noise
section, and the following loop processing RP repeated for each frequency bin is performed.
[0074]
On the other hand, when the calculated difference range_modGI (K) is equal to or larger than the
threshold Ψ, the current process target frame K is regarded as a frame within a non-target sound
section, and the predetermined setting as to whether or not the Wiener filter is applied to that
frame is made (step S104).
For example, all processing in the subsequent stage may be stopped and the input sound signal
may be bypassed, or, for example, a Wiener filter may be operated on all frequency components
to suppress background noise.
[0075]
The loop processing RP is repeated while the parameter f defining the frequency bin is
incremented (or decremented) from the smallest (or largest) value START to the largest (or
smallest) value END.
[0076]
In the loop processing RP, first, the ratio rate_coef (f, K) of the coherence coefficient coef
(f, K) of the frequency bin defined by the parameter f to the coherence coefficient coef (f−1, K)
of the frequency bin immediately preceding it is calculated (step S105).
Here, when the parameter f is the smallest value START, the previous frequency bin does not
exist, so the coherence coefficient coef (f−1, K) to be applied in this case is determined in
advance.
For example, an average coherence coefficient of frequency bins having no frequency component of
the target sound (noise) may be obtained by simulation or the like and set in advance as this
value.
[0077]
When the ratio rate_coef (f, K) of the two coherence coefficients has been calculated, it is
compared with the threshold Φ (step S106).
The threshold Φ is also predetermined by simulation or the like, and is selected as a value that
can separate the ratio obtained when the frequency bin specified by the parameter f includes a
frequency component specific to the target sound (noise) from the ratio obtained when that
frequency bin does not include such a frequency component.
[0078]
If the ratio rate_coef (f, K) of the two coherence coefficients is less than or equal to the
threshold Φ, the frequency bin specified by the parameter f is regarded as not including a
frequency component specific to the target sound (noise), and “0” is set as the determination
result ctrl_WF (f, K) so that the Wiener filter process is performed (step S107). On the other
hand, if the ratio rate_coef (f, K) is greater than the threshold Φ, the frequency bin specified
by the parameter f is regarded as including a frequency component specific to the target sound
(noise), and “1” is set as the determination result ctrl_WF (f, K) so that the Wiener filter
process is not performed (step S108).
[0079]
When the loop processing RP for the parameter f equal to END is completed, the parameter K is
incremented (step S109), a new frame becomes the processing target frame K, and the
above-described processing is repeated.
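
The per-frame flow of steps S100 to S108 can be summarized in the following sketch. It is an illustration under stated assumptions, not the implementation of the embodiment: the function name process_frame, the argument coef_before_start (the preset coefficient used when f is START), the small constant eps guarding the division, and the convention of returning None for a non-target frame are all introduced here. Coherence coefficients are assumed to be non-negative.

```python
import numpy as np

def process_frame(coef, modgi, psi, phi, coef_before_start, eps=1e-12):
    """One pass of the FIG. 4 flow for a frame K.

    coef, modgi       : coef (f, K) and modGI (f, K) for f = START .. END (S100)
    psi, phi          : thresholds for range_modGI (K) and rate_coef (f, K)
    coef_before_start : assumed preset coef (f-1, K) used when f = START
    Returns (is_target_frame, ctrl_wf); ctrl_wf is None for a non-target
    frame, leaving the step S104 setting (bypass or suppress all) to the caller.
    """
    coef = np.asarray(coef, dtype=float)
    modgi = np.asarray(modgi, dtype=float)

    # S101-S102: range of modGI over all frequency bins
    range_modgi = modgi.max() - modgi.min()

    # S103: a range at or above psi means a non-target frame (S104)
    if range_modgi >= psi:
        return False, None

    # loop processing RP over the frequency bins
    ctrl_wf = np.zeros(len(coef), dtype=int)
    prev = coef_before_start
    for f in range(len(coef)):
        rate_coef = coef[f] / max(prev, eps)       # S105
        ctrl_wf[f] = 1 if rate_coef > phi else 0   # S106, then S108 or S107
        prev = coef[f]
    return True, ctrl_wf
```

The caller would invoke this once per frame and advance K afterwards (step S109), passing ctrl_wf on to the Wiener filter stage when the frame is judged to contain the target sound.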
[0080]
(A-4) Effects of First Embodiment
According to the first embodiment, even when an abnormal sound including only one or a few
frequency components is the target sound, the frequency component specific to the target sound
in the input sound signal can be detected correctly.
[0081]
Since the frequency component unique to the target sound can be separated from the frequency
component in the non-target sound, the accuracy of the suppression of the noise component
using the information (in other words, the enhancement of the target sound) can be made high.
[0082]
The target sound in which the non-target sound component is suppressed has its feature
emphasized, so that the accuracy, efficiency, etc. of processing using it can be improved.
For example, when the normality of a machine is diagnosed by collating the frequency
characteristic of the obtained target sound with the frequency characteristic of an ideal target
sound registered in advance, the diagnostic accuracy can be improved.
[0083]
(B) Other Embodiments
Various modified embodiments are mentioned in the description of the first embodiment, and
further modified embodiments such as those exemplified below can also be mentioned.
[0084]
In the first embodiment, the case where modGI is applied is shown, but GI before being corrected
is also an index for measuring the number of times the inclination direction of the signal
waveform changes and its magnitude, so in the first embodiment GI may be applied instead of
modGI.
[0085]
Processing that is performed on frequency domain signals in the first embodiment may instead be
performed on time domain signals, where possible.
[0086]
The present invention is characterized by the configuration that follows the calculation of the
coherence coefficient, and the configuration preceding it is not necessarily limited to that of
the first embodiment.
For example, signals of a microphone array having three or more microphones may be processed
to obtain a coherence coefficient, and then modGI (or GI) may be calculated to detect a frequency
component specific to the target sound.
[0087]
Although in the first embodiment the information on frequency components specific to the target
sound and the information on the other frequency components are used in the Wiener filter, the
use of the information on the separated frequency components is, of course, not limited to this.
Further, it goes without saying that the target sound enhancement method (noise suppression
method) to be applied is not limited to the Wiener filter method.
Although the case where the target sound is extracted by suppressing the frequency component
of the non-target sound is described above, the target sound may be extracted by increasing the
frequency component of the target sound instead of, or in addition to, this.
[0088]
Although the first embodiment shows an apparatus and program that immediately process the
signals captured by a pair of microphones, the present invention can also be applied to the case
where the signals captured by the pair of microphones are recorded on a recording medium and
later reproduced.
[0089]
DESCRIPTION OF SYMBOLS 10 ... Target sound extraction apparatus, m1, m2 ... Microphone, 11
... FFT unit, 12, 13 ... Directivity formation unit, 14 ... Coherence coefficient calculation unit,
15 ... modGI calculation unit, 16 ... Target sound / target sound component detection unit,
17 ... Target sound extraction unit, 18 ... IFFT unit.