DESCRIPTION JP2015505069

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
A method of processing digitized microphone signal data to detect wind noise. The first and
second sets of signal samples are obtained simultaneously from the two microphones. A first
number of samples in the first set above the first predefined comparison threshold is determined.
A second number of samples below the first predefined comparison threshold in the first set is
determined. A third number of samples in the second set above the second predefined
comparison threshold is determined. A fourth number of samples below the second predefined
comparison threshold in the second set is determined. If the first and second numbers differ from
the third and fourth numbers by more than a predefined detection threshold, as determined by, for
example, a chi-square test, an indication that wind noise is present is output.
Method and apparatus for wind noise detection
[0001]
CROSS-REFERENCE TO RELATED APPLICATIONS This application claims the benefit of Australian
Provisional Patent Application No. 2011905381, filed Dec. 22, 2011, and Australian Provisional
Patent Application No. 2012903050, filed Jul. 17, 2012, which are incorporated herein by
reference.
[0002]
The present invention relates to the digital processing of signals from microphones and other
such transducers, and in particular to an apparatus and method for detecting the presence of wind
noise or the like in such signals, for example to be able to initiate or control compensation of
wind noise.
11-04-2019
1
[0003]
As used herein, wind noise is defined as the microphone signal that originates from turbulence in
the air stream flowing past the microphone port, as opposed to the sound of wind blowing on other
objects, such as the rustling of leaves when the wind blows on a tree in the far field.
Wind noise may be objectionable to the user and / or may mask other signals of interest.
Preferably, the digital signal processor is configured to take measures to ameliorate the adverse
effects of wind noise on signal quality. To do so, a suitable means is needed for reliably
detecting wind noise as it occurs, without falsely detecting wind noise when other factors are
affecting the signal.
[0004]
Conventional approaches to wind noise detection (WND) assume that non-wind sounds are generated in
the far field and therefore have a similar sound pressure level (SPL) and phase at each
microphone, whereas wind noise is substantially uncorrelated between microphones. However, for
non-wind sounds generated in the far field, the SPL can differ substantially between the
microphones due to local acoustic reflections, room reverberation, and / or differences in
microphone coverage, obstruction, or location. Significant differences in SPL between microphones
can also be caused by non-wind sounds generated in the near field, such as a telephone handset
being used near the microphones. Differences in microphone output signals may also be caused by
differences in microphone sensitivity, i.e. mismatched microphones, which may be attributable to
loose manufacturing tolerances for a given model of microphone, or to the use of microphones of
different models in the system.
[0005]
Due to the spacing between the microphones, the phase of a non-wind sound will differ at the sound
inlet of each microphone, unless the sound arrives from a direction from which it reaches both
microphones simultaneously. In directional microphone applications, the axis of the microphone
array is usually oriented towards the desired source, which produces the worst-case time delay and
hence the largest phase difference between the microphones.
[0006]
When the wavelength of the received sound far exceeds the spacing between the microphones,
the microphone signals are fairly well correlated, and conventional WND methods will not falsely
detect wind at such low frequencies. However, when the wavelength of the received sound approaches
the spacing of the microphones, the phase difference weakens the correlation of the microphone
signals, and a non-wind sound can be erroneously detected as wind. The wider the distance
between the microphones, the lower the frequency at which non-wind sound is erroneously
detected as wind, i.e. the wider the part of the audible range in which erroneous detection
occurs. Depending on the hardware configuration and wind speed, wind noise at hearing aid
microphones can range from less than 100 Hz to more than 8000 Hz, so it is desirable for wind
noise detection to work satisfactorily over most if not all of the audible range, so that wind
noise can be detected and suitable suppression means activated only in the sub-bands where wind
noise is a problem. False positives can also occur due to other causes of phase differences
between the microphone signals, such as local acoustic reflections, room reverberation, and / or
differences in the phase response of the microphones or the length of their inlets.
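As a rough illustration of how the spacing sets the frequency region in which this decorrelation begins, the following sketch computes the frequency whose wavelength equals the microphone spacing. The speed of sound (343 m/s), the function name and the two example spacings are assumptions chosen for illustration only.

```python
SPEED_OF_SOUND_M_S = 343.0  # assumed: dry air at roughly 20 degrees C


def wavelength_match_frequency_hz(spacing_m: float) -> float:
    """Return the frequency (Hz) whose acoustic wavelength equals spacing_m."""
    if spacing_m <= 0:
        raise ValueError("spacing must be positive")
    return SPEED_OF_SOUND_M_S / spacing_m


# Hypothetical spacings: a closely spaced pair and a widely spaced pair.
for spacing_mm in (12.0, 120.0):
    freq = wavelength_match_frequency_hz(spacing_mm / 1000.0)
    print(f"{spacing_mm:.0f} mm spacing: wavelength equals spacing near {freq:.0f} Hz")
```

Under these assumptions the wider pair reaches this point at a frequency ten times lower than the closer pair, which is the sense in which a wider spacing enlarges the part of the audible range affected by false detection.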
[0007]
Existing approaches to WND include three techniques, referred to herein as correlation,
difference, and difference-sum. These are briefly described below.
[0008]
First, in the correlation method presented in US Pat. No. 7,340,068, the two microphone signals
are low-pass filtered (fc = 1 kHz), and cross-correlations and autocorrelations are then
calculated using equation (1) (not reproduced in this text), where x(n) and y(n) are samples of
the outputs of microphones x and y, respectively, with l = 0 for zero correlation lag, and with
k = 0 for single-sample correlation or k > 0 for correlation across a block of samples. The
detector output D should theoretically tend towards 1 for non-wind sounds, for which x(n) and
y(n) should be similar, and towards 0 for wind noise, for which x(n) and y(n) should be
dissimilar. The detector output is passed through a low-pass smoothing filter, and wind is
detected when the smoothed output D < 0.67, and preferably when the smoothed output D < 0.5.
[0009]
Second, in the difference method of WND described in US Pat. No. 6,882,736, the absolute value of
the difference between the two microphone signals is calculated using the equation:
$$D = |x(n) - y(n)| \tag{2}$$
where x(n) and y(n) are samples of the outputs of microphones x and y, respectively. The
detector output D should theoretically tend towards 0 for non-wind sources, for which x(n) and
y(n) should be highly correlated, and should increase for wind noise, for which x(n) and y(n)
should be dissimilar. The value of D is passed through a low-pass smoothing filter, and wind is
detected when the smoothed value exceeds a threshold.
[0010]
Third, in the difference-sum method described in U.S. Pat. No. 7,171,008, the ratio of the power
of the difference between the two microphone signals to the power of their sum is calculated
using the equation:
$$D = \frac{\sum_{n} \left(x(n) - y(n)\right)^2}{\sum_{n} \left(x(n) + y(n)\right)^2} \tag{3}$$
where x(n) and y(n) are samples of the outputs of microphones x and y, respectively, taken over
a period of time which may be a single sample or a block of samples. The detector output D should
theoretically tend towards 0 for far-field sources, for which x(n) and y(n) should be similar,
and towards 1 for wind noise, for which x(n) and y(n) should be dissimilar.
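The difference-sum ratio described in this paragraph can be sketched as follows. This is a minimal illustration based only on the verbal description (power of the difference divided by power of the sum over a block); the function name and the handling of an all-zero sum are assumptions, and the cited patent may define further details.

```python
def diff_sum_ratio(x, y):
    """Ratio of the power of (x - y) to the power of (x + y) over one block."""
    if len(x) != len(y):
        raise ValueError("blocks must have equal length")
    diff_power = sum((a - b) ** 2 for a, b in zip(x, y))
    sum_power = sum((a + b) ** 2 for a, b in zip(x, y))
    if sum_power == 0:
        # Assumed convention: report 0.0 for silence, infinity otherwise.
        return 0.0 if diff_power == 0 else float("inf")
    return diff_power / sum_power


similar = diff_sum_ratio([1.0, -2.0, 3.0], [1.0, -2.0, 3.0])  # identical blocks
dissimilar = diff_sum_ratio([1.0, -1.0], [1.0, 1.0])          # orthogonal blocks
print(similar, dissimilar)
```

Note that the ratio depends on sample magnitudes, so a level mismatch between otherwise identical signals raises it above zero.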
[0011]
Any discussion of documents, acts, materials, devices, articles or the like which has been included
in the present specification is solely for the purpose of providing a context for the present
invention. It is not to be taken as an admission that any or all of these matters form part of
the prior art base, or were common general knowledge in the field relevant to the present
invention, as it existed before the priority date of each claim of this application.
[0012]
Throughout this specification, the word "comprise", or variations such as "comprises" or
"comprising", will be understood to imply the inclusion of a stated element, integer or step, or
group of elements, integers or steps, but not the exclusion of any other element, integer or
step, or group of elements, integers or steps.
[0013]
According to a first aspect, the invention provides a method of processing digitized microphone
signal data to detect wind noise, the method comprising: obtaining a first set of signal samples
from a first microphone; obtaining from a second microphone a second set of signal samples
occurring substantially simultaneously with the first set; determining a first number of samples
in the first set that are above a first predefined comparison threshold, and determining a
second number of samples in the first set that are below the first predefined comparison
threshold; determining a third number of samples in the second set that are above a second
predefined comparison threshold, and determining a fourth number of samples in the second set
that are below the second predefined comparison threshold; and determining whether the first
number and the second number differ from the third number and the fourth number to an extent
exceeding a predefined detection threshold, and, if so, outputting an indication that wind noise
is present.
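The counting steps of this aspect can be sketched as follows. The function name, the default thresholds of zero and the sample values are hypothetical and serve only to show how the four numbers are obtained.

```python
def count_above_below(samples, threshold=0.0):
    """Return (number of samples above threshold, number below threshold)."""
    above = sum(1 for s in samples if s > threshold)
    below = sum(1 for s in samples if s < threshold)
    return above, below


# Hypothetical simultaneous blocks from two microphones.
first_set = [3, -1, 4, -1, 5, -9, 2, -6]
second_set = [-5, -3, -5, 8, -9, -7, -9, 3]

first_number, second_number = count_above_below(first_set)    # first microphone
third_number, fourth_number = count_above_below(second_set)   # second microphone
print(first_number, second_number, third_number, fourth_number)
```

Samples exactly equal to the threshold are counted in neither number here; an implementation would need to pick a consistent convention for them.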
[0014]
The first and second sets of signal samples may comprise wideband time-domain samples obtained
substantially directly from the respective microphones.
Alternatively, the first and second sets of signal samples may comprise sub-band time-domain
samples that reflect a particular spectral band of the wideband microphone signal, such as may be
obtained, for example, by low-pass, high-pass or band-pass filtering of the microphone signal.
In some embodiments, the first and second sets of signal samples may comprise spectral
magnitude data, e.g. obtained by performing a Fourier transform, such as a fast Fourier
transform, on the microphone signal. In yet another embodiment, the first and second sets of
signal samples may comprise power data, complex signal data, or any other type of signal data in
which wind noise gives rise to differences in data values between the first and second sets that
exceed the detection threshold.
[0015]
In many embodiments, the first predefined comparison threshold is the same as the second
predefined comparison threshold. In some embodiments, the first and second predefined
comparison thresholds may each be zero. In other embodiments, the first and second predefined
comparison thresholds may be set to a value, or respective values, lying between digital
quantization levels, so that no sample value is ever equal to the comparison threshold. In
another embodiment, the first and second predefined comparison thresholds may each be an average
of selected past and / or current signal samples. In yet another embodiment, the first and second
predefined comparison thresholds may each be the value of a DC component (whether a continuous
or intermittent DC component) in the signal samples. In other embodiments, the first and
second predefined comparison thresholds may be equal to the average of each bin of one or
more frames of FFT data. In yet another embodiment, the first and second predefined comparison
thresholds may be any other suitable value derived from the obtained data samples. In alternative
embodiments of the present invention, the first predefined comparison threshold may be
different from the second predefined comparison threshold. For example, in such an alternative
embodiment, the first predefined comparison threshold may be configured so that samples with a
value of zero are counted as positive, while the second predefined comparison threshold may be
configured so that samples with a value of zero are counted as negative, or vice versa, if
appropriate and / or beneficial to the application and / or implementation platform.
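The effect of placing a comparison threshold between quantization levels can be illustrated with integer-valued samples. In this sketch the thresholds of -0.5 and +0.5, the function name and the sample block are assumptions; they show how such a threshold decides whether zero-valued samples count as positive or negative.

```python
def split_counts(samples, threshold):
    """Count samples above the threshold; the remainder are counted as below.

    Valid only when no sample can equal the threshold, which holds for
    integer samples and a threshold lying between two integer levels.
    """
    above = sum(1 for s in samples if s > threshold)
    return above, len(samples) - above


block = [0, 0, 2, -3, 1, -1, 0, -2]  # hypothetical integer samples

zero_as_positive = split_counts(block, -0.5)  # zeros fall above the threshold
zero_as_negative = split_counts(block, 0.5)   # zeros fall below the threshold
print(zero_as_positive, zero_as_negative)
```

Because every sample lands in exactly one of the two counts, the two numbers always sum to the block length.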
[0016]
Throughout this specification, references to the number of "positive" samples are to be
understood as referring to samples which are above the corresponding predefined comparison
threshold, i.e. samples which are positive relative to that threshold. References to the number
of "negative" samples have the corresponding meaning. Hence, positive and negative take their
conventional meaning when the corresponding predefined comparison threshold equals zero.
[0017]
Determining whether the numbers of positive and negative samples in the first set differ from
the numbers of positive and negative samples in the second set by more than a predefined
detection threshold may be performed by applying a chi-square test. In such an embodiment, if the
chi-square calculation returns a value close to zero, or a value below the predefined detection
threshold, an indication that no wind noise is present may be output, while if the chi-square
calculation returns a value above the detection threshold, an indication that wind noise is
present may be output. In such embodiments, for a sample block size of 16 and a microphone
spacing of 12 mm, the detection threshold may be in the range of 0.5 to about 4, more preferably
in the range of 1 to 2.5. For a sample block size of 16 and a microphone spacing of 120 mm, the
detection threshold may be in the range of about 2 to about 10, more preferably in the range of 3
to 8, or more preferably in the range of about 5 to 7. However, in other embodiments having
different block sizes and / or microphone spacings and / or devices, the appropriate detection
thresholds may be quite different. The detection threshold may, for example, be set to a level
that is not triggered by wind below 1 or 2 m s^-1. Furthermore, in such embodiments, the output
of the chi-square calculation, or more generally the extent to which the first number and the
second number differ from the third number and the fourth number, can be used to estimate the
wind strength in otherwise quiet conditions, or the extent to which wind noise exceeds other
sounds.
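A minimal sketch of the threshold decision for two microphones follows. It uses the well-known closed form of the Pearson chi-square statistic for a 2 × 2 table (without continuity correction), which is an assumption about the exact variant intended; the function names and the example threshold of 2.0 are likewise illustrative.

```python
def chi_square_2x2(n1, n2, n3, n4):
    """Pearson chi-square for the table [[n1, n2], [n3, n4]], no correction."""
    total = n1 + n2 + n3 + n4
    denom = (n1 + n2) * (n3 + n4) * (n1 + n3) * (n2 + n4)
    if denom == 0:
        # An empty row or column means the sign ratios cannot differ.
        return 0.0
    return total * (n1 * n4 - n2 * n3) ** 2 / denom


def wind_detected(n1, n2, n3, n4, detection_threshold=2.0):
    """Return True when the chi-square score exceeds the detection threshold."""
    return chi_square_2x2(n1, n2, n3, n4) > detection_threshold


# Hypothetical counts for a block of 16 samples per microphone.
print(wind_detected(8, 8, 8, 8))    # same sign ratio at both microphones
print(wind_detected(8, 8, 2, 14))   # clearly different sign ratios
```

The score itself, rather than the boolean decision, could be passed on where an estimate of wind strength is wanted.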
[0018]
In alternative embodiments, the step of determining whether the numbers of positive and negative
samples in the first set differ from the numbers of positive and negative samples in the second
set to a degree that exceeds a predefined detection threshold may be implemented by any other
suitable statistical test for comparing multiple sets of binary or categorical data, such as
McNemar's test or the Stuart-Maxwell test.
[0019]
The first and second microphones may be mounted on a behind-the-ear (BTE) device, such as the
shell of a cochlear implant BTE unit, or on a BTE, in-the-ear, in-the-canal, completely-in-canal
or other style of hearing aid.
Alternatively, the first and second microphones may be part of a telephone headset or handset,
or of another audio device such as a camera, video camera, tablet computer or the like. The
signal may be sampled at 8 kHz, 16 kHz or 48 kHz, for example. Some embodiments may use
longer block lengths for higher sampling rates, so that a single block covers a similar time
frame. Alternatively, the input to the wind noise detector may be downsampled so that a shorter
block length can be used (if required) in applications where wind noise does not need to be
detected across the full bandwidth of the higher sampling rate. The block length may be 16
samples, 32 samples, or any other suitable length.
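The relationship between block length, sampling rate and block duration is simple arithmetic, sketched here. The function names are hypothetical, and taking 16 samples at 8 kHz as the reference block is an assumption made for the example.

```python
def block_duration_ms(block_len, sample_rate_hz):
    """Duration covered by one block, in milliseconds."""
    return 1000.0 * block_len / sample_rate_hz


def block_len_for_duration(duration_ms, sample_rate_hz):
    """Block length (samples) that covers duration_ms at the given rate."""
    return round(duration_ms * sample_rate_hz / 1000.0)


# Assumed reference: a 16-sample block at 8 kHz.
reference_ms = block_duration_ms(16, 8000)
for rate_hz in (8000, 16000, 48000):
    print(rate_hz, block_len_for_duration(reference_ms, rate_hz))
```

With this reference, keeping the same time frame at 16 kHz calls for a 32-sample block, which matches the two block lengths mentioned in the text.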
[0020]
In some embodiments, the method may further include obtaining a respective set of signal samples
from a third microphone or from further additional microphones. In such embodiments, the numbers
of positive and negative samples in each set of samples obtained from the three or more
microphones can be compared. For example, a chi-square test can be applied to the sets of signal
samples from three or more microphones by using appropriate 3 × 2, 4 × 2 or larger observation
and prediction value matrices.
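For three or more microphones, the same test can be sketched with one row of counts per microphone. This sketch assumes the standard Pearson chi-square, in which each prediction (expected) value is the row total times the column total divided by the grand total; the function name and the counts are hypothetical.

```python
def chi_square(observed):
    """Pearson chi-square for a matrix with one [positive, negative] row per microphone."""
    rows, cols = len(observed), len(observed[0])
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(observed[i][j] for i in range(rows)) for j in range(cols)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i in range(rows):
        for j in range(cols):
            expected = row_totals[i] * col_totals[j] / grand_total
            if expected > 0:  # skip cells that cannot contribute
                statistic += (observed[i][j] - expected) ** 2 / expected
    return statistic


# Hypothetical 3 x 2 observation matrix for a block of 16 samples per microphone.
three_mics = [[8, 8], [9, 7], [2, 14]]
print(chi_square(three_mics))
```

A suitable detection threshold for the larger matrix would have to be chosen separately, since the statistic has more degrees of freedom than in the two-microphone case.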
[0021]
According to another aspect, the present invention provides a computing device configured to
perform the method of the first aspect.
[0022]
According to another aspect, the invention provides a computer program product comprising
computer program code means for causing a computer to execute a procedure for processing
digitized microphone signal data to detect wind noise, the computer program product comprising
computer program code means for carrying out the method of the first aspect.
[0023]
In preferred embodiments of the invention, each microphone signal is preferably high-pass
filtered, for example by a pre-amplifier or an ADC, to remove any DC component, so that the
sample values operated on by the method generally include a mixture of positive and negative
numbers.
However, in alternative embodiments in which the sample values have a non-zero value for zero
input, the invention may be applied by setting the comparison threshold to the zero-input value,
i.e. by determining (a) the number of samples above the zero-input value, and (b) the number of
samples below the zero-input value.
The invention can equally be applied with any selected comparison threshold suited to the
sampled data being processed.
[0024]
By considering only the sign of each sample relative to the comparison value, and not its
magnitude, the method of the invention effectively ignores magnitude differences between the
microphone signals, and is therefore robust against non-wind sources of such differences, such as
near-field sources, local acoustic reflections, room reverberation, and differences in microphone
coverage, obstruction, location, or sensitivity. Moreover, because the method counts the number
of positive and negative samples in each signal across a block of samples, in contrast to other
methods which calculate a sample-by-sample correlation between the signals and are therefore
sensitive to phase and amplitude differences between the microphone signals, phase differences
between the microphone signals are also largely ignored.
[0025]
In some embodiments of the present invention, a single count may be performed within each set of
samples from each microphone. For example, for each set of samples, one of the following may be
counted: how many of the samples are positive, how many of the samples are negative, how many of
the samples are above the threshold, or how many of the samples are below the threshold. In
such embodiments, the degree to which the single count for the first set of signal samples
differs from the single count for the second set of signal samples can be used to trigger the
output of the indication that wind noise is present. For example, this may be done by using the
counts as an index into a pre-computed look-up table of chi-squared values, as an input to a
simplified chi-squared equation that may exploit constants known for the particular application,
or as an input to another suitable statistical test, for example a binomial test.
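One way to picture the look-up-table variant is sketched here. It assumes two microphones, equal block lengths, and that every sample is either positive or negative (so the negative count is the block length minus the positive count); under those assumptions the 2 × 2 Pearson chi-square simplifies to a function of the two positive counts alone. The names and the block length of 16 are illustrative.

```python
def chi2_from_positive_counts(p1, p2, block_len):
    """Simplified 2x2 chi-square using only the positive count of each block."""
    total_positive = p1 + p2
    denom = total_positive * (2 * block_len - total_positive)
    if denom == 0:
        return 0.0  # all samples share one sign at both microphones
    return 2.0 * block_len * (p1 - p2) ** 2 / denom


def build_lookup(block_len):
    """Pre-compute the score for every pair of positive counts."""
    return [[chi2_from_positive_counts(p1, p2, block_len)
             for p2 in range(block_len + 1)]
            for p1 in range(block_len + 1)]


BLOCK_LEN = 16
CHI2_TABLE = build_lookup(BLOCK_LEN)

# At run time a score is a table read indexed by the two single counts.
score = CHI2_TABLE[8][2]
print(score)
```

For a block of 16 samples the table has 17 × 17 entries, so the per-block cost reduces to two counts and one memory read.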
[0026]
It should be noted that, depending on the phase difference between the microphones, the presence
of non-wind sounds at frequencies that produce an odd number of half cycles, or an odd number of
samples per cycle, within the sample block may cause the first and second numbers to differ from
the third and fourth numbers to a significant degree even in the absence of wind noise.
Such a scenario can therefore result in false detection of wind noise, depending on the detection
threshold used. However, in some embodiments, the risk of such false positives can be addressed
by also determining whether the first number and the second number differ from the fourth number
and the third number, respectively, and outputting an indication that wind noise is present only
if that difference also exceeds the predefined detection threshold. Such embodiments, which may
be implemented by swapping the values of the third number and the fourth number, or by performing
an equivalent inversion of the data or of the sample counts in one of the sets of samples,
improve robustness to non-wind sounds at such problematic frequencies. Such embodiments are
referred to herein as "minimum" techniques, for example the "minimum chi-square wind noise
detection" technique. An alternative embodiment may perform the calculation more efficiently by
avoiding two chi-square calculations: the third number is optionally set equal to the number of
negative samples in the second set, and the fourth number is optionally set equal to the number
of positive samples in the second set, and the calculation is then carried out using whichever
value of the third number (i.e. the original value or the alternative value) differs least from
the value of the first number. These differences are calculated by subtracting each of the
original and alternative values of the third number from the first number. Note that the original
and alternative values of the third number are equally different from the first number when the
first number and the original third number are both equal to half the number of samples in each
block, in which case the difference is zero and the chi-squared value is also zero.
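The "minimum" technique and its single-calculation shortcut can be sketched as follows. The sketch assumes two microphones, equal block lengths and no samples equal to the threshold, so that swapping the third and fourth numbers amounts to replacing the second positive count p2 with block_len - p2; all names and counts are hypothetical.

```python
def chi2_from_counts(p1, p2, block_len):
    """2x2 chi-square from the positive counts of two equal-length blocks."""
    total_positive = p1 + p2
    denom = total_positive * (2 * block_len - total_positive)
    if denom == 0:
        return 0.0
    return 2.0 * block_len * (p1 - p2) ** 2 / denom


def min_chi2_two_pass(p1, p2, block_len):
    """Score with the original and the swapped counts, and keep the smaller."""
    return min(chi2_from_counts(p1, p2, block_len),
               chi2_from_counts(p1, block_len - p2, block_len))


def min_chi2_one_pass(p1, p2, block_len):
    """Pick the variant of p2 closest to p1, then score only once."""
    swapped = block_len - p2
    closest = p2 if abs(p1 - p2) <= abs(p1 - swapped) else swapped
    return chi2_from_counts(p1, closest, block_len)


# Hypothetical counts for 16-sample blocks.
print(min_chi2_one_pass(11, 5, 16))  # inverted sign pattern, e.g. a phase-shifted tone
print(min_chi2_one_pass(8, 2, 16))   # different sign ratios under either pairing
```

In the first call the swapped count equals the first count, so the score falls to zero and the tone is not reported as wind; in the second call both pairings give the same non-zero score.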
[0027]
Hereinafter, examples of the present invention will be described with reference to the attached
drawings.
[0028]
FIG. 1 is a schematic diagram of a system illustrating a chi-square wind noise detector of one
embodiment of the present invention operating in the time domain.
FIG. 2 is a schematic diagram of a system illustrating a sub-band implementation of the
chi-square WND method operating on the outputs of matched time-domain filters according to
another embodiment of the present invention.
FIG. 3 is a schematic diagram of a system illustrating a sub-band implementation of the
chi-square WND method operating on FFT output data according to yet another embodiment of the
invention.
FIG. 4 shows chi-square WND scores generated by the embodiment of FIG. 1 for each pre-recorded
input signal.
FIG. 5 shows WND scores generated by the prior art correlation method for the pre-recorded input
signals.
FIG. 6 shows WND scores generated by the prior art Diff / Sum WND method for the pre-recorded
input signals.
FIG. 7 shows WND scores generated by the embodiment of FIG. 1 and the prior art WND methods in
response to a pre-recorded stepped tone sweep input.
FIG. 8 shows WND scores generated by simulation of the embodiment of FIG. 1 and the prior art WND
methods in response to simulated tone input, in 10 Hz steps from 10 Hz up to half the sampling
rate, with both microphones in phase but with a 9.5 dB near-field effect.
FIG. 9 shows WND scores generated by simulation of the embodiment of FIG. 1 and the prior art WND
methods in response to simulated far-field tone input, in 10 Hz steps from 10 Hz up to half the
sampling rate, for a typical hearing aid.
FIG. 10 shows the WND scores of FIG. 9 as improved by the scores obtained by simulating inverted
positive and negative counts for one signal.
FIG. 11 shows WND scores generated by simulation of the embodiment of FIG. 1 and the prior art
WND methods in response to simulated near-field tone input with a 9.5 dB level difference, in
10 Hz steps from 10 Hz up to half the sampling rate, for a typical hearing aid.
FIG. 12 shows WND scores generated by simulation of the embodiment of FIG. 1 and the prior art
WND methods in response to simulated far-field tone input, in 10 Hz steps from 10 Hz up to half
the sampling rate, for a typical Bluetooth® headset.
FIG. 13 shows WND scores generated by simulation of the embodiment of FIG. 1 and the prior art
WND methods in response to simulated near-field tone input with a 9.5 dB level difference, in
10 Hz steps from 10 Hz up to half the sampling rate, for a typical Bluetooth® headset.
FIG. 14 shows WND scores generated by simulation of the embodiment of FIG. 1 and the prior art
WND methods in response to simulated far-field tone input, in 10 Hz steps from 10 Hz up to half
the sampling rate, for a typical smartphone handset with 16 samples per block.
FIG. 15 shows WND scores generated by simulation of the embodiment of FIG. 1 and the prior art
WND methods in response to simulated near-field tone input with a 9.5 dB level difference, in
10 Hz steps from 10 Hz up to half the sampling rate, for a typical smartphone handset with 16
samples per block.
FIG. 16 shows WND scores generated by simulation of the embodiment of FIG. 1 and the prior art
WND methods in response to simulated far-field tone input, in 10 Hz steps from 10 Hz up to half
the sampling rate, for a typical smartphone handset with 32 samples per block.
FIG. 17 shows WND scores generated by simulation of the embodiment of FIG. 1 and the prior art
WND methods in response to simulated near-field tone input with a 9.5 dB level difference, in
10 Hz steps from 10 Hz up to half the sampling rate, for a typical smartphone handset with 32
samples per block.
FIGS. 18a and 18b show examples of the male and female voice stimuli used in the HATS experiments
of FIGS. 19 to 22, as waveforms recorded from the handset microphones.
FIGS. 19a to 19e show the output of each WND method for recordings of a Bluetooth® headset on a
HATS, with a block size of 16 samples.
FIGS. 20a to 20c show the output of the chi-square method for the recordings of FIG. 19 when the
minimum chi-square technique is applied.
FIGS. 21a to 21e show the output of each WND method for recordings of a smartphone on a HATS,
with a block size of 16 samples.
FIGS. 22a to 22e show the output of each WND method for recordings of a smartphone on a HATS,
with a block size of 32 samples.
FIGS. 23a to 23c show the output of the chi-square method for pre-recorded input signals
processed by 1000 Hz and 5000 Hz time-domain sub-band filters.
FIGS. 24a to 24e show the output of the chi-square method for pre-recorded input signals
processed by FFT bins of 250, 750, 1000, 4000 and 7000 Hz.
FIG. 25 shows the output of the chi-square method for a pre-recorded stepped tone sweep input
signal processed by FFT bins of 1000, 4000 and 7000 Hz.
[0029]
Abbreviations:
ADC: Analog-to-digital converter
BTE: Behind-the-ear
CI: Cochlear implant
DC: Direct current
FIR: Finite impulse response
HA: Hearing aid
HATS: Head and torso simulator
IIR: Infinite impulse response
SNR: Signal-to-noise ratio
SPL: Sound pressure level
WND: Wind noise detection
[0030]
The WND method of this embodiment, referred to as the chi-square (χ²) WND method, applies a
statistical test to establish the degree of independence between two or more audio signals.
The chi-square method of this embodiment comprises three steps: 1) construction of an
observation data matrix from blocks of samples of each microphone signal; 2) construction of a
prediction data matrix; and 3) calculation of the chi-square statistic from the observation and
prediction data matrices. These steps are illustrated in FIG. 1 for the two-microphone case. It
should be noted that although the chi-square WND method of FIG. 1 is described for the case of
two microphones for simplicity, in alternative embodiments the method can be applied to more than
two microphone signals.
[0031]
The input data are as follows:
$$X = [x_1 \; x_2 \; \dots \; x_m], \qquad Y = [y_1 \; y_2 \; \dots \; y_m] \tag{4}$$
where X and Y are blocks of samples, each of length m, of the respective microphone signals, for
example those of the front and rear microphones. Advantageously, because sample buffering for
block-based processing is common in DSP systems, the chi-square WND method may not require any
additional buffering operations, and it can operate over a wide range of buffer lengths.
Because the preamplifier or ADC generally high-pass filters the microphone signal to remove any
DC component, the sample values are generally a mixture of positive and negative numbers, and
tend towards zero as the acoustic level decreases.
[0032]
The observation data matrix O is constructed as in the following equation, and contains the
numbers of positive and negative values in the sample block of each microphone signal:
$$O = \begin{bmatrix} \mathrm{POS}(X) & \mathrm{NEG}(X) \\ \mathrm{POS}(Y) & \mathrm{NEG}(Y) \end{bmatrix} \tag{5}$$
where POS is a function returning the number of positive samples, and NEG is a function
returning the number of negative samples (value < 0). In practice, in two's complement DSP
systems, a value of zero is most easily classified as a positive value because its sign bit is
that of a positive number. For the purposes of the chi-square WND method, a value of zero can be
defined as either positive or negative, provided that the definition is consistent within a
given implementation. As can be seen from equation (5), each row of the observation matrix O
corresponds to a different microphone, while columns 1 and 2 contain the numbers of positive and
negative samples, respectively.
[0033]
From the data of the observation data matrix O, the prediction data matrix E is calculated using
the following equation:
$$E_{ij} = \frac{\left(\sum_{k=1}^{c} O_{ik}\right)\left(\sum_{k=1}^{r} O_{kj}\right)}{N} \tag{6}$$
where r and c are the numbers of rows and columns of the observation data matrix O, and N is the
sum of all elements of the observation data matrix O. N is therefore a constant equal to the
number of microphones multiplied by the block length.
[0034]
Using the observation and prediction matrices, the chi-square statistic χ² is given by the
following equation:

$$\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}} \tag{7}$$

that is, χ² is calculated as the sum of the squared differences between corresponding elements of
the observation and prediction data matrices, each normalized by the prediction value. The value
of χ² is zero when the ratio of positive to negative samples is the same for both microphones,
which is approximately the case for non-wind sounds. The value of χ² increases above zero when
the ratio of positive to negative samples differs between the microphones, which occurs when
wind noise causes the microphone signals to become dissimilar.
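As a sketch of equations (6) and (7), the following Python computes the prediction matrix and the χ² statistic from an observation matrix. The function names and the example counts are hypothetical, and skipping zero-valued prediction entries is an added safeguard (an assumption) for blocks in which every sample has the same sign.

```python
def expected_matrix(O):
    """Prediction matrix: E[i][j] = (row sum i) * (column sum j) / N."""
    row_sums = [sum(row) for row in O]
    col_sums = [sum(col) for col in zip(*O)]
    total = sum(row_sums)
    return [[r * c / total for c in col_sums] for r in row_sums]


def chi_square(O):
    """Sum of (O - E)^2 / E over all cells, skipping cells where E is zero."""
    E = expected_matrix(O)
    return sum(
        (o - e) ** 2 / e
        for o_row, e_row in zip(O, E)
        for o, e in zip(o_row, e_row)
        if e > 0
    )


similar = [[8, 8], [8, 8]]       # same sign ratio at both microphones
dissimilar = [[12, 4], [4, 12]]  # hypothetical wind-like block
print(chi_square(similar), chi_square(dissimilar))
```

With these made-up counts the first score is zero and the second is well above zero, which is the qualitative behavior the paragraph describes.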
[0035]
By considering only the sign of each sample, not the magnitude, the chi-square method of the
present embodiment effectively ignores magnitude differences between the microphone signals.
It is therefore robust against non-wind sources of such differences, such as near-field sound
sources, local acoustic reflections, room reverberation, and differences in microphone coverage,
obstruction, position, or sensitivity (mismatched microphones).
[0036]
The chi-square method of this embodiment is also largely robust to phase differences, as it does
not attempt to compare microphone signals on a sample-by-sample basis.
For sounds that are not wind, the robustness depends on the relationship between the
wavelength, the magnitude of the phase shift and the block length used in the application. In
contrast to conventional methods, the robustness to phase differences can be increased at high
frequencies, depending on the relationship between block length and microphone spacing. For
example, if the block length is an integral multiple of the wavelength of the stationary sinusoidal
signal, then the number of positive and negative samples is the same for any phase shift of an
integer number of samples. When the wavelength exceeds the block length, the effect of the
phase difference varies from block to block, and the effect is maximum around the zero crossing
and may be zero between the zero crossings. Therefore, to compensate for such effects,
smoothing filters may be used to even out block-to-block changes in the wind score output.
[0037]
As an illustration of robustness to phase differences, in hearing aid applications a typical
microphone spacing of up to 20 mm results in a delay of up to 59 μs between microphones
(assuming a speed of sound of 340 m/s), which corresponds to a phase difference of up to 0.94
samples at a typical sampling rate of 16 kHz. Such phase differences have a minimal effect on the
χ² statistic with typical block lengths of 16-64 samples.
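The arithmetic behind these figures can be checked with a short Python sketch. The helper name `delay_in_samples` is hypothetical; the spacing, speed of sound, and sampling rate are those quoted in the paragraph.

```python
def delay_in_samples(spacing_m, fs_hz, speed_of_sound=340.0):
    """Worst-case inter-microphone delay, expressed in sample periods."""
    return spacing_m / speed_of_sound * fs_hz


spacing_m = 0.020   # 20 mm microphone spacing
fs_hz = 16000       # 16 kHz sampling rate

delay_us = spacing_m / 340.0 * 1e6
print(f"delay: {delay_us:.1f} microseconds")
print(f"delay: {delay_in_samples(spacing_m, fs_hz):.2f} samples")
```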
[0038]
The following example is provided to give a further understanding of how the chi-square WND
method of this embodiment works in practice. The example is for wind noise and for two
microphones with a block length of 16 samples. A block of samples for each microphone is shown
below.
[0039]
The number of positive and negative samples in each block is counted and used to construct the
observation data matrix O as in equation (5) above, where the numbers of positive and negative
samples are shown in the first and second columns respectively, with one row for each
microphone. By definition, the sum of each row is equal to the block length (in this case 16). The
prediction matrix E is calculated from the observation data matrix O, as in equation (6) above.
[0040]
The prediction matrix E has the same structure as the observation data matrix O, and both
matrices are used to calculate the chi-square statistic χ² as in equation (7) above.
[0041]
The value of the chi-square statistic χ² is substantially above zero, indicating the presence of
wind noise.
[0042]
In the preferred embodiment of the present invention, some computational steps are simplified
based on known constants.
For example, the prediction matrix E requires the calculation of the product of the row sum and
column sum of the observation data matrix O.
Since the sum of each row of the observation data matrix O is always equal to the block length B,
and N is always equal to the number of microphones M multiplied by the block length, the
prediction matrix E can be calculated simply as follows:

$$E_{ij} = \frac{B \sum_{k=1}^{M} O_{kj}}{M B} = \frac{1}{M} \sum_{k=1}^{M} O_{kj}$$
[0043]
The preceding chi-square example shows that the rows of the prediction matrix E are identical
to one another, which reduces the computational requirement to calculating one value for
each of the j columns of the prediction matrix E.
[0044]
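The calculation of the χ² value can also be simplified, and the calculation of the prediction
matrix E can be incorporated into this calculation, as in the following equation:

$$\chi^2 = \sum_{i=1}^{M} \sum_{j=1}^{2} \frac{(O_{ij} - \bar{O}_j)^2}{\bar{O}_j}, \qquad \bar{O}_j = \frac{1}{M} \sum_{k=1}^{M} O_{kj} \tag{13}$$

where $\bar{O}_j$ is the average of column j of the observation data matrix O over the M
microphones.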
[0045]
Therefore, for each element of the observation data matrix O, the square of the difference
between the element and its column average is divided by the column average.
For a given column, the square of the difference is the same for both rows, which further reduces
the computational load required to calculate the χ² statistic.
The above is only one example of how the computational load may be optimized for an
application, and further optimization may be achieved in other embodiments. In some
applications, it may be desirable to use a look-up table of pre-computed χ² values that is indexed
by the positive or negative sample count values for each microphone signal. In yet another
embodiment, Equation 13 can be further simplified for two microphones.
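A sketch of the column-average form, together with one possible two-microphone closed form, is given below in Python. The closed form is derived here from the column-average expression under the assumption of two microphones with equal block length B (so each negative count is B minus the positive count); it is not necessarily the exact expression used in the source. All function names are hypothetical.

```python
def chi_square_column_mean(O):
    """Chi-square using one column average per column instead of a full E matrix."""
    n_mics = len(O)
    total = 0.0
    for col in zip(*O):
        mean = sum(col) / n_mics
        if mean > 0:  # assumption: skip a column in which every count is zero
            total += sum((o - mean) ** 2 for o in col) / mean
    return total


def chi_square_two_mics(pos_a, pos_b, block_len):
    """Two-microphone closed form: d^2 / (P1 + P2) + d^2 / (N1 + N2).

    Here d = P1 - P2 and N1 + N2 = 2 * block_len - (P1 + P2).
    """
    d2 = (pos_a - pos_b) ** 2
    pos_total = pos_a + pos_b
    neg_total = 2 * block_len - pos_total
    score = 0.0
    if pos_total:
        score += d2 / pos_total
    if neg_total:
        score += d2 / neg_total
    return score


O = [[10, 6], [7, 9]]  # hypothetical counts, block length 16
print(chi_square_column_mean(O), chi_square_two_mics(10, 7, 16))
```

Only the two positive counts are needed in the closed form, which is also what would make a small look-up table indexed by those counts practical.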
[0046]
In another embodiment, the method of the present invention is implemented on a sub-band basis.
The buffered output of each time-domain digital filter, which may be band-pass, low-pass, or
high-pass, is processed using the chi-square WND method described above. FIG. 2 shows an
example of a sub-band WND comprising a time-domain filter bank. Within each sub-band, the
operation of the method is as described above for the embodiment of FIG. 1 and is not repeated
here. It should be noted that the most suitable comparison and/or detection thresholds may
differ between sub-bands and between applications, owing to factors such as microphone
positioning, spacing and/or phasing, and/or the characteristics of wind noise and other sounds
at different frequencies.
[0047]
In yet another embodiment, shown in FIG. 3, the chi-square WND method operates on Fast
Fourier Transform (FFT) data. In this embodiment, an FFT is performed on each block of samples
of each microphone signal, and the FFT output data is then buffered across multiple blocks for
each FFT bin. The buffered FFT output data may be the magnitude, power, or real and/or
imaginary components of the complex FFT output. The magnitude or power data may be in dB for
some applications. Instead of counting the number of positive and negative samples in a block,
the number of positive and negative FFT output values across the blocks in the FFT output data
buffer is counted. In this respect, the FFT output is treated as frequency-domain samples of the
microphone signal. Because raw FFT magnitude or power values cannot be negative, these values
need to be processed so as to yield positive and negative values. For example, the data in the FFT
output buffer could be processed to be: 1) FFT magnitude or power data adjusted such that the
data in each buffer has a zero mean value; or 2) difference data indicating the differences
between the magnitude or power values of successive FFTs. As an alternative to 1) above, the
comparison threshold for each FFT bin and microphone may be adaptively set to the average (or
other suitable value) of past or current buffered FFT magnitude or power data. The real or
imaginary components of the raw FFT data can take positive and negative values without further
processing, but applying processing options 1) and 2) above may still be beneficial because
these components are more sensitive to amplitude and phase differences between microphone
signals. These exemplary alternatives yield data indicative of changes in acoustic level over time
(with a resolution of one block). The data therefore do not reflect any constant (in practice,
slowly time-varying) level differences between the microphone signals, such as those due to
differences in microphone sensitivity or near-field effects.
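The two processing options for non-negative FFT magnitude or power data can be sketched in Python as follows. The function names and the buffered level values are hypothetical, the FFT itself is assumed to have been computed elsewhere, and zero is again counted as positive.

```python
def zero_mean_counts(levels):
    """Option 1: count buffered bin levels above/below the buffer mean."""
    mean = sum(levels) / len(levels)
    pos = sum(1 for v in levels if v - mean >= 0)
    return pos, len(levels) - pos


def difference_counts(levels):
    """Option 2: count rises/falls between successive FFT frames for one bin."""
    diffs = [later - earlier for earlier, later in zip(levels, levels[1:])]
    pos = sum(1 for d in diffs if d >= 0)
    return pos, len(diffs) - pos


# Made-up magnitudes of one FFT bin over successive blocks, per microphone.
front_bin = [1.0, 2.0, 3.0, 6.0]
rear_bin = [10.0, 20.0, 30.0, 60.0]  # same shape, ten times the level

# One row per microphone, as in the time-domain observation matrix.
O = [list(zero_mean_counts(front_bin)), list(zero_mean_counts(rear_bin))]
print(O)
```

Because both options discard the absolute level, a constant gain difference between the microphones (as in the made-up data above) leaves the counts identical.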
[0048]
Compared to time domain samples, the FFT data is relatively insensitive to phase differences
between the microphone signals because it represents the average magnitude or power across
the block of samples. The phase has the largest impact on the FFT power estimate when the
wavelength is significantly longer than the block length (i.e. the analysis window), and the
impact is minimal when the wavelength is much shorter than the block length. These useful
properties of the FFT data used to construct the observation data matrix O are in addition to the
inherent robustness of the chi-square WND method to magnitude and phase differences between
microphone signals. For sounds that are not wind, short term fluctuations of the FFT bin level
over time similarly occur between the microphones, which results in chi-square values of
approximately zero (i.e. wind is not detected). With respect to wind noise, short term level
variations differ between microphones, which results in larger values for chi-square statistics (i.e.
wind is detected). The FFT bins may be grouped to form a wider bandwidth, and then the
magnitude or power values calculated for each band are used to detect wind noise in that band.
[0049]
In order to demonstrate the effectiveness of the embodiment of FIG. 1, the method of that
embodiment was evaluated by using it to process a number of representative recordings. The
recordings were microphone output signals obtained from behind-the-ear (BTE) devices for a
range of input stimuli. The stimuli were generated by a far-field loudspeaker, a near-field
telephone handset, or a wind machine. The devices were BTE-type shells of commercially
available cochlear implant (CI) and hearing aid (HA) products, each containing two microphones
separated by about 10-15 mm. The microphones were not perfectly matched, but a mismatch of
1-3 dB is common with these types of microphones. The device was attached to the pinna (outer
ear) of a head and torso simulator (HATS) located in an acoustic booth for all recordings except
the near-field recordings. The near-field recordings were obtained by holding a telephone
handset to a BTE device in free space in a quiet office. The microphone signals were recorded by
a high-SNR 32-bit sound card at a sampling rate of about 16 kHz. Table 1 summarizes the stimuli,
devices, equipment and recording conditions.
[0050]
[0051]
The recordings were each about 10 seconds in duration, except for the far-field stepped tone
sweep.
The far-field stepped tone sweep consisted of 31 pure tones from 1.0 to 7.664 kHz with a
duration of 4 seconds per tone (with a multiplicative step of 1.0718). The stepped tone sweep
also included unintended level differences of up to 10 dB between the microphone signals, due
to local pinna reflections and/or room reflections, which made the corresponding plotted data
somewhat less smooth. The near-field 1 kHz tone resulted in a 12.2 dB level difference between
the microphone signals. Speech was presented at 70 dBA (measured at the ear). The wind speed
was increased in factors of two, which theoretically corresponds to 12 dB steps in wind noise
level. The 12 m/s recording was chosen as an example in which the microphone outputs were
clearly saturated at the electrical clipping level of both microphones, because this extreme is
a potential failure mode of WND algorithms.
[0052]
The WND algorithm of the embodiment of FIG. 1 was implemented in Matlab/Simulink and used
to process non-overlapping, contiguous blocks of 16 samples of each microphone recording. The
output of the WND algorithm was processed by an IIR filter (b = [0.004]; a = [1, -0.996]; note
that other filter types and coefficients may be used) to obtain a smoothed output. The filter
removes any jitter in the WND algorithm output that may be present from one block to the next,
thus giving a more consistent output for a given input stimulus. FIG. 4 shows the output of the
chi-square WND method for each pre-recorded input signal in this system.
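The smoothing filter can be sketched in Python as a first-order recursion. This assumes the quoted coefficients follow the usual MATLAB `filter` convention, so that b = [0.004] and a = [1, -0.996] correspond to y[n] = 0.004 x[n] + 0.996 y[n-1]; the function name `smooth` and the constant test input are hypothetical.

```python
def smooth(scores, b0=0.004, a1=-0.996, state=0.0):
    """First-order IIR smoother: y[n] = b0 * x[n] - a1 * y[n-1]."""
    out = []
    y = state
    for x in scores:
        y = b0 * x - a1 * y
        out.append(y)
    return out


# A constant raw score of 5.0 per block: the smoothed output rises slowly
# towards 5.0 because the filter has unity gain at DC (0.004 / (1 - 0.996)).
smoothed = smooth([5.0] * 5000)
print(smoothed[0], smoothed[-1])
```

With one score per 16-sample block at 16 kHz, the time constant of roughly 250 blocks corresponds to about a quarter of a second, which is the trade-off between consistency and response time discussed in the text.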
[0053]
It can be seen in FIG. 4 that there is a clear separation between the WND scores for wind stimuli
(grouped at 410) and the WND scores for non-wind stimuli (grouped at 420). In group 420, the
WND output generated by the method of this embodiment of the present invention is less than
0.5 for speech and near-field stimuli, and less than 1.5 for uncorrelated microphone noise. In
group 410, once the smoothing filter has settled, the WND output score for wind noise
consistently exceeds 2.5-3.0 for very light wind (1.5 m/s) and can be seen to rise to 5 or 6 as
wind speed increases. Therefore, a preferred detection threshold (above which the WND score is
considered to indicate the presence of wind noise) is 2.5 for applications where winds of
1.5 m/s or more need to be detected, or 3.5 for applications where winds of 3 m/s or more need
to be detected. In many applications, it may be desirable not to detect or suppress such light
breezes, as wind speeds of 1.5 m/s generally cause little or no audible wind noise. It should be
noted that the absolute value of the WND score, and hence the appropriate threshold or
thresholds, varies with sample block size. It is also noted that the WND score for wind noise
mixed with non-wind sound may lie between groups 410 and 420, which means that the detection
threshold may be set to correspond to the ratio of wind noise to other sounds that is most
appropriate for the application. This ratio may be based on factors such as the perceptibility of
wind noise over other sounds, or the requirements of subsequent processing such as wind noise
suppression means. Furthermore, the threshold can also be refined for different smoothing
filters, because heavier smoothing results in a more consistent WND output score, which may
allow the detection threshold to be increased, albeit at the expense of a slower filter response
to changes in wind conditions. It should also be noted that, unlike several other methods, an
input level threshold is not necessarily needed for WND, since the output of the chi-square
method is low (close to zero) for microphone noise. Nevertheless, alternative embodiments may
combine a relatively low chi-square threshold, which reliably detects low wind speeds, with an
input level threshold, where it is desirable to detect only wind noise that exceeds a given SPL.
In such embodiments, the use of an input level threshold enables detection that is more closely
related to the loudness of the wind noise. This is because the wind noise level at a given wind
speed is affected by the angle of incidence of the wind (all of the data shown are for frontal
wind), the mechanical design of the device, the position of the microphones, and the position of
nearby obstacles that act as a wind shield (e.g. the outer ear) or as a wind noise source. In
such embodiments, to detect wind, both the chi-square threshold and the input level threshold
need to be exceeded.
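The decision logic described here can be sketched in Python as follows. The score thresholds of 2.5 and 3.5 are the values quoted in the text for 16-sample blocks; the function name, the level values, and the 65 dB level threshold are hypothetical.

```python
def wind_detected(score, level_db, score_threshold=2.5, level_threshold_db=None):
    """Return True when the smoothed score (and, optionally, the level) is exceeded.

    level_threshold_db=None models the embodiments that need no level gate.
    """
    if score <= score_threshold:
        return False
    return level_threshold_db is None or level_db > level_threshold_db


print(wind_detected(3.0, 60.0))                           # score-only detection
print(wind_detected(3.0, 50.0, level_threshold_db=65.0))  # too quiet to count
print(wind_detected(4.0, 70.0, score_threshold=3.5, level_threshold_db=65.0))
```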
[0054]
For comparison with the performance of this embodiment of the present invention, the prior art
correlation and difference-sum WND algorithms described above were implemented in
Matlab/Simulink and similarly used to process non-overlapping, contiguous blocks of 16 samples
of each microphone recording listed in Table 1. The output of each WND algorithm was again
processed by an IIR filter (b = [0.004]; a = [1, -0.996]).
[0055]
FIG. 5 shows the results of the prior art correlation WND method of US Pat. No. 7,340,068
described above. The output for speech is close to 1.0 as expected, and the output for wind noise
is generally lower (about 0.5, as shown at 520). However, the 12 m/s wind that saturates the
microphones tends to produce an output similar to that for speech, which can result in the
correlation WND method failing to detect strong winds. In addition, the outputs for uncorrelated
microphone noise and for the near-field tone, shown at 530, could be incorrectly classified as
wind because they lie within the range of values for wind, although microphone noise can be
distinguished from wind noise by applying an additional input level threshold.
[0056]
FIG. 6 shows the output of the prior art Diff/Sum WND method of US Pat. No. 7,171,008
described above. The Diff/Sum WND output is about zero for speech, as expected, and the
output increases with wind speed. However, in the region shown at 610, the near-field tone and
the 1.5 m/s wind cannot be distinguished, and the uncorrelated microphone noise cannot be
distinguished from the 3.0 m/s wind. The latter two inputs could probably be distinguished from
one another by applying the additional step of an input level threshold.
[0057]
FIG. 7 compares the WND method of the embodiment of FIG. 1 with the prior art correlation and
difference-sum WND methods, showing the output of each WND method, as implemented in
Matlab/Simulink, in response to the microphone output signals for the stepped tone sweep input.
The chi-square method is robust to tones: its output value is less than 1.0, and mostly less
than 0.25, over the entire band tested. These values are well below the range of 2.5 to 4.0
produced by the 1.5 m/s light wind shown in FIG. 4, so the WND method of FIG. 1 allows such
tones to be distinguished from wind noise.
[0058]
In contrast, FIG. 7 shows that the output of the correlation WND method generally drifts from
its non-wind value (about 1) towards its wind value (less than 0.67 or 0.5) as frequency
increases, which indicates that false detection of wind noise would occur in response to such
tones. Similarly, the output of the difference-sum WND method generally deviates from its
non-wind value (about 0) towards its wind value (approaching 1) with increasing frequency, which
would also result in false detection of wind noise in response to such tones.
[0059]
Although the above-described embodiments of the present invention propose several thresholds
for the chi-square detector, it should be noted that the setting of appropriate thresholds has some
flexibility and variability. This is because the output of the chi-square WND increases as the
block size increases and is affected by microphone spacing and positioning, and because the
threshold can be set fairly freely so as to trigger the WND at the wind speed, or at the ratio
of wind noise level to other sounds, that is desired for the application.
[0060]
The effectiveness of the present invention over the entire band of FIG. 7 is particularly
advantageous for a sub-band wind noise detector as in FIG. 2 or 3, which should preferably
function properly at all frequencies up to the Nyquist frequency (generally a hearing aid
bandwidth of up to 8-12 kHz) in order to distinguish wind noise from other inputs.
[0061]
The audio signal is generally a microphone output signal, but any other source could be used.
Typical applications are hearing aids, cochlear implants, headsets, handsets, video cameras, or
any other medical or consumer device that needs to detect wind noise. To evaluate the
performance of the embodiment of FIG. 1 in such other hardware devices, the susceptibility of
the above-described WND methods to falsely detecting pure tones as wind was investigated. Each
method was implemented in a MATLAB simulation, and sinusoidal input stimuli for two
microphones were generated in MATLAB. The phase of the rear microphone signal was delayed
relative to the front microphone according to the particular microphone spacing (assuming a
speed of sound of 340 m/s). As shown in Table 2, typical examples of real-time DSP audio
products were modeled.
[0062]
[0063]
The WND output was calculated for frequencies from 10 Hz up to half the sampling rate, in
10 Hz steps.
For each frequency, the average output of each WND method was calculated over a block of 100
consecutive samples. The average values are shown in the figures. Averaging approximates the
low-pass filter that would be implemented to remove block-to-block variations in the output of
the WND method.
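A simulation of this kind can be sketched in Python for the chi-square method alone. This is an illustrative reconstruction, not the MATLAB simulation of the source: the function names are hypothetical, the rear signal is modeled as a fractionally delayed sine, the score is averaged over 100 consecutive blocks (an assumed reading of the averaging step), and the two-microphone closed form of the statistic is used.

```python
import math


def chi_square_two_mics(pos_a, pos_b, block_len):
    """Chi-square score from the positive counts of two equal-length blocks."""
    d2 = (pos_a - pos_b) ** 2
    pos_total = pos_a + pos_b
    neg_total = 2 * block_len - pos_total
    score = 0.0
    if pos_total:
        score += d2 / pos_total
    if neg_total:
        score += d2 / neg_total
    return score


def positives(freq_hz, fs_hz, start, block_len, delay_samples):
    """Count non-negative samples of a delayed sine within one block."""
    return sum(
        1
        for n in range(start, start + block_len)
        if math.sin(2 * math.pi * freq_hz * (n - delay_samples) / fs_hz) >= 0
    )


def tone_score(freq_hz, fs_hz, spacing_m, block_len=16, n_blocks=100, c=340.0):
    """Average chi-square score for a pure tone reaching two spaced microphones."""
    delay = spacing_m / c * fs_hz  # rear-microphone delay in samples
    total = 0.0
    for k in range(n_blocks):
        start = k * block_len
        front = positives(freq_hz, fs_hz, start, block_len, 0.0)
        rear = positives(freq_hz, fs_hz, start, block_len, delay)
        total += chi_square_two_mics(front, rear, block_len)
    return total / n_blocks


# Coarse sweep for a 20 mm spacing at 16 kHz (values taken from the text above).
for f in range(500, 8000, 500):
    print(f, round(tone_score(f, 16000, 0.020), 3))
```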
[0064]
In addition, the above analysis was repeated with a 9.5 dB level difference between the
microphones (rear microphone signal lower). Assuming a 1/r² relationship between acoustic power
and distance from the sound source, this approximates a near-field sound source that is three
times more distant from one microphone than from the other.
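The 9.5 dB figure follows directly from the inverse-square assumption. As a short worked check, with $r_1$ and $r_2$ denoting the (hypothetically named) distances from the source to the nearer and farther microphone:

```latex
\Delta L = 10 \log_{10}\!\left(\frac{r_2^{2}}{r_1^{2}}\right)
         = 20 \log_{10}\!\left(\frac{r_2}{r_1}\right)
         = 20 \log_{10}(3) \approx 9.54\ \text{dB}
```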
[0065]
In the ideal case of a 0 mm microphone spacing (i.e. both microphones are in phase), none of the
WND methods falsely detects tones as wind at any frequency: the outputs of the prior art
difference-sum, difference, and correlation methods are equal to 0, 0 and 1 respectively
(correctly indicating no wind noise), and the output of the present chi-square WND method is
equal to zero (correctly indicating no wind noise).
[0066]
However, if there is a 0 mm microphone spacing (i.e. both microphones are in phase) but the
9.5 dB near-field effect described above is present, the output of the chi-square WND method is
not affected at all by the level difference between the microphones, whereas the other methods
are significantly affected in this simulation, as shown in the corresponding figure, and may
therefore produce a false indication of wind noise.
Since the output of the difference method in this case is > 4, it cannot be seen in that figure.
[0067]
FIG. 9 shows simulated WND output values for a typical hearing aid (as in Table 2). It can be seen
that the conventional WND methods incorrectly detect tones as wind at high frequencies. The
chi-square method of the embodiment of FIG. 1 is more robust, although its output at about
5.4 kHz is relatively high, but not necessarily above the nominal wind detection threshold
(which, as shown in FIG. 4, may in some embodiments be selected to be as high as about 3.5).
The behavior of the chi-square WND score at 5.4 kHz is due to the tone having a period of about
3 samples and the microphone spacing causing a phase shift of about 0.56 samples. As a result,
about two thirds of the front microphone samples are positive while about two thirds of the rear
microphone samples are negative, which explains the relatively high output of the chi-square
WND method at about 5.4 kHz. It should be noted that all three prior art methods also suffer a
significant drop in performance at or around 5.4 kHz.
[0068]
It is further noted that the artifact at 5.4 kHz in the present chi-square method, which can be
seen in FIG. 9, can be removed by inverting the front or rear microphone signal and repeating the
WND process, thereby changing the phase relationship between the microphone signals, and then
taking the lower of the two WND output values as the WND output that is passed to the smoothing
filter. This approach was applied to the simulation of all four methods to generate the
corresponding graph. The graph shows that while there is little change in the relatively poor
robustness of the conventional WND methods, the robustness of the chi-square WND method to high
frequency tones is significantly increased. Therefore, this approach may be beneficial in some
embodiments of the present invention, in applications where the additional computational load is
justified. The computational load can be further reduced by swapping the positive and negative
sample count values for one microphone signal, instead of counting them again with the inverted
signal, and by performing the second χ² calculation only when it would give a smaller score
(i.e. when the swap makes the sample counts of the microphones more similar). Equivalently, the
computational load can be reduced by calculating alternative third and fourth numbers,
corresponding to the numbers of negative and positive samples relative to the second comparison
threshold, and performing a single χ² calculation on whichever version of the third number
(original or alternative) differs least from the first number.
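The count-swapping variant can be sketched in Python as follows. The function names and the example counts are hypothetical, and the two-microphone closed form of the statistic is used. Swapping the counts is equivalent to recounting the inverted signal only when the block contains no exactly-zero samples under the zero-as-positive convention, so this should be read as an approximation.

```python
def chi_square_two_mics(pos_a, pos_b, block_len):
    """Chi-square score from the positive counts of two equal-length blocks."""
    d2 = (pos_a - pos_b) ** 2
    pos_total = pos_a + pos_b
    neg_total = 2 * block_len - pos_total
    score = 0.0
    if pos_total:
        score += d2 / pos_total
    if neg_total:
        score += d2 / neg_total
    return score


def robust_score(pos_front, pos_rear, block_len):
    """Single chi-square evaluation on whichever rear count is closer to the front.

    The alternative rear count (block_len - pos_rear) models an inverted rear
    signal by swapping its positive and negative counts.
    """
    swapped = block_len - pos_rear
    if abs(pos_front - swapped) < abs(pos_front - pos_rear):
        pos_rear = swapped
    return chi_square_two_mics(pos_front, pos_rear, block_len)


# About two thirds positive at the front, two thirds negative at the rear.
print(chi_square_two_mics(11, 5, 16), robust_score(11, 5, 16))
```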
[0069]
FIG. 11 shows the simulated output scores of the three prior art WND methods and of the WND
method of the present invention for a hearing aid configured as in Table 2, with a 9.5 dB
reduction applied to the rear microphone signal level. The chi-square WND output is not affected
by the level difference between the microphone signals, but the other methods are clearly
adversely affected. Again, it should be noted that the artifact at about 5.4 kHz in the
chi-square WND score may fall below the detection threshold (and hence not trigger a false
detection), and/or can be addressed by repeating the score calculation using the inverted signal
in a manner corresponding to that described above.
[0070]
The robustness of the prior art WND methods and of the WND method of the embodiment of FIG. 1
for a simulated example of a typical Bluetooth® headset, as in Table 2, is illustrated in the
corresponding figure. Again, the chi-square method of the embodiment of FIG. 1 is equally robust
to tone input, except that the frequency scale is halved due to the lower sampling rate of the
Bluetooth® headset. Again, it should be noted that the artifact at about 2.7 kHz in the
chi-square WND score, which is due to the half-sample delay between microphones in combination
with a pure tone stimulus having a period of three samples, may fall below the detection
threshold (and thus not trigger a false detection), and/or may be addressed by repeating the
score calculation using the inverted signal in a manner corresponding to that already described.
[0071]
The robustness of the prior art WND methods and of the WND method of the embodiment of FIG. 1
for a simulated example of a typical Bluetooth® headset, as in Table 2, with a level difference
of 9.5 dB between the input signals is shown in the corresponding figure. Again, the chi-square
method of the embodiment of FIG. 1 is robust to tone input. Again, it should be noted that the
artifact at about 2.7 kHz in the chi-square WND score may fall below the detection threshold
(and hence not trigger a false detection), and/or can be addressed by repeating the score
calculation using the inverted signal in a manner corresponding to that already described.
[0072]
Therefore, in the Bluetooth® headset example of FIG. 13, the chi-square WND method is not
affected by the level difference between the microphones, while the other methods are clearly
adversely affected and can falsely detect wind in response to pure tone input.
[0073]
The robustness of the prior art WND methods and of the WND method of the embodiment of FIG. 1
for a simulated example of a typical smartphone handset with 16 samples per block, as in Table 2,
is shown in the corresponding figure.
The relatively large microphone spacing of 150 mm generally degrades performance by
substantially reducing the frequency range over which the conventional WND methods are robust
to tones. The peaks of the chi-square WND score below 2 kHz occur at frequencies for which the
block length contains approximately N + 0.5 periods (N = 0, 1, 2, etc.), i.e. at 250 Hz, 750 Hz,
1250 Hz, etc. This is because the phase shift has the largest effect on the ratio of positive to
negative samples when the block contains, for example, the entire first half of a sinusoidal
period (i.e. all samples are positive). The effect of phase shift on the ratio of positive to
negative samples tends to decrease as the number of periods in the block length increases. With
a microphone spacing of 150 mm and a sampling rate of 8 kHz, the delay between the two
smartphone handset microphones is up to 3.5 samples (depending on the direction of the sound).
This compares to a delay of less than one sample for typical hearing aid and Bluetooth® headset
applications. Note that this delay had a smaller effect on the ratio of positive to negative
samples below 2 kHz. The effects of the delay may be reduced or adjusted in different
applications by using longer block sizes, which reduces the delay between microphones to a
smaller proportion of the samples in the block. In addition, most of the sub-2 kHz peaks of the
chi-square WND score only reach values of about 2.0, which may be below the detection threshold
as described above, so such peaks may not trigger false detection of wind noise by the
chi-square WND method. Furthermore, the peaks of the chi-square WND detector can be reduced by
repeating the score calculation using the inverted signal, in a manner corresponding to that
described above.
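The figures quoted for the smartphone case can be reproduced with a few lines of Python. The helper names are hypothetical; the spacing, sampling rate, block length, and speed of sound are those given in the text.

```python
def delay_in_samples(spacing_m, fs_hz, speed_of_sound=340.0):
    """Worst-case inter-microphone delay, expressed in sample periods."""
    return spacing_m / speed_of_sound * fs_hz


def peak_frequencies(fs_hz, block_len, count):
    """Frequencies at which the block holds N + 0.5 periods, for N = 0..count-1."""
    return [(n + 0.5) * fs_hz / block_len for n in range(count)]


print(round(delay_in_samples(0.150, 8000), 2))  # 150 mm spacing at 8 kHz
print(peak_frequencies(8000, 16, 3))            # 16-sample blocks
print(peak_frequencies(8000, 32, 3))            # 32-sample blocks, for comparison
```

For 16-sample blocks the listed frequencies match the 250 Hz, 750 Hz, and 1250 Hz peaks named in the paragraph.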
[0074]
The robustness of the prior art WND methods and of the WND method of the embodiment of FIG. 1
for a simulated example of a typical smartphone handset, as in Table 2, with 16 samples per
block and a 9.5 dB level difference between the signals is shown in the corresponding figure.
As in the preceding examples, the chi-square WND method is not affected by the level difference
between the microphones, but the other methods are clearly affected.
[0075]
The robustness of the prior art WND methods and of the WND method of the embodiment of FIG. 1
for a simulated example of a typical smartphone handset with 32 samples per block, as in Table 2,
is shown in FIG. 16. Increasing the block size from 16 to 32 samples has the following impacts
on the chi-square WND:
1. The output increases because more samples are counted, so the wind detection threshold needs
to be adjusted accordingly.
2. The output is calculated less frequently, which more than compensates for the processing of a
larger number of samples during the initial counting step of the chi-square WND method.
3. The delay between microphones corresponds to a smaller proportion of the samples in the
block, and therefore has less impact on the output of the chi-square WND method for pure tones,
as is apparent from the reduced height of the peaks below about 1 kHz in the chi-square WND
score in FIG. 16 compared with the 16-sample case.
[0076]
Compared to the block size of 16 samples, the low frequency peaks of the chi-square WND output
are substantially reduced. The reason is that the 3.5 sample delay between microphones is a
smaller proportion of the number of samples in a 32-sample block. The peak at about 2.7 kHz is
larger because the block length is larger, and hence the numerical output is larger owing to the
larger sample counts at the input of the chi-square WND method. However, as noted in item 1
above, the peak at 2.7 kHz may still not falsely trigger the detection of wind noise, because the
WND detection threshold is also raised. Furthermore, the peaks of the chi-square WND detector
can be reduced by repeating the score calculation using the inverted signal, in a manner
corresponding to that described above.
[0077]
The robustness of the prior art WND methods and of the WND method of the embodiment of FIG. 1
for a simulated example of a typical smartphone handset, as in Table 2, with 32 samples per
block and a 9.5 dB level difference between the input signals is shown in the corresponding
figure. Again, as in the previous examples, the chi-square WND method is not affected by the
level difference between the microphones, but the other methods are clearly affected. As in the
case of FIG. 16, the peak at 2.7 kHz may in some cases fail to trigger the detection of wind
noise, and the peaks of the chi-square WND detector can optionally be reduced by repeating the
score calculation using the inverted signal, in a manner corresponding to that described above.
[0078]
With reference to FIGS. 14-17, it should be noted that the 150 mm microphone spacing of the
smartphone is probably a worst-case scenario, and that smaller spacings, with a concomitant
improvement in the performance of the present method, may be present in such devices.
Furthermore, it should be noted that these results for a 150 mm microphone spacing may also
apply to other devices, such as video cameras, that may have a similar microphone spacing.
[0079]
Therefore, simplifying the sampled input data to counts of positive and negative sign values for
each audio channel across the sample block provides several advantages. The use of sign values
provides robustness to differences in magnitude that can occur in the signals for reasons other
than wind, such as near-field sound and mismatched microphones. Matching sign counts across
blocks of time, as opposed to sample-by-sample correlation, improves robustness to typical phase
differences resulting from microphone spacing or phase response. Because the sample data are
simplified to binary values with respect to zero or another suitable threshold, a chi-square test
or other such approach can be used.
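The simplification described above can be sketched as follows. This is a minimal Python
illustration under stated assumptions, not the implementation that was evaluated: the function
names are hypothetical, and the statistic is assumed to be the standard Pearson chi-square for a
2x2 table of above/below-threshold counts per microphone, which may differ in detail from the
chi-square equations of the embodiments.

```python
def sign_counts(block, threshold=0.0):
    """Return (samples above threshold, samples at or below threshold)."""
    above = sum(1 for x in block if x > threshold)
    return above, len(block) - above


def chi_square_2x2(pos1, neg1, pos2, neg2):
    """Pearson chi-square statistic for the table [[pos1, neg1], [pos2, neg2]].

    Assumption: this textbook form stands in for the chi-square equations
    of the embodiments.
    """
    total = pos1 + neg1 + pos2 + neg2
    denom = (pos1 + neg1) * (pos2 + neg2) * (pos1 + pos2) * (neg1 + neg2)
    if denom == 0:
        return 0.0  # an empty row or column gives no evidence of a difference
    return total * (pos1 * neg2 - neg1 * pos2) ** 2 / denom


def wnd_score(mic1_block, mic2_block, threshold=0.0):
    """Chi-square score for one pair of simultaneously sampled blocks."""
    pos1, neg1 = sign_counts(mic1_block, threshold)
    pos2, neg2 = sign_counts(mic2_block, threshold)
    return chi_square_2x2(pos1, neg1, pos2, neg2)


# Same waveform at two different levels: the sign counts match, so the score is 0.
quiet = [1.0, -1.0, 2.0, -2.0] * 4
loud = [3.0 * x for x in quiet]
same_sound_score = wnd_score(quiet, loud)

# Dissimilar sign patterns (as with locally generated wind noise) give a large score.
mic1 = [1.0] * 16
mic2 = [1.0] * 8 + [-1.0] * 8
wind_like_score = wnd_score(mic1, mic2)
```

Because only the sign counts enter the calculation, scaling one channel (a level difference
between the microphones) leaves the score unchanged.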
[0080]
In an alternative embodiment, the chi-square calculation may be performed using a pre-computed
look-up table of chi-square values, for example to improve computational efficiency, or the
chi-square equations may be simplified by using constants such as the total number of samples per
microphone per block. The comparison of the two blocks of samples may be performed on a subset of
the audio frequency range, for example by prefiltering the signals. The WND score is preferably
smoothed by a suitable FIR, IIR or other filter to reduce frame-to-frame variation in the
chi-square WND score for steady-state input sounds.
[0081]
The effectiveness of the WND method of the present invention as applied to telephone handsets and
headsets was further investigated. FIGS. 18-22 illustrate results obtained with sound stimuli
delivered to a headset and a handset, each in a typical use position on a Head and Torso
Simulator (HATS) in an acoustic booth. The output of the chi-square WND method is compared to the
respective outputs of the correlation and difference-sum wind noise detection (WND) methods
described above.
[0082]
The experiments reflected in FIGS. 18-22 evaluated the following hardware/processing cases: a
telephone handset (120 mm microphone spacing) with a block size of 16 or 32 samples; and a
Bluetooth (registered trademark) headset (21 mm microphone spacing) with a block size of 16
samples.
[0083]
More specifically, to obtain the results of FIGS. 19 and 20, the Bluetooth® headset was modified
so that the microphone signals were accessible through wires exiting the device near the ear
(i.e., away from the microphone inlets). The two microphones were at the typical positions for a
Bluetooth® headset and were 21 mm apart (a typical spacing). To obtain the results of FIGS. 21
and 22, a dummy smartphone handset was likewise modified so that the wires exited without passing
close to the microphones. The two microphones were at the top (near the ear) and bottom (near the
mouth) ends of the handset, giving a microphone spacing of 120 mm, which was considered to be a
typical worst case spacing for level and phase differences between the microphone signals of this
type of device.
[0084]
For each of the headset and handset experiments, the device was placed on a head and torso
simulator (HATS) in an acoustic booth, in a typical use position. For each device, both
microphone signals were recorded simultaneously by a high-quality sound card while various sound
input stimuli were presented (as shown in Table 3 below). The recordings were stored as WAV files
at a sampling rate of 8 kHz. The HATS faced the sound source for all recordings (i.e., the
stimulus was presented directly in front of the HATS), which is the worst case direction for the
phase difference of the stimuli between the microphones.
[0085]
[0086]
The tone sweeps described in the last two rows of Table 3 each had a smoothly changing tone
frequency that increased logarithmically with time.
The speech stimuli described in rows 4-9 of Table 3 consisted of two spoken sentences separated
by 1.3 seconds of silence (i.e., a gap occupied only by microphone noise) beginning about 3
seconds into the stimulus, and the speech was presented at typical far-field and near-field
acoustic levels. There was a short period of silence at the beginning and the end of each speech
stimulus. The wind speeds were chosen to cover the relevant range, in which the wind noise level
reached and/or exceeded the speech level. The wind stimuli were generated by a wind machine.
[0087]
As for the evaluation of the hearing aid and cochlear implant devices shown in Table 1, the
inventive and prior art WND algorithms were implemented in Matlab/Simulink and used to process
consecutive blocks of samples, with overlap, from each microphone recording obtained with the
stimuli of Table 3. For the headset and handset applications, the processing was performed at a
sampling rate of 8 kHz, as is typical for these devices. The output of each WND algorithm was
again processed by an IIR filter (b = [0.004]; a = [1 -0.996]), which removed any noise-like
block-to-block variation that may be present in the WND algorithm output, so that a more
consistent output was provided for a given input stimulus.
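The smoothing step can be illustrated with a short sketch. It assumes that the coefficients
follow the usual Matlab filter(b, a, x) convention, so that b = [0.004] and a = [1 -0.996] give
the one-pole recursion y[n] = 0.004 x[n] + 0.996 y[n-1]; the function name and the zero initial
state are illustrative choices.

```python
def smooth_scores(scores, b0=0.004, a1=-0.996, initial=0.0):
    """One-pole IIR smoother: y[n] = b0 * x[n] - a1 * y[n-1].

    With b0 = 0.004 and a1 = -0.996 the DC gain is b0 / (1 + a1) = 1, so a
    steady score is reproduced unchanged once the filter has settled.
    """
    previous = initial
    smoothed = []
    for score in scores:
        previous = b0 * score - a1 * previous
        smoothed.append(previous)
    return smoothed


# A constant raw score of 10 per block: the output starts near 0 and rises to 10.
step_response = smooth_scores([10.0] * 5000)
first_value = step_response[0]
settled_value = step_response[-1]
```

Starting the filter state at zero is consistent with a smoothed output that begins at 0 and rises
gradually; with these coefficients the time constant is roughly 250 blocks.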
[0088]
Examples of male and female voice recordings of the handset are shown in FIGS. 18a and 18b to
more clearly show the audio gaps.
[0089]
FIGS. 19a-19e show the outputs of the WND methods applied to the Bluetooth® headset recordings
with a block size of 16 samples.
The initial response starts from 0 in all cases because of the initialization of the smoothing
IIR filter. As seen in FIG. 19a, the chi-square WND method of the present invention clearly
separates wind noise from speech. During the silence between the spoken sentences, at about 3 to
4 seconds, uncorrelated microphone noise causes the chi-square WND method to return wind-like
values. However, since microphone noise is much lower in level (amplitude) than wind noise, a
simple level threshold could be used to distinguish between microphone noise and wind noise.
[0090]
FIG. 19b demonstrates that the prior art correlation WND method can give similar values for
speech and wind noise, and can thus falsely detect speech as wind noise. FIG. 19c shows that the
prior art Diff/Sum WND method gives values of about 0 for speech and of 1 or more for wind noise
and microphone noise. FIG. 19d shows the output values in response to the far-field tone sweep.
The output of the chi-square WND method for far-field tones is less than 1.5 at all frequencies,
which is similar to the values for speech and clearly lower than those for wind noise. Therefore,
far-field tones are clearly separated from wind noise by the chi-square method of the present
invention. In contrast, the output of the correlation WND method for far-field tones may be about
1 (no wind) at some frequencies and about 0 (wind noise) at other frequencies. Therefore,
far-field tones can be misdetected as wind noise by the correlation WND method. The output of the
Diff/Sum WND method for far-field tones may be about 0 (no wind) at some frequencies and more
than 1 (wind noise) at other frequencies. Therefore, far-field tones can be misdetected as wind
noise by the Diff/Sum WND method. FIG. 19e shows the output values in response to the near-field
(mouth) tone sweep. The output of the chi-square WND method for near-field tones is less than 2.0
at all frequencies, which is similar to the values for speech and clearly lower than those for
wind noise. Therefore, near-field tones are clearly separated from wind noise by the chi-square
method of the present invention. In contrast, the output of the correlation WND method for
near-field tones may be about 1 (no wind) at some frequencies and about 0 (wind noise) at other
frequencies. Therefore, near-field tones can be misdetected as wind noise by the correlation WND
method. The output of the Diff/Sum WND method for near-field tones may be about 0 (no wind) at
some frequencies and more than 1 (wind noise) at other frequencies. Therefore, near-field tones
can be misdetected as wind noise by the Diff/Sum WND method.
[0091]
FIGS. 20a-20c show the results when the chi-square calculation is repeated with one of the two
microphone signals inverted, as described with reference to FIG. The lower of the two chi-square
values is taken as the output and passed through the smoothing filter. In the tone sweep
simulations, this made the chi-square WND method of the present invention more robust to tones.
FIGS. 19a, 19d and 19e show that this may not be required for the actual tone sweep recordings,
but FIGS. 20a-20c show that it better separates the chi-square WND outputs for wind and for
microphone noise, which can be useful in reducing the need for an input level threshold to
discriminate between these two types of noise. The actual tone sweep recordings include echoes,
microphone noise, and other effects that were not present in the simulations of pure/ideal
sinusoidal stimuli, which can explain the differences between the simulation results and the
results for the actual microphone signals.
[0092]
FIG. 20a shows that, by taking the minimum of the two chi-square values for each block, the
output for microphone noise during the period from 3 to 4 seconds becomes more similar to the
output values for speech, and is clearly separated from the values for wind noise. Therefore, in
this scenario, a level threshold is not necessary to separate uncorrelated microphone noise from
wind noise when the minimum approach is applied.
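The minimum approach can be sketched as follows. This is an illustration only: the function names
are hypothetical, a zero comparison threshold is assumed, and the statistic is again assumed to be
the standard Pearson chi-square for a 2x2 table, which may differ from the equations of the
embodiments.

```python
def sign_counts(block):
    """Return (samples above zero, samples at or below zero)."""
    above = sum(1 for x in block if x > 0.0)
    return above, len(block) - above


def chi_square_2x2(pos1, neg1, pos2, neg2):
    """Pearson chi-square for [[pos1, neg1], [pos2, neg2]] (assumed form)."""
    total = pos1 + neg1 + pos2 + neg2
    denom = (pos1 + neg1) * (pos2 + neg2) * (pos1 + pos2) * (neg1 + neg2)
    if denom == 0:
        return 0.0
    return total * (pos1 * neg2 - neg1 * pos2) ** 2 / denom


def min_chi_square(mic1_block, mic2_block):
    """Score the block pair twice, once with mic 2 inverted, and keep the minimum."""
    pos1, neg1 = sign_counts(mic1_block)
    pos2, neg2 = sign_counts(mic2_block)
    inv_pos2, inv_neg2 = sign_counts([-x for x in mic2_block])
    direct = chi_square_2x2(pos1, neg1, pos2, neg2)
    inverted = chi_square_2x2(pos1, neg1, inv_pos2, inv_neg2)
    return min(direct, inverted)


# The same sound arriving in antiphase: the direct score is high, the inverted score is 0.
mic1 = [1.0] * 12 + [-1.0] * 4
antiphase = [-x for x in mic1]
antiphase_score = min_chi_square(mic1, antiphase)

# Dissimilar sign patterns stay high in both polarities.
wind_like_score = min_chi_square([1.0] * 16, [1.0] * 8 + [-1.0] * 8)
```

In this sketch a sound that reaches the two microphones in opposite polarity scores zero after
taking the minimum, whereas sign patterns that differ in both polarities keep a high score.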
[0093]
As described above and shown in FIG. 19d, the chi-square WND output in response to the far-field
tone sweep was sufficiently low to distinguish the tones from wind without taking the minimum of
the two chi-square values. Nevertheless, FIG. 20b shows that taking the minimum can further reduce
(improve) the chi-square WND values for far-field tones.
[0094]
As described above and shown in FIG. 19e, the chi-square WND output in response to the near-field
(mouth) tone sweep was sufficiently low to distinguish the near-field tones from wind without
taking the minimum of the two chi-square values. Nevertheless, FIG. 20c shows that taking the
minimum can also reduce (improve) the chi-square WND values for near-field (mouth) tones.
[0095]
FIGS. 21a-21e show the outputs of the different WND methods for the smartphone handset with a
block size of 16 samples. As before, the initial response starts from 0 in all cases because of
the initialization of the smoothing IIR filter. FIG. 21a shows that the chi-square WND method of
the present invention clearly separates wind noise from both speech and the microphone noise in
the speech gap at about 3 to 4 seconds, so that no level threshold is required to distinguish
wind noise from microphone noise. The larger average chi-square values for the handset compared
to the headset are probably due to the larger microphone spacing, which causes the locally
generated wind noise to be less similar between the microphones.
[0096]
FIG. 21b shows that the correlation WND method only barely separates wind noise from non-wind
stimuli. FIG. 21c shows that the Diff/Sum WND method separated wind noise from speech but did not
separate wind noise from the microphone noise in the speech gap at about 3 to 4 seconds.
FIG. 21d shows that the chi-square WND method of the present invention gives output values for
far-field tones that are similar to those for other non-wind stimuli and far below the typical
values for wind noise (about 9-12, as shown in FIG. 21a). Therefore, far-field tones are clearly
separated from wind noise by the chi-square WND method of the present invention. In contrast,
the output of the correlation WND method for far-field tones may be equal to the wind noise
values at some frequencies. Therefore, far-field tones can be misdetected as wind noise by the
correlation WND method. The output of the Diff/Sum WND method for far-field tones may also be
equal to the wind noise values at some frequencies. Therefore, far-field tones can be misdetected
as wind noise by the Diff/Sum WND method.
[0097]
FIG. 21e shows that the output of the chi-square WND method for near-field (mouth-generated)
tones is similar to the values for other non-wind stimuli and is well below the values typical
of wind noise. Therefore, near-field (mouth-generated) tones are clearly separated from wind
noise. The output of the correlation WND method for near-field (mouth-generated) tones may be the
same as the wind noise values at some frequencies. Therefore, near-field (mouth-generated) tones
can be misdetected as wind noise by the correlation WND method. The output of the Diff/Sum WND
method for near-field (mouth-generated) tones may be the same as the wind noise values at some
frequencies. Therefore, near-field (mouth-generated) tones can be misdetected as wind noise by
the Diff/Sum WND method.
[0098]
Compared to the smartphone handset with a block size of 16 samples (as shown in FIGS. 21a-e), a
block size of 32 samples makes the chi-square WND method of the present invention even more
robust in distinguishing wind noise from far-field and near-field tones. This is illustrated in
FIGS. 22a-e. In FIG. 22a, the chi-square WND method clearly distinguishes the wind noise input
from the other stimuli presented. FIGS. 22b and 22c show that the correlation WND method and the
Diff/Sum WND method are also improved by the larger block size, but that their discrimination of
wind noise from other stimuli is not as clear as that of the chi-square WND method of the present
invention.
[0099]
FIG. 22d shows that, for the block size of 32 samples, the chi-square WND output for far-field
tones is far below the wind noise values, while the correlation WND method and the Diff/Sum WND
method cannot correctly distinguish far-field tones from wind noise at some frequencies. FIG. 22e
shows that, for the block size of 32 samples, the chi-square WND output for near-field tones
(from the mouth) is well below the wind noise values, while the correlation WND method and the
Diff/Sum WND method cannot correctly distinguish near-field tones from wind noise at some
frequencies.
[0100]
FIGS. 23a-c show wind noise detector results obtained by the sub-band time domain implementation
of the chi-square WND shown in FIG. The performance of this sub-band time domain implementation
was evaluated in response to the stimuli shown in Table 1 above. Second-order and fourth-order
one-octave band-pass IIR filters were implemented in Matlab/Simulink, the pre-recorded microphone
signals were filtered into sub-bands, and the sub-band microphone signals were then processed by
the chi-square WND. Although these exemplary IIR filters were chosen for their ease and
efficiency of implementation in typical DSP processors, filters of different orders and types
with different cutoff frequencies may be used as appropriate for this and other applications.
Note that, as for the full-band implementation, the output of the WND algorithm was processed by
an IIR filter (b = [0.004]; a = [1 -0.996]; other filter types and coefficients could be used),
which removed any jitter-like block-to-block variation that may be present in the output of the
WND algorithm, so that a constant input stimulus gave a more consistent output.
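For illustration, one way to obtain a one-octave band-pass second-order IIR section is sketched
below. The design formulas are those of the widely used Audio EQ Cookbook band-pass biquad with
unity peak gain; this design, the function names and the 16 kHz sampling rate are assumptions made
for the example, since the filter design actually used is not specified here.

```python
import cmath
import math


def octave_bandpass_coeffs(center_hz, sample_rate_hz, bandwidth_octaves=1.0):
    """Second-order band-pass coefficients (b, a), normalized so that a[0] = 1.

    Assumed design: Audio EQ Cookbook band-pass with 0 dB peak gain.
    """
    w0 = 2.0 * math.pi * center_hz / sample_rate_hz
    alpha = math.sin(w0) * math.sinh(
        math.log(2.0) / 2.0 * bandwidth_octaves * w0 / math.sin(w0)
    )
    a0 = 1.0 + alpha
    b = [alpha / a0, 0.0, -alpha / a0]
    a = [1.0, -2.0 * math.cos(w0) / a0, (1.0 - alpha) / a0]
    return b, a


def gain_at(b, a, freq_hz, sample_rate_hz):
    """Magnitude of the filter's frequency response at freq_hz."""
    z_inv = cmath.exp(-2j * math.pi * freq_hz / sample_rate_hz)
    numerator = b[0] + b[1] * z_inv + b[2] * z_inv ** 2
    denominator = a[0] + a[1] * z_inv + a[2] * z_inv ** 2
    return abs(numerator / denominator)


FS = 16000.0  # hypothetical sampling rate
b_1k, a_1k = octave_bandpass_coeffs(1000.0, FS)
gain_center = gain_at(b_1k, a_1k, 1000.0, FS)   # pass-band centre
gain_low = gain_at(b_1k, a_1k, 250.0, FS)       # two octaves below the centre
```

A second-order section of this kind has gentle skirts, so some out-of-band energy still reaches
the detector; a higher-order filter would reject more of it.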
[0101]
FIG. 23a shows the smoothed chi-square WND output for wind, speech, microphone noise (silence),
and 1 kHz near-field tone stimuli processed by a one-octave band-pass second-order IIR filter
centered at 1 kHz. The near-field tone is at the center frequency of this band-pass filter.
There is a clear separation between the smoothed WND outputs for wind noise (collectively 2320)
and the smoothed outputs for the speech stimuli (collectively 2330). The microphone noise output
2310 lies between the wind outputs and the speech outputs. The peaks in the outputs for the
speech stimuli are due to the gaps between phonemes, which are occupied by microphone noise. As
mentioned above, an SPL threshold can be used when there is a need to distinguish more clearly
between wind noise and microphone noise, and this also reduces the height of the peaks between
the phonemes of speech. The smoothed WND output 2340 for the near-field tone at the center
frequency of this sub-band is lower than that for speech and almost zero, thereby correctly
indicating the absence of wind.
[0102]
FIG. 23b shows the smoothed chi-square WND output for wind, speech, microphone noise, and 1 kHz
near-field tone stimuli processed by a one-octave band-pass second-order IIR filter centered at
5 kHz. A significant amount of wind noise may be present at such high frequencies and, as
indicated above, other WND methods may be unable to reliably distinguish wind noise from other
sounds at such high frequencies. The smoothed chi-square WND outputs for speech, microphone noise
(silence), and the 1 kHz near-field tone (collectively 2410) are well below 0.5. The smoothed WND
outputs for the 3-12 m/s winds (collectively 2420) are all above about 1.0. For the 5 kHz band
evaluated in this case, the smoothed WND output 2430 for the 1.5 m/s wind is at 0.5-1.0, because
wind noise at this wind speed is concentrated at lower frequencies. Therefore, the chi-square WND
output is correctly reduced for low-speed winds, which produce almost no wind noise at about
5 kHz, and a chi-square threshold of about 1.0 could be used so that the 1.5 m/s wind is not
detected in the 5 kHz band. A higher-order band-pass filter with a steeper low-frequency roll-off
would pass less low-frequency wind noise and give an even lower smoothed WND output for the
1.5 m/s wind.
[0103]
FIG. 23c shows the smoothed chi-square WND output for a stepped tone sweep processed by the same
one-octave band-pass second-order IIR filters, centered on 1 kHz and 5 kHz, as were used to
generate the results of FIGS. 23a and 23b. In either case, the smoothed chi-square WND output is
less than 1.0 and is very similar to the smoothed WND output of the full-band implementation of
the chi-square WND seen in FIG., which supports the robustness of these exemplary sub-band
implementations of the chi-square WND.
[0104]
FIGS. 24a-e show data for stimuli transformed into the frequency domain by FFT before being
processed by the chi-square WND. The FFT implementation of the chi-square WND shown in FIG. 3
was evaluated with the same pre-recorded microphone signals and methods as the full-band
time-domain version shown in FIG. These stimuli are listed in Table 1 above.
[0105]
The performance of the chi-square WND in the frequency domain was evaluated in Matlab/Simulink
using pre-recorded microphone signals sampled at a rate of 16 kHz. For each microphone,
overlapping blocks of 64 samples were processed by a 64-point Hanning window and a 64-point Fast
Fourier Transform (FFT). The FFT was calculated every 32 samples, or every 2 ms (i.e., 50%
overlap between FFT frames), the complex FFT data for each bin were converted to magnitude values,
and the magnitude values were converted to dB. While this FFT processing may be typical of DSP
hearing aid applications, it does not exclude other sampling rates, windows, FFT sizes, or
combinations thereof, or the processing of the raw complex FFT output data into other values or
units.
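The framing described above can be sketched in a few lines. The example below is illustrative
only: it uses a direct DFT rather than an optimized FFT, a periodic Hann window whose exact
definition is an assumption, and hypothetical function names. At 16 kHz with 64-point frames the
bin spacing is 250 Hz, so a 1 kHz tone falls in bin 4.

```python
import cmath
import math


def hann_window(size):
    # Periodic Hann window; the exact window definition used is an assumption.
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * i / size) for i in range(size)]


def magnitudes_db(frame, floor=1e-12):
    """Direct DFT of one windowed frame; dB magnitudes for bins 0..N/2."""
    n = len(frame)
    result = []
    for k in range(n // 2 + 1):
        acc = sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        result.append(20.0 * math.log10(max(abs(acc), floor)))  # floor avoids log(0)
    return result


def frames_db(signal, size=64, hop=32):
    """Split a signal into 50%-overlapping frames and return per-bin dB values."""
    window = hann_window(size)
    frames = []
    for start in range(0, len(signal) - size + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(size)]
        frames.append(magnitudes_db(frame))
    return frames


FS = 16000  # sampling rate in Hz; a 32-sample hop is then 2 ms
tone = [math.sin(2.0 * math.pi * 1000.0 * t / FS) for t in range(128)]
spectra = frames_db(tone)
peak_bin = max(range(len(spectra[0])), key=lambda k: spectra[0][k])
```

Each call yields one list of dB values per frame, which is the form of data that the per-bin
buffers described next would hold.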
[0106]
After each pair of FFTs was computed (i.e., one for each of the two microphones), the dB values
were stored in buffers of the most recent 16 values (one buffer for each combination of
microphone and FFT bin, as shown in FIG. 3). Thereafter, for each FFT bin, the averages of the
values in the corresponding first and second microphone buffers were calculated and used as the
first and second comparison thresholds, respectively. However, if a buffer's dB value was below
its corresponding input level threshold, the comparison thresholds for both microphones were set
so as to exceed all of the dB values in the corresponding buffers. This yielded a chi-square
value of zero. The input level threshold was set for each FFT bin to 5 dB above the maximum
microphone noise level, which was necessary to ensure that microphone noise was not mistakenly
detected as wind noise by this FFT implementation of the chi-square WND. Higher input level
thresholds may be used to ensure that wind noise that is inaudible or unobtrusive to the user is
not detected.
[0107]
The buffer data were then compared to the corresponding comparison threshold to count the
number of positive and negative values relative to the comparison threshold. Values within 0.5 dB
of the corresponding comparison threshold were treated as equal to the comparison threshold and
counted as positive. This improves the handling of steady pure-tone inputs by this FFT
implementation of the chi-square WND, for which the level could otherwise fluctuate by a very
small amount, such as less than 0.1 dB, to either side of the comparison threshold in a pattern
that is not necessarily the same across the microphones, resulting in false detection of the tone
as wind noise. The positive and negative value counts were then processed as described above to
calculate the chi-square WND output, which was processed by the IIR smoothing filter already
described (b = [0.004]; a = [1 -0.996]).
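The per-bin counting can be sketched as follows, under stated assumptions: the function names
are hypothetical, the level gate is applied to each buffer's mean (one possible reading of the
text), and the statistic is assumed to be the standard Pearson chi-square for a 2x2 table.

```python
def chi_square_2x2(pos1, neg1, pos2, neg2):
    """Pearson chi-square for [[pos1, neg1], [pos2, neg2]] (assumed form)."""
    total = pos1 + neg1 + pos2 + neg2
    denom = (pos1 + neg1) * (pos2 + neg2) * (pos1 + pos2) * (neg1 + neg2)
    if denom == 0:
        return 0.0
    return total * (pos1 * neg2 - neg1 * pos2) ** 2 / denom


def bin_counts(buffer_db, tolerance_db):
    """Count buffer values relative to the buffer mean (the comparison threshold).

    Values above the mean, or within tolerance_db below it, count as positive.
    """
    threshold = sum(buffer_db) / len(buffer_db)
    positive = sum(1 for v in buffer_db if v >= threshold - tolerance_db)
    return positive, len(buffer_db) - positive


def fft_bin_score(buf1_db, buf2_db, level_threshold_db, tolerance_db=0.5):
    """Chi-square score for one FFT bin from the two microphones' dB buffers."""
    mean1 = sum(buf1_db) / len(buf1_db)
    mean2 = sum(buf2_db) / len(buf2_db)
    # Assumption: the input level gate is applied to each buffer's mean level.
    if mean1 < level_threshold_db or mean2 < level_threshold_db:
        return 0.0  # same result as forcing every value to count as negative
    pos1, neg1 = bin_counts(buf1_db, tolerance_db)
    pos2, neg2 = bin_counts(buf2_db, tolerance_db)
    return chi_square_2x2(pos1, neg1, pos2, neg2)


# A steady tone whose level jitters by 0.05 dB in different patterns per microphone.
tone_mic1 = [60.0, 60.05] * 8
tone_mic2 = [60.05, 60.0, 60.0, 60.0] * 4
score_with_tolerance = fft_bin_score(tone_mic1, tone_mic2, level_threshold_db=30.0)
score_without_tolerance = fft_bin_score(
    tone_mic1, tone_mic2, level_threshold_db=30.0, tolerance_db=0.0
)
gated_score = fft_bin_score([20.0] * 16, [21.0] * 16, level_threshold_db=30.0)
```

With the 0.5 dB tolerance the jittering tone scores zero, whereas without it the two microphones'
differing jitter patterns produce a non-zero score.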
[0108]
FIG. 24a shows the smoothed chi-square WND output for wind, speech, microphone noise (silence),
and 1 kHz near-field tone stimuli for the 250 Hz FFT bin. The outputs for the near-field tone and
microphone noise are zero, and there is a clear separation between the values for speech and wind
noise, indicating correct detection of wind noise at 250 Hz. A suitable wind detection threshold
may be about 0.1 to 0.2. In general, the smoothed chi-square output values for wind noise and
speech are lower than for the time-domain implementation of the chi-square WND.
[0109]
FIG. 24b shows the smoothed chi-square WND output for the 750 Hz FFT bin. The smoothed
chi-square WND output is clearly less than 0.1 for speech, near zero for microphone noise, and
near zero for the 1 kHz near-field tone. The smoothed values for the 1.5 m/s wind are the lowest
and vary from about 0.1 to 0.2, while the smoothed values for the 3 m/s wind are slightly higher
and fluctuate around 0.2. This is correct behavior, because the 1.5 m/s wind noise level is only
about 12 dB above the microphone noise in the 750 Hz FFT bin, may not be audible, and optionally
need not be detected. Compared with the 250 Hz FFT bin, the 3 m/s wind noise level is also
reduced (but to a lesser extent), with smoothed chi-square values that are reduced less and
still tend to stay above 0.2, consistent with wind noise. The 6 and 12 m/s wind noise levels are
little affected by microphone noise, have clearly higher smoothed chi-square values, and are
properly classified as wind noise.
[0110]
FIG. 24c shows the smoothed chi-square WND output for the 1000 Hz FFT bin. The near-field tone
is at the center frequency of this band-pass filter. The smoothed chi-square WND output is
clearly less than 0.1 for speech, zero for microphone noise, and near zero for the 1 kHz
near-field tone. The wind noise level is close to the microphone noise level in this FFT bin for
the 1.5 and 3 m/s winds, so their smoothed values are close to zero. Therefore, the chi-square
WND correctly did not detect wind noise at wind speeds that do not produce a significant amount
of wind noise at 1 kHz. The 6 and 12 m/s winds have substantial energy at 1 kHz, and their
smoothed chi-square values are clearly higher than those of speech, so wind noise at these wind
speeds is correctly detected in the 1 kHz FFT bin.
[0111]
FIG. 24d shows the smoothed chi-square WND output for the 4000 Hz FFT bin. At this frequency,
only the 12 m/s wind noise has substantial energy, and it is correctly classified as wind from
the smoothed chi-square WND output. The smoothed output for all other stimuli is less than 0.1,
which is appropriate for lower wind speeds and non-wind stimuli.
[0112]
FIG. 24e shows the smoothed chi-square WND output for the 7000 Hz FFT bin. At this frequency,
only the 12 m/s wind noise has substantial energy, and it is correctly classified as wind from
the smoothed chi-square WND output. The smoothed output for all other stimuli tends to be less
than 0.1, which is appropriate for lower wind speeds and non-wind stimuli. Therefore, this
exemplary FFT implementation of the chi-square WND can correctly detect wind noise present at
very high frequencies, and distinguish between wind noise and non-wind sounds. Compared to the
sub-band time-domain implementation, the FFT implementation of the chi-square WND operates on
narrower frequency bands and covers a longer time span, but processes data with reduced time
resolution because blocks of samples are converted into RMS input level estimates. These
differences account for the differences between the chi-square WND outputs shown for these
implementations.
[0113]
FIG. 24f shows the smoothed chi-square WND outputs 2462, 2464, 2466 for far-field stepped tone
sweeps in the 1000 Hz, 4000 Hz, and 7000 Hz FFT bins, respectively. The smoothed output is
generally zero, with spikes that are typically less than 0.1 and that correspond to step changes
in tone frequency, which produce sharp transients. The spikes tend to occur at frequencies around
the center frequency of each FFT bin. This confirms the robustness of this FFT implementation of
the chi-square WND against misdetecting non-wind stimuli as wind noise.
[0114]
It will be appreciated by those skilled in the art that numerous variations and/or modifications
may be made to the invention as shown in the specific embodiments without departing from the
spirit or scope of the invention as broadly described. Therefore, the present embodiments are to
be considered in all respects as illustrative and not restrictive.