Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2018032931
Abstract: The present invention provides a sensitivity calibration gain calculation method capable of accurately calculating the sensitivity calibration gain even in the presence of disturbing sounds, and of setting threshold values more easily. An acoustic signal processing apparatus 10 converts a plurality of input signals s1(n) and s2(n), obtained from a plurality of microphones m_1 and m_2, into a plurality of frequency domain signals by transforming them from the time domain to the frequency domain. It comprises a front suppression signal generation unit 12 that generates a front suppression signal having a blind spot toward the front, a coherence calculation unit 13 that calculates coherence based on signals obtained from the plurality of input signals, a feature amount calculation unit 14 that calculates a feature amount representing the relationship between the front suppression signal and the coherence, a unit that detects, based on the feature amount, a target sound section not affected by the interference sound and calculates a calibration gain for each input signal in that section, and calibration gain multiplication units 16 and 17 that calibrate each corresponding input signal with its calibration gain. [Selected figure] Figure 1
Acoustic signal processing apparatus, program and method
[0001]
The present invention relates to an acoustic signal processing apparatus, program, and method, and can be applied, for example, to acoustic signal processing used in communication devices or communication software for telephones, video phones, and the like, or to preprocessing for speech recognition processing.
[0002]
BACKGROUND In recent years, devices equipped with various voice processing functions, such as a voice call function and a voice recognition function, for example smartphones and car navigation systems, have become widespread. However, with the spread of these devices, voice processing functions are being used in harsher noise environments than before, such as on a busy street or in a moving car. There is therefore a growing demand for signal processing technology that can maintain call sound quality and speech recognition performance even in noisy environments.
[0003]
Patent Document 1: JP 2014-68052 A
[0004]
Non-Patent Document 1: Hiraoka Kazuyuki, Hori Gen, "Probability Statistics for Programming", Ohmsha, Ltd., October 23, 2009, pp. 178-179
[0005]
In recent years, acoustic signal processing technology using multi-channel microphones has been put to practical use, but even microphones of the same model number differ in sensitivity, and accurate acoustic feature values cannot be calculated unless this sensitivity difference is calibrated. Until now, this has been handled either by measuring the sensitivity of each microphone in advance and setting a correction gain according to the sensitivity difference, or by comparing the input level of each channel and automatically setting a correction gain that matches the levels to their average value. However, the former takes time, and the latter cancels not only the sensitivity difference of the microphones but also genuine differences between the acquired input signals, so there is a problem that the accuracy of the acoustic feature values calculated in later stages cannot be ensured.
[0006]
One way to solve this problem is to calculate the calibration gain by comparing the input levels only in those sections of the input signals that contain a signal component arriving from the front of the microphones. The premise is that, for a signal arriving from the front, the distance between each microphone and the sound source is the same, so the acoustic difference between the signal components reaching the microphones is small, and any remaining difference between the two can be attributed to microphone sensitivity alone. One solution based on this premise is the method described in Patent Document 1. It focuses on the fact that the magnitude of the coherence feature fluctuates depending on whether the voice of the target speaker arrives from the front, and calculates the microphone sensitivity difference calibration gain in the signal sections where the voice comes from the front. Even if the microphones differ in sensitivity, the behavior in which the coherence magnitude fluctuates depending on whether the voice comes from the front is preserved, so the sensitivity difference can be calibrated by this method. (Supplement: for the calculation method of the coherence, refer to Equation 7 of Patent Document 1.)
[0007]
However, with the method of Patent Document 1, the coherence also takes a large value when the target voice arriving from the front of the microphone array is received simultaneously with the voice of another speaker arriving from the left or right (an interference sound), so speech components that do not come from the front are also reflected in the calibration gain. In addition, since the sensitivity difference differs randomly from one microphone array to another, it is difficult to optimize the threshold for detecting signal sections arriving from the front, and the target voice section may be erroneously determined.
[0008]
Therefore, in order to overcome the above two problems, there is a need for a sensitivity calibration gain calculation method that can accurately calculate the sensitivity calibration gain even if there is an interference sound, and that allows the threshold to be set more easily.
[0009]
In order to solve the above problems, an acoustic signal processing apparatus according to a first aspect of the present invention comprises: (1) a front suppression signal generation unit that generates a front suppression signal having a blind spot toward the front, based on the difference between a plurality of frequency domain signals obtained by converting a plurality of input signals, obtained from each of a plurality of microphones, from the time domain to the frequency domain; (2) a coherence calculation unit that calculates coherence based on signals obtained from the plurality of input signals; (3) a feature amount calculation unit that calculates a feature amount representing the relationship between the front suppression signal and the coherence; (4) a calibration gain calculation unit that, based on the feature amount, detects a target sound section not affected by the interference sound and calculates a calibration gain for each input signal in that section; and (5) a calibration unit that calibrates each corresponding input signal with its calibration gain.
[0010]
An acoustic signal processing program according to a second aspect of the present invention causes a computer to function as: (1) a front suppression signal generation unit that generates a front suppression signal having a blind spot toward the front, based on the difference between a plurality of frequency domain signals obtained by converting a plurality of input signals, obtained from each of a plurality of microphones, from the time domain to the frequency domain; (2) a coherence calculation unit that calculates coherence based on signals obtained from the plurality of input signals; (3) a feature amount calculation unit that calculates a feature amount representing the relationship between the front suppression signal and the coherence; (4) a calibration gain calculation unit that, based on the feature amount, detects a target sound section not affected by the interference sound and calculates a calibration gain for each input signal in that section; and (5) a calibration unit that calibrates each corresponding input signal with its calibration gain.
[0011]
According to a third aspect of the present invention, there is provided an acoustic signal processing method in which: (1) a front suppression signal generation unit generates a front suppression signal having a blind spot toward the front, based on the difference between a plurality of frequency domain signals obtained by converting a plurality of input signals, obtained from each of a plurality of microphones, from the time domain to the frequency domain; (2) a coherence calculation unit calculates the coherence based on signals obtained from the plurality of input signals; (3) a feature amount calculation unit calculates a feature amount representing the relationship between the front suppression signal and the coherence; (4) a calibration gain calculation unit detects a target sound section not affected by the disturbance sound based on the feature amount and calculates a calibration gain for each input signal in that section; and (5) a calibration unit calibrates each corresponding input signal with its calibration gain.
[0012]
According to the present invention, the sensitivity calibration gain can be accurately calculated
even in the presence of an interference sound, and the threshold can be set more easily.
[0013]
BRIEF DESCRIPTION OF THE DRAWINGS FIG. 1 is a block diagram showing the overall structure of the acoustic signal processing apparatus according to the embodiment.
FIG. 2 is an explanatory drawing showing the characteristics of the directivity formed by the front suppression signal generation unit according to the embodiment.
FIG. 3 is a block diagram showing the structure of the correlation calculation unit according to the embodiment.
FIG. 4 is a block diagram showing the composition of the calibration gain calculation unit according to the embodiment.
FIG. 5 is a flowchart showing the processing operation in the calibration gain calculation unit according to the embodiment.
[0014]
(A) Main Embodiment In the following, embodiments of an acoustic signal processing apparatus,
program and method according to the present invention will be described in detail with reference
to the drawings.
[0015]
(A-1) Configuration of Embodiment FIG. 1 is a block diagram showing the overall configuration of
an acoustic signal processing apparatus 10 according to this embodiment.
[0016]
In FIG. 1, the acoustic signal processing apparatus 10 includes a plurality of microphones (two, m_1 and m_2, are illustrated in FIG. 1), an FFT unit 11, a front suppression signal generation unit 12, a coherence calculation unit 13, a correlation calculation unit 14, a calibration gain calculation unit 15, a first calibration gain multiplication unit 16, and a second calibration gain multiplication unit 17.
[0017]
The “feature amount calculation unit” described in the claims includes the correlation
calculation unit 14.
Further, the “calibration gain calculation unit” includes the calibration gain calculation unit 15.
Furthermore, the “calibration unit” includes a first calibration gain multiplication unit 16 and a
second calibration gain multiplication unit 17.
[0018]
In the acoustic signal processing apparatus 10 illustrated in FIG. 1, the components other than the microphones m_1 and m_2 can be realized as software (an acoustic signal processing program) executed by a CPU, and the functions of that acoustic signal processing program can also be represented by FIG. 1.
[0019]
The microphones m_1 and m_2 are disposed a predetermined (or arbitrary) distance apart, and each captures the surrounding sound. Each acoustic signal captured by the microphones m_1 and m_2 is converted by an analog-to-digital (A/D) converter (not shown) into an input signal s1(n) or s2(n), which is supplied to the FFT unit 11, the calibration gain calculation unit 15, the first calibration gain multiplication unit 16, and the second calibration gain multiplication unit 17. Here, n is an index representing the input order of samples and is expressed as a positive integer; in the following, the smaller the value of n, the older the input sample, and the larger the value of n, the newer the input sample.
[0020]
The FFT unit 11 receives the input signals s1(n) and s2(n) from the microphones m_1 and m_2 and performs a fast Fourier transform (or discrete Fourier transform) on them, thereby converting the input signals s1(n) and s2(n) from the time domain to the frequency domain. When performing the fast Fourier transform, the FFT unit 11 is assumed to construct analysis frames FRAME1(K) and FRAME2(K), each consisting of N samples (N an arbitrary integer) taken from the input signals s1(n) and s2(n), respectively.
[0021]
Equation (1) illustrates an example in which FRAME1 is constructed from the input signal s1. In equation (1), K is an index indicating the order of frames and is expressed as a positive integer; the smaller the value of K, the older the analysis frame, and the larger the value of K, the newer the analysis frame. Further, in the following description of the operation, the index K represents the latest analysis frame to be analyzed, unless otherwise noted.
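Equation (1) itself is reproduced as an image in the original publication and does not survive in this translation. A plausible reconstruction, assuming the K-th analysis frame simply collects N consecutive samples of s1, is:

$$\mathrm{FRAME1}(K) = \left[\, s_1\big((K-1)N+1\big),\; s_1\big((K-1)N+2\big),\; \ldots,\; s_1(KN) \,\right] \quad (1)$$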
[0022]
The FFT unit 11 performs fast Fourier transform processing on each analysis frame to obtain a frequency domain signal X1(f, K) from the analysis frame FRAME1(K) constructed from the input signal s1, and a frequency domain signal X2(f, K) from the analysis frame FRAME2(K) constructed from the input signal s2. The FFT unit 11 supplies the frequency domain signals X1(f, K) and X2(f, K) to the front suppression signal generation unit 12 and to the coherence calculation unit 13.
[0023]
Here, f is an index representing a frequency. The frequency domain signals X1(f, K) and X2(f, K) are not single values but, as shown in equation (2), consist of the spectral components of m frequencies f1 to fm (m an arbitrary integer).
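Equation (2) is likewise an image not reproduced here; from the surrounding description, it presumably lists the spectrum as a vector of its m components:

$$X_1(f, K) = \left[\, X_1(f_1, K),\; X_1(f_2, K),\; \ldots,\; X_1(f_m, K) \,\right] \quad (2)$$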
[0024]
In the above equation (2), X1 (f, K) is a complex number, and is composed of a real part and an
imaginary part. The same applies to X2 (f, K) and the front suppression signal N (f, K) appearing
in the front suppression signal generator 12 described later.
[0025]
The front suppression signal generation unit 12 processes the signals from the FFT unit 11 so as to suppress, for each frequency component, the signal components arriving from the front direction. In other words, the front suppression signal generation unit 12 functions as a directional filter that suppresses components in the front direction.
[0026]
For example, as illustrated in FIG. 2, the front suppression signal generation unit 12 forms a figure-eight bidirectional filter having a blind spot in the front direction, and uses it to suppress the front-direction component of the signals from the FFT unit 11.
[0027]
Specifically, the front suppression signal generation unit 12 performs the calculation shown in equation (3) on the signals X1(f, K) and X2(f, K) from the FFT unit 11 to generate the front suppression signal N(f, K) for each frequency component.
The calculation of equation (3) corresponds to forming a figure-eight bidirectional filter having a blind spot in the front direction, as shown in FIG. 2.
N(f, K) = X1(f, K) − X2(f, K) … (3)
[0028]
In this way, the front suppression signal generation unit 12 obtains the front suppression signal for each of the frequency components f1 to fm (one frame's worth of each frequency band).
[0029]
Further, the front suppression signal generation unit 12 calculates, according to equation (4), an average front suppression signal AVE_N(K) obtained by averaging the front suppression signal N(f, K) over all the frequencies f1 to fm.
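A minimal Python sketch of equations (3) and (4) follows. Equation (3) is given in the text; the body of equation (4) is an image not reproduced in this translation, so the plain mean of magnitudes over the m frequency bins below is an assumption.

```python
import numpy as np

def front_suppression(X1, X2):
    """Equation (3): per-bin front suppression signal N(f, K).

    X1, X2: complex spectra of one analysis frame, shape (m,).
    Subtracting the two channels cancels the component common to both,
    i.e. the sound arriving from the front."""
    return X1 - X2

def average_front_suppression(N_fK):
    """Equation (4), assumed form: average the magnitude of N(f, K)
    over all frequencies f1..fm to obtain the scalar AVE_N(K)."""
    return float(np.mean(np.abs(N_fK)))
```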
[0030]
The coherence calculation unit 13 calculates a coherence COH(K) by forming signals having strong directivity in specific directions from the frequency domain signals X1(f, K) and X2(f, K) supplied by the FFT unit 11.
[0031]
Here, the process of calculating the coherence COH(K) in the coherence calculation unit 13 will be described.
[0032]
From the frequency domain signals X1(f, K) and X2(f, K), the coherence calculation unit 13 generates a signal B1(f, K) processed by a filter with strong directivity in a first direction (for example, the left direction), and a signal B2(f, K) processed by a filter with strong directivity in a second direction (for example, the right direction).
Existing methods can be applied to form the signals B1(f) and B2(f) having strong directivity in a specific direction; here, an example is shown in which equation (5) is applied to form the signal B1 with strong directivity in the first direction, and equation (6) is applied to form the signal B2 with strong directivity in the second direction.
[0033]
In equations (5) and (6), S represents the sampling frequency, N the FFT analysis frame length, τ the arrival time difference of the sound wave between the microphones m_1 and m_2, i the imaginary unit, and f the frequency.
[0034]
Next, the coherence calculation unit 13 applies equations (7) and (8) to the signals B1(f) and B2(f) obtained as described above to obtain the coherence COH(K). Here, B2(f, K)* in equation (7) is the complex conjugate of B2(f, K).
[0035]
coef(f, K) is assumed to be the coherence at an arbitrary frequency component f (any of the frequencies f1 to fm) of the frame with index K, constituting the analysis frames FRAME1(K) and FRAME2(K).
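The bodies of equations (5) through (8) are images not reproduced in this translation. The Python sketch below is therefore only a stand-in under stated assumptions: B1 and B2 are formed by a generic delay-and-subtract beamformer using the symbols defined for equations (5) and (6), and coef(f, K) uses one common normalized cross-spectrum form consistent with the conjugate B2(f, K)* mentioned for equation (7); the patent's exact expressions may differ.

```python
import numpy as np

def directional_signals(X1, X2, tau, S, Nfft):
    """Assumed stand-in for equations (5) and (6): steer a null to one side
    by delaying the opposite channel before subtracting.

    tau: sound-wave arrival time difference between m_1 and m_2 [s]
    S: sampling frequency [Hz], Nfft: FFT analysis frame length."""
    f = np.arange(len(X1)) * S / Nfft            # bin center frequencies
    steer = np.exp(-1j * 2.0 * np.pi * f * tau)  # inter-microphone delay term
    B1 = X1 - X2 * steer                         # strong directivity, first direction
    B2 = X2 - X1 * steer                         # strong directivity, second direction
    return B1, B2

def coherence(B1, B2):
    """Assumed stand-in for equations (7) and (8): per-bin coefficient
    coef(f, K) from B1 and the conjugate of B2, averaged over all bins."""
    coef = (B1 * np.conj(B2)) / (0.5 * (np.abs(B1) ** 2 + np.abs(B2) ** 2) + 1e-12)
    return float(np.mean(np.abs(coef)))          # COH(K), in [0, 1]
```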
[0036]
When determining coef(f, K), as long as the directivity of the signal B1(f) and the directivity of the signal B2(f) differ from each other, the directivity directions of B1(f) and B2(f) may be any directions other than the front direction.
Further, the method of calculating coef(f, K) is not limited to the calculation described above; for example, the calculation method described in Patent Document 1 can be applied.
[0037]
The correlation calculation unit 14 acquires the average front suppression signal AVE_N(K) from the front suppression signal generation unit 12 and the coherence COH(K) from the coherence calculation unit 13, and calculates the correlation coefficient cor(K) between the average front suppression signal AVE_N(K) and the coherence COH(K).
[0038]
The significance of the correlation calculation unit 14 calculating the correlation coefficient
between the front suppression signal (average front suppression signal) having directivity in
directions other than the front direction and the coherence will be described.
[0039]
Here, it is assumed that a sound source emitting the target sound is present in the front direction of the microphones m_1 and m_2, and that the disturbance sound arrives from a direction other than the front (for example, from the left or right of the microphones m_1 and m_2).
[0040]
For example, in the case where "no disturbance sound is present" and "the target sound is present", the front suppression signal has a value proportional to the magnitude of the target sound component. However, since the gain in the front direction is smaller than the gain in the lateral directions as shown in FIG. 2, this value is smaller than when an interference sound is present.
[0041]
Further, the coherence COH(K) is a feature deeply related to the arrival direction of the input signal, and can be restated as the correlation between two signal components: equation (7) calculates the correlation for a given frequency component, and equation (8) calculates the average of the correlation values over all frequency components.
Therefore, when the coherence COH(K) is small, the correlation between the two signal components is small; conversely, when the coherence COH(K) is large, the correlation between the two signal components is large.
An input signal for which the coherence COH(K) is small can be said to be a signal whose arrival direction is strongly biased to either the right or the left, that is, one arriving from a direction other than the front.
On the other hand, an input signal for which the coherence COH(K) is large has little bias in arrival direction and can be said to be a signal that has arrived from the front direction.
[0042]
Accordingly, the coherence COH(K) takes a large value when no interference sound is present and the target sound exists, and takes a small value when an interference sound is present together with the target sound.
[0043]
Organizing the above behavior by the presence or absence of the disturbance sound gives the following relationships.
・When there is no interference sound and the target sound exists, the coherence COH(K) takes a large value, and the front suppression signal takes a value proportional to the magnitude of the target sound component (a small value).
・When there is an interference sound, the coherence COH(K) takes a small value, and the front suppression signal takes a large value.
[0044]
If the correlation coefficient between the front suppression signal and the coherence COH(K) is introduced for the behavior described above, the following can be said.
・When there is no interference sound, the correlation coefficient takes a positive value.
・When there is an interference sound, the correlation coefficient takes a negative value.
Therefore, the presence or absence of the interference sound can be determined simply by observing the sign of the correlation coefficient between the front suppression signal and the coherence. Using this behavior, when the value of the correlation coefficient between the front suppression signal and the coherence is positive, the section can be judged to consist only of the target sound from the front direction, so the calibration gain for the sensitivity difference of the microphones m_1 and m_2 can be calculated without being affected by the disturbing sound. In addition, since the target voice section can be detected simply by observing the sign of the correlation coefficient, threshold setting becomes easy, unlike in the prior art.
[0045]
Hereinafter, the process of calculating the correlation coefficient between the front suppression
signal and the coherence in the correlation calculation unit 14 will be described in detail with
reference to the drawings.
[0046]
FIG. 3 is a block diagram showing the configuration of the correlation calculation unit 14
according to the embodiment.
[0047]
In FIG. 3, the correlation calculation unit 14 includes a front suppression signal / coherence
acquisition unit 31, a correlation coefficient calculation unit 32, and a correlation coefficient
output unit 33.
[0048]
The front suppression signal / coherence acquisition unit 31 acquires the average front suppression signal AVE_N(K) and the coherence COH(K), and the correlation coefficient calculation unit 32 calculates the correlation coefficient cor(K) between the average front suppression signal AVE_N(K) and the coherence COH(K).
Then, the correlation coefficient output unit 33 outputs the calculated correlation coefficient cor(K) to the calibration gain calculation unit 15.
[0049]
Here, the calculation method of the correlation coefficient cor(K) is not limited; for example, the calculation method described in Non-Patent Document 1 can be applied.
For example, the correlation coefficient cor(K) is determined for each frame using equation (9), in which Cov[AVE_N(K), COH(K)] denotes the covariance of the average front suppression signal AVE_N(K) and the coherence COH(K), σ_AVE_N(K) denotes the standard deviation of the average front suppression signal AVE_N(K), and σ_COH(K) denotes the standard deviation of the coherence COH(K):
cor(K) = Cov[AVE_N(K), COH(K)] / (σ_AVE_N(K) · σ_COH(K)) … (9)
The correlation coefficient cor(K) thus obtained takes a value from −1.0 to 1.0.
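As a concrete illustration of equation (9), a short Python sketch follows. The patent computes cor(K) per frame; the finite history buffer over which the covariance and standard deviations are taken is an assumption, since that detail is not spelled out in this passage.

```python
import numpy as np

def correlation_coefficient(ave_n_hist, coh_hist):
    """Equation (9): Pearson correlation between the recent histories of
    AVE_N(K) and COH(K); the result lies in [-1.0, 1.0].

    ave_n_hist, coh_hist: 1-D sequences of the same length (recent frames)."""
    a = np.asarray(ave_n_hist, dtype=float)
    c = np.asarray(coh_hist, dtype=float)
    cov = np.mean((a - a.mean()) * (c - c.mean()))  # Cov[AVE_N(K), COH(K)]
    denom = a.std() * c.std() + 1e-12               # sigma_AVE_N(K) * sigma_COH(K)
    return float(cov / denom)
```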
[0050]
The calibration gain calculation unit 15 acquires the correlation coefficient cor(K) from the correlation calculation unit 14, observes its sign, and calculates the calibration gains of the microphones m_1 and m_2 using only the input signal in sections where the correlation coefficient cor(K) is positive.
[0051]
FIG. 4 is a block diagram showing the configuration of the calibration gain calculator 15
according to the embodiment.
[0052]
In FIG. 4, the calibration gain calculation unit 15 includes a correlation coefficient and input signal acquisition unit 41, a calibration gain calculation execution determination unit 42, a calibration gain calculation unit 43, a calibration gain storage unit 44, and a calibration gain output unit 45.
[0053]
The correlation coefficient and input signal acquisition unit 41 acquires the correlation coefficient cor(K) from the correlation calculation unit 14, together with the analysis frames FRAME1(K) and FRAME2(K) of the input signals.
[0054]
The calibration gain calculation execution determination unit 42 determines whether the value of the correlation coefficient cor(K) is positive or negative in order to decide whether to calculate the calibration gain.
That is, when the value of the correlation coefficient cor(K) is positive, the calibration gain calculation execution determination unit 42 determines that the input signal is in a target sound section containing no disturbance sound, and that the calculation of the calibration gain should be executed for this section.
On the other hand, when the value of the correlation coefficient cor(K) is negative, the calibration gain calculation execution determination unit 42 determines that the input signal contains a disturbance sound in this section, and that the calculation of the calibration gain should not be executed.
[0055]
The calibration gain calculation unit 43 calculates the calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH for the sensitivity difference between the microphones m_1 and m_2, according to the determination result of the calibration gain calculation execution determination unit 42.
[0056]
When the calibration gain calculation execution determination unit 42 determines that the correlation coefficient cor(K) is positive, the calibration gain calculation unit 43 calculates the calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH.
On the other hand, when the calibration gain calculation execution determination unit 42 determines that the correlation coefficient cor(K) is negative, the calibration gain calculation unit 43 does not calculate the calibration gains, and instead sets the values stored in the calibration gain storage unit 44 as the calibration gains.
[0057]
Here, a method of calculating the calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH by
the calibration gain calculator 43 will be described.
[0058]
The calibration gain calculation unit 43 uses equations (10,1), (10,2), (11), (12,1), and (12,2) to calculate the calibration gain CALIB_GAIN_1CH for the input signal s1 and the calibration gain CALIB_GAIN_2CH for the input signal s2.
[0059]
Equation (10,1) calculates the average LEVEL_1CH of the absolute values of all the components of the current frame (the K-th frame) of the input signal s1(n) captured by the microphone m_1; the value LEVEL_1CH can be regarded as reflecting the sensitivity of the microphone m_1.
Equation (10,2) likewise calculates the average LEVEL_2CH of the absolute values of all the components of the current frame (the K-th frame) of the input signal s2(n) captured by the microphone m_2; the value LEVEL_2CH can be regarded as reflecting the sensitivity of the microphone m_2.
[0060]
Note that, for example, the total of the absolute values of the components of each frame over a predetermined number of frames may instead be used as the values LEVEL_1CH and LEVEL_2CH reflecting the microphone sensitivity.
Also, for example, the average of the absolute values of all elements (signal components) constituting the latest P (P ≤ K) frames for which the correlation coefficient cor(K) is positive may be used as the values LEVEL_1CH and LEVEL_2CH reflecting the microphone sensitivity.
In the latter case, by storing the sum of the absolute values of the components of the latest P−1 frames for which the correlation coefficient cor(K) was positive, the values LEVEL_1CH and LEVEL_2CH reflecting the microphone sensitivity can easily be calculated once the information of FRAME1(K) and FRAME2(K) of the current frame (the K-th frame) is given.
In this way, by calculating the average or total of the absolute values of the signal components over a long period, values reflecting the microphone sensitivity can be calculated while suppressing the influence of instantaneous fluctuations of the input signal.
[0061]
Equations (10,1) and (10,2) are one example of formulas for calculating values reflecting the microphone sensitivity, and various other formulas can be applied as described above. However, the formula used to calculate the value LEVEL_1CH reflecting the sensitivity of the microphone m_1 and the formula used to calculate the value LEVEL_2CH reflecting the sensitivity of the microphone m_2 must be the same.
[0062]
Equation (11) calculates the average AVE_LEVEL of the sensitivities LEVEL_1CH and LEVEL_2CH of the two microphones m_1 and m_2 as the target sensitivity of the microphones m_1 and m_2 after calibration. Alternatively, the larger or smaller of the sensitivities LEVEL_1CH and LEVEL_2CH may be set as the target sensitivity.
[0063]
Equation (12,1) defines the calibration gain CALIB_GAIN_1CH so that the sensitivity LEVEL_1CH of the microphone m_1 multiplied by the calibration gain CALIB_GAIN_1CH equals the target sensitivity AVE_LEVEL, as can be seen by transposing the denominator LEVEL_1CH on the right-hand side to the left-hand side. Similarly, equation (12,2) defines the calibration gain CALIB_GAIN_2CH so that the sensitivity LEVEL_2CH of the microphone m_2 multiplied by the calibration gain CALIB_GAIN_2CH equals the target sensitivity AVE_LEVEL, as can be seen by transposing the denominator LEVEL_2CH on the right-hand side to the left-hand side.
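The following Python sketch restates equations (10,1) through (12,2) as described in the text: the per-channel mean absolute level as a sensitivity proxy, the average of the two levels as the target sensitivity of equation (11), and the ratios of equations (12,1) and (12,2) as the gains.

```python
import numpy as np

def calibration_gains(frame1, frame2):
    """frame1, frame2: samples of FRAME1(K) and FRAME2(K) for the current frame."""
    level_1ch = np.mean(np.abs(frame1))         # (10,1): sensitivity proxy for m_1
    level_2ch = np.mean(np.abs(frame2))         # (10,2): sensitivity proxy for m_2
    ave_level = 0.5 * (level_1ch + level_2ch)   # (11): target sensitivity
    calib_gain_1ch = ave_level / level_1ch      # (12,1): LEVEL_1CH * gain = AVE_LEVEL
    calib_gain_2ch = ave_level / level_2ch      # (12,2): LEVEL_2CH * gain = AVE_LEVEL
    return calib_gain_1ch, calib_gain_2ch
```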
[0064]
The calibration gain storage unit 44 stores the calibration gains CALIB_GAIN_1CH (= INIT_GAIN_1CH) and CALIB_GAIN_2CH (= INIT_GAIN_2CH) that are applied when the calibration gain calculation unit 43 does not calculate calibration gains. As these calibration gains INIT_GAIN_1CH and INIT_GAIN_2CH, the value 1.0, which performs no calibration, may be applied, or the latest values calculated by the calibration gain calculation unit 43 may be applied.
[0065]
The calibration gain output unit 45 supplies the calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH calculated by the calibration gain calculation unit 43, or the calibration gains INIT_GAIN_1CH and INIT_GAIN_2CH read from the calibration gain storage unit 44, to the corresponding calibration gain multiplication units 16 and 17, respectively.
[0066]
The first calibration gain multiplication unit 16 outputs a post-calibration signal y1 (n) obtained
by multiplying the input signal s1 (n) from the microphone m_1 by the calibration gain
CALIB_GAIN_1CH.
[0067]
The second calibration gain multiplication unit 17 outputs a post-calibration signal y2 (n)
obtained by multiplying the input signal s2 (n) from the microphone m_2 by the calibration gain
CALIB_GAIN_2CH.
[0068]
(A-2) Operation of Embodiment Next, the overall processing operation and the calibration gain calculation processing in the acoustic signal processing apparatus 10 according to the embodiment will be described in detail with reference to the drawings.
[0069]
Input signals s1 (n) and s2 (n) for one frame are input from the microphones m_1 and m_2 to the
FFT unit 11 via AD converters (not shown).
[0070]
The FFT unit 11 performs a Fourier transform on the analysis frames FRAME1(K) and FRAME2(K) constructed from one frame's worth of the input signals s1(n) and s2(n), and obtains the frequency domain signals X1(f, K) and X2(f, K).
The signals X1(f, K) and X2(f, K) generated by the FFT unit 11 are supplied to the front suppression signal generation unit 12 and the coherence calculation unit 13.
[0071]
The front suppression signal generation unit 12 calculates the front suppression signal N(f, K), which has directivity in directions other than the front, based on the signals X1(f, K) and X2(f, K).
The front suppression signal generation unit 12 then generates the average front suppression signal AVE_N(K) by averaging the front suppression signals N(f, K) over all frequencies, and supplies this average front suppression signal AVE_N(K) to the correlation calculation unit 14.
[0072]
Meanwhile, the coherence calculation unit 13 calculates the coherence COH(K) based on the signals X1(f, K) and X2(f, K), and supplies the coherence COH(K) to the correlation calculation unit 14.
[0073]
The correlation calculation unit 14 obtains the average front suppression signal AVE_N(K) and the coherence COH(K), calculates the correlation coefficient cor(K) between the average front suppression signal AVE_N(K) and the coherence COH(K), and supplies the correlation coefficient cor(K) to the calibration gain calculation unit 15.
[0074]
The calibration gain calculation unit 15 acquires the correlation coefficient cor(K), observes its sign, and, according to the determination result, calculates a calibration gain for each of the signals s1(n) and s2(n).
The calibration gain calculation unit 15 then outputs the calibration gain CALIB_GAIN_1CH for the signal s1(n) to the first calibration gain multiplication unit 16, and the calibration gain CALIB_GAIN_2CH for the signal s2(n) to the second calibration gain multiplication unit 17.
[0075]
FIG. 5 is a flowchart showing the processing operation in the calibration gain calculation unit 15.
[0076]
The correlation coefficient and input signal acquisition unit 41 acquires the correlation coefficient cor(K) from the correlation calculation unit 14, together with the analysis frames FRAME1(K) and FRAME2(K) of the input signals s1(n) and s2(n) (S51).
[0077]
Then, the calibration gain calculation execution determination unit 42 determines whether the
value of the correlation coefficient cor (K) is positive or negative (S52).
[0078]
When the correlation coefficient cor(K) is positive, there is no interference sound arriving from directions other than the front, and the section is regarded as a target sound section from the front direction; the calibration gain calculation unit 43 uses FRAME1(K) and FRAME2(K) to calculate the calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH for the signals s1(n) and s2(n) according to equations (10,1), (10,2), (11), (12,1), and (12,2) (S53).
At this time, the calibration gain calculation unit 43 stores the calculated calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH in the calibration gain storage unit 44, updating the calibration gains stored there.
[0079]
When the correlation coefficient cor(K) is negative, an interference sound arriving from a direction other than the front is considered to be present, and the calibration gain calculation unit 43 sets the values stored in the calibration gain storage unit 44 as the calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH (S54).
[0080]
That is, when the initial values INIT_GAIN_1CH and INIT_GAIN_2CH of the calibration gains are stored in the calibration gain storage unit 44, INIT_GAIN_1CH becomes CALIB_GAIN_1CH and INIT_GAIN_2CH becomes CALIB_GAIN_2CH.
Alternatively, when the latest calibration gains are stored in the calibration gain storage unit 44, those latest calibration gains are set as the present calibration gains CALIB_GAIN_1CH and CALIB_GAIN_2CH.
[0081]
Then, the calibration gain output unit 45 outputs the calibration gain CALIB_GAIN_1CH to the first calibration gain multiplication unit 16 and the calibration gain CALIB_GAIN_2CH to the second calibration gain multiplication unit 17 (S55).
The calibration gain calculation unit 15 then updates the index K (S56), returns to S51, and performs the calibration gain calculation processing for the next index.
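Tying the flowchart together, a minimal sketch of the S52-S55 branch follows, reusing the hypothetical calibration_gains helper from the sketch above; the store dict stands in for the calibration gain storage unit 44.

```python
def update_calibration_gains(cor_k, frame1, frame2, store):
    """One pass of S52-S55 for the current index K.

    store: dict with keys "gain_1ch"/"gain_2ch", pre-loaded with
    INIT_GAIN_1CH/INIT_GAIN_2CH (e.g. 1.0) before the first frame."""
    if cor_k > 0.0:
        # S53: target-sound-only section -> compute and update the stored gains
        store["gain_1ch"], store["gain_2ch"] = calibration_gains(frame1, frame2)
    # S54: cor(K) negative -> keep the stored (initial or latest) gains
    return store["gain_1ch"], store["gain_2ch"]   # S55: to multipliers 16 and 17
```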
[0082]
Here, since the calibration gain scarcely changes once the calibration gain calculation unit 15 has calculated it, continuing to update the calibration gain regularly wastes computation, so updating of the calibration gain may be stopped partway through. That is, in an environment in which the acoustic signal processing apparatus 10 having the microphones m_1 and m_2 is used, there is no need to update the calibration gains regularly after the calibration gains for the microphones m_1 and m_2 have been acquired in the initial stage; the calibration gain calculation may be performed only when necessary.
[0083]
Then, the first calibration gain multiplication unit 16 multiplies the signal s1(n) by the calibration gain CALIB_GAIN_1CH and outputs the calibrated signal y1(n), and the second calibration gain multiplication unit 17 multiplies the signal s2(n) by the calibration gain CALIB_GAIN_2CH and outputs the calibrated signal y2(n).
[0084]
(A-3) Effects of the Embodiment As described above, according to this embodiment, the correlation coefficient between the front suppression signal and the coherence COH is negative when an interference sound arrives from a direction other than the front, and positive when no disturbing sound is present. By exploiting this characteristic behavior, a microphone sensitivity calibration method can be realized that is unaffected by disturbing voices and that makes threshold setting easy for the designer.
[0085]
As a result, by applying the calibration gain calculation process for the microphones m_1 and m_2 as preprocessing to various signal processing methods that use the microphone array, an improvement in the subsequent voice processing performance can be expected.
[0086]
(B) Other Embodiments Although various modifications were mentioned in the embodiment described above, the present invention can also be applied in the following modified embodiments.
[0087]
(B-1) In the embodiment described above, the correlation calculation unit calculates the correlation coefficient as the feature amount of the front suppression signal and the coherence; however, the same effect as in the embodiment can also be obtained by calculating the covariance as the feature amount of the front suppression signal and the coherence.
[0088]
(B-2) The acoustic signal processing device according to the present invention can be applied to any device that has a voice processing function (for example, voice recognition processing) and is provided with a plurality of microphones. For example, the present invention can be widely applied to smartphones, tablet terminals, video conference terminals, car navigation systems, call center terminals, robots, devices that use acoustic signals as sensor signals, and the like.
[0089]
Also, for example, the acoustic signal processing apparatus of the present invention may be
mounted on an apparatus having a communication function, and the apparatus may transmit a
signal after calibration to a server having a predetermined audio processing function through a
network.
[0090]
Furthermore, for example, an apparatus having a communication function including a plurality of
microphones may transmit input signals of the respective microphones to a server on which the
acoustic signal processing device of the present invention is mounted through a network.
In this case, the server equipped with the acoustic signal processing apparatus can calculate the
calibration gain for each input signal according to the correlation coefficient between the front
suppression signal and the coherence, as in the above-described embodiment.
[0091]
(B-3) Although the case where there are two microphones is illustrated in the embodiment
described above, the present invention can be applied to a device that acquires an input signal
from each of three or more microphones.
[0092]
Reference Signs List: 10 ... acoustic signal processing apparatus, m_1, m_2 ... microphones, 11 ... FFT unit, 12 ... front suppression signal generation unit, 13 ... coherence calculation unit, 14 ... correlation calculation unit, 31 ... front suppression signal / coherence acquisition unit, 32 ... correlation coefficient calculation unit, 33 ... correlation coefficient output unit, 15 ... calibration gain calculation unit, 41 ... correlation coefficient and input signal acquisition unit, 42 ... calibration gain calculation execution determination unit, 43 ... calibration gain calculation unit, 44 ... calibration gain storage unit, 45 ... calibration gain output unit, 16 ... first calibration gain multiplication unit, 17 ... second calibration gain multiplication unit.