DESCRIPTION JP2002062895

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
[0001]
TECHNICAL FIELD The present invention relates to a sound collection method and apparatus for use in
voice recognition, hands-free telephones, television cameras, teleconferences, remote lectures,
abnormal sound monitoring, and the like. Signals received by a plurality of microphones are
filtered and output so that noise and frequency degradation are reduced and the sound emitted from
a target sound source is output with high quality.
[0002]
DESCRIPTION OF THE RELATED ART First, the meaning of high-quality sound collection will be
described.
[0003]
The signal received by a microphone contains, in addition to the sound generated by the target
sound source (the target sound), noise such as air-conditioning sound, fan noise from electrical
equipment, and electrical noise generated by the microphone amplifier, the signal cable, and so
on.
In addition, the target sound component undergoes frequency degradation in the process of sound
collection. The smaller this degradation, the closer the waveform of the picked-up sound is to that
of the target sound; in other words, the smaller the frequency degradation of the
10-04-2019
target sound component, the higher the quality. Therefore, high-quality sound collection means
sound collection with a high signal-to-noise ratio (the power ratio of the target signal to the
noise) and with small frequency degradation of the target sound component.
[0004]
Next, an adaptive array using a single virtual target sound source will be described.
[0005]
An adaptive array is a method in which the signals picked up by a plurality of microphones
(a microphone array) are individually filtered, summed, and output. By adaptively updating the
filter coefficients according to noise characteristics such as noise intensity, position, and
frequency, noise can be suppressed and the target sound can be picked up with high quality.
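The filter-and-sum structure described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function name `filter_and_sum`, the array shapes, and the use of NumPy are assumptions made for the example.

```python
import numpy as np

def filter_and_sum(signals, coeffs):
    """Filter each microphone signal with its own FIR filter and sum the results.

    signals: array of shape (M, N), one row per microphone (assumed layout).
    coeffs:  array of shape (M, L), one row of FIR taps per microphone.
    Returns the length-N array output.
    """
    signals = np.asarray(signals, dtype=float)
    coeffs = np.asarray(coeffs, dtype=float)
    n_samples = signals.shape[1]
    out = np.zeros(n_samples)
    for x_i, h_i in zip(signals, coeffs):
        # Causal FIR filtering: keep the first N samples of the full convolution.
        out += np.convolve(x_i, h_i)[:n_samples]
    return out
```

In an adaptive array the rows of `coeffs` would be updated over time; here they are held fixed to show only the signal path.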
[0006]
A conventional adaptive array using a single virtual target sound source uses the noise actually
picked up together with a virtual target signal, which virtually synthesizes the sound that would
arrive at the microphones from a single virtual target sound source position set in advance. Using
these signals, the filter coefficients are updated so that the sensitivity of the microphone array
to noise is low and its sensitivity to the virtual target sound source position is high. The sound
of a sound source located at the virtual target sound source position can thereby be picked up
with high quality.
[0007]
However, the actual target sound source can be expected to be at a position shifted from the
virtual target sound source position, or to move. For example, if the target sound source is a
person, the person moves constantly and does not speak from the same position every time. When the
actual target sound source deviates from the virtual target sound source position in this way, the
prior art cannot correct the deviation of the virtual target sound source position with respect to
the actual target sound source. This may make the sound hard to hear, or make speech recognition
and abnormal sound detection difficult.
[0008]
Next, the prior art will be described in detail.
[0009]
FIG. 12 is a diagram showing a conventional sound collection device CS11.
[0010]
The conventional sound collection apparatus CS11 comprises microphones 111 to 11M, adders 121 to
12M, 14A, 14B, and 15 (the + symbol represents addition, and the − symbol represents
subtraction), second variable filters 13A1 to 13AM, first variable filters 13B1 to 13BM, an
adaptive algorithm unit 16, a signal generator 17C, a delay unit 19C, a virtual sound source
position setting unit 26C, a space characteristic estimation unit 27C, space characteristic
filters 18C1 to 18CM, and an adaptation period detection unit 20.
[0011]
Next, the symbols of the formulas used below will be defined.
[0012]
Let $n$ be the time discretized by the sampling period, and let $x_i(n)$ be the signal picked up
by the $i$-th microphone 11i at time $n$. Extracting $L$ samples (the samples required by the
filter) for each microphone and stacking them in a vector gives
$\mathbf{x}(n) = [x_1(n), x_1(n-1), \ldots, x_1(n-L+1), x_2(n), \ldots, x_M(n-L+1)]^T$.
[0013]
The output signal of the signal generator 17C is denoted by $v'(n)$, the spatial characteristic
filter for the $i$-th microphone 11i by $g'_i(n)$, and the spatial characteristic filter output by
$u'_i(n) = g'_i(n) * v'(n)$. Extracting $L$ samples and stacking them in a vector gives
$\mathbf{u}'(n) = [u'_1(n), u'_1(n-1), \ldots, u'_1(n-L+1), u'_2(n), \ldots, u'_M(n-L+1)]^T$.
[0014]
Where * is a convolution operation.
The second variable filters 13A1 to 13AM and the first variable filters 13B1 to 13BM are L-tap
FIR filters (filters that multiply each sample by a constant and sum the results), and the filter
coefficient vector is
$\mathbf{h}(n) = [h_1(n), h_1(n-1), \ldots, h_1(n-L+1), h_2(n), \ldots, h_M(n-L+1)]^T$.
[0015]
Here, $h_i(n-p+1)$ represents the filter coefficient of the $p$-th tap of the filter for the
$i$-th microphone at time $n$, and the second variable filters and the first variable filters use
the same filter coefficients.
Further, the output of the adder 14A is denoted by $y'(n)$, the output of the adder 14B by
$y(n)$, the output (error) of the adder 15 by $e(n)$, and the delay amount of the delay unit 19C
by $\tau'_0$.
[0016]
Next, the convergence solution and the correction equation of the filter in the above-mentioned
conventional example are derived.
[0017]
First, the mean square of the output (error) e(n) of the adder 15 is determined.
The smaller this mean square error, the smaller the noise power at the output of the adder 14A
and the smaller the frequency degradation of the virtual target sound at the output of the adder
14A. The filter that minimizes this mean square error is therefore taken to be the optimum filter.
[0018]
Here, the overline denotes time averaging.
[0019]
Assuming that the noise and the virtual target sound are uncorrelated, the above equation (1) can
be transformed into the following equation (2).
[0020]
Assuming that the first variable filter h(n) is an L-tap FIR filter (a filter that multiplies each
sample by a constant and sums the results), equation (2) can be written in vector form as the
following equation (3).
[0021]
Where the virtual target signal v '(n) is the average power
[0022]
It is assumed that the signal is a stationary signal of
[0023]
It is the following.
[0024]
Since the filter that minimizes the above equation (3) is the optimum filter, equation (3) is
partially differentiated with respect to h(n) and the result is set to 0 to find the minimum
point.
[0025]
If equation (4) is solved for h(n), the optimum filter h(opt, n) that minimizes equation (3) is
obtained.
[0026]
As a method of obtaining the optimum filter of the above equation (5), there are adaptive
algorithms such as LMS algorithm, NLMS algorithm, projection algorithm and the like.
Here, the correction equation is shown taking the NLMS algorithm as an example.
[0027]
The correction equation is expressed by the following equation (6).
[0028]
$\mathbf{h}(n+1) = \mathbf{h}(n) + 2\alpha \, \dfrac{\mathbf{x}''(n)\, e(n)}{\mathbf{x}''^{T}(n)\, \mathbf{x}''(n)}$ ... Equation (6)
where $\mathbf{x}''(n)$ is expressed by the following equation (7).
[0029]
$\mathbf{x}''(n) = \mathbf{u}'(n) + \mathbf{x}(n)$ ... Equation (7)
where α is an update coefficient, a constant greater than 0 and not more than 1.
[0030]
Above, it has been shown that the optimum filter of equation (5) can be determined using the
correction equation of equation (6).
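A single step of the NLMS correction in equation (6) can be sketched as below. This is an illustrative sketch under stated assumptions: the function name `nlms_step`, the sign convention e(n) = desired − y'(n), the scalar inner-product normalization, and the small constant `eps` that guards against division by zero are not taken from the text.

```python
import numpy as np

def nlms_step(h, x_learn, desired, alpha=0.5, eps=1e-12):
    """One NLMS update of the stacked filter vector h.

    h:       current filter coefficients, length M*L.
    x_learn: stacked learning signal vector (virtual target plus collected noise).
    desired: delayed virtual target sample (assumed reference signal).
    alpha:   update coefficient, 0 < alpha <= 1.
    Returns (updated h, error e).
    """
    h = np.asarray(h, dtype=float)
    x_learn = np.asarray(x_learn, dtype=float)
    y = h @ x_learn                 # filter output for the learning signal
    e = desired - y                 # error signal (assumed sign convention)
    norm = x_learn @ x_learn + eps  # squared norm; eps is an added safeguard
    return h + 2.0 * alpha * x_learn * e / norm, e
```

With `alpha=0.5` the factor 2α equals 1, so one step drives the error for the current input vector to approximately zero; smaller values trade convergence speed for smoother updates.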
[0031]
Next, the signal generator 17C will be described.
[0032]
The signal generator 17C is used so that the filter update includes the condition of maintaining
the sensitivity to the virtual target sound source position.
To maintain the sensitivity in all frequency bands, the signal output from the signal generator
17C needs to include all frequency components.
In addition, sequential correction algorithms converge quickly for a white signal (a signal that
contains all frequency components uniformly).
For these reasons, a signal generator that generates white noise is usually used.
[0033]
The adaptation period detection unit 20 has a function of stopping the adaptation operation
when there is an actual target sound.
That is, if the adaptive operation is performed when there is an actual target sound, the filter is
updated so as to reduce the sensitivity to the actual target sound, so it is necessary to stop the
filter update in this case.
The adaptation period detection unit 20 detects the presence of an actual target sound by
monitoring the power of the signal collected by the microphone, and stops the adaptation
operation.
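The power-monitoring idea can be illustrated with a simple threshold test. The text only states that the power of the collected signal is monitored; the comparison against a known noise power, the 6 dB margin, and the name `adaptation_allowed` are assumptions made for this sketch.

```python
import numpy as np

def adaptation_allowed(frames, noise_power, margin_db=6.0):
    """Decide whether the filters may adapt on the current block.

    frames:      array of shape (M, N) with the current block of microphone samples.
    noise_power: estimated mean power of noise-only input (assumed to be known).
    margin_db:   assumed margin above the noise power that indicates a target sound.
    Returns True when the block looks like noise only, so adaptation may run.
    """
    power = np.mean(np.asarray(frames, dtype=float) ** 2)
    threshold = noise_power * 10.0 ** (margin_db / 10.0)
    return bool(power <= threshold)
```

A block whose mean power exceeds the threshold is treated as containing the actual target sound, and the filter update would be skipped for that block.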
[0034]
As described above, the conventional sound collection apparatus CS11 uses the noise actually
collected and a virtual target signal that virtually synthesizes the sound arriving at the
microphones from a single virtual target sound source position set in advance. The filter
coefficients are updated so that the sensitivity of the microphone array to noise is low and its
sensitivity to the virtual target sound source position is high, in an attempt to pick up the
target sound with high quality.
[0035]
However, the position at which the sensitivity of the microphone array is high is only the virtual
target sound source position, not the actual target sound source position.
There is no problem if the two positions coincide exactly, but if the actual target sound source
position deviates from the virtual target sound source position, the frequency characteristics of
the target sound are degraded.
[0036]
In particular, the degradation is severe for high-frequency components (several kHz), which have
short wavelengths; a deviation of only a few centimeters can significantly degrade the
characteristics for the target sound.
[0037]
In the above-mentioned prior art, the position where high-quality sound can be collected is
limited to the virtual target sound source position, so the prior art is difficult to use for a
moving sound source (a person or the like) or when the sound source position cannot be known
accurately (as in abnormal sound monitoring).
[0038]
Thus, the conventional adaptive array using a single virtual target sound source has the problem
that, when the virtual target sound source position deviates from the actual sound source
position, the frequency characteristics of the target sound component are degraded, which makes
the array difficult to use for a moving sound source or when the position is not accurately known
(as in abnormal sound monitoring).
[0039]
An object of the present invention is to provide a sound collection method and apparatus that, in
an adaptive array, reduce the degradation of the frequency characteristics of the target sound
component that occurs when the target sound source moves or when the target sound source position
is not accurately known, and thereby realize high-quality sound collection.
[0040]
SUMMARY OF THE INVENTION The present invention concerns a sound pickup apparatus having first
variable filtering means for filtering, with mutually different filter coefficients, the sound
signals collected by a plurality of arbitrarily arranged sound collection means, and addition
means for adding the output signals of the first variable filtering means and outputting the sum.
Instead of setting the virtual target sound source position as a single point, a plurality of
virtual target sound source positions are set within a predetermined sound collection range,
thereby realizing a constraint condition that maintains the sensitivity within that range.
[0041]
BEST MODE FOR CARRYING OUT THE INVENTION FIG. 1 is a block diagram showing a sound
collection apparatus CS1 according to a first embodiment of the present invention.
[0042]
The sound pickup device CS1 comprises microphones 111 to 11M, first variable filters 13B1 to
13BM, second variable filters 13A1 to 13AM, spatial characteristic filters 181,1 to 18J,M,
signal generators 171 to 17J, delay units 191 to 19J, a sound collection range setting unit 30,
a virtual target sound source position setting unit 26, a space characteristic estimation unit 27,
an adaptation period detection unit 20, an adaptive algorithm unit 16, and adders 121 to 12M,
14A, 14B, 15, 211 to 21M, and 22.
[0043]
The sound collection device CS1 suppresses noise and collects the target sound with high quality:
it collects the sound of sound sources within a preset sound collection range and suppresses the
sound of sound sources outside that range.
[0044]
The signals collected by the microphones 111 to 11M are respectively filtered by the first
variable filters 13B1 to 13BM, and then added by the adder 14B and output.
[0045]
The first variable filters 13B1 to 13BM are learned, as described below, so that the sensitivity
is high with respect to the sound collection range set by the sound collection range setting unit
30 and low with respect to noise source positions outside that range.
The output of the adder 14B is therefore a high-quality sound with a high ratio of target sound to
noise (SN ratio).
[0046]
The sound collection device CS1 differs from the conventional example in that the virtual target
sound source position is given as a sound collection range. As a result, even when the target
sound source moves within the sound collection range or its position is not accurately known, the
target sound component can be collected stably without large frequency degradation.
[0047]
Next, the learning method of the first variable filters 13B1 to 13BM in the sound collection
device CS1 will be specifically described.
[0048]
The above "learning" is performed using the noise actually collected, a virtual collected signal
synthesized using virtual target sound sources prepared in advance, and the second variable
filters.
An actual target sound source is always observed as a signal mixed with noise, and the target
sound cannot be separated from the noise; virtual target sound sources that contain no noise are
therefore used.
[0049]
First, an operation of synthesizing a virtual collected signal using a virtual target sound source
will be described.
[0050]
The sound collection range setting unit 30 sets a sound collection range (the movement range of
the sound source, the range of the sound source position measurement error, and the like), and the
virtual target sound source position setting unit 26 sets virtual target sound source positions
uniformly within the set range, for example by filling the range at 5 cm intervals.
The spacing between virtual target sound source positions needs to be sufficiently narrow.
Specifically, consider the two microphones farthest from the sound source when the sound source is
at a certain virtual target sound source position. Let the first relative delay time be the
difference between the time at which the first microphone picks up the sound and the time at which
the second microphone picks it up, and let the second relative delay time be the same difference
when the sound source has moved to the adjacent virtual target sound source position. The interval
of virtual target sound source positions is set so that the variation in relative delay time (the
difference between the first relative delay time and the second relative delay time) is smaller
than the period of the maximum frequency of the collected signal.
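The spacing condition can be checked numerically for one pair of microphones and two neighbouring virtual positions. In this sketch the speed of sound (343 m/s), the function names, and the choice of which microphone pair to pass in are assumptions; the text only states the inequality between the delay variation and the period of the maximum frequency.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s; assumed value, not given in the text

def relative_delay(source, mic_a, mic_b):
    """Arrival-time difference between two microphones for one source position."""
    source, mic_a, mic_b = (np.asarray(p, dtype=float) for p in (source, mic_a, mic_b))
    return (np.linalg.norm(source - mic_a) - np.linalg.norm(source - mic_b)) / SPEED_OF_SOUND

def spacing_is_fine(source, neighbour, mic_a, mic_b, f_max):
    """True if moving to the neighbouring virtual position changes the relative
    delay by less than one period of the highest signal frequency f_max (Hz)."""
    variation = abs(relative_delay(source, mic_a, mic_b)
                    - relative_delay(neighbour, mic_a, mic_b))
    return bool(variation < 1.0 / f_max)
```

For microphones 1 m apart and a 4 kHz upper frequency, a 5 cm step at 1 m distance satisfies the condition, while a 1 m step does not.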
[0051]
The space characteristic estimation unit 27 estimates the spatial characteristics, including the
delay time and the attenuation amount until the sound reaches each microphone position from each
set virtual target sound source position, and sets them as the coefficients of the spatial
characteristic filters 181,1 to 18J,M.
[0052]
The mutually uncorrelated, stationary signals generated by the signal generators 171 to 17J are
filtered by the spatial characteristic filters 181,1 to 18J,M and added by the adders 211 to 21M
for each microphone.
[0053]
The space characteristic estimation unit 27 may also store in advance each virtual target sound
source position in association with the transfer functions from that position to each microphone,
and retrieve the transfer functions on the basis of the virtual target sound source position.
[0054]
As described above, by filtering the mutually uncorrelated, stationary signals generated by the
signal generators 171 to 17J with the spatial characteristic filters 181,1 to 18J,M, the
collected sound signals can be virtually synthesized.
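The synthesis of the virtual collected signals can be sketched as follows. Independent Gaussian white noise stands in for the mutually uncorrelated stationary signals (over a finite block they are only approximately uncorrelated), and the function name, the array layout `g[j, i]`, and the fixed random seed are assumptions made for the illustration.

```python
import numpy as np

def synthesize_virtual_pickup(g, n_samples, seed=0):
    """Synthesize the virtual collected signal at each microphone.

    g: array of shape (J, M, K); g[j, i] holds the taps of the spatial
       characteristic filter from virtual source j to microphone i.
    Returns (u, v): u has shape (M, n_samples), the per-microphone sum over all
    sources; v has shape (J, n_samples), the source signals themselves.
    """
    g = np.asarray(g, dtype=float)
    n_sources, n_mics, _ = g.shape
    rng = np.random.default_rng(seed)
    v = rng.standard_normal((n_sources, n_samples))  # one white signal per source
    u = np.zeros((n_mics, n_samples))
    for j in range(n_sources):
        for i in range(n_mics):
            # Spatial filtering of source j for microphone i, then summation
            # over sources (the role of the per-microphone adders).
            u[i] += np.convolve(v[j], g[j, i])[:n_samples]
    return u, v
```

Adding the actually collected noise to each row of `u` would then give the learning signals fed to the second variable filters.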
[0055]
Next, the adders 121 to 12M add the virtually synthesized collected signal and the actually
collected noise signal, the results are filtered by the second variable filters 13A1 to 13AM, and
the adder 14A sums the filter outputs.
The output of the adder 14A is the output for the virtually synthesized sound collection signal.
[0056]
If the noise component in the output for the virtually synthesized sound collection signal is
small and the degradation of the virtual target sound component is small, sound can be picked up
with high quality. The adder 15, serving as the subtraction means, therefore subtracts the output
signal of the adder 14A (the fourth addition means) from the original sound of the virtual target
sound (the output signal of the adder 22, the fifth addition means), and the second variable
filters 13A1 to 13AM are updated using the output of the adder 15 as an error signal.
[0057]
However, in order to allow for the delay from input to output and to enable efficient learning of
the second variable filters (the learning filters), the delay units 191 to 19J delay the original
sounds of the virtual target sounds, the adder 22 sums the delayed signals, and this sum is used
for the subtraction in the adder 15.
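The reference signal formed by the delay units and the adder 22 can be sketched as below. The function name `desired_signal`, the zero padding at the start of the block, and the integer-sample delay are assumptions made for the illustration.

```python
import numpy as np

def desired_signal(v, tau0):
    """Sum of the virtual source signals, each delayed by tau0 samples.

    v:    array of shape (J, N), one row per virtual target source.
    tau0: common delay in samples (non-negative integer).
    Returns a length-N array; the first tau0 samples are zero (assumed padding).
    """
    v = np.asarray(v, dtype=float)
    n_samples = v.shape[1]
    d = np.zeros(n_samples)
    if tau0 < n_samples:
        d[tau0:] = v[:, : n_samples - tau0].sum(axis=0)
    return d
```

Subtracting the summed output of the second variable filters from this reference would give the error signal used for the update.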
[0058]
Based on the error signal output from the adder 15 and the input signal (learning signal) to the
second variable filters 13A1 to 13AM, the adaptive algorithm section 16 determines the update
vector of the second variable filters so as to minimize the mean square of the error signal.
[0059]
The same filter coefficients as those of the second variable filters 13A1 to 13AM are set in the
first variable filters 13B1 to 13BM, so that the sound of a target sound source within the set
sound collection range is collected while noise is suppressed.
[0060]
On the other hand, when the actual target sound is included in the collected sound signals of the
microphones 111 to 11M, learning proceeds so as to lower the sensitivity to the actual target
sound source, so the filter update must be stopped while the actual target sound is present.
The adaptation period detection unit 20 detects the presence of the actual target sound by
monitoring the power of the signals collected by the microphones 111 to 11M, and stops the
adaptive operation of the first variable filters 13B1 to 13BM and the second variable filters
13A1 to 13AM.
[0061]
Next, the adaptive algorithm section 16 will be described in detail.
[0062]
As an adaptive algorithm, there are an LMS algorithm, an NLMS algorithm, a projection
algorithm, and the like.
In the present specification, taking the NLMS method as an example, the convergence solution of
the filter and the correction equation are derived below.
[0063]
First, symbols used in mathematical expressions will be described.
[0064]
Let $n$ be the time discretized by the sampling period, $M$ the number of microphones, $J$ the
number of virtual target sound sources, and $x_i(n)$ the signal collected by the $i$-th microphone
11i at time $n$. Taking $L$ samples for each microphone and stacking them in a vector gives
$\mathbf{x}(n) = [x_1(n), x_1(n-1), \ldots, x_1(n-L+1), x_2(n), \ldots, x_M(n-L+1)]^T$.
[0065]
The output signal of the $j$-th signal generator 17j is $v_j(n)$, the spatial characteristic
filter for the $j$-th signal generator 17j and the $i$-th microphone 11i is $g_{i,j}(n)$, and the
spatial characteristic filter output is $u_{i,j}(n) = g_{i,j}(n) * v_j(n)$. Taking $L$ samples
(the samples required by the filter) and stacking them in a vector gives
$\mathbf{u}_j(n) = [u_{1,j}(n), u_{1,j}(n-1), \ldots, u_{1,j}(n-L+1), u_{2,j}(n), \ldots, u_{M,j}(n-L+1)]^T$.
Here, * represents the convolution operation.
[0066]
The second variable filters 13A1 to 13AM and the first variable filters 13B1 to 13BM are L-tap
FIR filters, and the filter coefficient vector is
$\mathbf{h}(n) = [h_1(n), h_1(n-1), \ldots, h_1(n-L+1), h_2(n), \ldots, h_M(n-L+1)]^T$.
Here, $h_i(n-p+1)$ represents the filter coefficient of the $p$-th tap of the filter for the
$i$-th microphone at time $n$, and the same filter coefficients are used for the second variable
filters 13A1 to 13AM and the first variable filters 13B1 to 13BM.
[0067]
The output of the adder 14A is $y'(n)$, the output of the adder 14B is $y(n)$, and the output
(error) of the adder 15 is $e(n)$. The delay amount of each of the delay units 191 to 19J is
$\tau_0$ (usually half the tap length of the second variable filter), and it is the same for all
delay units.
[0068]
First, the mean square of the output (error) e(n) of the adder 15 is determined.
The filter that minimizes this mean square error is the optimum filter.
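As a hedged sketch of the quantity being minimized, the error can be written from the signal path described for the adders 14A, 22, and 15. The exact form below is an assumption derived from that description, with $\mathbf{x}'(n)$ denoting the stacked learning signal fed to the second variable filters; it is not the original formula.

```latex
e(n) = \sum_{j=1}^{J} v_j(n - \tau_0) - y'(n), \qquad
y'(n) = \mathbf{h}^{T}(n)\,\mathbf{x}'(n), \qquad
\overline{e^{2}(n)} \;\rightarrow\; \min_{\mathbf{h}}
```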
[0069]
Here, the overline in equation (8) denotes time averaging.
Since the virtual target signals $v_j(n)$ are uncorrelated with one another and the virtual target
signals are uncorrelated with the noise, equation (8) can be transformed into the following
equation (9).
[0070]
Assuming that the first variable filter h(n) is an L-tap FIR filter (a filter that multiplies each
sample by a constant and sums the results), equation (9) can be written in vector form as the
following equation (10).
[0071]
Where virtual target signal V j (n) is the average power
[0072]
Assuming that the signal is a stationary signal of
[0073]
Suppose that the following equation is satisfied.
[0074]
Since the filter that minimizes equation (10) is the optimum filter, equation (10) is partially
differentiated with respect to h(n) and the result is set to 0 to find the minimum point.
[0075]
If the above equation (11) is solved for h(n), the optimum filter h(opt, n) that minimizes the
above equation (10) is obtained.
[0076]
As a method of obtaining the optimum filter of the above equation (12), there are adaptive
algorithms such as LMS algorithm, NLMS algorithm, projection algorithm and the like.
[0077]
In the present specification, the NLMS algorithm will be described as an example, and the
correction equation is expressed by the following equation (13).
[0078]
$\mathbf{h}(n+1) = \mathbf{h}(n) + 2\alpha \, \dfrac{\mathbf{x}'(n)\, e(n)}{\mathbf{x}'^{T}(n)\, \mathbf{x}'(n)}$ ... Equation (13)
where $\mathbf{x}'(n)$ is expressed by the following equation (14).
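By analogy with equation (7), and because the adders 211 to 21M sum the outputs of the J spatial characteristic filters for each microphone before the adders 121 to 12M add the collected signal, a plausible form of equation (14) is the one below. It is stated here as an assumption that follows from the described signal path, not as the original formula.

```latex
\mathbf{x}'(n) = \sum_{j=1}^{J} \mathbf{u}_j(n) + \mathbf{x}(n)
```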
[0079]
In the above description, it has been shown that the optimum filter of equation (12) can be
determined using the correction equation of equation (13).
[0080]
The sound collection device CS1 can be used as a sound collection device for voice recognition,
hands-free telephones, television cameras, teleconferences, remote lectures, abnormal sound
monitoring, and the like. By setting a plurality of virtual target sound source positions within
the sound collection range, a constraint condition that maintains the sensitivity within that
range is realized, so that a target sound source within the sound collection range can be
collected with little degradation of the frequency characteristics while noise from outside the
range is suppressed.
Also, even if the target sound source moves within the range, no filter correction is needed and
the sound source movement causes no performance degradation.
[0081]
As described above, the above-described embodiment has an excellent feature which is not found
in the conventional example that high-quality sound can be collected even when the target sound
source moves or when the target sound source position can not be accurately known.
[0082]
In other words, the above embodiment is a sound pickup apparatus having first variable filtering
means for filtering, with mutually different filter coefficients, the sound collection signals
collected by a plurality of arbitrarily arranged sound collection means, and first addition means
14B for adding the output signals of the first variable filtering means and outputting the sum,
the apparatus comprising: sound collection range setting means 30 for setting a predetermined
sound collection range; virtual target sound source position setting means 26 for setting a
plurality of virtual target sound source positions within the sound collection range; space
characteristic estimation means 27 for estimating, based on each virtual target sound source
position and the position of each sound collection means, the spatial characteristics including
the delay time and the attenuation amount until the sound reaches the position of each sound
collection means from each virtual target sound source position; pseudo target signal generating
means 17 for generating stationary, mutually uncorrelated pseudo target signals equal in number to
the virtual target sound source positions; spatial characteristic filtering means 18 for filtering
each pseudo target signal using each spatial characteristic estimated by the space characteristic
estimation means as filter coefficients; second addition means 21 for synthesizing a pseudo target
sound pickup signal for each sound collection means by adding the output signals of the spatial
characteristic filtering means; third addition means 12 for synthesizing learning signals by
adding each pseudo target sound pickup signal to the corresponding collected sound signal; second
variable filtering means 13 for filtering the learning signals with mutually different filter
coefficients; fourth addition means 14A for adding the output signals of the second variable
filtering means; delay means 19 for delaying each pseudo target signal; fifth addition means 22
for adding the delayed output signals from the delay means 19; subtracting means 15 for obtaining
an error signal by subtracting the output signal of the fourth addition means 14A from the output
signal of the fifth addition means 22; adaptation period detection means 20 for detecting, based
on the collected sound signals, a period in which no sound source is present in the sound
collection range and treating this period as an adaptation period; and adaptive algorithm means 16
for updating the second variable filter coefficients and the first variable filter coefficients,
during the adaptation period detected by the adaptation period detection means, so that the mean
square value of the error signal is minimized.
[0083]
FIG. 2 is a view for explaining the features of the above embodiment in comparison with the prior
art.
[0084]
The conventional example is an apparatus (such as AMNOR) that uses a single virtual target sound
source, whereas the above embodiment provides a plurality of virtual target signal sources. As
shown in FIG. 2, by setting a plurality of mutually uncorrelated virtual target signal sources
within a predetermined range, a constraint condition that maintains the sensitivity within that
range is realized.
[0085]
FIG. 3 is a diagram for explaining the construction of the above embodiment in comparison with
the construction of the prior art.
[0086]
FIG. 3 (1) is a diagram showing the basic configuration of the conventional example (an apparatus
using a single virtual target sound source, such as AMNOR), and FIG. 3 (2) is a diagram showing
the basic configuration of the above embodiment.
[0087]
In AMNOR and similar methods, learning is performed so as to maintain the sensitivity at a single
point. Therefore, when the speaker deviates from the set position, the frequency characteristics
of the target sound are degraded.
In the above embodiment, by contrast, a plurality of signal generators that generate mutually
uncorrelated signals are provided to simulate a situation in which a plurality of virtual target
sound sources are present, thereby realizing a constraint condition that maintains the sensitivity
within the set range.
By doing this, the signal of the sound source present in the set range can be picked up without
significant deterioration of the frequency characteristics, and out-of-range noise can be
suppressed.
In addition, even if the sound source moves within the range, there is no need for filter correction
and there is no performance degradation due to the sound source movement.
[0088]
FIG. 4 is a block diagram showing a sound pickup apparatus CS2 according to a second
embodiment of the present invention.
[0089]
The sound collection device CS2 differs from the sound collection device CS1 in the following
points: the first variable filters 13B1 to 13BM are replaced with semi-fixed filters 231 to 23M
(filters that hold their filter coefficients but allow them to be rewritten); a sound pickup
signal storage unit 25 is provided between the microphones 111 to 11M and the adders 121 to 12M;
a filter coefficient storage unit 24 is provided between the adaptive algorithm unit 16 and the
semi-fixed filters 231 to 23M; and the adaptation period detection unit 20 is removed.
[0090]
First, in the sound pickup device CS2, only noise is stored in the sound pickup signal storage
unit 25 before the target sound is picked up. Then, using the sound pickup signal stored in the
sound pickup signal storage unit 25, the second variable filters 13A1 to 13AM are updated as in
the sound collection device CS1, and learning continues until the second variable filters 13A1 to
13AM have sufficiently converged.
[0091]
At this time, as described above, the stored sound collection signal does not include the target
sound, so there is no need to stop the adaptation operation and no need to provide the adaptation
period detection unit 20.
[0092]
The filter coefficients of the sufficiently learned second variable filters 13A1 to 13AM are
transferred from the adaptive algorithm unit 16 to the filter coefficient storage unit 24, which
stores the transferred filter coefficients.
The filter coefficient storage unit 24 sets the filter coefficients in the semi-fixed filters 231
to 23M, and the semi-fixed filters 231 to 23M are used in a fixed state when the target sound is
collected.
[0093]
By doing this, the microphones 111 to 11M, the semi-fixed filters 231 to 23M, and the adder 14B
can be used separately from the other parts, which gives the advantage of superior portability and
space saving.
[0094]
In addition, since the filter learning process does not have to run in real time, it can be
implemented with a small amount of hardware, and the learning computation can be carried out even
on a general-purpose computer such as a personal computer.
However, in the sound collection device CS2 the filter coefficients of the semi-fixed filters 231
to 23M are fixed, so there is the disadvantage that movement of a noise source cannot be followed.
[0095]
The other configuration of the sound collection device CS2 is the same as that of the sound
collection device CS1, so the description will be omitted.
[0096]
The sound pickup signal storage unit 25 is an example of sound pickup signal storage means that is
provided between each of the sound collection means 11 and each of the third addition means 12 and
stores each of the sound pickup signals.
The filter coefficient storage unit 24 is an example of filter coefficient storage means that is
provided between the adaptive algorithm unit 16 and each first variable filtering unit 13 and
stores the first variable filter coefficient.
[0097]
FIG. 5 is a block diagram showing a sound pickup apparatus CS3 according to a third
embodiment of the present invention.
[0098]
The sound pickup device CS3 is the sound pickup device CS1 or the sound pickup device CS2 in which
the space characteristic filters 181,1 to 18J,M are replaced with delay units 281,1 to 28J,M, and
the space characteristic estimation unit 27 is realized by the distance calculation unit 271 and
the inter-microphone relative delay amount calculation unit 272.
[0099]
The components other than these are the same as the components in the sound collection device
CS1 or the sound collection device CS2, so they are omitted in FIG. 5.
[0100]
The distance calculation unit 271 calculates the distance between each virtual target sound source
position and each microphone position. The inter-microphone relative delay amount calculation unit
272 divides the distance output by the distance calculation unit 271 by the speed of sound to
obtain the delay time, subtracts the minimum delay time from each delay time to obtain the
inter-microphone relative delay amount, and sets it in the delay units 281,1 to 28J,M.
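The delay calculation described above might be sketched as follows. This is only an illustration under stated assumptions: the function name `relative_delays`, the coordinate representation, and the speed of sound of 340 m/s are hypothetical choices that do not come from the source.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s; assumed value, the source does not state one


def relative_delays(source_pos, mic_positions, c=SPEED_OF_SOUND):
    """Return each microphone's delay relative to the earliest one, in seconds."""
    # Role of the distance calculation unit 271: source-to-microphone distances.
    distances = [math.dist(source_pos, mic) for mic in mic_positions]
    # Role of unit 272: distance divided by the speed of sound is the delay time.
    delays = [d / c for d in distances]
    # Subtract the minimum so that the nearest microphone gets zero delay.
    earliest = min(delays)
    return [t - earliest for t in delays]


# Hypothetical layout: 7 microphones in a line, 2 cm apart, source 50 cm in front.
mics = [(0.02 * i, 0.0) for i in range(-3, 4)]
delays = relative_delays((0.0, 0.5), mics)
```

Calling the function once per virtual target sound source position would give one set of delays for each row of delay units.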
[0101]
By replacing the space characteristic with a delay only, the sound pickup device CS3 has the
advantage that the amount of calculation is reduced and the device can be configured with a small
amount of hardware.
[0102]
The other configuration of the sound collection device CS3 is the same as that of the sound
collection device CS1 or the sound collection device CS2, and thus the description thereof is
omitted.
[0103]
That is, the sound pickup device CS3 is an example of a sound pickup device that has first
variable filtering means for filtering the sound pickup signals collected by a plurality of
arbitrarily arranged sound collection means with respectively different filter coefficients, and
first addition means 14B for adding the output signals of the first variable filtering means and
outputting the sum, and that further has: sound collection range setting means 30 for setting a
predetermined sound collection range; virtual target sound source position setting means 26 for
setting a plurality of virtual target sound source positions within the sound collection range;
space characteristic estimation means for estimating, based on each virtual target sound source
position and the position of each sound collection means, a space characteristic including the
delay time and the attenuation amount until the sound reaches the position of each sound
collection means 11 from each virtual target sound source position, the space characteristic
estimation means including distance calculating means 271 for calculating the distance between
each virtual target sound source position and each sound collection means, and relative delay
amount calculating means 272 for calculating the relative delay amount between the sound
collection means 11 from the distance calculated by the distance calculating means 271 and the
speed of sound; pseudo-target signal generating means 17 for generating stationary pseudo-target
signals that have no correlation with each other, as many as the number of virtual target sound
source positions; a plurality of first delay means 28 for delaying the pseudo-target signals
output from the signal generating means 17 by the relative delay amounts calculated by the
relative delay amount calculating means 272; second addition means 21 for synthesizing a
pseudo-target sound pickup signal for each of the sound collection means by adding the output
signals of the respective delay means; third addition means 12 for synthesizing learning signals
by adding each pseudo-target sound pickup signal and each sound pickup signal; second variable
filtering means 13 for filtering the synthesized learning signals with respectively different
filter coefficients; fourth addition means 14 for adding the output signals of the second variable
filtering means to one another; second delay means 19 for delaying the pseudo-target signals;
fifth addition means 22 for adding the delayed output signals from the second delay means 19;
subtraction means 15 for obtaining an error signal by subtracting the output signal of the fourth
addition means 14 from the output signal of the fifth addition means 22; adaptation period
detection means 20 for detecting, based on the sound pickup signals, a period in which no sound
source is present within the sound collection range, and treating the detected period as the
period in which adaptation is to be performed; and adaptive algorithm means 16 for updating the
second variable filter coefficient and the first variable filter coefficient such that the mean
square value of the error signal is minimized during the period, detected by the adaptation period
detection means, in which no sound source is present within the sound collection range.
[0104]
FIG. 6 is a diagram showing the configuration of a sound collection device CS4 according to a
fourth embodiment of the present invention.
[0105]
The sound collection device CS4 is the sound collection device CS1 or the sound collection device
CS2 in which the spatial characteristic filters 181,1 to 18J,M are replaced with delay units 281,1
to 28J,M and gains (amplifiers) 291,1 to 29J,M, and the space characteristic estimation unit 27 is
realized by the distance calculation unit 271, the inter-microphone relative delay amount
calculation unit 272, and the inter-microphone relative attenuation amount calculation unit 273.
[0106]
The components other than these are the same as the components in the sound collection device
CS1 or the sound collection device CS2, so they are omitted in FIG. 6.
[0107]
The distance calculation unit 271 calculates the distance between each virtual target sound source
position and each microphone position.
The inter-microphone relative delay amount calculation unit 272 divides the distance output by
the distance calculation unit 271 by the speed of sound to obtain the delay time, subtracts the
minimum delay time from each delay time to obtain the inter-microphone relative delay amount, and
sets it in the delay units 281,1 to 28J,M.
[0108]
The inter-microphone relative attenuation amount calculation unit 273 takes the reciprocal of the
distance output from the distance calculation unit 271 to obtain the attenuation amount, subtracts
the attenuation amount of the microphone serving as a reference from each attenuation amount to
obtain the inter-microphone relative attenuation amount, and sets it in the gains 291,1 to
29J,M.
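A minimal sketch of the relative attenuation calculation follows. The source says the reference microphone's attenuation is "subtracted"; the sketch reads this as a difference in decibels, which is an assumption, as are the function names and the choice of reference microphone.

```python
import math


def relative_attenuation_db(source_pos, mic_positions, ref_index=0):
    """Attenuation of each microphone relative to a reference microphone, in dB."""
    distances = [math.dist(source_pos, mic) for mic in mic_positions]
    # Spherical spreading: amplitude is proportional to the reciprocal of distance.
    atten_db = [20.0 * math.log10(1.0 / d) for d in distances]
    # Assumption: "subtract the reference attenuation" is done on the dB values.
    ref = atten_db[ref_index]
    return [a - ref for a in atten_db]


def db_to_gain(level_db):
    """Convert a relative level in dB to a linear amplitude gain."""
    return 10.0 ** (level_db / 20.0)


# Hypothetical check: a microphone twice as far away gets half the amplitude.
rel = relative_attenuation_db((0.0, 0.0), [(1.0, 0.0), (2.0, 0.0)])
gains = [db_to_gain(r) for r in rel]
```

Working in decibels keeps the subtraction in the text literal; dividing the linear reciprocals by the reference value would give the same gains.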
[0109]
The sound collection device CS4 is a device that replaces the above space characteristic with only
delay and attenuation, thereby reducing the amount of calculation and making it possible to
configure with less hardware.
[0110]
Also, although the sound collection device CS4 requires more computation than the sound
collection device CS3, even with a microphone arrangement that calls for a spherical wave model
(one in which the microphone array is long relative to the distance between the microphones and
the sound source), the spatial characteristics can be well approximated and good results can be
obtained.
[0111]
The other configuration of the sound collection device CS4 is the same as that of the sound
collection device CS1 or the sound collection device CS2, so the description will be omitted.
[0112]
That is, the sound pickup device CS4 is an example of a sound pickup device that has first
variable filtering means for filtering the sound pickup signals collected by a plurality of
arbitrarily arranged sound collection means with respectively different filter coefficients, and
first addition means 14B for adding the output signals of the first variable filtering means and
outputting the sum, and that further has: sound collection range setting means 30 for setting a
predetermined sound collection range; virtual target sound source position setting means 26 for
setting a plurality of virtual target sound source positions within the sound collection range;
space characteristic estimation means 27 for estimating, based on each virtual target sound source
position and the position of each sound collection means, a space characteristic including the
delay time and the attenuation amount until the sound reaches the position of each sound
collection means from each virtual target sound source position, the space characteristic
estimation means 27 including distance calculating means 271 for calculating the distance between
each virtual target sound source position and each sound collection means, relative delay amount
calculating means 272 for calculating the relative delay amount between the sound collection means
from the distance calculated by the distance calculating means 271 and the speed of sound, and
relative attenuation amount calculating means 273 for calculating the relative attenuation amount
between the sound collection means from the distance calculated by the distance calculating means
271; pseudo-target signal generating means 17 for generating stationary pseudo-target signals that
have no correlation with each other, as many as the number of virtual target sound source
positions; a plurality of first delay means 28 for delaying the pseudo-target signals output from
the signal generating means 17 by the relative delay amounts calculated by the relative delay
amount calculating means 272; a plurality of gain means 29 for attenuating the pseudo-target
signals output by the plurality of delay means 28 by the relative attenuation amounts obtained by
the relative attenuation amount calculating means 273; second addition means 21 for synthesizing a
pseudo-target sound pickup signal for each of the sound collection means by adding the output
signals of the gain means; third addition means 12 for synthesizing learning signals by adding
each pseudo-target sound pickup signal and each sound pickup signal; second variable filtering
means 13 for filtering the synthesized learning signals with respectively different filter
coefficients; fourth addition means 14 for adding the output signals of the second variable
filtering means to one another; second delay means 19 for delaying the pseudo-target signals;
fifth addition means 22 for adding the delayed output signals from the second delay means 19;
subtraction means 15 for obtaining an error signal by subtracting the output signal of the fourth
addition means 14 from the output signal of the fifth addition means 22; adaptation period
detection means 20 for detecting, based on the sound pickup signals, a period in which no sound
source is present within the sound collection range, and treating the detected period as the
period in which adaptation is to be performed; and adaptive algorithm means 16 for updating the
second variable filter coefficient and the first variable filter coefficient such that the mean
square value of the error signal is minimized during the period, detected by the adaptation period
detection means, in which no sound source is present within the sound collection range.
[0113]
FIG. 7 is a block diagram showing an adaptation period detection unit 20a which is a specific
example of the adaptation period detection unit 20 in each of the above embodiments.
[0114]
The adaptation period detection unit 20a is composed of a short-time average power calculation
unit 201, a noise power setting unit 202, a threshold coefficient multiplication unit 205, and a
power comparison unit 203.
[0115]
The short-time average power calculation unit 201 obtains and outputs the short-time average
power of one channel, or the average over a plurality of channels, of the signals collected by the
microphones 111 to 11M.
The short time is, for example, 10 to 100 msec.
[0116]
The noise power setting unit 202 obtains a long time average of the noise power measured in
advance, and outputs the noise power (constant value).
The long time is, for example, 1 to 10 sec.
[0117]
The threshold coefficient multiplication unit 205 multiplies the output of the noise power setting
unit 202 by a threshold coefficient and sets it as a threshold.
The threshold coefficient is determined according to the magnitude of fluctuation of the
short-time average power of the noise. For example, when the short-time average power of the noise
fluctuates by 10% around the long-time average, the threshold coefficient is set to 1.1.
[0118]
The power comparison unit 203 compares the output of the short-time average power calculation
unit 201 with the threshold set by the threshold coefficient multiplication unit 205, and when the
short-time average power exceeds the threshold, it outputs an adaptive operation stop signal to
the adaptive algorithm unit 16.
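The comparison performed by the adaptation period detection unit 20a might be sketched as follows. The function names and the frame representation are hypothetical; only the multiplication of the pre-measured noise power by a threshold coefficient (1.1 in the example above) and the comparison with the short-time average power are taken from the text.

```python
def short_time_power(frame):
    """Mean-square power of one frame of samples (e.g. 10 to 100 ms long)."""
    return sum(x * x for x in frame) / len(frame)


def adaptation_stopped(frame, noise_power, threshold_coeff=1.1):
    """Return True when the adaptive operation stop signal would be output."""
    # Role of unit 205: threshold = pre-measured noise power * coefficient.
    threshold = noise_power * threshold_coeff
    # Role of unit 203: stop adapting when the short-time power exceeds it.
    return short_time_power(frame) > threshold


# Hypothetical frames: one at the noise level, one with a louder target sound.
noise_only = adaptation_stopped([0.1] * 100, noise_power=0.01)
with_target = adaptation_stopped([0.5] * 100, noise_power=0.01)
```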
[0119]
When the adaptation period detection unit 20a is configured as described above, the target sound
is detected by exploiting the stationarity of the noise and the non-stationarity of the target
sound, which has the advantage that the target sound can be detected by simple processing.
[0120]
In other words, the adaptation period detection means 20a is an example of means including short
time average power calculation means 201 for calculating the short-time average power of the
sound pickup signal, noise power setting means 202 for setting the long-time average power of
noise measured in advance, threshold setting means 205 that sets, as a threshold, the value
obtained by multiplying the noise power by a threshold coefficient, and power comparison means 203
that detects the adaptation period by comparing the threshold with the short-time average power.
[0121]
FIG. 8 is a block diagram showing an adaptation period detection unit 20b which is another
specific example of the adaptation period detection unit 20 in each of the above embodiments.
[0122]
The adaptation period detection unit 20b has a short-time average power calculation unit 201, a
long-time average power calculation unit 204, a threshold coefficient multiplication unit 205,
and a power comparison unit 203.
[0123]
The short-time average power calculation unit 201 obtains and outputs the short-time average
power of one channel, or the average over a plurality of channels, of the signals collected by the
microphones 111 to 11M.
[0124]
The long-time average power calculation unit 204 obtains the long-time average power of one
channel, or the average over a plurality of channels, of the signals collected by the microphones
111 to 11M.
[0125]
The threshold coefficient multiplication unit 205 multiplies the output of the long time average
power calculation unit 204 by the threshold coefficient and sets it as a threshold.
The threshold coefficient is determined according to the magnitude of fluctuation of the
short-time average power of the noise. For example, when the short-time average power of the noise
fluctuates by 10% around the long-time average, the threshold coefficient is set to 1.1.
[0126]
The power comparison unit 203 compares the output of the short-time average power calculation
unit 201 with the threshold set by the threshold coefficient multiplication unit 205, and when the
short-time average power exceeds the threshold, it outputs an adaptive operation stop signal to
the adaptive algorithm unit 16.
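A rough sketch of the adaptation period detection unit 20b follows, in which the threshold tracks a long-time average of the collected signal instead of a pre-measured constant. The class name, the window length in frames, and the choice to include the current frame in the long-time average are assumptions made for illustration.

```python
from collections import deque


class LongTimeThresholdDetector:
    """Compares each short-time power with a scaled long-time average power."""

    def __init__(self, long_frames=100, threshold_coeff=1.1):
        # Recent short-time powers; their mean stands in for unit 204's output.
        self.history = deque(maxlen=long_frames)
        self.threshold_coeff = threshold_coeff

    def update(self, short_power):
        """Feed one short-time power; return True if adaptation should stop."""
        self.history.append(short_power)
        long_power = sum(self.history) / len(self.history)  # role of unit 204
        threshold = long_power * self.threshold_coeff       # role of unit 205
        return short_power > threshold                      # role of unit 203


# Hypothetical input: 50 frames of steady noise followed by a louder frame.
detector = LongTimeThresholdDetector(long_frames=50)
steady = [detector.update(1.0) for _ in range(50)]
burst = detector.update(5.0)
```

Because the long-time average is recomputed from recent frames, the threshold drifts with slow changes in noise power, which is the property the text attributes to this variant.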
[0127]
When the adaptation period detection unit 20b is configured as described above, the target sound
is detected by exploiting the fact that the target sound is more non-stationary than the noise,
which has the advantage that the target sound can be detected by simple processing.
[0128]
The adaptation period detection unit 20b requires somewhat more processing than the adaptation
period detection unit 20a, but has the advantages that it can follow a gradual change in noise
power and that the noise level does not have to be measured in advance.
[0129]
That is, the adaptation period detection unit 20b is an example of means including short time
average power calculating means 201 for calculating the short-time average power of the sound
pickup signal, long time average power calculation means 204 for calculating the long-time average
power of the sound pickup signal, threshold coefficient multiplication means 205 that sets, as a
threshold, the value obtained by multiplying the long-time average power by a threshold
coefficient, and power comparison means 203 that detects the adaptation period by comparing the
threshold with the short-time average power.
[0130]
FIG. 9 is a block diagram showing an adaptation period detection unit 20c which is a specific
example of the adaptation period detection unit 20a in each of the above embodiments.
[0131]
The adaptation period detection unit 20c is a device in which the threshold coefficient
multiplication unit 205 is realized by the rising threshold coefficient multiplication unit 206,
the falling threshold coefficient multiplication unit 207, and the rising / falling switching
unit 208.
[0132]
The rising threshold coefficient multiplying unit 206 multiplies the value output from the noise
power setting unit 202 by the rising threshold coefficient, and sets the multiplication result as
the rising threshold.
[0133]
The falling threshold coefficient multiplying unit 207 multiplies the falling threshold coefficient
by the value output from the noise power setting unit 202, and sets the multiplication result as
the falling threshold.
[0134]
The rising threshold coefficient and the falling threshold coefficient are determined according to
the magnitude of fluctuation of the short-time average power of the noise. For example, if the
short-time average power of the noise fluctuates by 10% around the long-time average, the rising
threshold coefficient is set to 1.1 and the falling threshold coefficient is set to a value close
to the rising threshold coefficient.
[0135]
The rising / falling switching unit 208 selects the falling threshold when the power comparison
unit 203 is outputting the adaptive operation stop signal, and otherwise selects the rising
threshold, and sets the selected value as the threshold.
Usually, the rising and falling of the target sound waveform is expected to be gentle.
For example, in the case of voice, the rising part is a consonant and its power is small, and the
falling is gentle.
For this reason, detection errors are likely to occur in the rising portion and the falling
portion.
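The switching between the two thresholds might be sketched as follows. The text only says that the falling threshold coefficient is close to the rising one; choosing a slightly smaller falling coefficient (1.05 here) is an assumption, as are the function name and the use of a fixed noise power.

```python
def detect_with_hysteresis(short_powers, noise_power,
                           rising_coeff=1.1, falling_coeff=1.05):
    """Return one stop flag per frame using rising and falling thresholds."""
    stopped = False
    flags = []
    for power in short_powers:
        # Role of unit 208: pick the threshold from the previous comparator output.
        coeff = falling_coeff if stopped else rising_coeff
        # Role of unit 203: compare the short-time power with that threshold.
        stopped = power > noise_power * coeff
        flags.append(stopped)
    return flags


# Hypothetical power contour: gentle rise, peak, gentle fall.
flags = detect_with_hysteresis([1.0, 1.08, 1.2, 1.08, 1.0], noise_power=1.0)
```

In this example the frame at 1.08 is treated as noise on the way up but still counts as target sound on the way down, so small fluctuations near a single threshold no longer toggle the decision.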
[0136]
Incidentally, in the adaptation period detection unit 20c, the value output from the long-time
average power calculation unit 204 may be supplied to the threshold coefficient multiplication
unit 205 instead of the value output from the noise power setting unit 202.
[0137]
That is, the adaptation period detection unit 20c is an example of means including rising
threshold coefficient multiplication means 206 that multiplies the output of the noise power
setting means 202 or the long-time average power calculation means 204 by the rising threshold
coefficient, falling threshold coefficient multiplication means 207 that multiplies the output of
the noise power setting means 202 or the long-time average power calculation means 204 by the
falling threshold coefficient, and rising / falling switching means 208 that selects either the
rising threshold coefficient multiplication output or the falling threshold coefficient
multiplication output according to the power comparator output and sets the selected output as
the threshold.
[0138]
The rising threshold and the falling threshold are set by the noise power setting means 202.
[0139]
FIG. 10 is a diagram for explaining that detection errors are likely to occur at the rise and
fall of the short-time average power, and a countermeasure therefor.
[0140]
FIG. 10 (1) is a diagram showing a method that uses only one threshold value, in which detection
errors occur in the rising and falling portions of the short-time average power.
This is because the power of the target sound component is only slightly raised in these
portions, so detection is easily affected by minute fluctuations of the short-time average power
of the noise.
[0141]
By using the adaptation period detection unit 20c shown in FIG. 9 and setting two threshold
values, one for rising and one for falling, detection becomes less susceptible to minute
fluctuations of the short-time average power of the noise, and more accurate target sound
detection becomes possible.
[0142]
In FIG. 10 (2), it can be seen that the detection errors are eliminated at the rising and falling
portions of the short-time average power.
[0143]
Next, simulation results of each of the above embodiments will be shown in the case where the
adaptive period detection unit 20 shown in FIG. 9 is used.
[0144]
As the microphone array, seven nondirectional microphones arranged in a straight line at intervals
of 2 cm were used, and a position 50 cm away in the front direction of the microphone array was
used as the virtual sound source position of the prior art.
[0145]
The sound collection range in each of the above embodiments was set, for example, to the range
from 30 cm to the left to 30 cm to the right of the virtual sound source position (a single point)
of the conventional example, and seven virtual target sound source positions were provided at 10
cm intervals.
White noise was used for noise, and the noise source was placed at a position 1 m apart from the
virtual sound source position according to the prior art.
At this time, the frequency characteristic between the sound source and the array output is
shown in FIG. 10 (2) in the prior art and each of the above embodiments.
The target sound source position was set in two ways, that is, the virtual target sound source
position in the prior art and the position 20 cm laterally offset therefrom.
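The simulation geometry described above could be written down as follows. The coordinate system (array along the x axis, centred at the origin, front direction along y, units in metres) is an assumption; the source gives only the spacings and distances.

```python
# Seven microphones in a straight line, 2 cm apart, centred on the origin.
mic_positions = [(round(0.02 * (i - 3), 2), 0.0) for i in range(7)]

# Virtual sound source position of the prior art: 50 cm in front of the array.
prior_art_source = (0.0, 0.5)

# Seven virtual target sound source positions, 10 cm apart, covering
# 30 cm to the left through 30 cm to the right of the prior-art position.
virtual_sources = [(round(0.1 * k, 1), 0.5) for k in range(-3, 4)]

# Second test position of the actual target source: shifted 20 cm sideways.
shifted_source = (0.2, 0.5)
```

The noise source position is omitted because the text gives only its distance (1 m from the virtual sound source position), not its direction.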
[0146]
FIG. 11 shows the simulation result.
[0147]
FIG. 11 (1) shows the frequency characteristic between the sound source and the array output
when the target sound source position is at the virtual target sound source position according to
the prior art.
FIG. 11 (2) is a diagram showing the frequency characteristic between the sound source and the
array output when the target sound source position is shifted by 20 cm from the virtual target
sound source position of the prior art.
[0148]
In the frequency characteristic shown in FIG. 11 (1), no significant deterioration occurs in
either the prior art or the above embodiments, but in the frequency characteristic shown in FIG.
11 (2), the high-frequency part of the prior-art characteristic is greatly deteriorated.
In each of the above embodiments, the frequency characteristic does not significantly deteriorate
even in the frequency characteristic shown in FIG. 11 (2).
[0149]
From the above results, it has been confirmed that, in the conventional method, when the target
sound source deviates from the virtual sound source position, significant deterioration of the
frequency characteristic occurs.
However, in each of the above embodiments, it was confirmed that even if the target sound
source moves within the set sound collection range, the frequency characteristics do not
significantly deteriorate, and the target sound can be collected stably with high quality.
[0150]
Also, it was confirmed that the noise suppression performance at this time is 15 dB or more in
both the prior art and the above-described embodiments, and high noise suppression is
performed.
[0151]
From the above simulation results, it was confirmed that, in each of the above embodiments, even
when the target sound source moves or when the target sound source position is not accurately
known, high-quality sound pickup with high noise suppression and little deterioration of the
frequency characteristic is possible.
[0152]
According to the present invention, a plurality of virtual target sound source positions are set
within the sound collection range, which realizes a constraint condition that maintains the
sensitivity within that range. As a result, a sound source present within the sound collection
range can be picked up with little degradation of the frequency characteristic while noise from
outside the sound collection range is suppressed, and, because no filter correction is needed even
if the sound source moves within the sound collection range, there is no performance degradation
due to movement of the sound source. Therefore, even if the target sound source moves or the
target sound source position is not accurately known, the invention has the effect of enabling
high-quality sound pickup with high noise suppression and little degradation of the frequency
characteristic.
[0153]
Brief description of the drawings
[0154]
FIG. 1 is a block diagram showing a sound collection device CS1 according to a first embodiment
of the present invention.
[0155]
FIG. 2 is a diagram for explaining the features of the above embodiment in comparison with the
conventional example.
[0156]
FIG. 3 is a diagram for explaining the configuration of the above embodiment in comparison with
the configuration of the conventional example.
[0157]
FIG. 4 is a block diagram showing a sound collection device CS2 according to a second embodiment
of the present invention.
[0158]
FIG. 5 is a block diagram showing a sound collection device CS3 according to a third embodiment
of the present invention.
[0159]
FIG. 6 is a diagram showing the configuration of a sound collection device CS4 according to a
fourth embodiment of the present invention.
[0160]
FIG. 7 is a block diagram showing an adaptation period detection unit 20a which is a specific
example of the adaptation period detection unit 20 in each of the above embodiments.
[0161]
FIG. 8 is a block diagram showing an adaptation period detection unit 20b which is another
specific example of the adaptation period detection unit 20 in each of the above embodiments.
[0162]
FIG. 9 is a block diagram showing an adaptation period detection unit 20c which is a specific
example of the adaptation period detection unit 20a in each of the above embodiments.
[0163]
FIG. 10 is a diagram for explaining that detection errors are likely to occur at the rise and
fall of the short-time average power, and a countermeasure therefor.
[0164]
FIG. 11 is a diagram showing a simulation result.
[0165]
FIG. 12 is a diagram showing a conventional sound collection device CS11.
[0166]
Explanation of reference signs
[0167]
111 to 11M: microphone, 14B: first addition means, 211 to 21M: second addition means, 121 to
12M: third addition means, 14A: fourth addition means, 22: fifth addition means, 15: subtraction
means, 13A1 to 13AM: second variable filter, 13B1 to 13BM: first variable filter, 16: adaptive
algorithm unit, 171 to 17J, 17C: signal generator, 181,1 to 18J,M, 18C1 to 18CM: spatial
characteristic filter, 191 to 19J, 19C, 281,1 to 28J,M: delay device, 20: adaptive period
detection unit, 231 to 23M: semi-fixed filter, 24: filter coefficient storage unit, 25: sound
pickup signal storage unit, 26, 26C: virtual sound source position setting unit, 27: space
characteristic estimation unit, 291,1 to 29J,M: gain, 30: sound collection range setting unit,
201: short-time average power calculation unit, 202: noise power setting unit, 203: power
comparison unit, 204: long-time average power calculation unit, 205: threshold coefficient
multiplication unit, 206: rising threshold coefficient multiplication unit, 207: falling
threshold coefficient multiplication unit, 208: rising / falling switching unit, 271: distance
calculation unit, 272: inter-microphone relative delay amount calculation unit, 273:
inter-microphone relative attenuation amount calculation unit