DESCRIPTION JP2009288215

код для вставкиСкачать
Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
The present invention provides an acoustic processing apparatus capable of estimating a sound source direction even when the sensitivities of the sound receiving devices vary and change over time. SOLUTION: The apparatus includes a first beamformer 102-1 that obtains a first output signal by applying, to the received signal of each sound receiving device 101, filter processing that forms directivity in a first direction; a second beamformer 102-2 that obtains a second output signal by applying, to the same received signals, filter processing that forms directivity in a second direction different from the first direction; and a direction estimation unit 105 that estimates the sound source direction based on the ratio of the intensity of the first output signal to that of the second output signal. [Selected figure] Figure 4
Sound processing apparatus and method thereof
[0001]
The present invention relates to an acoustic processing apparatus and method for direction estimation and sound source separation that are robust to changes in the sensitivity balance among a plurality of sound receiving devices, for use in the array technology employed in hands-free speech communication and speech recognition.
[0002]
Recently, research has been conducted on microphone array technology that uses a plurality of microphones to (1) enhance a signal arriving from a specific direction, (2) estimate the direction of arrival when it is unknown, and (3) separate a plurality of sound sources arriving from different directions.
10-04-2019
[0003]
The first method in this technology is the delay-and-sum array, the simplest microphone array method (see Non-Patent Document 1).
In this method, a predetermined delay is inserted into the signal of each microphone before the signals are added. Signals arriving from the preset direction are added in phase and therefore reinforce one another, whereas signals arriving from other directions are not in phase and partially cancel. On this principle, signals from a specific direction are emphasized; that is, directivity is formed in that direction.
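The delay-and-sum principle can be illustrated with a minimal Python sketch. It assumes integer-sample delays and uses `np.roll`, a circular shift that wraps samples around the buffer ends (harmless here because the test tone fills the buffer with whole periods); the function name `delay_and_sum` and the 16 kHz / 1 kHz test signal are illustrative assumptions, not taken from the patent or from Non-Patent Document 1.

```python
import numpy as np

def delay_and_sum(signals, delays):
    """Delay each channel by an integer number of samples, then average.

    signals: array of shape (N, T), one row per microphone.
    delays:  N integer delays in samples for the chosen look direction.
    np.roll is a circular shift, so samples wrap around at the edges.
    """
    signals = np.asarray(signals, dtype=float)
    aligned = [np.roll(x, d) for x, d in zip(signals, delays)]
    return np.mean(aligned, axis=0)

# Two microphones; the source reaches mic 2 three samples after mic 1.
fs, f0, lag = 16000, 1000.0, 3
t = np.arange(1600) / fs                     # 100 whole periods of the tone
s = np.sin(2 * np.pi * f0 * t)
mics = np.stack([s, np.roll(s, lag)])

steered = delay_and_sum(mics, [lag, 0])      # delay mic 1 so both copies line up
unsteered = delay_and_sum(mics, [0, 0])      # plain average without alignment
print(np.mean(steered**2), np.mean(unsteered**2))
```

The aligned (steered) output keeps the full tone power, while the unaligned average loses power because the two copies are out of phase.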
[0004]
As a second method, the ratio of the signal strengths received by two microphones can be used as an index, for example to determine whether a sound source lies to the left or to the right of an array of two microphones, or to separate the individual sounds from a signal in which left and right sounds are mixed.
This method relies on the phenomenon that the microphone on the sound source side receives a louder sound than the microphone on the opposite side. Non-Patent Document 2 describes a sound source separation method based on this principle.
Non-Patent Document 1: J. L. Flanagan, J. D. Johnston, R. Zahn, and G. W. Elko, "Computer-steered microphone arrays for sound transduction in large rooms," J. Acoust. Soc. Am., vol. 78, no. 5, pp. 1508-1518, 1985.
Non-Patent Document 2: N. Roman, D. Wang, and G. Brown, "Speech segregation based on sound localization," J. Acoust. Soc. Am., vol. 114, no. 4, pp. 2236-2252, 2003.
[0005]
Methods based on this signal strength ratio assume that the left and right microphones have identical sensitivities.
[0006]
In practice, however, microphone sensitivities vary from unit to unit and drift considerably over time, so it is difficult to keep them identical at all times.
[0007]
As a result, the power ratio fluctuates, and the performance of sound source direction estimation and sound source separation degrades.
[0008]
The present invention therefore provides an acoustic processing apparatus and method that can estimate the sound source direction even when the sensitivities of sound receiving devices such as microphones vary and change over time.
[0009]
According to the present invention, an acoustic processing apparatus comprises: a plurality of sound receiving devices that receive sound from a sound source; a first beamformer unit that obtains a first output signal by applying, to the received signal of each sound receiving device, filter processing that forms directivity in a first direction; a second beamformer unit that obtains a second output signal by applying, to the received signal of each sound receiving device, filter processing that forms directivity in a second direction different from the first direction; an intensity comparison unit that obtains intensity comparison information from the intensity of the first output signal and the intensity of the second output signal; and a direction estimation unit that estimates sound source direction information of the sound source based on the intensity comparison information.
[0010]
According to the present invention, the sound source direction can be estimated even when the sensitivities of the sound receiving devices vary and change over time.
[0011]
Hereinafter, sound processing apparatuses according to embodiments of the present invention are described with reference to the drawings.
[0012]
First Embodiment
A sound processing apparatus according to the first embodiment is described with reference to FIGS. 1 to 3.
[0013]
(1) Configuration of Sound Processing Device
FIG. 1 is a block diagram of the sound processing device according to the present embodiment.
[0014]
The sound processing apparatus comprises N sound receiving devices 101-1 to 101-N; a first beamformer 102-1 and a second beamformer 102-2, each of which filters the received signals so as to emphasize and output a signal arriving from a specific direction; a first power calculation unit 103-1 and a second power calculation unit 103-2, which calculate the power (the measure of intensity) of the respective beamformer output signals; and a power ratio calculation unit 104, which calculates the ratio of these powers.
[0015]
The first beamformer 102-1, the second beamformer 102-2, the first power calculation unit 103-1, the second power calculation unit 103-2, and the power ratio calculation unit 104 can also be implemented as a computer-readable program stored in or transmitted to a computer.
[0016]
(2) Operation Principle of Sound Processing Device
Next, the operating principle of the sound processing device is described step by step.
[0017]
The signals x1 to xN received by the sound receiving devices 101-1 to 101-N are input to both the first beamformer 102-1 and the second beamformer 102-2.
[0018]
The beamformers 102-1 and 102-2 may use a method that controls directivity directly, such as a delay-and-sum array or a Griffiths-Jim array.
Alternatively, a method that controls directivity indirectly based on the properties of the signals, such as ICA (Independent Component Analysis), can be used.
[0019]
The two beamformers 102-1 and 102-2 are designed in advance to form directivity in different directions; that is, each is given a different target sound direction (the direction toward which its directivity is steered).
An example of the directivity settings is shown in FIG. 2.
[0020]
In FIG. 2, two microphones are used as the sound receiving devices, and the two opposite directions along the straight line through microphones 101-1 and 101-2 are set as the target sound directions of the first beamformer 102-1 and the second beamformer 102-2, respectively.
The target sound directions may be any other pair of directions, as long as they differ from each other.
[0021]
The first power calculation unit 103-1 and the second power calculation unit 103-2 calculate the powers of the outputs of the beamformers 102-1 and 102-2, respectively, and the power ratio calculation unit 104 calculates and outputs the ratio of the two powers.
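The chain of units 102-1/102-2, 103-1/103-2, and 104 can be sketched as below. This is a toy illustration under stated assumptions: integer-sample delay-and-sum beamformers (circular shifts on a whole-period test tone), mean-square power, and equal microphone amplitudes; the function names `delay_and_sum`, `power`, and `bf_power_ratio` are hypothetical.

```python
import numpy as np

def delay_and_sum(signals, delays):
    """Beamformer 102-1 / 102-2: integer-sample circular delays, then average."""
    return np.mean([np.roll(x, d) for x, d in zip(signals, delays)], axis=0)

def power(y):
    """Power calculation units 103-1 / 103-2: mean-square value of a signal."""
    return float(np.mean(np.square(y)))

def bf_power_ratio(signals, delays_a, delays_b):
    """Power ratio calculation unit 104: P(beamformer A) / P(beamformer B)."""
    return power(delay_and_sum(signals, delays_a)) / power(delay_and_sum(signals, delays_b))

fs, f0, lag = 16000, 1000.0, 2      # the far microphone hears the tone 2 samples later
s = np.sin(2 * np.pi * f0 * np.arange(1600) / fs)
near_mic1 = np.stack([s, np.roll(s, lag)])   # source on the mic-1 side
near_mic2 = np.stack([np.roll(s, lag), s])   # source on the mic-2 side

delays_a = [lag, 0]   # beamformer A: target direction on the mic-1 side
delays_b = [0, lag]   # beamformer B: target direction on the mic-2 side
print(bf_power_ratio(near_mic1, delays_a, delays_b))  # greater than 1
print(bf_power_ratio(near_mic2, delays_a, delays_b))  # less than 1
```

In this toy setup, halving the amplitude of microphone 1 (multiplying its row by 0.5) moves the mic-1-side ratio from 2 to 1.8, while the plain microphone power ratio drops from 1 to 0.25.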
[0022]
Conventionally, the power ratio that arises from the direction of arrival of a signal has been used for sound source direction estimation and sound source separation.
In FIG. 2, the distance r1 from the sound source S to the first microphone 101-1 is shorter than the distance r2 to the second microphone 101-2, so the first microphone 101-1 receives a louder sound.
Therefore, observing the power ratio between the first microphone 101-1 and the second microphone 101-2 provides a clue to the sound source position.
[0023]
However, the sensitivities of real microphones vary and change considerably over time, so it is difficult to keep them identical at all times.
The power ratio therefore changes not only with the sound source position but also with the imbalance in microphone sensitivity, which lowers the accuracy of sound source position estimation.
[0024]
In this embodiment, the ratio of the microphone outputs is not taken directly; instead, the ratio of the outputs of the two beamformers 102-1 and 102-2 is used.
This yields a power ratio that is robust to sensitivity imbalance, so the performance of sound source direction estimation and sound source separation can be maintained even when the microphone sensitivities are unbalanced.
[0025]
(3) Experiment
FIG. 3 shows experimental results comparing the conventional method with the sound processing apparatus of this embodiment in the configuration of FIG. 2.
[0026]
As experimental conditions, the first microphone 101-1 and the second microphone 101-2 were used with a microphone spacing of d = 5 cm. The target sound direction of the first beamformer 102-1 was set to -90 degrees (upward in FIG. 2), and that of the second beamformer 102-2 to 90 degrees (downward in FIG. 2). A 1 kHz sine wave was used as the sound source, which was moved from -90 degrees to 90 degrees while being kept at a distance of 50 cm from the midpoint between the first microphone 101-1 and the second microphone 101-2.
[0027]
(3-1) Conventional Method
First, as the conventional method, the case of using the output powers of the first microphone 101-1 and the second microphone 101-2 is described with reference to FIGS. 3(a) and 3(b).
[0028]
The ratio Rpow of the output power |X1| of the first microphone 101-1 to the output power |X2| of the second microphone 101-2 is expressed by the following equation.
[0029]
Here, r1 is the distance from the first microphone 101-1 to the sound source S, r2 is the distance from the second microphone 101-2 to the sound source S, A1 is the sensitivity of the first microphone 101-1, and A2 is the sensitivity of the second microphone 101-2.
[0030]
This shows that the output of the first microphone 101-1, which is closer to the sound source S, has a larger power than that of the microphone on the opposite side.
In FIG. 3(a), the output power when the sensitivity of the first microphone 101-1 is normal is shown by a solid line, and the output power when its sensitivity is halved is shown by a dotted line.
[0031]
In general, microphone sensitivities vary widely and fluctuate, and a sensitivity drop to about half is not uncommon.
[0032]
FIG. 3(b) shows the power ratio of the microphone outputs.
The solid line indicates the case in which the sensitivity of the first microphone 101-1 is normal, and the dotted line the case in which it is halved.
[0033]
As FIGS. 3(a) and 3(b) show, the power ratio fluctuates greatly with changes in microphone sensitivity, which makes it difficult to estimate the sound source position (direction) from the power ratio.
[0034]
(3-2) This Embodiment
Next, the case of using the sound processing apparatus of this embodiment is described with reference to FIGS. 3(c) and 3(d).
Delay-and-sum beamformers were used.
[0035]
FIG. 3(c) shows the output powers of the two beamformers (abbreviated "BF").
BFA has the -90 degree direction and BFB the 90 degree direction as its target sound direction.
[0036]
The ratio RBF of the output power of BFA to that of BFB is given by the following equation.
[0037]
Here, ΔS is the arrival time difference due to the sound source position, and ΔA is the delay time applied in the delay-and-sum array.
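The following narrowband model is an illustrative sketch under stated assumptions and is not the equation of the original publication. It assumes a single angular frequency ω, effective real gains a_1 = A_1/r_1 and a_2 = A_2/r_2 (sensitivity times distance attenuation), ΔS defined as the delay of microphone 101-2 relative to microphone 101-1, BFA delaying microphone 101-1 by ΔA (steering toward the microphone 101-1 side), and BFB delaying microphone 101-2 by ΔA.

```latex
\begin{aligned}
X_1 &= a_1, \qquad X_2 = a_2\,e^{-j\omega\Delta_S}, \qquad a_i = A_i / r_i,\\
|Y_A|^2 &= \bigl|a_1 e^{-j\omega\Delta_A} + a_2 e^{-j\omega\Delta_S}\bigr|^2
        = a_1^2 + a_2^2 + 2a_1a_2\cos\bigl(\omega(\Delta_S-\Delta_A)\bigr),\\
|Y_B|^2 &= \bigl|a_1 + a_2 e^{-j\omega(\Delta_S+\Delta_A)}\bigr|^2
        = a_1^2 + a_2^2 + 2a_1a_2\cos\bigl(\omega(\Delta_S+\Delta_A)\bigr),\\
R_{BF} &= \frac{|Y_A|^2}{|Y_B|^2}
        = \frac{g + g^{-1} + 2\cos\bigl(\omega(\Delta_S-\Delta_A)\bigr)}
               {g + g^{-1} + 2\cos\bigl(\omega(\Delta_S+\Delta_A)\bigr)},
        \qquad g = \frac{a_1}{a_2},\\
R_{BF} > 1 &\iff \cos\bigl(\omega(\Delta_S-\Delta_A)\bigr) > \cos\bigl(\omega(\Delta_S+\Delta_A)\bigr)
        \iff \sin(\omega\Delta_S)\,\sin(\omega\Delta_A) > 0.
\end{aligned}
```

In this model the gain ratio g enters R_BF only through g + 1/g, which is unchanged when g is replaced by 1/g and is stationary at g = 1, so a small sensitivity imbalance affects R_BF only to second order, whereas |X_1|^2/|X_2|^2 = g^2 changes to first order. The last line shows that whether R_BF exceeds 1 does not depend on the sensitivities at all.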
[0038]
The solid line in FIG. 3(c) indicates the case in which the sensitivity of the first microphone 101-1 is normal, and the dotted line the case in which it is halved.
[0039]
It should be noted here that a change in the sensitivity of the first microphone 101-1 affects both beamformers.
[0040]
As a result, in the beamformer output power ratio shown in FIG. 3(d), the effect of a change in microphone sensitivity largely cancels out, and a power ratio (the beamformer output power ratio) whose value hardly changes is obtained.
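The experiment of FIGS. 3(a) to 3(d) can be approximated numerically with the sketch below. It is not a reproduction of the patent's measurements: it assumes narrowband phasors, a speed of sound of 343 m/s, microphone 101-1 at y = +d/2 and 101-2 at y = -d/2 with -90 degrees pointing toward 101-1, 1/r distance attenuation, and a steering delay of d/c for both beamformers. The helper names are hypothetical.

```python
import numpy as np

C = 343.0          # speed of sound in m/s (assumed)
F0 = 1000.0        # 1 kHz sine wave, as in the experiment
D = 0.05           # microphone spacing d = 5 cm
R = 0.50           # source distance from the array centre, 50 cm
W = 2 * np.pi * F0

# Assumed geometry: mic 101-1 at y=+d/2, mic 101-2 at y=-d/2.
MICS = np.array([[0.0, D / 2], [0.0, -D / 2]])

def mic_phasors(theta_deg, gains):
    """Received narrowband signals: gain / distance, phase from travel time."""
    th = np.deg2rad(theta_deg)
    src = np.array([R * np.cos(th), -R * np.sin(th)])   # theta=-90 lies on the mic-1 side
    dist = np.linalg.norm(MICS - src, axis=1)
    return np.asarray(gains) / dist * np.exp(-1j * W * dist / C)

def power_ratios_db(theta_deg, gains):
    """Conventional mic power ratio and BFA/BFB output power ratio, in dB."""
    x1, x2 = mic_phasors(theta_deg, gains)
    tau = D / C                                  # steering delay toward endfire
    y_a = x1 * np.exp(-1j * W * tau) + x2        # BFA: target -90 deg (mic 1 side)
    y_b = x1 + x2 * np.exp(-1j * W * tau)        # BFB: target +90 deg (mic 2 side)
    conv = 10 * np.log10(abs(x1) ** 2 / abs(x2) ** 2)
    bf = 10 * np.log10(abs(y_a) ** 2 / abs(y_b) ** 2)
    return conv, bf

angles = np.arange(-90, 91, 5)
normal = np.array([power_ratios_db(a, [1.0, 1.0]) for a in angles])
halved = np.array([power_ratios_db(a, [0.5, 1.0]) for a in angles])

conv_shift = normal[:, 0] - halved[:, 0]   # effect of halving mic 1, conventional ratio
bf_shift = normal[:, 1] - halved[:, 1]     # effect of halving mic 1, beamformer ratio
print(f"max shift, microphone power ratio: {np.max(np.abs(conv_shift)):.2f} dB")
print(f"max shift, beamformer power ratio: {np.max(np.abs(bf_shift)):.2f} dB")
```

Under these assumptions, halving one sensitivity shifts the conventional ratio by a constant 20 log10 2 dB at every angle, while the beamformer ratio moves by well under 1 dB and keeps its sign away from broadside.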
[0041]
(4) Modification 1
The above method exploits the fact that the output intensities of two beamformers with different target directions differ depending on the sound source position, and that this difference is robust to imbalance in microphone sensitivity.
Therefore, using the amplitude instead of the power of the beamformer outputs as the measure of intensity has the same effect.
[0042]
(5) Modification 2
A nonlinear scale, such as power or amplitude expressed in decibels, may also be used.
[0043]
(6) Modification 3
Furthermore, the intensities may be compared using a difference instead of a ratio.
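The intensity measures of Modifications 1 to 3 can be computed side by side as in the following sketch. The RMS value as the "amplitude", the `eps` guard against division by zero, and the name `intensity_comparisons` are assumptions for illustration.

```python
import numpy as np

def intensity_comparisons(y_a, y_b, eps=1e-12):
    """Compare the intensities of two beamformer outputs in several ways.
    eps is an assumed guard against division by zero and log(0)."""
    p_a = np.mean(np.abs(y_a) ** 2)          # power of output A
    p_b = np.mean(np.abs(y_b) ** 2)          # power of output B
    return {
        "power_ratio": p_a / (p_b + eps),
        "amplitude_ratio": np.sqrt(p_a) / (np.sqrt(p_b) + eps),      # Modification 1 (RMS)
        "power_ratio_db": 10 * np.log10((p_a + eps) / (p_b + eps)),   # Modification 2
        "power_difference": p_a - p_b,                                # Modification 3
    }

t = np.arange(160) / 16000                   # 10 whole periods of a 1 kHz tone
s = np.sin(2 * np.pi * 1000 * t)
print(intensity_comparisons(2 * s, s))
```

On a decibel scale the ratio and the difference carry the same information, since 10 log10(pA/pB) = 10 log10 pA - 10 log10 pB.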
[0044]
(7) Modification 4
In this embodiment, the two beamformers use the 90 degree and -90 degree directions as their target sound directions, but other directions may also be used.
[0045]
If the expected sound source position is known to some extent in advance, for example when the speaker positions are restricted by the seat positions, as in a car, or when the sound source position can be measured beforehand by other means, as in a video conference system, the beamformers may be designed so that the beamformer output power ratio for the speaker's speech is maximized. For example, the target sound directions can be set so that one beamformer has its maximum sensitivity in the speaker's direction and the other has its minimum sensitivity in that direction.
[0046]
In addition, for implementation reasons, angles corresponding to integer-sample delays in the time domain may be used even if they do not correspond to round angle values.
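The angles that correspond to integer-sample delays can be listed with a short sketch. It assumes a far-field source, a two-microphone array, angles measured from broadside, and c = 343 m/s; the 16 kHz sampling rate and the function name `integer_delay_angles` are illustrative assumptions.

```python
import numpy as np

def integer_delay_angles(fs, d, c=343.0):
    """Far-field angles (degrees from broadside) at which the inter-microphone
    delay of a two-microphone array is a whole number of samples."""
    k_max = int(np.floor(fs * d / c))      # largest integer delay that fits in the aperture
    k = np.arange(-k_max, k_max + 1)
    return k, np.degrees(np.arcsin(k * c / (fs * d)))

k, angles = integer_delay_angles(fs=16000, d=0.05)
for ki, a in zip(k, angles):
    print(f"{int(ki):+d} samples -> {a:+.1f} deg")
```

With d = 5 cm at 16 kHz the full endfire delay is about 2.33 samples, so exactly 90 degrees cannot be reached with integer delays; the nearest usable angles are those listed.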
[0047]
(8) Modification 5
The beamformer output power ratio can be calculated at the following timings.
[0048]
The first method calculates it for each sample of the time-discrete signal; the second smooths these values in the time direction; and the third uses a representative value, such as the mean or median, within a frame consisting of a predetermined number of samples.
[0049]
When processing is performed in the frequency domain, the same methods as in the time domain can be applied to the signal sequence obtained by shifting a predetermined analysis window over time.
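The three calculation timings can be sketched as follows, taking per-sample output powers of the two beamformers as input. The smoothing constant `alpha`, the frame length of 256 samples, and the choice to smooth or summarize the powers before dividing (rather than the per-sample ratios) are assumptions; the function names are hypothetical.

```python
import numpy as np

def ratio_per_sample(pa, pb, eps=1e-12):
    # Option 1: ratio of the instantaneous (per-sample) powers
    return pa / (pb + eps)

def ratio_smoothed(pa, pb, alpha=0.95, eps=1e-12):
    # Option 2: first-order recursive smoothing of each power over time,
    # then the ratio; alpha is an assumed smoothing constant
    sa = sb = 0.0
    out = np.empty(len(pa))
    for i, (a, b) in enumerate(zip(pa, pb)):
        sa = alpha * sa + (1 - alpha) * a
        sb = alpha * sb + (1 - alpha) * b
        out[i] = sa / (sb + eps)
    return out

def ratio_per_frame(pa, pb, frame=256, stat=np.mean, eps=1e-12):
    # Option 3: one representative value (mean, median, ...) per frame
    n = len(pa) // frame
    fa = stat(np.reshape(pa[: n * frame], (n, frame)), axis=1)
    fb = stat(np.reshape(pb[: n * frame], (n, frame)), axis=1)
    return fa / (fb + eps)
```

For frequency-domain processing, the same functions can be applied along the frame axis of each frequency bin.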
[0050]
Second Embodiment
A sound processing apparatus according to the second embodiment is described with reference to FIG. 4.
[0051]
(1) Configuration of Sound Processing Device
FIG. 4 is a block diagram of the sound processing device according to the present embodiment.
[0052]
In addition to the components of the sound processing apparatus of the first embodiment, the present embodiment includes a sound source direction estimation unit 105 and a direction information dictionary 106.
[0053]
(2) Operation Principle of Sound Processing Device
Next, the operating principle of the sound processing device of the present embodiment is described.
[0054]
The processing up to the output of the power ratio calculation unit 104 is the same as in the first embodiment, so its description is omitted.
[0055]
The sound source direction estimation unit 105 estimates the sound source direction based on the beamformer output power ratio obtained by the power ratio calculation unit 104 and the information in the direction information dictionary 106.
[0056]
Specifically, the direction information dictionary 106 stores, for example, a correspondence table between the sound source direction and the beamformer output power ratio, such as the solid line in FIG. 3(d).
The sound source direction estimation unit 105 converts the input beamformer output power ratio (the vertical axis in FIG. 3(d)) into an angle (the horizontal axis) and outputs this angle as the sound source direction.
[0057]
(3) Modifications
In practice, accurate angle information is not always needed; in some cases it is sufficient to know whether the sound source lies to the right or to the left.
In that case, the direction information dictionary 106 may store information for mapping the sign of the beamformer output power ratio (for example, when expressed in decibels or as a difference) to the right or left of the sound source direction.
[0058]
As described above, the direction information dictionary 106 may store whatever information is needed to convert the beamformer output power ratio into the sound source direction, according to the application and the required angular resolution.
[0059]
If the correspondence can be expressed analytically, an equation may be used instead of the
correspondence table.
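A sketch of the direction information dictionary and its lookup is shown below. Instead of measured data, the table is generated from an assumed far-field, equal-gain, two-microphone delay-and-sum model (c = 343 m/s, d = 5 cm, 1 kHz; -90 degrees toward microphone 101-1); in practice it would be measured in advance, as for the solid line of FIG. 3(d). `model_ratio_db` also illustrates the option of using an equation instead of a table. All names are hypothetical.

```python
import numpy as np

C, F0, D = 343.0, 1000.0, 0.05      # assumed speed of sound, tone, mic spacing
W, TAU = 2 * np.pi * F0, D / C      # angular frequency, endfire steering delay

def model_ratio_db(theta_deg):
    """Far-field, equal-gain model of the BFA/BFB output power ratio in dB.
    d_s is the lag of mic 2 behind mic 1 for a source at theta."""
    d_s = -D * np.sin(np.deg2rad(theta_deg)) / C
    num = 1 + np.cos(W * (d_s - TAU))
    den = 1 + np.cos(W * (d_s + TAU))
    return 10 * np.log10(num / den)

# Direction information dictionary: (angle, ratio) pairs prepared in advance.
TABLE_DEG = np.linspace(-90, 90, 37)
TABLE_DB = model_ratio_db(TABLE_DEG)    # decreases monotonically with angle

def ratio_to_angle(ratio_db):
    # np.interp needs increasing x values, so both columns are reversed;
    # inputs outside the table are clamped to the end angles.
    return float(np.interp(ratio_db, TABLE_DB[::-1], TABLE_DEG[::-1]))

def ratio_to_side(ratio_db):
    # Coarse left/right decision from the sign of the dB ratio.
    if ratio_db > 0:
        return "BFA side (-90 deg)"
    if ratio_db < 0:
        return "BFB side (+90 deg)"
    return "broadside"

print(ratio_to_angle(model_ratio_db(-42.0)), ratio_to_side(model_ratio_db(-42.0)))
```

Linear interpolation between table entries keeps the dictionary small; a finer table or the analytic model itself can be used when higher angular resolution is required.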
[0060]
(4) Effects
A classical method of estimating the sound source direction (the beamformer method) computes the beamformer output while changing the target direction little by little from -90 degrees to 90 degrees and takes the direction giving the maximum output as the sound source direction.
A drawback of the beamformer method is its high computational cost, since the beamformer must be applied to many target directions.
Moreover, because the output values change with microphone sensitivity, it is difficult to simplify the calculation by storing output values in advance, and the maximum must always be searched for over all directions.
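For comparison, the classical beamformer method can be sketched for the two-microphone narrowband case. The far-field model, c = 343 m/s, d = 5 cm, 1 kHz, the 1 degree scan step, and the function names are assumptions; the point is that every look direction requires its own beamformer evaluation.

```python
import numpy as np

C, F0, D = 343.0, 1000.0, 0.05     # assumed speed of sound, tone, spacing
W = 2 * np.pi * F0

def steered_power(x, theta_deg):
    """Narrowband delay-and-sum output power for one look direction.
    x = (x1, x2) are complex mic phasors; for a far-field source at theta,
    mic 2 lags mic 1 by -D*sin(theta)/C (assumed sign convention)."""
    lag = -D * np.sin(np.deg2rad(theta_deg)) / C
    y = x[0] * np.exp(-1j * W * lag) + x[1]      # delay mic 1 to line up with mic 2
    return abs(y) ** 2

def scan_direction(x, step_deg=1.0):
    """Beamformer method: evaluate every look direction and keep the maximum."""
    looks = np.arange(-90.0, 90.0 + step_deg, step_deg)
    powers = [steered_power(x, th) for th in looks]
    return float(looks[int(np.argmax(powers))]), len(looks)

lag = -D * np.sin(np.deg2rad(25.0)) / C          # simulated source at 25 degrees
estimate, n_evals = scan_direction((1.0 + 0j, np.exp(-1j * W * lag)))
print(estimate, n_evals)
```

The scan needs one beamformer evaluation per look direction (181 at a 1 degree step), whereas the two-beam approach evaluates only two.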
[0061]
In contrast, the present embodiment requires little computation, since the beamformer is applied in only two directions.
Moreover, using the ratio (or difference) of the beamformer outputs cancels the changes in output caused by changes in microphone sensitivity and makes the result robust to them, so the sound source position can be almost uniquely identified by referring to a graph of the ratio obtained in advance.
[0062]
Third Embodiment
A sound processing apparatus according to the third embodiment is described with reference to FIG. 5.
[0063]
(1) Configuration of Sound Processing Device
FIG. 5 is a block diagram of the sound processing device according to the present embodiment.
[0064]
The sound processing apparatus according to the present embodiment comprises: the sound receiving devices 101-1 to 101-N; a time-frequency conversion unit 208; a frequency selection unit 209; a first beamformer 202-1 and a second beamformer 202-2, which filter the received signals for each frequency component to emphasize and output a signal arriving from a specific direction; a first power calculation unit 203-1 and a second power calculation unit 203-2, which calculate the power of the respective output signals for each frequency component; a power ratio calculation unit 204, which calculates their ratio for each frequency component; a sound source direction estimation unit 205, which estimates the sound source direction for each frequency component using a direction information dictionary 206; and a direction integration unit 207, which integrates the sound source directions of the individual frequency components into one direction.
[0065]
(2) Operation Principle of Sound Processing Device
Next, the operating principle of the sound processing device of the present embodiment is described.
[0066]
The operation is broadly the same as in the second embodiment, except that in the present embodiment the signal is divided into frequency components and processed per component.
[0067]
(2-1) Time-Frequency Conversion Unit 208
First, the time-frequency conversion unit 208 converts the time signals obtained by the sound receiving devices 101-1 to 101-N into frequency-domain signals using the discrete Fourier transform.
With a window length of 2(L1-1), L1 frequency components are usually obtained.
[0068]
(2-2) Frequency Selection Unit 209
Next, the frequency selection unit 209 selects the frequency components to which the subsequent processing is applied.
Using as the selection criterion the frequencies at which speech signals have high power (for example, 100 Hz to 3 kHz) can improve estimation accuracy in noisy environments.
[0069]
Excluding low frequencies that are susceptible to noise (for example, 100 Hz or below) also helps improve estimation accuracy.
[0070]
The purpose of the selection criterion is to select the components that are effective for the target signal, and other selection methods are also possible.
Adjacent frequency components may also be combined into one and processed as a sub-band, for example to reduce the amount of computation.
[0071]
When all frequency components are used, the frequency selection unit 209 is unnecessary.
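Time-frequency conversion and frequency selection can be sketched for one channel as follows (each microphone signal would be transformed the same way). The Hann window, 512-sample window, 256-sample hop, and 16 kHz sampling rate are assumptions; the 100 Hz to 3 kHz band follows the examples in the text, and the function names are hypothetical.

```python
import numpy as np

def stft(x, win_len=512, hop=256):
    """Hann-windowed frames of a 1-D signal, transformed with the real DFT.
    A window of length win_len = 2*(L1 - 1) yields L1 = win_len//2 + 1 bins."""
    window = np.hanning(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop : i * hop + win_len] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)            # shape (n_frames, L1)

def select_bins(fs, win_len, f_lo=100.0, f_hi=3000.0):
    """Mask of bins kept by the frequency selection step: above f_lo
    (drops noise-prone low frequencies) and up to f_hi."""
    freqs = np.arange(win_len // 2 + 1) * fs / win_len   # bin centre frequencies
    return (freqs > f_lo) & (freqs <= f_hi)

fs = 16000
x = np.random.default_rng(0).standard_normal(fs)   # 1 s of noise as a stand-in signal
X = stft(x)
mask = select_bins(fs, 512)
print(X.shape, X[:, mask].shape)
```

With a 512-sample window, L1 = 257 bins are produced, of which the mask keeps those between 125 Hz and 3 kHz.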
[0072]
(2-3) First Power Calculation Unit 203-1 and Second Power Calculation Unit 203-2
The L2 frequency components of each channel selected in this way are processed, component by component, by the first beamformer and the second beamformer, whose directivities point in different directions. Each beamformer outputs one channel for each frequency component, and the first power calculation unit 203-1 and the second power calculation unit 203-2 calculate the power of each frequency component of the respective outputs and pass it to the power ratio calculation unit 204.
[0073]
The power ratio is calculated as in the previous embodiments; at this point, adjacent frequency components may be grouped and averaged.
Averaging makes the power ratio more stable.
[0074]
(2-4) Power Ratio Calculation Unit 204
The power ratio calculation unit 204 outputs power ratios for L3 frequency components.
When no averaging or similar processing is performed, L2 = L3.
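Per-frequency beamforming and the per-bin power ratio, with optional averaging over adjacent bins, can be sketched for a single frame as below. The delay-and-sum weights, the convention y = sum over n of conj(w) X, and the function names are assumptions for illustration.

```python
import numpy as np

def dsb_weights(freqs, delays):
    """Per-bin delay-and-sum weights w[n, k] = exp(-j 2 pi f_k t_n) / N, where
    t_n is the arrival delay expected at mic n from the target direction."""
    t = np.asarray(delays, dtype=float)[:, None]
    return np.exp(-2j * np.pi * np.asarray(freqs)[None, :] * t) / t.shape[0]

def per_bin_ratio(X, w_a, w_b, group=1, eps=1e-12):
    """Beamform each bin with both weight sets (y = sum_n conj(w) * X) and
    return the power ratio per bin (L2 values); with group > 1, powers of
    `group` adjacent bins are averaged first (L3 = L2 // group values)."""
    p_a = np.abs(np.sum(np.conj(w_a) * X, axis=0)) ** 2
    p_b = np.abs(np.sum(np.conj(w_b) * X, axis=0)) ** 2
    if group > 1:
        n = len(p_a) // group
        p_a = p_a[: n * group].reshape(n, group).mean(axis=1)
        p_b = p_b[: n * group].reshape(n, group).mean(axis=1)
    return p_a / (p_b + eps)

freqs = np.array([250.0, 500.0, 1000.0])
tau = 0.05 / 343.0                                  # endfire delay for d = 5 cm (assumed c)
X = np.stack([np.ones(3, complex), np.exp(-2j * np.pi * freqs * tau)])  # source on mic-1 side
w_a = dsb_weights(freqs, [0.0, tau])                # beamformer A: steered to the mic-1 side
w_b = dsb_weights(freqs, [tau, 0.0])                # beamformer B: steered to the mic-2 side
print(per_bin_ratio(X, w_a, w_b), per_bin_ratio(X, w_a, w_b, group=3))
```

In this example beamformer A is perfectly aligned, so each per-bin ratio equals 1 / cos^2(2 pi f tau) and grows with frequency.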
[0075]
(2-5) Sound Source Direction Estimation Unit 205
For each frequency component, the sound source direction estimation unit 205 uses the relationship between the power ratio and the sound source direction stored in the direction information dictionary 206 to estimate and output a sound source direction.
[0076]
(2-6) Direction Integration Unit 207
The direction integration unit 207 generates a predetermined number of sound source directions from the L3 estimated sound source directions.
[0077]
The directions are generated using, for example, the mean, median, or mode.
When the predetermined number is two or more, the directions may be determined by clustering the estimated directions.
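Direction integration can be sketched as below. The 5 degree histogram bin for the mode and the use of a small one-dimensional k-means for the clustering case are assumptions, since the text does not name a clustering algorithm; the function names are hypothetical.

```python
import numpy as np

def integrate_directions(angles_deg, method="median", bin_width=5.0):
    """Combine per-bin direction estimates (L3 values) into one direction.
    'mode' uses a histogram with an assumed bin width over -90..90 degrees."""
    a = np.asarray(angles_deg, dtype=float)
    if method == "mean":
        return float(np.mean(a))
    if method == "median":
        return float(np.median(a))
    if method == "mode":
        edges = np.arange(-90.0, 90.0 + bin_width, bin_width)
        counts, edges = np.histogram(a, bins=edges)
        i = int(np.argmax(counts))
        return float((edges[i] + edges[i + 1]) / 2)   # centre of the fullest bin
    raise ValueError(f"unknown method: {method}")

def cluster_directions(angles_deg, k=2, iters=50):
    """Tiny 1-D k-means for k >= 2 sources; centres start at evenly spaced quantiles."""
    a = np.sort(np.asarray(angles_deg, dtype=float))
    centres = np.quantile(a, (np.arange(k) + 0.5) / k)
    for _ in range(iters):
        labels = np.argmin(np.abs(a[:, None] - centres[None, :]), axis=1)
        centres = np.array([a[labels == j].mean() if np.any(labels == j) else centres[j]
                            for j in range(k)])
    return np.sort(centres)

print(integrate_directions([-31, -30, -29, -30, 10], "median"))
print(cluster_directions([-31, -30, -29, -30, 39, 40, 41, 40]))
```

The median and mode resist outlier bins (such as the 10 degree estimate above) better than the mean does.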
[0078]
If the directions of the individual frequency components are sufficient, the direction integration unit 207 is unnecessary.
One example is separating the voices of a plurality of speakers by assigning each frequency component to a speaker according to its estimated sound source direction.
[0079]
Fourth Embodiment
A sound processing apparatus according to the fourth embodiment is described with reference to FIG. 6.
[0080]
(1) Configuration of Sound Processing Device
FIG. 6 is a block diagram of the sound processing device according to the present embodiment.
[0081]
The sound processing device according to the present embodiment comprises: the sound receiving devices 101-1 to 101-N; the time-frequency conversion unit 208; the first beamformer 202-1 and the second beamformer 202-2, which filter the received signals for each frequency component to emphasize and output a signal arriving from a specific direction; the first power calculation unit 203-1 and the second power calculation unit 203-2, which calculate the power of the respective output signals for each frequency component; the power ratio calculation unit 204, which calculates their ratio for each frequency component; a selection unit 304, which selects the weighting coefficients corresponding to the power ratio from a weighting coefficient dictionary 303; weighting units 305, which weight the frequency components of each channel; an addition unit 306, which adds the weighted components; and a time-frequency inverse transform unit 307, which converts the result back into a time signal.
[0082]
(2) Operation Principle of Sound Processing Device
Next, the operating principle of the sound processing device of the present embodiment is described.
This embodiment realizes array processing that uses the beamformer output power ratio to emphasize and output only the input signal arriving from a specific direction.
[0083]
The procedure by which the power ratio calculation unit 204 obtains the beamformer output power ratio from the frequency components output by the time-frequency conversion unit 208 is the same as in the third embodiment.
[0084]
In the present embodiment, the selection unit 304 selects weighting factors from the weighting factor dictionary 303 using the beamformer output power ratio as the feature quantity.
That is, the weighting factor dictionary 303 stores feature quantities and weighting factors in association with each other, and the selection unit 304 selects from it the weighting factor corresponding to the feature quantity.
The weighting factors are set so that their values become larger as the direction approaches that of the sound source.
Here, the direction of the sound source means the direction of a target sound source set in advance.
In this way, only the target sound source, rather than every sound source, is emphasized and output, with the weights determined by its direction.
[0085]
In the weighting units 305, the weighting factor is multiplied by the frequency component of each channel, and the products are added in the addition unit 306.
Since the output of the power ratio calculation unit 204 is given for each frequency component, the subsequent processing is also performed per frequency component.
That is, when the power ratio of the k-th frequency component is p(k), the weighting factor W(n, k) by which the k-th frequency component of the n-th sound receiving device (channel number n) is multiplied is given by equation (1).
[0086]
$$W(n,k) = F\bigl(n, k, p(k)\bigr) \qquad (1)$$
Here, F(n, k, p) is a function that selects, from the weighting factor dictionary 303 for channel number n and frequency component k, the weighting factor whose feature quantity (here, the beamformer output power ratio) corresponds to p(k).
In the weighting unit 305-n for channel number n, W(n, k) is multiplied by the input signal X(n, k); the products are added in the addition unit 306 to obtain the output signal Y(k). Expressed as an equation, this is:
[0087]
$$Y(k) = \sum_{n=1}^{N} W(n,k)\,X(n,k) \qquad (2)$$
In general, W(n, k) is a complex number.
Y(k) is converted back into a time signal by the time-frequency inverse transform unit 307.
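Equations (1) and (2) can be sketched as follows. The dictionary layout (weights stored per class of the dB power ratio, per channel, and per bin), the class boundaries, and the function names are hypothetical; the patent obtains its dictionary by the method of JP-A-2007-10897, which is not reproduced here.

```python
import numpy as np

def select_weights(p, ratio_edges_db, weight_table):
    """Equation (1), W(n, k) = F(n, k, p(k)): for each bin k, return the stored
    weights of the ratio class that p(k) falls into.

    p:              beamformer output power ratio per bin, shape (K,)
    ratio_edges_db: increasing class boundaries in dB (hypothetical layout)
    weight_table:   complex weights, shape (len(ratio_edges_db) + 1, N, K)
    Returns W with shape (N, K).
    """
    cls = np.digitize(10 * np.log10(p), ratio_edges_db)   # class index per bin
    k = np.arange(len(p))
    return weight_table[cls, :, k].T

def weighted_sum(W, X):
    """Equation (2): Y(k) = sum over n of W(n, k) * X(n, k)."""
    return np.sum(W * X, axis=0)

# Toy dictionary with two classes split at 0 dB: class 0 (ratio below 0 dB)
# mutes the bin, class 1 passes it with equal weights on both channels.
edges = np.array([0.0])
table = np.zeros((2, 2, 3), dtype=complex)
table[1] = 0.5
p = np.array([0.5, 2.0, 4.0])
X = np.ones((2, 3), dtype=complex)
print(weighted_sum(select_weights(p, edges, table), X))
```

Here bins whose ratio favors the target-side beamformer are passed and the others are muted, which mirrors weights that grow toward the target direction.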
[0088]
The weighting factor dictionary 303 is obtained in advance by the method described in JP-A-2007-10897.
As described in that publication (Japanese Patent Application Laid-Open No. 2007-10897), the feature quantity may be made multidimensional by combining the beamformer output power ratio with other features.
[0089]
As in the third embodiment, the beamformer output power ratio may be used for only some of the frequency components.
[0090]
The weighted addition can also be carried out in the time domain by inverse Fourier transforming only the weights and convolving them with the time signals.
This rests on the basic principle that a product in the frequency domain corresponds to a convolution in the time domain.
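The equivalence can be checked numerically for one frame with the sketch below. It shows the circular-convolution form of the principle; frame-based linear filtering would additionally need zero-padding or overlap-add, which is omitted here. For an even frame length, `np.fft.irfft` ignores the imaginary parts of the DC and Nyquist bins in both paths, so the two results agree. The function names are hypothetical.

```python
import numpy as np

def apply_in_frequency(x, W):
    """Weight the real DFT of one frame bin by bin and transform back."""
    return np.fft.irfft(np.fft.rfft(x) * W, n=len(x))

def apply_in_time(x, W):
    """Same result in the time domain: inverse-transform the weights into an
    impulse response h, then convolve it circularly with the frame."""
    n = len(x)
    h = np.fft.irfft(W, n=n)
    idx = (np.arange(n)[:, None] - np.arange(n)[None, :]) % n   # idx[i, m] = (i - m) mod n
    return (x[idx] * h[None, :]).sum(axis=1)

rng = np.random.default_rng(1)
x = rng.standard_normal(64)                                  # one 64-sample frame
W = rng.standard_normal(33) + 1j * rng.standard_normal(33)   # per-bin complex weights
print(np.max(np.abs(apply_in_frequency(x, W) - apply_in_time(x, W))))
```

The printed maximum difference is at the level of floating-point rounding.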
[0091]
(Modifications)
The present invention is not limited to the above embodiments, and various modifications can be made without departing from its gist.
[0092]
FIG. 1 is a block diagram according to the first embodiment of the present invention.
FIG. 2 is a diagram showing the relationship between a sound source and the microphones.
FIG. 3 shows experimental examples of the beamformer output power ratio.
FIG. 4 is a block diagram according to the second embodiment of the present invention.
FIG. 5 is a block diagram according to the third embodiment of the present invention.
FIG. 6 is a block diagram according to the fourth embodiment of the present invention.
Explanation of Reference Signs
[0093]
101-1 to 101-N: sound receiving devices
102-1: first beamformer
102-2: second beamformer
103-1: first power calculation unit
103-2: second power calculation unit
104: power ratio calculation unit