JP2012209853

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2012209853
The present invention provides a sound collection device that performs phase correction
automatically, and thereby improves sound collection capability, even when there is an error in
the target sound estimation direction, the microphone interval, or the sound speed. The sound
collection device (1) comprises: a frequency analysis unit (100) that performs frequency analysis
on a multichannel signal; a noise estimation unit (101) that estimates the noise spectrum of a
reference channel; a steering vector calculation unit (102) that calculates a steering vector from
the target sound estimation direction, which indicates the estimated direction of the target sound
source as seen from each microphone, the microphone interval, which indicates the distance between
the reference microphone and each of the other microphones, and the temperature; and a phase
correction spectrum synthesis unit (103) that synthesizes the spectra after correcting the phase
of each channel using the spectrum of each channel output by the frequency analysis unit (100),
the noise spectrum of the reference channel output by the noise estimation unit (101), and the
steering vector output by the steering vector calculation unit (102). [Selected figure] Figure 1
Sound collector and program
[0001]
The present invention relates to a sound collection device and a program, and more particularly
to a technique for appropriately collecting a target sound using a plurality of microphones.
[0002]
In speech recognition and hands-free communication, as the distance between the speaker and
the microphone increases, the influence of noise or echo increases, and there is a problem that
the recognition performance or the intelligibility is degraded.
04-05-2019
1
[0003]
Various sound collection methods have been proposed as techniques for solving such problems.
In general, by using a plurality of microphones and directing directivity to a target sound
direction, the effects of noise and echo can be reduced as compared with the case of using one
microphone.
One known sound collection method using a plurality of microphones brings the channels into phase
by multiplying the signal of each channel by the complex conjugate of a steering vector calculated
from the target sound estimation direction and the microphone interval (see, for example, Patent
Document 1 and Non-Patent Document 1). The steering vector is a vector that represents the phase
difference between the microphones with respect to the target sound direction.
[0004]
Patent Document 1: JP 2009-239500 A
[0005]
Non-Patent Document 1: Jiro Oga, Yoshio Yamazaki, Yutaka Kanada, "Sound System and Digital
Processing", The Institute of Electronics, Information and Communication Engineers, 1995
[0006]
However, the conventional technology described above calculates the steering vector using the
target sound estimation direction, the microphone interval, and the sound speed as parameters, and
in an actual environment any of these parameters may contain an error. In addition, because of
echoes in the use environment, multiplying by the complex conjugate of the steering vector alone
is in some cases not enough to bring the channels into phase. Both effects reduce the sound
collection performance. The present invention has been made to solve these problems. Its object is
to provide a sound collection device that performs phase correction automatically, and thereby
improves sound collection capability, even when there is an error in the target sound estimation
direction, the microphone interval, or the sound speed, and to provide a program for causing a
computer to function as this device.
[0007]
A sound collection device according to the present invention comprises: a frequency analysis unit
that performs frequency analysis on a multichannel signal output from a plurality of microphones;
a noise estimation unit that estimates the noise spectrum of a reference channel, that is, the
channel output from the microphone serving as the reference among the multiple channels; a
steering vector calculation unit that calculates a steering vector from the target sound
estimation direction, which indicates the estimated direction of the target sound source as seen
from each microphone, and the microphone interval, which indicates the distance between the
reference microphone and each of the other microphones; and a phase correction spectrum synthesis
unit that synthesizes the spectra after correcting the phase of each channel using the spectrum of
each channel output by the frequency analysis unit, the noise spectrum of the reference channel
output by the noise estimation unit, and the steering vector output by the steering vector
calculation unit.
[0008]
According to the present invention, the spectrum synthesis is performed after the phase
correction of each channel is performed based on the frequency analysis result (spectrum of each
channel) of the multichannel signal, the noise spectrum of the reference channel and the steering
vector. Even when there is an error in the target sound estimation direction, the microphone
interval, or the speed of sound, phase correction is automatically performed, and a sound
collection device with improved sound collection capability can be provided.
[0009]
FIG. 1 is a block diagram showing the configuration of the sound collection device according to
Embodiment 1 of the present invention.
FIG. 2 is a flowchart showing the flow of processing in the frequency analysis unit.
FIG. 3 is a flowchart showing the flow of processing in the noise estimation unit.
FIG. 4 is a flowchart showing the flow of processing in the phase correction spectrum synthesis
unit.
FIG. 5 is an explanatory diagram of the maximum phase difference φmax between the reference
spectrum X0 and the speech spectrum S.
FIG. 6 is an explanatory diagram of the maximum phase difference θmax between the combined
spectrum X0+i of the reference channel and channel i and the speech spectrum S.
FIG. 7 is a block diagram showing the configuration of the sound collection device according to
Embodiment 2 of the present invention.
FIG. 8 is a flowchart showing the flow of processing in the noise correction unit.
[0010]
Embodiment 1
FIG. 1 is a block diagram showing the configuration of a sound collection device 1 according to
Embodiment 1 of the present invention. As shown in FIG. 1, the sound collection device 1 according
to the first embodiment brings the multichannel signals output from a plurality of microphones
(not shown) into phase and combines them. It includes a frequency analysis unit 100, a noise
estimation unit 101, a steering vector calculation unit 102, and a phase correction spectrum
synthesis unit 103.
[0011]
The frequency analysis unit 100 receives the multichannel signals output from the plurality of
microphones, performs frequency analysis such as FFT (Fast Fourier Transform) on them to calculate
a spectrum for each channel, and outputs the calculated spectrum (input spectrum) of each channel.
The noise estimation unit 101 estimates a noise spectrum for each channel of the multichannel
signal and outputs the estimated noise spectrum of each channel. This includes the noise spectrum
of the reference channel, where the reference channel is the channel output from the microphone
serving as the reference among the multiple channels. The noise estimation unit 101 outputs the
estimated noise spectrum of the reference channel.
[0012]
The steering vector calculation unit 102 calculates a theoretical steering vector from the target
sound estimation direction (target sound direction), the microphone interval, and the temperature,
all of which are input from outside the sound collection device 1, and outputs the calculated
theoretical steering vector. The target sound estimation direction indicates the estimated
direction of the target sound source (for example, a speaker) as seen from the position of each
microphone. The microphone interval is the distance between the reference microphone and each of
the other microphones. The phase correction spectrum synthesis unit 103 determines whether phase
correction is necessary from the multichannel spectra output from the frequency analysis unit 100,
the multichannel noise spectra output from the noise estimation unit 101, and the theoretical
steering vector output from the steering vector calculation unit 102. If it determines that
correction is necessary, it corrects the phase using the multichannel spectra, the noise spectrum
of the reference channel, and the steering vector, and then synthesizes the spectra.
[0013]
The frequency analysis unit 100, the noise estimation unit 101, the steering vector calculation
unit 102, and the phase correction spectrum synthesis unit 103 described above can be realized as
concrete means in which software and hardware cooperate, by having the CPU of the computer that
constitutes the sound collection device 1 execute a program according to the gist of the present
invention.
[0014]
Next, the operation will be described.
FIG. 2 shows the flow of processing in the frequency analysis unit 100 of the sound collection
device 1 according to Embodiment 1 of the present invention. The flow of processing of the
frequency analysis unit 100 will be described below with reference to FIG. 2.
[0015]
The frequency analysis unit 100 substitutes 0 for the channel number i (ST100). If the channel
number i is less than the channel count CH_NUM, which corresponds to the number of microphones
(ST101; YES), the frequency analysis unit 100 performs the process of ST102. If the channel number
i is equal to or greater than CH_NUM (ST101; NO), the frequency analysis unit 100 ends the
processing.
[0016]
The frequency analysis unit 100 applies a window function such as a Hamming window to the L input
samples (L(t−1) ≤ n < Lt) in frame t of the multichannel signal xi(n), and then performs frequency
analysis such as FFT to calculate the spectrum Xi,t(f) and the power spectrum Pi,t(f) (ST102).
Here, f indicates the frequency band number. Hereinafter, the frequency band number (f) will be
omitted from the notation where appropriate. The frequency analysis unit 100 increments the
channel number i (ST103) and returns to the process of ST101.
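
The loop of ST100 to ST103 can be sketched as follows. This is a minimal illustration, not the
patented implementation: the function names are hypothetical, frames are assumed to be numbered
from 1 so that frame t covers samples L(t−1) ≤ n < Lt, and a naive DFT stands in for the FFT so
that the sketch needs only the standard library.

```python
import cmath
import math

def hamming(L):
    """Hamming window of length L (symmetric form)."""
    if L == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (L - 1)) for n in range(L)]

def analyze_frame(x, t, L):
    """Return (X, P): spectrum and power spectrum of frame t of one channel.

    Assumption: frame t covers samples L*(t-1) <= n < L*t (frames start at 1).
    """
    frame = x[L * (t - 1): L * t]
    windowed = [s * w for s, w in zip(frame, hamming(L))]
    # Naive O(L^2) DFT; an FFT returns the same values faster.
    X = [sum(windowed[n] * cmath.exp(-2j * math.pi * f * n / L) for n in range(L))
         for f in range(L)]
    P = [abs(v) ** 2 for v in X]
    return X, P

def analyze_all_channels(channels, t, L):
    """Process every channel in turn, mirroring ST100 to ST103."""
    results = []
    i = 0                                                  # ST100
    while i < len(channels):                               # ST101, CH_NUM = len(channels)
        results.append(analyze_frame(channels[i], t, L))   # ST102
        i += 1                                             # ST103
    return results
```

Calling analyze_all_channels(channels, t, L) returns one (Xi,t, Pi,t) pair per channel for
frame t.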
[0017]
FIG. 3 shows the flow of processing in the noise estimation unit 101 of the sound collection
device 1 according to Embodiment 1 of the present invention. The flow of processing of the noise
estimation unit 101 will be described below with reference to FIG. 3.
[0018]
The noise estimation unit 101 substitutes 0 for the channel number i (ST200). If the channel
number i is less than the channel count CH_NUM (ST201; YES), the noise estimation unit 101
performs the process of ST202. If the channel number i is equal to or greater than CH_NUM (ST201;
NO), the noise estimation unit 101 ends the processing.
[0019]
The noise estimation unit 101 substitutes 0 for the frequency band number f (ST202). If the
frequency band number f is less than the number of FFT points N_FFT (ST203; YES), the noise
estimation unit 101 performs the process of ST204. If the frequency band number f is equal to or
greater than N_FFT (ST203; NO), the noise estimation unit 101 performs the process of ST207.
[0020]
If the frame number t is less than the initialization frame count INIT_FRAME, or if the condition
Pi,t(f) − μi(f) < kσi(f) is satisfied (ST204; YES), the noise estimation unit 101 performs the
process of ST205. If the frame number t is equal to or greater than INIT_FRAME and the condition
Pi,t(f) − μi(f) ≥ kσi(f) holds (ST204; NO), the noise estimation unit 101 performs the process of
ST206.
[0021]
Here, the initialization frame count INIT_FRAME is the number of frames used to learn the initial
values of μi(f) and σi(f) in the noise estimation unit 101. μi(f) is the average power spectrum of
frequency band number f of channel i, and σi(f) is the standard deviation of the power spectrum of
frequency band number f of channel i. k is an update parameter: a large value makes the estimate
follow noise fluctuations closely, and a small value makes it follow them less closely.
[0022]
The noise estimation unit 101 updates the average power spectrum μi(f) and the standard deviation
σi(f) of the power spectrum based on the following equations (1) to (7) (ST205).
[Equations (1) to (7): image not reproduced in this text]
[0023]
In equations (1) to (7) above, SUM1i(f) and SUM2i(f) are the addition buffers for frequency band
number f, BUFSIZE is the number of frames used to calculate the statistics, cnti(f) is the counter
for frequency band number f, and oldest is the oldest frame number currently included in the
addition buffers. The noise estimation unit 101 increments the frequency band number f (ST206) and
returns to the process of ST203.
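
Equations (1) to (7) are not reproduced in this text, so the following is only a plausible sketch
of the ST204 and ST205 logic for one frequency band of one channel. It assumes that SUM1 holds the
sum of the buffered power values, that SUM2 holds the sum of their squares, and that the oldest
buffered frame is dropped once BUFSIZE frames are held. The class and method names are
hypothetical.

```python
import math
from collections import deque

class BandNoiseStats:
    """Running mean and standard deviation of the power in one frequency band."""

    def __init__(self, bufsize, init_frames, k):
        self.bufsize = bufsize          # BUFSIZE: frames used for the statistics
        self.init_frames = init_frames  # INIT_FRAME: unconditional learning period
        self.k = k                      # update parameter k
        self.buf = deque()              # power values currently in the statistics
        self.sum1 = 0.0                 # assumed meaning of SUM1: sum of power values
        self.sum2 = 0.0                 # assumed meaning of SUM2: sum of squared values
        self.mu = 0.0                   # average power spectrum
        self.sigma = 0.0                # standard deviation of the power spectrum

    def is_noise_like(self, t, p):
        """ST204: always learn during initialization, then gate on k * sigma."""
        return t < self.init_frames or (p - self.mu) < self.k * self.sigma

    def update(self, t, p):
        """ST205, applied only when ST204 answers YES. Returns True if updated."""
        if not self.is_noise_like(t, p):
            return False
        if len(self.buf) == self.bufsize:
            oldest_p = self.buf.popleft()   # drop the oldest buffered frame
            self.sum1 -= oldest_p
            self.sum2 -= oldest_p * oldest_p
        self.buf.append(p)
        self.sum1 += p
        self.sum2 += p * p
        cnt = len(self.buf)
        self.mu = self.sum1 / cnt
        # Clamp at zero to absorb rounding error in the variance.
        self.sigma = math.sqrt(max(self.sum2 / cnt - self.mu ** 2, 0.0))
        return True
```

A frame whose power exceeds the mean by kσ or more is treated as containing the target sound and
leaves the statistics unchanged, which is what keeps speech from leaking into the noise estimate.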
[0024]
The noise estimation method described in the first embodiment is only an example; estimating the
noise with another algorithm is also within the scope of the present invention.
[0025]
The steering vector calculation unit 102 calculates the ideal phase difference generated between
the microphones (the steering vector) from the target sound estimation direction θ (rad) given
from outside the sound collection device 1, the microphone interval di (m), which is the distance
between the microphone of channel number i and the reference microphone, and the temperature tm
(°C).
For example, when the microphones are installed in a straight line and the direction perpendicular
to the line connecting the microphones is defined as 0 (rad), the steering vector ai(f) of channel
i with respect to the reference channel is calculated from the following equations (8) to (10).
[Equations (8) to (10): image not reproduced in this text]
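
Because equations (8) to (10) are not reproduced here, the sketch below substitutes the textbook
far-field model for a linear array rather than the patent's exact formulas. It assumes the common
linear approximation c ≈ 331.5 + 0.6·tm (m/s) for the speed of sound, an arrival delay of
di·sin θ / c, and a negative sign in the exponent. The sampling rate fs and the function names are
hypothetical additions.

```python
import cmath
import math

def sound_speed(tm_celsius):
    """Assumed linear approximation of the speed of sound in air (m/s)."""
    return 331.5 + 0.6 * tm_celsius

def steering_vector(theta, d_i, tm_celsius, n_fft, fs):
    """Phase factor a_i(f) of channel i relative to the reference, per band.

    theta: target direction in rad, 0 = perpendicular to the array line.
    d_i:   distance in metres between microphone i and the reference microphone.
    fs:    sampling rate in Hz (hypothetical parameter, not named in the text).
    """
    tau = d_i * math.sin(theta) / sound_speed(tm_celsius)   # arrival delay (s)
    # Band f is mapped to the frequency f * fs / n_fft in Hz.
    return [cmath.exp(-2j * math.pi * (f * fs / n_fft) * tau)
            for f in range(n_fft)]
```

With θ = 0 the delay is zero and every element equals 1, which matches the broadside case where
all microphones receive the wavefront at the same time.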
[0026]
FIG. 4 shows the flow of processing in the phase correction spectrum synthesis unit 103 of the
sound collection device 1 according to the first embodiment of the present invention. The flow of
processing of the phase correction spectrum synthesis unit 103 will be described below with
reference to FIG. 4.
[0027]
The phase correction spectrum synthesis unit 103 substitutes 0 for the frequency band number f
(ST300). If the frequency band number f is less than the number of FFT points N_FFT (ST301; YES),
the phase correction spectrum synthesis unit 103 performs the process of ST302. If the frequency
band number f is equal to or greater than N_FFT (ST301; NO), the phase correction spectrum
synthesis unit 103 ends the processing.
[0028]
The phase correction spectrum synthesis unit 103 calculates the maximum phase difference φmax
between the spectrum X0(f) of the reference channel and the speech spectrum S(f) according to the
following equations (11) and (12) (ST302).
[Equations (11) and (12): image not reproduced in this text]
In equations (11) and (12), N0(f) represents the noise spectrum of the reference channel and μ0(f)
represents the average power spectrum of the reference channel.
[0029]
Here, the derivation of the maximum phase difference φmax between the spectrum X0(f) of the
reference channel and the speech spectrum S(f) will be described. FIG. 5 is a schematic view
showing the physical meaning of the maximum phase difference φmax. The spectrum X0 of the
reference channel is represented as a vector on the complex plane. Since X0 is the sum of speech
(speech spectrum S) and noise (noise spectrum N0), the relationship X0 = S + N0 holds. The speech
spectrum S therefore lies on a circle whose center is X0 and whose radius is the magnitude |N0| of
the noise spectrum; in FIG. 5, the circumference of the circle drawn with a dash-dot line is the
range in which S can exist. The phase difference between X0 and S is largest when S lies on a
tangent drawn from the origin (the start point of the vectors X0 and S) to this circle, so the
maximum phase difference φmax can be calculated from equation (11).
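
The tangent construction gives sin φmax = |N0| / |X0|, which is the closed form stated later in
paragraph [0048]. The sketch below evaluates it and includes a brute-force check that samples S on
the circle. The function names are hypothetical, and clamping the ratio to 1 when |N0| ≥ |X0| is a
guard chosen for this sketch; the text does not say how that case is handled.

```python
import cmath
import math

def phi_max(n0_mag, x0_mag):
    """Largest phase gap between X0 and S when X0 = S + N0 and |N0| is known."""
    if x0_mag <= 0.0:
        return math.pi / 2                  # guard for an empty band (assumption)
    ratio = min(n0_mag / x0_mag, 1.0)       # clamp: asin is undefined above 1
    return math.asin(ratio)

def worst_case_phase(x0, n0_mag, steps=3600):
    """Numerically scan S = X0 - N0 over all noise phases and return the
    largest phase difference between S and X0."""
    worst = 0.0
    for k in range(steps):
        s = x0 - cmath.rect(n0_mag, 2 * math.pi * k / steps)
        worst = max(worst, abs(cmath.phase(s * x0.conjugate())))
    return worst
```

For X0 = 2 and |N0| = 1 both functions give approximately π/6, in line with the geometry of
FIG. 5.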
[0030]
Returning to FIG. 4, the phase correction spectrum synthesis unit 103 substitutes 0 for the
channel number i (ST303). If the channel number i is less than the channel count CH_NUM (ST304;
YES), the phase correction spectrum synthesis unit 103 performs the process of ST305. If the
channel number i is equal to or greater than CH_NUM (ST304; NO), the phase correction spectrum
synthesis unit 103 performs the process of ST310.
[0031]
The phase correction spectrum synthesis unit 103 calculates the maximum phase difference θmax
between the combined spectrum X0+i(f) of the reference channel and channel i and the speech
spectrum S(f) according to the following equations (13) to (16) (ST305). In equation (13), *
represents the complex conjugate.
[Equations (13) to (16): image not reproduced in this text]
[0032]
Here, the process of deriving the maximum phase difference θmax between the synthesized
spectrum X0 + i (f) of the reference channel and channel i and the speech spectrum S (f) will be
described. FIG. 6 is a schematic view showing the physical meaning of the maximum phase
difference θmax.
[0033]
The combined spectrum X0+i of the reference channel and channel i is represented as a vector on
the complex plane. Because X0+i is the sum of the spectrum of the reference channel and the
in-phase spectrum of channel i, the relationship X0+i = 2S + N0+i holds. Therefore, 2S lies on a
circle whose center is X0+i and whose radius is |N0+i|, the magnitude of the combined noise
spectrum of the reference channel and channel i; in FIG. 6, the circumference of the circle drawn
with a dash-dot line is the range in which 2S can exist. The phase difference between X0+i and the
speech spectrum S is largest when 2S lies on a tangent drawn from the origin (the start point of
the vectors X0+i and 2S) to this circle, so the maximum phase difference θmax can be calculated
from equation (15).
[0034]
Further, assuming noises observed in the reference channel and channel i to have the same
degree of power and no correlation, | N0 + i | can be calculated from equation (16).
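
The same tangent argument as for φmax, applied to the circle of radius |N0+i| around X0+i, gives
sin θmax = |N0+i| / |X0+i|, and for uncorrelated noise of equal power |N0+i| = √2·|N0|, which
agrees with the closed form in paragraph [0048]. In the sketch below, the combination
X0+i = X0 + ai*·Xi is an assumption about equation (13), based on the statement that * denotes
the complex conjugate, and the function names are hypothetical.

```python
import math

def combined_spectrum(x0, xi, a_i):
    """Assumed form of Eq. (13): add channel i after undoing its steering phase."""
    return x0 + a_i.conjugate() * xi

def combined_noise_mag(n0_mag):
    """|N0+i| for two uncorrelated noises of equal power (Eq. (16))."""
    return math.sqrt(2.0) * n0_mag

def theta_max(n0_mag, x0i_mag):
    """Largest phase gap between X0+i and S when X0+i = 2S + N0+i."""
    if x0i_mag <= 0.0:
        return math.pi / 2                       # guard for an empty band (assumption)
    ratio = min(combined_noise_mag(n0_mag) / x0i_mag, 1.0)   # keep asin in range
    return math.asin(ratio)
```

Adding a second in-phase channel doubles the speech term but raises the noise magnitude only by a
factor of √2, so θmax is smaller than φmax whenever the two channels carry similar levels.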
[0035]
Returning to FIG. 4, the phase correction spectrum synthesis unit 103 calculates the phase
difference ε between the spectrum Xi of channel i and the combined spectrum X0+i of the reference
channel and channel i from equation (17) (ST306).
[Equation (17): image not reproduced in this text]
[0036]
If the phase difference ε calculated by equation (17) exceeds (φmax + θmax) calculated by
equations (11) to (16) (ST307; YES), the phase correction spectrum synthesis unit 103 determines
that the phase needs to be corrected and performs the process of ST308. If the phase difference ε
is equal to or less than (φmax + θmax) (ST307; NO), the phase correction spectrum synthesis unit
103 performs the process of ST309.
[0037]
If the phase difference ε exceeds (φmax + θmax) (ST307; YES), the phase correction spectrum
synthesis unit 103 corrects the phase of the spectrum of channel i by α = ε − (φmax + θmax)
according to the following equation (18) (ST308).
[Equation (18): image not reproduced in this text]
[0038]
Here, the meaning of equation (18) will be described. From FIG. 5, the maximum phase difference
between the spectrum X0 of the reference channel and the speech spectrum S is φmax, and from
FIG. 6, the maximum phase difference between the combined spectrum X0+i of the reference channel
and channel i and the speech spectrum S is θmax. The maximum phase difference between the spectrum
X0 of the reference channel and the combined spectrum X0+i of the reference channel and channel i
is therefore (φmax + θmax). Consequently, if the phase difference ε between the spectrum Xi of
channel i and the combined spectrum X0+i is larger than (φmax + θmax), the steering vector ai(f)
is considered to contain an error. Equation (18) corrects this error of the steering vector ai(f)
so that the phase difference ε between the spectrum X0 of the reference channel and the combined
spectrum X0+i of the reference channel and channel i falls within the maximum phase difference
(φmax + θmax).
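
The decision of ST307 and the correction of ST308 can be sketched as below. Equations (17) and
(18) are not reproduced in this text, so several points are assumptions: ε is taken as the signed
phase of Xi relative to X0+i, Xi is assumed to have been multiplied by the conjugate steering
vector already, and the rotation is applied in the direction that reduces |ε|. The function name
and the single limit argument (φmax + θmax) are hypothetical.

```python
import cmath
import math

def correct_phase(xi_aligned, x0i, limit):
    """Rotate xi_aligned so its phase gap to x0i does not exceed limit.

    xi_aligned: spectrum of channel i after steering (assumption).
    x0i:        combined spectrum X0+i, treated as fixed in this sketch.
    limit:      phi_max + theta_max in rad.
    """
    eps = cmath.phase(xi_aligned * x0i.conjugate())   # signed gap, within [-pi, pi]
    if abs(eps) <= limit:                             # ST307: NO, leave untouched
        return xi_aligned
    alpha = abs(eps) - limit                          # excess over the allowed gap
    # Rotate back toward x0i by the excess; the magnitude is unchanged.
    return xi_aligned * cmath.exp(-1j * math.copysign(alpha, eps))
```

After the rotation the remaining phase gap equals the limit exactly, so the channel is moved only
as far as the noise-based bound justifies and never forced fully into phase.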
[0039]
Since the phase correction spectrum synthesis unit 103 according to the first embodiment corrects
the phase of the spectrum of channel i according to equation (18) above, it suppresses the
decrease in sound collecting ability caused by an error of the steering vector ai(f) and improves
the sound collecting ability of the sound collection device 1.
[0040]
Returning to FIG. 4, the phase correction spectrum synthesizing unit 103 increments the channel
number i (ST309) and performs the process of ST304.
[0041]
When the channel number i becomes equal to or greater than the channel count CH_NUM (ST304; NO),
the phase correction spectrum synthesis unit 103 synthesizes the final output spectrum XBF(f) by
the following equation (19) (ST310).
[Equation (19): image not reproduced in this text]
[0042]
The phase correction spectrum synthesis unit 103 increments the frequency band number f
(ST311) and performs the process of ST301.
[0043]
The above is the operation of the phase correction spectrum synthesis unit 103, but after
performing IFFT (Inverse Fast Fourier Transform) on the synthesized spectrum, it may be output
as a time signal.
[0044]
With this configuration, the spectra are combined after the phase of each channel has been
corrected using the spectrum Xi(f) of the multichannel signal, the noise spectrum N0(f) of the
reference channel, and the steering vector ai(f). Therefore, even if there is an error in the
steering vector, phase correction is performed automatically and the sound collection capability
improves.
Possible causes of an error in the steering vector include errors in the microphone interval, the
temperature, or the target sound estimation direction, as well as echoes. The sound collection
device 1 according to the first embodiment does not depend on the cause of the error; the
correction is performed automatically from the observed signals alone.
[0045]
In the first embodiment, the channel with channel number i = 0 is described as the reference
channel, but using any other channel as the reference channel is also within the scope of the
present invention.
[0046]
As described above, the sound collection device 1 according to the first embodiment includes: the
frequency analysis unit 100, which performs frequency analysis on the multichannel signals output
from a plurality of microphones; the noise estimation unit 101, which estimates the noise spectrum
of the reference channel output from the microphone serving as the reference among the multiple
channels; the steering vector calculation unit 102, which calculates a steering vector from the
target sound estimation direction indicating the estimated direction of the target sound source as
seen from each microphone, the microphone interval indicating the distance between the reference
microphone and each of the other microphones, and the temperature; and the phase correction
spectrum synthesis unit 103, which synthesizes the spectra after correcting the phase of each
channel using the spectrum of each channel output from the frequency analysis unit 100, the noise
spectrum of the reference channel output from the noise estimation unit 101, and the steering
vector output from the steering vector calculation unit 102. Because the spectra are synthesized
after the phase of each channel has been corrected from the spectrum of each channel, the noise
spectrum of the reference channel, and the steering vector, phase correction is performed
automatically even when there is an error in the target sound estimation direction, the microphone
interval, or the sound speed, and a sound collection device with improved sound collection
capability can be provided.
[0047]
Further, according to Embodiment 1, the phase correction spectrum synthesis unit 103 calculates
the maximum phase difference (φmax + θmax) between the spectrum of the reference channel and the
combined spectrum of the reference channel and each channel, corrects the phase of each channel
based on the calculated maximum phase difference (φmax + θmax), and then synthesizes the spectra.
Phase correction is therefore performed automatically even if there is an error in the target
sound estimation direction, the microphone interval, or the sound speed, and a sound collection
device with improved sound collection capability can be provided.
[0048]
Further, according to the first embodiment, the phase correction spectrum synthesis unit 103
calculates the maximum phase difference φmax between the spectrum of the reference channel and
the speech spectrum of the target sound as sin⁻¹(|N0| / |X0|), calculates the maximum phase
difference θmax between the combined spectrum of the reference channel and each channel and the
speech spectrum as sin⁻¹(|√2·N0| / |X0+i|), and takes (φmax + θmax) as the maximum phase
difference between the spectrum of the reference channel and the combined spectrum of the
reference channel and each channel. When an error of the steering vector produces a phase
difference larger than this calculated maximum, the phase can therefore be corrected, and a sound
collection device with improved sound collection capability can be provided.
[0049]
Second Embodiment
The sound collection device 1 according to the first embodiment calculates the maximum phase
difference (φmax + θmax) from the estimated magnitude |N0| of the noise spectrum of the reference
channel. If |N0| contains an error, the maximum phase difference also contains an error, which may
affect the subsequent phase correction process. The second embodiment described below therefore
corrects the error of |N0|.
[0050]
FIG. 7 is a block diagram showing a configuration of a sound collection device 1 according to
Embodiment 2 of the present invention.
In FIG. 7, the same reference numerals as those in FIG. 1 denote the same or corresponding parts,
and the description thereof will be omitted.
[0051]
As shown in FIG. 7, the sound collection device 1 according to the second embodiment includes a
frequency analysis unit 100, a noise estimation unit 101, a steering vector calculation unit 102,
a phase correction spectrum synthesis unit 103, and a noise correction unit 104. The difference
from the first embodiment is the addition of the noise correction unit 104.
[0052]
The noise correction unit 104 determines whether the magnitude of the noise spectrum needs to be
corrected from the multichannel spectra output from the frequency analysis unit 100 and the
theoretical steering vector output from the steering vector calculation unit 102. If it determines
that correction is necessary, it corrects the magnitude of the noise spectrum of the reference
channel output from the noise estimation unit 101 using the multichannel spectra and the steering
vector.
[0053]
The frequency analysis unit 100, the noise estimation unit 101, the steering vector calculation
unit 102, the phase correction spectrum synthesis unit 103, and the noise correction unit 104
described above can be realized as concrete means in which software and hardware cooperate, by
having the CPU of the computer that constitutes the sound collection device 1 execute a program
according to the gist of the present invention.
[0054]
Next, the operation will be described.
The operations of the frequency analysis unit 100 and the noise estimation unit 101 are the same
as in the first embodiment, and their description is therefore omitted.
FIG. 8 shows the flow of processing in the noise correction unit 104 of the sound collection
device 1 according to Embodiment 2 of the present invention.
The operation of the noise correction unit 104 will be described below with reference to FIG. 8.
[0055]
The noise correction unit 104 substitutes 0 for the frequency band number f (ST400).
If the frequency band number f is less than the number of FFT points N_FFT (ST401; YES), the noise
correction unit 104 performs the process of ST402. If the frequency band number f is equal to or
greater than N_FFT (ST401; NO), the noise correction unit 104 ends the processing.
[0056]
The noise correction unit 104 calculates the maximum phase difference φmax between the
spectrum X0 (f) of the reference channel and the speech spectrum S (f) according to the above
equations (11) to (12) (ST402).
[0057]
Noise correction section 104 calculates maximum phase difference θmax between synthesized
spectrum X0 + i (f) of reference channel and channel i and speech spectrum S (f) according to the
above equations (13) to (16) (ST 403).
[0058]
The noise correction unit 104 calculates the phase difference ε between the spectrum Xi of channel
i and the combined spectrum X0+i of the reference channel and channel i from equation (17) above
(ST404).
If the phase difference ε calculated by equation (17) exceeds (φmax + θmax) calculated by
equations (11) to (16) (ST405; YES), the noise correction unit 104 determines that the magnitude
of the noise spectrum of the reference channel needs to be corrected and performs the process of
ST406. If the phase difference ε is equal to or less than (φmax + θmax) (ST405; NO), the noise
correction unit 104 performs the process of ST407.
[0059]
If the phase difference ε exceeds (φmax + θmax) (ST405; YES), the noise correction unit 104
calculates the corrected magnitude |N0'| of the noise spectrum of the reference channel according
to equation (20) (ST406).
[Equation (20): image not reproduced in this text]
[0060]
Here, the meaning of equation (20) will be described. From FIG. 5, the relationship
|N0| = |X0| sin φmax holds. Because ST405 has determined that ε > (φmax + θmax), it follows that
(ε − θmax) > φmax, so the corrected magnitude |N0'| is larger than |N0|. Equation (20) therefore
means that, when the magnitude |N0| of the noise spectrum of the reference channel has been
estimated smaller than its actual value, and the maximum phase difference φmax has consequently
also been underestimated, the magnitude of the noise spectrum of the reference channel is
corrected automatically.
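
Equation (20) itself is not reproduced in this text. The relation |N0| = |X0| sin φmax together
with the condition (ε − θmax) > φmax suggests the form |N0'| = |X0| sin(ε − θmax), and the sketch
below uses that form as an assumption. The function name is hypothetical, and capping the angle at
π/2 is a guard added for this sketch.

```python
import math

def corrected_noise_mag(x0_mag, n0_mag, eps, theta_max_val):
    """Return the (possibly enlarged) noise magnitude of the reference channel.

    x0_mag:        |X0|, magnitude of the reference-channel spectrum.
    n0_mag:        |N0|, current noise magnitude estimate.
    eps:           observed phase difference in rad (non-negative).
    theta_max_val: theta_max in rad.
    """
    if x0_mag <= 0.0:
        return n0_mag                                  # nothing to correct against
    phi = math.asin(min(n0_mag / x0_mag, 1.0))         # phi_max from |N0| / |X0|
    if eps <= phi + theta_max_val:                     # ST405: NO, keep the estimate
        return n0_mag
    angle = min(eps - theta_max_val, math.pi / 2)      # guard: sin peaks at pi/2
    return x0_mag * math.sin(angle)                    # assumed form of Eq. (20)
```

Because the branch is taken only when (ε − θmax) > φmax, the returned value is never smaller than
the original estimate, consistent with the description above.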
[0061]
Because the noise correction unit 104 according to the second embodiment corrects the magnitude
of the noise spectrum of the reference channel according to equation (20) above, the correction
is performed automatically even if the estimated magnitude of the noise spectrum of the reference
channel contains an error. Because the maximum phase differences φmax and θmax are then
calculated from the corrected magnitude of the noise spectrum of the reference channel, they are
corrected automatically as well. This has the effect of further improving the sound collecting
ability of the sound collection device 1.
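The automatic correction of φmax described here can be illustrated with a short Python sketch. It assumes the relations given in the summary of this embodiment, φmax = sin⁻¹(|N0| / |X0|) and |N0'| = |X0| sin(ε − θmax). The function name `phi_max_from_noise` and the numbers are hypothetical, and θmax is held fixed only because its defining equations (13) to (16) are not reproduced here.

```python
import math


def phi_max_from_noise(n_mag: float, x0_mag: float) -> float:
    """phi_max = asin(|N| / |X0|), with the ratio clamped to 1 (assumption)."""
    return math.asin(min(n_mag / x0_mag, 1.0))


# Hypothetical single-bin values.
x0_mag, n0_mag = 2.0, 0.2
theta_max, eps = math.radians(4), math.radians(25)

phi_before = phi_max_from_noise(n0_mag, x0_mag)        # from the underestimated |N0|
n0_corrected = x0_mag * math.sin(eps - theta_max)      # corrected magnitude |N0'|
phi_after = phi_max_from_noise(n0_corrected, x0_mag)   # from the corrected |N0'|

# After correction, phi_max widens to (eps - theta_max), so the observed phase
# difference eps sits on the bound (phi_max + theta_max) rather than beyond it.
print(math.degrees(phi_before), math.degrees(phi_after))
```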
[0062]
The operations of the steering vector calculation unit 102 and the phase correction spectrum
synthesis unit 103 are the same as in the first embodiment, except that the phase correction
spectrum synthesis unit 103 uses the corrected magnitude of the noise spectrum of the reference
channel output from the noise correction unit 104. Their description is therefore omitted.
[0063]
With the above configuration, the magnitude of the noise spectrum of the reference channel is
corrected using the multi-channel spectra and the steering vector, so correction is performed
automatically even if the estimated magnitude of the noise spectrum contains an error, and as a
result the sound collecting ability can be improved.
Possible causes of error in the magnitude of the noise spectrum of the reference channel include
temporal variation of the noise, but the present method has the feature of correcting
automatically from the observed signals alone, regardless of the cause of the error.
[0064]
Although the second embodiment has been described with the channel of channel number i = 0 as
the reference channel, it goes without saying that using any other channel as the reference
channel is also within the scope of the present invention.
[0065]
As described above, the sound collection device 1 according to the second embodiment includes:
the frequency analysis unit 100 that performs frequency analysis on the multi-channel signals
output from a plurality of microphones; the noise estimation unit 101 that estimates the noise
spectrum of the reference channel, which is the channel output from the reference microphone
among the multiple channels; the steering vector calculation unit 102 that calculates a steering
vector from the target sound estimation direction, which indicates the estimated direction of the
sound source of the target sound relative to each microphone, the microphone spacing, which
indicates the spacing between the reference microphone and the other microphones, and the
temperature; the noise correction unit 104 that corrects the magnitude of the noise spectrum of
the reference channel output from the noise estimation unit 101, using the spectrum of each
channel output from the frequency analysis unit 100 and the steering vector output from the
steering vector calculation unit 102; and the phase correction spectrum synthesis unit 103 that
performs phase correction of each channel using the spectrum of each channel output from the
frequency analysis unit 100, the noise spectrum of the reference channel output from the noise
correction unit 104, and the steering vector output from the steering vector calculation unit
102, and then synthesizes the spectra. Consequently, phase correction is performed automatically
even if there is an error in the target sound estimation direction, the microphone spacing, or
the sound speed, and correction is performed automatically even if there is an error in the
estimated magnitude of the noise spectrum of the reference channel, so a sound collection device
with improved sound collection capability can be provided.
[0066]
Further, according to the second embodiment, the noise correction unit 104 calculates the
maximum phase difference (φmax + θmax) between the spectrum of the reference channel and the
combined spectrum of the reference channel and an arbitrary channel, and corrects the magnitude
of the noise spectrum of the reference channel on the basis of the calculated maximum phase
difference (φmax + θmax). Correction is therefore performed automatically even if there is an
error in the estimated magnitude of the noise spectrum of the reference channel, and the sound
collection device 1 with improved sound collecting ability can be provided.
[0067]
Further, according to the second embodiment, the noise correction unit 104 calculates the
maximum phase difference φmax between the spectrum of the reference channel and the speech
spectrum of the target sound as sin⁻¹(|N0| / |X0|), calculates the maximum phase difference θmax
between the synthesized spectrum of the reference channel and each channel and the speech
spectrum as sin⁻¹{|(22) N0| / |X0+i|}, and calculates the corrected magnitude of the noise
spectrum of the reference channel as |X0| sin(ε − θmax). Even if there is an error in the
estimated noise spectrum of the reference channel, correction is therefore performed
automatically on the basis of the calculated maximum phase differences, and the sound collection
device 1 with improved sound collection capability can be provided.
[0068]
The present invention is useful, for example, for improving speech recognition performance in
noisy environments, or for improving call quality, in car navigation systems, mobile phones,
information terminals, and the like.
[0069]
Within the scope of the invention, the embodiments may be freely combined, any component of any
embodiment may be modified, and any component of any embodiment may be omitted.
[0070]
1 sound collection apparatus, 100 frequency analysis unit, 101 noise estimation unit, 102
steering vector calculation unit, 103 phase correction spectrum synthesis unit, 104 noise
correction unit.