Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2015164267
Abstract: The present invention provides a sound collection device that extracts sound waves from a desired sound source more reliably and outputs them to a speech recognition device. A sound collection device (11) comprises a first microphone array (121), in which a plurality of microphones for observing sound waves coming from a desired sound source are linearly arranged, and a second microphone array (122), in which a plurality of microphones for observing sound waves coming from the desired sound source are linearly arranged parallel to the first microphone array and in front of the first microphone array as viewed from the desired sound source. The sound wave coming from the desired sound source is extracted based on the relationship between the sound-wave components from the desired sound source included in the plurality of first sound wave signals and those included in the plurality of second sound wave signals, and the relationship between the sound-wave components from sound sources other than the desired sound source included in the plurality of first sound wave signals and those included in the plurality of second sound wave signals. [Selected figure] Figure 1
Sound collecting device, sound collecting method, and program
[0001]
The present disclosure relates to a sound collection device and a sound collection method, and a
program, and more particularly, to a sound collection device and a sound collection method, and
a program capable of more reliably extracting a sound wave from a desired sound source.
[0002]
11-04-2019
1
Heretofore, in sound pickup apparatuses that capture sound with a microphone as an electric signal, techniques for extracting (or emphasizing) the sound coming from a desired sound source have been used to eliminate the adverse effects of ambient noise.
For example, ambient noise has caused a decline in the speech recognition rate in speech recognition technology, and a drop in intelligibility and a loss of the sense of presence in video conference systems.
[0003]
Therefore, in order to address these causes, research using microphone array processing technology, in which a plurality of microphones are arranged, has been conducted for some time. For example, microphone array processing exploits the fact that the sound from a desired sound source, such as a human voice, and sounds such as air-conditioner noise, ambient speech, and the audio of a television or radio arrive at the microphone array at different incident angles.
[0004]
For example, Patent Document 1 discloses a technique for extracting a sound source signal with a high signal-to-noise ratio by processing, in the spatial Fourier transform domain, signals collected by a two-dimensional planar microphone array.
[0005]
However, under normal circumstances, noise often comes from behind the speaker. For example, in the case of a vending machine operated by voice recognition, cars may pass behind the purchaser, and engine noise, the voices of passers-by, street announcements, and so on create an environment in which sounds arrive from all directions.
As described above, in such an environment, where the sound from the desired sound source and noise come from the same direction, it was difficult for conventional microphone array technology, which relies on the difference in the incident angles at which sounds arrive, to extract the sound from the desired sound source.
[0006]
In addition, as one application of conventional microphone array processing technology, acoustic holography has been studied to determine which part of an object, such as an engine or a vehicle, is generating a large amount of sound.
[0007]
For example, Patent Documents 2 and 3 disclose techniques for searching for the position of a
sound source by improving the acoustic holography method.
[0008]
These techniques model how the sound physically propagates and trace the propagation back to its origin to identify the sound source.
They either prepare a large number of microphones on a two-dimensional plane, or sweep a pair of microphones or the like to cover an area.
[0009]
Patent Document 1: JP 2012-165273 A; Patent Document 2: JP 6-109528 A; Patent Document 3: JP 8-233931 A
[0010]
As described above, conventional microphone array processing technology extracts sound based on the angle at which the sound wave arrives at the microphone array.
For this reason, when the desired sound and unnecessary sound such as noise come from the same direction, it is difficult to distinguish between the two, and the desired sound could not be extracted reliably even when microphone array processing technology was applied.
[0011]
The present disclosure has been made in view of such a situation, and enables sound waves from
a desired sound source to be extracted more reliably.
[0012]
A sound collection device according to one aspect of the present disclosure includes: a first microphone array that outputs a predetermined number of first sound wave signals obtained by observing, with a predetermined number of linearly arranged microphones, a sound wave coming from a desired sound source and sound waves coming from sound sources other than the desired sound source; a second microphone array, arranged substantially parallel to the first microphone array and in front of the first microphone array as viewed from the desired sound source, that outputs a predetermined number of second sound wave signals obtained by observing the same sound waves with a predetermined number of linearly arranged microphones; and an extraction processing unit that extracts the sound wave coming from the desired sound source based on the relationship between the components of the sound wave from the desired sound source included in the predetermined number of first sound wave signals and in the predetermined number of second sound wave signals, and the relationship between the components of the sound waves from sound sources other than the desired sound source included in the predetermined number of first sound wave signals and in the predetermined number of second sound wave signals.
[0013]
A sound collection method or program according to one aspect of the present disclosure is for a sound collection device that includes: a first microphone array that outputs a predetermined number of first sound wave signals obtained by observing, with a predetermined number of linearly arranged microphones, a sound wave coming from a desired sound source and sound waves coming from sound sources other than the desired sound source; and a second microphone array, arranged substantially parallel to the first microphone array and in front of the first microphone array as viewed from the desired sound source, that outputs a predetermined number of second sound wave signals obtained by observing the same sound waves with a predetermined number of linearly arranged microphones. In the sound collection method of the sound collection device, or in the program executed by the computer of the sound collection device, each microphone of the first and second microphone arrays observes the sound wave from the desired sound source as a spherical wave that propagates while spreading spherically, and observes the sound waves from sound sources other than the desired sound source as plane waves. A predetermined number of first frequency domain signals are calculated by performing a fast Fourier transform on each of the predetermined number of first sound wave signals, and a predetermined number of second frequency domain signals are calculated by performing a fast Fourier transform on each of the predetermined number of second sound wave signals. By performing a spatial Fourier transform on the predetermined number of first frequency domain signals according to the position of each microphone of the first microphone array, a first wave number domain signal, represented by a function whose argument is the wave number of the sound wave arriving at the first microphone array, is determined. By performing a spatial Fourier transform on the predetermined number of second frequency domain signals according to the position of each microphone of the second microphone array, a second wave number domain signal, represented by a function whose argument is the wave number of the sound wave arriving at the second microphone array, is determined. Then, from the first wave number domain signal and the second wave number domain signal, a spherical wave component wave number domain signal, which is the spatial Fourier transform of the spherical wave component included in the first sound wave signals or the second sound wave signals, is calculated.
[0014]
In one aspect of the present disclosure, a first microphone array having a predetermined number of linearly arranged microphones outputs a predetermined number of first sound wave signals obtained by observing a sound wave coming from a desired sound source and sound waves coming from sound sources other than the desired sound source.
Also, a second microphone array having a predetermined number of linearly arranged microphones, disposed substantially parallel to the first microphone array and on the front side of the first microphone array as viewed from the desired sound source, outputs a predetermined number of second sound wave signals obtained by observing the same sound waves.
Then, the sound wave coming from the desired sound source is extracted based on the relationship between the components of the sound wave from the desired sound source included in the predetermined number of first sound wave signals and in the predetermined number of second sound wave signals, and the relationship between the components of the sound waves from sound sources other than the desired sound source included in the predetermined number of first sound wave signals and in the predetermined number of second sound wave signals.
[0015]
According to one aspect of the present disclosure, sound waves from a desired sound source can
be extracted more reliably.
[0016]
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1 is a block diagram illustrating a configuration example of an embodiment of a sound collection device to which the present technology is applied.
FIG. 2 is a spatial sensitivity distribution diagram of a single omnidirectional microphone.
FIG. 3 is a spatial sensitivity distribution diagram for the case where the frequency of the sound extracted by the sound collection device is 500 Hz.
FIG. 4 is a spatial sensitivity distribution diagram for the case where the frequency of the sound extracted by the sound collection device is 1 kHz.
FIG. 5 is a spatial sensitivity distribution diagram for the case where the frequency of the sound extracted by the sound collection device is 2 kHz.
FIG. 6 is a flowchart explaining the process in which the sound collection device extracts the sound from a desired sound source.
FIG. 7 is a block diagram illustrating a configuration example of an embodiment of a computer to which the present technology is applied.
[0017]
Hereinafter, specific embodiments to which the present technology is applied will be described in
detail with reference to the drawings.
[0018]
FIG. 1 is a block diagram showing a configuration example of an embodiment of a sound
collection device to which the present technology is applied.
[0019]
As shown in FIG. 1, the sound collection device 11 includes two microphone arrays 121 and 122 and a sound wave extraction processing unit 13.
The sound collection device 11 extracts (or emphasizes) the sound wave from the desired sound source out of sound that includes both the sound wave coming from the desired sound source and sound waves coming from sound sources other than the desired sound source.
[0020]
Each of the microphone arrays 121 and 122 is configured by linearly arranging a predetermined number of microphone elements.
That is, the microphone array 121 is configured by arranging M microphone elements 211-1 to 211-M in a linear array, and the microphone array 122 is likewise configured by arranging M microphone elements 212-1 to 212-M in a linear array.
[0021]
Also, the microphone arrays 121 and 122 are arranged parallel to each other. The microphone array 122 is disposed on the front side of the microphone array 121 with respect to the desired sound source; for example, when the desired sound source is on the left as shown in FIG. 1, the microphone array 122 is on the left side of the microphone array 121.
[0022]
The microphone elements 211-1 to 211-M and the microphone elements 212-1 to 212-M convert the mechanical vibrations of a diaphragm or the like generated by sound waves into electrical signals (sound wave signals). That is, each of them outputs a sound wave signal obtained by observing sound that includes both the sound wave coming from the desired sound source and sound waves coming from sound sources other than the desired sound source.
[0023]
The sound wave extraction processing unit 13 includes M fast Fourier transform units 311-1 to 311-M, M fast Fourier transform units 312-1 to 312-M, two spatial Fourier transform units 321 and 322, a spherical wave extraction processing unit 33, an inverse spatial Fourier transform unit 34, a signal determination unit 35, and an inverse fast Fourier transform unit 36.
[0024]
The fast Fourier transform units 311-1 to 311-M perform a fast Fourier transform on the sound wave signals supplied from the microphone elements 211-1 to 211-M, respectively, and supply the resulting frequency domain signals to the spatial Fourier transform unit 321.
Similarly, the fast Fourier transform units 312-1 to 312-M perform a fast Fourier transform on the sound wave signals supplied from the microphone elements 212-1 to 212-M, respectively, and supply the resulting frequency domain signals to the spatial Fourier transform unit 322.
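This per-microphone FFT stage can be sketched in a few lines of numpy. The frame length, sampling rate, and signal below are hypothetical values for illustration, not parameters taken from the patent:

```python
import numpy as np

def to_frequency_domain(mic_signals):
    """FFT each microphone's time-domain signal (one row per microphone)."""
    return np.fft.rfft(mic_signals, axis=1)

# Example: M = 4 microphones, 256 samples of a 1 kHz tone at fs = 16 kHz.
fs, n = 16000, 256
t = np.arange(n) / fs
mics = np.stack([np.sin(2 * np.pi * 1000 * t) for _ in range(4)])
D = to_frequency_domain(mics)                # frequency domain signals
peak_bin = int(np.argmax(np.abs(D[0])))      # bin index of the strongest component
```

With these values the 1 kHz tone falls exactly on bin 16 (1000 · 256 / 16000), so `peak_bin` recovers the tone's frequency.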
[0025]
The spatial Fourier transform unit 321 performs a spatial Fourier transform, according to the positions of the microphone elements 211-1 to 211-M, on the frequency domain signals supplied from the fast Fourier transform units 311-1 to 311-M, and supplies the resulting wave number domain signal to the spherical wave extraction processing unit 33. Similarly, the spatial Fourier transform unit 322 performs a spatial Fourier transform, according to the positions of the microphone elements 212-1 to 212-M, on the frequency domain signals supplied from the fast Fourier transform units 312-1 to 312-M, and supplies the resulting wave number domain signal to the spherical wave extraction processing unit 33.
[0026]
The spherical wave extraction processing unit 33 uses the wave number domain signals supplied from the spatial Fourier transform units 321 and 322 to extract the wave number domain signal based on the sound wave arriving from the desired sound source.
[0027]
Here, the sound waves observed by the sound collection device 11 will be described.
For example, the desired sound source emitting the sound wave to be extracted by the sound collection device 11 is at a position near the sound collection device 11, while sound sources other than the desired sound source (referred to as noise sources) are at positions farther away than the desired sound source. In this case, the sound wave coming from the desired sound source is observed while still propagating as a spreading spherical wavefront, whereas the sound wave coming from a noise source has spread sufficiently that it is observed propagating as a plane. That is, in the sound collection device 11, the sound wave from the desired sound source is observed as a spherical wave, and the sound wave from a noise source is observed as a plane wave. Therefore, the wave number domain signals supplied to the spherical wave extraction processing unit 33 include both a component corresponding to the plane wave and a component corresponding to the spherical wave.
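The two wave types can be simulated along a pair of line arrays. All geometry values below are hypothetical, and the point-source and plane-wave models are the standard ones the description implies, not expressions copied from the patent's equations:

```python
import numpy as np

# Hypothetical geometry: M mics at spacing dx along X; array 121 at y = 0,
# array 122 at y = dy, i.e. nearer the desired source at (xr, yr).
M, dx, dy = 8, 0.02, 0.05
xr, yr = 0.0, 0.5
f, c = 1000.0, 343.0
k = 2 * np.pi * f / c                     # wave number k = omega / c
x = (np.arange(M) - (M - 1) / 2) * dx     # mic positions on the X axis

def spherical(y, B=1.0):
    """Near source: curved wavefront, amplitude decaying with distance r."""
    r = np.sqrt((x - xr) ** 2 + (yr - y) ** 2)
    return B / r * np.exp(-1j * k * r)

def plane(y, theta=np.pi / 3, A=1.0):
    """Distant noise: plane wave arriving at angle theta to the X axis."""
    kx = k * np.cos(theta)
    ky = np.sqrt(k ** 2 - kx ** 2)
    return A * np.exp(-1j * (kx * x + ky * y))

D1 = plane(0.0) + spherical(0.0)   # observed by microphone array 121
D2 = plane(dy) + spherical(dy)     # observed by microphone array 122
```

The plane-wave part has constant magnitude across the array, while the spherical part varies with the distance to the source; this difference is what the extraction processing exploits.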
[0028]
Therefore, the spherical wave extraction processing unit 33 extracts the spherical wave component based on the relationship between the plane wave component included in the wave number domain signal supplied from the spatial Fourier transform unit 321 and the plane wave component included in the wave number domain signal supplied from the spatial Fourier transform unit 322, and the relationship between the spherical wave component included in the wave number domain signal supplied from the spatial Fourier transform unit 321 and the spherical wave component included in the wave number domain signal supplied from the spatial Fourier transform unit 322. Then, the spherical wave extraction processing unit 33 supplies the component corresponding to the spherical wave included in the wave number domain signal, that is, the spherical wave component wave number domain signal obtained by spatially Fourier transforming the spherical wave, to the inverse spatial Fourier transform unit 34.
[0029]
The inverse spatial Fourier transform unit 34 performs an inverse spatial Fourier transform on the spherical wave component wave number domain signal supplied from the spherical wave extraction processing unit 33, and supplies the resulting spherical wave component frequency domain signals to the signal determination unit 35. For example, when extracting the spherical wave observed at the microphone array 121, the inverse spatial Fourier transform unit 34 performs the inverse spatial Fourier transform according to the positions of the microphone elements 211-1 to 211-M, and calculates a plurality of spherical wave component frequency domain signals corresponding to their number (i.e., M).
[0030]
From among the plurality of spherical wave component frequency domain signals determined by the inverse spatial Fourier transform unit 34, the signal determination unit 35 determines the spherical wave component frequency domain signal on which the inverse fast Fourier transform unit 36 performs an inverse fast Fourier transform to output a sound wave signal. For example, the signal determination unit 35 determines, as the target of the inverse fast Fourier transform, the spherical wave component frequency domain signal corresponding to the microphone element 211 at an arbitrary position (for example, the center position).
[0031]
The inverse fast Fourier transform unit 36 performs an inverse fast Fourier transform on the spherical wave component frequency domain signal determined by the signal determination unit 35, and outputs the resulting sound wave signal of the spherical wave to a downstream device (not shown) such as a voice recognition device or a recording device.
[0032]
In the sound collection device 11 configured in this way, out of the sound wave from the desired sound source near the microphone arrays 121 and 122 and the sound waves from noise sources far away, the sound wave from the desired sound source can be extracted more reliably.
As described above, by extracting the sound at a particular place from among plural sounds generated at plural positions, the adverse effect of ambient noise can be eliminated in telephones, videophones, television relay, speech recording, and the like; for example, a decrease in the speech recognition rate can be suppressed.
[0033]
Next, the process in which the spherical wave extraction processing unit 33 extracts a spherical
wave will be described in detail using mathematical formulas.
[0034]
First, as shown in FIG. 1, the direction in which the microphone elements 21 are linearly arranged in the microphone arrays 121 and 122 is taken as the X-axis direction.
The direction orthogonal to the X-axis direction, that is, the direction of the spacing between the parallel microphone arrays 121 and 122, is taken as the Y-axis direction. In FIG. 1, the desired sound source and the microphone arrays 121 and 122 are at the same position (height) in the Z direction, perpendicular to the plane of the paper, so the Z-direction parameter is omitted in the following description.
[0035]
Further, the microphone elements 21 in the microphone arrays 121 and 122 are arranged at an interval dx in the X-axis direction, and the microphone arrays 121 and 122 are arranged at an interval dy in the Y-axis direction. The desired sound source is placed at the position (xr, yr), with the center position of the microphone array 121 as the reference (0, 0). In addition, a plane wave serving as noise arrives from a point far from the desired sound source at an angle θ with respect to the X-axis direction.
[0036]
In the following description, sound waves are treated as separated into single frequencies by a Fourier transform or the like, and the function exp(jωt) representing a single-frequency sound is omitted. Here, ω is the angular frequency, and the angular frequency ω and the frequency f are related by ω = 2πf. When recombining into a signal that spans the entire band, the combination can be performed by an inverse Fourier transform after processing each frequency.
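The split-process-recombine flow described here can be sketched with numpy's FFT pair. The per-bin operation below is a placeholder (a simple low-pass) chosen only to show the mechanics; the signal and rates are hypothetical:

```python
import numpy as np

# Separate a signal into single-frequency bins, process each bin, then
# recombine over the whole band with an inverse FFT.
fs, n = 8000, 128
t = np.arange(n) / fs
signal = np.sin(2 * np.pi * 500 * t) + 0.5 * np.sin(2 * np.pi * 2000 * t)

spectrum = np.fft.rfft(signal)               # one complex value per frequency bin
freqs = np.fft.rfftfreq(n, d=1 / fs)         # f for each bin; omega = 2*pi*f
processed = np.where(freqs < 1000, spectrum, 0.0)   # placeholder per-bin operation
recombined = np.fft.irfft(processed, n)      # combine back into a full-band signal
```

Because both tones fall exactly on FFT bins here, the recombined signal is the 500 Hz component alone, to numerical precision.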
[0037]
The frequency domain signal D1, obtained by subjecting the sound wave signals observed by the microphone array 121 to the fast Fourier transform in the fast Fourier transform units 311-1 to 311-M, is represented by the following equation (1). Similarly, the frequency domain signal D2, obtained by subjecting the sound wave signals observed by the microphone array 122 to the fast Fourier transform in the fast Fourier transform units 312-1 to 312-M, is represented by the following equation (2).
[0038]
[0039]
That is, as shown in equation (1), the frequency domain signal D1 is expressed as the superposition of a plane wave component frequency domain signal P1, the fast Fourier transform of the plane wave component included in the sound wave signal, and a spherical wave component frequency domain signal Q1, the fast Fourier transform of the spherical wave component included in the sound wave signal.
Similarly, as shown in equation (2), the frequency domain signal D2 is expressed as the superposition of a plane wave component frequency domain signal P2 and a spherical wave component frequency domain signal Q2.
[0040]
Here, in equation (1), the first argument represents the position x in the X-axis direction of each of the microphone elements 211-1 to 211-M constituting the microphone array 121. Similarly, in equation (2), the first argument represents the position x in the X-axis direction of each of the microphone elements 212-1 to 212-M constituting the microphone array 122. In equations (1) and (2), the second argument represents the position of each of the microphone arrays 121 and 122 in the Y-axis direction. The position of the microphone array 121 is used as the reference in the Y-axis direction: as shown in equation (1), the position of the microphone array 121 in the Y-axis direction is 0, and as shown in equation (2), the position of the microphone array 122 in the Y-axis direction is the spacing dy.
[0041]
The plane wave component frequency domain signals P1 and P2 can be expressed by the following equations (3) and (4), using a parameter A representing the magnitude of the sound from the noise source, and the trace wave number kx in the X-axis direction and the trace wave number ky in the Y-axis direction of the plane wave arriving from the angle θ as shown in FIG. 1.
[0042]
[0043]
Here, the trace wave number kx in the X-axis direction has the relationship of the following equation (5) with the wave number k = ω/c (ω: angular frequency, c: speed of sound) determined by the frequency of the sound.
Further, the trace wave number ky in the Y-axis direction has the relationship of the following equation (6) with the trace wave number kx in the X-axis direction and the wave number k.
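The bodies of equations (5) and (6) are not reproduced in this text. For a plane wave arriving at angle θ to the X axis, the standard relations the geometry implies are kx = k·cos θ and kx² + ky² = k², which a short sketch can verify (frequency and angle below are hypothetical):

```python
import numpy as np

# Trace wave numbers of a plane wave arriving at angle theta to the X axis.
# Assumed standard relations: kx = k*cos(theta), kx**2 + ky**2 = k**2.
f, c = 1000.0, 343.0
k = 2 * np.pi * f / c          # wave number determined by the frequency
theta = np.deg2rad(60)

kx = k * np.cos(theta)                 # trace wave number along X
ky = np.sqrt(k ** 2 - kx ** 2)         # trace wave number along Y
```

For a propagating wave ky equals k·sin θ, so the two trace wave numbers always lie on a circle of radius k.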
[0044]
[0045]
On the other hand, the spherical wave component frequency domain signal Q1 can be expressed by the following equation (7), using a parameter B representing the magnitude of the sound from the desired sound source and the distance r1(x) from the position (xr, yr), from which the spherical wave arrives as shown in FIG. 1, to the microphone element 211 at the position x.
Similarly, using the parameter B representing the magnitude of the sound from the desired sound source and the distance r2(x) from the desired sound source to the microphone element 212 at position x, the spherical wave component frequency domain signal Q2 can be expressed by the following equation (8).
[0046]
[0047]
Further, the distance r1(x) and the distance r2(x) are expressed by the following equations (9) and (10), using the position (xr, yr) of the desired sound source.
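The equation images are not reproduced in this text, but equations (9) and (10) are plainly the Euclidean distances from the source to a mic on each array, and equations (7) and (8) imply the standard point-source model B/r · exp(−jkr); a sketch under those assumptions (all numeric values hypothetical):

```python
import numpy as np

# Distances from the desired source at (xr, yr) to a mic at position x on
# array 121 (y = 0) and on array 122 (y = dy), and the assumed point-source
# observations B / r * exp(-1j * k * r).
f, c, B = 1000.0, 343.0, 1.0
k = 2 * np.pi * f / c
xr, yr, dy = 0.1, 0.5, 0.05
x = np.linspace(-0.07, 0.07, 8)                # mic positions along the X axis

r1 = np.sqrt((x - xr) ** 2 + yr ** 2)          # eq. (9): distance to array 121
r2 = np.sqrt((x - xr) ** 2 + (yr - dy) ** 2)   # eq. (10): distance to array 122
Q1 = B / r1 * np.exp(-1j * k * r1)             # eq. (7), assumed form
Q2 = B / r2 * np.exp(-1j * k * r2)             # eq. (8), assumed form
```

Since array 122 sits in front of array 121 as viewed from the source, r2 is smaller than r1 at every mic, so the spherical wave is observed more strongly there.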
[0048]
[0049]
As shown in equations (1) and (2), the frequency domain signals D1 and D2, obtained by the fast Fourier transform of the sound wave signals observed by the microphone arrays 121 and 122, are waveforms in which the plane wave and the spherical wave are mixed.
Therefore, in order for the sound wave extraction processing unit 13 to extract the spherical wave component frequency domain signal Q1 or Q2 from the frequency domain signals D1 and D2, a spatial Fourier transform along the X-axis direction is first applied to the frequency domain signals D1 and D2.
When performing such a spatial Fourier transform, the spacing dx between the microphone elements 21 in the X-axis direction needs to be constant.
Note that, as in ordinary Fourier transform processing, the frequency domain signals D1 and D2 can be windowed with a Hanning window or the like to reduce the level of the signals observed by the microphone elements 21 at both ends, so that the signals lead smoothly to zero outside the interval.
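The windowing step amounts to tapering one frequency bin's values across the microphone index before the spatial transform. A minimal sketch (array length and values hypothetical):

```python
import numpy as np

# Taper a frequency-domain signal across the M microphones with a Hanning
# window so the level falls smoothly to zero at the end microphones.
M = 8
D1_bin = np.ones(M, dtype=complex)   # one frequency bin across the M mics
w = np.hanning(M)                    # smooth taper, exactly zero at both ends
D1_windowed = w * D1_bin
```

`np.hanning(M)` is zero at both endpoints, which is precisely the "smoothly lead to zero outside the interval" behavior described.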
[0050]
Here, the spatial Fourier transform along the X-axis direction is defined as the following equation
(11).
[0051]
[0052]
Equation (11) represents the wave number domain signal S′(k′x, y) obtained by performing the spatial Fourier transform on a frequency domain signal S(x, y) along the X-axis direction. In equation (11), M is the number of microphone elements 21 arranged in the X-axis direction.
Further, in equation (11), k′x is the variable paired with the trace wave number kx in the X-axis direction under the inverse Fourier transform, and can take any value.
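The body of equation (11) is an image not reproduced here, so the sign convention and 1/M normalization in the sketch below are assumptions. Because k′x may take arbitrary values, the transform is written as a direct sum over the M microphone positions rather than an FFT:

```python
import numpy as np

def spatial_ft(S, x, kpx):
    """Assumed form of eq. (11): S'(k'x) = (1/M) * sum_m S(x_m) * exp(1j*k'x*x_m).

    S:   (M,) values of one frequency bin across the array at positions x.
    kpx: array of wave-number arguments k'x (may be any values).
    """
    return (np.exp(1j * np.outer(kpx, x)) @ S) / len(x)

# Example: a plane-wave bin exp(-1j*kx*x) concentrates at k'x = kx.
M, dx = 16, 0.02
x = (np.arange(M) - (M - 1) / 2) * dx
kx_true = 10.0
S = np.exp(-1j * kx_true * x)
kpx = np.linspace(-20, 20, 401)
Sp = spatial_ft(S, x, kpx)
```

The magnitude of `Sp` peaks at k′x ≈ 10, illustrating how a plane wave maps to a concentrated component (the delta function of equations (14) and (15)) in the wave number domain.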
[0053]
With the spatial Fourier transform along the X-axis direction thus defined, the wave number domain signal D1′ obtained by performing the spatial Fourier transform on the frequency domain signal D1 is expressed by the following equation (12).
Also, the wave number domain signal D2′ obtained by performing the spatial Fourier transform on the frequency domain signal D2 is expressed by the following equation (13).
[0054]
[0055]
Similarly, the plane wave component wave number domain signal P1′, obtained by performing the spatial Fourier transform on the plane wave component frequency domain signal P1, is expressed by the following equation (14).
Further, the plane wave component wave number domain signal P2′, obtained by performing the spatial Fourier transform on the plane wave component frequency domain signal P2, is expressed by the following equation (15).
[0056]
[0057]
Here, in the equations (14) and (15), δ is a delta function.
[0058]
Furthermore, the spherical wave component wave number domain signal Q1′, obtained by performing the spatial Fourier transform on the spherical wave component frequency domain signal Q1, is expressed by the following equation (16).
The spherical wave component wave number domain signal Q2′, obtained by performing the spatial Fourier transform on the spherical wave component frequency domain signal Q2, is expressed by the following equation (17), using the position yr of the desired sound source in the Y-axis direction.
[0059]
[0060]
Here, in equations (16) and (17), H0<(2)> is the zeroth-order Hankel function of the second kind, and K0 is the zeroth-order modified Bessel function.
Further, as shown in equations (16) and (17), the spherical wave component wave number domain signals Q1′ and Q2′ take different forms according to the magnitude relationship between the absolute value of the variable k′x and the wave number k.
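This case split can be sketched with scipy's special functions. The equation images are not reproduced in the text, so any scale factors are omitted here; only the described selection between the two functions is shown:

```python
import numpy as np
from scipy.special import hankel2, k0

def G0(kpx, k, d):
    """Wave-number-domain kernel of the spherical wave at distance d.

    Per the description: the zeroth-order Hankel function of the second kind
    for propagating components (|k'x| < k), the zeroth-order modified Bessel
    function for evanescent ones (|k'x| > k). Scale factors are omitted
    (assumption, since the equation bodies are not reproduced).
    """
    kpx = np.asarray(kpx, dtype=float)
    out = np.empty(kpx.shape, dtype=complex)
    prop = np.abs(kpx) < k
    out[prop] = hankel2(0, np.sqrt(k ** 2 - kpx[prop] ** 2) * d)
    out[~prop] = k0(np.sqrt(kpx[~prop] ** 2 - k ** 2) * d)
    return out

g = G0(np.array([5.0, 20.0]), 10.0, 0.5)  # one propagating, one evanescent bin
```

The propagating branch is genuinely complex (an outgoing cylindrical wave), while the evanescent branch is real and positive (pure decay), matching the physical picture.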
[0061]
Next, consider the relationship between the plane wave components of the sound waves observed by the microphone arrays 121 and 122, and the relationship between their spherical wave components.
That is, when the parameter A representing the magnitude of the sound from the noise source is eliminated from the above equations (14) and (15), the following equation (18) holds.
Similarly, when the parameter B representing the magnitude of the sound from the desired sound source is eliminated from the above equations (16) and (17), the following equation (19) holds.
[0062]
[0063]
Then, based on the relationships shown in equations (18) and (19), the above equation (13) can be expressed as the following equation (20).
[0064]
[0065]
Here, in equation (20), G0 is a function that selects either the zeroth-order Hankel function of the second kind H0 <(2)> or the zeroth-order modified Bessel function K0, depending on the magnitude relationship between the absolute value of the variable k'x and the absolute value of the wave number k, as described above.
[0066]
Then, using equation (20) together with the above equation (12), the spherical wave extraction processing unit 33 can extract the spherical wave component wave number domain signal Q1' by computing the following equation (21).
[0067]
[0068]
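The extraction in equation (21) can be sketched as a per-bin 2×2 linear solve. The following Python sketch is a hypothetical reconstruction — equations (12), (20), and (21) appear only as images in the original — assuming each array's wave number domain observation is the sum of a plane wave component and a spherical wave component, with relation functions E and F propagating each component from the microphone array 121 to the microphone array 122:

```python
import numpy as np

def extract_spherical_component(S1, S2, E, F, eps=1e-12):
    """Separate the spherical wave component Q1' from two wave number
    domain observations.

    Assumes (hypothetically -- the patent's equations (12), (20), (21)
    are not reproduced in this text) that each array observes the sum of
    a plane wave and a spherical wave component,
        S1 = P1 + Q1,   S2 = E * P1 + F * Q1,
    where E and F propagate the plane / spherical components from array
    121 to array 122.  Solving the 2x2 system per (k'x, omega) bin gives
        Q1 = (S2 - E * S1) / (F - E).
    """
    S1, S2, E, F = map(np.asarray, (S1, S2, E, F))
    denom = F - E
    # Guard bins where the two relation functions coincide
    # (the 2x2 system is singular there).
    denom = np.where(np.abs(denom) < eps, eps, denom)
    return (S2 - E * S1) / denom
```

Under these assumptions, noise that reaches both arrays as a plane wave cancels exactly in the numerator, leaving only the spherical wave component from the nearby source.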
Thereafter, the inverse spatial Fourier transform unit 34 performs an inverse spatial Fourier transform on the spherical wave component wave number domain signal Q1' obtained as a result of the spherical wave extraction processing unit 33 computing equation (21), whereby the spherical wave component frequency domain signal Q1 can be obtained.
Then, the inverse fast Fourier transform unit 36 performs an inverse fast Fourier transform on the spherical wave component frequency domain signal Q1 obtained by the inverse spatial Fourier transform unit 34, whereby the spherical wave component contained in the sound wave signal can be output.
[0069]
As described above, the sound wave extraction processing unit 13 can extract the spherical wave component, that is, the sound wave from the desired sound source, based on the relationship between the plane waves and the relationship between the spherical waves included in the sound waves observed by the microphone arrays 121 and 122.
[0070]
Incidentally, as described above, among the plurality of spherical wave component frequency domain signals obtained by the inverse spatial Fourier transform unit 34, the one to be subjected to the inverse fast Fourier transform by the inverse fast Fourier transform unit 36 is determined by the signal determination unit 35.
At this time, the signal determination unit 35 may determine the spherical wave component frequency domain signal corresponding to the microphone element 21 at an arbitrary position as the target of the inverse fast Fourier transform, or may, for example, determine as the target a signal obtained by summing the plurality of spherical wave component frequency domain signals after aligning their timing.
That is, since the spherical wave spreads spherically, among the linearly arranged microphone elements 21 it is observed with a larger delay at a microphone element 21 farther from the desired sound source than at one closer to it.
[0071]
Therefore, for example, the signal determination unit 35 assumes a position of the desired sound source and configures a delay-and-sum array that takes into account the delay until the sound wave from that position reaches each of the microphone elements 211-1 to 211-M.
The signal determination unit 35 can then determine the signal summed after compensating for each delay as the target of the inverse fast Fourier transform.
[0072]
Here, the delay-and-sum array will be described.
For example, assuming that the desired sound source is located at the position (xr, yr), the distance r1(x) from the desired sound source to each of the microphone elements 211-1 to 211-M is expressed by the following equation (22).
Further, the time difference τ(x) with which the sound wave from the desired sound source reaches each of the microphone elements 211-1 to 211-M is expressed by the following equation (23) using the sound velocity c.
[0073]
[0074]
Therefore, the signal determination unit 35 can multiply the wave number domain signal Q1' at each corresponding position x by the inverse transfer function exp(jωτ(x)) that compensates for the time difference τ(x) according to the position of each of the microphone elements 211-1 to 211-M, calculate the summed signal, and determine it as the target of the inverse fast Fourier transform.
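The delay compensation of equations (22) and (23) and the summation of paragraph [0074] can be sketched as a frequency-domain delay-and-sum beamformer. Function and variable names are illustrative, and the sound velocity is an assumed constant:

```python
import numpy as np

C_SOUND = 343.0  # speed of sound in air [m/s] (assumed value)

def delay_and_sum(signals_f, freqs, mic_x, src_pos, c=C_SOUND):
    """Delay-and-sum beamforming in the frequency domain.

    signals_f : (M, F) complex array, frequency-domain signal per microphone
    freqs     : (F,) frequencies [Hz]
    mic_x     : (M,) microphone positions along the X axis [m]
    src_pos   : (xr, yr) assumed position of the desired sound source

    Following equations (22) and (23): the distance from the source to
    the microphone at x is r1(x) = sqrt((x - xr)^2 + yr^2) and the
    arrival delay is tau(x) = r1(x) / c.  Each signal is multiplied by
    the inverse transfer function exp(j * omega * tau(x)) to compensate
    its delay, then summed (paragraph [0074]).
    """
    mic_x = np.asarray(mic_x, dtype=float)
    xr, yr = src_pos
    r1 = np.sqrt((mic_x - xr) ** 2 + yr ** 2)   # eq. (22)
    tau = r1 / c                                # eq. (23)
    omega = 2.0 * np.pi * np.asarray(freqs)
    comp = np.exp(1j * omega[None, :] * tau[:, None])
    return (np.asarray(signals_f) * comp).sum(axis=0)
```

Signals arriving from the assumed position add coherently (amplified roughly M-fold), while sounds from other positions add with mismatched phases.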
[0075]
As described above, by using the delay-and-sum array, the signal determination unit 35 can amplify the spherical wave component wave number domain signal Q1' to be subjected to the inverse fast Fourier transform and output as a sound wave signal, thereby improving, for example, the signal-to-noise ratio.
[0076]
Now, let the function representing the relationship between the microphone arrays 121 and 122 for the plane wave in the above equation (20) be the relation function E(k'x, dy); then E(k'x, dy) is expressed by the following equation (24).
[0077]
[0078]
Similarly, let the function representing the relationship between the microphone arrays 121 and 122 for the spherical wave in the above equation (20) be the relation function F(k'x, dy); then F(k'x, dy) is expressed by the following equation (25).
Then, using the function G0 that selects between the Hankel function of the second kind and the modified Bessel function according to the magnitude relationship between the absolute value of the variable k'x and the absolute value of the wave number k, the relation function F(k'x, dy) is expressed by the following equation (26).
[0079]
[0080]
Therefore, from equations (24) and (26), the above equation (20) can be expressed as the following equation (27).
[0081]
[0082]
Then, from the simultaneous equations of this equation (27) and the above equation (12), the spherical wave component wave number domain signal Q1' can be expressed as the following equation (28).
[0083]
[0084]
Here, as described above, the relation function E(k'x, dy) represents the relationship between the microphone arrays 121 and 122 for plane waves, and the relation function F(k'x, dy) represents their relationship for spherical waves.
Therefore, as long as the arrangement of the microphone arrays 121 and 122 is fixed, these relationships do not change, and values obtained by prior observation or the like can be used as the output values of the relation function E(k'x, dy) and the relation function F(k'x, dy).
[0085]
Specifically, with only a plane wave observed in advance using the sound collection device 11, a plane wave arriving from each angle θ is observed, and the output value of the relation function E(k'x, dy) can be obtained by substituting the variable k'x for each angle θ into the plane wave component wave number domain signals P1' and P2' and computing the following equation (29).
Similarly, with only a spherical wave observed in advance using the sound collection device 11, a spherical wave arriving from each angle θ and position yr is observed, and the output value of the relation function F(k'x, dy) can be obtained by substituting the variable k'x and the position yr for each angle θ into the spherical wave component wave number domain signals Q1' and Q2' and computing the following equation (30).
[0086]
[0087]
Then, the spherical wave extraction processing unit 33 holds the output values of the relation function E(k'x, dy) and the relation function F(k'x, dy) obtained in advance in this manner, and can obtain the spherical wave component wave number domain signal Q1' by computing equation (28).
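The prior-observation calibration of equations (29) and (30) can be sketched as follows. The per-bin ratio form is an assumption, since the equations themselves are not reproduced in this text; what is grounded is that a recording containing only one component type yields the corresponding relation function:

```python
import numpy as np

def calibrate_relation_function(sig1, sig2, eps=1e-12):
    """Estimate a relation function between the two arrays by prior
    observation, as a per-bin ratio of wave number domain signals.

    For a calibration recording containing only plane waves, passing
    P1', P2' yields E(k'x, dy); for one containing only a spherical
    wave from a known position, passing Q1', Q2' yields F(k'x, dy).
    The ratio form sig2 / sig1 is an assumption -- equations (29) and
    (30) of the patent appear only as images.
    """
    sig1 = np.asarray(sig1, dtype=complex)
    sig2 = np.asarray(sig2, dtype=complex)
    # Avoid division by near-zero bins in the reference signal.
    safe = np.where(np.abs(sig1) < eps, eps, sig1)
    return sig2 / safe
```

Because the array geometry is fixed, these output values can be computed once and held by the spherical wave extraction processing unit 33, as the text describes.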
[0088]
Thus, when observing the sound wave to be extracted, the sound collection device 11 may extract the sound wave from the desired sound source using the output values of the relation function E(k'x, dy) and the relation function F(k'x, dy) obtained by prior measurement or the like, instead of using the relation functions obtained by computing the above equations (24) and (26).
[0089]
Next, with reference to FIGS. 2 to 5, the effect of extracting a desired sound source by the sound collection device 11 will be described using spatial sensitivity distribution charts.
A spatial sensitivity distribution chart indicates how loudly a sound at the position (x, y) is picked up when collected by the microphone.
[0090]
FIG. 2 shows the spatial sensitivity distribution of a single omnidirectional microphone.
In FIG. 2, the microphone is arranged at the position (0, 0), and the spatial sensitivity distribution is represented as a contour map.
The chart of FIG. 2 shows, for example, that when the sound source is at a distance of 1.5 m from the microphone, its sound is picked up as a small sound at -30 dB.
[0091]
FIGS. 3 to 5 show spatial sensitivity distribution charts of the sound collection device 11.
These spatial sensitivity distributions are obtained by computer simulation, assuming that the microphone arrays 121 and 122 are arranged at an interval of 10 cm (interval dy = 10 cm) and that, in each of the microphone arrays 121 and 122, 64 microphone elements 21 are arranged at intervals of 5 cm in the X-axis direction (interval dx = 5 cm).
The center of the microphone array 121 is assumed to be the position (0, 0).
[0092]
Also, since the spatial sensitivity distribution has different characteristics for each frequency, it is shown for representative frequencies (500 Hz, 1 kHz, 2 kHz) that are important for human voice.
That is, FIG. 3 shows the spatial sensitivity distribution chart when the frequency of the collected sound is 500 Hz, FIG. 4 shows the chart for 1 kHz, and FIG. 5 shows the chart for 2 kHz.
[0093]
As shown in FIGS. 3 to 5, the sound collection device 11 picks up sound from a sound source located beyond a predetermined distance as a small sound.
For example, as shown in FIG. 3, when the frequency of the sound to be collected is 500 Hz, the sound collection device 11 picks up the sound from a sound source located at a distance of 0.75 m attenuated by -30 dB or more.
Thus, even when near and distant sounds are mixed, only the sounds close to the microphone arrays 121 and 122 can be extracted.
As shown in FIGS. 4 and 5, when the frequency of the collected sound is 1 kHz or 2 kHz, only the sounds close to the microphone arrays 121 and 122 can be extracted with an even larger difference.
[0094]
As described above, the sound collection device 11 can suppress noise (plane waves) coming from a distance and extract only the sound (spherical wave) coming from a desired sound source near the microphone arrays 121 and 122.
[0095]
Next, FIG. 6 is a flowchart for explaining the process by which the sound collection device 11 of FIG. 1 extracts the sound from a desired sound source.
[0096]
For example, the process is started when sound waves are observed by the microphone arrays 121 and 122, sound wave signals are supplied from the microphone elements 211-1 to 211-M to the fast Fourier transform units 311-1 to 311-M, and sound wave signals are supplied from the microphone elements 212-1 to 212-M to the fast Fourier transform units 312-1 to 312-M.
[0097]
In step S11, the fast Fourier transform units 311-1 to 311-M supply the frequency domain signals, obtained by performing a fast Fourier transform on the sound wave signals supplied from the microphone elements 211-1 to 211-M, to the spatial Fourier transform unit 321.
Similarly, the fast Fourier transform units 312-1 to 312-M supply the frequency domain signals, obtained by performing a fast Fourier transform on the sound wave signals supplied from the microphone elements 212-1 to 212-M, to the spatial Fourier transform unit 322.
[0098]
In step S12, the spatial Fourier transform unit 321 performs a spatial Fourier transform, according to the positions of the microphone elements 211-1 to 211-M, on the frequency domain signals supplied from the fast Fourier transform units 311-1 to 311-M, and supplies the resulting wave number domain signal to the spherical wave extraction processing unit 33.
Similarly, the spatial Fourier transform unit 322 performs a spatial Fourier transform, according to the positions of the microphone elements 212-1 to 212-M, on the frequency domain signals supplied from the fast Fourier transform units 312-1 to 312-M, and supplies the resulting wave number domain signal to the spherical wave extraction processing unit 33.
[0099]
In step S13, the spherical wave extraction processing unit 33 extracts a wave number domain signal based on the sound wave that has arrived from the desired sound source, using the wave number domain signal supplied from the spatial Fourier transform unit 321 and the wave number domain signal supplied from the spatial Fourier transform unit 322.
For example, the spherical wave extraction processing unit 33 can extract the spherical wave component wave number domain signal Q1' included in the sound wave signal output from the microphone array 121 by computing the above equation (21).
[0100]
In step S14, the inverse spatial Fourier transform unit 34 performs an inverse spatial Fourier transform, according to the positions of the microphone elements 211-1 to 211-M, on the spherical wave component wave number domain signal supplied from the spherical wave extraction processing unit 33, to calculate a plurality of spherical wave component frequency domain signals.
[0101]
In step S15, for the plurality of spherical wave component frequency domain signals obtained by the inverse spatial Fourier transform unit 34, the signal determination unit 35 determines the spherical wave component frequency domain signal obtained using the delay-and-sum array, as described above, as the target to be subjected to the inverse fast Fourier transform by the inverse fast Fourier transform unit 36 and output as a sound wave signal.
[0102]
In step S16, the inverse fast Fourier transform unit 36 performs an inverse fast Fourier transform on the spherical wave component frequency domain signal determined by the signal determination unit 35, and outputs the resulting sound wave signal of the spherical wave.
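The overall chain of steps S11 to S16 can be sketched as follows. Plain NumPy FFTs stand in for the fast and spatial Fourier transforms (the patent's exact transforms are not reproduced here), the step-S13 extraction is passed in as a black-box callable, and all names are illustrative:

```python
import numpy as np

def collect_desired_sound(x1, x2, fs, dx, extract, src_pos, c=343.0):
    """Sketch of the processing chain of Fig. 6 (steps S11-S16).

    x1, x2  : (M, T) time-domain signals from microphone arrays 121 / 122
    extract : callable implementing the spherical wave extraction of
              step S13 (e.g. solving the simultaneous equations (12)
              and (20)); treated as a black box here.
    """
    M, T = x1.shape
    # S11: fast Fourier transform along time for every microphone element.
    F1 = np.fft.rfft(x1, axis=1)
    F2 = np.fft.rfft(x2, axis=1)
    # S12: spatial Fourier transform along the array axis.
    S1 = np.fft.fft(F1, axis=0)
    S2 = np.fft.fft(F2, axis=0)
    # S13: extract the spherical wave component wave number domain signal Q1'.
    Q1 = extract(S1, S2)
    # S14: inverse spatial Fourier transform -> per-microphone spectra.
    Q1_freq = np.fft.ifft(Q1, axis=0)
    # S15: delay-and-sum toward the assumed source position (eqs. (22), (23)).
    freqs = np.fft.rfftfreq(T, d=1.0 / fs)
    mic_x = (np.arange(M) - (M - 1) / 2) * dx
    xr, yr = src_pos
    tau = np.sqrt((mic_x - xr) ** 2 + yr ** 2) / c
    comp = np.exp(1j * 2 * np.pi * freqs[None, :] * tau[:, None])
    summed = (Q1_freq * comp).sum(axis=0)
    # S16: inverse fast Fourier transform back to a time-domain sound wave.
    return np.fft.irfft(summed, n=T)
```

The design mirrors the block structure of the device: each stage corresponds to one numbered unit (311/312, 321/322, 33, 34, 35, 36), so any single stage can be swapped out independently.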
[0103]
As described above, the sound wave extraction processing unit 13 can extract the spherical wave component included in the sound wave signals observed by the microphone arrays 121 and 122, that is, the sound wave from the desired sound source.
In other words, by introducing the concept of acoustic holography on the basis of the linear microphone arrays conventionally used to extract human voice, the sound collection device 11 can extract sound waves from the desired sound source using the two microphone arrays 121 and 122 arranged one in front of the other, rather than a two-dimensional planar array.
[0104]
In particular, in the sound collection device 11, even when the desired sound source and the noise source are in the same direction, the sound from the desired sound source can be extracted based on the relationship between the spherical wave and the plane wave.
Specifically, by adopting the sound collection device 11 in a vending machine and, for example, picking up only utterances made close to the vending machine, the voice recognition rate of the vending machine can be improved.
The sound collection device 11 can also be used to let a store clerk hear a customer's voice in voice ordering such as at a drive-through.
Furthermore, by adopting the sound collection device 11 in a video conference or the like, it is possible to capture only the voices of the participants without collecting sounds from beyond a certain distance.
[0105]
As described above, the sound collection device 11 can extract the sound at an arbitrary place from among plural sounds generated at plural positions, and can thus be applied to sound collection technologies such as telephones, videophones, television relay, and conversation recording.
[0106]
In the present embodiment, the description has been made assuming that the desired sound source is at a position within a predetermined distance from the sound collection device 11; however, for example, assuming instead that the desired sound source is at a distant position, a sound from a distant desired sound source may be extracted.
That is, instead of computing the above equation (21) to obtain the spherical wave component wave number domain signal Q1', a plane wave coming from a distant desired sound source can be extracted by obtaining the plane wave component wave number domain signal P1' from the simultaneous equations of the above equations (12) and (20).
[0107]
Note that the processes described with reference to the above flowchart do not necessarily have to be performed in chronological order in the order described in the flowchart; they also include processes executed in parallel or individually (for example, parallel processing or processing by objects). Also, the program may be processed by one CPU or may be processed in a distributed manner by a plurality of CPUs. Furthermore, the program may be transferred to a remote computer for execution.
[0108]
Further, the series of processes (information processing method) described above can be executed by hardware or by software. When the series of processes is executed by software, the program constituting the software is installed in a computer incorporated in dedicated hardware, or in a computer capable of executing various functions, such as a general-purpose personal computer. The program can be installed, for example, from a program recording medium on which the program is recorded.
[0109]
FIG. 7 is a block diagram showing an example of a hardware configuration of a computer that
executes the series of processes described above according to a program.
[0110]
In the computer, a central processing unit (CPU) 101, a read only memory (ROM) 102, and a
random access memory (RAM) 103 are mutually connected by a bus 104.
[0111]
Further, an input/output interface 105 is connected to the bus 104.
Connected to the input/output interface 105 are an input unit 106 including a keyboard, a mouse, and a microphone, an output unit 107 including a display and a speaker, a storage unit 108 including a hard disk and a non-volatile memory, a communication unit 109 including a network interface, and a drive 110 for driving a removable medium 111 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory.
[0112]
In the computer configured as described above, for example, the CPU 101 loads the program stored in the storage unit 108 into the RAM 103 via the input/output interface 105 and the bus 104 and executes it, whereby the series of processes described above is performed.
[0113]
The program executed by the computer (CPU 101) is recorded on a removable medium 111, which is a package medium including, for example, a magnetic disk (including a flexible disk), an optical disk (such as a CD-ROM (Compact Disc-Read Only Memory) or DVD (Digital Versatile Disc)), a magneto-optical disk, or a semiconductor memory, or is provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting.
[0114]
The program can be installed in the storage unit 108 via the input/output interface 105 by mounting the removable medium 111 in the drive 110.
The program can also be received by the communication unit 109 via a wired or wireless transmission medium and installed in the storage unit 108.
In addition, the program can be installed in advance in the ROM 102 or the storage unit 108.
[0115]
The present technology is not limited to the above-described embodiment, and various modifications can be made without departing from the scope of the present disclosure.
[0116]
11 sound collection device, 121 and 122 microphone arrays, 13 sound wave extraction processing unit, 211-1 to 211-M and 212-1 to 212-M microphone elements, 311-1 to 311-M and 312-1 to 312-M fast Fourier transform units, 321 and 322 spatial Fourier transform units, 33 spherical wave extraction processing unit, 34 inverse spatial Fourier transform unit, 35 signal determination unit, 36 inverse fast Fourier transform unit