DESCRIPTION JP2012244211
PROBLEM TO BE SOLVED: To provide an acoustic signal processing device, an acoustic signal processing method, and an acoustic signal processing program capable of estimating the number of sound sources even when the distance from the sound sources to the microphones is not large compared with the distance between the microphones.
SOLUTION: The acoustic signal processing device includes: a first input unit and a second input unit to which acoustic signals arriving from sound sources are input; a mapping unit that maps, based on the acoustic signals input to the first input unit and the second input unit, the amplitude value of the acoustic signal input to the first input unit and the amplitude value of the acoustic signal input to the second input unit to a coordinate system having those amplitude values as coordinate axes; a detection unit that detects linear components included in the distribution of amplitude values mapped onto the coordinate system by the mapping unit; and an estimation unit that estimates the number of sound sources based on the number of linear components detected by the detection unit. [Selected figure] Figure 1
Acoustic signal processing apparatus, acoustic signal processing method, and acoustic signal
processing program
[0001]
The present invention relates to an acoustic signal processing device, an acoustic signal
processing method, and an acoustic signal processing program.
[0002]
In the technical field of collecting a target acoustic signal in a noisy environment using a device
11-04-2019
1
such as a telephone or voice recorder, the target acoustic signal is collected more clearly by
reducing the noise and extracting the target acoustic signal. It is required to do.
As a method of extracting a target acoustic signal in a noisy environment, a method has been
proposed in which an acoustic signal arriving from a target sound source is extracted by
estimating the number of sound sources and their directions from the acoustic signal and
separating and extracting each sound source. ing.
[0003]
For example, in the acoustic signal processing disclosed in Patent Document 1, the amplitude data of the acoustic signals input to two microphones is decomposed into phase differences for each frequency component and analyzed. By grouping the frequency components whose phase differences indicate the same direction, the number of sound sources is estimated and a target acoustic signal is extracted.
[0004]
Japanese Unexamined Patent Application Publication No. 2006-340391
[0005]
The acoustic signal processing of Patent Document 1 is based on the assumption that the sound sources are located sufficiently far from the microphones compared with the distance between the two microphones. When a telephone or voice recorder is used, however, the speaker who is the sound source is very likely to be located near the microphones. When the above assumption does not hold, the acoustic signal processing of Patent Document 1 may fail to estimate the number of sound sources correctly.
[0006]
The present invention has been made in view of the above point, and provides an acoustic signal processing apparatus, an acoustic signal processing method, and an acoustic signal processing program capable of estimating the number of sound sources even when the distance from the sound sources to the microphones is not large compared with the distance between the microphones.
[0007]
An acoustic signal processing device according to the present invention includes: a first input unit and a second input unit to which acoustic signals arriving from sound sources are input; a mapping unit that maps, based on the acoustic signals input to the first input unit and the second input unit, the amplitude value of the acoustic signal input to the first input unit and the amplitude value of the acoustic signal input to the second input unit to a coordinate system having those amplitude values as coordinate axes; a detection unit that detects linear components included in the distribution of amplitude values mapped onto the coordinate system by the mapping unit; and an estimation unit that estimates the number of sound sources based on the number of linear components detected by the detection unit.
[0008]
An acoustic signal processing method according to the present invention comprises: a mapping step of mapping, based on the acoustic signals input through a first input unit and a second input unit, the amplitude value of the acoustic signal input to the first input unit and the amplitude value of the acoustic signal input to the second input unit to a coordinate system having those amplitude values as coordinate axes; a detection step of detecting linear components included in the distribution of amplitude values mapped onto the coordinate system in the mapping step; and an estimation step of estimating the number of sound sources based on the number of linear components detected in the detection step.
[0009]
Further, an acoustic signal processing program according to the present invention causes a computer to execute: a mapping procedure of mapping, based on the acoustic signals input through a first input unit and a second input unit, the amplitude value of the acoustic signal input to the first input unit and the amplitude value of the acoustic signal input to the second input unit to a coordinate system having those amplitude values as coordinate axes; a detection procedure of detecting linear components included in the distribution of amplitude values mapped onto the coordinate system by the mapping procedure; and an estimation procedure of estimating the number of sound sources based on the number of linear components detected by the detection procedure.
[0010]
According to the present invention, the number of sound sources can be estimated even when
the sound sources are located closer to the microphones than the distance between the
microphones.
[0011]
FIG. 1 is a diagram showing an acoustic signal processing device according to a first embodiment.
FIG. 2 is a diagram showing acoustic signals input to the acoustic signal processing device according to the first embodiment.
FIG. 3 is a diagram showing the amplitude value distribution of the acoustic signals according to the first embodiment.
FIG. 4 is a diagram showing a simulation result of the acoustic signal processing device according to the first embodiment.
FIG. 5 is a diagram showing a simulation result of the acoustic signal processing device according to the first embodiment.
FIG. 6 is a diagram showing a simulation result of the acoustic signal processing device according to the first embodiment.
FIG. 7 is a diagram showing a simulation result of the acoustic signal processing device according to the first embodiment.
FIG. 8 is a diagram showing a Hough transform result of the acoustic signal processing device according to the first embodiment.
FIG. 9 is a diagram showing an acoustic signal processing device according to a second embodiment.
FIG. 10 is a diagram showing a frequency decomposition unit according to the second embodiment.
FIG. 11 is a diagram showing an acoustic signal processing device according to a third embodiment.
FIG. 12 is a diagram showing an acoustic signal processing device according to a fourth embodiment.
FIG. 13 is a diagram showing an acoustic signal processing device according to a fifth embodiment.
FIG. 14 is a diagram showing a hardware configuration of the acoustic signal processing device according to the first embodiment.
[0012]
First Embodiment
FIG. 1 is a diagram showing an acoustic signal processing device 1 according to a first embodiment. The acoustic signal processing device 1 estimates the number of sound sources based on acoustic signals arriving from sound sources (not shown) located at a distance from the acoustic signal processing device 1.
[0013]
The acoustic signal processing device 1 includes: two microphones 101A and 101B serving as input units; an A / D conversion unit 102 that A / D converts the acoustic signals received through the microphones 101A and 101B; a mapping unit 103 that maps the amplitude values of the acoustic signals; a detection unit 104 that detects linear components included in the simultaneous distribution of the amplitude values mapped by the mapping unit 103; and an estimation unit 105 that estimates the number of sound sources based on the linear components detected by the detection unit 104.
[0014]
The microphone 101A operates as a first input unit to which an acoustic signal arriving from a
sound source (not shown) is input.
The microphone 101A converts the input acoustic signal into an electrical signal, and passes it to
the A / D conversion unit 102 in the subsequent stage as a first acoustic signal. The microphone
101B operates as a second input unit to which an acoustic signal arriving from a sound source
(not shown) is input. The microphone 101B converts the input acoustic signal into an electrical
signal, and passes it to the A / D conversion unit 102 in the subsequent stage as a second
acoustic signal. The two microphones 101A and 101B are installed at a predetermined interval.
[0015]
The A / D converter 102 performs signal processing on the analog first and second acoustic
signals received via the microphones 101A and 101B to generate digital first and second
acoustic signals. The A / D conversion unit 102 passes the generated digital first and second
acoustic signals to the mapping unit 103.
[0016]
The mapping unit 103 maps the amplitude values of the first and second acoustic signals into a
two-dimensional space to generate a simultaneous distribution of the amplitude values. The
mapping unit 103 passes the generated simultaneous distribution of amplitude values to the
detection unit 104.
[0017]
The detection unit 104 detects a linear component from the simultaneous distribution generated
by the mapping unit 103. The estimation unit 105 estimates the number of sound sources of the
acoustic signal from the number of linear components detected by the detection unit 104.
Specifically, the number of detected linear components is taken as the number of sound sources. The estimation unit 105 passes the estimated number of sound sources to an upper layer (not shown). The upper layer calculates the directions of the sound sources based on the number of sound sources, reduces the noise from the acoustic signal, and extracts the target acoustic signal.
[0018]
Next, details of each part and the principle of estimating the number of sound sources will be
described with reference to FIGS. 2 to 8. FIG. 2 is a diagram showing the relationship between the
sound sources 10A and 10B and the microphones 101A and 101B. Here, in order to simplify the
explanation, the number of sound sources will be described as two.
[0019]
As shown in FIG. 2, acoustic signals 11A and 11B are output from the two sound sources 10A and 10B, respectively. The acoustic signal 11A output from the sound source 10A is a signal in which a signal having a predetermined amplitude value and a signal having an amplitude value of zero are repeated. Specifically, for example, when the sound source 10A is a person and the voice emitted by that person is the acoustic signal 11A, the amplitude value of the acoustic signal 11A differs between consonants and vowels. In addition, there are silent periods due to breaks between sentences and between words. A silent period is a period in which the amplitude value of the signal is zero.
[0020]
In other words, the acoustic signal output from the sound source 10A has an amplitude value for certain periods, and for the rest (periods other than the voice emitted by the person) it is a signal whose amplitude value is zero. As described above, the acoustic signal 11A output from the sound source 10A is a signal including a signal having a predetermined amplitude value and a signal having an amplitude value of zero. Like the acoustic signal 11A, the acoustic signal 11B output from the sound source 10B is also a signal including a signal having a predetermined amplitude value and a signal having an amplitude value of zero. In the following, to simplify the description, it is assumed that the acoustic signals 11A and 11B output from the sound sources 10A and 10B are signals in which a sine wave signal and a signal with an amplitude value of zero alternate. In FIG. 2, the acoustic signal 11A is indicated by a solid line, and the acoustic signal 11B is indicated by a broken line.
[0021]
As shown in FIG. 2, the periods in which the amplitude value of the acoustic signal 11A output from the sound source 10A is zero and the periods in which the amplitude value of the acoustic signal 11B output from the sound source 10B is zero do not overlap. That is, while the sound source 10A outputs the sinusoidal acoustic signal 11A, the sound source 10B outputs the acoustic signal 11B with an amplitude value of zero, and while the sound source 10A outputs the acoustic signal 11A with an amplitude value of zero, the sound source 10B outputs the sinusoidal acoustic signal 11B.
[0022]
The acoustic signals 11A and 11B output from the sound sources 10A and 10B are input to the microphones 101A and 101B. The amplitudes of the acoustic signals 11A and 11B are attenuated while propagating through the space between the sound sources 10A and 10B and the microphones 101A and 101B. Acoustic signals 12A and 12B are input to the microphone 101A. The acoustic signal 12A is an attenuated version of the acoustic signal 11A output from the sound source 10A, and the acoustic signal 12B is an attenuated version of the acoustic signal 11B output from the sound source 10B. Acoustic signals 13A and 13B are input to the microphone 101B. The acoustic signal 13A is an attenuated version of the acoustic signal 11A output from the sound source 10A, and the acoustic signal 13B is an attenuated version of the acoustic signal 11B output from the sound source 10B.
[0023]
The attenuation factor of the amplitudes of the acoustic signals 11A and 11B is proportional to the square of the distance between the sound sources 10A and 10B and the microphones 101A and 101B. In the example of FIG. 2, since the distance between the sound source 10A and the microphone 101B is greater than the distance between the sound source 10A and the microphone 101A, the amplitude of the acoustic signal 13A input from the sound source 10A to the microphone 101B is smaller than the amplitude of the acoustic signal 12A input from the sound source 10A to the microphone 101A. Similarly, since the distance between the sound source 10B and the microphone 101A is greater than the distance between the sound source 10B and the microphone 101B, the amplitude of the acoustic signal 12B input from the sound source 10B to the microphone 101A is smaller than the amplitude of the acoustic signal 13B input from the sound source 10B to the microphone 101B.
[0024]
The microphone 101A converts the first acoustic signal 12, obtained by superimposing the input acoustic signals 12A and 12B, into an electrical signal, and outputs the electrical signal to the A / D conversion unit 102. The first acoustic signal 12 is a signal in which sine waves having different amplitudes are continuous.
[0025]
The microphone 101B converts the second acoustic signal 13 obtained by superimposing the
input acoustic signals 13A and 13B into an electrical signal, and outputs the electrical signal to
the A / D conversion unit 102. The second acoustic signal 13 is a signal in which sine waves
having different amplitudes are continuous.
[0026]
Next, as shown in FIG. 3(a), the A / D conversion unit 102 samples the first and second acoustic signals 12 and 13 at a predetermined sampling period T, converting the acoustic signals 12 and 13 from analog to digital signals. The A / D conversion unit 102 outputs the first and second acoustic signals 12 and 13 converted to digital signals to the mapping unit 103.
[0027]
The mapping unit 103 maps the amplitude values of the first and second acoustic signals 12 and 13 to a coordinate system in which each amplitude value is a coordinate axis. In a two-dimensional orthogonal coordinate system in which the amplitude value x1 of the first acoustic signal 12 is the x axis and the amplitude value x2 of the second acoustic signal 13 is the y axis, the mapping unit 103 maps the amplitude values (x1(nT), x2(nT)) (n is an integer) of the first and second acoustic signals 12 and 13 sampled at the sampling period T of the A / D conversion unit 102.
[0028]
FIG. 3(b) shows a schematic diagram of the amplitude values (x1(nT), x2(nT)) of the first and second acoustic signals 12 and 13 mapped to the coordinate system having each amplitude value as a coordinate axis. As shown in FIG. 3(b), the amplitude values (x1(nT), x2(nT)) of the first and second acoustic signals 12 and 13 are mapped onto two straight lines.
[0029]
The acoustic signal 11A output from the sound source 10A is attenuated while propagating through space, and its attenuation factor is proportional to the square of the distance. Let the attenuation factor of the acoustic signal 11A from the sound source 10A to the microphone 101A be 1/a, and the attenuation factor of the acoustic signal 11A from the sound source 10A to the microphone 101B be 1/b. Assuming that the amplitude value of the acoustic signal 11A output from the sound source 10A is x, the amplitude value x1A of the acoustic signal 12A input to the microphone 101A is x1A = a × x, and the amplitude value x2A of the acoustic signal 13A input to the microphone 101B is x2A = b × x. When the amplitude values (x1A, x2A) = (a × x, b × x) of the acoustic signals 12A and 13A are mapped to a coordinate system having the respective amplitude values as coordinate axes, the amplitude values (a × x, b × x) are mapped onto a straight line x2A = (b/a) × x1A that passes through the origin and whose slope depends on the attenuation factors 1/a and 1/b of the acoustic signal 11A.
[0030]
Similarly, let the attenuation factor of the acoustic signal 11B from the sound source 10B to the microphone 101A be 1/c, and the attenuation factor of the acoustic signal 11B from the sound source 10B to the microphone 101B be 1/d. Assuming that the amplitude value of the acoustic signal 11B output from the sound source 10B is x, the amplitude value x1B of the acoustic signal 12B input to the microphone 101A is x1B = c × x, and the amplitude value x2B of the acoustic signal 13B input to the microphone 101B is x2B = d × x. When the amplitude values (x1B, x2B) = (c × x, d × x) of the acoustic signals 12B and 13B are mapped to a coordinate system having the respective amplitude values as coordinate axes, the amplitude values (c × x, d × x) are mapped onto a straight line x2B = (d/c) × x1B that passes through the origin and whose slope depends on the attenuation factors 1/c and 1/d of the acoustic signal 11B.
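Restating the two mapped lines of the preceding paragraphs compactly (this only rewrites the relations already given in the text):

```latex
x_{2A} = \frac{b}{a}\,x_{1A}, \qquad x_{2B} = \frac{d}{c}\,x_{1B}
```

Both lines pass through the origin, and their slopes b/a and d/c are fixed by the attenuation factors alone, independently of the source amplitude x.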
[0031]
As shown in FIG. 2, while one of the acoustic signal 11A output from the sound source 10A and the acoustic signal 11B output from the sound source 10B is a sine wave signal, the other is a signal with an amplitude value of zero. Therefore, in the first and second acoustic signals 12 and 13, the acoustic signals 12A and 13A from the sound source 10A and the acoustic signals 12B and 13B from the sound source 10B do not overlap: when one pair of acoustic signals (for example, the acoustic signals 12A and 13A) appears, the other pair (for example, the acoustic signals 12B and 13B) does not appear.
[0032]
When the amplitude values of the first and second acoustic signals 12 and 13 are mapped to the coordinate system having the amplitude values as coordinate axes, the acoustic signals 12A and 13A input from the sound source 10A to the microphones 101A and 101B and the acoustic signals 12B and 13B input from the sound source 10B to the microphones 101A and 101B are each mapped. As described above, the acoustic signals 12A and 13A are mapped onto a straight line passing through the origin with a slope depending on the attenuation factors a and b of the acoustic signal 11A, and the acoustic signals 12B and 13B are mapped onto a straight line passing through the origin with a slope depending on the attenuation factors c and d of the acoustic signal 11B.
[0033]
When the first and second acoustic signals 12 and 13 input to the microphones 101A and 101B contain periods in which only the acoustic signal arriving from one of the sound source 10A and the sound source 10B is present, the acoustic signals in those periods are represented as straight lines in the coordinate system having the amplitude values as coordinate axes. The number of linear components represented in the coordinate system therefore matches the number of sound sources. Accordingly, the detection unit 104 detects the linear components from the amplitude values of the first and second acoustic signals 12 and 13 mapped by the mapping unit 103, and the estimation unit 105 estimates the number of linear components detected by the detection unit 104, whereby the acoustic signal processing device 1 can estimate the number of sound sources.
[0034]
Although, for simplicity, the acoustic signals 11A and 11B in FIG. 2 have been described as signals in which the sine wave component of the acoustic signal 11A and the sine wave component of the acoustic signal 11B do not overlap, in practice the sine wave components of the acoustic signals 11A and 11B output from the sound sources 10A and 10B often do overlap. For example, when the acoustic signals 11A and 11B are voices emitted by people and are output simultaneously from the sound sources 10A and 10B, the acoustic signals 11A and 11B arrive superimposed at the microphones 101A and 101B.
[0035]
However, as described above, human voices and noises are not signals that have an amplitude value at all times; they are signals that contain silent periods. Therefore, even if voices are emitted simultaneously from the sound sources 10A and 10B, the first and second acoustic signals 12 and 13 input to the microphones 101A and 101B contain periods in which only the acoustic signal 11A or 11B arriving from one of the sound sources 10A and 10B is included. The acoustic signal processing device 1 according to the present embodiment detects straight lines based on the portions of the first and second acoustic signals 12 and 13 that include only one of the acoustic signals 11A and 11B, and thereby estimates the number of sound sources. Therefore, even if the acoustic signals 11A and 11B output by the sound sources 10A and 10B reach the microphones 101A and 101B simultaneously, for example when people speak from the sound sources at the same time, the acoustic signal processing device 1 of this embodiment can estimate the number of sound sources.
[0036]
Next, simulation results in which the number of sound sources is estimated using the acoustic signal processing device 1 according to the present embodiment will be described with reference to FIGS. 4 to 7. The simulation was performed with the sampling frequency of the A / D conversion unit 102 set to 8 kHz.
[0037]
FIG. 4 is a diagram showing a simulation result in the case where a person speaks from one sound source 10A. FIG. 4(a) is a diagram showing the acoustic signal 11A output from the sound source 10A. When the acoustic signal 11A shown in FIG. 4(a) is input to the acoustic signal processing device 1, the first acoustic signal 12 shown in FIG. 4(b) is obtained at the microphone 101A, and the second acoustic signal 13 shown in FIG. 4(c) is obtained at the microphone 101B. Because the microphone 101B is farther from the sound source 10A than the microphone 101A, the amplitude of the second acoustic signal 13 is smaller than that of the first acoustic signal 12.
[0038]
FIG. 4(d) is a diagram showing the simultaneous distribution to which the amplitude values of the first and second acoustic signals 12 and 13 are mapped. In this simulation the number of sound sources is one, so the distribution of the amplitude values forms one straight line. FIG. 4(e) is a diagram showing the histogram of FIG. 4(d). The horizontal axis of the graph shown in FIG. 4(e) represents the azimuth angle φ (radian) of the mapped amplitude values of the first and second acoustic signals 12 and 13, and the vertical axis represents the number of amplitude values whose azimuth angle is φ. The histogram shown in FIG. 4(e) has one peak, from which it can be seen that the distribution of the amplitude values forms one straight line. Thus, the number of sound sources (here, one) can be estimated by estimating the number of linear components (here, one).
[0039]
FIG. 5 is a diagram showing simulation results in the case where a person speaks from each of the two sound sources 10A and 10B. FIG. 5(a) shows the acoustic signal 11B output from the sound source 10B; the acoustic signal 11A shown in FIG. 4(a) is output from the sound source 10A. When the acoustic signals 11A and 11B shown in FIGS. 4(a) and 5(a) are input to the acoustic signal processing device 1, the first acoustic signal 12 shown in FIG. 5(b) is obtained at the microphone 101A, and the second acoustic signal 13 shown in FIG. 5(c) is obtained at the microphone 101B. Since different acoustic signals 11A and 11B are output simultaneously from the two sound sources 10A and 10B, the first and second acoustic signals 12 and 13 become signals having different amplitude values.
[0040]
FIG. 5(d) is a diagram showing the simultaneous distribution to which the amplitude values of the first and second acoustic signals 12 and 13 are mapped. As shown in FIG. 5(d), two linear components are included in the distribution of the amplitude values. FIG. 5(e) shows the histogram of FIG. 5(d), calculated in the same manner as in FIG. 4. The histogram shown in FIG. 5(e) has two peaks, from which it can be seen that the simultaneous distribution of the amplitude values includes two linear components. As described above, the number of sound sources (here, two) can be estimated by estimating the number of linear components (here, two).
[0041]
Next, simulation results in the case where a person speaks from each of three sound sources 10A, 10B, and 10C, as shown in FIG. 6, will be described. The acoustic signals (here, voices emitted by people) output from the three sound sources 10A, 10B, and 10C are input to the microphones 101A and 101B through propagation paths of respectively different lengths.
[0042]
FIG. 7(a) shows the acoustic signal output from the sound source 10C. The sound source 10A outputs the acoustic signal 11A shown in FIG. 4(a), and the sound source 10B outputs the acoustic signal 11B shown in FIG. 5(a). When the acoustic signals shown in FIGS. 4(a), 5(a), and 7(a) are input to the acoustic signal processing device 1, the first acoustic signal 12 shown in FIG. 7(b) is obtained at the microphone 101A, and the second acoustic signal 13 shown in FIG. 7(c) is obtained at the microphone 101B. Since different acoustic signals are output simultaneously from the three sound sources 10A, 10B, and 10C, the first and second acoustic signals 12 and 13 become signals having different amplitude values.
[0043]
FIG. 7(d) is a diagram showing the simultaneous distribution to which the amplitude values of the first and second acoustic signals 12 and 13 are mapped. FIG. 7(e) is a diagram showing the histogram of FIG. 7(d), calculated in the same manner as in FIG. 4. The histogram shown in FIG. 7(e) has three peaks, from which it can be seen that the simultaneous distribution of the amplitude values includes three linear components. As described above, the number of sound sources (here, three) can be estimated by estimating the number of linear components (here, three).
[0044]
Thus, the number of sound sources can be easily estimated by using the acoustic signal processing device 1 according to the present embodiment. As shown in FIG. 7, the number of sound sources can be estimated even when it is larger than the number of microphones. Moreover, since the number of sound sources is estimated by exploiting the fact that the attenuation of an acoustic signal differs depending on the distance between the sound source and the microphone, the method does not require the distance between the sound sources and the microphones to be sufficiently large compared with the distance between the microphones. That is, the number of sound sources can be estimated even when the sound sources are located close to the microphones relative to the distance between the microphones.
[0045]
As shown in FIG. 4(e), the detection unit 104 may detect the linear components by calculating a histogram, and the estimation unit 105 may estimate their number, that is, the number of sound sources, by, for example, applying a threshold to the peaks of the histogram.
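A minimal sketch of this histogram-and-threshold reading (the bin count, the silence cutoff, and the relative threshold are assumptions; the patent does not specify them):

```python
import numpy as np

def estimate_source_count(points, n_bins=180, rel_threshold=0.1):
    """Count peaks in the histogram of the azimuth angle phi of the mapped points."""
    x1, x2 = points[:, 0], points[:, 1]
    keep = np.hypot(x1, x2) > 1e-6                 # drop near-silent samples at the origin
    phi = np.arctan2(x2[keep], x1[keep]) % np.pi   # fold opposite directions of a line together
    hist, _ = np.histogram(phi, bins=n_bins, range=(0.0, np.pi))
    threshold = rel_threshold * hist.max()
    inner = hist[1:-1]
    # A peak: a bin above the threshold that is not smaller than either neighbour
    peaks = (inner > threshold) & (inner >= hist[:-2]) & (inner >= hist[2:])
    return int(np.count_nonzero(peaks))
```

Folding φ modulo π merges the two opposite directions of each straight line, so one line contributes exactly one histogram peak.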
[0046]
Alternatively, the linear components may be detected by, for example, applying Hough transform processing to the simultaneous distribution of the amplitude values. In this case, the detection unit 104 first detects the linear components by performing Hough transform processing on the simultaneous distribution of the amplitude values of the first and second acoustic signals 12 and 13 generated by the mapping unit 103. FIG. 8 shows the result of the Hough transform processing applied to the simultaneous distribution of the amplitude values shown in FIG. 5(d). As shown in FIG. 8, many of the curves generated from the points of the simultaneous distribution intersect the x axis at two points. The number of points where the curves intersect the x axis is the number of linear components, that is, the number of sound sources. The estimation unit 105 can estimate the number of sound sources, which is the number of linear components, by estimating the number of points where the curves intersect the x axis using, for example, the principle of majority decision.
[0047]
The positional relationship between the sound sources 10A, 10B, and 10C and the microphones 101A and 101B is not limited to the cases shown in the figures. The sound sources 10A, 10B, and 10C do not have to be arranged in a line, and they may be arranged so as to face one another with the microphones 101A and 101B interposed between them.
[0048]
Second Embodiment
An acoustic signal processing device 2 according to a second embodiment will be described with reference to FIG. 9. The acoustic signal processing device 2 according to the present embodiment differs from the first embodiment in that the amplitude values of the first and second acoustic signals are mapped for each frequency. The other components are the same as those of the first embodiment, so the same reference numerals are given to the same components and their description is omitted.
[0049]
As shown in FIG. 9, the acoustic signal processing device 2 further includes a frequency
decomposition unit 206 in addition to the configuration of the acoustic signal processing device
1. The frequency decomposition unit 206 decomposes the first and second acoustic signals 12
and 13 for each frequency, and outputs an acoustic signal for each frequency to the mapping
unit 203.
[0050]
The details of the frequency decomposition unit 206 will be described with reference to FIG. 10. The frequency decomposition unit 206 includes a Fourier transform unit 601, a decomposition unit 602, and an inverse Fourier transform unit 603. First, when the first acoustic signal 12 is input to the frequency decomposition unit 206, the Fourier transform unit 601 converts the time domain signal into a frequency domain signal (frequency acoustic signal).
[0051]
Next, the decomposition unit 602 decomposes the frequency acoustic signal for each frequency. Here, it is decomposed into three signals: a first frequency acoustic signal of frequency f1, a second frequency acoustic signal of frequency f2, and a third frequency acoustic signal of frequency f3. Note that the frequencies f1 to f3 may be used as center frequencies, and the signal may be decomposed into first to third frequency acoustic signals each having a constant bandwidth. The number of decomposed signals is not limited to three; the signal may be decomposed into two or more signals. Alternatively, the number may be one, and only a specific frequency may be extracted.
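As a sketch, the transform / decomposition / inverse-transform chain of the units 601 to 603 can be collapsed into one function via FFT masking (the band edges are assumptions; the patent names only the frequencies f1 to f3):

```python
import numpy as np

def decompose_into_bands(signal, fs, bands):
    """Split a time signal into one time signal per frequency band via FFT masking.

    bands: list of (low_hz, high_hz) tuples, e.g. three bands around f1, f2, f3.
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    out = []
    for low, high in bands:
        mask = (freqs >= low) & (freqs < high)               # keep only this band
        out.append(np.fft.irfft(spectrum * mask, n=len(signal)))
    return out  # e.g. the first/second/third time signals
```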
[0052]
The first to third frequency acoustic signals decomposed by the decomposition unit 602 are
converted from the frequency domain signal to the time domain signal by the inverse Fourier
transform unit 603. The inverse Fourier transform unit 603 transforms the first to third
frequency acoustic signals into time domain signals and generates first to third time signals. The
inverse Fourier transform unit 603 outputs the generated first to third time signals to the
mapping unit 203.
[0053]
The decomposition unit 602 performs the same processing on the second acoustic signal 13, generating three signals: a fourth frequency acoustic signal of frequency f1, a fifth frequency acoustic signal of frequency f2, and a sixth frequency acoustic signal of frequency f3. Fourth to sixth time signals are then generated from the fourth to sixth frequency acoustic signals.
[0054]
Returning to FIG. 9, the mapping unit 203 maps the amplitude values of the first and second acoustic signals 12 and 13 for each frequency based on the amplitude values of the first to sixth time signals, and generates a simultaneous distribution of the amplitude values for each frequency. Specifically, the mapping unit 203 maps the amplitude value of the first time signal and the amplitude value of the fourth time signal to a coordinate system in which each amplitude value is a coordinate axis, thereby generating the simultaneous distribution (hereinafter referred to as the amplitude value distribution) of the first and second acoustic signals 12 and 13 at the frequency f1. The mapping unit 203 maps the amplitude values of the second time signal and the fifth time signal to a coordinate system having the respective amplitude values as coordinate axes, thereby generating the amplitude value distribution of the first and second acoustic signals 12 and 13 at the frequency f2. The mapping unit 203 maps the amplitude values of the third time signal and the sixth time signal to a coordinate system having the respective amplitude values as coordinate axes, thereby generating the amplitude value distribution of the first and second acoustic signals 12 and 13 at the frequency f3.
[0055]
The detection unit 204 detects the linear components included in the amplitude value distributions at the frequencies f1 to f3. The detection method is the same as in the first embodiment. The detection unit 204 outputs the detected linear components to the estimation unit 205 for each frequency. The estimation unit 205 estimates the number of linear components detected by the detection unit 204 for each frequency. The method of estimating the number of linear components is the same as in the first embodiment. From the estimated numbers of linear components for the individual frequencies, the estimation unit 205 estimates the number of linear components included in the amplitude value distribution of the first and second acoustic signals 12 and 13, that is, the number of sound sources, using the principle of majority decision. Alternatively, the average of the numbers of linear components for the individual frequencies may be used as the number of sound sources.
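The two fusion rules named here, majority decision and averaging, reduce to a few lines; a sketch with assumed names:

```python
from collections import Counter

def fuse_counts(per_band_counts, use_average=False):
    """Fuse per-frequency source-count estimates into a single number."""
    if use_average:
        return round(sum(per_band_counts) / len(per_band_counts))
    return Counter(per_band_counts).most_common(1)[0][0]  # majority decision

# Example: fuse_counts([2, 2, 3]) returns 2
```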
[0056]
As described above, the acoustic signal processing device 2 according to the second embodiment obtains the same effect as the first embodiment and, by frequency-decomposing the first and second acoustic signals 12 and 13 input to the microphones 101A and 101B and calculating the amplitude value distribution for each frequency, can estimate the number of sound sources more accurately.
[0057]
In particular, when the type of the target acoustic signal input to the microphones 101A and 101B is known in advance, for example when the acoustic signal processing device 2 is a telephone and it is known beforehand that the target acoustic signal input to the microphones 101A and 101B is human voice, the number of sound sources is estimated after reducing the noise other than the target acoustic signal by decomposing the first and second acoustic signals into the frequencies specific to that type of acoustic signal (voice). This makes it possible to further improve the accuracy of the estimation of the number of sound sources.
[0058]
Third Embodiment
Next, an acoustic signal processing device 3 according to a third embodiment will be described with reference to FIG. 11. The acoustic signal processing device 3 according to the present embodiment differs from the second embodiment in that the mapping unit 303 does not generate an amplitude value distribution for each frequency but generates one distribution of the amplitude values of the first and second acoustic signals based on the first to sixth time signals.
The same components as those in the first and second embodiments will be assigned the same
reference numerals and descriptions thereof will be omitted.
[0059]
The mapping unit 303 generates the amplitude value distribution of the first and second acoustic signals 12 and 13 based on the first to sixth time signals input from the frequency decomposition unit 206. The mapping unit 303 maps the amplitude values (A1, A4) of the first and fourth time signals, the amplitude values (A2, A5) of the second and fifth time signals, and the amplitude values (A3, A6) of the third and sixth time signals to one coordinate system with each amplitude value as a coordinate axis. As a result, an amplitude value distribution of the first and second acoustic signals 12 and 13 is obtained in which the amplitude value distributions of the frequencies f1 to f3 of the second embodiment are integrated into one.
[0060]
The method of estimating the number of sound sources from the amplitude value distribution of
the first and second acoustic signals 12 and 13 generated by the mapping unit 303 is the same
as that in the first embodiment, and therefore the description thereof is omitted.
[0061]
As described above, the acoustic signal processing device 3 according to the third embodiment obtains the same effect as the second embodiment, and since only one amplitude value distribution is generated by the mapping unit 303, the calculation time of the sound source number estimation process can be shortened.
[0062]
Fourth Embodiment
An acoustic signal processing device 4 according to a fourth embodiment will be described with reference to FIG. 12. The acoustic signal processing device 4 according to the present embodiment differs from the first embodiment in that the mapping unit 403 maps the phases of the first and second acoustic signals 12 and 13 to a coordinate system having the phases as coordinate axes.
[0063]
The acoustic signal processing device 4 includes the microphones 101A and 101B, the A / D conversion unit 102, a frequency decomposition unit 406, a mapping unit 403, the detection unit 104, and the estimation unit 105. The A / D conversion unit 102 generates the digital first and second acoustic signals 12 and 13 as in the first embodiment.
[0064]
The frequency decomposition unit 406 performs Fourier transform processing on the first and
second acoustic signals 12 and 13 input from the A / D conversion unit 102, and converts a time
domain signal into a frequency domain signal. The first and second acoustic signals 12 and 13 in
the frequency domain are referred to as first and second frequency acoustic signals. The
frequency decomposition unit 406 calculates the phase at each frequency of the first and second
frequency acoustic signals. The frequency decomposition unit 406 outputs the phase at each
frequency of the first frequency acoustic signal as the phase of the first acoustic signal 12 to the
mapping unit 403. The frequency decomposition unit 406 outputs the phase at each frequency
of the second frequency acoustic signal as the phase of the second acoustic signal 13 to the
mapping unit 403.
[0065]
The mapping unit 403 maps the pairs of phases at the same frequency of the first and second acoustic signals 12 and 13 onto a coordinate system having each phase as a coordinate axis, and generates a simultaneous distribution of the phases (hereinafter referred to as the phase distribution). The detection unit 104 detects line segments from the phase distribution, and the estimation unit 105 estimates the number of sound sources. The method of detecting the line segments and the method of estimating the number of sound sources are the same as in the first embodiment, and their description is therefore omitted.
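One possible realization of the phase mapping, assuming a single FFT frame per channel (framing and names are assumptions):

```python
import numpy as np

def phase_distribution(x1, x2):
    """Map the per-frequency phase pairs of the two channels to 2-D points."""
    p1 = np.angle(np.fft.rfft(x1))    # phases of the first frequency acoustic signal
    p2 = np.angle(np.fft.rfft(x2))    # phases of the second frequency acoustic signal
    return np.column_stack((p1, p2))  # one (phase, phase) point per frequency bin
```

Line-segment detection then proceeds on these points exactly as in the first embodiment.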
[0066]
As described with reference to FIG. 2 in the first embodiment, the amplitudes of the acoustic
signals 11A and 11B output from the sound sources 10A and 10B are attenuated at a constant
attenuation rate according to the distance to the microphones 101A and 101B. Similarly, the
phases of the acoustic signals 11A and 11B rotate at a constant rate in accordance with the
distances to the microphones 101A and 101B.
[0067]
The difference between the amplitudes of the acoustic signals 12A and 13A of the acoustic signal 11A received by the microphones 101A and 101B, that is, the difference between the amplitude attenuations of the acoustic signal 11A, is expressed as a straight line in the amplitude value distribution of the first and second acoustic signals. Similarly, the difference between the phases of the acoustic signals 12A and 13A of the acoustic signal 11A received by the microphones 101A and 101B, that is, the difference in the amount of phase rotation of the acoustic signal 11A, is represented as a line segment in the phase distribution of the first and second acoustic signals.
[0068]
That is, when the phase of the acoustic signal 12A is taken as the x axis and the phase of the acoustic signal 13A as the y axis, and the phases of the acoustic signals 12A and 13A at each frequency are mapped, the phases are mapped onto a line segment whose slope corresponds to the phase difference between the acoustic signals 12A and 13A. The same applies to the acoustic signals 12B and 13B. Therefore, even if the phase distribution of this embodiment is used instead of the amplitude value distribution of the first embodiment, the number of sound sources can be estimated simply by detecting the number of line segments included in the phase distribution.
[0069]
Fifth Embodiment
Next, an acoustic signal processing device 5 according to a fifth embodiment will be described with reference to FIG. 13. The acoustic signal processing device 5 according to the present embodiment has a configuration in which the acoustic signal processing device 1 and the acoustic signal processing device 4 are combined.
[0070]
The acoustic signal processing device 5 shown in FIG. 13 includes the mapping unit 103, the frequency decomposition unit 406, and the mapping unit 403. That is, the acoustic signal processing device 5 of this embodiment has both the mapping unit 103, which receives the signals directly from the A / D conversion unit 102, and the mapping unit 403, which receives the signals from the A / D conversion unit 102 via the frequency decomposition unit 406. The first and second acoustic signals 12 and 13 converted to digital signals by the A / D conversion unit 102 are input to the mapping unit 103 and to the frequency decomposition unit 406. The mapping unit 103 generates the amplitude value distribution of the input first and second acoustic signals 12 and 13 and outputs it to a detection unit 504. The frequency decomposition unit 406 calculates the phases of the first and second acoustic signals 12 and 13, and the mapping unit 403 generates the phase distribution of the first and second acoustic signals 12 and 13. The phase distribution is also input to the detection unit 504.
[0071]
The detection unit 504 detects the line segments included in the amplitude value distribution input from the mapping unit 103 and the line segments included in the phase distribution input from the mapping unit 403. The estimation unit 505 estimates the number of sound sources based on the number of line segments included in the amplitude value distribution detected by the detection unit 504 and the number of line segments included in the phase distribution. For example, the estimation unit 505 takes the average of the numbers of line segments included in the amplitude value distribution and the phase distribution as the number of sound sources.
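The averaging rule mentioned here is a one-liner; a sketch (rounding to the nearest integer is an assumption):

```python
def combined_estimate(n_amplitude_lines, n_phase_lines):
    """Average the line counts from the amplitude and phase distributions."""
    return round((n_amplitude_lines + n_phase_lines) / 2)
```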
[0072]
As described above, by combining the acoustic signal processing device 1 and the acoustic signal
processing device 4 to configure the acoustic signal processing device 5, the number of sound
sources can be estimated more accurately.
[0073]
In the present embodiment, the acoustic signal processing devices 1 and 4 are combined, but any of the acoustic signal processing devices 1 to 4 may be combined. It is also possible to combine three or more acoustic signal processing devices.
[0074]
FIG. 14 is a diagram showing the hardware configuration of the acoustic signal processing device 1 according to the first embodiment. The acoustic signal processing device 1 comprises a ROM 61 storing the acoustic signal processing program for estimating the number of sound sources and other data, a CPU 62 that controls each part of the acoustic signal processing device 1 according to the program in the ROM 61, a RAM 63 that stores various data necessary for the control, a communication I / F 64 that connects to a network for communication, and a bus 65 that connects these parts.
[0075]
The acoustic signal processing program may also be provided as a file in an installable or executable format stored on a computer readable storage medium such as a CD-ROM, a flexible disk (FD), or a DVD.
[0076]
In this case, the acoustic signal processing program is read from the storage medium and executed, whereby it is loaded onto the main storage device of the acoustic signal processing device 1 and each unit of the software configuration described above is formed on the main storage device.
[0077]
Further, the acoustic signal processing program of the present embodiment may be stored on a computer connected to a network such as the Internet and provided by being downloaded via the network and the communication I / F 64.
[0078]
The hardware configuration is not limited to the acoustic signal processing device 1, and the
acoustic signal processing devices 2 to 5 can be similarly configured.
[0079]
Finally, the embodiments described above are examples of the present disclosure, and the present disclosure is not limited to those embodiments. Even outside the embodiments described above, various modifications are of course possible according to the design and other factors within a range that does not depart from the technical idea of the present disclosure.
[0080]
Reference Signs List
10: sound source
101: microphone
102: A / D conversion unit
103, 203, 303, 403: mapping unit
104, 204, 504: detection unit
105, 205, 505: estimation unit
206, 406: frequency decomposition unit