DESCRIPTION JP2009089315

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
The present invention provides a method of estimating sound sources from sounds collected by a plurality of microphones and synthesizing the sound at an arbitrary position. An acoustic signal estimation and synthesis apparatus according to the present invention includes a band division unit, a sound source estimation unit, a recording unit, a band signal component estimation unit, a band signal component addition unit, and a band integration unit. The sound source estimation unit estimates the position or direction, intensity, and phase of each sound source for each frequency band, then removes the signals from the estimated sources from the band signal of each channel to obtain a residual band signal. The band signal component estimation unit estimates the band signal arriving from each sound source at a designated position, using the position or direction of each source and its intensity and phase for each frequency band. The band signal component addition unit calculates the band signal at the designated position by weighted addition of the estimated band signals from each source and the residual band signals of each channel. [Selected figure] Figure 8
Acoustic signal estimation apparatus, acoustic signal synthesis apparatus, acoustic signal
estimation synthesis apparatus, acoustic signal estimation method, acoustic signal synthesis
method, acoustic signal estimation synthesis method, program using these methods, and
recording medium
[0001]
The present invention relates to an acoustic signal estimation apparatus that estimates the position or direction of sound sources, and their intensity and phase, from acoustic signals of a plurality of channels; an acoustic signal synthesis apparatus that synthesizes an acoustic signal at an arbitrary position; an acoustic signal estimation and synthesis apparatus; an acoustic signal estimation method; an acoustic signal synthesis method; an acoustic signal estimation and synthesis method; a program using these methods; and a recording medium.
10-04-2019
[0002]
Techniques that pick up a three-dimensional acoustic field with a plurality of microphones to separate sound sources and suppress noise are well known. The position of a sound source can be detected with a sensor, and individual sounds can be separated and collected with an array microphone. As such means, the SAFIA method (Non-Patent Document 1) and the CSCC method (Non-Patent Document 2) are known. [Non-Patent Document 1] Mariko Aoki, Yoshikazu Yamaguchi, Ken-ichi Furuya, Akitoshi Kataoka, "Separation and extraction of a proximate sound source under high noise using the sound source separation method SAFIA," Journal of the Institute of Electronics, Information and Communication Engineers A, Vol. J88-A, No. 4, pp. 468-479, 2005. [Non-Patent Document 2] Atsushi Matsumoto, Junki Ono, Shigeki Hatakeyama, "Study on noise suppression by the phase-constrained complex spectral circle (CSCC) method," Proceedings of the Acoustical Society of Japan, 3-1-11, pp. 499-500, 2006.
[0003]
In general, a plurality of microphones are placed at a distance from the sound sources and pick up sound continuously. The positions and number of the sound sources, however, are not known in advance, and must be assumed to change over time. In such a case, methods that separate sound sources under fixed assumptions about a few parameters cannot produce the sound that would be picked up at an arbitrary position. It is an object of the present invention to solve these problems and to provide a method of estimating sound sources from sounds collected by a plurality of microphones and synthesizing the sound at an arbitrary position.
[0004]
An acoustic signal estimation apparatus according to the present invention comprises a band division unit and a sound source estimation unit. The band division unit divides each channel of the acoustic signals collected by a plurality of microphones into predetermined frequency bands to generate band signals. The sound source estimation unit estimates the position or direction, intensity, and phase of each sound source for each frequency band, then removes the signals from the estimated sources from the band signal of each channel to obtain residual band signals. That is, in a frequency band where one or more sound sources can be estimated, the signals from those sources are removed from the band signal of each channel to obtain the residual band signal; in a frequency band where no sound source can be estimated, the band signal of each channel itself is taken as the residual band signal.
[0005]
The acoustic signal synthesis apparatus of the present invention comprises a band signal component estimation unit, a band signal component addition unit, and a band integration unit; its inputs are the position or direction of each sound source, the intensity and phase for each frequency band, the residual band signal of each channel, and the position at which the sound is to be synthesized. The band signal component estimation unit estimates the band signal arriving from each sound source at the designated position, using the position or direction of each source and its intensity and phase for each frequency band. The band signal component addition unit calculates the band signal at the designated position by weighted addition of the estimated band signals from each source and the residual band signals of each channel. The band integration unit converts the band signal at the designated position into a time domain signal.
[0006]
The acoustic signal estimation and synthesis apparatus according to the present invention includes the above-described acoustic signal estimation apparatus, a recording unit, and the above-described acoustic signal synthesis apparatus. The recording unit records the position or direction of each sound source output by the acoustic signal estimation apparatus, the intensity and phase for each frequency band, and the residual band signal of each channel. The acoustic signal synthesis apparatus takes as inputs the recorded position or direction of each estimated sound source, the intensity and phase for each frequency band, the residual band signal of each channel, and the position at which the collected sound is to be synthesized. The recording unit may instead be part of the acoustic signal estimation apparatus or the acoustic signal synthesis apparatus.
[0007]
The acoustic signal estimation apparatus of the present invention estimates the position or direction of one or more sound sources, together with their intensity and phase for each frequency band, from acoustic signals of a plurality of channels collected by a plurality of microphones, and obtains the residual band signal of each channel. The sound can therefore be divided into sound whose source can be estimated and sound, such as noise, whose source cannot. With the acoustic signal synthesis apparatus of the present invention, for sound whose source has been estimated, the sound collected at a designated position can be calculated from the position or direction of the source. For sound whose source cannot be estimated, the sound collected at the designated position can be calculated from the residual band signal of each channel (the part of the band signal whose source cannot be identified). Since these are weighted and added, the sound at the designated position can be synthesized.
[0008]
The acoustic signal estimation and synthesis apparatus of the present invention combines the effects of the acoustic signal estimation apparatus and the acoustic signal synthesis apparatus described above, so acoustic signals of a plurality of channels collected by a plurality of microphones can be synthesized at a designated position. This makes it possible, for example, to synthesize the sound signals corresponding to a free-viewpoint video system that composes images and videos of arbitrary viewpoints from cameras at a plurality of places.
[0009]
The principles and embodiments of the present invention will be described below with reference to the drawings. [Principle] FIG. 1 shows an example of four microphones picking up sound from a sound source far enough away that the propagated sound can be approximated by a plane wave. In general, when the distance to the sound source is ten times or more the distance between the two microphones farthest from each other, the sound can be approximated as a plane wave. In FIG. 1, the four microphones 501 to 504 are arranged in a line. The sound from sound source A is assumed to arrive from a direction perpendicular to this line. In this case the wavefront of the arriving sound is aligned with the array, so the input signals from sound source A at the respective microphones are identical. The sound from sound source B is assumed to arrive from a direction that is not perpendicular to the line. In this case the arrival time of the sound from sound source B differs from microphone to microphone and, viewed as band signal components, the phase differs in each band. FIG. 2 shows an example of the spectrum of the sound propagated from sound source A at the locations 501 to 504 where the microphones 501 to 504 are installed.
FIG. 3 shows examples of the spectrum of the sound propagated from sound source B at locations 501 to 504: FIG. 3(A) is the spectrum at location 501, FIG. 3(B) at location 502, FIG. 3(C) at location 503, and FIG. 3(D) at location 504. FIGS. 4 to 6 show the spectra of the sound from sound source A and sound source B together: FIG. 4 at location 501, FIG. 5 at location 502, and FIG. 6 at location 503.
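The geometry described above can be illustrated numerically. The following is a minimal sketch, not part of the patented apparatus: the function name, the microphone coordinates, and the speed of sound (343 m/s) are all assumptions for illustration. For a plane wave, the phase at each microphone is set by the projection of its position onto the propagation direction; for broadside incidence (perpendicular to a linear array, as for sound source A) all phases coincide.

```python
import math

def plane_wave_phases(mic_positions, azimuth_deg, freq_hz, c=343.0):
    """Phase (radians) of a plane wave at each microphone, relative to
    the origin. Broadside incidence gives identical phases, as for
    sound source A in FIG. 1; oblique incidence gives per-microphone
    phase differences, as for sound source B."""
    u = (math.cos(math.radians(azimuth_deg)),
         math.sin(math.radians(azimuth_deg)))
    phases = []
    for (x, y) in mic_positions:
        # Travel time along the propagation direction for this microphone.
        delay = (x * u[0] + y * u[1]) / c
        phases.append(2 * math.pi * freq_hz * delay)
    return phases

# Four microphones on the x-axis, 10 cm apart.
mics = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (0.3, 0.0)]
broadside = plane_wave_phases(mics, 90.0, 1000.0)  # like source A
oblique = plane_wave_phases(mics, 0.0, 1000.0)     # like source B
```

With broadside incidence all four phases are (numerically) zero, while the oblique wave accumulates phase along the array, which is exactly the cue the sound source estimation exploits.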
[0010]
In the present invention, the direction of each sound source and its spectrum are estimated from the spectra of the sound collected by the plurality of microphones as described above. When a spherical wave is assumed, as in FIG. 7, the position of the sound source is estimated instead of its direction. The spectrum of the sound from each estimated source at each microphone is then calculated, and the signal left over (the residual signal) is treated as noise whose source cannot be identified. From the estimated positions and spectra of the sources, the spectrum of the sound from each source is obtained at the position where the acoustic waveform is desired (the designated position). The spectrum of the residual signal at the designated position is obtained by weighted addition of the residual signals of the microphones near the designated position, taking into account the distance between the designated position and each microphone. Adding these together synthesizes the acoustic waveform at the designated position.
[0011]
Any existing method may be used to estimate the direction and spectrum of a sound source. One example uses the phase differences of the sound collected at each microphone. With two microphones, a single clear peak in the cross-correlation function at some time difference indicates that there is one sound source. With two or more microphones, one can, for example, solve simultaneous equations under a single-source assumption, or evaluate the phase differences in the frequency domain, and judge whether the result is consistent with one sound source. In general, with two or more microphones, the direction of a sound source can be estimated from the phase differences of the collected sound in the individual frequency bands.
[0012]
The SAFIA method assumes that each band contains one major sound source component and obtains the position of that source and the sound from it. The spectrum of a sound source has strong parts and weak parts, and within a given band it is relatively rare for major components to come from multiple sources. For example, as shown in FIGS. 4 to 6, the frequencies at which the spectrum of the sound from source A and the spectrum of the sound from source B have significant energy are mostly different (for example, band of interest a and band of interest c in FIGS. 4 to 6). After band division, therefore, one band is usually dominated by the sound of either source A or source B, with the other hardly present. The SAFIA method exploits this property.
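The band-dominance idea behind SAFIA can be sketched as a binary mask per frequency band. This is a highly simplified illustration of the principle only (real SAFIA compares channels to select sources, with more machinery); the function name and spectra are invented for the example.

```python
def dominance_masks(spec_a, spec_b):
    """Assign each frequency band to whichever spectrum dominates it
    (larger magnitude), in the spirit of SAFIA's assumption that each
    band holds one major source component."""
    mask_a = [abs(a) >= abs(b) for a, b in zip(spec_a, spec_b)]
    return mask_a, [not m for m in mask_a]

# Three bands: the first and third dominated by A, the second by B,
# mimicking bands of interest a and c in FIGS. 4 to 6.
spec_a = [3.0 + 0j, 0.1 + 0j, 2.0 + 0j]
spec_b = [0.2 + 0j, 4.0 + 0j, 0.3 + 0j]
ma, mb = dominance_masks(spec_a, spec_b)
```

Because energy rarely overlaps band by band, such masks recover most of each source even though each band is assigned in an all-or-nothing way.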
[0013]
The CSCC method separately estimates a sound source direction and its signal component from the arrangement, on the complex plane, of the spectra observed at multiple microphones for a single source, in cases where the contribution from the other sources is constant or can be transformed to be so. In band of interest a, for example, there is little component from source A, or the components from source A can be made common to all microphones by delaying each signal; the direction of source B can then be estimated accurately from the components of source B at locations 501 to 504. The accuracy of the position estimate depends on how strong the other sounds are. In band of interest c there is almost no component from source B, so the direction of source A can be estimated accurately from the components of source A at locations 501 to 504. In this case the spectrum is the same at every location, which shows that source A lies in the direction perpendicular to the line of microphones. In band of interest b, the components of source A and source B are both strong, so simple separation is difficult. In this case, the component from source A and the component from source B are estimated using the source positions estimated in bands where the direction estimate is reliable (for example, bands of interest a and c). In this example, the component from source A can be regarded as constant because it does not depend on the microphone location.
10-04-2019
6
[0014]
Besides these, there is a technique for separating a plurality of sound sources from the signals of at least as many microphones as there are sources (Japanese Patent Laid-Open No. 2006-243664). Furthermore, if the signal is divided into bands, the frequency components generated by each source are unevenly distributed, so separation is possible even with fewer microphones (Japanese Patent Application Laid-Open No. 2007-198977).
[0015]
In the present invention as well, the direction (or position) of each sound source and its spectrum are estimated by separating the signals collected by the plurality of microphones into their sources, on the premise that there are a plurality of sound sources. The invention therefore shares the use of the signal separation methods described above and similar methods, and any of them may be selected as appropriate. The object of the present invention, however, is to synthesize sound at an arbitrary position, not to separate the sound of each source. That is, in the present invention it is more important that the result can be synthesized into the sound at a designated position than that the sources are separated accurately. Accordingly, the position or direction of each source and its sound source band signal (intensity and phase for each frequency band) are estimated as far as possible by any of the methods above, and whatever signal cannot be attributed to a source is treated as a residual signal. The residual signal is determined for each microphone. The sound source band signal (complex spectrum) of each source, with its direction, for each frequency band, and the residual signal (residual band signal) for each frequency band of each microphone (channel), are then recorded. To synthesize the sound at a designated position, the band signal arriving from each source at that position is estimated from the position or direction of each source and its sound source band signal. The estimated band signals from the sources and the residual band signals of the channels are then weighted and added to obtain the band signal at the designated position. Finally, the band signal at the designated position is converted into a time domain signal.
[0016]
[First Embodiment] FIG. 8 shows an example of the functional configuration of the acoustic signal estimation and synthesis apparatus of the present invention, and FIG. 9 shows an example of its processing flow. The acoustic signal estimation and synthesis apparatus 100 of the present invention comprises a band division unit 110, a sound source estimation unit 120, a recording unit 130, a band signal component estimation unit 140, a band signal component addition unit 150, and a band integration unit 160. The band division unit 110 divides each channel of the K-channel acoustic signals x_1(t), x_2(t), ..., x_K(t), collected by K microphones (K is an integer of 2 or more), into predetermined frequency bands ω to generate band signals X_1(ω), X_2(ω), ..., X_K(ω) (S110). The acoustic signal x_1(t) is one sample value (a scalar) in a frame of T samples, with t taking the values 0, ..., T-1. From such an acoustic signal x_1(t), a band signal X_1(ω) is obtained for each predetermined frequency band. The band signal X_1(ω) is, for example, a complex spectrum; it may also be a band-divided complex signal, but it is described below as a complex spectrum. Each frame of T points in the time domain is complex Fourier transformed, and the T/2 complex Fourier coefficients are taken as the band signals, as in X_1(ω) = Σ_{t=0}^{T-1} x_1(t) exp(-j2πωt/T). Here ω = 0, ..., T/2, j is the imaginary unit, and π is the ratio of a circle's circumference to its diameter. The band signal X_1(ω) gives the amplitude and phase of the signal at the position of the first microphone (first channel) for each frequency band ω. When the sampling frequency is f Hz, X_1(ω) can be regarded as a band signal whose center frequency is ωf/T Hz. The input to the band division unit 110 may also be an analog acoustic signal, with the values sampled inside the band division unit 110 taken as the acoustic signal x_1(t); in either case the output is the same.
[0017]
The sound source estimation unit 120 uses a conventional method to estimate, for each frequency band ω, the positions or directions D_{ω,1}, D_{ω,2}, ..., D_{ω,Mω} of the sound sources and the sound source band signals S_{ω,1}, S_{ω,2}, ..., S_{ω,Mω} (Mω is the number of sound sources in frequency band ω and is an integer of 0 or more). The sound source band signal S_{ω,1} is intensity and phase information (for example, a complex spectrum) for calculating the signal that the sound propagated from the first source in frequency band ω produces near a microphone. For example, if D_{ω,1} indicates the position of the source and the sound is a spherical wave, S_{ω,1} may be a complex spectrum giving the intensity and phase at the position of the source. If D_{ω,1} indicates the direction of the source and the sound is approximated by a plane wave, S_{ω,1} may be a complex spectrum giving the intensity and phase at some reference position (not necessarily the position of the source). In the course of this estimation, the signals U_{k,ω,m} from the respective sources at the positions of the microphones are also determined (k is the microphone number, an integer from 1 to K). The signal U_{k,ω,m} is the signal from the m-th source of frequency band ω at the position of the k-th microphone (m is the number assigned to each source in frequency band ω, an integer from 0 to Mω). For example, under the plane-wave approximation, the inner product of the vector from the reference position of S_{ω,1} to the position of microphone k with the unit vector in the propagation direction gives the distance along the propagation direction; this distance determines the phase difference between the reference position of S_{ω,1} and the position of microphone k, and the signal obtained by shifting the phase of S_{ω,1} by this difference may be taken as the signal U_{k,ω,1} at the position of microphone k.
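The plane-wave phase-shift rule at the end of the paragraph can be written out directly. This is a sketch under the paragraph's own assumptions (plane wave, known propagation direction); the function name, the reference geometry, and the speed of sound are illustration choices, not part of the patent.

```python
import math
import cmath

def mic_component(S, ref_pos, mic_pos, prop_dir, freq_hz, c=343.0):
    """Estimate U_{k,w,m}: the band-w component of a source at
    microphone k under the plane-wave approximation. S is the source
    band signal at its reference position; prop_dir is the unit
    propagation vector. Projecting the reference-to-microphone vector
    onto prop_dir gives the extra travel distance, hence a phase shift
    applied to S."""
    dx = mic_pos[0] - ref_pos[0]
    dy = mic_pos[1] - ref_pos[1]
    extra = dx * prop_dir[0] + dy * prop_dir[1]  # distance along propagation
    phase = -2 * math.pi * freq_hz * extra / c   # later arrival lags in phase
    return S * cmath.exp(1j * phase)

# A microphone 0.343 m further along the propagation direction hears S
# delayed by 1 ms; at 1000 Hz that is exactly one cycle, so U equals S.
S = 1.0 + 1.0j
U = mic_component(S, (0.0, 0.0), (0.343, 0.0), (1.0, 0.0), 1000.0)
```

The one-cycle check makes the convention easy to verify: a full wavelength of extra travel leaves the complex band signal unchanged.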
[0018]
For a frequency band ω in which one or more sound sources could be estimated, the residual band signals N_1(ω), N_2(ω), ..., N_K(ω) are obtained by removing the signals from the estimated sources from the band signal of each channel. For a frequency band ω in which no sound source could be estimated, the band signal X_k(ω) of each channel is itself taken as the residual band signal, i.e. N_k(ω) = X_k(ω) for every such k (microphone) and ω (frequency band) (S120). In other words, the residual band signals N_1(ω), N_2(ω), ..., N_K(ω) are obtained by subtracting, from the band signal of each channel, the signals from the estimated sources at the position of that microphone. Because a signal whose source position could not be estimated is treated as a residual band signal, the present invention does not need to force such a signal onto one of the sources.
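The residual computation of step S120 is a per-band subtraction. The sketch below uses invented data-structure conventions (nested lists indexed as described in the comments); it is an illustration, not the apparatus's implementation.

```python
def residual_bands(X, U):
    """Residual band signals N_k(w) = X_k(w) - sum_m U_{k,w,m}.

    X: per-channel band signals, X[k][w] complex.
    U: per-channel estimated source components, U[k][m][w] complex.
    A channel with no estimated sources (U[k] empty) keeps its full
    band signal as the residual, as in step S120."""
    N = []
    for k in range(len(X)):
        row = []
        for w in range(len(X[k])):
            est = sum((U[k][m][w] for m in range(len(U[k]))), 0j)
            row.append(X[k][w] - est)
        N.append(row)
    return N

# One channel, two bands; a single source fully explains band 0,
# so the residual there vanishes while band 1 passes through.
X = [[2.0 + 0j, 1.0 + 1j]]
U = [[[2.0 + 0j, 0.0 + 0j]]]
N = residual_bands(X, U)
```

Whatever the estimator could not attribute to a source survives in N, which is exactly the "noise whose source cannot be identified" the synthesis stage later mixes back in.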
[0019]
Whether the position or the direction of a sound source is estimated depends on whether the sound is assumed to be a spherical wave or a plane wave; this assumption is fixed in advance. The method by which the position or direction, intensity, and phase are estimated may be selected as appropriate from the methods described above. As noted, in the present invention it is more important that the finally synthesized sound resembles the sound at the designated position than that the position (or direction) or spectrum of each source is estimated accurately. The position or direction of each sound source estimated in step S120, the intensity and phase for each frequency band, and the residual band signal of each channel are recorded in the recording unit 130. The recorded information may be encoded.
[0020]
When a position P is designated, the band signal component estimation unit 140 estimates, from the positions or directions D_{ω,1}, D_{ω,2}, ..., D_{ω,Mω} of the sources for each frequency band ω and the sound source band signals S_{ω,1}, S_{ω,2}, ..., S_{ω,Mω}, the band signal Z(ω) obtained by combining the sounds from all the sources at the designated position P (S140). For example, for each frequency band ω, the signal U_{P,ω,m} from each source at the position P is determined (m is the number assigned to each source in frequency band ω, an integer from 0 to Mω). The signal U_{P,ω,m} may be obtained in the same way as the signals U_{k,ω,m} at the microphone positions in the sound source estimation unit 120. The band signal Z(ω) is then obtained by summing the signals from the respective sources at position P for each frequency band ω, as in Z(ω) = Σ_m U_{P,ω,m}. The band signal component addition unit 150 obtains the band signal Y(ω) at the designated position P by weighted addition of the band signal Z(ω), which combines the sounds from all the estimated sources, and the residual band signals N_1(ω), N_2(ω), ..., N_K(ω) (S150). For example, Z(ω) may be given a weight of 1, and the residual band signal of each channel a weight set according to the distance between that channel's microphone and the designated position P (for example, inversely proportional to it) such that the weights sum to 1 over all channels, as in Y(ω) = Z(ω) + Σ_k ((1/d_k) / Σ_i (1/d_i)) N_k(ω). Here d_k is the distance between the k-th microphone and the position P.
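The inverse-distance weighting described for step S150 can be sketched for a single band. The function name and the test values are invented for the example; the normalization (weights summing to 1) follows the paragraph's stated choice, which the text itself presents as only one possibility.

```python
def synthesize_band(Z, residuals, distances):
    """Y(w) = Z(w) + sum_k g_k * N_k(w), with weights g_k inversely
    proportional to each microphone-to-position distance d_k and
    normalized so that sum_k g_k = 1 (step S150)."""
    inv = [1.0 / d for d in distances]
    total = sum(inv)
    g = [v / total for v in inv]
    return Z + sum(gk * Nk for gk, Nk in zip(g, residuals))

# Two channels: the microphone 1 m away gets weight 0.75, the one
# 3 m away gets 0.25, so the nearer residual dominates the noise part.
Y = synthesize_band(1.0 + 0j, [0.4 + 0j, 0.8 + 0j], [1.0, 3.0])
```

Here the residual contribution is 0.75 x 0.4 + 0.25 x 0.8 = 0.5, so Y = 1.5: the estimated-source part passes through unweighted while the unattributed noise is blended toward the nearest microphone.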
[0021]
The band integration unit 160 converts the band signal Y(ω) at the designated position P into a time domain signal y(t) (S160). For example, the signal y(t) is one sample value in a frame of T samples, with t taking the values 0, ..., T-1. With this configuration, the acoustic signal estimation and synthesis apparatus 100 of the present invention can divide the sound into sound whose source can be estimated and sound, such as noise, whose source cannot. For sound whose source has been estimated, the sound at the designated position P can be calculated from the position or direction of the source. For sound whose source cannot be estimated, the sound at the designated position P can be calculated from the residual band signal of each channel (the part of the band signal whose source cannot be identified). Since these are weighted and added, the sound at the designated position P can be synthesized. This makes it possible, for example, to synthesize the sound signals corresponding to a free-viewpoint video system that composes images and videos of arbitrary viewpoints from cameras at a plurality of places.
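The band integration step (S160) is the inverse of the band division sketched earlier. The version below is a plain-Python illustration that assumes the bands came from a real T-sample frame (so negative-frequency coefficients are conjugates of the positive ones); the function name is invented.

```python
import cmath

def band_integrate(X):
    """Inverse transform of the T//2 + 1 complex band signals back to a
    real T-sample frame, the role of the band integration unit (S160)."""
    T = (len(X) - 1) * 2
    frame = []
    for t in range(T):
        # DC and Nyquist bands appear once; the rest twice (conjugate pairs).
        acc = X[0].real + ((-1) ** t) * X[T // 2].real
        acc += 2 * sum((X[w] * cmath.exp(2j * cmath.pi * w * t / T)).real
                       for w in range(1, T // 2))
        frame.append(acc / T)
    return frame

# Energy only in band 2 of a T = 8 frame yields a unit-amplitude cosine.
X = [0j] * 5
X[2] = 4.0 + 0j   # |X| = T/2 corresponds to amplitude 1
y = band_integrate(X)
```

Paired with the forward transform, this closes the loop: synthesized band signals Y(ω) become the waveform y(t) heard at the designated position.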
[0022]
[Modified Example] The first embodiment described the acoustic signal estimation and synthesis apparatus 100. However, the part that estimates the position or direction of each sound source, the sound source band signal for each frequency band, and the residual band signal of each channel may be a single device (an acoustic signal estimation device). Likewise, the part that synthesizes the sound at the designated position P from the position or direction of each source, the sound source band signal for each frequency band, and the residual band signal of each channel may be a single device (an acoustic signal synthesis device).
[0023]
The acoustic signal estimation device 200 includes, for example, the band division unit 110 and the sound source estimation unit 120; the recording unit 130 may be provided inside or outside the acoustic signal estimation device 200. The acoustic signal synthesis device 300 includes, for example, the band signal component estimation unit 140, the band signal component addition unit 150, and the band integration unit 160. Even when the system is divided into several devices in this way, forming an acoustic signal estimation and synthesis apparatus as a whole, the same effects as in the first embodiment are obtained.
[0024]
FIG. 10 shows an example of the functional configuration of a computer. The acoustic signal estimation and synthesis method, the acoustic signal estimation method, and the acoustic signal synthesis method of the present invention can be carried out by loading into the recording unit 2020 of the computer 2000 a program that makes the computer 2000 operate as each component of the present invention, and operating the input unit 2030, the output unit 2040, and so on. The program may be loaded into the computer by recording it on a computer-readable recording medium and reading it from that medium, or by reading a program stored on a server or the like into the computer through a telecommunication line or the like.
[0025]
[Brief Description of the Drawings]
FIG. 1 is a diagram showing plane-wave sound propagating from a distant sound source to four microphones.
FIG. 2 is a diagram showing an example of the spectrum of the sound propagated from sound source A at locations 501 to 504.
FIG. 3 is a diagram showing an example of the spectrum of the sound propagated from sound source B at locations 501 to 504.
FIG. 4 is a diagram showing the spectrum of the sound from sound source A and sound source B at location 501.
FIG. 5 is a diagram showing the spectrum of the sound from sound source A and sound source B at location 502.
FIG. 6 is a diagram showing the spectrum of the sound from sound source A and sound source B at location 503.
FIG. 7 is a diagram showing spherical-wave sound propagating from a sound source to four microphones.
FIG. 8 is a diagram showing an example of the functional configuration of an acoustic signal estimation and synthesis apparatus.
FIG. 9 is a diagram showing an example of the processing flow of an acoustic signal estimation and synthesis apparatus.
FIG. 10 is a diagram showing an example of the functional configuration of a computer.