Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2009200569
PROBLEM TO BE SOLVED: To provide a method and an apparatus for estimating a sound source
direction with practically sufficient accuracy, using an algorithm for estimating the elevation
angle of a sound source that draws on the human direction perception mechanism. SOLUTION:
A sound source direction estimation device 500 has amplitude spectrum generating means 504
for obtaining an amplitude spectrum of an acoustic signal obtained by a dummy head
microphone 502; notch frequency candidate detection means 506 for detecting frequency
candidates of two spectral notches existing in a frequency range of 4000 Hz to 17000 Hz in the
amplitude spectrum; a database 507 storing the relationship between the frequency of the
spectral notches and the elevation angle of the sound source, created based on the elevation
angle perception mechanism of human hearing; and elevation angle estimation means 509 for
estimating the elevation angle of the sound source by collating the detected frequency
candidates with the database. [Selected figure] Figure 1
Sound source direction estimation method and apparatus
[0001]
The present invention relates to a method of estimating a sound source direction used for a
robot, a monitoring system, a recording system or the like, and an apparatus therefor.
[0002]
Conventionally, as a method of estimating a sound source direction, for example, the method
described in Patent Document 1 is known.
10-05-2019
1
In this method, a microphone enhancer is attached to the microphone, and the microphone
enhancer deforms the acoustic signal to generate an interference pattern that is conceptually
similar to the pattern generated by the human pinna and causes the microphone to receive the
acoustic signal. The microphone enhancer interacts with the incoming acoustic signal to generate
unique frequency characteristics for each spatial direction and sends these signals to the
microphone. These interaction or interference patterns appear as spectral notches in the
spectrum of each signal. From these spectral notches, the azimuth and elevation are estimated
from a lookup table using a "suitable algorithm".
Patent Document 1: Japanese Patent Publication No. 2003-523060
[0003]
The above-described conventional sound source direction estimation method requires a
microphone enhancer of a special structure. It therefore cannot be used with microphones
mounted at the entrances of both ears of a dummy head or a humanoid robot, that is,
microphones placed in an environment similar to that of the human ear, which are generally
used for sound recording that includes the spatial characteristics of the sound field (binaural
recording) and for measuring those spatial characteristics (in this application, such a
microphone is defined as a "dummy head microphone"). Also, an algorithm for searching for
azimuth and elevation angles from the spectral notches generated by the microphone enhancer
is not specifically shown. The accuracy with which that technology can estimate the azimuth
angle and the elevation angle is therefore unclear. Accordingly, it is an object of the invention
according to this application to provide a method and an apparatus for estimating a sound
source direction that (1) can estimate the sound source direction from acoustic signals recorded
with two microphones disposed at sites such as the entrances of both ears of a dummy head
generally used for recording and for measuring the spatial characteristics of the sound field, and
(2) achieve practically sufficient estimation accuracy by using an algorithm for estimating the
elevation angle and azimuth angle of the sound source based on the human direction perception
mechanism clarified by hearing research.
[0004]
In order to solve the above problems, in the invention according to this application, a sound
source direction estimation apparatus is configured by including: amplitude spectrum generating
means for obtaining an amplitude spectrum of an acoustic signal obtained by a dummy head
microphone; notch frequency candidate detection means for detecting frequency candidates of
two spectral notches existing in a frequency range of 4000 Hz to 17000 Hz in the amplitude
spectrum; a database storing the relationship between the frequency of the spectral notches and
the elevation angle of the sound source, created based on the elevation angle perception
mechanism of hearing; and elevation angle estimation means for estimating the elevation angle
of the sound source by collating the detected frequency candidates with the database. Further,
in the invention according to this application, the sound source direction is estimated by a
method including: a first step of obtaining an amplitude spectrum of an acoustic signal obtained
by a dummy head microphone; a second step of detecting frequency candidates of two spectral
notches present in a frequency range of 4000 Hz to 17000 Hz in the amplitude spectrum; and a
third step of estimating the elevation angle by collating the detected frequency candidates with a
database storing the relationship between the frequency of the spectral notches and the
elevation angle of the sound source, created based on the elevation angle perception mechanism
of hearing.
[0005]
According to the apparatus and method of the invention of this application, the elevation and
azimuth angles of the sound source can be estimated with practically sufficient accuracy from
the two-channel input acoustic signals recorded by microphones disposed at sites such as the
entrances of both ears of a dummy head.
[0006]
Embodiments of the present invention will be described with reference to the drawings.
FIG. 1 is a conceptual block diagram of a sound source direction estimation apparatus according
to an embodiment of the present invention, FIG. 2 is a graph showing an envelope of an
amplitude spectrum of sound, FIG. 3 is a graph showing the relationship between notch
frequency and elevation angle of the sound source, FIG. 4 is a graph showing an example of
comparison between the estimated value and the measured value of the elevation angle of the
sound source, FIG. 5 is an explanatory view of a camera tracking system, FIG. 6 is an
explanatory view of an image recognition system, FIG. 7 is an explanatory view of a robot
system, FIG. 8 is an explanatory view of a microphone tracking system, FIG. 9 is an explanatory
view of a speech recognition system, and FIG. 10 is an explanatory view of a recording system.
[0007]
A conceptual block diagram of a sound source direction estimation apparatus 500 according to
an embodiment of the present invention is shown in FIG. 1. The elevation angle of the sound
source is calculated as follows. First, an acoustic signal emitted from the sound source 501 is
recorded by the dummy head microphone 502. At least one of the two input acoustic signals
recorded by the dummy head microphone 502 is converted into the frequency domain by a
time-frequency converter 503 using a Fourier transform or the like, and its amplitude spectrum
is obtained by the amplitude spectrum calculator 504. So that the notch frequencies can be
detected without the influence of unwanted noise, an envelope of the amplitude spectrum as
shown in FIG. 2 is calculated by a spectral envelope extractor 505 using a moving average, a
Gaussian filter or the like. Next, the notch frequency candidate detector 506 detects, from the
envelope of the amplitude spectrum, candidates for the frequencies of the two spectral notches
N1 and N2, which are cues for human elevation angle perception. Specifically, the notch
frequency candidate detector 506 finds a plurality of local minima in the frequency range of
4000 Hz to 12000 Hz and in the frequency range of 8000 Hz to 17000 Hz, and extracts them as
frequency candidates for the two notches N1 and N2. An example of detecting the frequencies
of the notches N1 and N2 is shown in FIG. 2. From the frequency candidates for the two notches
N1 and N2 thus obtained, the elevation angle of the sound source is estimated using the
database 507 and the collator 508 to obtain an elevation angle estimated value 509. The
database 507 stores the relationship between the elevation angle of the sound source and the
notch frequencies, created in advance based on actual measurement.
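The envelope extraction and candidate detection described above can be sketched as follows. This
is only an illustrative sketch, not the implementation of the embodiment: it assumes the amplitude
spectrum has already been computed, uses a moving average for the envelope, and assumes that the
4000 Hz to 12000 Hz band is searched for N1 and the 8000 Hz to 17000 Hz band for N2. The function
names and the synthetic spectrum are hypothetical.

```python
import math


def moving_average(values, width):
    """Smooth a sequence with a centered moving average (window shrinks at the edges)."""
    half = width // 2
    out = []
    for k in range(len(values)):
        window = values[max(0, k - half):k + half + 1]
        out.append(sum(window) / len(window))
    return out


def notch_candidates(freqs, envelope, f_lo, f_hi):
    """Return the frequencies of local minima of the envelope inside [f_lo, f_hi]."""
    found = []
    for k in range(1, len(envelope) - 1):
        in_band = f_lo <= freqs[k] <= f_hi
        if in_band and envelope[k - 1] > envelope[k] <= envelope[k + 1]:
            found.append(freqs[k])
    return found


# Synthetic amplitude spectrum in dB with dips at 7 kHz and 11 kHz.
# These values are made up for illustration and are not measured data.
freqs = [100.0 * k for k in range(201)]  # 0 Hz to 20 kHz in 100 Hz bins
spectrum = [-20.0 * math.exp(-((f - 7000.0) / 600.0) ** 2)
            - 15.0 * math.exp(-((f - 11000.0) / 600.0) ** 2) for f in freqs]

envelope = moving_average(spectrum, width=5)
n1_candidates = notch_candidates(freqs, envelope, 4000.0, 12000.0)  # assumed N1 band
n2_candidates = notch_candidates(freqs, envelope, 8000.0, 17000.0)  # assumed N2 band
```

Because the two bands overlap between 8000 Hz and 12000 Hz, the same minimum can appear as a
candidate for both notches; the collation step is what selects among the candidates.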
[0008]
The collator 508 estimates the elevation angle of the sound source using, for example, the
following equations (1) and (2).
\[ \mathrm{err}(i, j, \theta) = |f_{N1C}(i) - f_{N1}(\theta)| + |f_{N2C}(j) - f_{N2}(\theta)| \tag{1} \]
\[ \Theta = \theta \,\big|\, \min(\mathrm{err}(i, j, \theta)) \tag{2} \]
where
$f_{N1C}(i)$: frequency of the i-th candidate of notch N1
$f_{N2C}(j)$: frequency of the j-th candidate of notch N2
$f_{N1}(\theta)$: frequency of notch N1 at elevation angle $\theta$, stored in advance in database 507
$f_{N2}(\theta)$: frequency of notch N2 at elevation angle $\theta$, stored in advance in database 507
$\mathrm{err}(i, j, \theta)$: estimation error, i.e., the sum of the absolute value of the difference between the
frequency of the i-th candidate of notch N1 and the frequency of notch N1 at elevation angle $\theta$,
and the absolute value of the difference between the frequency of the j-th candidate of notch N2
and the frequency of notch N2 at elevation angle $\theta$
$\theta \,|\, \min(\mathrm{err}(i, j, \theta))$: elevation angle $\theta$ that minimizes the estimation error
$\Theta$: elevation angle estimated value
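As a rough illustration of equations (1) and (2), the following sketch searches all combinations
of candidates i, j and stored elevation angles for the smallest estimation error. The function
name and the database entries are hypothetical and are not taken from the patent.

```python
def estimate_elevation(n1_candidates, n2_candidates, database):
    """Return (elevation, error) minimizing equation (1) over all i, j and theta.

    database maps an elevation angle in degrees to (f_N1, f_N2) in Hz.
    """
    best = None
    for theta, (f_n1, f_n2) in database.items():
        for c1 in n1_candidates:          # i-th candidate of notch N1
            for c2 in n2_candidates:      # j-th candidate of notch N2
                err = abs(c1 - f_n1) + abs(c2 - f_n2)   # equation (1)
                if best is None or err < best[1]:
                    best = (theta, err)                  # equation (2)
    return best


# Hypothetical database entries for illustration only.
database = {0: (6000.0, 9000.0), 30: (7000.0, 11000.0), 60: (8500.0, 13000.0)}
theta_hat, err = estimate_elevation([7000.0, 11000.0], [11000.0], database)
```

With these made-up entries, the candidate pair (7000 Hz, 11000 Hz) matches the 30 degree entry
exactly, so the search returns 30 with zero error.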
[0009]
An example of the database 507, which gives the relationship between the frequencies of
notches N1 and N2 and the elevation angle of the sound source, is shown in FIG. 3. FIG. 3 shows
the frequencies of the two notches N1 and N2 when the elevation angle of the sound source
changes in the upper hemisphere from 0° (forward direction) through 90° (upward direction) to
180° (backward direction). The plotted markers are measured values, and the dotted and broken
lines represent the frequencies of the notches N1 and N2 as functions of the elevation angle of
the sound source, approximated by quartic functions. As described above, the elevation angle is
determined as the angle at which the difference between the frequencies of the two notches N1
and N2 and the frequencies given by the quartic approximations indicated by the dotted and
broken lines is smallest.
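When the database is held as quartic approximations rather than as a table, the same minimization
can be run over a grid of elevation angles. The sketch below assumes a 1 degree grid from 0° to
180°; the coefficients are hypothetical placeholders and are not the fitted curves of FIG. 3.

```python
def poly4(coeffs, theta):
    """Evaluate c4*t^4 + c3*t^3 + c2*t^2 + c1*t + c0 by Horner's rule.

    coeffs is given as (c4, c3, c2, c1, c0).
    """
    value = 0.0
    for c in coeffs:
        value = value * theta + c
    return value


# Hypothetical quartic coefficients, NOT the approximations shown in FIG. 3.
N1_COEFFS = (1e-6, -2e-4, -0.3, 60.0, 6000.0)
N2_COEFFS = (1e-6, 0.0, 0.0, 30.0, 9000.0)


def estimate_from_curves(f_n1c, f_n2c, step=1):
    """Return the grid elevation (degrees) whose curve values are closest to the candidates."""
    def err(theta):
        return (abs(f_n1c - poly4(N1_COEFFS, theta))
                + abs(f_n2c - poly4(N2_COEFFS, theta)))
    return min(range(0, 181, step), key=err)


# Round trip: notch frequencies generated at 45 degrees are mapped back to 45 degrees.
theta_hat = estimate_from_curves(poly4(N1_COEFFS, 45), poly4(N2_COEFFS, 45))
```

The grid step trades resolution against the number of evaluations; the 1 degree step here is an
arbitrary choice for the sketch.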
[0010]
An example of the result of estimating the elevation angle of the sound source using the above
procedure is shown in FIG. 4. The horizontal axis is the actual elevation angle of the sound
source, and the vertical axis is the estimated elevation angle. Five types of sound source were
used: white noise, pink noise, a female voice announcement, a male voice announcement, and
pop music. It can be seen that, with few exceptions, the elevation angle can be estimated with
practically sufficient accuracy.
[0011]
Next, the following operation is performed to calculate the azimuth angle of the sound source.
The time difference calculator 510 and the level difference calculator 520 calculate the time
difference and the level difference, respectively, between the two-channel input acoustic signals
recorded by the dummy head microphone 502. The collators 512 and 522 collate the calculated
time difference and level difference with the database 511, which gives the relationship between
the azimuth angle of the sound source and the time difference, and the database 521, which
gives the relationship between the azimuth angle of the sound source and the level difference,
both created in advance, and the azimuth angle estimated value 523 is calculated. Various
methods are known for estimating the azimuth angle of a sound source based on the level
difference and the time difference between the left and right ears.
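One common way to obtain the two quantities computed by the calculators 510 and 520 is sketched
below: the time difference from the lag that maximizes the cross-correlation of the two channels,
and the level difference from the ratio of their RMS values in decibels. The patent does not
specify these computations, so both are assumptions, as are the function names and the test
signal. The collation with the databases 511 and 521 is not shown.

```python
import math


def time_difference(left, right, sample_rate, max_lag):
    """Lag in seconds of `right` relative to `left` that maximizes the cross-correlation."""
    def corr(lag):
        return sum(left[n] * right[n + lag]
                   for n in range(len(left)) if 0 <= n + lag < len(right))
    best_lag = max(range(-max_lag, max_lag + 1), key=corr)
    return best_lag / sample_rate


def level_difference_db(left, right):
    """Level difference in dB, 20*log10(rms(left) / rms(right))."""
    def rms(x):
        return math.sqrt(sum(v * v for v in x) / len(x))
    return 20.0 * math.log10(rms(left) / rms(right))


# Made-up two-channel signal: the right channel is the left channel
# delayed by 3 samples and attenuated to half amplitude.
left = [0.0] * 32
left[10], left[11] = 1.0, -0.5
right = [0.0] * 32
right[13], right[14] = 0.5, -0.25

itd = time_difference(left, right, sample_rate=48000, max_lag=8)
ild = level_difference_db(left, right)
```

A positive time difference here means the sound reached the left channel first, and a positive
level difference means the left channel is louder.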
[0012]
Next, FIG. 5 shows a device to which a camera tracking system is operatively connected, in
which the output of the direction estimation means operates the camera tracking system to
direct the camera toward the sound source. The elevation angle estimated value 509 and the
azimuth angle estimated value 523 calculated by the sound source direction estimation
apparatus 500 are input to the elevation angle controller 601 and the azimuth angle controller
602, respectively, which control the orientation of the video camera 603.
[0013]
This allows the camera to be automatically directed to the speaker, for example, in a
teleconferencing system. Further, in the surveillance camera system, the camera can be
automatically directed in the direction of the suspicious noise.
[0014]
As shown in FIG. 6, an image recognition system can be configured by further connecting the
video camera 603 of the embodiment of FIG. 5 to an image recognizer 701, so that the output
image of the video camera 603 is input to the image recognizer 701 and recognized.
[0015]
Next, a robot system equipped with the sound source direction estimation device 500 is shown in
FIG. 7.
The elevation angle estimated value 509 and the azimuth angle estimated value 523 calculated
by the sound source direction estimation apparatus 500 are input to the elevation angle
controller 601 and the azimuth angle controller 602, respectively, to control the orientation of
the robot head 801 and the robot body 802. As a result, the robot automatically turns toward
the sound source, so that, for example, when a human and the robot communicate by voice, the
robot can always face the human. In addition, a surveillance robot that automatically turns its
face and body toward a suspicious noise can collect the sound with high accuracy and point its
eye (camera) toward the sound to obtain visual information.
[0016]
Next, a microphone tracking system equipped with the sound source direction estimation
apparatus 500 is shown in FIG. 8. The elevation angle estimated value 509 and the azimuth angle
estimated value 523 calculated by the sound source direction estimation apparatus 500 are
input to the elevation angle controller 601 and the azimuth angle controller 602, respectively, to
control the direction and directivity of the microphone 901. The direction and directivity of the
microphone 901 can thereby be automatically aimed at the sound source. For example, in a
remote conference system, the microphone can be automatically directed toward the speaker so
that sound is collected accurately. Further, in a monitoring system, the microphone 901 can be
automatically directed toward a suspicious noise to collect the sound accurately.
[0017]
Next, a speech recognition system equipped with the sound source direction estimation device
500 is shown in FIG. 9. The elevation angle estimated value 509 and the azimuth angle estimated
value 523 calculated by the sound source direction estimation apparatus 500 are input to the
elevation angle controller 601 and the azimuth angle controller 602, respectively, to control the
direction and directivity of the microphone 901. The output voice of the microphone 901 is input
to the speech recognizer 911, which recognizes the voice. As a result, for example, in a robot, a
car navigation system, a monitoring system, etc., the voice can be automatically recognized with
high accuracy, even in a noisy environment, by aiming the direction and directivity of the
microphone at the sound source emitting the voice.
[0018]
The sound recording system shown in FIG. 10 records the output sound signal of the microphone
901 in the above embodiment as an input signal to the recorder 921. The direction and
directivity of the microphone are automatically aimed at the sound source, and the recorder 921
records the sound signal of this sound source. Thus, for example, in a robot, a monitoring
system, etc., the sound can be automatically recorded with high accuracy, even in a noisy
environment, by aiming the direction and directivity of the microphone at the sound source.
[0019]
FIG. 1 is a conceptual block diagram of the sound source direction estimation apparatus of an
embodiment of this invention.
FIG. 2 is a graph showing the envelope of the amplitude spectrum of a sound.
FIG. 3 is a graph showing the relationship between the frequency of a notch and the elevation
angle of a sound source.
FIG. 4 is a graph showing an example of the comparison between the estimated value and the
actual value of the elevation angle of a sound source.
FIG. 5 is an explanatory view of a camera tracking system.
FIG. 6 is an explanatory view of an image recognition system.
FIG. 7 is an explanatory view of a robot system.
FIG. 8 is an explanatory view of a microphone tracking system.
FIG. 9 is an explanatory view of a speech recognition system.
FIG. 10 is an explanatory view of a sound recording system.
Explanation of reference signs
[0020]
500 sound source direction estimation device
501 sound source
502 microphone
503 time-frequency converter
504 amplitude spectrum calculator
505 spectral envelope extractor
506 notch frequency candidate detector
507 database
508 collator
509 elevation angle estimated value
510 time difference calculator
511 database
512 collator
520 level difference calculator
521 database
522 collator
523 azimuth angle estimated value
601 elevation angle controller
602 azimuth angle controller
603 video camera
701 image recognizer
801 robot head
802 robot body
901 microphone
911 speech recognizer
921 recorder
N1, N2 frequency notch