DESCRIPTION JP2009200569

PROBLEM TO BE SOLVED: To provide a method and an apparatus for estimating a sound source direction with practically sufficient accuracy, using an algorithm for estimating the elevation angle of a sound source that is based on the human direction perception mechanism.

SOLUTION: A sound source direction estimation device 500 includes: amplitude spectrum generating means 504 for obtaining the amplitude spectrum of an acoustic signal picked up by a dummy head microphone 502; notch frequency candidate detection means 506 for detecting frequency candidates of two spectral notches present in the frequency range of 4000 Hz to 17000 Hz of the amplitude spectrum; a database 507 storing the relationship between spectral notch frequency and sound source elevation angle, created on the basis of the elevation angle perception mechanism of human hearing; and elevation angle estimation means 509 for estimating the elevation angle of the sound source by collating the detected notch frequency candidates with the database.

[Selected figure] Figure 1

Sound source direction estimation method and apparatus

[0001] The present invention relates to a method of estimating a sound source direction, for use in robots, monitoring systems, recording systems and the like, and to an apparatus therefor.

[0002] A conventionally known method of estimating a sound source direction is, for example, the one described in Patent Document 1.
In this method, a microphone enhancer is attached to the microphone. The enhancer deforms the incoming acoustic signal, producing an interference pattern conceptually similar to the one produced by the human pinna, before the signal reaches the microphone. By interacting with the incoming signal, the enhancer imposes frequency characteristics that are unique to each spatial direction. These interference patterns appear as spectral notches in the spectrum of each signal. From these spectral notches, the azimuth and elevation are estimated from a lookup table using a "suitable algorithm".

Patent Document 1: Japanese Patent Publication No. 2003-523060

[0003] The conventional method described above requires a microphone enhancer of special structure. It therefore cannot be used with microphones mounted at the entrances of both ears of a dummy head or a humanoid robot, which are generally used for sound recording that captures the spatial characteristics of the sound field (binaural recording) and for measuring those characteristics, that is, microphones placed in an acoustic environment similar to that of the human ear (defined in this application as a "dummy head microphone"). In addition, the algorithm for looking up the azimuth and elevation angles from the spectral notches generated by the microphone enhancer is not specifically disclosed. The accuracy with which the technique can estimate the azimuth and elevation angles is therefore unclear.
Accordingly, the object of the invention according to this application is to provide a method and an apparatus for estimating a sound source direction that (1) can estimate the sound source direction from acoustic signals recorded by two microphones placed at positions such as the entrances of both ears of a dummy head of the kind generally used for recording and measuring the spatial characteristics of a sound field, and (2) achieve practically sufficient estimation accuracy by using algorithms for estimating the elevation and azimuth angles of the sound source that are based on the human direction perception mechanism clarified by hearing research.

[0004] To solve the above problems, the sound source direction estimation apparatus of the invention according to this application comprises: amplitude spectrum generating means for obtaining the amplitude spectrum of an acoustic signal obtained by a dummy head microphone; notch frequency candidate detection means for detecting frequency candidates of two spectral notches present in the frequency range of 4000 Hz to 17000 Hz of the amplitude spectrum; a database storing the relationship between spectral notch frequency and sound source elevation angle, created on the basis of the elevation angle perception mechanism of human hearing; and elevation angle estimation means for estimating the elevation angle of the sound source by collating the detected notch frequency candidates with the database.

Further, in the invention according to this application, the sound source direction is estimated by a method including: a first step of obtaining the amplitude spectrum of an acoustic signal obtained by a dummy head microphone; a second step of detecting frequency candidates of two spectral notches present in the frequency range of 4000 Hz to 17000 Hz of the amplitude spectrum; and a third step of estimating the elevation angle of the sound source by collating the detected notch frequency candidates with a database storing the relationship between spectral notch frequency and sound source elevation angle, created on the basis of the elevation angle perception mechanism of human hearing.

[0005] According to the apparatus and method of the invention of this application, the elevation and azimuth angles of the sound source can be estimated with practically sufficient accuracy from the two-channel input acoustic signals recorded by microphones placed at positions such as the entrances of both ears of a dummy head.

[0006] Embodiments of the present invention will be described with reference to the drawings. FIG. 1 is a conceptual block diagram of a sound source direction estimation apparatus according to an embodiment of the present invention, FIG. 2 is a graph showing the envelope of the amplitude spectrum of a sound, FIG. 3 is a graph showing the relationship between notch frequency and sound source elevation angle, FIG. 4 is a graph showing an example comparison between estimated and measured sound source elevation angles, FIG. 5 is an explanatory view of a camera tracking system, FIG. 6 is an explanatory view of an image recognition system, FIG. 7 is an explanatory view of a robot system, FIG. 8 is an explanatory view of a microphone tracking system, FIG. 9 is an explanatory view of a speech recognition system, and FIG. 10 is an explanatory view of a recording system.

[0007] FIG. 1 shows a conceptual block diagram of a sound source direction estimation apparatus 500 according to an embodiment of the present invention. The elevation angle of the sound source is calculated as follows. First, an acoustic signal emitted from the sound source 501 is recorded by the dummy head microphone 502.
At least one of the two input acoustic signals recorded by the dummy head microphone 502 is converted into the frequency domain by a time-frequency converter 503 using a Fourier transform or the like. The amplitude spectrum calculator 504 then obtains the amplitude spectrum. To detect the notch frequencies while suppressing the influence of unwanted noise, a spectral envelope extractor 505 computes the envelope of the amplitude spectrum, as shown in FIG. 2, using a moving average, a Gaussian filter or the like. Next, the notch frequency candidate detector 506 detects, from the envelope of the amplitude spectrum, frequency candidates for the two spectral notches N1 and N2, which serve as cues for human perception of elevation angle. Specifically, the notch frequency candidate detector 506 finds multiple local minima in the frequency range of 4000 Hz to 12000 Hz and in the frequency range of 8000 Hz to 17000 Hz, and extracts them as frequency candidates for notches N1 and N2, respectively. An example of detecting the frequencies of notches N1 and N2 is shown in FIG. 2. From the frequency candidates of the two notches N1 and N2 obtained in this way, the elevation angle of the sound source is estimated using the database 507 and the collator 508, yielding an elevation angle estimated value 509. The database 507 stores the relationship between sound source elevation angle and notch frequency, created in advance from actual measurements.

[0008] The collator 508 estimates the elevation angle of the sound source using, for example, the following equations (1) and (2).

$$\mathrm{err}(i, j, \theta) = \left| f_{N1C}(i) - f_{N1}(\theta) \right| + \left| f_{N2C}(j) - f_{N2}(\theta) \right| \tag{1}$$

$$\Theta = \underset{\theta}{\arg\min}\ \min_{i,\,j}\ \mathrm{err}(i, j, \theta) \tag{2}$$

where
$f_{N1C}(i)$: frequency of the i-th candidate for notch N1
$f_{N2C}(j)$: frequency of the j-th candidate for notch N2
$f_{N1}(\theta)$: frequency of notch N1 at elevation angle θ, stored in advance in database 507
$f_{N2}(\theta)$: frequency of notch N2 at elevation angle θ, stored in advance in database 507
$\mathrm{err}(i, j, \theta)$: estimation error, that is, the sum of the absolute difference between the frequency of the i-th candidate for notch N1 and the frequency of notch N1 at elevation angle θ and the absolute difference between the frequency of the j-th candidate for notch N2 and the frequency of notch N2 at elevation angle θ
$\Theta$: elevation angle estimate, that is, the elevation angle θ that minimizes the estimation error

[0009] An example of the database 507, which gives the relationship between the frequencies of notches N1 and N2 and the elevation angle of the sound source, is shown in FIG. 3. FIG. 3 shows the frequencies of the two notches N1 and N2 as the elevation angle of the sound source changes in the upper hemisphere from 0° (front) through 90° (directly above) to 180° (rear). The plotted symbols are measured values, and the dotted and broken lines are quartic-function approximations of the frequencies of notches N1 and N2 as functions of the sound source elevation angle. As described above, the elevation angle is determined at which the difference between the candidate frequencies of the two notches N1 and N2 and the frequencies given by the quartic approximations (dotted and broken lines) is smallest.

[0010] An example of the result of estimating the elevation angle of the sound source by the above procedure is shown in FIG. 4. The horizontal axis is the actual elevation angle of the sound source, and the vertical axis is the estimated elevation angle.
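The elevation estimation procedure of paragraphs [0007] and [0008] can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the embodiment: the function names, the 3-point moving average, the 500 Hz frequency grid, the synthetic spectrum and every value in `DATABASE` are hypothetical placeholders chosen for illustration (they are not the measured values of FIG. 3), and the time-frequency conversion of converter 503 is omitted.

```python
from typing import Dict, List, Sequence, Tuple


def moving_average(values: Sequence[float], window: int) -> List[float]:
    """Centered moving average; the window is shortened at both edges."""
    half = window // 2
    smoothed = []
    for k in range(len(values)):
        lo, hi = max(0, k - half), min(len(values), k + half + 1)
        smoothed.append(sum(values[lo:hi]) / (hi - lo))
    return smoothed


def notch_candidates(freqs: Sequence[float], envelope: Sequence[float],
                     f_lo: float, f_hi: float) -> List[float]:
    """Frequencies of strict local minima of the envelope within [f_lo, f_hi]."""
    return [
        freqs[k]
        for k in range(1, len(envelope) - 1)
        if f_lo <= freqs[k] <= f_hi
        and envelope[k] < envelope[k - 1]
        and envelope[k] < envelope[k + 1]
    ]


def estimate_elevation(n1_cands: Sequence[float], n2_cands: Sequence[float],
                       database: Dict[float, Tuple[float, float]]
                       ) -> Tuple[float, float]:
    """Return (elevation, err) minimizing equation (1) over all i, j, theta."""
    best = None
    for theta, (f_n1, f_n2) in database.items():
        for f1 in n1_cands:
            for f2 in n2_cands:
                err = abs(f1 - f_n1) + abs(f2 - f_n2)  # equation (1)
                if best is None or err < best[1]:
                    best = (theta, err)                # equation (2)
    if best is None:
        raise ValueError("no notch candidates or empty database")
    return best


# Hypothetical database: elevation in degrees -> (fN1, fN2) in Hz.
# Illustrative placeholders only, NOT the measured values of FIG. 3.
DATABASE = {0: (6000.0, 9000.0), 30: (7000.0, 10500.0),
            60: (8500.0, 12000.0), 90: (9500.0, 13500.0)}

# Synthetic amplitude spectrum on a 500 Hz grid with two V-shaped dips.
freqs = [4000 + 500 * k for k in range(27)]  # 4000 Hz to 17000 Hz
amplitude = [1.0] * len(freqs)
for center in (7000, 10500):
    k = freqs.index(center)
    amplitude[k - 1], amplitude[k], amplitude[k + 1] = 0.6, 0.2, 0.6

envelope = moving_average(amplitude, window=3)
n1 = notch_candidates(freqs, envelope, 4000, 12000)   # N1 search range
n2 = notch_candidates(freqs, envelope, 8000, 17000)   # N2 search range
theta, err = estimate_elevation(n1, n2, DATABASE)
print(theta, err)
```

Because the two search ranges overlap between 8000 Hz and 12000 Hz, the same local minimum can appear in both candidate lists; the exhaustive search over all candidate pairs lets the database decide which pairing fits best.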
Five types of sound source were used: white noise, pink noise, a female voice announcement, a male voice announcement, and pop music. It can be seen that, with few exceptions, the elevation angle is estimated with practically sufficient accuracy.

[0011] Next, the azimuth angle of the sound source is calculated as follows. The time difference calculator 510 and the level difference calculator 520 calculate the time difference and the level difference, respectively, between the two-channel input acoustic signals recorded by the dummy head microphone 502. The collators 512 and 522 collate the calculated time difference and level difference with the database 511, which stores a relationship between sound source azimuth angle and time difference created in advance, and the database 521, which stores a relationship between sound source azimuth angle and level difference created in advance, to obtain the azimuth angle estimated value 523. Various methods are known for estimating the azimuth angle of a sound source from the level difference and time difference between the left and right ears.

[0012] Next, FIG. 5 shows the operation of a device to which a camera tracking system is operatively connected, in which the output of the direction estimation means drives the camera tracking system to point the camera at the sound source. The elevation angle estimated value 509 and the azimuth angle estimated value 523 calculated by the sound source direction estimation apparatus 500 are input to the elevation controller 601 and the azimuth controller 602, respectively, which control the orientation of the video camera 603.

[0013] This allows the camera to be pointed automatically at the speaker, for example in a teleconferencing system. In a surveillance camera system, the camera can likewise be pointed automatically in the direction of a suspicious noise.

[0014] As shown in FIG. 6, an image recognition system can be configured by further connecting the video camera 603 of the embodiment of FIG. 5 to an image recognizer 701, so that the output image of the video camera 603 is input to the image recognizer 701 and recognized.

[0015] Next, a robot system equipped with the sound source direction estimation apparatus 500 is shown in FIG. 7. The elevation angle estimated value 509 and the azimuth angle estimated value 523 calculated by the sound source direction estimation apparatus 500 are input to the elevation controller 601 and the azimuth controller 602, respectively, which control the orientation of the robot head 801 and the robot body 802. As a result, the robot turns automatically toward the sound source; for example, when a human and the robot communicate by voice, the robot can always face the human. A surveillance robot can likewise turn its face and body automatically toward a suspicious noise, so that it collects the sound with high accuracy and can point its eye (camera) toward the sound to obtain visual information.

[0016] Next, a microphone tracking system equipped with the sound source direction estimation apparatus 500 is shown in FIG. 8. The elevation angle estimated value 509 and the azimuth angle estimated value 523 calculated by the sound source direction estimation apparatus 500 are input to the elevation controller 601 and the azimuth controller 602, respectively, which control the direction and directivity of the microphone 901. The direction and directivity of the microphone 901 can thus be steered automatically toward the sound source; for example, in a teleconferencing system the microphone can be pointed automatically at the speaker to collect the sound accurately.
Further, in a monitoring system, the microphone 901 can be pointed automatically in the direction of a suspicious noise to collect the sound accurately.

[0017] Next, a speech recognition system equipped with the sound source direction estimation apparatus 500 is shown in FIG. 9. The elevation angle estimated value 509 and the azimuth angle estimated value 523 calculated by the sound source direction estimation apparatus 500 are input to the elevation controller 601 and the azimuth controller 602, respectively, which control the direction and directivity of the microphone 901. The output of the microphone 901 is input to the speech recognizer 911, which recognizes the speech. As a result, for example in a robot, a car navigation system or a monitoring system, speech can be recognized automatically with high accuracy even in a noisy environment, because the direction and directivity of the microphone are steered toward the sound source emitting the speech.

[0018] The sound recording system shown in FIG. 10 feeds the output sound signal of the microphone 901 of the above embodiment to the recorder 921. The direction and directivity of the microphone are steered automatically toward the sound source, and the recorder 921 records the sound signal of that source. Thus, for example in a robot or a monitoring system, sound can be recorded automatically with high accuracy even in a noisy environment, because the direction and directivity of the microphone are steered toward the sound source.

[0019] Brief description of the drawings
FIG. 1 is a conceptual block diagram of the sound source direction estimation apparatus of an embodiment of this invention.
FIG. 2 is a graph showing the envelope of the amplitude spectrum of a sound.
FIG. 3 is a graph showing the relationship between notch frequency and sound source elevation angle.
FIG. 4 is a graph showing an example comparison between estimated and actual sound source elevation angles.
FIG. 5 is an explanatory view of a camera tracking system.
FIG. 6 is an explanatory view of an image recognition system.
FIG. 7 is an explanatory view of a robot system.
FIG. 8 is an explanatory view of a microphone tracking system.
FIG. 9 is an explanatory view of a speech recognition system.
FIG. 10 is an explanatory view of a sound recording system.

[0020] Explanation of reference signs
500: sound source direction estimation device
501: sound source
502: dummy head microphone
503: time-frequency converter
504: amplitude spectrum calculator
505: spectral envelope extractor
506: notch frequency candidate detector
507: database
508: collator
509: elevation angle estimated value
510: time difference calculator
511: database
512: collator
520: level difference calculator
521: database
522: collator
523: azimuth angle estimated value
601: elevation controller
602: azimuth controller
603: video camera
701: image recognizer
801: robot head
802: robot body
901: microphone
911: speech recognizer
921: recorder
N1, N2: frequency notches
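As a supplementary illustration of the azimuth estimation in paragraph [0011], the two cues and the database lookup can be sketched as below. This is a minimal sketch under assumptions, since the description states only that various methods are known: the time difference is taken here as the lag that maximizes the cross-correlation of the two channels, the level difference as the RMS ratio in decibels, and the collation as a nearest-value lookup. All function names, the sign conventions and the values in `ITD_TABLE` are hypothetical. How the outputs of collators 512 and 522 are merged into the single estimate 523 is not specified in the description and is left out.

```python
import math
from typing import Dict, Sequence


def time_difference(left: Sequence[float], right: Sequence[float],
                    fs: float, max_lag: int) -> float:
    """Lag in seconds that maximizes the cross-correlation of the channels.

    A positive result means the right channel is delayed relative to the left.
    """
    n = min(len(left), len(right))
    best_lag, best_val = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        val = 0.0
        for k in range(n):
            m = k + lag
            if 0 <= m < n:
                val += left[k] * right[m]
        if val > best_val:
            best_lag, best_val = lag, val
    return best_lag / fs


def level_difference_db(left: Sequence[float], right: Sequence[float]) -> float:
    """Left-to-right level difference in dB, computed from RMS values."""
    rms_l = math.sqrt(sum(v * v for v in left) / len(left))
    rms_r = math.sqrt(sum(v * v for v in right) / len(right))
    if rms_l == 0.0 or rms_r == 0.0:
        raise ValueError("level difference is undefined for a silent channel")
    return 20.0 * math.log10(rms_l / rms_r)


def nearest_azimuth(value: float, table: Dict[float, float]) -> float:
    """Azimuth whose stored cue value is closest to the measured value."""
    return min(table, key=lambda azimuth: abs(table[azimuth] - value))


# Hypothetical table: azimuth in degrees -> time difference in seconds.
# Illustrative placeholders only, not values from databases 511 or 521.
ITD_TABLE = {0: 0.0, 30: 0.00025, 60: 0.0005, 90: 0.00065}

# Toy two-channel signal: the right channel is the left one delayed by
# two samples and attenuated to half amplitude.
left = [0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0]
right = [0.0, 0.0, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0]

itd = time_difference(left, right, fs=8000.0, max_lag=3)
ild = level_difference_db(left, right)
print(itd, ild, nearest_azimuth(itd, ITD_TABLE))
```

The direct double loop keeps the sketch dependency-free; for real recordings an FFT-based correlation would normally replace it.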