Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2005260743
PROBLEM TO BE SOLVED: To provide a microphone array capable of improving SNR. SOLUTION:
In a microphone array provided with three or more microphones, the distance between the
microphones is set to a distance proportional to the graduation distance of the shortest Golomb
ruler. The three or more microphones may be arranged in a straight line or in an arc. When the
number of microphones is four, for example, the spacing between the microphones is set to a
ratio of 1:3:2. [Selected figure] Figure 5
Microphone array
[0001]
The present invention relates to a microphone array.
[0002]
In recent years, Automatic Speech Recognition (ASR) has been applied to anthropomorphic
agents and car navigation systems.
In the real environment, since the recognition rate is greatly reduced due to the effects of noise
and reverberation, research has been conducted to aim at an ASR system that is robust against
noise and reverberation (see reference [1]). By using the microphone array, it is possible to
improve the recognition performance of the remote speech by using the spatial phase difference
between the target sound source and the noise source and suppressing the noise and the
04-05-2019
1
reverberation.
[0003]
Reference [1]: NAKAMURA Tetsu, "Aiming for Robust Speech Recognition in Real Acoustic
Environments," Technical Report of SP 2002-12, pp. 31-36, 2002.
[0004]
There are various microphone array techniques, but adaptive microphone arrays such as
Griffiths-Jim and AMNOR must be given a non-speech section in advance and learn from it (see
reference [2]).
[0005]
Reference [2]: Ohga, J. et al.: Sound system and digital processing, The Institute of Electronics,
Information and Communication Engineers, 1995.
[0006]
However, when speech recognition is actually performed, it is not always easy to detect a
non-speech section for learning in a noisy environment.
Also, although such arrays can remove stationary noise robustly, their performance decreases
for non-stationary noise.
In an environment where noise and reverberation change from moment to moment, the
recognition performance is therefore degraded.
[0007]
Therefore, with the aim of achieving both performance and convenience in ASR, the inventors of
the present application focused, as one example, on the delay-and-sum (DS) microphone array,
which does not require learning and is effective in suppressing noise and reverberation, and
tried to improve its microphone spacing and placement.
[0008]
The following description will be made on the case of using a delay and sum (DS) microphone
array.
As is well known, a delay-and-sum microphone array is a microphone array that adds a delay to the
signal received by each microphone and calculates the sum of the delayed signals (hereinafter
referred to as delay-and-sum processing). FIG. 1 shows an example of the configuration of a
delay-and-sum microphone array. In FIG. 1, Mi (i = 1, 2, ..., M) are microphones arranged in a
straight line. Di is a delay unit that adds a delay amount di to the signal xsi(t) received by
microphone Mi. S is an adder that sums the output signals xsi(t − di) of the delay units Di and
outputs an output signal y(t).
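The delay-and-sum processing described above can be sketched in a few lines. This is a minimal
illustration, not the implementation used in the patent: it assumes integer sample delays, and the
function name delay_and_sum and its arguments are hypothetical.

```python
import numpy as np

def delay_and_sum(signals, delays):
    """Delay each microphone signal by an integer number of samples and sum.

    signals: array of shape (M, T), one row per microphone Mi.
    delays:  M non-negative integer delays di, in samples (assumption:
             fractional delays are not handled in this sketch).
    Returns y with y[t] = sum_i signals[i][t - delays[i]].
    """
    signals = np.asarray(signals, dtype=float)
    num_samples = signals.shape[1]
    y = np.zeros(num_samples)
    for x, d in zip(signals, delays):
        # Shift the signal right by d samples (zero-filled) and accumulate.
        y[d:] += x[:num_samples - d]
    return y
```

When the delays match the arrival-time differences of the target sound, the target components add
in phase while components from other directions do not.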
[0009]
An object of the present invention is to provide a microphone array capable of improving SNR.
[0010]
The invention according to claim 1 is characterized in that in the microphone array provided
with three or more microphones, the distance between the microphones is set to a distance
proportional to the graduation distance of the shortest Golomb ruler.
[0011]
The invention according to claim 2 is characterized in that, in the invention according to claim 1,
the three or more microphones are arranged in a straight line.
[0012]
The invention according to claim 3 is characterized in that, in the invention according to claim 1,
the three or more microphones are arranged in an arc shape.
[0013]
The invention according to claim 4 is characterized in that, in the invention according to claims 1
to 3, four microphones are provided, and the distances between adjacent microphones are set in
the ratio 1:3:2.
[0014]
According to the present invention, a microphone array capable of improving the SNR is realized.
[0015]
An embodiment in which the present invention is applied to a delay-and-sum microphone array
will be described below with reference to the drawings.
[0016]
[1] Preliminary examination
[0017]
As a preliminary study, in order to compare the performance of delay-and-sum microphone
arrays under various conditions, speech recognition experiments were performed by simulation.
In particular, we focused on the number of microphones, the microphone spacing, the angle of
the noise to the microphone array, and the SNR.
The microphones of the delay-and-sum microphone array used in the preliminary study are
assumed to be arranged at equal intervals along a straight line.
[0018]
As voice data for evaluation, ATR BTEC test set 01 was used.
This evaluation data consists of read speech of conversations used during travel, 510 sentences
in total, recorded at a sampling rate of 16 kHz.
[0019]
The noise was assumed to come from the front of the microphone array, and the same noise was
added to the voice with an appropriate time difference as the microphone reception signal.
As noise, a random band noise from 125 Hz to 6 kHz was used according to the frequency band
of speech.
[0020]
To set the SNR, the signal energy was obtained from the average amplitude of the voice data
excluding its non-voice sections, and the noise amplitude was changed so that the SNR of the
signal received by the microphone (the SNR of the input signal) became the target SNR.
After that, the speech with noise suppressed by delay-and-sum processing was recognized.
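The noise scaling step can be illustrated as follows. This is a sketch under stated assumptions: the
speech passed in is taken to be already trimmed to its voiced sections, energy is measured as mean
square power, and the function name scale_noise_to_snr is hypothetical.

```python
import numpy as np

def scale_noise_to_snr(speech, noise, target_snr_db):
    """Rescale noise so that 10*log10(P_speech / P_noise) equals target_snr_db.

    Assumption: `speech` contains only voiced samples, so its mean square
    value stands in for the signal energy described in the text.
    """
    speech = np.asarray(speech, dtype=float)
    noise = np.asarray(noise, dtype=float)
    p_speech = np.mean(speech ** 2)
    p_noise = np.mean(noise ** 2)
    # Gain that brings the noise power to p_speech / 10^(SNR/10).
    gain = np.sqrt(p_speech / (p_noise * 10 ** (target_snr_db / 10)))
    return gain * noise
```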
[0021]
As a result, the recognition rate (word correct accuracy) improved as the number of microphones
increased.
Furthermore, it has been found that the SNR after delay-and-sum processing changes in relation
to the angle of the voice with respect to the microphone array according to the microphone
spacing, and the recognition rate improves as the SNR of the input signal increases.
[0022]
The number of microphones in the delay-and-sum microphone array was two, and their intervals
(microphone intervals) were changed to 5 cm, 10 cm, and 15 cm.
The change in SNR after delay-and-sum processing due to the change in the angle between the sound
source and the noise source at each microphone spacing (5 cm, 10 cm, 15 cm) is shown in FIG. 2.
FIG. 2 shows the change in SNR after delay-and-sum processing when the sound source direction
is changed every five degrees from −90 degrees to +90 degrees, with the SNR of the input signal
being 20 dB.
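One way to see why spacing and angle interact is to compute the arrival-time difference between two
microphones. The sketch below is an illustration only: it assumes a far-field plane wave, a speed of
sound of 343 m/s, and an angle measured from the front of the array; the function name
pair_delay_samples is hypothetical. The 16 kHz sampling rate is taken from the text.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s (assumed, not stated in the source)
SAMPLE_RATE = 16000     # Hz (16 kHz sampling, as in the evaluation data)

def pair_delay_samples(spacing_m, angle_deg):
    """Arrival-time difference, in samples, between two microphones
    spacing_m apart for a plane wave arriving at angle_deg from the front."""
    delay_s = spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
    return delay_s * SAMPLE_RATE

for spacing_cm in (5, 10, 15):
    print(spacing_cm, "cm:", round(pair_delay_samples(spacing_cm / 100, 30.0), 2), "samples")
```

Wider spacings give larger delays for the same angle, which changes how well a fixed noise
direction is cancelled.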
[0023]
The results of the preliminary examination show that, if the sound source direction and the noise
direction are known, the SNR after delay-and-sum processing can be improved by adjusting the
microphone spacing according to FIG. 2. Therefore, if a large number of microphones are prepared
in advance, a similar effect can be obtained by selecting a pair of microphones with an
appropriate spacing.
[0024]
If there is an arrangement that provides a variety of spacings while keeping the number of
microphones as small as possible, an optimal spacing can be selected in accordance with the
sound source direction and the noise direction.
[0025]
[2] Introduction of shortest Golomb ruler
In order to satisfy the above-mentioned requirement, in this embodiment the graduation spacing of
the shortest Golomb ruler (OGR, optimal Golomb ruler) is used as the spacing between the
microphones of the delay-and-sum microphone array.
[0026]
The shortest Golomb ruler (OGR) is used for the placement of X-ray sensors and the placement of
radio telescopes.
With this spacing, many different distances can be measured even when the number of sensors is
small.
For example, when there are four arrangement targets, their arrangement positions are
{0-1-4-6}, and when there are ten arrangement targets, their arrangement positions are
{0-1-6-10-23-26-34-41-53-55}.
[0027]
By using the shortest Golomb ruler, more types of spacing can be obtained than with equal
intervals. As shown in FIG. 3(a), when four arrangement targets are arranged at equal intervals,
their arrangement positions are {0-2-4-6}, and there are three types of interval, {2, 4, 6}. On
the other hand, as shown in FIG. 3(b), when four arrangement targets are arranged according to
the graduation spacing of the shortest Golomb ruler, their arrangement positions are {0-1-4-6},
and there are six types of interval, {1, 2, 3, 4, 5, 6}.
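The interval counts quoted above can be checked by listing the distinct pairwise distances of each
arrangement. The helper name spacing_types is hypothetical and used only for this illustration.

```python
from itertools import combinations

def spacing_types(positions):
    """Return the sorted distinct distances between every pair of positions."""
    return sorted({b - a for a, b in combinations(sorted(positions), 2)})

print(spacing_types([0, 2, 4, 6]))  # equal intervals
print(spacing_types([0, 1, 4, 6]))  # shortest Golomb ruler with four marks
```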
[0028]
The marks of a Golomb ruler are a set of non-negative integers in which no two pairs of marks have
the same difference. When there are m arrangement targets, a number sequence $a_k$
$(k = 1, 2, \ldots, m)$ satisfying $0 = a_1 < a_2 < \cdots < a_m$, for which the differences
$\delta_{ij} = a_j - a_i$ $(1 \le i < j \le m)$ are all different, is a Golomb ruler. The Golomb
ruler with the smallest $a_m$ is called the shortest Golomb ruler.
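The definition translates directly into a check, and for small m the shortest ruler can be found by
exhaustive search. The sketch below is illustrative only: the names is_golomb and shortest_golomb
are hypothetical, and the brute-force search is practical only for a handful of marks.

```python
from itertools import combinations

def is_golomb(marks):
    """True if all pairwise differences between marks are distinct."""
    diffs = [b - a for a, b in combinations(sorted(marks), 2)]
    return len(diffs) == len(set(diffs))

def shortest_golomb(m):
    """Brute-force search for a shortest Golomb ruler with m >= 2 marks.

    Tries increasing lengths a_m and, for each, every choice of the
    m - 2 interior marks, returning the first valid ruler found.
    """
    length = m - 1
    while True:
        for inner in combinations(range(1, length), m - 2):
            marks = (0, *inner, length)
            if is_golomb(marks):
                return marks
        length += 1

print(shortest_golomb(4))
```

A mirrored ruler (for example {0-2-5-6}) has the same length and the same set of differences; the
search simply returns whichever valid ruler it encounters first.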
[0029]
By using the graduation spacing of the shortest Golomb ruler as the spacing of the microphones
constituting the delay-and-sum microphone array, the speech recognition rate can be improved
over that of an ordinary delay-and-sum microphone array in which the microphones are equally
spaced.
[0030]
In this embodiment, the distance between the microphones of the delay-and-sum microphone
array is set to a distance proportional to the graduation distance of the shortest Golomb ruler.
For example, the marks of the shortest Golomb ruler for m = 4 are {0-1-4-6}, and the intervals
between adjacent marks are 1, 3, 2. Therefore, in the case of four microphones, the microphones
are arranged such that the distances between adjacent microphones are in the ratio 1:3:2.
[0031]
Furthermore, in order to emphasize the optimal spacing, it is preferable to weight the delayed
signals before summing them. According to the estimated angle between the sound source and the
noise, a large weight is given to the delayed signals corresponding to the microphone pair whose
spacing yields a high SNR after delay-and-sum processing at that angle, and a small weight is
given to the delayed signals corresponding to a microphone pair whose spacing yields a low SNR
after delay-and-sum processing.
[0032]
Here, letting ki (i = 1, 2, ..., M) be the weight for each microphone, the ki are set so that
their sum is 1.
Under this condition, the amplitude of the sound source does not change.
[0033]
FIG. 4 shows a delay-and-sum microphone array of this embodiment.
[0034]
In FIG. 4, Mi (i = 1, 2, 3, 4) is a microphone.
That is, in this example, four microphones are provided. Di is a delay unit that adds a delay
amount di to the signal xsi (t) received by each microphone Mi. Pi is a multiplier that multiplies
the output signal xsi (t-di) of the delay device Di by the weight ki. S is an adder that adds the
output signal ki · xsi (t−di) of the multiplier Pi and outputs an output signal y (t).
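The weighted structure of FIG. 4 can be sketched as below. This is an illustration under
assumptions rather than the patented implementation: delays are integer sample counts, and the
function name weighted_delay_and_sum is hypothetical. The check that the weights sum to 1 reflects
the condition stated in the text.

```python
import numpy as np

def weighted_delay_and_sum(signals, delays, weights):
    """Compute y(t) = sum_i ki * xsi(t - di) for integer sample delays.

    signals: array of shape (M, T); delays: M non-negative integers;
    weights: M weights ki, required here to sum to 1.
    """
    signals = np.asarray(signals, dtype=float)
    weights = np.asarray(weights, dtype=float)
    if not np.isclose(weights.sum(), 1.0):
        raise ValueError("weights ki must sum to 1")
    num_samples = signals.shape[1]
    y = np.zeros(num_samples)
    for x, d, k in zip(signals, delays, weights):
        # Delay unit Di, multiplier Pi, then accumulation in adder S.
        y[d:] += k * x[:num_samples - d]
    return y
```

With the delays aligned to the target source, every channel contributes the same waveform, so the
output equals that waveform multiplied by the sum of the weights, which is 1.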
[0035]
As shown in FIG. 5, the four microphones M1 to M4 are arranged in a straight line. The
arrangement positions of the microphones M1 to M4 are set to {0 cm-3 cm-12 cm-18 cm} so that
the distance between the microphones becomes a distance proportional to the graduation
distance of the shortest Golomb ruler. Therefore, the distance W12 between M1 and M2 is 3 cm,
the distance W23 between M2 and M3 is 9 cm, and the distance W34 between M3 and M4 is 6
cm.
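The positions above follow from scaling the ruler marks {0-1-4-6} by a unit length of 3 cm. A short
check, with hypothetical helper names:

```python
def mic_positions_cm(marks, unit_cm):
    """Scale Golomb ruler marks by a unit length to get positions in cm."""
    return [mark * unit_cm for mark in marks]

def adjacent_spacings(positions):
    """Distances between neighbouring microphones (W12, W23, W34, ...)."""
    return [b - a for a, b in zip(positions, positions[1:])]

positions = mic_positions_cm((0, 1, 4, 6), 3)
print(positions, adjacent_spacings(positions))
```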
[0036]
[3] Evaluation Experiment
[3.1] Experimental Condition
In order to confirm the effect of the method (proposed method) in the above example, a
performance evaluation experiment based on the speech recognition rate was performed.
[0037]
The speech signals that a microphone array would receive in a noisy environment were created by
computer simulation, and the speech recognition experiment was conducted on that data.
[0038]
As the parameters of the microphone array, voice was input from the front to the row of
microphones, and the same noise as in the preliminary examination was input from a direction
inclined by 30 degrees.
We compared a conventional delay-and-sum microphone array with four microphones (hereinafter
referred to as the DS array) with the proposed delay-and-sum microphone array having the
shortest Golomb ruler arrangement (hereinafter referred to as the OGR-DS array).
[0039]
Here, so that the overall span of the array (18 cm) is the same in the two methods, the
microphones of the DS array were placed at {0 cm-6 cm-12 cm-18 cm}, and the microphones of the
OGR-DS array were placed at {0 cm-3 cm-12 cm-18 cm}, as shown in FIG. 5.
In addition, as a control experiment, the recognition rate was also determined when the
microphone array was not used, that is, when one microphone was used.
[0040]
For each utterance, the weight ki (i = 1, 2, 3, 4) of each microphone was varied in steps of 0.1,
and for each weight setting the voice section and the non-voice section of the delay-and-sum
output were detected and compared to obtain the SNR. Of the 84 possible settings, the output with
the highest SNR was used as the input to speech recognition.
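The source does not spell out how the 84 settings arise. One reading that is consistent with that
count, and an assumption of this sketch, is that each of the four weights is a positive multiple of
0.1 and the weights sum to 1. The function name weight_grid is hypothetical.

```python
from itertools import product

def weight_grid(num_mics=4, steps=10):
    """All weight vectors whose entries are positive multiples of 1/steps
    and whose entries sum to 1 (assumed interpretation of the search grid)."""
    grid = []
    for combo in product(range(1, steps), repeat=num_mics):
        if sum(combo) == steps:  # integer test avoids floating-point drift
            grid.append(tuple(c / steps for c in combo))
    return grid

print(len(weight_grid()))
```

Under this reading the search is small enough to run exhaustively for every utterance.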
[0041]
Julius 3.lp2 was used as the speech recognition engine, and 200 sentences of newspaper reading
speech from the IPA-testset were used as evaluation data (see reference [3]).
[0042]
Reference [3]: Kiyohiro Shikano et al.: Speech recognition system, Ohmsha, 2001.
[0043]
The acoustic features were 25-dimensional, consisting of 12th-order MFCC, ΔMFCC, and ΔPower,
analyzed with a frame length of 25 ms and a frame shift of 10 ms.
[0044]
[3.2] Results and Discussion
Table 1 shows the results of the speech recognition experiments.
[0045]
[0046]
The OGR-DS array was able to improve the recognition rate despite the simple method of
changing the arrangement of microphones and weighting.
[0047]
For comparison, the recognition rate was also measured under a 10 dB noise environment for a DS
array in which five microphones are arranged at {0 cm-6 cm-12 cm-18 cm-24 cm}; the recognition
rate was 46.9%.
The OGR-DS array, on the other hand, achieved a recognition rate of 51.1% with only four
microphones, as shown in Table 1.
In this case, the OGR-DS array was also 6 cm shorter than the five-microphone DS array.
[0048]
Thus, the proposed method makes it possible to reduce the number of microphones and to
reduce the size of the microphone array.
[0049]
Under the conditions of this experiment, the weight setting selected for the largest number of
sentences was 0.3 for the microphones arranged at 0 cm and 3 cm, and 0.2 for the microphones
arranged at 12 cm and 18 cm.
From this, it is thought that the processing speed can be improved if the weights can be
expressed by a formula.
[0050]
[4] Others
In the above embodiment, the microphones in the microphone array are arranged in a straight
line, but they may be arranged in an arc as shown in FIG. 6.
In this case, since four microphones M1 to M4 are provided, the ratio of the distance between M1
and M2, the distance between M2 and M3, and the distance between M3 and M4 (each measured as a
length along the arc) is set to 1:3:2.
[0051]
Further, as shown in FIG. 7, even when three or more microphones are arranged on each of two
or more sides of a virtual cube in a microphone array, the present invention can be applied to
the arrangement of the microphones on each side.
In the case of FIG. 7, since four microphones M1 to M4 are disposed on each side, the distances
between adjacent microphones on each side are set in the ratio 1:3:2.
Although not shown, the present invention can also be applied to the arrangement of the
microphones on each oblique side when three or more microphones are arranged on each of two or
more oblique sides of a virtual quadrangular pyramid in the microphone array.
[0052]
FIG. 1 is a block diagram showing the general structure of a delay-and-sum microphone array.
FIG. 2 is a graph showing the change in SNR after delay-and-sum processing with the angle
between the sound source and the noise source at each microphone spacing (5 cm, 10 cm, 15 cm).
FIG. 3(a) is a schematic diagram showing the arrangement positions and the types of interval when
four arrangement targets are arranged at equal intervals, and FIG. 3(b) is a schematic diagram
showing the arrangement positions and the types of interval when they are arranged according to
the graduation spacing of the shortest Golomb ruler.
FIG. 4 is a block diagram showing the structure of the delay-and-sum microphone array of the
present embodiment.
FIG. 5 is a schematic diagram showing the arrangement of the microphones of FIG. 4.
FIG. 6 is a schematic diagram showing an example in which the microphones in a microphone array
are arranged in an arc.
FIG. 7 is a schematic diagram showing an example in which three or more microphones are arranged
on each of two or more sides of a virtual cube in a microphone array.
Explanation of reference signs
[0053]
Mi: microphone
Di: delay unit
Pi: multiplier
S: adder