DESCRIPTION JPH0595596

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
[0001]
BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a
noise reduction apparatus, used for example in automobile telephones and speech recognition
apparatuses, that reduces noise in an input speech signal, improves the S/N ratio, and raises
the speech recognition rate.
[0002]
2. Description of the Related Art As a device for reducing the noise of an input voice signal in a
car telephone or the like, the one disclosed in Japanese Patent Application Laid-Open
No. 63-190462, for example, is known. In this device, a transmitting microphone is embedded in
the vehicle's instrument panel or in a vehicle operation unit such as the steering wheel, and an
equalizer amplifier amplifies the low frequency band of the audio signal from the microphone at
an amplification factor higher than that of the other frequency bands. The equalizer amplifier
consists of an op amp with negative feedback, a resistor and a capacitor; the capacitor removes
the high-pitched part of the audio signal so that the amplification factor of the low-pitched part
is relatively increased during transmission. The aim is to enable clear communication to the
other end of the telephone by preventing the bass range from being attenuated, and to avoid
restricting the installation position of the transmitting microphone to the front pillar or the like.
[0003]
10-04-2019
1
SUMMARY OF THE INVENTION The noise reduction apparatus as described above usually uses
only one transmitting microphone as a means for inputting voice.
[0004]
Therefore, an object of the present invention is to provide a new noise reduction device that
improves the S/N ratio, based on the concept of using a plurality of microphones and combining
the voice signals received by each microphone so as to suppress noise and to emphasize and
extract the voice signal.
[0005]
SUMMARY OF THE INVENTION A noise reduction device according to the present invention
comprises: a plurality of microphones disposed around a speaker for receiving the voice of the
speaker; time difference detecting means for detecting a time difference between the voice
signals respectively received by the plurality of microphones; delay means for delaying the audio
signals respectively received by the plurality of microphones, based on the time difference
detected by the time difference detecting means, so that they are brought into phase; and adding
means for adding the audio signals after phase adjustment by the delay means.
The above-mentioned time difference detecting means can be configured to calculate the degree
of cross-correlation of the audio signals received by the plurality of microphones, and to
detect the time difference between the audio signals based on the degree of cross-correlation.
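The cross-correlation approach to time difference detection can be illustrated with a minimal
sketch. It is not the circuit of the embodiment: the function names, the test pulse and the
3-sample delay are all hypothetical, and lags are counted in samples rather than seconds.

```python
def cross_correlation(x, y, lag):
    """Sum of x[i] * y[i + lag] over the samples where both signals overlap."""
    total = 0.0
    for i, xi in enumerate(x):
        j = i + lag
        if 0 <= j < len(y):
            total += xi * y[j]
    return total


def estimate_lag(x, y, max_lag):
    """Return the lag in [-max_lag, max_lag] with the highest cross-correlation.

    A positive result means y is a delayed copy of x.
    """
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: cross_correlation(x, y, lag))


# Hypothetical test signal: a pulse that reaches microphone b 3 samples after microphone a.
pulse = [0.0, 1.0, 2.0, 1.0, 0.0, -1.0, -2.0, -1.0]
mic_a = pulse + [0.0] * 8
mic_b = [0.0] * 3 + pulse + [0.0] * 5
print(estimate_lag(mic_a, mic_b, max_lag=6))  # prints 3
```

Searching only a bounded range of lags keeps the cost low and matches the physical situation,
where the largest possible time difference is set by the spacing of the microphones.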
[0006]
The time difference detecting means detects the time difference between the voice signals of the
speaker received by the microphones, and the delay means delays the voice signals so that the
time difference disappears.
When the audio signals of the microphones, whose phases have been aligned in time in this way,
are added by the adding means, the level of the voice is multiplied. On the other hand, since
the level of irregular noise does not increase even when it is added, the S/N ratio is
relatively improved.
[0007]
As a method of detecting the time difference, the degree of cross-correlation of the signals
received by the microphones may be calculated. Irregular noise has a low degree of correlation,
while the voice signal has a high degree of correlation, so the voice signal can be extracted
from the microphone signals and its time difference determined.
[0008]
Embodiments of the present invention will be described hereinbelow with reference to the
drawings.
FIG. 2 shows the configuration of a noise reduction apparatus as an embodiment of the present
invention. The apparatus of this embodiment is an application of the present invention to a
mobile telephone, and is configured so that the voice of the driver to be input to the mobile
telephone 1 is received by the two microphones 3a and 3b. As shown in FIG. 2, the microphone 3a
is installed at the right end of the instrument panel in front of the driver 4, and the
microphone 3b is installed near the left shoulder of the driver's seat. The sound propagation
time between the driver 4 and the microphone 3a is ta, and the sound propagation time between
the driver 4 and the microphone 3b is tb.
[0009]
The signal received by the microphone 3a is analog-to-digital converted by the A/D converter 6a,
then split n ways and input to the band pass filter groups 7a1 to 7an. Similarly, the signal
received by the microphone 3b is analog-to-digital converted by the A/D converter 6b, split n
ways, and input to the band pass filter groups 7b1 to 7bn. The band pass filter groups 7a1 to
7an and 7b1 to 7bn are band pass filters which divide the voice band into n bands. The output
signals of the band pass filter groups 7a1 to 7an and 7b1 to 7bn are respectively input to
correlators 91 to 9n provided for each band. These correlators 91 to 9n calculate the degree of
cross-correlation of the signals in the respective bands from the microphones 3a and 3b, and
input the resulting degrees of correlation 111 to 11n to the CPU 12.
[0010]
Reference numeral 13 denotes a delay unit for delaying the sound collection signal 5b from the
microphone 3b after A/D conversion by the A/D converter 6b; the amount of delay is controlled by
the CPU 12. An adder 14 adds the sound collection signal 5b on the microphone 3b side, delayed
by the delay unit 13, and the sound collection signal 5a on the microphone 3a side from the A/D
converter 6a. Reference numeral 15 denotes a D/A converter that D/A converts the output signal
from the adder 14. Reference numeral 1 denotes a mobile telephone; when the present invention is
applied to a speech recognition apparatus instead of a mobile telephone, it denotes the speech
recognition apparatus.
[0011]
The operation of the apparatus of this embodiment is described below. The voice uttered by
the driver 4 is input to the microphones 3a and 3b and A/D converted by the A/D converters 6a
and 6b, respectively. The sound collection signals 5a and 5b after A/D conversion are input
to the band pass filter groups 7a1 to 7an and 7b1 to 7bn, respectively, where the audio band is
divided into n bands, and the band-divided signals from the microphones 3a and 3b are input to
the correlators 91 to 9n. The correlators 91 to 9n calculate the degrees of correlation 111
to 11n as a function of time t for the two input signals in each band. For example, if an
input signal (noise and the voice signal of the driver 4) as shown in FIG. 4 is input to the
microphones 3a and 3b and the degree of cross-correlation is calculated, the degree of
correlation for the noise is a small value, whereas the degree of correlation for the voice
signal is a large value.
[0012]
Based on the degrees of correlation 111 to 11n obtained from the correlators 91 to 9n, the CPU 12
extracts the signal having the largest degree of correlation between the two time-series input
signals of the microphones 3a and 3b, and determines the time difference Δt between them. That
is, when the voice of the driver 4 is input to the microphones 3a and 3b, the degree of
correlation of the voice signal should be the highest between the two systems, and the time
difference Δt between the voice signals is (ta − tb), as can be seen from the drawings. Thus,
when the relationship between the degree of correlation and the time difference Δt is plotted
as shown in FIG. 3, the voice signal has a high degree of correlation at the position of the
time difference (ta − tb), while the irregular noise has a low degree of correlation.
[0013]
Sudden regular noise as shown in FIG. 2 also has a high degree of correlation, at the position
of the time difference (t1 − t2) in FIG. 3. For the voice of the driver 4, however, the time
difference (ta − tb) can be predicted from standard dimensions such as the driver's physique and
the seat. A signal whose time difference (t1 − t2) is far from this value can therefore be
judged to be sudden noise rather than a voice signal. Accordingly, by applying a filter centered
on the time difference (ta − tb), it is possible to distinguish the voice signal from the
regular noise and extract only the voice signal.
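The filter centered on the expected time difference can be pictured as a window applied to the
correlation peaks. The sketch below is only one possible reading: it uses a hard rectangular
window, and the function name, the lag values and the tolerance are hypothetical.

```python
def pick_voice_peak(correlation_by_lag, expected_lag, tolerance):
    """Return the lag with the highest correlation inside the expected window.

    correlation_by_lag maps a candidate time difference to its degree of correlation.
    Returns None when no candidate lies within tolerance of expected_lag.
    """
    candidates = {
        lag: corr
        for lag, corr in correlation_by_lag.items()
        if abs(lag - expected_lag) <= tolerance
    }
    if not candidates:
        return None
    return max(candidates, key=candidates.get)


# Hypothetical peaks: sudden noise at lag -7 correlates strongly but is far from
# the expected voice lag of 3, so it is excluded by the window.
peaks = {-7: 0.9, -1: 0.1, 2: 0.2, 3: 0.8, 4: 0.3}
print(pick_voice_peak(peaks, expected_lag=3, tolerance=1))  # prints 3
```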
[0014]
After obtaining the time difference Δt = (ta − tb) of the audio signal, the CPU 12 causes the delay
unit 13 to delay the audio signal 5b input to the microphone 3b by (ta − tb).
As a result, the audio signal 5a input from the microphone 3a and the audio signal 5b', which is
input from the microphone 3b and delayed by the delay unit 13, coincide in phase, and these two
signals are superimposed in the same phase by the adder 14.
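The roles of the delay unit 13 and the adder 14 can be sketched in a few lines. This is an
illustration under simplifying assumptions: the delay is a whole number of samples, and the
function names and sample values are hypothetical.

```python
def delay(signal, samples):
    """Delay a signal by a non-negative number of samples, keeping its length."""
    if samples < 0:
        raise ValueError("delay must be non-negative")
    return ([0.0] * samples + list(signal))[:len(signal)]


def delay_and_sum(sig_a, sig_b, delay_b):
    """Delay sig_b by delay_b samples, then add it to sig_a sample by sample."""
    delayed_b = delay(sig_b, delay_b)
    return [a + b for a, b in zip(sig_a, delayed_b)]


# Hypothetical voice that reaches microphone a 2 samples later than microphone b,
# so the signal of microphone b is the one that has to be delayed.
voice = [1.0, -1.0, 2.0, -2.0]
sig_b = voice + [0.0, 0.0]
sig_a = [0.0, 0.0] + voice
print(delay_and_sum(sig_a, sig_b, delay_b=2))  # prints [0.0, 0.0, 2.0, -2.0, 4.0, -4.0]
```

After alignment every voice sample appears with twice its original amplitude, which is the
in-phase superposition described above.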
[0015]
Because the speaker's voice signals are aligned in phase and then added, the level of the voice
is doubled. On the other hand, since random noise is a random signal, its level does not
increase even when the signals are added, so the signal after addition has a relatively improved
S/N ratio for the audio signal. For example, when the microphones 3a and 3b are nondirectional
and the wavelength of noise is shorter than (ta − tb), the improvement is about 6 dB at the
maximum.
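The 6 dB figure can be checked with a short calculation. It is a sketch under the idealization
stated in the text, namely that the voice amplitude A doubles after in-phase addition while the
noise level N stays unchanged; the symbols A and N are introduced here for illustration only.

```latex
\Delta_{\mathrm{S/N}}
  = 20\log_{10}\frac{2A}{N} - 20\log_{10}\frac{A}{N}
  = 20\log_{10} 2
  \approx 6.02\ \mathrm{dB}
```

If the noise at the two microphones is instead uncorrelated and of equal power, its power doubles
on addition and the gain would be closer to 10 log10 2, or about 3 dB, which is consistent with
6 dB being quoted as a maximum.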
[0016]
Although the case where the speaker is the driver 4 has been described above as an example, the
same operation is performed when the speaker changes from the driver 4 to, for example, a
passenger in the front passenger seat: the apparatus follows the passenger's voice and enhances
that voice signal to improve the S/N ratio.
[0017]
In the above embodiment, the speech signal is divided into n bands by the band pass filter groups
7a1 to 7an and 7b1 to 7bn before the correlation is obtained. Alternatively, the correlation
between the signals of the two systems of the microphones 3a and 3b may be determined without
performing such band division.
That is, the band pass filter groups 7a1 to 7an and 7b1 to 7bn may each be replaced by single
band pass filters 7a and 7b which pass the voice band.
[0018]
Various modifications are possible in the practice of the present invention. For example, although
the above embodiment describes the case where two microphones are provided for audio input, the
present invention is not limited to this; if the number of microphones is increased further, the
S/N ratio improvement effect of the present invention is enhanced further.
[0019]
FIG. 5 shows an embodiment in which three microphones are provided. In the figure, a driver 41,
a passenger 42 in the rear seat, audio speakers 43L and 43R on the left and right, and external
noise 44 are assumed as sound sources. The three microphones 3a, 3b and 3c are disposed at
three locations around the driver 41.
[0020]
Here, to simplify the explanation, the operation will be described assuming that an impulse
sound is generated simultaneously by each sound source. The time taken for the sound waves to
reach the microphones 3a, 3b and 3c in this case is shown in the drawings. As illustrated, the
arrival times of the sound source signals differ according to the distance between each
microphone and each sound source.
[0021]
Next, the degree of correlation between the sound source signals input to the microphones 3a, 3b
and 3c is determined by a cross correlator. In this case, as shown in FIG. 7, the degree of
correlation 11 is calculated for each of the microphone pairs 3a and 3b, 3c and 3b, and 3a and
3c, and the time difference Δt of each sound source signal is determined from it. From these,
the voice signal of the driver 41 is extracted, and the voice signals input to the three
microphones are aligned in phase and added, which emphasizes the voice signal and reduces noise.
[0022]
Here, four methods (1) to (4) for extracting the voice of the driver 41 from the respective sound
sources 41, 42, 43L, 43R and 44 will be described below. (1) Usually, the position of the mouth
of the driver 41 when speaking is within a certain predictable range, as long as the driver 41
does not move abnormally. For example, the mouth of the driver 41 can be considered to lie
within the hatched area 22 shown in the drawings. From this area 22, it is possible to predict
the possible range of the time difference of the voice signal of the driver 41 for each of the
above-mentioned microphone pairs, i.e., the pairs 3a and 3b, 3c and 3b, and 3a and 3c.
[0023]
The range to which this time difference is limited is shown by the hatching 20 in the drawings.
The limited spatial range in the vehicle compartment corresponding to the time difference of the
hatched portion 20 is the range bounded by the hyperbolic lines 21ab, 21cb and 21ac. The range
satisfying the AND condition of these limited spatial ranges is the above-mentioned hatched area
22; a sound source satisfying this AND condition is always within the hatched area 22, and that
sound source is the driver 41.
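Why the boundaries are hyperbolic can be seen from the geometry. As a sketch, assume a sound
source at position r, microphones at positions r_a and r_b, and a speed of sound c; none of
these symbols appear in the original, and the sign convention follows the (ta − tb) used earlier.

```latex
\Delta t_{ab} = t_a - t_b
  = \frac{\lVert \mathbf{r} - \mathbf{r}_a \rVert - \lVert \mathbf{r} - \mathbf{r}_b \rVert}{c}
```

Points with a constant time difference have a constant difference of distances to the two
microphones, which defines one branch of a hyperbola with the microphones as foci. Limiting the
time difference to an interval therefore selects the region between two such curves.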
[0024]
Therefore, as shown in FIG. 7, the time differences Δt of the respective sound sources 41, 42,
43L, 43R and 44 are determined for the three microphone pairs 3a and 3b, 3c and 3b, and 3a and
3c. The range of the time difference is then limited for each pair by the hatched portions 20ab,
20cb and 20ac, respectively, and only the sound sources lying within them are extracted. If the
sound source common to the three sets of extracted sound sources is then selected (i.e., if the
AND condition is taken), it is the audio signal of the driver 41 in the hatched area 22.
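The AND condition over the three microphone pairs can be sketched as a simple membership test.
The function names, source labels, window limits and time differences below are all hypothetical
values chosen for illustration.

```python
def within(window, value):
    """True if value lies inside the closed interval window = (low, high)."""
    low, high = window
    return low <= value <= high


def sources_passing_all_pairs(time_diffs, windows):
    """Return the sources whose time difference is inside the window of every pair.

    time_diffs: {source: {pair: time difference}}
    windows:    {pair: (low, high)}
    """
    return [
        source
        for source, diffs in time_diffs.items()
        if all(within(windows[pair], diffs[pair]) for pair in windows)
    ]


# Hypothetical time differences in milliseconds for the pairs ab, cb and ac.
windows = {"ab": (0.5, 1.5), "cb": (-0.5, 0.5), "ac": (0.8, 1.8)}
time_diffs = {
    "driver_41": {"ab": 1.0, "cb": 0.1, "ac": 1.2},
    "passenger_42": {"ab": 1.2, "cb": 2.0, "ac": -0.4},
    "speaker_43L": {"ab": -2.0, "cb": 0.3, "ac": 1.0},
}
print(sources_passing_all_pairs(time_diffs, windows))  # prints ['driver_41']
```

A source can fall inside the window of one pair by coincidence, as the passenger does for the
pair ab here, but it is unlikely to satisfy all three windows at once.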
[0025]
(2) Next, a method is described for distinguishing the passenger 42 from the driver 41 and
raising the extraction accuracy of the driver 41's voice, so that the noise reduction device of
the present invention does not erroneously follow the voice of the passenger 42 when, for
example, the passenger 42 in the rear seat leans close to the boundary of the hatched area 22
and speaks.
In this case, as shown in FIG. 8, weighting coefficients 23ab, 23cb and 23ac are prepared for
the microphone pairs 3a and 3b, 3c and 3b, and 3a and 3c, respectively. Each coefficient is
largest at the posture position that the driver 41 most often takes (represented in FIG. 8 as a
time difference on the horizontal axis) and approaches zero as the deviation from that position
increases. The degrees of correlation of the respective sound sources determined for the pairs
3a and 3b, 3c and 3b, and 3a and 3c are multiplied by the corresponding weighting coefficients.
In this way, the voice of the driver 41 keeps a high degree of correlation, while the degree of
correlation of the voice of a passenger near the boundary of the hatched area 22 becomes small,
so the extraction accuracy goes up.
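The weighting step can be sketched as follows. The shape of the weighting coefficients is not
specified in the text beyond peaking at the usual posture and approaching zero away from it, so
the Gaussian shape, the function names and the numbers used here are assumptions.

```python
import math


def weight(time_diff, centre, width):
    """Assumed Gaussian-shaped weight: 1.0 at the centre, approaching zero far from it."""
    return math.exp(-0.5 * ((time_diff - centre) / width) ** 2)


def weighted_correlation(correlation, time_diff, centre, width):
    """Scale a degree of correlation by the weight at its time difference."""
    return correlation * weight(time_diff, centre, width)


# Hypothetical values in milliseconds: the driver sits at the usual position,
# while the passenger is near the boundary of the allowed range.
driver = weighted_correlation(0.8, time_diff=1.0, centre=1.0, width=0.3)
passenger = weighted_correlation(0.9, time_diff=1.6, centre=1.0, width=0.3)
print(driver > passenger)  # prints True
```

Even though the passenger's raw correlation is higher in this example, the weighting pushes it
well below the driver's, which is the intended effect of method (2).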
[0026]
(3) Next, a method will be described in which human voice and other noise are distinguished
using time variation dΔt / dt of time difference Δt so that the present apparatus does not follow
noise and the like. The time variation dΔt / dt of the time difference Δt of the voice of the driver
41 in FIG. 7 is equal to or less than the speed V at which a human can move. Therefore, the
integration filter 24 with a time constant 1 / V is applied to the time variation dΔt / dt so that
only the time variation dΔt / dt of human speech is extracted and followed.
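The idea of following only slow changes of the time difference can be sketched with a simpler
stand-in for the integration filter 24: a hard limit on the rate of change. The threshold test,
the function name and the numbers are assumptions and do not reproduce the filter of the
embodiment.

```python
def follow_time_difference(previous_dt, candidate_dt, elapsed, max_rate):
    """Accept candidate_dt only if the implied rate of change is humanly plausible.

    previous_dt, candidate_dt: time differences at the previous and current step
    elapsed:                   time between the two steps (must be positive)
    max_rate:                  largest accepted |d(dt)/dt|, standing in for the speed V
    """
    if elapsed <= 0:
        raise ValueError("elapsed must be positive")
    rate = abs(candidate_dt - previous_dt) / elapsed
    return candidate_dt if rate <= max_rate else previous_dt


# Hypothetical values: time differences in ms, elapsed time in s, max_rate in ms/s.
print(follow_time_difference(1.00, 1.02, elapsed=0.1, max_rate=0.5))  # prints 1.02
print(follow_time_difference(1.00, -2.0, elapsed=0.1, max_rate=0.5))  # prints 1.0
```

A gradual shift, such as the driver leaning slightly, is followed, while a jump to a distant
time difference is treated as a different source and ignored.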
[0027]
FIG. 9 shows the state of extraction based on this time variation. In the figure, the vertical
axis represents the degree of correlation × weighting coefficient, the horizontal axis
represents the time difference Δt, and the axis in the depth direction represents time t. It is
assumed that the voice of the driver 41 has been extracted at each of several successive times
t. When a sudden noise 44 then occurs at a later time t, the voice 41 which should be extracted
next must be distinguished from the noise 44 which should not be extracted. To do so, the time
variations dΔt/dt of the voice 41 and of the noise 44 are each determined with reference to the
previously extracted voice 41. In this case, since the time variation of the noise 44 exceeds
the speed V at which a human can move, it is attenuated by the integration filter 24, which
attenuates time variations of 1/V or more, as shown in the drawings. As a result, the degree of
correlation of the suddenly appearing noise 44 is attenuated; it is regarded as noise other than
human voice, ignored, and not extracted.
(4) Next, a method to distinguish human voice from noise such as music and running noise will be
described. Human speech usually pauses for breath after about 10 seconds, as shown in the
drawings. On the other hand, noise such as music from the speakers and running noise may
continue for longer. Therefore, for example, a signal that lasts for 30 seconds or more is
regarded as noise such as a music signal or a running sound, and the signal with the next
highest weight is extracted instead.
[0028]
FIG. 11 shows the state of this extraction. As illustrated, it is assumed that a high degree of
correlation is obtained for the two sound sources 41 and 43, and that at first one of them is
followed, as indicated by the thick solid line. The sound source 41, which is not being
followed, ceases after about 10 seconds, whereas the sound source 43, which is currently being
followed, continues without interruption. From this it can be deduced that the sound source
currently being followed is music or running noise, while the sound source 41 that is not being
followed is evidently a human voice. Therefore, the sound source to be followed is switched to
the sound source 41 after the point where that sound source was interrupted.
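The duration rule of method (4) can be sketched as a selection among candidate sources. The
30-second limit comes from the text; the function name, the tuple layout and the example values
are hypothetical.

```python
def choose_source(candidates, max_voice_seconds=30.0):
    """Pick the candidate with the highest weighted correlation that could be a voice.

    candidates: list of (name, weighted_correlation, continuous_seconds) tuples.
    A source that has lasted max_voice_seconds or longer without interruption is
    treated as music or running noise and skipped. Returns None if none remain.
    """
    plausible = [c for c in candidates if c[2] < max_voice_seconds]
    if not plausible:
        return None
    return max(plausible, key=lambda c: c[1])[0]


# Hypothetical situation: source 43 correlates more strongly but has played for 45 s
# without a pause, so source 41 is followed instead.
candidates = [("source_43", 0.9, 45.0), ("source_41", 0.7, 4.0)]
print(choose_source(candidates))  # prints source_41
```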
[0029]
As described above, according to the present invention, the S/N ratio of the input voice signal
can be improved, and applying the noise reduction device of the present invention to a car
telephone or a speech recognition device can improve the speech recognition rate.