JPH04322599

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JPH04322599
[0001]
BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a
speaker orientation detection apparatus for estimating a speaker's utterance position in a sound
collection system such as a conference system.
[0002]
2. Description of the Related Art In recent years, there have been many situations in which
automatic speaker recognition in a sound collection system, or the signal processing unit of a
sound collection system using a plurality of microphones, requires information on the speaker's
position. There is therefore a need for an accurate speaker orientation detection apparatus that
is not affected by ambient noise and the like.
[0003]
An example of the above-mentioned conventional speaker orientation detection apparatus will be
described below with reference to the drawings.
[0004]
FIG. 5 shows the configuration of a conventional speaker orientation detection apparatus.
10-05-2019
1
In FIG. 5, 1 is a first microphone unit, and 6 is a second microphone unit.
A correlator 11 correlates output signals from the first microphone unit 1 and the second
microphone unit 6. 12 is an output terminal of the correlator.
[0005]
The operation of the speaker orientation detection apparatus configured as described above will
be described below. First, sound waves uttered from a certain direction are observed as an output
time-series signal x(n) from the first microphone unit 1 and an output time-series signal y(n)
from the second microphone unit 6. Next, with x(n) and y(n) as inputs, the correlator 11
calculates the following equation.
[0006]
[Equation 1]
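The equation itself did not survive in this text. Since the surrounding paragraphs describe it as the cross-correlation function Rxy(τ) of x(n) and y(n), one form consistent with that description is sketched below. The frame length N, the summation limits, and the absence of any normalization are assumptions, not taken from the original.

```latex
% Assumed form of (Equation 1): cross-correlation of the two microphone signals
% over a frame of N samples (N and the summation range are assumptions).
\[
  R_{xy}(\tau) \;=\; \sum_{n=0}^{N-1} x(n)\, y(n+\tau)
\]
```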
[0007]
The speaker position is then estimated from the time τmax at which Rxy(τ) reaches its maximum
value. Taking the direction of the first microphone unit 1 along the line connecting the first
microphone unit 1 and the second microphone unit 6 as the front (θ = 0°), the speaker
orientation θt can be estimated by the following equation.
[0008]
[Equation 2]
[0009]
Here, c is the speed of sound and d is the spacing between the microphone units.
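The equation itself is missing from this text. Under the stated geometry (θ = 0° along the line through the two microphone units, toward the first microphone unit 1), a sound wave arriving from angle θt reaches the second unit later by d·cos θt / c, which suggests the far-field relation below. This is an assumed reconstruction, not the patent's verbatim expression; it also assumes τmax is expressed in seconds (a lag in samples would first be divided by the sampling frequency).

```latex
% Assumed form of (Equation 2): far-field relation between the peak delay
% tau_max, the speed of sound c, and the microphone spacing d.
\[
  \tau_{\max} \;=\; \frac{d \cos\theta_t}{c}
  \quad\Longrightarrow\quad
  \theta_t \;=\; \cos^{-1}\!\left(\frac{c\,\tau_{\max}}{d}\right)
\]
```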
FIG. 6 shows an example of the input waveforms and the cross-correlation function: (a) is the
output signal waveform of the first microphone unit 1, (b) is the output signal waveform of the
second microphone unit 6, and (c) is the cross-correlation function Rxy(τ) obtained from these
two signal waveforms by (Equation 1). The speaker direction θt is determined by (Equation 2)
from the time τmax at which Rxy(τ) is maximum.
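The conventional procedure described above (correlate the two microphone signals, find the peak lag, convert it to an angle) can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names, the sampling frequency `fs`, the value 343 m/s for the speed of sound, the restriction of the search range to |τ| ≤ d/c, and the arccos conversion are all assumptions.

```python
import math
import random

def cross_correlation(x, y, lag):
    """R_xy(lag) = sum over n of x[n] * y[n + lag], over the overlapping samples."""
    n_start = max(0, -lag)
    n_stop = min(len(x), len(y) - lag)
    return sum(x[n] * y[n + lag] for n in range(n_start, n_stop))

def estimate_direction(x, y, fs, d, c=343.0):
    """Return (theta in degrees, tau_max in seconds) from the peak of R_xy.

    Assumption: theta = 0 points along the microphone axis toward
    microphone 1, so a source at angle theta reaches microphone 2
    later by d * cos(theta) / c seconds.
    """
    max_lag = int(d / c * fs)  # lags beyond d / c are not physically possible
    lags = range(-max_lag, max_lag + 1)
    best_lag = max(lags, key=lambda k: cross_correlation(x, y, k))
    tau_max = best_lag / fs
    cos_theta = max(-1.0, min(1.0, c * tau_max / d))
    return math.degrees(math.acos(cos_theta)), tau_max

# Hypothetical usage: white noise that reaches microphone 2 ten samples late.
rng = random.Random(0)
x = [rng.gauss(0.0, 1.0) for _ in range(2000)]
y = [0.0] * 10 + x[:-10]
theta_deg, tau_max = estimate_direction(x, y, fs=16000, d=0.5)
```

Limiting the search to physically possible lags keeps the number of candidate peaks small. A larger spacing d widens that range.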
[0010]
FIG. 7 shows an example of the input signal waveforms and the cross-correlation function Rxy(τ)
in the case where a large amount of noise is mixed only into the second microphone unit 6, owing
to differences in the directivity, sound pressure frequency characteristics, and sound
collection environment of the two microphone units.
In the figure, (a) shows the output signal waveform of the first microphone unit 1, (b) shows the
output signal waveform of the second microphone unit 6, and (c) shows the cross-correlation
function Rxy(τ) obtained from these two signal waveforms by (Equation 1). The speaker
orientation θt is obtained by (Equation 2) from the time τmax at which Rxy(τ) is maximum.
[0011]
However, in the configuration described above, when the distance d between the microphone units
is long, the search range for the maximum value of the cross-correlation function Rxy(τ) becomes
wide. Because the periodicity of speech is particularly emphasized in Rxy(τ), a plurality of
local maxima appear, and the difference between the maximum value and the second largest value
is small. Consequently, when the signal-to-noise ratio is low or when the difference between the
characteristics of the two microphone units is large, the maximum of Rxy(τ) appears at an
inappropriate position, which causes frequent false detection.
[0012]
SUMMARY OF THE INVENTION In view of the above problems, the present invention provides a
low-error speaker orientation detection apparatus that is not affected by noise or by
differences in the characteristics of the microphone units.
[0013]
SUMMARY OF THE INVENTION In order to solve the above-mentioned problems, the speaker
orientation detection apparatus according to the present invention provides, at a stage
subsequent to each of the first microphone unit and the second microphone unit: a signal
rectifying means; a signal integrating means of time constant τ1 at a stage subsequent to the
signal rectifying means; a signal integrating means of time constant τ2 provided in parallel
with the signal integrating means of time constant τ1 at the stage subsequent to the signal
rectifying means; and a signal subtracting means that takes the difference between the output
signals of the signal integrating means of time constant τ1 and the signal integrating means of
time constant τ2.
[0014]
According to the present invention, with the above-mentioned configuration, τ1 is set long so
that the signal integrating means of time constant τ1 outputs the temporal change of the average
power of the audio signal, and τ2 is set short so that the output signal of the signal
integrating means of time constant τ2 follows the maximum amplitude of each pitch period of the
audio signal. A difference signal between the signal integrating means of time constant τ1 and
the signal integrating means of time constant τ2 is then obtained. In this difference signal,
the voice pitch information and power change information are emphasized, the influence of
steady-state noise on the power rise is suppressed, and noise components higher in frequency
than the voice band are also suppressed. Since the cross-correlation function Rxy(τ) thereby
comes to have a clear maximum value, accurate speaker orientation detection can be performed
without being affected by noise and the like.
[0015]
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS A speaker orientation detection
apparatus according to an embodiment of the present invention will be described below with
reference to the drawings.
FIG. 1 shows the configuration of the speaker orientation detection apparatus according to the
first embodiment of the present invention.
In FIG. 1, 1 is a first microphone unit, and 2 is a first signal rectifier provided downstream of
the first microphone unit 1.
A first signal integrator 3 is provided downstream of the first signal rectifier 2.
A second signal integrator 4 is provided in parallel with the first signal integrator 3 at a stage
subsequent to the first signal rectifier 2.
A first signal subtractor 5 takes the difference between the output signals from the first signal
integrator 3 and the second signal integrator 4. 6 is a second microphone unit, and 7 is a second
signal rectifier provided downstream of the second microphone unit 6. A third signal
integrator 8 is provided downstream of the second signal rectifier 7. A fourth signal integrator 9
is provided in parallel with the third signal integrator 8 at a stage subsequent to the second
signal rectifier 7. A second signal subtractor 10 takes the difference between the output signals
from the third signal integrator 8 and the fourth signal integrator 9. A correlator 11 estimates
the direction of the speaker from the cross-correlation function between the output signal from
the first signal subtractor 5 and the output signal from the second signal subtractor 10.
Reference numeral 12 denotes an output terminal of the correlator 11.
[0016]
The operation of the speaker orientation detection apparatus configured as described above will
be described below with reference to FIG. 1, FIG. 2, FIG. 3 and FIG. 4.
[0017]
FIG. 2 (a) shows an example of the waveform of the output signal of the first microphone unit 1,
and FIG. 2 (b) shows an example of the waveform of the output signal from the first signal
subtractor 5.
The integration time constant τ1 of the first signal integrator 3 and the third signal integrator 8
(FIG. 1) is set long so that they output the time envelope of the power of the audio signal, and
the integration time constant τ2 of the second signal integrator 4 and the fourth signal
integrator 9 is set short so that the output signal changes in accordance with the maximum
amplitude of each pitch period of the voice signal. As a result, the input signal shown in
FIG. 2 (a) is converted into a signal waveform that emphasizes the change of the time envelope
of the power and the voice pitch information, as shown in FIG. 2 (b).
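The rectifier, the two parallel integrators, and the subtractor described above can be sketched as follows. This is a minimal sketch under assumptions: each integrator is modelled as a first-order leaky integrator, the rectifier is full-wave, the difference is taken as short-time-constant output minus long-time-constant output, and the default values `tau1 = 0.05` s and `tau2 = 0.002` s are hypothetical. The source only states that τ1 is long and τ2 is short.

```python
import math

def leaky_integrator(u, tau, fs):
    """First-order leaky integrator with time constant tau (seconds).

    Assumed model: state[n] = a * state[n-1] + (1 - a) * u[n],
    with a = exp(-1 / (tau * fs)).
    """
    a = math.exp(-1.0 / (tau * fs))
    out, state = [], 0.0
    for v in u:
        state = a * state + (1.0 - a) * v
        out.append(state)
    return out

def envelope_difference(signal, fs, tau1=0.05, tau2=0.002):
    """Rectify, integrate with tau1 and tau2 in parallel, and subtract."""
    rectified = [abs(v) for v in signal]            # signal rectifier
    slow = leaky_integrator(rectified, tau1, fs)    # long time constant: power envelope
    fast = leaky_integrator(rectified, tau2, fs)    # short time constant: pitch-period peaks
    return [f - s for f, s in zip(fast, slow)]      # signal subtractor (sign is an assumption)
```

With this model, a constant-level input drives both integrators to the same value, so the difference decays toward zero. That matches the stated suppression of steady-state noise. A sudden rise in level produces a transient, because the short-time-constant integrator responds first.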
[0018]
In FIG. 3, (a) is the output signal of the first microphone unit 1, (b) is the output signal of the
second microphone unit 6, and (c) is the cross-correlation function Rxy(τ) in the correlator 11,
obtained according to (Equation 1) with the output signal from the first signal subtractor 5 as
x(n) and the output signal from the second signal subtractor 10 as y(n). As described above,
according to the present embodiment, the maximum value of the cross-correlation function Rxy(τ)
is clearer than in the conventional example, and the direction of the speaker is estimated by
(Equation 2) from the time τmax at which that maximum occurs.
Furthermore, FIG. 4 is an input/output response example for the case where a large amount of
noise is mixed only into the second microphone unit 6: (a) is the output signal of the first
microphone unit 1, (b) is the output signal of the second microphone unit 6, and (c) is the
cross-correlation function Rxy(τ) in the correlator 11, obtained according to (Equation 1) with
the output signal from the first signal subtractor 5 as x(n) and the output signal from the
second signal subtractor 10 as y(n). As in FIG. 3, the maximum value of the cross-correlation
function Rxy(τ) is clearer than in the conventional example.
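The complete chain of the embodiment (identical preprocessing on both microphone signals, followed by correlation of the two subtractor outputs) can be sketched end to end as follows. All names, the leaky-integrator model, the time-constant values, the sampling frequency, and the arccos conversion are assumptions made for illustration. The sketch makes no claim about the accuracy achieved on real or noisy recordings.

```python
import math
import random

def integrate(u, tau, fs):
    """First-order leaky integrator (assumed model) with time constant tau in seconds."""
    a = math.exp(-1.0 / (tau * fs))
    out, state = [], 0.0
    for v in u:
        state = a * state + (1.0 - a) * v
        out.append(state)
    return out

def preprocess(signal, fs, tau1, tau2):
    """Rectifier, two parallel integrators, and subtractor for one channel."""
    rect = [abs(v) for v in signal]
    fast = integrate(rect, tau2, fs)
    slow = integrate(rect, tau1, fs)
    return [f - s for f, s in zip(fast, slow)]

def peak_lag(x, y, max_lag):
    """Lag in samples that maximizes R_xy(lag) = sum_n x[n] * y[n + lag]."""
    def r(lag):
        lo, hi = max(0, -lag), min(len(x), len(y) - lag)
        return sum(x[n] * y[n + lag] for n in range(lo, hi))
    return max(range(-max_lag, max_lag + 1), key=r)

def detect_direction(mic1, mic2, fs, d, c=343.0, tau1=0.05, tau2=0.002):
    """Return (theta in degrees, peak lag in samples) for two microphone signals."""
    x = preprocess(mic1, fs, tau1, tau2)   # output of the first signal subtractor
    y = preprocess(mic2, fs, tau1, tau2)   # output of the second signal subtractor
    lag = peak_lag(x, y, int(d / c * fs))  # correlator, searched over |lag| <= d / c
    cos_theta = max(-1.0, min(1.0, c * lag / (fs * d)))
    return math.degrees(math.acos(cos_theta)), lag

# Hypothetical usage: a noise burst between silences, arriving 5 samples
# later at microphone 2 than at microphone 1.
rng = random.Random(0)
burst = [0.0] * 200 + [rng.gauss(0.0, 1.0) for _ in range(800)] + [0.0] * 4000
mic1 = burst
mic2 = [0.0] * 5 + burst[:-5]
theta_deg, lag = detect_direction(mic1, mic2, fs=8000, d=0.5)
```

Both channels use the same τ1 and τ2, as the embodiment requires. The preprocessing therefore delays both signals identically and does not bias the estimated lag.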
[0019]
As described above, the present embodiment comprises: the first microphone unit 1; the first
signal rectifier 2 provided downstream of the first microphone unit 1; the first signal
integrator 3 provided downstream of the first signal rectifier 2; the second signal integrator 4
provided in parallel with the first signal integrator 3 at a stage subsequent to the first signal
rectifier 2; the first signal subtractor 5, which takes the difference between the output signal
of the first signal integrator 3 and the output signal of the second signal integrator 4; the
second microphone unit 6; the second signal rectifier 7 provided downstream of the second
microphone unit 6; the third signal integrator 8 provided downstream of the second signal
rectifier 7; the fourth signal integrator 9 provided in parallel with the third signal
integrator 8 at a stage subsequent to the second signal rectifier 7; the second signal
subtractor 10, which takes the difference between the output signals of the third signal
integrator 8 and the fourth signal integrator 9; and the correlator 11, which receives the output
signals from the first signal subtractor 5 and the second signal subtractor 10 and obtains their
correlation. By setting the time constants of the first signal integrator 3 and the third signal
integrator 8 equal to τ1, setting the time constants of the second signal integrator 4 and the
fourth signal integrator 9 equal to τ2, and making τ1 and τ2 different, it is possible to reduce
the error rate of speaker direction detection caused by noise or by differences in the
characteristics of the microphone units.
[0020]
As described above, the present invention comprises: a first microphone unit; a first signal
rectifying means provided at a stage subsequent to the first microphone unit; a first signal
integrating means provided at a stage subsequent to the first signal rectifying means; a second
signal integrating means provided in parallel with the first signal integrating means at a stage
subsequent to the first signal rectifying means; a first signal subtracting means for obtaining
the difference between the output signals of the first signal integrating means and the second
signal integrating means; a second microphone unit; a second signal rectifying means provided at
a stage subsequent to the second microphone unit; a third signal integrating means provided at a
stage subsequent to the second signal rectifying means; a fourth signal integrating means
provided in parallel with the third signal integrating means at a stage subsequent to the second
signal rectifying means; a second signal subtracting means for obtaining the difference between
the output signals of the third signal integrating means and the fourth signal integrating means;
and a correlation means for obtaining a correlation with the output signals from the first signal
subtracting means and the second signal subtracting means as inputs. By setting the time
constants of the first signal integrating means and the third signal integrating means equal to
τ1, setting the time constants of the second signal integrating means and the fourth signal
integrating means equal to τ2, and making τ1 and τ2 different, it is possible to reduce the
error rate of speaker direction detection caused by ambient noise and by differences in the
characteristics of the microphone units.
[0021]
BRIEF DESCRIPTION OF THE DRAWINGS
[0022]
FIG. 1 is a block diagram of a speaker orientation detection apparatus according to an
embodiment of the present invention.
[0023]
FIG. 2 (a) is a waveform diagram of the output time-series signal of the first microphone unit
according to the present invention.
[0024]
FIG. 3 (a) is a waveform diagram of the output time-series signal of the first microphone unit
according to the present invention.
[0025]
FIG. 4 (a) is a waveform diagram of the output time-series signal of the first microphone unit
according to the present invention when a large amount of noise is mixed only into the second
microphone unit.
[0026]
FIG. 5 is a block diagram of a conventional speaker orientation detection apparatus.
[0027]
FIG. 6 (a) is a waveform diagram of the output time-series signal of the first microphone unit
of FIG. 5.
[0028]
FIG. 7 (a) is a waveform diagram of the output time-series signal of the first microphone unit
in the conventional example when a large amount of noise is mixed only into the second
microphone unit.
[0029]
DESCRIPTION OF REFERENCE NUMERALS
[0030]
1 first microphone unit
2 first signal rectifier
3 first signal integrator
4 second signal integrator
5 first signal subtractor
6 second microphone unit
7 second signal rectifier
8 third signal integrator
9 fourth signal integrator
10 second signal subtractor
11 correlator
12 output terminal