Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2012216998
A signal processing apparatus and signal processing method capable of efficiently reducing a disturbance signal are provided. SOLUTION: The apparatus includes a plurality of speakers that reproduce sound of a plurality of channels, a plurality of microphones that collect sound of a plurality of channels, detection means that detects a user present in the spatial direction in which the plurality of microphones collect sound and outputs directivity characteristic information indicating the relative direction of the user with respect to the speakers, and signal processing means that switches, according to the relative direction indicated by the directivity characteristic information, the processing content for reducing a disturbance signal included in the collected sound signal obtained by the plurality of microphones. [Selected figure] Figure 1
Signal processing apparatus and signal processing method
[0001]
Embodiments of the present invention relate to a signal processing apparatus and a signal
processing method.
[0002]
Conventionally, disturbance signals such as noise components and echo components contained in an acoustic signal have been reduced by changing the characteristics of the acoustic signal with a noise canceller, an echo canceller, or the like implemented on a DSP (Digital Signal Processor).
15-04-2019
1
Techniques have also been proposed that reduce disturbance signals such as noise components and reverberation components contained in a collected sound signal obtained using a plurality of microphones (a microphone array), and output the result as an output signal. Further, speaker-tracking microphone arrays have been proposed that, when collecting sound with a plurality of microphones, direct the directivity toward the user who is speaking and adaptively change the directivity to reduce disturbance signals such as noise components.
[0003]
Japanese Patent Application Laid-Open No. 2010-28653
[0004]
When a speaker-tracking microphone array is used to collect sound while a plurality of speakers are used in combination to output an audio signal, the directivity of the sound collection changes as the user moves in the space where the audio signal is output, so the speaker that causes the echo switches in accordance with the position of the user. In the prior art, however, the relative direction (relative position) between the speakers and the user is not considered in reducing the disturbance signal, so there is a problem that a disturbance signal generated in relation to the user's position cannot be reduced efficiently.
[0005]
The present invention is made in view of the above, and an object of the present invention is to
provide a signal processing device and a signal processing method capable of efficiently reducing
a disturbance signal.
[0006]
The signal processing apparatus according to the embodiment includes a plurality of speakers, a
plurality of microphones, a detection unit, and a signal processing unit.
The plurality of speakers reproduce sound of a plurality of channels. The plurality of microphones pick up sound of a plurality of channels. The detection means detects a user present in the spatial direction in which the plurality of microphones collect sound, and outputs directivity characteristic information indicating the relative direction of the user with respect to the plurality of speakers. The signal processing means switches, according to the relative direction indicated by the directivity characteristic information, the processing content for reducing a disturbance signal included in the collected sound signal obtained from the plurality of microphones.
[0007]
FIG. 1 is a view schematically showing a configuration of a signal processing apparatus according to the present embodiment. FIG. 2 is a diagram for explaining the operation of the sight line detection unit. FIG. 3 is a view schematically showing an example of the configuration of an echo canceller unit. FIG. 4 is a view schematically showing an example of the configuration of a noise canceller unit. FIG. 5 is a diagram for explaining the operation of the noise canceller unit 29. FIG. 6 is a diagram showing an example of noise levels contained in an amplitude spectrum. FIG. 7 is a view schematically showing an example of the configuration of an echo reduction unit. FIG. 8 is a diagram for explaining the operation of the echo reduction unit. FIG. 9 is a diagram showing an example of echo levels contained in an amplitude spectrum. FIG. 10 is a view schematically showing a configuration of a signal processing unit according to Modification 1 of the present embodiment. FIG. 11 is a view schematically showing a configuration of a signal processing unit according to Modification 2 of the present embodiment. FIG. 12 is a view schematically showing a configuration of a signal processing unit according to Modification 3 of the present embodiment. FIG. 13 is a view schematically showing an example of the configuration of an echo reduction unit according to Modifications 2 and 3.
[0008]
FIG. 1 is a view schematically showing a configuration of a signal processing apparatus according
to the present embodiment. As shown in the figure, the signal processing apparatus 100 includes
an acoustic output unit 10 and a signal processing unit 20.
[0009]
Here, the acoustic output unit 10 includes volume units 11L and 11R, D/A conversion units 12L and 12R, and speakers 13L and 13R.
[0010]
The volume unit 11L adjusts the volume of an acoustic signal for the left channel (hereinafter,
referred to as Lch) input from the input terminal 14L in accordance with the operation amount of
a volume adjustment switch (not shown).
The volume unit 11R adjusts the volume of the sound signal for the right channel (hereinafter
referred to as Rch) input from the input terminal 14R according to the operation amount of a
volume adjustment switch (not shown).
[0011]
The D / A conversion unit 12L converts the digital sound signal whose volume has been adjusted
by the volume unit 11L into an analog signal, and outputs the analog signal to the speaker 13L.
The D / A conversion unit 12R converts the digital sound signal whose volume is adjusted by the
volume unit 11R into an analog signal, and outputs the analog signal to the speaker 13R.
[0012]
The speaker 13L and the speaker 13R are stereo speakers, and output sound (reproduction
sound) in the space where the signal processing device 100 is placed. The speaker 13L converts
the analog signal input from the D / A converter 12L into physical vibration and outputs it as a
sound. The speaker 13R converts the analog signal input from the D / A conversion unit 12R into
physical vibration and outputs it as sound (reproduction sound).
[0013]
On the other hand, the signal processing unit 20 includes microphones 21L and 21R, A/D conversion units 22L and 22R, delay units 23L and 23R, a monaural unit 24, a camera unit 25, a sight line detection unit 26, an echo canceller unit 27, an array processing unit 28, a noise canceller unit 29, a delay unit 30, and an echo reduction unit 31.
[0014]
The microphones 21L and 21R are stereo microphones, and pick up the sound transmitted in the space in which the signal processing device 100 is placed.
The microphone 21L outputs the collected sound to the A / D conversion unit 22L as an analog
sound collection signal (hereinafter, referred to as an Lch sound collection signal). Further, the
microphone 21R outputs the collected sound to the A / D conversion unit 22R as an analog
sound collection signal (hereinafter, referred to as an Rch sound collection signal).
[0015]
The A/D conversion unit 22L converts the Lch collected sound signal from the microphone 21L into a digital signal, and outputs the digital signal to the echo canceller unit 27. The A/D conversion unit 22R converts the Rch collected sound signal from the microphone 21R into a digital signal, and outputs the digital signal to the echo canceller unit 27.
[0016]
The delay unit 23L and the delay unit 23R are delay circuits or the like. The delay unit 23L
delays the digital sound signal whose volume has been adjusted by the volume unit 11L for a
predetermined time, and outputs the delayed signal to the monaural unit 24. The delay unit 23R
also delays the digital sound signal whose volume has been adjusted by the volume unit 11R for
a predetermined time, and outputs the delayed signal to the monaural unit 24.
[0017]
The monaural unit 24 calculates the linear sum of the acoustic signals input from the delay unit 23L and the delay unit 23R based on the following equation (1), and outputs the resulting signal to the echo canceller unit 27 and the delay unit 30. In equation (1), "L" denotes the acoustic signal input from the delay unit 23L, and "R" denotes the acoustic signal input from the delay unit 23R. The coefficient "α" is determined according to directivity characteristic information described later (where 0 ≤ α ≤ 1). α·L + (1-α)·R (1)
[0018]
Specifically, the monaural unit 24 adjusts the value of the coefficient α in equation (1) according to the directivity characteristic information input from the sight line detection unit 26, thereby changing the weights on the acoustic signals "L" and "R". When the directivity characteristic information indicates "area L" described later, the weight on the acoustic signal "L" is increased by increasing the value of the coefficient α. When it indicates "area R" described later, the weight on the acoustic signal "R" is increased by decreasing the value of the coefficient α. When it indicates "area C" described later, the coefficient α is set to 1/2 so that the weights on the acoustic signals "L" and "R" are equal.
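As an illustration, equation (1) and the area-dependent choice of α can be sketched as follows. The α values used for areas L and R are illustrative assumptions; the text only states that α is increased for area L, decreased for area R, and set to 1/2 for area C.

```python
# Sketch of the monaural unit's weighted downmix, equation (1):
# mono = alpha * L + (1 - alpha) * R, with alpha chosen by area.

def alpha_for_area(area: str) -> float:
    """Pick the mixing coefficient from the directivity information."""
    if area == "L":      # user near speaker 13L: weight the Lch signal
        return 0.8       # assumed value; the text only says "increase alpha"
    if area == "R":      # user near speaker 13R: weight the Rch signal
        return 0.2       # assumed value; the text only says "decrease alpha"
    return 0.5           # area C: equal weights (alpha = 1/2)

def downmix(l_samples, r_samples, area):
    """Apply equation (1) sample by sample."""
    a = alpha_for_area(area)
    return [a * l + (1.0 - a) * r for l, r in zip(l_samples, r_samples)]
```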
[0019]
The camera unit 25 is an imaging device directed toward the spatial direction in which the microphones 21L and 21R collect sound, that is, toward the output direction of the speakers 13L and 13R. The camera unit 25 outputs the captured image data to the sight line detection unit 26.
[0020]
By analyzing the imaging data input from the camera unit 25, the sight line detection unit 26 detects a speaker (user) present in the output direction of the speakers 13L and 13R, generates directivity characteristic information indicating the position of that speaker in the image as a relative direction (relative position) with respect to the speakers 13L and 13R, and outputs it to the monaural unit 24, the echo canceller unit 27, the array processing unit 28, the noise canceller unit 29, and the echo reduction unit 31. As a method of detecting the speaker, for example, when the face or line of sight of a person included in the image is detected and is directed to the front, that is, toward the camera unit 25, that person is detected as the speaker. A well-known technique may be used to detect a face or a line of sight from imaging data. The directivity characteristic information indicating the relative direction of the speaker (user) with respect to the plurality of speakers is determined from the arrangement information of the plurality of speakers and of the plurality of microphones. As a result, the sound pickup directivity of the microphone array set by the microphone arrangement, and the path by which the echo from each speaker reaches each microphone, are determined for each directivity.
[0021]
FIG. 2 is a diagram for explaining the operation of the sight line detection unit 26. The figure shows an example of the arrangement of the speakers 13L and 13R, the microphones 21L and 21R, and the camera unit 25 as viewed from above. As shown in the
figure, the speakers 13L and 13R are provided with a predetermined separation distance, and
when viewed from the listening point P, the speaker 13L is disposed on the left side and the
speaker 13R is disposed on the right side. Further, the microphones 21L and 21R are
respectively provided between the speaker 13L and the speaker 13R, and viewed from the
listening point P, the microphone 21L is disposed on the left side and the microphone 21R is
disposed on the right side. In addition, the camera unit 25 is provided between the microphone
21L and the microphone 21R, and images a space A in which the sound is output. The attachment positions of the speaker 13L, the speaker 13R, and the microphones 21L and 21R are symmetric about an axis along the imaging direction of the camera unit 25.
[0022]
The sight line detection unit 26 divides the space A into a plurality of predefined areas and outputs directivity characteristic information indicating the area in which the speaker exists. For example, when the sight line detection unit 26 detects, from the imaging data captured by the camera unit 25, that the speaker SP is present in area L near the speaker 13L in the space A, it outputs directivity characteristic information indicating area L. In the example of FIG. 2, the directivity characteristic information indicating the relative direction of the speaker (user) with respect to the plurality of speakers is given as the area in which the user exists, that is, in which direction the user lies from the speaker 13L and from the speaker 13R. In FIG. 2, the space extending ±22.5 degrees about the imaging direction from the imaging position of the camera unit 25 is defined as "area C", the part of the space A excluding area C that is near the speaker 13L is "area L", and the part near the speaker 13R is "area R"; however, the number and size of the areas are not limited to this example. Further, the positional relationship between the speakers 13L and 13R, the microphones 21L and 21R, and the camera unit 25 is not limited to the example shown in FIG. 2.
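The area definition of FIG. 2 can be sketched as a small classifier. The sign convention (negative angles toward the speaker 13L) is an assumption for illustration; only the ±22.5-degree extent of area C comes from the text.

```python
# Sketch of the FIG. 2 area definition: a horizontal angle measured from
# the camera's imaging axis is mapped to one of the three areas.

def classify_area(angle_deg: float) -> str:
    """Map the user's horizontal angle (0 = camera axis; negative values
    assumed to lie toward speaker 13L, positive toward speaker 13R) to an
    area name. Area C spans +/-22.5 degrees about the imaging direction."""
    if -22.5 <= angle_deg <= 22.5:
        return "C"
    return "L" if angle_deg < -22.5 else "R"
```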
[0023]
Returning to FIG. 1, the echo canceller unit 27 removes echo components included in the
collected sound signals from the A / D conversion units 22L and 22R based on the directivity
characteristic information input from the sight line detection unit 26. The configuration of the
echo canceller unit 27 will be described below with reference to FIG.
[0024]
FIG. 3 is a diagram schematically showing an example of the configuration of the echo canceller unit 27. The echo canceller unit 27 switches the switching unit 271 in accordance with the directivity characteristic information input from the sight line detection unit 26. Specifically, when the directivity characteristic information indicates "area L" or "area R", the echo canceller unit 27 operates the first processing unit 272 via the switching unit 271; when it indicates "area C", the echo canceller unit 27 operates the second processing unit 273 via the switching unit 271.
[0025]
Here, the first processing unit 272 includes subtraction units 2721L and 2721R, adaptive filter
learning units 2722L and 2722R, and pseudo echo generation units 2723L and 2723R.
[0026]
The subtraction unit 2721L subtracts the pseudo echo signal generated by the pseudo echo generation unit 2723L from the Lch collected sound signal input from the A/D conversion unit 22L, and outputs the resulting residual echo signal to the adaptive filter learning unit 2722L and the array processing unit 28. The adaptive filter learning unit 2722L uses the signal input from the monaural unit 24 via the switching unit 271 as a reference signal, and estimates and learns the transfer function between the speaker 13L and the microphone 21L based on the reference signal and the residual echo signal output from the subtraction unit 2721L. The pseudo echo generation unit 2723L generates a pseudo echo signal by multiplying the signal input from the monaural unit 24 via the switching unit 271 by the transfer function estimated and learned by the adaptive filter learning unit 2722L, and outputs it to the subtraction unit 2721L.
[0027]
The subtraction unit 2721R subtracts the pseudo echo signal generated by the pseudo echo generation unit 2723R from the Rch collected sound signal input from the A/D conversion unit 22R, and outputs the resulting residual echo signal to the adaptive filter learning unit 2722R and the array processing unit 28. The adaptive filter learning unit 2722R uses the signal input from the monaural unit 24 via the switching unit 271 as a reference signal, and estimates and learns the transfer function between the speaker 13R and the microphone 21R based on the reference signal and the residual echo signal output from the subtraction unit 2721R. The pseudo echo generation unit 2723R generates a pseudo echo signal by multiplying the signal input from the monaural unit 24 via the switching unit 271 by the transfer function estimated and learned by the adaptive filter learning unit 2722R (that is, by convolving the input signal with the filter coefficients), and outputs it to the subtraction unit 2721R.
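The subtract/learn/generate loop of the first processing unit can be sketched with a normalized LMS (NLMS) adaptive filter, a common choice for such echo cancellers. The text does not fix the learning algorithm; NLMS, the tap count, and the step size below are illustrative assumptions.

```python
# One channel of the first processing unit: the pseudo echo generator
# convolves the reference (monaural unit output) with the learned filter,
# the subtractor removes it from the microphone signal, and the learner
# updates the filter from the residual (NLMS rule, assumed).

def nlms_echo_cancel(mic, ref, taps=8, mu=0.5, eps=1e-8):
    w = [0.0] * taps              # estimated transfer function (FIR taps)
    buf = [0.0] * taps            # most recent reference samples
    out = []
    for x, d in zip(ref, mic):
        buf = [x] + buf[:-1]
        y = sum(wi * bi for wi, bi in zip(w, buf))    # pseudo echo signal
        e = d - y                                     # residual echo signal
        norm = sum(b * b for b in buf) + eps
        w = [wi + mu * e * bi / norm for wi, bi in zip(w, buf)]
        out.append(e)
    return out, w
```

Fed a microphone signal that is a scaled copy of the reference, the residual shrinks as the filter converges toward the true transfer function.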
[0028]
Further, the second processing unit 273 has a monaural unit 2731, a subtraction unit 2732, an
adaptive filter learning unit 2733, a pseudo echo generation unit 2734, and subtraction units
2735L and 2735R.
[0029]
The monaural unit 2731 calculates the average value of the Lch and Rch collected sound signals input from the A/D conversion unit 22L and the A/D conversion unit 22R, and outputs the calculation result to the subtraction unit 2732. The method of calculating the average value is not particularly limited; for example, the linear sum of the two signal values may be divided by two.
[0030]
The subtraction unit 2732 subtracts the pseudo echo signal generated by the pseudo echo generation unit 2734 from the signal input from the monaural unit 2731, and outputs the resulting residual echo signal to the adaptive filter learning unit 2733. The adaptive filter learning unit 2733 estimates and learns the transfer function between the speaker group (speakers 13L and 13R) and the microphone group (microphones 21L and 21R) based on the signal input from the monaural unit 24 via the switching unit 271 and the residual echo signal output from the subtraction unit 2732. The pseudo echo generation unit 2734 generates a pseudo echo signal using the signal input from the monaural unit 24 via the switching unit 271 and the transfer function estimated and learned by the adaptive filter learning unit 2733, and outputs it to the subtraction unit 2732 and the subtraction units 2735L and 2735R.
[0031]
The subtraction unit 2735L subtracts the pseudo echo signal generated by the pseudo echo
generation unit 2734 from the signal input from the A / D conversion unit 22L, and outputs the
resulting residual echo signal to the array processing unit 28. The subtractor 2735R subtracts
the pseudo echo signal generated by the pseudo echo generator 2734 from the signal input from
the A / D converter 22R, and outputs a residual echo signal as a result to the array processor 28.
[0032]
As described above, when the directivity characteristic information indicates "area C", the echo canceller unit 27 calculates the average of the Lch and Rch collected sound signals and removes the echo component based on the component common to both collected signals, so the load of removing the echo component can be reduced compared with the case where the directivity characteristic information indicates "area L" or "area R".
[0033]
Referring back to FIG. 1, the array processing unit 28 selectively extracts, from the signal input from the echo canceller unit 27, the signal arriving from the sound source direction (the speaker SP) indicated by the directivity characteristic information input from the sight line detection unit 26, and outputs it to the noise canceller unit 29.
Specifically, the array processing unit 28 performs delay processing and the like on the collected sound signals of the microphones 21L and 21R input via the echo canceller unit 27 to generate a plurality of collected sound beam signals whose directivity axes point in different directions. It then selects, from among the plurality of collected sound beam signals, the beam signal corresponding to the direction indicated by the directivity characteristic information input from the sight line detection unit 26, and outputs the selected beam signal, from which the echo has been removed, to the noise canceller unit 29.
[0034]
The array processing unit 28 may be configured to selectively extract the signal from whichever direction (area L, R, or C) the speaker SP is in by tracking the sound source direction, or to selectively extract the signal from a speaker present in a specific sound source direction (for example, area C). Well-known techniques may be used for extracting a signal from a collected sound beam signal and for removing the echo.
[0035]
The noise canceller unit 29 is a functional unit that suppresses noise components included in the
signal processed by the array processing unit 28. The configuration of the noise canceller unit 29
will be described below with reference to FIG.
[0036]
FIG. 4 is a diagram schematically showing an example of the configuration of the noise canceller unit 29. As shown in the figure, the noise canceller unit 29 includes a frequency domain conversion unit 291, a noise section estimation unit 292, a noise characteristic estimation unit 293, a suppression gain calculation unit 294, a noise suppression unit 295, and a time domain conversion unit 296.
[0037]
The frequency domain conversion unit 291 converts the signal input from the array processing unit 28 from the time domain to the frequency domain, outputs the amplitude spectrum to the noise suppression unit 295, and outputs the phase spectrum to the time domain conversion unit 296.
[0038]
The noise section estimation unit 292 estimates that the section with the smallest power in the signal input from the array processing unit 28 (for example, a short interval centered on the time at which the power is minimum) is a noise section, and outputs the signal (waveform) of that section to the noise characteristic estimation unit 293.
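The minimum-power criterion above can be sketched directly; the frame length is an illustrative assumption.

```python
# Sketch of the noise section estimator: split the input into short
# frames and return the frame with the smallest mean power as the
# estimated noise section.

def estimate_noise_section(signal, frame_len=4):
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    powers = [sum(s * s for s in f) / frame_len for f in frames]
    return frames[powers.index(min(powers))]   # waveform of the noise section
```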
[0039]
The noise characteristic estimation unit 293 sequentially estimates the characteristic value (noise characteristic) of the ambient environmental noise from the noise-section signal input from the noise section estimation unit 292 using the maximum likelihood method or the like, and outputs it to the suppression gain calculation unit 294.
[0040]
The noise characteristic estimation unit 293 also receives the directivity characteristic information output from the sight line detection unit 26. When the direction indicated by the directivity characteristic information changes, it shortens the time interval at which the characteristic value is sequentially estimated and updated, or increases the update amount. Thereafter, if the direction indicated by the directivity characteristic information remains fixed for a certain period of time, the update interval may be lengthened back to its original value, or the update amount reduced back to its original value. By speeding up the tracking of the noise characteristic in this way when the area switches, the noise characteristic of the area after the switch can be estimated quickly, and a drop in the noise suppression amount can be prevented. Alternatively, a plurality of noise characteristics may be stored, one for each area; the noise characteristic corresponding to the area indicated by the input directivity characteristic information is then read out, updated, and output to the suppression gain calculation unit 294.
[0041]
The suppression gain calculation unit 294 calculates the suppression gain for the noise suppression processing according to the noise characteristic input from the noise characteristic estimation unit 293.
[0042]
The noise suppression unit 295 performs suppression processing on the amplitude spectrum input from the frequency domain conversion unit 291 using the suppression gain calculated by the suppression gain calculation unit 294, thereby suppressing the noise contained in the amplitude spectrum, and outputs the amplitude spectrum after this suppression processing to the time domain conversion unit 296.
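Applying a suppression gain to the amplitude spectrum can be sketched with a simple spectral-subtraction-style gain rule. The gain formula and floor below are assumptions for illustration; the text does not fix how the gain is computed from the noise characteristic.

```python
# Sketch of the gain path: derive a per-bin gain from the estimated
# noise amplitude and multiply it onto the amplitude spectrum (the
# phase spectrum is left untouched).

def suppression_gain(amp, noise_amp, floor=0.1):
    """Per-bin gain: attenuate bins dominated by the noise estimate."""
    gains = []
    for a, n in zip(amp, noise_amp):
        g = (1.0 - n / a) if a > 0 else floor
        gains.append(min(1.0, max(floor, g)))
    return gains

def suppress(amp, noise_amp):
    return [a * g for a, g in zip(amp, suppression_gain(amp, noise_amp))]
```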
[0043]
In addition, the noise suppression unit 295 switches the suppression processing on and off according to the direction of the noise source identified from the directivity characteristic information input from the sight line detection unit 26 and from the noise level contained in the amplitude spectrum input from the array processing unit 28. Specifically, when the array processing unit 28 is set to perform sound source tracking, the noise suppression unit 295 turns the suppression processing on when the sound source direction indicated by the directivity characteristic information matches the direction of the noise source, and turns it off when they do not match. When the array processing unit 28 is set to extract a signal from a specific sound source direction, the suppression processing is turned on when the sound source direction indicated by the directivity characteristic information matches the specific sound source direction, and turned off when they do not match.
[0044]
FIG. 5 is a diagram for explaining the operation of the noise canceller unit 29 (noise suppression unit 295). As in FIG. 2, the figure shows an example of the arrangement of the speakers 13L and 13R, the microphones 21L and 21R, and the camera unit 25 as viewed from above.
[0045]
As shown in FIG. 5, assume that the speaker SP exists in area C and that the noise source N moves in the order area R → area C → area L as time passes. In this case, if the array processing unit 28 is set to perform sound source tracking, the noise suppression unit 295 turns the suppression processing on when the sound source direction indicated by the directivity characteristic information, that is, area C where the speaker SP is present, matches the direction of the noise source N identified from the noise level contained in the amplitude spectrum from the array processing unit 28, and turns the suppression processing off when they do not match.
[0046]
For example, in the case of FIG. 5, during the period from time T0 to T1 in which the noise source N exists in area R, the suppression processing is turned off because area C where the speaker SP exists and the direction of the noise source N (area R) do not match. During the period from time T1 to T2 in which the noise source N exists in area C, the suppression processing is turned on because area C where the speaker SP exists and the direction of the noise source N (area C) match. During the period from time T2 to T3 in which the noise source N exists in area L, the suppression processing is turned off because area C where the speaker SP exists and the direction of the noise source N (area L) do not match.
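The on/off decision traced through FIG. 5 reduces to a small predicate. The mode names and the fixed target area "C" are illustrative labels for the two configurations described above.

```python
# Sketch of the suppression on/off switch of the noise suppression unit.

def suppression_enabled(user_area, noise_area, mode, target_area="C"):
    """mode "tracking": suppress only when the user direction and the
    noise direction coincide; mode "fixed": suppress only when the user
    direction matches the fixed extraction direction (assumed "C")."""
    if mode == "tracking":
        return user_area == noise_area
    return user_area == target_area
```

Replaying the FIG. 5 scenario (user in area C, noise moving R → C → L) yields off, on, off in tracking mode.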
[0047]
When the array processing unit 28 is set to extract a signal from a specific sound source direction, the noise suppression unit 295 turns the suppression processing on when the sound source direction indicated by the directivity characteristic information matches the specific sound source direction, and turns it off when they do not match. In this case, the noise level contained in the amplitude spectrum from the array processing unit 28 is in the state shown in FIG. 6.
[0048]
FIG. 6 is a diagram showing an example of the noise level contained in the amplitude spectrum when the array processing unit 28 extracts a signal from a specific sound source direction (area C). In this case, as shown in the figure, the noise level when the sound source direction is area C is more prominent than the noise levels for the other areas. The noise suppression unit 295 therefore turns the suppression processing on when the directivity characteristic information indicates area C, and off when it indicates another area.
[0049]
In the present embodiment, the noise suppression unit 295 controls whether the suppression processing is on or off; however, the present invention is not limited to this, and the suppression gain calculation unit 294 may instead set the suppression gain to 0, using the same switching condition as the noise suppression unit 295, when the suppression processing is to be turned off.
[0050]
Referring back to FIG. 4, the time domain conversion unit 296 converts the signal from the frequency domain to the time domain based on the amplitude spectrum input from the noise suppression unit 295 and the phase spectrum input from the frequency domain conversion unit 291, and outputs the resulting signal to the echo reduction unit 31.
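The frequency/time conversion pair used throughout (amplitude and phase out, modified amplitude plus original phase back in) can be sketched with a naive DFT; a real implementation would use an FFT with windowed frames, which this sketch omits.

```python
# Sketch of the frequency domain conversion unit (forward) and the time
# domain conversion unit (inverse): the forward step splits a frame into
# amplitude and phase spectra; the inverse recombines a (possibly
# modified) amplitude with the original phase.

import cmath

def to_freq(frame):
    n = len(frame)
    spec = [sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)) for k in range(n)]
    return [abs(c) for c in spec], [cmath.phase(c) for c in spec]

def to_time(amp, phase):
    n = len(amp)
    spec = [a * cmath.exp(1j * p) for a, p in zip(amp, phase)]
    return [sum(spec[k] * cmath.exp(2j * cmath.pi * k * t / n)
                for k in range(n)).real / n for t in range(n)]
```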
[0051]
Returning to FIG. 1, the delay unit 30 is a delay circuit or the like similar to the delay units 23L and 23R; it delays the signal input from the monaural unit 24 by a predetermined time and outputs it to the echo reduction unit 31. The delay processing in the delay unit 30 synchronizes the signal that reaches the echo reduction unit 31 from the monaural unit 24 via the echo canceller unit 27, the array processing unit 28, and the noise canceller unit 29 with the signal that reaches the echo reduction unit 31 via the delay unit 30.
[0052]
The echo reduction unit 31 is a functional unit that removes an echo component included in the
signal processed by the noise canceller unit 29.
The configuration of the echo reduction unit 31 will be described below with reference to FIG.
[0053]
FIG. 7 is a view schematically showing an example of the configuration of the echo reduction unit 31. As shown in the figure, the echo reduction unit 31 includes a first frequency domain conversion unit 311, a second frequency domain conversion unit 312, an echo section estimation unit 313, an acoustic characteristic estimation unit 314, a suppression gain calculation unit 315, an echo suppression unit 316, and a time domain conversion unit 317.
[0054]
The first frequency domain conversion unit 311 converts the signal input from the delay unit 30 from the time domain to the frequency domain, and outputs the amplitude spectrum to the echo section estimation unit 313, the acoustic characteristic estimation unit 314, and the suppression gain calculation unit 315. The second frequency domain conversion unit 312 converts the signal input from the noise canceller unit 29 from the time domain to the frequency domain, outputs the amplitude spectrum to the echo section estimation unit 313, the acoustic characteristic estimation unit 314, and the echo suppression unit 316, and outputs the phase spectrum to the time domain conversion unit 317.
[0055]
The echo section estimation unit 313 receives, as inputs, the signal from the noise canceller unit 29, the signal from the delay unit 30, the amplitude spectrum from the first frequency domain conversion unit 311, and the amplitude spectrum from the second frequency domain conversion unit 312. Based on the difference value between the signal from the noise canceller unit 29 and the signal from the delay unit 30, the difference value between the amplitude spectra, and the like, the echo section estimation unit 313 determines an echo section in which an echo is estimated to have occurred, and notifies the acoustic characteristic estimation unit 314 of the echo section.
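The echo-section decision described above can be sketched as a simple frame-wise comparison of the two amplitude spectra. The normalized-difference measure and the threshold below are assumptions for illustration; the patent only states that difference values are used.

```python
import numpy as np

def estimate_echo_sections(ref_amp, mic_amp, threshold=0.5):
    """Flag frames whose microphone spectrum closely tracks the
    reference (loudspeaker-side) spectrum, i.e. likely echo sections."""
    flags = []
    for ref, mic in zip(ref_amp, mic_amp):
        # Small normalized difference => mic frame dominated by echo.
        diff = np.sum(np.abs(mic - ref)) / (np.sum(ref) + 1e-12)
        flags.append(bool(diff < threshold))
    return flags
```

Only the flagged frames would then feed the acoustic characteristic estimation, so the echo path is learned from frames where the echo is actually present.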
[0056]
The acoustic characteristic estimation unit 314 receives the amplitude spectrum from the first frequency domain conversion unit 311, the amplitude spectrum from the second frequency domain conversion unit 312, and the echo section notified from the echo section estimation unit 313. From the difference between the two amplitude spectra in the notified echo section, the acoustic characteristic estimation unit 314 estimates the acoustic characteristic of the echo component and outputs the estimated acoustic characteristic to the suppression gain calculation unit 315.
[0057]
Further, the acoustic characteristic estimation unit 314 receives the directivity characteristic information output from the sight line detection unit 26. When the direction indicated by the directivity characteristic information changes, it shortens the time interval at which the acoustic characteristic is sequentially estimated and updated, or increases the update amount. Thereafter, if the direction indicated by the directivity characteristic information remains fixed for a certain period of time, it restores the original behavior by lengthening the time interval for estimating and updating the acoustic characteristic or by reducing the update amount. By increasing the tracking speed of the acoustic characteristic in this way when the speaker switches to a different area, the acoustic characteristic of the new area can be followed quickly, preventing a drop in the echo suppression amount. Alternatively, a plurality of acoustic characteristics may be stored, one per area; the acoustic characteristic corresponding to the area indicated by the input directivity characteristic information is then read out, updated, and output to the suppression gain calculation unit 315.
[0058]
The suppression gain calculation unit 315 calculates the suppression gain for the echo suppression processing according to the acoustic characteristic input from the acoustic characteristic estimation unit 314, and outputs the suppression gain to the echo suppression unit 316.
[0059]
The echo suppression unit 316 performs the suppression process on the amplitude spectrum input from the second frequency domain conversion unit 312 using the suppression gain calculated by the suppression gain calculation unit 315, thereby suppressing the echo component included in the amplitude spectrum, and outputs the amplitude spectrum after the suppression process to the time domain conversion unit 317.
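One common way to realize such gain-based suppression is a Wiener-style spectral gain; the sketch below is an assumption about the form of the rule, since the patent does not give the formula. The spectral floor avoids over-suppression artifacts.

```python
import numpy as np

def suppress_echo(mic_amp, echo_amp, floor=0.05):
    """Attenuate the microphone amplitude spectrum in bins where the
    estimated echo characteristic dominates (illustrative gain rule)."""
    mic_amp = np.asarray(mic_amp, dtype=float)
    echo_amp = np.asarray(echo_amp, dtype=float)
    # Gain approaches 0 where the echo estimate explains the whole bin,
    # but is clipped to a floor to limit musical-noise artifacts.
    gain = np.clip(1.0 - echo_amp / (mic_amp + 1e-12), floor, 1.0)
    return gain * mic_amp
```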
[0060]
Further, the echo suppression unit 316 switches the suppression process on and off according to the directivity characteristic information input from the sight line detection unit 26 and the signal extraction setting in the noise canceller unit 29.
Specifically, when the array processing unit 28 is set to extract a signal from a specific sound source direction (for example, the area C), the echo suppression unit 316 turns the suppression process off when the sound source direction indicated by the directivity characteristic information matches that specific sound source direction, and turns it on when the two do not match.
When the array processing unit 28 is set to perform sound source tracking, the suppression process is applied to all sound source directions.
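The switching rule above reduces to a small predicate; the sketch below encodes it directly (the area names are illustrative).

```python
def suppression_enabled(detected_area, extraction_area, tracking=False):
    """On/off rule of [0060]: with sound source tracking, always
    suppress; with a fixed extraction area, suppress only while the
    detected speaker direction does not match that area."""
    if tracking:
        return True
    return detected_area != extraction_area
```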
[0061]
Here, FIG. 8 is a diagram for explaining the operation of the echo reduction unit 31 (echo suppression unit 316). As in FIG. 3, the figure shows an example of the arrangement of the speakers 13L and 13R, the microphones 21L and 21R, and the camera unit 25 as viewed from above.
[0062]
As shown in FIG. 8, it is assumed that the speaker SP moves in the order of area R → area C → area L as time passes. At this time, assuming that the array processing unit 28 is set to extract a signal from the area C as a specific sound source direction, the echo level included in the amplitude spectrum from the second frequency domain conversion unit 312 is in the state shown in FIG. 9.
[0063]
Here, FIG. 9 is a view showing an example of the echo levels included in the amplitude spectrum when the array processing unit 28 extracts a signal from a specific sound source direction (area C). As shown in the figure, the echo level when the sound source direction is area C is reduced by the processing in the array processing unit 28 compared with the echo levels in the other areas. Therefore, the echo suppression unit 316 turns the suppression process off when the directivity characteristic information indicates area C, and turns it on when it indicates the other areas.
[0064]
In the present embodiment, the on/off control of the suppression process is performed by the echo suppression unit 316; however, the present invention is not limited to this. Based on a switching condition similar to that of the echo suppression unit 316, the suppression gain calculation unit 315 may instead set the suppression gain to 0 when the suppression process is to be turned off.
[0065]
Then, the signal processing unit 20 outputs the signal subjected to the suppression processing by the echo reduction unit 31 to an external device (not shown).
As described above, the signal processing unit 20 specifies, as directivity characteristic information, the direction in which the speaker is present with respect to the signal processing apparatus 100, and removes or suppresses disturbance signals such as echo and noise according to the direction indicated by the directivity characteristic information. Therefore, the speech uttered by the speaker can be clarified more efficiently.
[0066]
While embodiments of the present invention have been described above, the above embodiment is presented as an example and is not intended to limit the scope of the invention. The above embodiment can be implemented in various other forms, and various omissions, replacements, changes, and additions can be made without departing from the gist of the invention. The above embodiment and its modifications are included in the scope and gist of the invention, and are likewise included in the invention described in the claims and their equivalents.
[0067]
For example, in the above embodiment, the direction in which the speaker is present is specified by the functions of the camera unit 25 and the sight line detection unit 26; however, the present invention is not limited to this, and the direction in which the speaker is present may instead be identified from the collected sound signals picked up by the microphones 21L and 21R. Hereinafter, this configuration will be described as a first modification of the present embodiment.
[0068]
FIG. 10 is a view schematically showing the configuration of a signal processing unit 20A according to the first modification of the embodiment. Components similar to those of the above embodiment are given the same reference numerals, and their description is omitted.
[0069]
As shown in the figure, the signal processing unit 20A includes the microphones 21L and 21R, the A/D conversion units 22L and 22R, the delay units 23L and 23R, the monauralization unit 24, the echo canceller unit 27, the array processing unit 28, the noise canceller unit 29, the delay unit 30, the echo reduction unit 31, and an arrival direction estimation unit 32.
[0070]
The arrival direction estimation unit 32 receives the Lch collected sound signal and the Rch collected sound signal output from the A/D conversion units 22L and 22R.
The arrival direction estimation unit 32 performs delay processing or the like on each of the collected sound signals picked up by the microphones 21L and 21R, and generates a plurality of collected sound beam signals whose directivity axes point in mutually different directions. It then selects, from among the plurality of collected sound beam signals, the one with the highest signal level, specifies the direction corresponding to that beam signal as the direction in which the speaker is present, and outputs this as directivity characteristic information to the monaural unit 24, the echo canceller unit 27, the array processing unit 28, the noise canceller unit 29, and the echo reduction unit 31.
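The beam-selection idea above can be sketched with a two-microphone delay-and-sum beamformer. The sampling rate, microphone spacing, and candidate angle set below are illustrative assumptions, not values from the patent.

```python
import numpy as np

def estimate_arrival_direction(lch, rch, fs=16000, mic_dist=0.1,
                               angles_deg=(-60, 0, 60), c=343.0):
    """Delay-and-sum sketch of the arrival direction estimation unit 32:
    steer a two-microphone beam toward each candidate angle and pick
    the angle whose beam output has the highest level."""
    lch = np.asarray(lch, float)
    rch = np.asarray(rch, float)
    levels = []
    for ang in angles_deg:
        # Inter-microphone delay (in samples) for a far-field source.
        delay = int(round(fs * mic_dist * np.sin(np.radians(ang)) / c))
        if delay >= 0:
            beam = lch[delay:] + rch[:len(rch) - delay or None]
        else:
            beam = lch[:len(lch) + delay] + rch[-delay:]
        levels.append(np.mean(beam ** 2))   # beam output power
    return angles_deg[int(np.argmax(levels))]
```

When the two channels are already aligned (a source straight ahead), the 0° beam adds the signals coherently and wins; the mis-steered beams add them with a relative shift and collect less power.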
[0071]
As described above, by providing the arrival direction estimation unit 32 in place of the camera unit 25 and the sight line detection unit 26 of the above embodiment, the direction in which the speaker is present can be specified from the sound collected by the microphones 21L and 21R. The same effects as those of the above embodiment can thus be obtained while simplifying the apparatus configuration.
[0072]
Further, in the above embodiment, in order to remove and suppress the disturbance signals included in the sound collected by the microphones 21L and 21R, signal processing is performed sequentially by the echo canceller unit 27, the array processing unit 28, the noise canceller unit 29, and the echo reduction unit 31. However, the present invention is not limited to this; the configuration of the signal processing unit 20 may be modified by changing the order in which the signal processing is performed, or by omitting particular signal processing through functional integration.
Hereinafter, two such variations are described as modifications 2 and 3 of the configuration of the signal processing unit 20.
15-04-2019
21
[0073]
FIG. 11 is a view schematically showing the configuration of a signal processing unit 20B according to the second modification of the embodiment. Components similar to those of the above embodiment are given the same reference numerals, and their description is omitted.
[0074]
The signal processing unit 20B includes the microphones 21L and 21R, the A/D converters 22L and 22R, the delay units 23L and 23R, the monaural unit 24, the camera unit 25, the sight line detection unit 26, the echo canceller unit 27, an echo reduction unit 31B, the array processing unit 28, and the noise canceller unit 29. This configuration differs from that of the signal processing unit 20 shown in FIG. 1 in that the delay unit 30 is removed, and in that the echo reduction unit 31B, the array processing unit 28, and the noise canceller unit 29 follow the echo canceller unit 27 in that processing order.
[0075]
FIG. 12 is a view schematically showing the configuration of a signal processing unit 20C according to the third modification of the embodiment. Components similar to those of the above embodiment are given the same reference numerals, and their description is omitted.
[0076]
The signal processing unit 20C includes the microphones 21L and 21R, the A/D converters 22L and 22R, the delay units 23L and 23R, the monaural unit 24, the camera unit 25, the sight line detection unit 26, an echo reduction unit 31C, the array processing unit 28, and the noise canceller unit 29. This configuration differs from that of the signal processing unit 20 shown in FIG. 1 in that the delay unit 30 and the echo canceller unit 27 are removed, and in the processing order of the echo reduction unit 31C, the array processing unit 28, and the noise canceller unit 29.
[0077]
When the above-described configurations of the signal processing units 20B and 20C are employed, the echo reduction units 31B and 31C receive two systems of signals, Lch and Rch. Therefore, instead of the configuration described with reference to FIG. 7, the configuration shown in FIG. 13 is adopted.
[0078]
Here, FIG. 13 is a diagram schematically illustrating an example of the configuration of the echo reduction units 31B and 31C according to the second and third modifications. As shown in the figure, the echo reduction units 31B and 31C each include a first frequency domain conversion unit 411, a first monaural conversion unit 412, a second frequency domain conversion unit 413, a third frequency domain conversion unit 414, a second monaural unit 415, an echo section estimation unit 416, an acoustic characteristic estimation unit 417, a suppression gain calculation unit 418, a first echo suppression unit 419, a first time domain conversion unit 420, a second echo suppression unit 421, and a second time domain conversion unit 422.
[0079]
The first frequency domain conversion unit 411 converts the signal input from the monaural processing unit 24 from the time domain to the frequency domain and outputs the amplitude spectrum to the echo section estimation unit 416, the acoustic characteristic estimation unit 417, and the suppression gain calculation unit 418.
[0080]
The first monaural unit 412 calculates the average value of the Lch and Rch collected signals input from the A/D conversion unit 22L and the A/D conversion unit 22R, respectively, and outputs the calculation result to the echo section estimation unit 416.
[0081]
The second frequency domain conversion unit 413 converts the Lch collected signal input from the A/D conversion unit 22L from the time domain to the frequency domain, outputs the amplitude spectrum to the second monaural unit 415 and the first echo suppression unit 419, and outputs the phase spectrum to the first time domain conversion unit 420.
The third frequency domain conversion unit 414 converts the Rch collected signal input from the A/D conversion unit 22R from the time domain to the frequency domain, outputs the amplitude spectrum to the second monaural unit 415 and the second echo suppression unit 421, and outputs the phase spectrum to the second time domain conversion unit 422.
[0082]
The second monaural unit 415 calculates the average value of the amplitude spectra input from the second frequency domain conversion unit 413 and the third frequency domain conversion unit 414, and outputs the calculation result to the echo section estimation unit 416 and the acoustic characteristic estimation unit 417.
[0083]
The echo section estimation unit 416 receives, as inputs, the signal from the monaural unit 24, the amplitude spectrum from the first frequency domain conversion unit 411, the signal from the first monaural unit 412, and the amplitude spectrum from the second monaural unit 415.
The echo section estimation unit 416 has the same function as the echo section estimation unit 313: based on the difference value between the signal from the first monaural conversion unit 412 and the signal from the monaural conversion unit 24, the difference value between the amplitude spectra, and the like, it determines an echo section in which an echo is estimated to have occurred, and notifies the acoustic characteristic estimation unit 417 of it.
[0084]
The acoustic characteristic estimation unit 417 receives the amplitude spectrum from the first frequency domain conversion unit 411, the amplitude spectrum from the second monaural conversion unit 415, and the echo section notified from the echo section estimation unit 416.
Using the same function as the acoustic characteristic estimation unit 314, the acoustic characteristic estimation unit 417 estimates the acoustic characteristic of the echo component from the difference between the two amplitude spectra in the notified echo section, and outputs the estimated acoustic characteristic to the suppression gain calculation unit 418.
[0085]
In addition, the acoustic characteristic estimation unit 417 receives the directivity characteristic information output from the sight line detection unit 26 and changes the time interval for estimating the acoustic characteristic according to the direction indicated by the directivity characteristic information. Specifically, when the directivity characteristic information indicates "area C", the acoustic characteristic estimation unit 417 makes the time interval shorter than when it indicates "area L" or "area R", so that the acoustic characteristic is estimated faster while the speaker is present in "area C" than in the other areas. In the present embodiment, the acoustic characteristics are estimated sequentially; however, the present invention is not limited to this. For example, acoustic characteristics corresponding to the respective areas may be held in advance, and the acoustic characteristic corresponding to the direction indicated by the input directivity characteristic information may be read out and output to the suppression gain calculation unit 418.
[0086]
The suppression gain calculation unit 418 calculates the suppression gain for the echo suppression processing according to the acoustic characteristic input from the acoustic characteristic estimation unit 417, and outputs it to the first echo suppression unit 419 and the second echo suppression unit 421.
[0087]
The first echo suppression unit 419 performs the suppression process on the amplitude spectrum input from the second frequency domain conversion unit 413 using the suppression gain calculated by the suppression gain calculation unit 418, thereby suppressing the echo component included in the amplitude spectrum, and outputs the amplitude spectrum after the suppression process to the first time domain conversion unit 420.
As in the case of the echo suppression unit 316 described above, processing according to the directivity characteristic information may be performed.
[0088]
The first time domain conversion unit 420 converts the signal from the frequency domain to the time domain based on the amplitude spectrum input from the first echo suppression unit 419 and the phase spectrum input from the second frequency domain conversion unit 413, and outputs the resulting signal to the array processing unit 28 as the Lch collected signal.
[0089]
The second echo suppression unit 421 performs the suppression process on the amplitude spectrum input from the third frequency domain conversion unit 414 using the suppression gain calculated by the suppression gain calculation unit 418, thereby suppressing the echo component included in the amplitude spectrum, and outputs the amplitude spectrum after the suppression process to the second time domain conversion unit 422.
As in the case of the echo suppression unit 316 described above, processing according to the directivity characteristic information may be performed.
[0090]
The second time domain conversion unit 422 converts the signal from the frequency domain to the time domain based on the amplitude spectrum input from the second echo suppression unit 421 and the phase spectrum input from the third frequency domain conversion unit 414, and outputs the resulting signal to the array processing unit 28 as the Rch collected signal.
[0091]
The signal processing units 20B and 20C can be realized by using the echo reduction units 31B and 31C configured as described above.
Further, because the echo reduction units 31B and 31C calculate the average of the Lch and Rch collected signals and suppress the echo component based on the component common to both collected signals, the processing load involved in echo suppression can be reduced.
[0092]
Although other configuration examples of the signal processing unit 20 have been described using modifications 2 and 3 above, as yet further configurations, the disturbance signals may be removed and suppressed in order by the three processing units of the echo canceller unit 27, the echo reduction unit 31B (31C), and the array processing unit 28, or by the two processing units of the echo reduction unit 31B (31C) and the array processing unit 28.
[0093]
Further, although two speakers (speakers 13L and 13R) are used in the above embodiment, the
present invention is not limited to this, and three or more speakers may be used.
Further, although two microphones (microphones 21L and 21R) are used in the above
embodiment, the present invention is not limited to this, and three or more microphones may be
used.
[0094]
Further, the application destination of the signal processing apparatus of the above embodiment is not particularly limited; it can be applied, for example, as a pre-processing apparatus for voice recognition in various devices such as mobile phones, notebook PCs, and tablet terminals.
[0095]
Reference Signs List 100 signal processing device, 10 acoustic output unit, 11L, 11R volume unit, 12L, 12R D/A conversion unit, 13L, 13R speaker, 14L, 14R input terminal, 20, 20A, 20B, 20C signal processing unit, 21L, 21R microphone, 22L, 22R A/D conversion unit, 23L, 23R delay unit, 24 monauralization unit, 25 camera unit, 26 line-of-sight detection unit, 27 echo canceller unit, 271 switching unit, 272 first processing unit, 2721L, 2721R subtraction unit, 2722L, 2722R adaptive filter learning unit, 2723L, 2723R pseudo echo generation unit, 273 second processing unit, 2731 monaural conversion unit, 2732 subtraction unit, 2733 adaptive filter learning unit, 2734 pseudo echo generation unit, 28 array processing unit, 29 noise canceller unit, 291 frequency domain conversion unit, 292 noise section estimation unit, 293 noise characteristic estimation unit, 294 suppression gain calculation unit, 295 noise suppression unit, 296 time domain conversion unit, 30 delay unit, 31, 31B, 31C echo reduction unit, 311 first frequency domain conversion unit, 312 second frequency domain conversion unit, 313 echo section estimation unit, 314 acoustic characteristic estimation unit, 315 suppression gain calculation unit, 316 echo suppression unit, 317 time domain conversion unit, 32 arrival direction estimation unit, 411 first frequency domain conversion unit, 412 first monaural conversion unit, 413 second frequency domain conversion unit, 414 third frequency domain conversion unit, 415 second monaural conversion unit, 416 echo section estimation unit, 417 acoustic characteristic estimation unit, 418 suppression gain calculation unit, 419 first echo suppression unit, 420 first time domain conversion unit, 421 second echo suppression unit, 422 second time domain conversion unit