DESCRIPTION JP2015523609

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
Abstract: A dual-microphone method for reducing audio reverberation that requires neither an
accurate estimate of the direction of arrival of the direct sound nor closely matched
microphones. A transfer function h(t) from the auxiliary microphone to the main microphone is
calculated from the main microphone input signal x2(t) and the auxiliary microphone input
signal x1(t), and the tail part hr(t) of h(t) is obtained. The strength of the reverberation is
determined from h(t) to calculate the adjustment factor β of the gain function. x1(t) is
convolved with hr(t) to obtain the late reverberation estimation signal of x2(t). The gain
function is calculated from the frequency spectrum of x2(t), β, and the late reverberation
spectrum; the frequency spectrum of x2(t) is multiplied by the gain function to obtain the
dereverberated frequency spectrum of x2(t); and a frequency-to-time conversion yields the
dereverberated time-domain signal of x2(t). Thus, the late reverberation is removed from the
main microphone input signal. [Selected figure] Figure 3
Voice reverberation reduction method and apparatus based on dual microphone
[0001]
The present invention relates to the field of speech enhancement, and more particularly, to a
method and apparatus for reducing speech reverberation based on dual microphones.
[0002]
As a sound signal propagates in a room, it is reflected by hard surfaces such as the walls or
the floor. The sound reaching the microphone therefore contains, in addition to the direct
sound transmitted straight from the sound source, acoustic signals that have been reflected one
or more times. These non-direct sounds constitute the reverberation signal.
11-04-2019
1
An acoustic signal that has undergone one or a small number of reflections is called an early
reflection signal. Early reflection signals constitute the early reverberation signal, which can
reinforce speech. An acoustic signal that has undergone many reflections is called a late
reflection signal. Late reflection signals constitute the late reverberation signal, and strong
late reverberation reduces speech intelligibility.
[0003]
In some hands-free voice communications, the talker is far from the microphone, and
reverberation in the room degrades speech intelligibility and therefore call quality.
Techniques for reducing reverberation and enhancing speech intelligibility are therefore
needed. The signal received by the microphone includes the direct sound and reverberation
signals, and, as described above, reverberation is further divided into early reverberation and
late reverberation. It is mainly the late reverberation that reduces speech intelligibility,
whereas early reverberation usually reinforces speech. Thus, the key to enhancing clarity is to
reduce the late reverberation signal.
[0004]
Among the various dereverberation techniques, dual-microphone methods based on spectral
subtraction have gained wide attention. In the conventional dual-microphone spectral
subtraction dereverberation method, an adaptive beamforming structure (generalized sidelobe
canceller, GSC) provides two signal paths: the first path signal is the output of the
delay-and-sum beamformer, and the second path signal is the output of the blocking matrix. An
adaptive filter estimates the reverberation of the first path signal from the energy envelopes
of the two path signals, and the reverberation is then removed by spectral subtraction. This
method has the following disadvantages. 1) The early reverberation is also removed, so the
processed sound becomes thin. 2) The strength of the reverberation is not judged, and the same
spectral subtraction is applied regardless of the reverberation; when the reverberation is weak
and speech intelligibility is already high, the processing may damage the voice quality. 3) The
direction of arrival of the direct sound must be estimated accurately in order to separate the
direct sound, so the microphones must be closely matched, which severely constrains the
acoustic design.
[0005]
SUMMARY OF THE INVENTION
In view of the above problems, the present invention provides a dual-microphone-based audio
reverberation reduction method and apparatus by which the above problems are eliminated or at
least partially eliminated.
[0006]
According to one aspect of the present invention, there is provided a dual-microphone-based
audio reverberation reduction method that performs the following processing for each frame:
receiving the main microphone input signal and the auxiliary microphone input signal;
calculating the transfer function h(t) from the auxiliary microphone to the main microphone
from the two input signals;
obtaining the tail part hr(t) of the transfer function h(t);
determining the strength of the reverberation from h(t) and calculating the adjustment factor β
of the gain function;
convolving the auxiliary microphone input signal with hr(t) to obtain the late reverberation
estimation signal of the main microphone input signal;
converting the late reverberation estimation signal from the time domain to the frequency
domain to obtain the late reverberation spectrum of the main microphone input signal;
converting the main microphone input signal from the time domain to the frequency domain to
obtain the frequency spectrum of the main microphone input signal;
calculating the gain function from the frequency spectrum of the main microphone input signal,
the adjustment factor β of the gain function, and the late reverberation spectrum of the main
microphone input signal;
multiplying the frequency spectrum of the main microphone input signal by the gain function to
obtain the frequency spectrum of the main microphone input signal after dereverberation;
converting that spectrum from the frequency domain to the time domain to obtain the time-domain
signal of the main microphone input signal after dereverberation; and
superimposing and adding (overlap-adding) the dereverberated time-domain signals of the frames
and outputting the continuous signal of the main microphone input signal after dereverberation.
[0007]
According to another aspect of the present invention, there is provided a dual-microphone-based
audio reverberation reduction device that processes the signals received by the main and
auxiliary microphones frame by frame and includes a reverberation spectrum estimation unit and
a spectrum subtraction unit.
The reverberation spectrum estimation unit receives the main microphone input signal and the
auxiliary microphone input signal; calculates the transfer function h(t) from the auxiliary
microphone to the main microphone from the two input signals; obtains the tail part hr(t) of
h(t); determines the reverberation strength from h(t), calculates the adjustment factor β of
the gain function, and outputs β to the spectrum subtraction unit; convolves the auxiliary
microphone input signal with hr(t) to obtain the late reverberation estimation signal of the
main microphone input signal; and converts that estimation signal from the time domain to the
frequency domain to obtain the late reverberation spectrum of the main microphone input signal,
which it outputs to the spectrum subtraction unit.
The spectrum subtraction unit receives the main microphone input signal together with the
adjustment factor β and the late reverberation spectrum output by the reverberation spectrum
estimation unit; converts the main microphone input signal from the time domain to the
frequency domain to obtain its frequency spectrum; calculates the gain function from the
frequency spectrum of the main microphone input signal, the adjustment factor β, and the late
reverberation spectrum; multiplies the frequency spectrum of the main microphone input signal
by the gain function to obtain the frequency spectrum after dereverberation; converts that
spectrum from the frequency domain to the time domain to obtain the time-domain signal after
dereverberation; and superimposes and adds the dereverberated time-domain signals of the frames
to output the continuous signal of the main microphone input signal after dereverberation.
[0008]
As can be seen from the above, the present invention calculates the transfer function h(t) from
the auxiliary microphone to the main microphone from the main microphone input signal and the
auxiliary microphone input signal, obtains the tail part hr(t) of h(t), determines the strength
of the reverberation from h(t) to calculate the adjustment factor β of the gain function, and
convolves the auxiliary microphone input signal with hr(t) to obtain the late reverberation
estimation signal of the main microphone input signal. The gain function is calculated from the
frequency spectrum of the main microphone input signal, the adjustment factor β, and the late
reverberation spectrum of the main microphone input signal, and the frequency spectrum of the
main microphone input signal is multiplied by the gain function to obtain the frequency
spectrum after dereverberation. In other words, the late reverberation estimation spectrum is
subtracted from the frequency spectrum of the main microphone input signal by spectral
subtraction. The late reverberation is thus effectively removed from the main microphone input
signal while the early reverberation is preserved, so the processed sound does not become thin
and voice quality is improved.
At the same time, in the process of estimating the late reverberation, the intensity of the
spectral subtraction is adjusted according to the reverberation strength, so that the spectral
subtraction is reduced or omitted when the reverberation is weak. This ensures that the voice
is not damaged when the reverberation is weak and speech intelligibility is already high, which
protects voice quality.
Moreover, because this approach does not need an accurate estimate of the direction of arrival
of the direct sound, the microphones are not required to be closely matched, and there is no
strict limitation on the acoustic design.
[0009]
FIG. 1 is a graph of the transfer function from the excitation signal to the microphone input
signal in an embodiment of the present invention.
FIG. 2 is a graph of the transfer function from the auxiliary microphone to the main microphone
in an embodiment of the present invention.
FIG. 3 is a schematic diagram of the flow of a dual-microphone-based audio reverberation
reduction method according to one embodiment of the present invention.
FIG. 4 is a schematic diagram of the overall flow of a dual-microphone-based audio
reverberation reduction method according to another embodiment of the present invention.
FIG. 5A is a graph of the transfer function from the auxiliary microphone to the main
microphone when the distance from the sound source to the main microphone is 0.5 m.
FIG. 5B is a graph of the same transfer function when the distance is 1 m.
FIG. 5C is a graph of the same transfer function when the distance is 2 m.
FIG. 5D is a graph of the same transfer function when the distance is 4 m.
The remaining drawings are as follows.
A graph of the amplitude-frequency characteristic of the frequency compensation filter when the
spacing between the main microphone and the auxiliary microphone is 6 cm.
A graph of the amplitude-frequency characteristic of the frequency compensation filter when the
spacing between the main microphone and the auxiliary microphone is 18 cm.
A time-domain plot of the main microphone input signal.
A time-domain plot of the main microphone signal after dereverberation.
A spectrogram of the main microphone input signal.
A spectrogram of the main microphone signal after dereverberation.
An overall block diagram of the dual-microphone-based audio reverberation reduction device in
an embodiment of the present invention.
A detailed structure of the dual-microphone-based audio reverberation reduction device in one
preferred embodiment of the present invention, with a schematic diagram of its inputs and
outputs.
[0010]
First, for brevity, "microphone" is abbreviated as "mike" in the description of the present
application. Analysis of the prior art shows that, to reduce reverberation better, the late
reverberation must be removed while the direct sound and the early reverberation are protected,
which requires accurate and stable late reverberation estimation and reverberation strength
judgment.
[0011]
The present invention presents a dereverberation technique based on dual microphones (a main
microphone and an auxiliary microphone) that makes full use of the approximate relationship
between the reverberation and the transfer function between the two microphones. The
reverberation is estimated and its strength is determined from this transfer function, and a
spectral subtraction module then removes it, so that intelligibility is satisfied in a variety
of reverberant environments and near-optimal audio quality is obtained. In addition, because
the technique neither separates the direct sound nor estimates the direction of arrival, it
does not require matched microphones, which relaxes the requirements on acoustic design.
[0012]
The basic principle of the present invention is as follows. Because the late reverberation is
estimated from the tail of the transfer function between the dual microphones, the direct sound
and the early reverberation are well preserved in the spectral subtraction. In addition, in the
process of estimating the late reverberation, the degree of reverberation of the room is
estimated from the energy difference between the head and the tail of the transfer function
between the dual microphones, and the intensity of the spectral subtraction is adjusted
accordingly: when the reverberation is weak, the spectral subtraction is reduced or omitted,
which protects voice quality.
[0013]
To clarify the technical means of the present invention, its technical principle is analyzed
and described below. Early reverberation signals can reinforce speech, but late reverberation
reduces speech intelligibility. FIG. 1 is a graph of the transfer function from the excitation
signal to the microphone input signal in an embodiment of the present invention. Referring to
FIG. 1, the largest peak of this transfer function corresponds to the direct sound. A point at
some distance after the largest peak is usually taken as the boundary between early and late
reflections: the portion from the largest peak to the boundary point corresponds to early
reverberation, and the portion after the boundary point corresponds to late reverberation. In
FIG. 1, the boundary point is 50 ms.
[0014]
Let s(t) denote the excitation signal, x(t) the microphone input signal, tf(t) the transfer
function from the excitation signal to the microphone input signal, tfd(t) the part of the
transfer function corresponding to the direct sound and early reverberation, and tfr(t) the
part corresponding to the late reverberation. The microphone input signal is the convolution of
the excitation signal and the transfer function, x(t) = s(t) * tf(t). The direct sound and
early reverberation component of the microphone input signal can be expressed as
xd(t) = s(t) * tfd(t), and the late reverberation component as xr(t) = s(t) * tfr(t).
Therefore, the microphone input signal can also be expressed as
x(t) = s(t) * tf(t) = s(t) * (tfd(t) + tfr(t)) = xd(t) + xr(t).
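The decomposition above can be checked numerically. The following is a minimal sketch, assuming a 16 kHz sampling rate and a synthetic impulse response (a unit peak followed by a decaying noise tail); neither comes from the patent, and the variable names simply mirror the symbols in the text.

```python
import numpy as np

fs = 16000                                   # assumed sampling rate (Hz)
rng = np.random.default_rng(0)

# Synthetic stand-in for tf(t): a unit direct-sound peak followed by a decaying noise tail.
n_taps = int(0.3 * fs)
t = np.arange(n_taps) / fs
tf = 0.1 * rng.standard_normal(n_taps) * np.exp(-t / 0.05)
tf[0] = 1.0

# Split tf(t) at the 50 ms boundary point into tfd(t) and tfr(t).
boundary = int(0.050 * fs)
tfd = np.where(np.arange(n_taps) < boundary, tf, 0.0)
tfr = tf - tfd

s = rng.standard_normal(2000)                # stand-in excitation signal s(t)
x = np.convolve(s, tf)                       # x(t)  = s(t) * tf(t)
xd = np.convolve(s, tfd)                     # xd(t) = s(t) * tfd(t)
xr = np.convolve(s, tfr)                     # xr(t) = s(t) * tfr(t)

# Convolution is linear, so the two components should add up to x(t).
print(np.allclose(x, xd + xr))
```

Because the split only zeroes samples on either side of the boundary, tfd(t) + tfr(t) reproduces tf(t) exactly, and the identity for x(t) follows from the linearity of convolution.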
[0015]
Speech clarity may be represented by C50, defined as follows:
$$C_{50} = 10 \log_{10} \frac{\int_{0}^{50\,\mathrm{ms}} w^2(t)\,dt}{\int_{50\,\mathrm{ms}}^{\infty} w^2(t)\,dt} \tag{1}$$
where w(t) is the transfer function from the excitation signal to the microphone input signal.
The interval from 0 to 50 ms corresponds to the direct sound and early reverberation, and the
interval after 50 ms corresponds to late reverberation. The stronger the reverberation, the
smaller the value of C50. Since the improvement in C50 before and after dereverberation
reflects the dereverberation effect, C50 may be used as an objective evaluation index of
dereverberation.
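As an illustration of how C50 behaves, the sketch below computes the ratio of early to late energy for a sampled impulse response. It assumes that the response starts at the direct sound (time 0) and that it extends past 50 ms; the function name `c50_db`, the sampling rate, and the two exponential test responses are hypothetical.

```python
import numpy as np

def c50_db(w, fs, boundary_ms=50.0):
    """Return 10*log10(energy in [0, boundary) / energy in [boundary, end)) for response w."""
    w = np.asarray(w, dtype=float)
    split = int(round(boundary_ms * 1e-3 * fs))
    early = np.sum(w[:split] ** 2)           # direct sound + early reverberation
    late = np.sum(w[split:] ** 2)            # late reverberation
    return 10.0 * np.log10(early / late)

fs = 16000                                   # assumed sampling rate (Hz)
t = np.arange(int(0.5 * fs)) / fs
w_dry = np.exp(-t / 0.02)                    # fast decay: weak reverberation
w_wet = np.exp(-t / 0.10)                    # slow decay: strong reverberation
print(c50_db(w_dry, fs), c50_db(w_wet, fs))
```

The slowly decaying response leaves more energy after 50 ms and therefore yields the smaller C50, in line with the statement that stronger reverberation lowers C50.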
[0016]
In the present invention, the reverberation estimation principle based on dual microphones (a
main microphone and an auxiliary microphone) is as follows. As shown in FIG. 2, the input
signal of the main microphone is denoted by x2(t), the input signal of the auxiliary microphone
by x1(t), and the transfer function from the auxiliary microphone to the main microphone by
h(t). FIG. 2 is a graph of the transfer function h(t) from the auxiliary microphone to the main
microphone in an embodiment of the present invention.
[0017]
The input signal x2(t) of the main microphone is equal to the convolution of the input signal
x1(t) of the auxiliary microphone and the transfer function h(t):
$$x_2(t) = x_1(t) * h(t) \tag{2}$$
h(t) can be divided into two parts, a head and a tail:
$$h(t) = h_d(t) + h_r(t) \tag{3}$$
where hd(t) represents the head of h(t) and hr(t) represents the tail of h(t). Since the tail
hr(t) reflects multiple reflections of the signal in space, the convolution of hr(t) with the
auxiliary microphone input signal x1(t) is close to the late reverberation component of the
main microphone signal and serves as an estimate of it. To obtain hr(t), one point in h(t) is
picked as the boundary point between hd(t) and hr(t), and the values of h(t) before the
boundary point are set to 0. The distance from the maximum peak of h(t) to the boundary point
can be set in the range of 30 ms to 80 ms (experimental values). Experience shows that when the
boundary point is 50 ms or more after the maximum peak of h(t), essentially no direct sound or
early reflection components remain in the late reverberation estimation signal of the main
microphone, which reduces the damage to the voice. The embodiments of the present invention are
therefore described using a boundary point of 50 ms as an example.
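A minimal sketch of this estimate follows: the tail hr(t) is obtained by zeroing h(t) before a boundary point placed a fixed offset after its largest peak, and the late reverberation estimate is the convolution of x1(t) with hr(t). The helper names `tail_of` and `late_reverb_estimate`, the sampling rate, and the synthetic h(t) are assumptions made for illustration only.

```python
import numpy as np

def tail_of(h, fs, offset_ms=50.0):
    """Zero h(t) before the boundary point, placed offset_ms after its largest peak."""
    h = np.asarray(h, dtype=float)
    boundary = int(np.argmax(np.abs(h))) + int(round(offset_ms * 1e-3 * fs))
    hr = h.copy()
    hr[:boundary] = 0.0
    return hr

def late_reverb_estimate(x1, hr):
    """Convolve the auxiliary signal x1(t) with hr(t), truncated to the length of x1."""
    return np.convolve(x1, hr)[: len(x1)]

fs = 16000                                   # assumed sampling rate (Hz)
rng = np.random.default_rng(1)
t = np.arange(int(0.2 * fs)) / fs
h = 0.05 * rng.standard_normal(t.size) * np.exp(-t / 0.04)   # synthetic h(t), not measured data
h[20] = 1.0                                  # largest peak (direct path)

hr = tail_of(h, fs)                          # tail part hr(t)
hd = h - hr                                  # head part hd(t)
x1 = rng.standard_normal(fs // 4)            # stand-in auxiliary microphone signal
r = late_reverb_estimate(x1, hr)             # late reverberation estimate for the main microphone
```

The `offset_ms` parameter corresponds to the 30 ms to 80 ms range mentioned in the text; 50 ms is used as the default to match the example embodiment.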
[0018]
In order to further clarify the object, technical means and advantages of the present invention, a
more detailed description of embodiments of the present invention will be given below with
reference to the drawings. FIG. 3 is a schematic diagram of the flow of the audio reverberation
reduction method based on dual microphones according to one embodiment of the present
invention. As shown in FIG. 3, the method mainly includes a reverberation spectrum estimation
part and a spectrum subtraction part, and specifically, the following processing is performed for
each frame.
[0019]
Step 1.1 Receive the main microphone input signal x2(t) and the auxiliary microphone input
signal x1(t), and calculate the transfer function h(t) from the auxiliary microphone to the
main microphone from the two input signals.
Step 1.2 Obtain the tail part hr(t) of the transfer function h(t).
Step 1.3 Judge the strength of the reverberation from the transfer function h(t) and calculate
the adjustment factor β of the gain function.
Step 1.4 Convolve the auxiliary microphone input signal with hr(t) to obtain the late
reverberation estimation signal of the main microphone input signal.
Step 1.5 Convert the late reverberation estimation signal of the main microphone input signal
from the time domain to the frequency domain to obtain the late reverberation spectrum of the
main microphone input signal.
[0020]
Step 2.1 Convert the main microphone input signal x2(t) from the time domain to the frequency
domain to obtain the frequency spectrum X2 of the main microphone input signal.
Step 2.2 Calculate the gain function G from the frequency spectrum X2 of the main microphone
input signal, the adjustment factor β of the gain function, and the late reverberation spectrum
of the main microphone input signal.
Step 2.3 Multiply the frequency spectrum X2 of the main microphone input signal by the gain
function G to obtain the frequency spectrum D of the main microphone input signal after
dereverberation.
Step 2.4 Convert the dereverberated frequency spectrum D from the frequency domain to the time
domain to obtain the time-domain signal d(t) of the main microphone input signal after
dereverberation.
Step 2.5 Superimpose and add the dereverberated time-domain signals frame by frame, and output
the continuous signal xd(t) of the main microphone input signal after reverberation reduction.
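The frame-wise processing of steps 1.5 and 2.1 to 2.5 can be sketched as follows. This is only an illustration under stated assumptions: the gain function used here (power spectral subtraction scaled by β, with a spectral floor) is a common textbook form chosen for the sketch and is not taken from this passage, and the frame length, Hann window, 50 % overlap, and the name `spectral_subtract` are likewise hypothetical.

```python
import numpy as np

def spectral_subtract(x2, r, beta, frame=512, floor=0.1):
    """Dereverberate x2 frame by frame, given a late-reverberation estimate r and factor beta."""
    hop = frame // 2
    win = np.hanning(frame + 1)[:frame]      # periodic Hann: copies shifted by hop sum to 1
    out = np.zeros(len(x2))
    for start in range(0, len(x2) - frame + 1, hop):
        seg = slice(start, start + frame)
        X2 = np.fft.rfft(win * x2[seg])      # step 2.1: spectrum of the main microphone frame
        R = np.fft.rfft(win * r[seg])        # step 1.5: late reverberation spectrum
        # Step 2.2: ASSUMED gain, power spectral subtraction scaled by beta with a floor.
        G = np.maximum(1.0 - beta * np.abs(R) ** 2 / (np.abs(X2) ** 2 + 1e-12), floor)
        D = G * X2                           # step 2.3: dereverberated spectrum
        out[seg] += np.fft.irfft(D, n=frame) # steps 2.4 and 2.5: back to time domain, overlap-add
    return out

rng = np.random.default_rng(2)
x2 = rng.standard_normal(4096)               # stand-in main microphone signal
y = spectral_subtract(x2, np.zeros_like(x2), beta=1.0)
```

With a zero reverberation estimate the gain is 1 in every bin, so the overlap-added output reproduces the input wherever two frames overlap; setting β to 0 has the same effect, which corresponds to omitting the spectral subtraction when the reverberation is weak.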
[0021]
In the method shown in FIG. 3, the late reverberation estimation signal of the main microphone
input signal is obtained by convolving the auxiliary microphone input signal with hr(t), and
the late reverberation estimation spectrum is then subtracted from the frequency spectrum of
the main microphone input signal by spectral subtraction. The late reverberation is thus
effectively removed from the main microphone input signal while the early reverberation is
preserved, so voice quality is improved. Further, in the method shown in FIG. 3, the intensity
of the spectral subtraction is adjusted according to the reverberation strength during the
estimation of the late reverberation, and the spectral subtraction is reduced or omitted when
the reverberation is weak. This ensures that the voice is not damaged when the reverberation is
weak and speech intelligibility is already high, which protects voice quality. Moreover,
because this method does not need an accurate estimate of the direction of arrival of the
direct sound, the microphones are not required to be closely matched, and there is no strict
limitation on the acoustic design.
[0022]
One embodiment of the present invention extends the method shown in FIG. 3 to account for the
fact that, compared with the true late reverberation component of the main microphone input
signal, the late reverberation estimation signal is underestimated in parts. Low-pass filters
are provided according to the microphone spacing, and the corresponding frequency compensation
is applied to the late reverberation estimation signal. For details, see the embodiment shown
in FIG. 4.
[0023]
FIG. 4 is a schematic diagram of the overall flow of a dual-microphone-based audio
reverberation reduction method according to another embodiment of the present invention. As
shown in FIG. 4, the inputs of the entire system are the auxiliary microphone input signal
x1(t) and the main microphone input signal x2(t), and the output is the reverberation-reduced
signal xd(t). Broadly speaking, the system includes two parts, a reverberation spectrum
estimation process and a spectrum subtraction process. Compared with the flow of the method
shown in FIG. 3, FIG. 4 adds a step that performs frequency compensation on the late
reverberation estimation signal (in FIG. 4 this step is labeled step 1.45, and the
time-domain-to-frequency-domain conversion step remains step 1.5, as in FIG. 3). The method is
described in detail below with reference to FIG. 4.
[0024]
1. Reverberation spectrum estimation
Input: input signal x1(t) of the auxiliary microphone, input signal x2(t) of the main
microphone.
Output: adjustment factor β of the gain function (one input of the spectral subtraction
process), late reverberation spectrum of the main microphone input signal (one input of the
spectral subtraction process).
The reverberation spectrum estimation includes six steps: step 1.1, step 1.2, step 1.3, step
1.4, step 1.45, and step 1.5.
[0025]
2. Spectrum subtraction
Input: main microphone input signal x2(t), adjustment factor β of the gain function (output of
the reverberation spectrum estimation process), late reverberation spectrum of the main
microphone input signal (output of the reverberation spectrum estimation process).
Output: signal xd(t) of the main microphone input signal after reverberation reduction (also
the output of the entire system).
The spectral subtraction process comprises five steps: step 2.1, step 2.2, step 2.3, step 2.4,
and step 2.5.
[0026]
Each step in the reverberation spectrum estimation process and the spectrum subtraction
process, and the relationship between them, is described in detail below.
1. Reverberation spectrum estimation process
Step 1.1 Calculate the transfer function h(t) from the auxiliary microphone to the main
microphone.
Input of step 1.1: input signal x1(t) of the auxiliary microphone and input signal x2(t) of the
main microphone.
Output of step 1.1: transfer function h(t) from the auxiliary microphone to the main microphone
(input of step 1.2).
[0027]
In one embodiment of the present invention, the transfer function H in the frequency domain is
calculated from the cross power spectrum Px2x1 of the auxiliary microphone input signal x1(t)
and the main microphone input signal x2(t) and the power spectrum Px1x1 of the auxiliary
microphone input signal x1(t):
$$H = \frac{P_{x_2 x_1}}{P_{x_1 x_1}} \tag{4}$$
An inverse Fourier transform is applied to H to obtain the transfer function h(t) in the time
domain. In other embodiments of the present invention, h(t) may be calculated by other methods,
such as adaptive filtering, which are not described in detail here.
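One way to realize this calculation is sketched below: the cross power spectrum and the power spectrum are averaged over windowed, half-overlapping frames, their ratio gives H, and an inverse FFT gives h(t). The averaging scheme, the FFT length, the small regularization constant, and the name `estimate_transfer_function` are assumptions for this sketch; the patent text only specifies the ratio of the two spectra followed by the inverse transform.

```python
import numpy as np

def estimate_transfer_function(x1, x2, nfft=1024, eps=1e-12):
    """Estimate H = Px2x1 / Px1x1 from averaged frame spectra and return (H, h)."""
    hop = nfft // 2
    win = np.hanning(nfft)
    p_cross = np.zeros(nfft // 2 + 1, dtype=complex)   # accumulates Px2x1
    p_auto = np.zeros(nfft // 2 + 1)                   # accumulates Px1x1
    for start in range(0, len(x1) - nfft + 1, hop):
        X1 = np.fft.rfft(win * x1[start:start + nfft])
        X2 = np.fft.rfft(win * x2[start:start + nfft])
        p_cross += X2 * np.conj(X1)
        p_auto += np.abs(X1) ** 2
    H = p_cross / (p_auto + eps)             # eps guards against division by zero (assumption)
    h = np.fft.irfft(H, n=nfft)              # time-domain transfer function h(t)
    return H, h

rng = np.random.default_rng(3)
x1 = rng.standard_normal(32 * 1024)          # stand-in auxiliary microphone signal
h_true = np.array([1.0, 0.5, -0.25])         # toy transfer function used to build x2
x2 = np.convolve(x1, h_true)[: len(x1)]      # stand-in main microphone signal
H, h = estimate_transfer_function(x1, x2)
print(np.round(h[:3], 2))
```

In this toy case the first taps of the estimated h(t) should lie close to the taps used to build x2(t). In practice the FFT length has to cover the reverberation tail that hr(t) is meant to capture.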
[0028]
Step 1.2 Obtain the tail part hr(t) of the transfer function h(t).
Input of step 1.2: transfer function h(t) from the auxiliary microphone to the main microphone
(output of step 1.1).
Output of step 1.2: tail part hr(t) of the transfer function from the auxiliary microphone to
the main microphone (input of step 1.4).
[0029]
In the embodiment of the present invention, a boundary point between the early reverberation
and the late reverberation is taken on the time axis of the transfer function h(t), and the
values of h(t) before the boundary point are set to 0, which yields the tail part hr(t) of the
transfer function h(t). In one preferred embodiment of the invention, the point located 50 ms
after the largest peak of h(t) is picked as the boundary point; the values of h(t) before this
point are set to 0, and the result is denoted as the tail part hr(t).
[0030]
Step 1.3 Judge the strength of the reverberation by the transfer function h (t) from the auxiliary
microphone to the main microphone, and find the adjustment factor β of the gain function.
Input of step 1.3: transfer function h (t) from auxiliary microphone to main microphone (output
of step 1.1). Output of step 1.3: Gain function adjustment factor β (as one input in the spectral
subtraction process).
[0031]
To reduce the damage that dereverberation does to the voice when the reverberation is weak,
step 1.3 calculates the adjustment factor β of the gain function by judging the strength of the
reverberation. In the embodiment of the present invention, the logarithm of the ratio of the
energy of the head of the transfer function from the auxiliary microphone to the main
microphone to the energy of its tail is denoted as ρ:
$$\rho = 10 \log_{10} \frac{\int_{0}^{T} h^2(t)\,dt}{\int_{T}^{\infty} h^2(t)\,dt} \tag{5}$$
where h(t) is the transfer function from the auxiliary microphone to the main microphone, and T
is a designated boundary point on the time axis of h(t). The boundary point T is not
necessarily the boundary between the early reverberation and the late reverberation, but the
head before T always includes the direct sound and may further include part or all of the early
reverberation.
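The log energy ratio ρ can be computed from a sampled h(t) as in the following sketch. It assumes that the ratio is expressed in decibels (consistent with the dB values quoted for ρ in the text) and that the boundary point T is placed a fixed offset after the largest peak of h(t); the function name `rho_db`, the sampling rate, and the two toy responses are hypothetical.

```python
import numpy as np

def rho_db(h, fs, t_ms=50.0):
    """Log ratio (dB) of head energy to tail energy of h(t), split t_ms after its largest peak."""
    h = np.asarray(h, dtype=float)
    boundary = int(np.argmax(np.abs(h))) + int(round(t_ms * 1e-3 * fs))
    head = np.sum(h[:boundary] ** 2)         # energy before the boundary point T
    tail = np.sum(h[boundary:] ** 2)         # energy after the boundary point T
    return 10.0 * np.log10(head / tail)

fs = 16000                                   # assumed sampling rate (Hz)

# Toy responses: a unit peak plus a short tail after the boundary point.
h_weak = np.zeros(2000)
h_weak[0] = 1.0
h_weak[1000:1100] = 0.01                     # faint tail: weak reverberation

h_strong = np.zeros(2000)
h_strong[0] = 1.0
h_strong[1000:1100] = 0.05                   # louder tail: strong reverberation

print(rho_db(h_weak, fs), rho_db(h_strong, fs))
```

The response with the louder tail gives the smaller ρ, matching the observation that ρ decreases as the reverberation becomes stronger.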
[0032]
FIG. 5A is a graph of the transfer function from the auxiliary microphone to the main
microphone when the distance from the sound source to the main microphone in the embodiment of
the present invention is 0.5 m. When the distance from the sound source to the main microphone
is L = 0.5 m, T may take values from 20 ms to 50 ms; with T = 50 ms (that is, with the boundary
point T located 50 ms after the maximum peak of h(t)), the speech intelligibility index is
C50 = 12.3 dB and ρ = 9.4 dB.
[0033]
FIG. 5B is a graph of the transfer function from the auxiliary microphone to the main
microphone when the distance from the sound source to the main microphone in the embodiment of
the present invention is 1 m. When the distance from the sound source to the main microphone is
L = 1 m, T may take values from 20 ms to 50 ms; with T = 50 ms (that is, with the boundary
point T located 50 ms after the maximum peak of h(t)), the speech intelligibility index is
C50 = 8.1 dB and ρ = 6.0 dB.
[0034]
FIG. 5C is a graph of the transfer function from the auxiliary microphone to the main
microphone when the distance from the sound source to the main microphone in the embodiment of
the present invention is 2 m. When the distance from the sound source to the main microphone is
L = 2 m, T may take values from 20 ms to 50 ms; with T = 50 ms (that is, with the boundary
point T located 50 ms after the maximum peak of h(t)), the speech intelligibility index is
C50 = 5.4 dB and ρ = 3.7 dB.
[0035]
FIG. 5D is a graph of the transfer function from the auxiliary microphone to the main microphone
when the distance from the sound source to the main microphone in the embodiment of the
present invention is 4 m. At this distance (L = 4 m), T may be chosen in the range of 20 ms to
50 ms. When T is taken as 50 ms (that is, the boundary point T is 50 ms away from the maximum
peak of h(t)), the speech intelligibility index is C50 = 4.5 dB and ρ = 2.2 dB.
[0036]
The farther the sound source is from the microphone, the stronger the reverberation. As can be
seen from FIGS. 5A to 5D, as the reverberation becomes stronger, the energy of the head of the
transfer function from the auxiliary microphone to the main microphone becomes lower and the
energy of the tailing part becomes higher, so the logarithm ρ of the ratio of the two can reflect
the strength of the reverberation: as the reverberation gets stronger, the value of ρ gets smaller.
Therefore, the strength of the reverberation is judged by the value of ρ, and the adjustment
factor β of the gain function is obtained from it.
[0037]
There are a plurality of methods for calculating the adjustment factor β, and equation (6) is an
empirical equation for calculating β in the embodiment of the present invention. (6) Here ρ1 and
ρ2 are set values chosen empirically; in the embodiment of the present invention, ρ1 is 9 dB and
ρ2 is 2 dB (for a microphone spacing of 6 cm).
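
Equation (6) itself is not reproduced in this text, so the following is only an illustrative
stand-in and not the patent's formula. It assumes a piecewise-linear mapping in which β is 0 (no
subtraction) when ρ is at or above ρ1, 1 (full subtraction) when ρ is at or below ρ2, and
interpolated linearly in between. The function name adjustment_factor and the 0-to-1 range are
assumptions.

```python
def adjustment_factor(rho_db, rho1_db=9.0, rho2_db=2.0):
    """Hypothetical stand-in for equation (6), NOT the patent's empirical formula.

    Assumption: beta falls linearly from 1 at rho2 (strong reverberation)
    to 0 at rho1 (weak reverberation) and is clamped outside that range.
    """
    if rho_db >= rho1_db:
        return 0.0          # weak reverberation: leave the speech untouched
    if rho_db <= rho2_db:
        return 1.0          # strong reverberation: full-strength subtraction
    return (rho1_db - rho_db) / (rho1_db - rho2_db)
```

Under this assumed mapping, the ρ = 9.4 dB case of FIG. 5A would give β = 0, while the
ρ = 2.2 dB case of FIG. 5D would give a β close to 1.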
[0038]
Step 1.4 The input signal x1(t) of the auxiliary microphone is convolved with the tailing part
hr(t) of the transfer function from the auxiliary microphone to the main microphone to obtain a
late reverberation estimation signal of the main microphone input signal. Input of step 1.4: input
signal x1(t) of the auxiliary microphone, tailing part hr(t) of the transfer function from the
auxiliary microphone to the main microphone (output of step 1.2). Output of step 1.4: late
reverberation estimation signal of the main microphone input signal (as input of step 1.45).
Specifically, the convolution is given by equation (7).
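
The convolution in step 1.4 can be sketched as follows. This is a minimal sketch that assumes
x1(t) and hr(t) are available as sampled arrays and that the estimate is truncated to the length
of x1; the function name estimate_late_reverb is hypothetical, and the frame-by-frame processing
of the actual embodiment is not modelled.

```python
import numpy as np

def estimate_late_reverb(x1, hr):
    """Convolve the auxiliary-microphone signal with the tail of h.

    Returns the first len(x1) samples of the full linear convolution, so the
    estimate is time-aligned with the input signal.
    """
    x1 = np.asarray(x1, dtype=float)
    hr = np.asarray(hr, dtype=float)
    return np.convolve(x1, hr)[: len(x1)]
```

Because hr(t) is zero before the boundary point, a unit impulse in x1 produces no output until
the tail begins, which is what makes the result an estimate of only the late part.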
[0039]
Step 1.45 Perform frequency compensation on the late reverberation estimation signal of the
main microphone input signal to obtain a compensated signal. Input of step 1.45: Late
reverberation estimation signal of main microphone input signal (output of step 1.4). Output of
step 1.45: Late reverberation estimation signal of main microphone input signal subjected to
frequency compensation (as input of step 1.5).
[0040]
When the late reverberation estimation signal of the main microphone input signal is compared
with the true late reverberation component of the main microphone input signal, the late
reverberation estimation signal is found to be underestimated in the low frequency region.
Therefore, in the present invention, frequency compensation is performed on the late
reverberation estimation signal of the main microphone input signal. Since the distance between
the main microphone and the auxiliary microphone affects the late reverberation estimation
signal, in the embodiment of the present invention a low-pass filter is provided for each
microphone spacing, and the late reverberation estimation signal is frequency-compensated with
the corresponding filter to obtain the compensated late reverberation estimation signal.
[0041]
FIG. 6A is a graph of the amplitude frequency characteristic of the frequency compensation filter
when the distance between the main microphone and the auxiliary microphone in the
embodiment of the present invention is 6 cm. FIG. 6B is a graph of the amplitude frequency
characteristic of the frequency compensation filter when the distance between the main
microphone and the auxiliary microphone in the embodiment of the present invention is 18 cm.
As can be seen from these figures, in the embodiment of the present invention, the larger the
distance between the main microphone and the auxiliary microphone, the smaller the degree of
frequency compensation applied to the low frequency band of the late reverberation estimation
signal of the main microphone input signal.
[0042]
Step 1.5 The frequency-compensated late reverberation estimation signal of the main microphone
input signal is converted from the time domain to the frequency domain to obtain the late
reverberation spectrum of the main microphone input signal. Input of step 1.5: frequency-
compensated late reverberation estimation signal of the main microphone input signal (output of
step 1.45). Output of step 1.5: late reverberation spectrum of the main microphone input signal
(as one input of the spectrum subtraction process). Specifically, the conversion is given by
equation (8).
[0043]
2. Spectrum Subtraction Process
Step 2.1 Convert the main microphone input signal x2(t) from the time domain to the frequency
domain and denote the result as X2. Input of step 2.1: main microphone input signal x2(t).
Output of step 2.1: frequency spectrum X2 of the main microphone input signal (input of step
2.2). Specifically, the conversion is given by equation (9).
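
The frame-wise time-to-frequency conversion used in steps 1.5 and 2.1 can be sketched as
follows. The text here does not give the frame length, hop size, or window, so the values below
(512-sample frames, 50 % overlap, square-root Hann window) are assumptions chosen only for
illustration, and the function name stft_frames is hypothetical.

```python
import numpy as np

def stft_frames(x, frame_len=512, hop=256):
    """Split x into overlapping windowed frames and return their FFTs.

    Output shape: (num_frames, frame_len // 2 + 1); row l is frame l and
    column k is frequency point k. Requires len(x) >= frame_len.
    """
    x = np.asarray(x, dtype=float)
    # Periodic square-root Hann window (assumed, not specified by the source).
    window = np.sqrt(np.hanning(frame_len + 1)[:-1])
    num_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(num_frames)])
    return np.fft.rfft(frames * window, axis=1)
```

Applying the same function to x2(t) and to the compensated late reverberation estimate gives two
spectra indexed by the same frame number l and frequency point k.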
[0044]
Step 2.2 The gain function G is calculated from the frequency spectrum X2 of the main
microphone input signal and the estimated late reverberation spectrum of the main microphone
input signal, and the gain function is adjusted by the adjustment factor β. Input of step 2.2:
frequency spectrum X2 of the main microphone input signal (output of step 2.1), late
reverberation spectrum of the main microphone input signal (output of step 1.5 in the
reverberation spectrum estimation process), adjustment factor β of the gain function (output of
step 1.3 in the reverberation spectrum estimation process). Output of step 2.2: gain function G
(one input of step 2.3).
[0045]
In one embodiment of the present invention, the power spectral subtraction method is used to
calculate the gain function G(l, k) according to the following equation. (10) Here l is the
frame number, k is the frequency point number, β is the adjustment factor of the gain function,
and X2 is the frequency spectrum of the main microphone input signal; the remaining term is the
late reverberation spectrum of the main microphone input signal obtained in step 1.5.
[0046]
As can be understood from equation (10), the magnitude of the gain function G(l, k) can be
adjusted by the adjustment factor β of the gain function. In this way, spectral subtraction can
be reduced or skipped when the reverberation is weak, which ensures that the speech is not
damaged when the reverberation is weak and speech intelligibility is high, so the speech quality
is protected.
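
A β-scaled power spectral subtraction gain can be sketched as follows. Equation (10) is not
reproduced in this text, so the exact way β enters the gain is an assumption: here β multiplies
the ratio of late-reverberation power to input power, and a lower limit `floor` (also an
assumption) keeps the gain real and positive. The names spectral_subtraction_gain and R_late are
hypothetical.

```python
import numpy as np

def spectral_subtraction_gain(X2, R_late, beta, floor=0.1):
    """Power-spectral-subtraction gain scaled by beta (assumed form).

    G = sqrt(max(1 - beta * |R_late|^2 / |X2|^2, floor^2)), evaluated for
    every frame l and frequency point k.
    """
    power_x = np.abs(X2) ** 2
    power_r = np.abs(R_late) ** 2
    ratio = power_r / np.maximum(power_x, 1e-12)   # avoid division by zero
    gain_sq = 1.0 - beta * ratio
    return np.sqrt(np.maximum(gain_sq, floor ** 2))
```

With β = 0 the gain is 1 everywhere, so the input passes through unchanged, which is the
behaviour the text describes for weak reverberation.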
[0047]
Step 2.3 The amplitude spectrum |X2| of the main microphone input signal is multiplied by the
gain function G and combined with the phase of the main microphone input signal to obtain the
frequency spectrum D after dereverberation of the main microphone input signal. Input of step
2.3: frequency spectrum X2 of the main microphone input signal (output of step 2.1), gain
function G (output of step 2.2). Output of step 2.3: frequency spectrum D after dereverberation
of the main microphone input signal (as input of step 2.4). Specifically, the frequency spectrum
D(l, k) after dereverberation of the main microphone input signal is calculated by the following
equation. (11) Here l is the frame number, k is the frequency point number, |X2(l, k)| is the
amplitude spectrum of the main microphone input signal, G(l, k) is the gain function, and
phase(l, k) is the phase of the main microphone input signal.
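
Step 2.3 as described in words (amplitude times gain, recombined with the original phase) can be
sketched as follows. The function name apply_gain is hypothetical; the arrays are assumed to be
indexed by frame l and frequency point k.

```python
import numpy as np

def apply_gain(X2, G):
    """D(l, k) = G(l, k) * |X2(l, k)| * exp(j * phase(l, k)).

    The phase is taken from X2 itself, so only the magnitude is modified.
    """
    magnitude = np.abs(X2)
    phase = np.angle(X2)
    return G * magnitude * np.exp(1j * phase)
```

Since the phase is reused unchanged, this is numerically the same as multiplying the complex
spectrum X2 by the real gain G.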
[0048]
Step 2.4 The dereverberated frequency spectrum D of the main microphone input signal is
converted to the time domain and denoted as d(t). Input of step 2.4: frequency spectrum D after
dereverberation of the main microphone input signal (output of step 2.3). Output of step 2.4:
time domain signal d(t) after dereverberation of the main microphone input signal (input of step
2.5). Specifically, the conversion is given by equation (12).
[0049]
Step 2.5 The dereverberated time domain signal d(t) of the main microphone input signal is
overlap-added frame by frame to obtain the continuous signal xd(t) after the reverberation
reduction of the main microphone input signal. Input of step 2.5: time domain signal d(t) after
dereverberation of the main microphone input signal (output of step 2.4). Output of step 2.5:
continuous signal xd(t) after reverberation reduction of the main microphone input signal
(output of the entire system).
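
Steps 2.4 and 2.5 together (inverse transform of each frame, then overlap-add) can be sketched as
follows. The frame length, hop size, and square-root Hann synthesis window are assumptions, since
the text does not specify them; with a matching square-root Hann analysis window and 50 %
overlap, the interior of an unmodified signal is reconstructed exactly. The function name
overlap_add is hypothetical.

```python
import numpy as np

def overlap_add(D, frame_len=512, hop=256):
    """Inverse-FFT each frame of D, window it, and sum frames at their hop offsets.

    D has shape (num_frames, frame_len // 2 + 1), one row per frame.
    """
    # Periodic square-root Hann synthesis window (assumed, not from the source).
    window = np.sqrt(np.hanning(frame_len + 1)[:-1])
    frames = np.fft.irfft(D, n=frame_len, axis=1) * window   # step 2.4: d(t) per frame
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):                        # step 2.5: overlap-add
        out[i * hop : i * hop + frame_len] += frame
    return out
```

The first and last half-frames are covered by only one window and are therefore attenuated; a
practical implementation would pad the signal or discard those edges.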
[0050]
FIG. 7A is a time domain waveform of the main microphone input signal according to an
embodiment of the present invention. FIG. 7B is a time domain waveform after dereverberation of
the main microphone input signal in the embodiment of the present invention. FIG. 7C is a
spectrogram of the main microphone input signal according to an embodiment of the present
invention. FIG. 7D is a spectrogram after dereverberation of the main microphone input signal in
the embodiment of the present invention.
[0051]
In FIGS. 7A to 7D, the main microphone and the auxiliary microphone face the sound source, the
perpendicular distance from the sound source to the dual microphones is 2 m, and the distance
between the main microphone and the auxiliary microphone is 18 cm. The C50 of the main
microphone input signal before dereverberation is 6.8 dB, and the C50 after dereverberation
using the method shown in FIG. 4 is 10.5 dB. Thus, after adopting the inventive method, C50
improves by 3.7 dB.
[0052]
FIG. 8 is an overall block diagram of a dual microphone based audio reverberation reduction
apparatus according to an embodiment of the present invention.
The apparatus processes the signals received by the main microphone and the auxiliary
microphone frame by frame and, as shown in FIG. 8, includes a reverberation spectrum estimation
unit 700 and a spectrum subtraction unit 800.
[0053]
The reverberation spectrum estimation unit 700 receives the main microphone input signal and
the auxiliary microphone input signal and is used to: calculate the transfer function h(t) from
the auxiliary microphone to the main microphone from these two signals; acquire the tailing part
hr(t) of the transfer function h(t); determine the strength of the reverberation from h(t),
calculate the adjustment factor β of the gain function, and output β to the spectrum subtraction
unit 800; convolve the auxiliary microphone input signal with hr(t) to obtain the late
reverberation estimation signal of the main microphone input signal; and convert that estimation
signal from the time domain to the frequency domain to obtain the late reverberation spectrum of
the main microphone input signal, which is output to the spectrum subtraction unit 800.
[0054]
The spectrum subtraction unit 800 receives the main microphone input signal together with the
adjustment factor β of the gain function and the late reverberation spectrum of the main
microphone input signal output from the reverberation spectrum estimation unit 700, and is used
to: convert the main microphone input signal from the time domain to the frequency domain to
obtain its frequency spectrum; calculate the gain function from the frequency spectrum of the
main microphone input signal, the adjustment factor β, and the late reverberation spectrum of
the main microphone input signal; multiply the frequency spectrum of the main microphone input
signal by the gain function to obtain the frequency spectrum of the main microphone input signal
after dereverberation; convert that spectrum from the frequency domain to the time domain to
obtain the time domain signal after dereverberation of the main microphone input signal; and
overlap-add the time domain signal frame by frame to output the continuous signal after the
reverberation reduction of the main microphone input signal.
[0055]
In one embodiment of the present invention, after the reverberation spectrum estimation unit 700
convolves the auxiliary microphone input signal with hr(t) to obtain the late reverberation
estimation signal of the main microphone input signal, it first performs frequency compensation
on that estimation signal, then converts the frequency-compensated signal from the time domain
to the frequency domain to obtain the late reverberation spectrum of the main microphone input
signal, and outputs the spectrum to the spectrum subtraction unit 800.
[0056]
FIG. 9 is a schematic diagram of the detailed configuration, with inputs and outputs, of an
audio reverberation reduction apparatus based on dual microphones according to a preferred
embodiment of the present invention.
Referring to FIG. 9, the audio reverberation reduction apparatus based on dual microphones
includes a reverberation spectrum estimation unit 91 and a spectrum subtraction unit 92.
The reverberation spectrum estimation unit 91 includes a transfer function calculation unit 911,
a transfer function tailing calculation unit 912, a reverberation strength determination unit
913, a late reverberation estimation unit 914, a frequency compensation unit 915, and a first
time / frequency conversion unit 916.
The spectrum subtraction unit 92 includes a second time / frequency conversion unit 921, a gain
function calculation unit 922, a dereverberation unit 923, a frequency / time conversion unit
924, and a superposition and addition unit 925.
[0057]
The transfer function calculation unit 911 receives the main microphone input signal and the
auxiliary microphone input signal, and is used to calculate the transfer function h(t) from the
auxiliary microphone to the main microphone from these two signals and to output h(t) to the
transfer function tailing calculation unit 912 and the reverberation strength determination
unit 913.
[0058]
The transfer function tailing calculation unit 912 is used to obtain the tailing part hr(t) of the
transfer function h(t) and output it to the late reverberation estimation unit 914.
Specifically, the transfer function tailing calculation unit 912 takes the boundary point between
the early reverberation and the late reverberation on the time axis of the transfer function h(t)
and sets the values of h(t) before that boundary point to 0, thereby obtaining the tailing part
hr(t) of the transfer function h(t).
[0059]
The reverberation strength determination unit 913 is used to determine the strength of the
reverberation from the transfer function h(t), calculate the adjustment factor β of the gain
function, and output it to the gain function calculation unit 922. Specifically, the
reverberation strength determination unit 913 calculates the parameter ρ representing the
reverberation strength according to equation (5) above, where h(t) is the transfer function from
the auxiliary microphone to the main microphone and T is the designated boundary point on the
time axis of h(t). The reverberation strength determination unit 913 then calculates the
adjustment factor β of the gain function by equation (6) above, where ρ1 and ρ2 are set values.
For example, ρ1 is 9 dB and ρ2 is 2 dB (for a microphone spacing of 6 cm).
[0060]
The late reverberation estimation unit 914 receives the auxiliary microphone input signal and is
used to convolve it with hr(t) to obtain the late reverberation estimation signal of the main
microphone input signal and to output that signal to the frequency compensation unit 915.
[0061]
The frequency compensation unit 915 is used to perform frequency compensation on the late
reverberation estimation signal of the main microphone input signal and to output the frequency
compensated signal to the first time / frequency conversion unit 916.
The larger the distance between the main microphone and the auxiliary microphone, the smaller
the degree to which the frequency compensation unit 915 performs frequency compensation on
the late reverberation estimation signal of the main microphone input signal.
[0062]
The first time / frequency conversion unit 916 is used to convert the frequency-compensated late
reverberation estimation signal of the main microphone input signal from the time domain to the
frequency domain to obtain the late reverberation spectrum of the main microphone input signal,
and to output that spectrum to the gain function calculation unit 922.
[0063]
The second time / frequency conversion unit 921 receives the main microphone input signal and is
used to convert it from the time domain to the frequency domain to obtain the frequency spectrum
of the main microphone input signal, and to output that spectrum to the gain function
calculation unit 922 and the dereverberation unit 923.
[0064]
The gain function calculation unit 922 is used to calculate the gain function from the frequency
spectrum of the main microphone input signal output from the second time / frequency conversion
unit 921, the adjustment factor β of the gain function output from the reverberation strength
determination unit 913, and the late reverberation spectrum of the main microphone input signal
output from the first time / frequency conversion unit 916, and to output the result to the
dereverberation unit 923.
The gain function calculation unit 922 can calculate the gain function G(l, k) according to
equation (10) above, where l is the frame number, k is the frequency point number, β is the
adjustment factor of the gain function, and X2 is the frequency spectrum of the main microphone
input signal; the remaining term is the late reverberation spectrum of the main microphone input
signal.
[0065]
The dereverberation unit 923 multiplies the frequency spectrum of the main microphone input
signal by the gain function to obtain the dereverberated frequency spectrum of the main
microphone input signal, and outputs that spectrum to the frequency / time conversion unit 924.
In the present embodiment, the dereverberation unit 923 calculates the frequency spectrum
D(l, k) after dereverberation of the main microphone input signal according to equation (11)
above, where l is the frame number, k is the frequency point number, |X2(l, k)| is the amplitude
spectrum of the main microphone input signal, G(l, k) is the gain function, and phase(l, k) is
the phase of the main microphone input signal.
[0066]
The frequency / time conversion unit 924 is used to convert the frequency spectrum of the main
microphone input signal after dereverberation from the frequency domain to the time domain to
obtain the time domain signal after dereverberation of the main microphone input signal, and to
output that signal to the superposition and addition unit 925.
[0067]
The superposition and addition unit 925 is used to perform superposition addition for each
frame on the time domain signal output from the frequency / time conversion unit 924 to obtain
a continuous signal after dereverberation of the main microphone input signal.
[0068]
In summary, the audio reverberation reduction apparatus based on dual microphones in the
embodiment of the present invention processes the signals received by the main microphone and
the auxiliary microphone frame by frame.
The reverberation spectrum estimation unit 700 in the apparatus receives the main microphone
input signal x2(t) and the auxiliary microphone input signal x1(t), calculates the transfer
function h(t) from the auxiliary microphone to the main microphone from x2(t) and x1(t), obtains
the tailing part hr(t) of h(t), judges the strength of the reverberation from h(t) to calculate
the adjustment factor β of the gain function, and outputs β to the spectrum subtraction unit 800
in the apparatus.
It also convolves x1(t) with hr(t) to obtain the late reverberation estimation signal of x2(t),
converts that estimation signal from the time domain to the frequency domain to obtain the late
reverberation spectrum of x2(t), and outputs the spectrum to the spectrum subtraction unit 800
in the apparatus. The spectrum subtraction unit 800 in the apparatus converts x2(t) from the
time domain to the frequency domain to obtain the frequency spectrum of x2(t), calculates the
gain function from the frequency spectrum of x2(t), β, and the late reverberation spectrum of
x2(t), multiplies the frequency spectrum of x2(t) by the gain function to obtain the frequency
spectrum after dereverberation of x2(t), and converts it from the frequency domain to the time
domain to obtain the time domain signal after dereverberation of x2(t).
[0069]
In the apparatus of the present invention, the late reverberation estimation signal of the main
microphone input signal x2(t) is obtained by convolving the auxiliary microphone input signal
x1(t) with hr(t), and the spectral subtraction method is then applied: the estimated late
reverberation spectrum of the main microphone input signal is subtracted from the frequency
spectrum of the main microphone input signal x2(t). The late reverberation is thereby
effectively removed from x2(t) while the early reverberation is preserved, which improves voice
quality. At the same time, in the process of estimating the late reverberation, the intensity of
the spectral subtraction is adjusted according to the strength of the reverberation, so that
when the reverberation is weak and speech intelligibility is high, the speech is guaranteed not
to be damaged and the voice quality is protected. In addition, such an apparatus does not need
to estimate the direction of arrival of the direct sound accurately, so the microphones are not
required to have high consistency, and there is no strict limitation on the acoustic design.
[0070]
As can be seen from the above, the technical means of the present invention removes
reverberation while effectively protecting speech, automatically estimates the degree of
reverberation in a room, and selects appropriate processing in various environments, thereby
reaching near-optimal voice quality. Moreover, there are no strict limitations on microphone
consistency or acoustic design, which makes the application more flexible and convenient.
[0071]
What has been described above is merely a preferred embodiment of the present invention and is
not intended to limit the protection scope of the present invention. All changes, equivalent
replacements, improvements and the like made within the spirit and principle of the present
invention shall fall within the protection scope of the present invention.