DESCRIPTION JP2007129736

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
A method is provided for suppressing background noise signals in a sampled audio signal
containing noise. SOLUTION: The method comprises a step of digital frequency domain processing
of the noisy audio signal to generate time domain filtering coefficients, and a step of digital
time domain processing of the noisy audio signal according to the filter coefficients to generate
an audio signal in which the background noise is substantially suppressed. [Selected figure] Figure 1
Method and apparatus for suppressing background noise in audio signals and corresponding
apparatus with echo cancellation
[0001]
The present invention relates generally to a method and apparatus for suppressing background
noise in voice signals in mobile phone applications. The invention also relates to a system using
this type of device in combination with echo cancellation.
[0002]
In a noisy environment, the electrical signal generated by acousto-electrical conversion of the
voice signal is mixed with background noise. Where the background noise level is high, as in a
vehicle, signal processing must be used to eliminate the background noise from the electrical
voice signal. There are essentially two prior-art background noise suppression methods: spectral
subtraction and filter banks.
15-04-2019
1
[0003]
As described in U.S. Pat. No. 4,628,529, filter bank processing comprises the steps of: dividing
the input signal into multiple time domain signals, each representing a respective predetermined
frequency band; estimating the signal-to-noise ratio for each of these time domain signals;
weighting each time domain signal by a respective coefficient that depends on the signal-to-noise
ratio estimated for the time domain signal in question; and adding the weighted time domain
signals to produce the resulting audio signal with the background noise signal suppressed. Each
signal-to-noise ratio is usually estimated from fluctuations in the power of the relevant time
domain signal in the respective frequency band. In filter bank processing, the above-mentioned
division, estimation, weighting and addition steps are all performed in the time domain, so
considerable computing power is needed. The computing power available in mobile phones is in
practice limited, in terms of MIPS, by the capabilities of digital signal processors (DSPs). It
has therefore been proposed to limit the background noise suppression process to coarse frequency
bands, which reduces the accuracy of the process.
[0004]
The spectral subtraction process usually works in the frequency domain using a fast Fourier
transform (FFT). Its main drawback is that non-linear distortion occurs in the processed speech
signal owing to the loss of signal phase information. Such distortion arises because the process
is non-linear: it discards the phase information in the samples generated by applying the fast
Fourier transform to the noisy speech signal to be processed. Furthermore, this non-linearity of
the spectral subtraction process makes effective combination with an echo cancellation process,
as proposed in the present invention, impossible, because the operation of the echo canceller is
adversely affected by the loss of this phase information.
[0005]
The first object of the present invention is to provide a method for suppressing background noise
in audio signals which has the advantage of considerably reducing the computing power required,
in terms of instructions per second, compared with filter bank processing.
[0006]
A second object of the present invention is to provide a method that does not generate non-linear
distortion of the audio signal to be processed, unlike spectral subtraction processing.
[0007]
Another object of the present invention is to provide a system comprising a background noise
suppressor which carries out the steps of the method in conjunction with an echo canceller.
[0008]
The present invention provides a method of suppressing background noise signals in a sampled
noisy audio signal, the method comprising the steps of: digital frequency domain processing of
the noisy audio signal to generate time domain filtering coefficients; and digital time domain
processing of the noisy audio signal according to the filter coefficients to generate an audio
signal in which the background noise signal is substantially suppressed.
[0009]
In the method of the invention, the digital frequency domain processing comprises, for a given
processing cycle, the steps of: extracting a plurality of frequency domain energy components from
the noisy speech signal; estimating, for each extracted component, the ratio of the energy level
of the noisy speech signal to the energy level of the background noise signal; determining a gain
for each of the extracted frequency domain energy components according to the estimated ratio of
the energy level of the noisy speech signal to the energy level of the background noise signal;
and synthesizing the filter coefficients according to the gains.
[0010]
The step of extracting frequency domain energy components comprises the steps of: generating
K groups (K is an integer), each comprising a plurality of frequency domain components, one group
for each of K interleaved blocks of the noisy speech signal; and calculating an energy average of
the K frequency domain components of the same rank in each of the K groups to generate the
respective extracted frequency domain energy components.
[0011]
For each of the K frequency domain component groups, the calculating step is preceded by a step
of selecting a number of frequency domain components having respective predetermined ranks in the
group to form a set of selected frequency domain components, each of which is symmetrical to a
corresponding frequency domain component among the components originally generated for that
group.
Furthermore, the generation step and the synthesis step are implemented by a fast Fourier
transform and an inverse Fourier transform, respectively.
[0012]
An apparatus for carrying out the method comprises: means for extracting a plurality of
frequency domain energy components from the noisy speech signal; means for estimating, for each
of the extracted frequency domain energy components, the ratio of the energy level of the noisy
speech signal to the energy level of the background noise signal; means for determining a gain
for each of the extracted frequency domain energy components according to the estimated ratio of
the energy level of the noisy speech signal to the energy level of the background noise signal;
means for synthesizing the filter coefficients according to the gains; and means for time domain
filtering of the noisy speech signal according to the filter coefficients to generate, for each
successive processing cycle, an audio signal in which the background noise signal is
substantially suppressed.
[0013]
The invention also provides two variants of the combined echo cancellation and noise
suppression device.
[0014]
A first variant of this device comprises: a noise suppression device for suppressing background
noise signals in the speech signal to be transmitted to generate a noise-suppressed speech
signal; and an echo canceller comprising first means for generating an estimated echo signal
based on a given speech signal and a difference signal, and second means for subtracting the
estimated echo signal from the noise-suppressed speech signal to generate the difference signal.
[0015]
The background noise suppression device comprises: digital frequency domain processing means for
processing the speech signal to be transmitted to generate time domain filtering coefficients;
first digital time domain processing means for processing the speech signal to be transmitted
according to the filter coefficients to generate the noise-suppressed speech signal in which the
background noise signal is substantially suppressed; and second digital time domain processing
means, very similar to the first time domain processing means, for processing a speech signal
received from a remote terminal according to the filter coefficients to generate the given
speech signal.
[0016]
A second variant of the device comprises an echo canceller comprising: first means for generating
an estimated echo signal based on the audio signal received from the remote terminal and a
difference signal; and second means for subtracting the estimated echo signal to produce said
difference signal.
[0017]
This variant further comprises a background noise suppression device for suppressing a
background noise signal in the difference signal to generate a noise-suppressed audio signal,
the background noise suppression device comprising: digital frequency domain processing means
for generating filtering coefficients; and digital time domain processing means for processing
said difference signal according to said filter coefficients such that said background noise
signal is substantially suppressed.
[0018]
Other features and advantages of the present invention will become more apparent from the
following description when read in conjunction with the corresponding accompanying drawings.
[0019]
Referring to FIG. 1, an apparatus 1 according to the invention for suppressing background noise
signals in audio signals comprises a sampling circuit 1a, a frequency domain processing circuit
100 and a time domain processing circuit 14.
The frequency domain processing circuit 100 includes an energy component extraction circuit
10, a signal noise ratio estimation circuit 11, a gain calculation circuit 12, and a filter coefficient
synthesis circuit 13 connected in cascade.
The time domain processing circuit 14 is a finite impulse response (FIR) time domain filter.
[0020]
The sampling circuit 1a samples the noisy analog signal s (t) at a frequency F = 1 / T.
This signal consists of an audio signal and a background noise signal added to it.
The sampled speech signal s (nT) containing noise generated by the sampling operation is sent to
one input of the energy component extraction circuit 10 in the frequency domain processor 100
and to one input of the FIR time domain filter 14.
FIG. 2 schematically represents the processing performed by the circuit 10 for receiving a speech
signal s (nT) containing noise.
The noisy sampled speech signal s(nT) takes the form of a series of frames of samples; four of
these frames, T(n-2), T(n-1), T(n) and T(n+1), are shown in the first line of the figure.
In the illustrated embodiment, the frame T(n) consists of M = 128 samples e(n)m (m varies
from 0 to 127).
For each frame T(n) associated with a given processing cycle of the method according to the
invention, an integer number K = 3 of sample blocks B(1), B(2), B(3) is generated.
In the illustrated embodiment, these K = 3 sample blocks are formed from the frame T(n) and the
two preceding frames T(n-2) and T(n-1).
The K = 3 sample blocks B(1) to B(3) are interleaved. Each comprises 2M = 256 consecutive samples
in frames T(n-2) to T(n), starting respectively from the samples of rank 0 and rank M/2 = 64 in
frame T(n-2) and from the sample of rank 0 in frame T(n-1).
The blocks B(1), B(2), B(3) are formed by respective groups b(1)i, b(2)i, b(3)i of 2M samples
(i varies from 0 to 2M-1 = 255).
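The block construction described above can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: it assumes that the three blocks start at rank 0 and rank M/2 = 64 of frame T(n-2) and at rank 0 of frame T(n-1), and the function name interleaved_blocks is hypothetical.

```python
M = 128  # samples per frame, as in the illustrated embodiment
K = 3    # number of interleaved blocks per processing cycle


def interleaved_blocks(frame_prev2, frame_prev1, frame_cur):
    """Return K = 3 blocks of 2M samples taken from T(n-2), T(n-1), T(n).

    Assumed layout: block starts are M/2 samples apart, i.e. rank 0 and
    rank M/2 of T(n-2), then rank 0 of T(n-1).
    """
    for frame in (frame_prev2, frame_prev1, frame_cur):
        if len(frame) != M:
            raise ValueError("each frame must hold M samples")
    history = list(frame_prev2) + list(frame_prev1) + list(frame_cur)  # 3M samples
    starts = [k * (M // 2) for k in range(K)]                          # 0, 64, 128
    return [history[s:s + 2 * M] for s in starts]


# Usage: number the 3M samples 0..383 so the block boundaries are easy to read.
samples = list(range(3 * M))
blocks = interleaved_blocks(samples[:M], samples[M:2 * M], samples[2 * M:])
```

With this layout the last block ends exactly on the last sample of frame T(n), so each processing cycle reuses the two previous frames.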
In steps 100a, 100b, 100c, three identical fast Fourier transforms are applied to the respective
sample groups b(1)i, b(2)i, b(3)i (0 ≦ i ≦ 255). A time window operation can be performed
prior to these fast Fourier transform steps. The fast Fourier transforms associate K = 3
frequency domain component groups E(1)i, E(2)i, E(3)i (i varies from 0 to 255) with the K = 3
sample groups b(1)i, b(2)i, b(3)i respectively. The subsequent processing is simplified by
selecting several frequency domain components in each group E(1)i to E(3)i (0 ≦ i ≦ 255) in
step 101 of the figure. This step relies on the property that the fast Fourier transform of a
real signal has pseudo-symmetry. Since the samples forming the speech signal are real, each
frequency domain component group E(k)i (k = 1, 2 or 3) can be written in the following form:
$$ E^{(k)}_i = \{E^{(k)}_0, E^{(k)}_1, \ldots, E^{(k)}_{127}, E^{(k)}_{128}, E^{(k)}_{129} = E^{(k)}_{127}, \ldots, E^{(k)}_{255} = E^{(k)}_1\} \qquad (1) $$
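The symmetry expressed by equation (1) can be checked numerically. The following Python sketch is an illustration only: it uses a direct discrete Fourier transform in place of an FFT (both give the same values) and a hypothetical real-valued test block. For a real signal the mirrored components are complex conjugates of one another, so the equality in (1) holds exactly for their moduli and therefore for their energies.

```python
import cmath
import math


def dft(x):
    """Direct discrete Fourier transform (O(N^2)); an FFT gives the same values."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * i * t / n) for t in range(n))
        for i in range(n)
    ]


N = 256  # 2M samples per block
# Hypothetical real-valued test block: an offset plus two tones.
block = [
    0.5 + math.sin(2 * math.pi * 5 * t / N) + 0.3 * math.cos(2 * math.pi * 40 * t / N)
    for t in range(N)
]
E = dft(block)

# Components 129..255 mirror components 127..1 (complex conjugates, equal moduli).
max_mismatch = max(abs(E[N - i] - E[i].conjugate()) for i in range(1, N // 2))
```

Because of this property, only the components of rank 0 to 128 carry independent information, which is what the selection step exploits.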
[0021]
In step 101, for each group E(1)i, E(2)i, E(3)i (0 ≦ i ≦ 255), several frequency domain
components are selected, namely the components E(k)0 to E(k)128, which form the selected
frequency domain group. These first 129 selected frequency domain components are sufficient to
represent each group E(k)i (0 ≦ i ≦ 255), because the other frequency components in the group,
that is, the subsequent 127 components E(k)129 to E(k)255, can be deduced from them by symmetry.
The frequency domain components E(k)0 to E(k)128 selected in each group are symmetrical to the
corresponding components E(k)129 to E(k)255 among all the frequency domain components
originally generated for the group. Thus, the output of step 101 comprises the frequency domain
components E(k)0 to E(k)128 for each group. In step 102, the 129 frequency domain components
selected in each group are decimated by two, so that only one component in two of each selected
group is retained. Decimating by two in this way selectively discards one component in two; for a
discarded component at a given frequency, this suppresses the interaction effect of the four
frequency domain components at the two respective frequencies on either side of the given
frequency. In practice, the 65 frequency domain components E(k)i that are retained are the
components with i = 1, 3, 5, ..., 127 together with the component E(k)128. Since the frequency
component E(k)0 is the continuous (DC) component, retaining it provides no benefit. To simplify
the notation, these frequency components E(k)i (i = 1, 3, 5, ..., 127, 128) are denoted E(k)j
(0 ≦ j ≦ 64). The result of steps 101 and 102 for each initial component group E(1)i, E(2)i,
E(3)i (0 ≦ i ≦ 255) is therefore a group of selected and decimated components.
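The selection and decimation of steps 101 and 102 amount to simple index bookkeeping, which the following Python sketch illustrates. It assumes, as the text indicates, that the retained ranks are i = 1, 3, ..., 127 plus rank 128, renumbered j = 0 to 64; the function name select_and_decimate is hypothetical.

```python
def select_and_decimate(E):
    """Steps 101 and 102 for one group of 256 frequency domain components.

    Step 101 keeps ranks 0..128; step 102 keeps the odd ranks 1, 3, ..., 127
    and rank 128, dropping the DC component at rank 0.
    """
    if len(E) != 256:
        raise ValueError("expected 2M = 256 components")
    selected = E[:129]                            # step 101: E0 .. E128
    kept_ranks = list(range(1, 128, 2)) + [128]   # step 102: 64 odd ranks + rank 128
    return [selected[i] for i in kept_ranks]


# Usage: feed the ranks themselves to see which ones survive.
ranks = select_and_decimate(list(range(256)))
```

The returned list has 65 entries, so position j in it corresponds to the renumbered component of rank j.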
[0022]
In step 103, an energy average is calculated for each set of K = 3 frequency domain components
E(1)j, E(2)j, E(3)j of the same rank j among the selected and decimated components (j varies
from 0 to 64), which generates 65 average energy components Emj (j varies from 0 to 64). In this
calculation, the modulus of each of the K = 3 frequency domain components of the same rank j is
squared to give K = 3 energy components, and these K = 3 energy components are then averaged.
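The energy averaging of step 103 can be sketched in Python as follows. This is a minimal illustration with a hypothetical function name (average_energy); the usage example uses tiny groups of two components instead of 65 so that the arithmetic is easy to follow.

```python
def average_energy(groups):
    """Step 103: mean of the squared moduli over the K groups, rank by rank."""
    k = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("all groups must have the same number of components")
    return [sum(abs(g[j]) ** 2 for g in groups) / k for j in range(n)]


# Usage with K = 3 small hypothetical groups (2 components each instead of 65).
Em = average_energy([[3 + 4j, 1j], [0j, 2 + 0j], [1 + 0j, 0j]])
# Rank 0: (25 + 0 + 1) / 3; rank 1: (1 + 4 + 0) / 3.
```

Averaging over the K interleaved blocks smooths the energy estimate for each rank before the signal-to-noise ratio is estimated.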
[0023]
Thus, during one processing cycle relating to one frame T(n) of the noisy speech signal s(nT),
the circuit 10 extracts 65 energy components Emj, each representing the energy or power of the
noisy speech signal s(nT) at the respective frequency or in the respective frequency band.
Although steps 100, 101 and 102 described with reference to FIG. 2 all enhance the method of the
present invention, it should be noted that the fast Fourier transform can be reduced to a single
stage applied to the M = 128 samples of the single frame T(n) retained for the processing cycle
in question. Furthermore, the selection step 101 is optional; without it, the subsequent
processing is applied directly to the frequency domain components generated by the FFT.
[0024]
Referring again to FIG. 1, the 65 energy components Emj (0 ≦ j ≦ 64) are sent to a signal input
of the signal-to-noise ratio estimation circuit 11. For each of the 65 extracted energy
components Emj, the circuit 11 estimates the signal-to-noise ratio SNRj between the noisy speech
signal s(nT) and the background noise signal it contains, for the energy component Emj in
question. This signal-to-noise ratio is given by the following equation:
$$ \mathrm{SNR}_j^n = \frac{Em_j^n}{B_j^n} \qquad (2) $$
[0025]
where n is the index of the processing cycle relating to frame T(n) and Bj is the noise energy
component in the energy component Emj.
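Equation (2) is a per-component division, sketched below in Python. The function name estimate_snr and the floor parameter, a guard against division by zero, are assumptions added for illustration and do not come from the source.

```python
def estimate_snr(Em, B, floor=1e-12):
    """Equation (2): SNR_j = Em_j / B_j for each extracted energy component.

    `floor` is an assumed guard against division by zero.
    """
    if len(Em) != len(B):
        raise ValueError("Em and B must have the same length")
    return [e / max(b, floor) for e, b in zip(Em, B)]


# Usage with three hypothetical components instead of 65.
snr = estimate_snr([8.0, 1.0, 0.5], [2.0, 1.0, 1.0])
```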
[0026]
In practice, this signal-to-noise ratio estimation is based on calculating a noise energy
component estimated for each given energy component.
This estimation uses, for example, the ratio of the extracted energy component Em_j^n to the
noise energy component B_j^{n-1} calculated earlier, during the processing cycle preceding the
processing cycle for suppressing the noise signal in the frame T(n). The higher this ratio, the
more strongly a speech signal is present in the frequency domain energy component Em_j^n in
question; in that case, the noise component B_j^{n-1} calculated for the energy component
Em_j^{n-1} is retained as the noise component B_j^n. The lower this ratio, the more closely the
energy component corresponds to the noise signal; in that case, the noise component B_j^n is
updated by calculation. The circuit 11 assigns a signal-to-noise ratio SNRj (0 ≦ j ≦ 64) to
each of the extracted energy components Emj (0 ≦ j ≦ 64) using an estimation algorithm based on
this principle. The circuit 12 calculates a gain Gj for each of the 65 signal-to-noise ratios
SNRj; the gain takes, for example, a value between approximately 0 and 1 that is directly related
to the signal-to-noise ratio SNRj for the corresponding frequency domain component. For a given
frequency domain energy component Emj, the higher the ratio SNRj of the noisy speech signal
s(nT) to the noise signal, the higher the gain Gj, and the lower the ratio SNRj, the lower the
gain Gj. The noise signal component is therefore attenuated for each frequency domain energy
component Emj. The gains Gj are such that weighting the respective energy components Emj by them
gives a discrete spectrum of weighted frequency domain energy components representing the speech
signal s(nT) with the noise signal substantially suppressed.
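The noise tracking and gain calculation can be sketched in Python as follows. The source gives only the principle, so everything specific here is an assumption made for illustration: the decision threshold, the smoothing factor alpha, the Wiener-like gain rule G = 1 - 1/SNR clipped to the range 0 to 1, and the function names. The sketch also assumes that the gain rises with the signal-to-noise ratio, so that noise-dominated components are attenuated.

```python
def update_noise(Em, B_prev, threshold=2.0, alpha=0.9):
    """Track the noise energy B_j from one processing cycle to the next.

    threshold and alpha are illustrative assumptions: the source only says the
    previous estimate is kept when Em_j / B_j(previous) is high and is
    recomputed when that ratio is low.
    """
    B = []
    for e, b in zip(Em, B_prev):
        if e / max(b, 1e-12) > threshold:
            B.append(b)                            # speech likely present: hold
        else:
            B.append(alpha * b + (1 - alpha) * e)  # noise-like: smooth update
    return B


def gain_from_snr(snr):
    """Assumed Wiener-like rule G = 1 - 1/SNR, clipped to the range 0..1."""
    return [min(1.0, max(0.0, 1.0 - 1.0 / max(s, 1e-12))) for s in snr]


# Usage with hypothetical values for a few components.
B = update_noise([8.0, 1.0], [1.0, 2.0])
G = gain_from_snr([4.0, 1.0, 0.5])
```

Other monotonic mappings from the signal-to-noise ratio to a gain between 0 and 1 would fit the description equally well; the clipped rule is used here only because it is short.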
[0027]
One output of the circuit 12 supplies the gains Gj to one input of the filter coefficient
synthesis circuit 13. This circuit 13 comprises a first circuit (not shown) which duplicates the
65 calculated gains Gj using the symmetry of equation (1). This first circuit receives the 65
gains G0, G1, ..., G64 and produces 128 gains, which can be written in the form of gains Gi
(i varies from 0 to 127) as follows:
$$ G_i = \{G_0, G_1, \ldots, G_{63}, G_{64}, G_{65} = G_{63}, \ldots, G_{127} = G_1\} $$
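The duplication of the 65 gains into 128 symmetric gains can be sketched in Python as follows; the function name mirror_gains is hypothetical.

```python
def mirror_gains(G):
    """Extend G0..G64 to 128 gains with G65 = G63, ..., G127 = G1."""
    if len(G) != 65:
        raise ValueError("expected 65 gains")
    return list(G) + [G[128 - i] for i in range(65, 128)]


# Usage: feed the indices themselves to see where each gain is copied.
full = mirror_gains(list(range(65)))
```

The resulting sequence is symmetric about index 64, which is the property that lets the inverse transform return real-valued filter coefficients.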
[0028]
A second circuit (not shown) in the synthesis circuit 13, in the form of an inverse Fourier
transform TFD^-1, synthesizes the 128 coefficients C(nT) of the filter 14 by applying the inverse
Fourier transform to the 128 gains. The 128 coefficients C(nT) are sent to a first input of the
filter 14, namely the control input of the FIR filter. The second input of the filter 14 receives
the noisy speech signal s(nT). The filter 14 convolves the 128 samples of frame T(n) with the
coefficients C(nT) to produce a noise-suppressed frame of 128 samples forming part of the
noise-suppressed speech signal s*(nT). The process applied by the apparatus described above is,
of course, "adaptive" in that the control input of the FIR filter 14 is modified for each frame
T(n) by the processing steps 10, 11, 12, 13 carried out on the samples forming the audio signal
to be processed.
[0029]
To summarize, the main features of the background noise suppression method of the present
invention are, first, the generation of time domain filter coefficients C(nT) by digital
frequency domain processing 100 of the noisy speech signal and, second, the digital time domain
processing 14 of the noisy speech signal s(nT) using the filter coefficients C(nT) to produce a
speech signal s*(nT) in which the noise is substantially suppressed.
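The two features summarized above, synthesis of time domain coefficients from frequency domain gains followed by FIR filtering, can be sketched in Python as follows. This is an illustration under stated assumptions: a direct inverse discrete Fourier transform stands in for the inverse transform of the synthesis circuit, the function names and the history argument are hypothetical, and any windowing or reordering of the coefficients that a practical design might apply is left out because the source does not describe it.

```python
import cmath


def synthesize_coefficients(gains):
    """Inverse DFT of the mirrored gains; the real part is kept (imaginary part ~ 0)."""
    n = len(gains)
    return [
        (sum(gains[i] * cmath.exp(2j * cmath.pi * i * t / n) for i in range(n)) / n).real
        for t in range(n)
    ]


def fir_filter(frame, coeffs, history=()):
    """Direct-form FIR convolution of one frame; `history` holds earlier samples."""
    taps = len(coeffs)
    # Keep exactly taps - 1 past samples, zero-padded when history is short.
    past = ([0.0] * (taps - 1) + list(history))[len(history):]
    padded = past + list(frame)
    return [
        sum(coeffs[k] * padded[n + taps - 1 - k] for k in range(taps))
        for n in range(len(frame))
    ]


# Usage: with all 128 gains equal to 1 the coefficients form a unit impulse,
# so the filter leaves a hypothetical test frame unchanged.
C = synthesize_coefficients([1.0] * 128)
frame = [float(i % 7) for i in range(128)]
out = fir_filter(frame, C)
```

Because the coefficients are recomputed for every frame, the filter adapts from one processing cycle to the next while the audio samples themselves never leave the time domain.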
[0030]
Referring to FIG. 3, a first embodiment of a combined background noise suppression and echo
cancellation system according to the present invention is included in a terminal, typically a
mobile phone, and comprises a microphone 2, a loudspeaker 4, the aforementioned background noise
suppression device 1 according to the invention, a time domain processing circuit 14' and an
echo canceller 3.
The background noise suppression device 1 is the same as the device shown in FIG. 1 and
includes a frequency domain processing device 100 and a time domain processing circuit 14.
The echo canceller comprises a subtractor 30 and a circuit 31 for generating an estimated echo
signal. The microphone 2 receives an audio signal [s(t) + e(t)] to be transmitted, which is
formed by the noisy audio signal s(t) and the echo signal e(t) added to it. The echo signal
results from the acoustic coupling between the loudspeaker 4 and the microphone 2.
As described above, the noise suppression device 1 processes the audio signal to be transmitted
to generate a noise-suppressed transmission voice signal [s*(nT) + e*(nT)], which is sent to the
first input of the subtractor 30; the second input of the subtractor is connected to the output
of the circuit 31. The audio signal r(t) received from the remote terminal is sent to one input
of the loudspeaker 4 and is applied to one input of the circuit 31 via the time domain
processing circuit 14' and the sampling circuit 14a' located upstream of it. An important
feature of the present invention is that the time domain processing circuit 14' is at all times
very similar to the time domain processing circuit 14 in the noise suppression device 1 (FIG. 1).
This feature follows from the fact that the estimated echo of the received signal r(t) generated
by the circuit 31 is subtracted in the subtractor 30 not from the original echo signal e(nT) but
from the processed echo signal e*(nT). The circuit 14' is simply a duplicate of the time domain
processing circuit 14 in the device 1, as indicated by the double-dashed arrow in the figure. The
time domain processing circuit 14' is thus always associated with the same 128 filter
coefficients C(nT) as the circuit 14 in the device 1. The time domain processing circuit 14'
processes the received speech signal r(t) to generate a noise-suppressed received speech signal
r*(nT). In this process, the samples r(nT) of the received signal r(t) are convolved with the
coefficients C(nT) in cycles of 128 samples. The circuit 31 generates an estimate of the
noise-suppressed echo signal e*(nT) from the noise-suppressed received speech signal r*(nT) and
the echo cancellation coefficients w(nT). A difference signal in which the echo signal is
substantially suppressed is therefore obtained at the output of the subtractor 30. The echo
cancellation coefficients w(nT) are obtained from this difference signal.
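The echo cancellation loop of this first embodiment can be sketched in Python as follows. The source states only that the circuit 31 forms an echo estimate from r*(nT) and coefficients w(nT), and that these coefficients are obtained from the difference signal; the normalized LMS (NLMS) update used here, the number of taps, the step size mu, the function name and the synthetic echo path are all assumptions chosen for illustration.

```python
import random


def nlms_echo_canceller(tx, rx, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptive echo estimate of `rx` from `tx`, sample by sample.

    tx: noise-suppressed signal to be transmitted, s*(nT) + e*(nT)
    rx: noise-suppressed received signal r*(nT)
    The NLMS adaptation, taps, mu and eps are assumptions for illustration.
    """
    w = [0.0] * taps
    buf = [0.0] * taps  # most recent rx samples, newest first
    diff = []
    for d, r in zip(tx, rx):
        buf = [r] + buf[:-1]
        echo_est = sum(wi * bi for wi, bi in zip(w, buf))  # role of circuit 31
        err = d - echo_est                                 # role of subtractor 30
        norm = sum(b * b for b in buf) + eps
        w = [wi + mu * err * bi / norm for wi, bi in zip(w, buf)]
        diff.append(err)
    return diff, w


# Usage: hypothetical echo path 0.6 * r[n-1] + 0.2 * r[n-3], no near-end speech.
rng = random.Random(0)
rx = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo = [
    0.6 * (rx[n - 1] if n >= 1 else 0.0) + 0.2 * (rx[n - 3] if n >= 3 else 0.0)
    for n in range(len(rx))
]
diff, w = nlms_echo_canceller(echo, rx)
residual = sum(e * e for e in diff[-100:])
```

Feeding the canceller the noise-suppressed received signal r*(nT) rather than r(nT) mirrors the duplicated circuit 14': the echo estimate then matches the filtered echo e*(nT) present at the subtractor input.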
[0031]
Referring to FIG. 4, a second embodiment of the combined noise suppression and echo
cancellation system of the present invention comprises a microphone 2, a loudspeaker 4, an echo
canceller 3, a frequency domain processing device 100, a time domain processing circuit 14 and a
sampling circuit 5. The device 100 and the circuit 14 are the same as the device and circuit
described above. The echo canceller 3 comprises a subtractor 30 and a circuit 31 for generating
an estimated echo signal. The microphone 2 receives a transmission voice signal [s(t) + e(t)]
comprising a noisy voice signal s(t) and an echo signal e(t) added to it. The echo signal
results from the acoustic coupling between the loudspeaker 4 and the microphone 2. The
transmission voice signal [s(t) + e(t)] is sampled in the sampling circuit 5 to generate a signal
[s(nT) + e(nT)]. The sampled signal is sent to the input of the device 100 and, through the
subtractor 30, to the input of the circuit 14. The speech signal r(t) received from the remote
terminal is sent to the input of the circuit 31 and to the input of the loudspeaker 4. The
circuit 31 responds to the signal r(t) by generating an estimated echo signal that is sent to
the first input of the subtractor 30. The second input of the subtractor 30 receives the
transmission speech signal [s(nT) + e(nT)]. The difference signal at the output of the
subtractor 30 is sent to the circuit 14. In this embodiment, the frequency domain processing
performed by the device 100 is applied to the speech signal [s(nT) + e(nT)], and the time domain
processing of the circuit 14, based on the coefficients C(nT) generated by the device 100, is
applied to the difference signal, that is, the transmission speech signal after echo
cancellation. This embodiment eliminates the duplication of the circuit 14 in the branch
containing the circuit 31 that is indicated for the previous embodiment by the dotted arrow in
the corresponding figure.
[0032]
FIG. 1 is a block diagram of a device according to the invention for suppressing background noise
in audio signals. FIG. 2 schematically represents the processing steps implemented in a circuit
of the device of FIG. 1. FIG. 3 is a block diagram of a first embodiment according to the
invention of a system using the device of FIG. 1 in conjunction with echo cancellation. FIG. 4 is
a block diagram of a second embodiment according to the invention of a system using the device of
FIG. 1 in conjunction with echo cancellation.
Explanation of reference signs
[0033]
2 microphone
3 echo canceller
4 loudspeaker
10 energy component extraction circuit
11 SNR estimation circuit
12 gain calculation circuit
13 filter coefficient synthesis circuit
14 time domain filter