DESCRIPTION JP2008287046

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
An object of the present invention is to remove offensive sounds caused by changes in
background noise while securing sufficient echo and howling suppression. A background noise
interpolation device according to the present invention includes a noise level estimation unit,
a complex plane region determination unit, a noise signal generation unit, a loss compensation
coefficient calculation unit, a multiplication unit, and an addition unit. The complex plane region
determination unit determines, for each frequency band, which region of the complex plane the
frequency domain speech signal belongs to. For each frequency band, when the determination
result is the noise level region, the noise signal generation unit in principle uses the frequency
domain speech signal as the frequency domain noise signal. When the determination result is not
the noise level region, it in principle generates the frequency domain noise signal by suppressing
the real part and the imaginary part so that the signal falls within the noise level region, applying
a higher suppression rate to the real part. The background noise interpolation device uses this
frequency domain noise signal to generate a signal in which the background noise is interpolated.
[Selected figure] Figure 7
Background noise interpolation device, background noise interpolation method
[0001]
BACKGROUND OF THE INVENTION Field of the Invention The present invention relates to a
background noise interpolator for interpolating background noise into a speech signal which has
been given a loss by a speech switch or echo suppressor, and to a background noise interpolation
method.
15-04-2019
1
[0002]
When a speech communication is performed at a communication terminal, in order to prevent
acoustic echo and howling caused by acoustic coupling between a speaker and a microphone, a
technique for giving a loss to a voice signal is often used.
Specifically, voice switches and echo suppressors are used.
[0003]
A voice switch or an echo suppressor adjusts the amount of loss so that a loss is applied to the
voice signal when there is no voice to be transmitted or received, or when there is an echo, and no
loss is applied otherwise. As a result, the background noise contained in the voice signal is
sometimes attenuated and sometimes not, and this change in the background noise is perceived by
the talker as an annoying sound. As a countermeasure, there is a method of controlling the
insertion loss amount of the voice switch or the echo suppressor for each frequency band (Patent
Document 1: JP 2001-94480 A).
[0004]
In the method of Patent Document 1, the trade-off between suppressing echo and howling and
reducing the change in background noise is optimized for each frequency band to reduce the
unpleasant sensation. However, reducing the unpleasant sensation sufficiently places a limit on
the echo and howling suppression performance that can be achieved.
[0005]
An object of the present invention is to remove offensive sounds due to changes in background
noise while securing sufficient echo and howling suppression.
[0006]
The background noise interpolation device according to the present invention receives a speech
signal converted into the frequency domain (hereinafter "frequency domain speech signal"), a
signal obtained by giving a loss to the frequency domain speech signal for each predetermined
frequency band (hereinafter "frequency domain loss imparting signal"), and the loss amount for
each frequency band. From these inputs it generates a signal in which background noise is
interpolated into the frequency domain loss imparting signal.
Specifically, the device includes a noise level estimation unit, a complex plane region
determination unit, a noise signal generation unit, a loss compensation coefficient calculation
unit, a multiplication unit, and an addition unit. The noise level estimation unit estimates, for
each frequency band, the noise level contained in the frequency domain speech signal. The complex
plane region determination unit determines, for each frequency band, from the real part and the
imaginary part of the frequency domain speech signal, which of a plurality of regions on the
complex plane the signal belongs to. Here, one of the regions on the complex plane is a noise level
region, that is, the range judged to be noise on the basis of the noise level. The noise signal
generation unit generates a noise signal in the frequency domain (hereinafter "frequency domain
noise signal") for each frequency band based on the determination result of the complex plane
region determination unit. For example, when the determination result is the noise level region,
the frequency domain speech signal, or a signal obtained by correcting it, is used as the frequency
domain noise signal. When the determination result is not the noise level region, the frequency
domain noise signal may in principle be generated by suppressing the real part and the imaginary
part so that the signal falls within the noise level region, with the real part suppressed at a
higher rate than the imaginary part. The loss compensation coefficient calculation unit calculates,
for each frequency band, a loss compensation coefficient for compensating for the amount of loss.
The multiplication unit multiplies the frequency domain noise signal by the loss compensation
coefficient for each frequency band to generate an interpolation signal. The addition unit
generates, for each frequency band, a signal obtained by adding the interpolation signal to the
frequency domain loss imparting signal.
[0007]
According to the background noise interpolation device of the present invention, interpolation
processing that eliminates discontinuous changes in background noise is performed on a signal for
which sufficient echo and howling suppression has already been secured. Moreover, the
interpolation signal is generated in consideration of which region of the complex plane the
frequency domain speech signal belongs to, so that neither the junction between the noise signal
and the interpolation signal nor the junction between successive interpolation signals becomes
discontinuous. Therefore, it is possible to remove an offensive sound due to a change in
background noise while securing sufficient echo and howling suppression.
[0008]
The principles and embodiments of the present invention are described below. Components that
have the same function and steps that perform the same processing are given the same reference
numbers, and duplicate description is omitted.
[First Embodiment] FIG. 1 is a diagram showing how the background noise interpolation device of
the present invention is used. FIG. 1(A) shows a configuration in which a voice switch is inserted
before the received voice signal is converted into sound by a speaker. FIG. 1(B) shows a
configuration in which a voice switch or the like is inserted between the conversion of sound into
a voice signal by the microphone and the transmission. In the configuration of FIG. 1, the inputs
and outputs of the voice switch 920, the echo suppressor 970, and the voice switch 980 are signals
in the frequency domain. When the input and output of each component are frequency domain signals
in this way, the frequency converters 910, 960, and 965 may be placed at the front and the
frequency inverse converters 930 and 990 at the end. A single communication terminal may also be
provided with the configurations of both FIG. 1(A) and FIG. 1(B).
[0009]
In both FIG. 1(A) and FIG. 1(B), the background noise interpolation device 100 is placed after the
components that give a loss to the voice signal (voice switch 920, echo suppressor 970, voice
switch 980). The inputs to the background noise interpolation device 100 are the voice signal in
the frequency domain (frequency domain speech signal), the voice signal to which a loss has been
given in the frequency domain (frequency domain loss imparting signal), and the loss amount for
each frequency band given by the loss-giving components. When both the echo suppressor 970 and the
voice switch 980 give a loss, as shown in FIG. 1(B), a multiplier 975 or the like is used to supply
the product of their loss amounts. The output of the background noise interpolation device 100 is
a voice signal in which background noise has been interpolated in the frequency domain (frequency
domain post-interpolation speech signal).
[0010]
[Principle] The background noise interpolation device of the present invention generates the
frequency domain noise signal used for interpolation from the frequency domain speech signal. The
following first shows what kind of frequency domain noise signal, when used for interpolation,
effectively reduces offensive sounds, and how that frequency domain noise signal should be
generated.
[0011]
FIG. 2 shows an image of the waveform obtained when the frequency domain speech signal is
converted to the time domain. FIG. 2(A) shows the waveform obtained when the real part of the
frequency domain speech signal is converted to the time domain, and FIG. 2(B) shows the waveform
obtained when the imaginary part is converted to the time domain. Since the real part is
synthesized from cosine functions, it takes non-zero values at both ends of the time frame, as
shown in FIG. 2(A). The imaginary part, on the other hand, is synthesized from sine functions, so
it is zero at both ends of the time frame, as shown in FIG. 2(B). If the waveform becomes
discontinuous where time frames are joined, abnormal noise is likely to occur, degrading the
auditory quality. Therefore, when a noise signal is generated from a speech signal containing
sounds other than noise, the suppression rate of the real part is made higher than that of the
imaginary part. When the frequency domain speech signal is suppressed in this way to generate the
noise signal in the frequency domain (frequency domain noise signal), discontinuities at time
frame boundaries are unlikely to occur, so unpleasant sounds can be reduced effectively.
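The frame-edge behaviour described above can be checked numerically. The following is a minimal sketch, assuming a plain inverse DFT of length `n_fft` and a single conjugate-symmetric bin pair; the function name and the example values are illustrative and do not come from the patent text.

```python
import math

def bin_contributions(re, im, k, n_fft, n):
    """Time-domain value at sample n contributed by the bin pair (k, n_fft - k),
    where bin k holds re + j*im and bin n_fft - k holds its complex conjugate.
    Returns (term from the real part, term from the imaginary part)."""
    angle = 2.0 * math.pi * k * n / n_fft
    real_term = (2.0 / n_fft) * re * math.cos(angle)   # cosine-based, non-zero at n = 0
    imag_term = -(2.0 / n_fft) * im * math.sin(angle)  # sine-based, zero at n = 0
    return real_term, imag_term

n_fft, k = 16, 3
start = bin_contributions(1.0, 1.0, k, n_fft, 0)      # left edge of the frame
end = bin_contributions(1.0, 1.0, k, n_fft, n_fft)    # periodic right edge of the frame
```

At both edges the imaginary-part term vanishes while the real-part term does not, which is why suppressing the real part more strongly reduces the risk of a jump where frames are joined.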
[0012]
FIG. 3 shows the region in which a frequency domain signal is judged to be noise. FIG. 3(A) is an
example in which the noise level region is rectangular, and FIG. 3(B) is an example in which it is
circular. The background noise level (noise level) varies with the usage environment and also
differs from one frequency band to another. The noise level is therefore estimated for each
frequency band from the characteristics of the level fluctuation of the frequency domain speech
signal. For example, a local minimum of the temporal variation of the frequency domain speech
signal is found. This value is then corrected, if necessary, to determine the noise level region.
The correction adjusts the region judged to be at the noise level to allow for errors that depend
on the estimation method. One such correction is to set the noise level to 1.5 times the estimated
value.
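One way to read this estimation step is as minimum tracking over a short history of frames. The sketch below assumes that reading: it takes the smallest magnitude seen in each band over the supplied frames and applies the 1.5 correction mentioned above. The function name, the history layout, and the choice of a plain windowed minimum are assumptions, not details given in the text.

```python
def estimate_noise_level(magnitude_history, correction=1.5):
    """Per-band noise level from recent frames.

    magnitude_history: list of frames, each a list of per-band magnitudes.
    Returns correction * (minimum magnitude seen in each band)."""
    n_bands = len(magnitude_history[0])
    return [
        correction * min(frame[band] for frame in magnitude_history)
        for band in range(n_bands)
    ]

# Three frames, two frequency bands (illustrative numbers).
history = [[0.2, 1.0], [0.1, 3.0], [0.4, 0.5]]
levels = estimate_noise_level(history)
```

A longer history follows slow changes in the environment less quickly but is less likely to mistake a quiet speech segment for noise.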
[0013]
FIG. 4 is a diagram showing how a frequency domain noise signal is generated from a frequency
domain speech signal. As shown in FIG. 4(A), when the frequency domain speech signal is inside the
noise level region, it is used as it is as the frequency domain noise signal. Alternatively, for
example, a value obtained by multiplying both the real part and the imaginary part of the frequency
domain speech signal by 0.9 may be used as the frequency domain noise signal. As shown in FIG.
4(B), when the frequency domain speech signal is outside the noise level region, it is suppressed
(mapped) so that it falls inside the noise level region, with the suppression rate of the real part
made larger than that of the imaginary part, and the suppressed signal is used as the frequency
domain noise signal. The method of suppression may be chosen by the designer as appropriate, for
example by using a suppression (mapping) function. However, as mentioned above, the principle is
to make the suppression rate of the real part larger than that of the imaginary part.
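Since the text leaves the mapping to the designer, the following is only one possible sketch. It assumes a rectangular noise level region of half-width `noise_level`, scales both parts so the larger one lands on the region boundary, and then applies an extra factor `real_extra` (a hypothetical parameter) to the real part so that its suppression rate is always the larger of the two.

```python
def to_noise_signal(x, noise_level, real_extra=0.5):
    """Map one complex bin into a rectangular noise level region.

    Inside the region the bin is passed through unchanged. Outside, both parts
    are scaled into the region and the real part is reduced further by
    real_extra (< 1), so the real part is suppressed more than the imaginary part."""
    re, im = x.real, x.imag
    peak = max(abs(re), abs(im))
    if peak <= noise_level:
        return x
    scale = noise_level / peak            # common scale that reaches the region boundary
    return complex(re * scale * real_extra, im * scale)

inside = to_noise_signal(complex(0.3, 0.2), 1.0)
outside = to_noise_signal(complex(4.0, 2.0), 1.0)
```

For the second example the real part drops from 4 to 0.5 and the imaginary part from 2 to 0.5, so the real part is reduced by the larger fraction.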
[0014]
FIG. 5 is a diagram illustrating an example of the suppression method. FIG. 5(A) divides the
outside of the noise level region (the region marked (1) in the drawing) into three regions. (2) is
a second region in which the absolute values of both the real part and the imaginary part are
large. (3) is a third region in which the absolute value of the real part is large but that of the
imaginary part is small. (4) is a fourth region in which the absolute value of the imaginary part
is large but that of the real part is small. When the frequency domain speech signal belongs to the
noise level region, it is used as the frequency domain noise signal, as described above. When it
belongs to the second region, a frequency domain noise signal that falls within the noise level
region is generated, for example with a suppression (mapping) function, with the suppression rate
of the real part made larger than that of the imaginary part. When it belongs to the third region,
the imaginary part is taken to be the imaginary part of the frequency domain speech signal, or a
corrected version of it, and the real part is suppressed into the noise level region to generate
the frequency domain noise signal. The correction here is, for example, multiplying the imaginary
part by 0.9 (a process that adjusts the value of the imaginary part). When the signal belongs to
the fourth region, the imaginary part is considered to contain much information other than
background noise, so neither the imaginary part nor the real part is actively used, and a frequency
domain noise signal whose real and imaginary parts are both sufficiently small is generated. In
this case both parts are sufficiently small (the background noise in that frequency band is almost
eliminated), so the suppression rate of the real part does not necessarily have to be larger than
that of the imaginary part as the principle above would require. FIG. 6 shows another way of
dividing the plane into the noise level region, the second region, the third region, and the fourth
region. As this shows, there are various ways of dividing the plane.
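The four-way division can be written as a small classifier. This sketch assumes a rectangular noise level region and uses the noise level itself as the boundary between "large" and "small" absolute values; the function name and the integer region codes are illustrative.

```python
def classify_region(x, noise_level):
    """Return 1 (noise level region), 2, 3 or 4 for one complex bin,
    following the region numbering used in the description of FIG. 5."""
    big_re = abs(x.real) > noise_level
    big_im = abs(x.imag) > noise_level
    if not big_re and not big_im:
        return 1          # both parts within the noise level
    if big_re and big_im:
        return 2          # both parts large
    if big_re:
        return 3          # real part large, imaginary part small
    return 4              # imaginary part large, real part small

regions = [
    classify_region(complex(0.5, 0.5), 1.0),
    classify_region(complex(4.0, 2.0), 1.0),
    classify_region(complex(3.0, 0.5), 1.0),
    classify_region(complex(0.5, 4.0), 1.0),
]
```

A circular noise level region, or a division like the one in FIG. 6, would change only the boundary tests, not the structure of the classifier.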
[0015]
[Functional Configuration and Processing Flow] FIG. 7 shows a functional configuration example of
the background noise interpolation device of the first embodiment. FIG. 8 shows an example of
the internal configuration of the noise signal generation unit 130. FIG. 9 is a diagram showing an
example of the processing flow of the background noise interpolation device of the first
embodiment. The background noise interpolation device 100 includes a noise level estimation
unit 110, a complex plane region determination unit 120, a noise signal generation unit 130, a loss
compensation coefficient calculation unit 140, a multiplication unit 150, and an addition unit
160. The noise signal generation unit 130 includes a noise level area processing unit 131 and a
noise level area outside processing unit 135. As shown in FIG. 1, the background noise
interpolation device 100 receives the frequency domain speech signal (the speech signal converted
to the frequency domain), the frequency domain loss imparting signal (the frequency domain speech
signal to which a loss has been given for each predetermined frequency band), and the loss amount
for each frequency band.
[0016]
The noise level estimation unit 110 estimates the noise level included in the frequency domain
speech signal for each frequency band (S110). As a method of estimating the noise level, for
example, there is a method shown in the explanation of the principle.
[0017]
The complex plane region determination unit 120 determines, for each frequency band, from the
real part and the imaginary part of the frequency domain speech signal, which of a plurality of
predetermined regions on the complex plane the signal belongs to (S120). Here, one of the regions
on the complex plane is a noise level region, that is, the range judged to be noise on the basis
of the noise level.
[0018]
The noise signal generation unit 130 generates a frequency domain noise signal (a noise signal in
the frequency domain) for each frequency band based on the determination result of the complex
plane region determination unit 120 (S130). For example, it first checks whether the determination
result is the noise level region (S1301). When the determination result is the noise level region,
the noise level area processing unit 131 uses the frequency domain speech signal as the frequency
domain noise signal (S131). A corrected signal, obtained by multiplying both the real part and the
imaginary part of the frequency domain speech signal by 0.9, may be used instead. When the
determination result is not the noise level region, the noise level area outside processing unit
135 in principle generates the frequency domain noise signal by suppressing the real part and the
imaginary part so that the signal falls within the noise level region, with the real part of the
frequency domain speech signal suppressed at a higher rate (S135).
[0019]
The loss compensation coefficient calculation unit 140 calculates, for each frequency band, a loss
compensation coefficient for compensating for the amount of loss (S140). For example, if the
amount of loss for each frequency band ω is given as α(ω) (where 0 ≤ α(ω) ≤ 1), then
1 − α(ω) is calculated as the loss compensation coefficient for each frequency band.
[0020]
The multiplying unit 150 multiplies the frequency domain noise signal by the loss compensation
coefficient for each frequency band to generate an interpolation signal (S150). The addition unit
160 generates and outputs a frequency domain post-interpolation audio signal obtained by
adding the interpolation signal to the frequency domain loss imparting signal for each frequency
band (S160).
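Steps S140 to S160 amount to one multiply and one add per frequency band. The sketch below assumes that the frequency domain loss imparting signal equals α(ω) times the frequency domain speech signal, which the text implies through the range of α(ω) and the coefficient 1 − α(ω) but does not state outright. All names are illustrative.

```python
def interpolate_bands(loss_signals, noise_signals, alphas):
    """Per-band background noise interpolation.

    loss_signals:  frequency domain loss imparting signal, one complex value per band
    noise_signals: frequency domain noise signal, one complex value per band
    alphas:        loss amount per band, each in [0, 1]"""
    result = []
    for loss_bin, noise_bin, alpha in zip(loss_signals, noise_signals, alphas):
        compensation = 1.0 - alpha                   # S140: loss compensation coefficient
        interpolation = compensation * noise_bin     # S150: interpolation signal
        result.append(loss_bin + interpolation)      # S160: add to the loss imparting signal
    return result

# A band that contains only background noise: the noise signal is the speech bin itself.
speech_bin = complex(0.2, 0.1)
alpha = 0.25
restored = interpolate_bands([alpha * speech_bin], [speech_bin], [alpha])
```

Under that assumption a noise-only band comes out at its original level whatever loss was applied, which is the continuity the device is aiming for.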
[0021]
According to the background noise interpolation device 100 of the first embodiment, interpolation
processing that eliminates discontinuous changes in background noise is performed on a signal for
which sufficient echo and howling suppression has already been secured. Moreover, the
interpolation signal is generated in consideration of which region of the complex plane the
frequency domain speech signal belongs to, so that neither the junction between the noise signal
and the interpolation signal nor the junction between successive interpolation signals becomes
discontinuous. Therefore, it is possible to remove an offensive sound due to a change in
background noise while securing sufficient echo and howling suppression.
[Modification Example] This modification example describes the case where the outside of the noise
level region is divided into three regions, as shown in FIG. 5. In this case, as shown in FIG. 8,
the noise level area outside processing unit 135 includes a second region processing unit 132, a
third region processing unit 133, and a fourth region processing unit 134. In the processing flow
as well, as shown in FIG. 9, the processing for the second region (S132), the third region (S133),
and the fourth region (S134) is contained in the processing outside the noise level region (S135).
[0022]
When it is determined that the frequency domain speech signal belongs to the second region, in
which the absolute values of both the real part and the imaginary part are larger than the noise
level, the second region processing unit 132 generates a frequency domain noise signal that falls
within the noise level region, with the suppression rate of the real part made larger than that of
the imaginary part (S132). When it is determined that the frequency domain speech signal belongs to
the third region, in which the absolute value of the real part is larger than the noise level but
that of the imaginary part is smaller, the third region processing unit 133 takes the imaginary
part to be the imaginary part of the frequency domain speech signal, or a corrected version of it,
and suppresses the real part into the noise level region to generate the frequency domain noise
signal (S133). The correction here is, for example, multiplying the imaginary part by 0.9 (a
process that adjusts the value of the imaginary part). When it is determined that the frequency
domain speech signal belongs to the fourth region, in which the absolute value of the imaginary
part is larger than the noise level but that of the real part is smaller, the fourth region
processing unit 134 generates a frequency domain noise signal whose real and imaginary parts are
both sufficiently small (S134). In steps S132 and S133, the frequency domain noise signal within
the noise level region is always generated with the suppression rate of the real part larger than
that of the imaginary part. In step S134, however, the absolute values of both the imaginary part
and the real part are made small, so the suppression rate of the real part does not necessarily
have to be larger than that of the imaginary part.
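Steps S131 to S134 can be put together as a single dispatch on the region. This is a sketch under several assumptions: a rectangular noise level region, the noise level as the large/small boundary, and hypothetical constants (`real_extra`, `imag_correction`, `small`) for the parts the text leaves to the designer.

```python
import math

def generate_noise_signal(x, noise_level, real_extra=0.5, imag_correction=0.9, small=0.01):
    """Frequency domain noise signal for one bin, following steps S131 to S134."""
    re, im = x.real, x.imag
    big_re = abs(re) > noise_level
    big_im = abs(im) > noise_level
    if not big_re and not big_im:
        return x                                    # S131: use the speech bin as it is
    if big_re and big_im:
        scale = noise_level / max(abs(re), abs(im))
        # S132: scale into the region, then reduce the real part further
        return complex(re * scale * real_extra, im * scale)
    if big_re:
        # S133: keep a corrected imaginary part, pull the real part inside the region
        return complex(math.copysign(real_extra * noise_level, re), im * imag_correction)
    # S134: make both parts sufficiently small; |im| > noise_level here, so no division by zero
    return x * (small * noise_level / abs(im))

out1 = generate_noise_signal(complex(0.5, 0.5), 1.0)
out2 = generate_noise_signal(complex(4.0, 2.0), 1.0)
out3 = generate_noise_signal(complex(3.0, 0.5), 1.0)
out4 = generate_noise_signal(complex(0.5, 4.0), 1.0)
```

In the second and third branches the real part is always reduced by a larger fraction than the imaginary part; the fourth branch scales both parts by the same small factor, which the text explicitly allows.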
[0023]
The other configuration and processing are the same as in the first embodiment, so the same
effect is obtained.
[Second Embodiment] The first embodiment covered the case where the inputs and outputs of the
voice switch and the echo suppressor are signals in the frequency domain. However, the inputs and
outputs of the voice switch or the echo suppressor may instead be time domain signals. FIG. 10
shows how the background noise interpolation device of the present invention is used in such a
case. FIG. 10(A) shows a configuration in which a voice switch is inserted before the received
voice signal is converted into sound by the speaker. FIG. 10(B) shows a configuration in which a
voice switch or the like is inserted between the conversion of sound into a voice signal by the
microphone and the transmission. A single communication terminal may have the configurations of
both FIG. 10(A) and FIG. 10(B).
[0024]
In both FIG. 10(A) and FIG. 10(B), the background noise interpolation device 200 is placed after
the components that give a loss to the voice signal (voice switch 820, echo suppressor 870, voice
switch 880). The inputs to the background noise interpolation device 200 are the voice signal in
the time domain (speech signal), the voice signal to which a loss has been given in the time
domain (loss imparting signal), and the loss amount for each frequency band given by the
loss-giving components. The output of the background noise interpolation device 200 is a voice
signal in which background noise has been interpolated in the time domain (post-interpolation
speech signal).
[0025]
FIG. 11 shows an example of the functional configuration of the background noise interpolation
device according to the second embodiment. FIG. 12 shows the processing flow of the background
noise interpolation device of the second embodiment. The background noise interpolation device 200
includes the background noise interpolation device 100 described in the first embodiment, a loss
imparting signal frequency conversion unit 210, a speech signal frequency conversion unit 220, and
a frequency inverse conversion unit 230.
[0026]
The loss imparting signal frequency conversion unit 210 converts the voice signal to which the
loss has been given into a frequency domain loss imparting signal (S210). The speech signal
frequency conversion unit 220 converts the speech signal into a frequency domain speech signal
(S220). After steps S210 and S220, the inputs to the background noise interpolation device 100 are
the same as in the first embodiment (frequency domain speech signal, frequency domain loss
imparting signal, loss amount). The background noise interpolation device 100 outputs the
frequency domain post-interpolation speech signal by the method shown in the first embodiment
(S100). The frequency inverse conversion unit 230 applies the inverse frequency transform to the
frequency domain post-interpolation speech signal and outputs the post-interpolation speech signal
(S230).
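The flow S210, S220, S100, S230 can be sketched for a single frame as follows. The text does not specify the transform, so this sketch assumes a plain DFT with no windowing or overlap, and it takes the noise signal generator as a caller-supplied function `make_noise` (a hypothetical parameter); all names are illustrative.

```python
import cmath

def dft(samples):
    """Plain discrete Fourier transform of one frame."""
    n = len(samples)
    return [sum(samples[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT, keeping the real part of each output sample."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)).real / n
            for t in range(n)]

def interpolate_time_domain(speech, loss_signal, alphas, make_noise):
    speech_spec = dft(speech)                       # S220
    loss_spec = dft(loss_signal)                    # S210
    out_spec = [loss_bin + (1.0 - alpha) * make_noise(speech_bin)   # S100
                for speech_bin, loss_bin, alpha in zip(speech_spec, loss_spec, alphas)]
    return idft(out_spec)                           # S230

# Noise-only frame with a uniform loss of 0.5; the identity stands in for the noise generator.
speech = [0.1, -0.2, 0.05, 0.0]
loss_signal = [0.5 * v for v in speech]
restored = interpolate_time_domain(speech, loss_signal, [0.5] * 4, lambda s: s)
```

With these inputs the output frame matches the original frame up to rounding, since every band is treated as background noise and restored in full.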
[0027]
Also by the background noise interpolation device of the second embodiment, as in the first
embodiment, it is possible to remove an offensive sound due to a change in the background noise
while securing sufficient echo and howling suppression.
[0028]
FIG. 13 shows an example of the functional configuration of a computer.
The background noise interpolation device of the present invention can be implemented by loading
a program that executes the processing of each component into the recording unit 2020 of the
computer and operating the processing unit 2010, the input unit 2030, the output unit 2040, and so
on. The program can be loaded into the computer either by recording it on a computer-readable
recording medium and reading it from that medium, or by reading a program stored on a server or
the like into the computer through a telecommunication line or the like.
[0029]
FIG. 1 is a diagram showing how the background noise interpolation device of the first embodiment is used.
FIG. 2 is a diagram showing an image of the waveform obtained when a frequency domain speech signal is converted to the time domain.
FIG. 3 is a diagram showing the region in which a frequency domain signal is judged to be at the noise level.
FIG. 4 is a diagram showing how a frequency domain noise signal is generated from a frequency domain speech signal.
FIG. 5 is a diagram showing an example of the suppression method.
FIG. 6 is a diagram showing another division into the noise level region, the second region, the third region, and the fourth region.
FIG. 7 is a block diagram showing an example of the functional configuration of the background noise interpolation device of the first embodiment.
FIG. 8 is a diagram showing an example of the internal configuration of the noise signal generation unit 130.
FIG. 9 is a diagram showing an example of the processing flow of the background noise interpolation device of the first embodiment.
FIG. 10 is a diagram showing how the background noise interpolation device of the second embodiment is used.
FIG. 11 is a diagram showing an example of the functional configuration of the background noise interpolation device of the second embodiment.
FIG. 12 is a diagram showing the processing flow of the background noise interpolation device of the second embodiment.
FIG. 13 is a diagram showing an example of the functional configuration of a computer.