DESCRIPTION JP2003324787

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
[0001]
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to an echo suppression method, apparatus, and program that can be
used, for example, in a speakerphone.
[0002]
2. Description of the Related Art
With the spread of voice conferencing, there is demand for speech communication systems that
offer good two-way simultaneous call performance and little echo. An echo suppressor meets this
requirement. FIG. 4 is a block diagram showing part of a conventional echo suppressor. The speech
communication system comprises a receiving system, which runs from the receiving end that receives
the reception signal x(k) to the speaker 2, and a transmitting system, which runs from the
microphone 3 to the transmitting end 4. The reception signal x(k) is supplied to the pseudo echo
path 6, and the pseudo echo signal y^(k) from the pseudo echo path 6 is subtracted from the echo
signal y(k) by the subtraction means 7. The echo signal y(k) is thereby cancelled. However, until
the impulse response h^(k) of the pseudo echo path 6 approaches the impulse response h(k) of the
true echo path, acoustic echo is returned to the other party. For this reason, when the estimation
accuracy of the pseudo echo path is insufficient, the echo suppressor inserts a loss into the
reception signal from the other party, the transmission signal from the near end, or both, to
suppress the echo. In particular, a method called a voice switch is widely used; it suppresses
echo, and the howling that the echo causes, by attenuating either or both of the reception signal
and the transmission signal.
15-04-2019
1
[0003]
FIG. 5 shows a block diagram in which suppression means that insert a loss to attenuate the
signals are used. Parts in common with FIG. 4 are assigned the same numbers. The suppression means
8 and 9 insert a loss to attenuate the reception and transmission signals, respectively. The
amount of insertion loss is determined by the energy transferred along the path from the speaker 2
to the microphone 3 and by the amount of echo reduction achieved by the adaptive filter. In
general, the energy transferred from the speaker 2 to the microphone 3 cannot be determined
correctly unless the transmission path is estimated with high accuracy, so an amount of loss
considered sufficient for echo reduction is set initially. For example, the signal energy may be
reduced to one thousandth.
[0004]
If there is only a reception signal from the other party and no transmission signal from the
near-end talker, a loss can be inserted on the transmission signal side to attenuate the
transmission signal and reduce the echo returned to the other party. Conversely, if there is no
reception signal from the other party but only a transmission signal from the near-end talker, a
loss is inserted on the reception signal side. If only one side of the speech communication system
is considered, the insertion loss on the reception signal side may seem unnecessary. However,
assuming that the other side also forms a speech communication system, a loss must be inserted on
the reception signal side in order to reduce the loop gain of the two-point speech communication
system.
[0005]
[Problems to be solved by the invention]
When the estimation accuracy of the pseudo echo path is insufficient and the loss to be inserted
is large, and a two-way simultaneous call occurs (speech from the other party and from the
near-end loudspeaker system at the same time), control is needed to pass one of the two voices
preferentially, that is, with as little attenuation as possible. To pass one voice preferentially,
a loss must be inserted into the opposite signal to attenuate it. In other words, to pass the
transmission signal preferentially (without attenuating it), the reception signal must be
attenuated, and to pass the reception signal preferentially (without attenuating it), the
transmission signal must be attenuated.
[0006]
In general, whether to pass the transmission or the reception preferentially is determined in
advance as a specification of the acoustic echo canceller. For example, with transmission
priority, if the transmission signal is judged to be voiced, a loss is inserted on the reception
signal side and the reception signal is attenuated, regardless of the presence or absence of a
reception signal. With reception priority, if the reception signal is judged to be voiced, a loss
is inserted on the transmission signal side and the transmission signal is attenuated, regardless
of the presence or absence of a transmission signal. For example, when the power of the reception
signal calculated over a fixed time width exceeds a predetermined value, the reception signal is
judged to be present; otherwise it is judged to be absent. Let Px(k) be the reception signal power
calculated over a fixed time width at time k, and let x(k) be the reception signal at time k. The
power calculation and the judgment of the presence or absence of the reception signal are then
given, for example, as follows.
[0007]
  Px(k) = ptx * Px(k-1) + (1 - ptx) * x(k) * x(k)   (1)
  Px(k) >  Thr : reception voiced                   (2)
  Px(k) <= Thr : reception silent                   (3)
Here, ptx is a time constant that determines the time width (for example, ptx = 0.98), and Thr is
a threshold for judging the presence or absence of a reception signal. When the power over the
fixed time width exceeds the predetermined threshold, the reception signal is judged to be voiced.
The presence or absence of the transmission signal may be judged, for example, from the difference
between the power of the signal picked up by the microphone and the power of the signal after echo
cancellation by the adaptive filter of the acoustic echo canceller, each calculated over a fixed
time width. When the difference does not exceed a predetermined value, transmission is judged to
be silent; when the difference exceeds the predetermined value, transmission is judged to be
voiced.
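Equations (1) to (3) can be sketched in Python as follows. This is a minimal illustration rather
than the patented implementation: the function names, the zero initial power, and the sample
values are assumptions introduced here.

```python
def smoothed_power(samples, p, initial=0.0):
    """Recursive power estimate P(k) = p * P(k-1) + (1 - p) * x(k)^2, as in equation (1).

    The initial power of 0.0 is an assumption; the text does not specify it.
    """
    power = initial
    trace = []
    for x in samples:
        power = p * power + (1.0 - p) * x * x
        trace.append(power)
    return trace


def reception_voiced(power, thr):
    """Equations (2) and (3): voiced only when the power strictly exceeds Thr."""
    return power > thr


# With p = 0 the estimate reduces to the squared instantaneous value.
instantaneous = smoothed_power([0.5, -2.0], p=0.0)
# With p = 0.5 each new sample contributes half of the estimate.
smoothed = smoothed_power([1.0, 1.0], p=0.5)
```

A larger time constant makes the estimate respond more slowly, so a short burst raises Px(k) less
but keeps it raised for longer.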
[0008]
Here, let Py(k) be the power of the signal picked up by the microphone, calculated over a fixed
time width at time k, and let Pe(k) be the power of the error signal after echo cancellation by
the adaptive filter, calculated over a fixed time width. Let y(k) and e(k) be the microphone
signal and the error signal, respectively. The power calculation and the judgment of the presence
or absence of the transmission signal are then given, for example, as follows.
  Py(k) = pty * Py(k-1) + (1 - pty) * y(k) * y(k)   (4)
  Pe(k) = pte * Pe(k-1) + (1 - pte) * e(k) * e(k)   (5)
  Py(k) - Pe(k) >  Ths : transmission voiced        (6)
  Py(k) - Pe(k) <= Ths : transmission silent        (7)
Here, pty and pte are time constants that determine the time width (for example, pty = pte =
0.98), and Ths is a threshold for judging the presence or absence of a transmission signal. When
the difference between the power picked up by the microphone and the power after echo reduction by
the adaptive filter, each calculated over a fixed time width, exceeds the predetermined threshold,
the transmission signal is judged to be voiced.
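The transmission judgment of equations (4) to (7) can be sketched in the same way. The function
name, the zero initial powers, and the default threshold value are assumptions made for
illustration; the decision rule itself is taken directly from equations (6) and (7).

```python
def detect_transmission(mic, err, pty=0.98, pte=0.98, ths=0.01):
    """Per-sample transmission judgment following equations (4) to (7).

    mic: microphone samples y(k); err: error samples e(k) after echo cancellation.
    ths=0.01 is an illustrative default, not a value from the text.
    """
    py = 0.0  # Py(k-1), assumed to start at zero
    pe = 0.0  # Pe(k-1), assumed to start at zero
    flags = []
    for y, e in zip(mic, err):
        py = pty * py + (1.0 - pty) * y * y  # equation (4)
        pe = pte * pe + (1.0 - pte) * e * e  # equation (5)
        flags.append(py - pe > ths)          # equations (6) and (7)
    return flags


# Instantaneous powers (pty = pte = 0) with a power difference above the threshold.
voiced = detect_transmission([1.0, 1.0], [0.0, 0.0], pty=0.0, pte=0.0, ths=0.5)
# Identical microphone and error signals give a zero difference.
silent = detect_transmission([1.0, 1.0], [1.0, 1.0], pty=0.0, pte=0.0, ths=0.5)
```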
[0009]
In practice, it is rare for both sides to talk simultaneously for a long time; in other words,
transmission voice and reception voice rarely continue at the same time. When one party starts
speaking, the participants on the other side usually listen. In that situation, if, for example,
the acoustic echo canceller is specified as transmission priority, sound generated on the
listening side becomes a problem. When a so-called abnormal sound is picked up by the microphone,
that is, a sound the listener does not intend to send, or a very short sound the listener makes
unconsciously, such as tapping a desk with a pen, the transmission is misjudged as voiced, a loss
is inserted into the reception signal, and the reception signal is attenuated.
[0010]
As a result, every time a sound that the listener does not intend to send to the other party is
picked up by the microphone, a loss is inserted into the reception signal reproduced from the
speaker, and the sound quality of the reception signal output from the speaker is degraded.
Setting reception priority keeps the reception voice from being interrupted, at the cost that the
transmission signal cannot break in at all. However, even with reception priority, the reception
voice may still sound interrupted. Suppose that a teleconference is held between point A and point
B, and that both points use hands-free loudspeaker call systems. Assume also that each voice
switch uses the same value for the attenuation of the reception signal and the attenuation of the
transmission signal; the attenuation by the voice switch at point A can then be defined as La and
that at point B as Lb. The attenuation of a voice switch varies over time depending on the
estimated energy of the echo transmission path and the amount of echo reduction by the adaptive
filter. If the attenuations La and Lb are classified as large or small according to whether they
are above or below a predetermined threshold ThL (e.g., 0.25), there are four combinations.
[0011]
1. A: large, B: large
2. A: large, B: small
3. A: small, B: large
4. A: small, B: small
Now suppose that the voice switches at point A and point B are both set to reception priority and
that the attenuations are in combination 2, that is, the attenuation at point A is large and the
attenuation at point B is small. The voice uttered at point A is emitted from the speaker at point
B, and the microphone at point B picks up the echo, but the echo at point B is sufficiently
reduced by the adaptive filter. Because the attenuation by the voice switch at point B is small,
an abnormal sound generated at point B is sent to point A without being sufficiently suppressed
by the voice switch. Since the voice switch at point A is set to reception priority, the same
amount of attenuation as would be used to attenuate the reception signal is applied to the
transmission signal. When the transmission signal is interrupted at point A, it sounds at point B
as if the reception signal were interrupted.
[0012]
Because the attenuation at point A and the attenuation at point B vary adaptively with the amount
of echo reduction by the adaptive filter and the estimation accuracy of the echo transmission
path, all four combinations occur. The problem therefore cannot be solved simply by predicting the
pattern in advance and setting reception priority or transmission priority. For example, in the
case described above, if point A is set to transmission priority and point B is set to reception
priority, the reception voice at point B is no longer interrupted, but when the attenuations are
in combination 3, the reception voice at point A is interrupted. A computer simulation was
performed to clarify the drawbacks of the prior art. As conditions of the simulation, a reception
signal from the other party is present for 10 seconds, and a transmission signal from the near-end
listener starts 5 seconds after the start of the simulation and is present for the following 5
seconds. Three abnormal sounds are emitted by the near-end listener and picked up by the
microphone 3 within the first 5 seconds of the simulation. As the specification of the acoustic
echo canceller, the "transmission priority" specification is used, in which transmission signals
are passed preferentially.
[0013]
The simulation results are shown in FIG. 6 and FIG. 7. The reception signal strength Pr(k) and the
transmission signal strength Ps(k) were calculated using equations (1), (4), and (5). FIG. 6 shows
the case where the time constants used in equations (1), (4), and (5) are ptx = pty = pte = 0, and
FIG. 7 shows the case where ptx = pty = pte = 0.999. That is, in FIG. 6 the signal strength is
given by the square of the instantaneous value of the signal, while FIG. 7 corresponds to
calculating the signal strength over a time width of 999 ms. The solid line in FIG. 6A is the
power of the reception signal from the other party, and the broken line is the power of the
reception signal after attenuation by the reception signal suppression means 8, which is the sound
actually reproduced from the speaker 2.
[0014]
FIG. 6B shows the sound picked up by the microphone 3. The near-end listener starts speaking at 5
seconds. Before that, at 0.63 seconds, 1.5 seconds, and 3.1 seconds, abnormal sounds not intended
by the near-end listener are picked up by the microphone 3. In the latter 5 seconds of FIG. 6B,
the solid line is the power of the signal picked up by the microphone 3, and the broken line is
the power of the transmission signal after attenuation by the transmission signal suppression
means 9. The solid and broken lines in FIG. 7 have the same meaning as in FIG. 6. As FIGS. 6 and 7
show, with the prior art the occurrence of an abnormal sound causes attenuation of the reception
signal and interruption of the sound, whether the time width for calculating the signal strength
is shortened or lengthened.
[0015]
SUMMARY OF THE INVENTION
The object of the present invention is to provide an echo suppression method, apparatus, and
program in which the sound is not interrupted even if an abnormal sound occurs.
[0016]
The present invention proposes an echo suppression method of the following kind. Reception signals
of N channels (N is an integer of 2 or more) are passed through N pseudo echo paths, respectively,
to obtain N pseudo echo signals. The sum of the N pseudo echo signals is used as a synthetic
pseudo echo signal, the N reception signals are reproduced simultaneously, a transmission signal
is obtained by subtracting the synthetic pseudo echo signal from the picked-up echo signal, and
the N pseudo echo paths are estimated sequentially using the N reception signals and the
transmission signal. In this method, the strength of the echo signal is measured by echo signal
strength measurement means, and the strength of the transmission signal is measured by
transmission signal strength measurement means. The reception signals of the N channels are
suppressed by reception signal suppression means, and the suppressed outputs are used as the
signals sent to the pseudo echo paths and the echo paths. The transmission signal is suppressed by
transmission signal suppression means. Transmission judgment means judges whether transmission is
in progress, taking the measured echo signal strength and the measured transmission signal
strength as inputs, and transmission state counting means counts the number of times transmission
is judged to be in progress. Reception judgment means judges whether reception is in progress
using the measured reception signal strength, and reception state counting means counts the number
of times reception is judged to be in progress. The amounts of suppression used in the reception
signal suppression means and the transmission signal suppression means are determined with the
counted number of transmission states and the counted number of reception states as inputs.
[0017]
The present invention further proposes an echo suppressor comprising: echo signal strength
measurement means for measuring the strength of the echo signal; transmission signal strength
measurement means for measuring the strength of the transmission signal; reception signal
suppression means for suppressing the N-channel reception signals and sending the outputs to the
pseudo echo paths and the echo paths; transmission signal suppression means for suppressing the
transmission signal; transmission judgment means for judging whether transmission is in progress,
taking as inputs the echo signal strength measured by the echo signal strength measurement means
and the transmission signal strength measured by the transmission signal strength measurement
means; transmission state counting means for counting the number of times the transmission
judgment means judges that transmission is in progress; reception signal strength measurement
means for measuring the strength of the reception signal; reception judgment means for judging
whether reception is in progress using the reception signal strength measured by the reception
signal strength measurement means; reception state counting means for counting the number of times
the reception judgment means judges that reception is in progress; and suppression control means
for determining the amounts of suppression used in the reception signal suppression means and the
transmission signal suppression means, taking as inputs the number of transmission states counted
by the transmission state counting means and the number of reception states counted by the
reception state counting means.
[0018]
The invention further proposes an echo suppression program that is written in computer-readable
code and causes a computer to carry out the echo suppression method.
In general, an abnormal sound lasts several hundred ms at the longest, whereas voice intended for
the other party continues for one second or more.
The present invention uses these properties to prevent the sound quality of the reception voice
signal from being degraded by abnormal sounds on the listener side.
From the results of the reception judgment and the transmission judgment, the frequency of
reception voiced states and of transmission voiced states within a fixed time width is calculated.
This makes it possible to determine whether a sound is continuous or isolated, and thus to
distinguish abnormal sound from voice, even when the power itself is observed over a fixed time
width. An abnormal sound is not judged to be voice, actual voice is judged to be voice, and the
insertion loss is controlled accordingly.
[0019]
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS
FIG. 1 shows an embodiment of the present invention; parts corresponding to those in the figures
described above are assigned the same numbers. In the present invention, the strength of the
reception signal is measured by the reception signal strength measurement means 10, and with this
strength as input the reception judgment means 13 judges the presence or absence of reception
voice for every sample. For example, reception is judged to be silent if the strength of the
reception signal does not exceed a predetermined value, and voiced if it exceeds the predetermined
value. The strengths of the echo signal and the transmission signal are measured by the echo
signal strength measurement means 12 and the transmission signal strength measurement means 11,
respectively. Using the echo signal strength and the transmission signal strength, the
transmission judgment means 14 judges the presence or absence of transmission voice for every
sample. For example, transmission is judged to be silent if the difference between the echo signal
strength and the transmission signal strength does not exceed a predetermined value, and voiced if
the difference exceeds the predetermined value. The reception state counting means 15 and the
transmission state counting means 16 count the number of samples judged to be in the reception
state and in the transmission state, respectively.
[0020]
The reception state counting means 15 and the transmission state counting means 16 count the
respective states over the period from T seconds in the past up to the present. Let the current
time be k, the count of reception states be Cr(k), and the count of transmission states be Cs(k).
The ratio of reception states during the T seconds (the voice signal identification time) is
called the reception state frequency Rr(k), and the ratio of transmission states is likewise
called the transmission state frequency Rs(k). They are given by
  Rr(k) = Cr(k) / (T * fs)   (8)
  Rs(k) = Cs(k) / (T * fs)   (9)
where fs is the sampling frequency in Hz.
When the reception state frequency Rr(k) and the transmission state frequency Rs(k) exceed the
predetermined values Thr and Ths, respectively, that is, when
  Rr(k) > Thr and Rs(k) > Ths   (10)
is satisfied, the state is regarded as a two-way simultaneous call. According to the preset
specification of the acoustic echo canceller, with transmission priority the reception signal
suppression means 8 inserts a loss on the reception signal side to attenuate the reception signal,
and with reception priority the transmission signal suppression means 9 inserts a loss on the
transmission signal side to attenuate the transmission signal.
Next, when the reception state frequency Rr(k) exceeds the predetermined value Thr and the
transmission state frequency Rs(k) is at or below the predetermined value Ths, that is, when
  Rr(k) > Thr and Rs(k) <= Ths   (11)
is satisfied, the state is regarded as reception voiced and transmission silent, and the
transmission signal suppression means 9 inserts a loss on the transmission signal side to
attenuate the transmission signal.
When the reception state frequency Rr(k) is at or below the predetermined value Thr and the
transmission state frequency Rs(k) exceeds the predetermined value Ths, that is, when
  Rr(k) <= Thr and Rs(k) > Ths   (12)
is satisfied, the state is regarded as reception silent and transmission voiced, and the reception
signal suppression means 8 inserts a loss on the reception signal side to attenuate the reception
signal.
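The state frequencies of equations (8) and (9) amount to a sliding count of per-sample judgments
over the last T seconds. The following is a minimal streaming sketch. The class name and the use
of a bounded deque are choices made here, not details from the text, and the denominator is the
window length in samples, which equals T * fs when T * fs is an integer.

```python
from collections import deque


class StateFrequency:
    """Fraction of samples judged voiced within the last T seconds (equations (8), (9))."""

    def __init__(self, T, fs):
        self.window = int(round(T * fs))  # number of samples in T seconds
        if self.window < 1:
            raise ValueError("T * fs must cover at least one sample")
        self.flags = deque(maxlen=self.window)
        self.count = 0  # Cr(k) or Cs(k)

    def update(self, voiced):
        """Add one per-sample judgment and return the current frequency."""
        if len(self.flags) == self.window:
            self.count -= self.flags[0]  # the oldest judgment leaves the window
        self.flags.append(1 if voiced else 0)
        self.count += self.flags[-1]
        return self.count / self.window


# Tiny window of 4 samples so the arithmetic is easy to follow.
tracker = StateFrequency(T=1.0, fs=4)
history = [tracker.update(v) for v in (True, True, False, False, False)]
```

One instance would track the reception state and a second instance the transmission state. Until
the window has filled, the count is still divided by the full window length, as in equations (8)
and (9).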
[0021]
Finally, when neither the reception state frequency Rr(k) nor the transmission state frequency
Rs(k) exceeds its predetermined value Thr or Ths, that is, when
  Rr(k) <= Thr and Rs(k) <= Ths   (13)
is satisfied, both reception and transmission are regarded as silent, and a loss is inserted on
the reception side or the transmission side according to the preset specification of the acoustic
echo canceller. The interval T for calculating the frequency is, for example, one second, and the
frequency thresholds Thr and Ths are, for example, 0.25. The square of the instantaneous value is
used for the transmission judgment and the reception judgment; that is, ptx = pty = pte = 0.0 in
equations (1), (4), and (5).
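The four cases (10) to (13) can be summarized as a small decision function. This is a sketch under
stated assumptions: the function name and the string labels are hypothetical, and for case (13)
the text says only that the preset specification decides, so the sketch assumes the same mapping
as in case (10).

```python
def choose_loss_side(rr, rs, thr=0.25, ths=0.25, priority="transmission"):
    """Return the side that receives the insertion loss: 'reception' or 'transmission'.

    rr, rs: reception and transmission state frequencies Rr(k) and Rs(k).
    priority: preset specification of the acoustic echo canceller.
    """
    if priority not in ("transmission", "reception"):
        raise ValueError("priority must be 'transmission' or 'reception'")
    receiving = rr > thr
    sending = rs > ths
    if receiving and not sending:
        return "transmission"  # case (11): attenuate the transmission signal
    if sending and not receiving:
        return "reception"     # case (12): attenuate the reception signal
    # Case (10), two-way simultaneous call, or case (13), both silent:
    # attenuate the side opposite to the preset priority.
    # Applying this rule to case (13) is an assumption of this sketch.
    return "reception" if priority == "transmission" else "transmission"


double_talk = choose_loss_side(0.6, 0.4)
far_end_only = choose_loss_side(0.6, 0.1)
```

The frequencies Rr(k) and Rs(k) are compared with the thresholds using strict inequality, so a
frequency exactly equal to the threshold counts as silent.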
[0022]
Computer simulations were performed to show the effectiveness of the present invention. As in the
case of FIG. 6 and FIG. 7, a reception signal from the other party is present for 10 seconds, and
a transmission signal from the near-end listener starts 5 seconds after the start of the
simulation and is present for the following 5 seconds. Three abnormal sounds are emitted by the
near-end listener and picked up by the microphone 3 within the first 5 seconds of the simulation.
As the specification of the acoustic echo canceller, the "transmission priority" specification is
used, in which transmission signals are passed preferentially. FIG. 2 shows the result of the
simulation of the echo suppression method according to the present invention. The reception signal
strength Pr(k) and the transmission signal strength Ps(k) are calculated in the same way as in
FIGS. 6 and 7, with ptx = pty = pte = 0. FIG. 2 shows that the reception signal is not attenuated
by the abnormal sounds, and that the listener-side voice uttered from 5 seconds onward stops being
attenuated after about 1 second and is not attenuated thereafter.
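The reason a short abnormal sound is rejected while continuous voice is accepted can be checked
with a small numerical sketch. It uses T = 1 second and a frequency threshold of 0.25 as in the
example values above. The sampling frequency of 8000 Hz, the 100 ms tap, and the idealized
judgment sequences (every sample of the tap or the voice is judged voiced) are assumptions made
for illustration; this does not reproduce the simulation in FIG. 2.

```python
def frequency_trace(flags, window):
    """Per-sample fraction of voiced judgments among the most recent `window` samples."""
    count = 0
    trace = []
    for k, flag in enumerate(flags):
        count += flag
        if k >= window:
            count -= flags[k - window]  # the judgment that left the window
        trace.append(count / window)
    return trace


def first_crossing(trace, threshold):
    """Index of the first sample whose frequency exceeds the threshold, or None."""
    for k, value in enumerate(trace):
        if value > threshold:
            return k
    return None


FS = 8000        # assumed sampling frequency in Hz
WINDOW = 1 * FS  # T = 1 second
THS = 0.25       # frequency threshold

# A 100 ms tap surrounded by one second of silence on each side.
tap = [0] * FS + [1] * (FS // 10) + [0] * FS
# Continuous voice that starts after one second of silence.
voice = [0] * FS + [1] * (2 * FS)

tap_hit = first_crossing(frequency_trace(tap, WINDOW), THS)
voice_hit = first_crossing(frequency_trace(voice, WINDOW), THS)
```

Under these assumptions the tap never raises the frequency above 0.1, so `tap_hit` is `None`,
while the voice crosses the threshold just over 0.25 seconds after it starts. Real speech contains
pauses and samples near zero, so the crossing would be expected later than in this idealized case.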
[0023]
FIG. 3 shows an embodiment of the present invention for the case where the reproduction signal has
N channels (N is an integer of 2 or more). Parts common to FIG. 1 are assigned the same reference
numerals. The differences from the single-channel case are as follows. With a single-channel
reception signal, only one signal is input to the reception signal strength measurement means 10,
whereas with an N-channel reception signal, N signals are input to the reception signal strength
measurement means 10. The following equation (1)' is used for the reception judgment in place of
equation (1).
  Px(k) = ptx * Px(k-1) + (1 - ptx) * (x1(k) * x1(k) + x2(k) * x2(k) + ... + xN(k) * xN(k))   (1)'
Using Px(k) calculated by equation (1)', the presence or absence of reception voice is judged by
equations (2) and (3). The subsequent procedure is the same as in the single-channel case. When
the suppression control means 17 finally decides to insert a loss on the reception signal side,
the N-channel reception signal suppression means 81 to 8N apply the same amount of loss and
attenuate the N-channel reception signals.
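Equation (1)' differs from equation (1) only in that the squared samples of all N channels are
summed before smoothing. A minimal sketch follows; the function name, the frame layout (one tuple
of N samples per time step), and the zero initial power are assumptions.

```python
def multichannel_power(frames, ptx, initial=0.0):
    """Recursive power estimate over N channels, following equation (1)'.

    frames: iterable of per-sample tuples (x1(k), ..., xN(k)).
    """
    power = initial
    trace = []
    for frame in frames:
        energy = sum(x * x for x in frame)  # x1(k)^2 + ... + xN(k)^2
        power = ptx * power + (1.0 - ptx) * energy
        trace.append(power)
    return trace


# Two channels, two time steps, ptx = 0.5.
stereo = multichannel_power([(1.0, 2.0), (0.0, 0.0)], ptx=0.5)
```

With a single-element tuple per frame, the function reduces to equation (1).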
[0024]
In the N-channel case shown in FIG. 3, the same function and effect as in the embodiment shown in
FIG. 1 are obtained. The echo suppression method according to the present invention described
above is realized by causing a computer to execute an echo suppression program written in
computer-readable code. The program is installed in the computer from a recording medium such as a
CD-ROM or a magnetic disk, or is downloaded to the computer through a communication line and
installed, and is executed by processing means such as a CPU.
[0025]
As described above, according to the echo suppression method of the present invention, even when
an instantaneous loud sound, such as striking a desk with a pen, is picked up by the microphone
while voice is being received from the other party, the reception signal from the other party is
not attenuated, and degradation of the sound quality of the reception signal is prevented.