Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2010171985
To achieve high echo suppression ratio (ESR) levels. A speaker receives a speaker signal x(t). The microphone 104 receives a microphone signal d(t) that includes the local signal s(t) and the echo signal x1(t). The echo signal x1(t) depends on the loudspeaker signal x(t). The microphone signal d(t) is filtered in parallel by a first adaptive filter and a second adaptive filter having echo cancellation characteristics complementary to each other. A minimum echo output e3(t) is determined from the output e1(t) of the first adaptive filter and the output e2(t) of the second adaptive filter: the minimum echo output is the one with the smaller energy and the smaller correlation with the loudspeaker signal x(t). A microphone output is then generated using the minimum echo output e3(t). [Selected figure] Figure 1A
Echo and noise cancellation
[0001]
Claim of Priority The present application claims the benefit of U.S. Patent No. 6,099,686, which is commonly owned with the present application and co-pending with it, the entire disclosure of which is incorporated herein by reference. This application also claims the benefit of U.S. Pat. No. 5,075,015, which is commonly owned with the present application and co-pending with it, the entire disclosure of which is incorporated herein by reference. The present application also claims the benefit of U.S. Patent Application Publication No. 2003/019,269, which is commonly owned with the present application and concurrently filed with it, the entire disclosure of which is incorporated herein by reference. The present application also claims the benefit of U.S. Patent Application Publication No. 2004/01998, which is commonly owned
15-04-2019
1
by the present application and is concurrently filed with it, the entire disclosure of which is incorporated herein by reference.
[0002]
U.S. Patent Application No. 11/381,728, Shadon Mao, "ECHO AND NOISE CANCELATION", filed May 4, 2006 (Attorney Docket No. SCEA05064US00); U.S. Patent Application No. 11/381,729, Shadon Mao, "ULTRA SMALL MICROPHONE ARRAY", filed May 4, 2006 (Attorney Docket No. SCEA05062US00); U.S. Patent Application No. 11/381,725, Shadon Mao, "METHODS AND APPARATUS FOR TARGETED SOUND DETECTION", filed May 4, 2006; U.S. Patent Application No. 11/381,727, Shadon Mao, "NOISE REMOVAL FOR ELECTRONIC DEVICE WITH FAR FIELD MICROPHONE ON CONSOLE", filed May 4, 2006 (Attorney Docket No. SCEA05073US00); U.S. Patent Application No. 11/381,724, Shadon Mao, "METHODS AND APPARATUS FOR TARGETED SOUND DETECTION AND CHARACTERIZATION", filed May 4, 2006 (Attorney Docket No. SCEA05079US00); U.S. Patent Application No. 11/381,721, Shadon Mao, "SELECTIVE SOUND SOURCE LISTENING IN CONJUNCTION WITH COMPUTER INTERACTIVE PROCESSING", filed May 4, 2006 (Attorney Docket No. SCEA 04005 JUMBOUS); PCT Application PCT/US06/17483, Shadon Mao, "SELECTIVE SOUND SOURCE LISTENING IN CONJUNCTION WITH COMPUTER INTERACTIVE PROCESSING", filed May 4, 2006 (Attorney Docket No. SCEA 04005 JUMBOPCT); U.S. Patent Application No. 11/418,988, Shadon Mao, "METHODS AND APPARATUS FOR ADJUSTING A LISTENING AREA FOR CAPTURING SOUNDS", filed May 4, 2006 (Attorney Docket No. SCEA-00300); U.S. Patent Application No. 11/418,989, Shadon Mao, "METHODS AND APPARATUS FOR CAPTURING AN AUDIO SIGNAL BASED ON VISUAL IMAGE", filed May 4, 2006 (Attorney Docket No. SCEA-00400); U.S. Patent Application No. 11/429,047, Shadon Mao, "METHODS AND APPARATUS FOR CAPTURING AN AUDIO SIGNAL BASED ON A LOCATION OF THE SIGNAL", filed May 4, 2006 (Attorney Docket No. SCEA-00500)
[0003]
TECHNICAL FIELD OF THE INVENTION The present invention relates to acoustic signal
processing, and more particularly to echo and noise cancellation in acoustic signal processing.
[0004]
Many portable electronic devices, such as interactive video game controllers, can handle bidirectional audio signals.
Such devices typically comprise a microphone that receives the local speech signal s (t) from the
user of the device, and a speaker that emits a speaker signal x (t) that can be heard by the user.
To miniaturize the video game controller, it is desirable to place the microphone and the speaker relatively close together (for example, within 20 cm). The user, on the other hand, may be located much further from the microphone (e.g., 3 to 5 meters). The microphone produces a signal d(t) that includes both the local speech signal s(t) and the speaker echo signal x1(t). In addition, the microphone may receive background noise n(t). The entire microphone signal is therefore d(t) = s(t) + x1(t) + n(t). Because the microphone is relatively close to the loudspeaker, the microphone signal d(t) may be dominated by the loudspeaker echo signal x1(t).
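The composite microphone signal above can be sketched directly. This is a minimal Python illustration of the model d(t) = s(t) + x1(t) + n(t); the sample rate, tone frequencies, and noise level are illustrative assumptions, not values from the patent:

```python
import math
import random

def mic_signal(s, x1, n):
    """Sample-wise sum d(t) = s(t) + x1(t) + n(t): local speech plus
    loudspeaker echo plus background noise."""
    return [si + x1i + ni for si, x1i, ni in zip(s, x1, n)]

# Toy signals: 8 samples at an assumed 8 kHz sample rate.
t = [k / 8000.0 for k in range(8)]
s = [0.5 * math.sin(2 * math.pi * 440 * tk) for tk in t]   # local speech s(t)
x1 = [0.8 * math.sin(2 * math.pi * 200 * tk) for tk in t]  # speaker echo x1(t)
n = [random.gauss(0.0, 0.01) for _ in t]                   # background noise n(t)

d = mic_signal(s, x1, n)  # the full microphone signal d(t)
```

Note that the echo term x1(t) can easily dominate d(t) when the loudspeaker is close, which is the problem the rest of the description addresses.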
[0005]
In telecommunications applications, speaker echo is a widespread phenomenon, and echo suppression and echo cancellation are relatively mature approaches. An echo suppressor operates by detecting a voice signal traveling in one direction on the line and inserting a large loss in the other direction. Usually, when the echo suppressor at the far end of the line detects voice from the near end of the line, the echo suppressor adds this loss. The added loss prevents the loudspeaker signal x(t) from being re-transmitted within the microphone signal d(t).
[0006]
Echo suppression, although effective, often causes several problems. For example, the local speech signal s(t) and the remote speaker signal x(t) often occur simultaneously, at least briefly. This situation is called double talk. The situation in which only the remote speaker signal is present is called remote single talk. If each echo suppressor detects voice energy coming from the far end of the circuit, loss is usually inserted in both directions simultaneously, blocking calls on both sides. To prevent this, the echo suppressor can be configured to detect only voice activity from the near-end speaker. This ensures that no losses (or only smaller losses) are inserted when the near-end speaker and the far-end speaker are talking simultaneously. Unfortunately, this temporarily defeats the purpose of the echo suppressor.
[0007]
In addition, because echo suppressors alternately insert and remove losses, there is often a small delay when a new speaker starts to speak, and the sound toward the beginning of the speaker's speech is clipped. Furthermore, when the far-end party's surroundings are noisy, the near-end speaker can hear the background sound while the far-end speaker is speaking, but when the near-end speaker starts speaking, the echo suppressor suppresses that background sound. Because the background sound suddenly disappears, the near-end user gets the impression that the line has been disconnected.
[0008]
Echo cancellation methods have been developed to address the problems mentioned above. Echo cancellation uses analog or digital filters to remove unwanted noise and echoes from the input signal, producing a filtered signal e(t). In echo cancellation, complex algorithmic procedures are used to compute speech models. This procedure comprises the steps of inputting the microphone signal d(t) and part of the remote signal x(t) into the echo cancellation processor, predicting the speaker echo signal x1(t), and subtracting the prediction from the microphone signal d(t). The echo prediction scheme must be learned by the echo cancellation processor in a process known as adaptation.
[0009]
The effectiveness of such an approach is measured by the echo suppression ratio (ESR). This is simply the ratio of the true echo energy received by the microphone to the residual echo energy remaining in the filtered signal e(t), typically expressed in decibels. According to criteria defined by the International Telecommunication Union (ITU), an echo level attenuation of at least 45 dB is required for remote single talk. During double talk (or during strong background noise) this attenuation level may be as low as 30 dB. However, these recommended criteria were developed for systems in which the user generating the local speech signal is close to the microphone, so the recorded signal-to-noise ratio (the ratio of target voice energy to echo noise energy) is often better than 5 dB. These recommended criteria do not hold for applications such as video game controllers, where the user is 3 to 5 meters from the open microphone and a loudspeaker closer than 0.5 meters produces large echoes. In such applications, the signal-to-noise ratio may range from -15 dB to below -30 dB. For remote single talk, an ESR of 60 dB or more may be required, and for double talk, an ESR of 35 dB or more may be required. Existing echo cancellation techniques cannot achieve such high ESR levels.
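The ESR figure discussed above is a simple energy ratio expressed in decibels. A minimal sketch (the function name and toy signals are illustrative assumptions):

```python
import math

def esr_db(true_echo, residual_echo):
    """Echo suppression ratio: true echo energy received by the microphone
    over the residual echo energy left in the filtered signal, in dB."""
    e_true = sum(v * v for v in true_echo)
    e_res = sum(v * v for v in residual_echo)
    return 10.0 * math.log10(e_true / e_res)

# An echo attenuated from amplitude 1.0 to 0.001 gives roughly the 60 dB ESR
# that the text says may be required for remote single talk.
print(esr_db([1.0] * 100, [0.001] * 100))  # approximately 60.0
```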
[0010]
Accordingly, there is a need in the art for echo cancellation systems and methods that overcome
the aforementioned disadvantages.
[0011]
SUMMARY OF THE INVENTION To overcome the aforementioned disadvantages, embodiments of the present invention are directed to echo cancellation methods and apparatus in a system having a speaker and a microphone.
The speaker receives the speaker signal x (t). The microphone receives a microphone signal d (t)
that includes the local signal s (t) and the echo signal x1 (t). The echo signal x1 (t) depends on the
loudspeaker signal x (t). The microphone signal d (t) is filtered in parallel by the first adaptive
filter and the second adaptive filter having echo cancellation characteristics complementary to
each other. The minimum echo output e3 (t) is determined from the output e1 (t) from the first
adaptive filter and the output e2 (t) from the second adaptive filter. The energy of the minimum
echo output is smaller and the correlation between the minimum echo output and the
loudspeaker signal x (t) is smaller. A microphone output is then generated using the minimum
echo output e3 (t). As an option, residual echo cancellation and / or noise cancellation may be
applied to the minimum echo output.
[0012]
FIG. 1A is a schematic view of an echo cancellation apparatus according to an embodiment of the present invention. FIG. 1B is a schematic diagram of a voice activity detection adaptive filter that may be used in the echo cancellation apparatus of FIG. 1A. FIG. 1C is a schematic diagram of an adaptive filter with cross-correlation analysis that may be used in the echo cancellation apparatus of FIG. 1A. FIG. 2A is a flowchart illustrating an echo cancellation method according to an embodiment of the present invention. FIG. 2B is a flowchart illustrating another method for echo cancellation according to an embodiment of the present invention. FIG. 3 is a schematic view of an echo cancellation apparatus according to another embodiment of the present invention.
[0013]
DESCRIPTION OF THE SPECIFIC EMBODIMENTS The following detailed description includes specific details for the purpose of illustration, but one of ordinary skill in the art will understand that many variations and modifications may be made to the details described below within the scope of the present invention. Therefore, the description of the embodiments below neither loses the generality of, nor imposes any restriction on, the invention described in the claims.
[0014]
According to an embodiment of the present invention, a new configuration of an integrated echo and noise canceller with two functionally identical filters is proposed. These filters have orthogonal controls and representations. In such a configuration, the two orthogonal filters complement each other to increase the robustness of the overall system in noisy hands-free voice communications.
[0015]
In particular, the integrated echo noise canceller uses two separately controlled subsystems in parallel. Each of these subsystems has an orthogonal control mechanism. The echo noise canceller includes a front echo canceller and a backup echo canceller. The front echo canceller uses double talk detection. To ensure that it is robust to local speech, the front echo canceller takes a conservative adaptation approach, but offers smaller echo suppression and slower adaptation to speech and echo changes. The backup echo canceller uses cross-correlation to measure the similarity between the error signal and the echo signal. The backup echo canceller takes an aggressive strategy so that the filter can be updated quickly. While providing large echo suppression, the backup echo canceller is unstable with respect to local voice and noise, as it may over-adapt. The outputs of the two echo cancellers are integrated based on cross-correlation analysis, which determines which canceller's output differs more from the echo signal. During this integration, the filter stability of both echo cancellers is also checked. If one filter is over-predicting or under-predicting, it is complemented by the other filter. Such a system is designed to ensure that one filter works properly at all times.
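The front/backup role management described above could be sketched as follows. The state names and the dictionary representation are hypothetical; the patent does not prescribe a data structure:

```python
def swap_roles_if_needed(front, backup):
    """Exchange the front and backup echo cancellers when the front filter
    is diverging, under-predicting, or over-predicting, so that one
    well-behaved filter is always in the front position."""
    bad_states = ("diverging", "under-predicted", "over-predicted")
    if front["state"] in bad_states and backup["state"] == "converged":
        return backup, front  # backup becomes the new front filter
    return front, backup

front = {"name": "EC(1)", "state": "diverging"}
backup = {"name": "EC(2)", "state": "converged"}
front, backup = swap_roles_if_needed(front, backup)
```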
[0016]
The system may optionally include an echo residual noise predictor that takes a similar approach. The echo residual noise predictor uses, in parallel, two independent sub-predictors with orthogonal control. The first predictor is based on a robust double-talk detector and echo-distance mismatch. The first predictor is relatively accurate, but unstable in the face of double-talk detection errors. The second predictor is based on cross-spectrum analysis. The prediction of the second predictor is biased but stable, independent of local speech detection, and consistent. In combining the predictions of these two residual echoes, a min/max approach is taken for far-end speech only or for double talk, respectively.
[0017]
FIG. 1A is a diagram showing an audio system 99 using an echo cancellation apparatus 100 according to an embodiment of the present invention. The operation of apparatus 100 may be understood by reference to the flowcharts of method 200 shown in FIG. 2A and method 220 shown in FIG. 2B. Audio system 99 generally includes a speaker 102, which receives the remote signal x(t), and a microphone 104. The local sound source 101 emits a local speech signal s(t). The microphone 104 receives both the local speech signal s(t) and the echo signal x1(t) associated with the speaker signal x(t). The microphone 104 also receives noise n(t) originating from the environment in which it is located. The microphone 104 then generates a microphone signal d(t), given by d(t) = s(t) + x1(t) + n(t).
[0018]
The echo cancellation apparatus 100 generally includes a first adaptive echo cancellation filter EC(1) and a second adaptive echo cancellation filter EC(2). Each adaptive filter receives the microphone signal d(t) and the speaker signal x(t). As shown in FIGS. 2A-2B, filter EC(1) adaptively filters the microphone signal d(t) as shown in step 202, and filter EC(2), in parallel with the first filter EC(1), adaptively filters the microphone signal d(t) as shown in step 204. As used herein, filters "operating in parallel" means that they receive substantially the same input d(t). Parallel operation is distinguished from serial operation, in which the output of one filter is the input of the other. Depending on the states of the two filters EC(1), EC(2), one filter acts as the main "front" filter and the other acts as the "backup" filter. One filter takes a cautious approach to echo cancellation while the other takes a more aggressive approach.
[0019]
The states of the filters EC(1), EC(2) will be understood in connection with the following signal model:
y(t) = x(t) <*> h(n)
d(t) = y0(t) + s(t)
e(t) = d(t) - y(t)
where y(t) is the echo synthesized by the echo canceller filter; x(t) is the signal played at the loudspeaker; h(n) is the adaptive filter function of the echo canceller filter; d(t) is the noisy signal received by the microphone; y0(t) is the true echo that appears at the microphone; s(t) is the local voice; and e(t) is the echo-cancelled residual signal generated by the echo canceller filter.
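The signal model above can be sketched directly. The loop-based convolution below is a naive illustration, and the filter lengths and tap values are made up:

```python
def synthesize_echo(x, h):
    """y(t) = x(t) <*> h(n): convolve the loudspeaker signal with the
    adaptive filter taps to synthesize the echo."""
    y = [0.0] * len(x)
    for t in range(len(x)):
        for n, h_n in enumerate(h):
            if t - n >= 0:
                y[t] += h_n * x[t - n]
    return y

def residual(d, y):
    """e(t) = d(t) - y(t): subtract the synthesized echo from the
    noisy microphone signal."""
    return [d_t - y_t for d_t, y_t in zip(d, y)]

x = [1.0, 0.0, 0.0, 0.0]                 # impulse played at the loudspeaker
h = [0.5, 0.25]                          # two hypothetical echo-path taps
y = synthesize_echo(x, h)                # [0.5, 0.25, 0.0, 0.0]
e = residual([0.5, 0.25, 0.0, 0.0], y)   # perfect model: zero residual
```

When h(n) matches the true echo path and there is no local speech, e(t) vanishes, which is the converged case discussed next.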
[0020]
The two filters EC(1), EC(2) have complementary echo cancellation properties. As used herein, "complementary echo cancellation" means that, of two adaptive filters receiving the same input, when one filter is not well adapted to the input, the other filter is well adapted to it. In the context of this application, a filter function h(n) is "well adapted" when it is stable, converges to the true echo-path filter, and is neither over-predicting nor under-predicting.
[0021]
If h(n) converges to the true echo-path filter (y(t) ~= y0(t)), that is, if the predicted echo is approximately equal to the true echo, the states of the echo canceller filters EC(1), EC(2) can be quantified using a coherence function α. α is related to the cross-correlation between y(t) and e(t), and Equation 1 holds:
(Equation 1) α = E{y(t) ⊗ e(t)} / E{y(t) ⊗ y(t)}
Here, "E" is the statistical expectation value, and the operator "⊗" (Equation 2) represents a cross-correlation operation. For discrete functions f_i and g_i, the cross-correlation is defined by Equation 3:
(Equation 3) (f ⊗ g)_j = Σ_i f*_i g_(i+j)
where the sum is taken over the index i for an appropriate range of the lag j, and the asterisk represents complex conjugation. For continuous functions f(x) and g(x), the cross-correlation is defined by Equation 4:
(Equation 4) (f ⊗ g)(x) = ∫ f*(t) g(x + t) dt
where the integral is taken over appropriate values of t.
[0022]
In the coherence function α, the numerator is the cross-correlation of e(t) and y(t). The denominator is the autocorrelation of y(t) and acts as a normalization term.
[0023]
Ideally, if h(n) converges, α should be close to 0 (since the residual signal e(t) does not contain y(t)). If h(n) does not converge, α should be close to 1 (since e(t) contains a strong echo of y(t)). If h(n) misbehaves or diverges, α should be negative (because, owing to the divergence of the filter, e(t) contains a strong echo with a 180-degree phase shift).
[0024]
Thus, for example, the value of the coherence function α may be used to define four possible states for the filters EC(1), EC(2), although the states are not limited to these: (1) when the filter h(n) is stable and converged, and is neither over-predicting nor under-predicting, 0 <= α <= 0.1; (2) when the filter h(n) is stable but under-predicting (not yet converged), α > 0.2; (3) when the filter h(n) is over-predicting, α < -0.1; (4) when the filter h(n) is diverging. It will be understood by those skilled in the art that different ranges of values of α may be determined for these different states.
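A minimal sketch of the coherence computation and the state classification above. The sample-average expectation and the returned state labels are illustrative; the thresholds are the example values from the text, and the text gives no numeric range for the diverging state:

```python
def coherence(e, y):
    """alpha: cross-correlation of e(t) and y(t) at zero lag, normalized
    by the autocorrelation (energy) of y(t)."""
    num = sum(e_t * y_t for e_t, y_t in zip(e, y))
    den = sum(y_t * y_t for y_t in y)
    return num / den

def filter_state(alpha):
    """Map alpha to the example states of paragraph [0024]."""
    if 0.0 <= alpha <= 0.1:
        return "converged"        # stable, neither over- nor under-predicting
    if alpha > 0.2:
        return "under-predicted"  # stable but not yet converged
    if alpha < -0.1:
        return "over-predicted"   # strong phase-shifted echo in e(t)
    return "indeterminate"        # between the example thresholds
```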
[0025]
If the state of a filter is good (e.g., state (1)), its settings may be saved for later recovery. The front and backup echo cancellers exchange their roles if a filter is diverging, under-predicting, or over-predicting: the front filter becomes the backup, and the backup filter takes on the role of the front filter. Because one filter takes a cautious adaptive approach and the other an aggressive one, this exchange ultimately causes both filters to converge faster and settle more dynamically.
[0026]
In addition, if a filter is under- or over-predicting, the adaptation step size may be increased or decreased by a small delta value, accelerating or slowing the adaptation for faster convergence or more stable tracking. Usually, larger step sizes are needed to speed up convergence; this sacrifices fine-grained tracking and lowers the echo suppression ratio (ESR). Converging more slowly with a small step size is more stable and can track even small changes, but is not suitable for tracking echo-path changes quickly.
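The dynamic step-size rule could be sketched as follows; the delta value and the positive floor are hypothetical choices, not values from the patent:

```python
def adjust_step_size(mu, state, delta=0.01):
    """Grow the adaptation step size when under-predicting (faster
    convergence), shrink it when over-predicting (more stable tracking)."""
    if state == "under-predicted":
        return mu + delta
    if state == "over-predicted":
        return max(mu - delta, delta)  # keep the step size positive
    return mu

mu = adjust_step_size(0.10, "under-predicted")  # grows toward faster convergence
```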
[0027]
The combination of dynamic step size and front/backup filter switching provides a good overall system balance between fast tracking and detailed tracking, and between stability and convergence. These are the two critical, competing goals in adaptive system design.
[0028]
If one of the filters diverges, the settings of the other filter may be replicated to re-initialize the
diverging filter if the other filter is in good condition. Alternatively, the diverging filter may be
recovered using the good state filter settings previously saved.
[0029]
For example, the echo cancelling adaptive filters EC(1) and EC(2) may be based on frequency-domain normalized least mean squares adaptive filters, although they are not limited to this. Each filter may be implemented as hardware, software, or a combination of hardware and software.
[0030]
Figures 1B and 1C show examples of suitable complementary adaptive filters. Specifically, FIG. 1B shows an adaptive echo cancellation filter 120 with voice activity detection. The filter 120 can be used as the first adaptive filter EC(1). The filter 120 includes a variable filter 122 having a finite impulse response (FIR) filter characterized by filter coefficients wt1. The variable filter 122 receives the microphone signal d(t) and filters it according to the values of the filter coefficients wt1 to produce a filtered signal d'(t). The variable filter 122 predicts the desired signal by convolving the input signal with the impulse response determined by the coefficients wt1. Each filter coefficient wt1 is updated at regular intervals by an amount Δwt1 according to the update algorithm 124. As an example, the filter coefficients wt1 may be selected such that the filtered signal d'(t) attempts to predict the loudspeaker echo signal x1(t) as the desired signal. A difference unit 126 subtracts the filtered signal d'(t) from the microphone signal d(t) to provide a prediction signal e1(t). The prediction signal e1(t) predicts the local speech signal s(t). The filtered signal d'(t) may be subtracted from the remote signal x(t) to produce an error signal e(t). The error signal e(t) is used by the update algorithm 124 to adjust the filter coefficients wt1. The update algorithm 124 generates a correction factor based on the remote signal x(t) and the error signal. Examples of coefficient update algorithms include least mean squares (LMS) and recursive least squares (RLS). In the LMS update algorithm, for example, the filter coefficients are updated according to w(t+1) = w(t) + μ e(t) x(t), where μ is a step size. Initially, all coefficients wt1 = 0. Note that in this example the quantity μ e(t) x(t) is the quantity Δwt1. As mentioned above, the step size μ may be adjusted dynamically according to the state of the adaptive filter. Specifically, if the filter is under-predicting, the step size μ may be increased by a small delta amount to accelerate the adaptation and converge faster.
Alternatively, if the filter is over-predicting, the adaptation step size μ may be reduced by a small delta amount to slow the adaptation so that tracking is more stable.
[0031]
The time-domain expression e(t) x(t) is a multiplication. This calculation may be implemented in the frequency domain as follows. First, e(t), x(t) and h(n) may be transformed from the time domain to the frequency domain, for example by means of a fast Fourier transform (FFT):
E(k) = fft(e(t))
X(k) = fft(x(t))
H(k) = fft(h(n))
[0032]
The LMS update algorithm in the frequency domain is as follows: H(k) = H(k) + (μ <*> conj(X(k)) <*> E(k)) / (Δ + X(k) <*> conj(X(k))), where μ is the filter adaptation step size and is dynamic; conj(a) denotes the complex conjugate of a complex number a; <*> denotes complex multiplication; and Δ is a regularizer that prevents the denominator from becoming numerically unstable.
[0033]
In the above equation, "conj(X(k)) <*> E(k)" performs the "e(t) x(t)" task. In the denominator, "X(k) <*> conj(X(k))" serves to normalize the update for the purpose of enhancing stability.
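The per-bin frequency-domain update above can be sketched with Python complex numbers; the bin values and step size below are illustrative, not from the patent:

```python
def fdlms_update(H, X, E, mu, delta=1e-6):
    """H(k) <- H(k) + mu * conj(X(k)) * E(k) / (delta + X(k) * conj(X(k))),
    applied independently to every frequency bin k."""
    return [h + (mu * x.conjugate() * e) / (delta + x * x.conjugate())
            for h, x, e in zip(H, X, E)]

H = [0j, 0j]
X = [1 + 0j, 0 + 2j]
E = [1 + 0j, 2 + 0j]
H = fdlms_update(H, X, E, mu=1.0)
# Bin 0: (1 * 1) / (1e-6 + 1), close to 1.
# Bin 1: (conj(2j) * 2) / (1e-6 + 4), close to -1j.
```

The denominator term X(k) conj(X(k)) is the per-bin signal power, so the regularizer Δ keeps quiet bins from blowing up the update.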
[0034]
Voice activity detection (VAD) gates the update algorithm 124 so that the variable filter 122 adapts when the remote signal x(t) is present (e.g., when it is greater than or equal to a predetermined threshold). An adaptive filter with voice activity detection (sometimes referred to as double talk detection), as shown in FIG. 1B, adapts relatively slowly. However, this filter is also very accurate in that it produces few false positives. The complementary adaptive filter for filter 120 may be, for example, a filter that adapts relatively quickly but tends to generate false positives.
[0035]
As an example, FIG. 1C shows an adaptive filter 130 that is complementary to the filter 120 of FIG. 1B. Adaptive filter 130 includes a variable filter 132 characterized by filter coefficients wt2 and an update algorithm 134 (e.g., the LMS update algorithm described above). Filter 132 attempts to predict the speaker echo signal x1(t) as the desired signal. A difference unit 136 subtracts the filtered signal d'(t) from the microphone signal d(t) to provide a prediction signal e2(t) that predicts the local speech signal s(t). The filtered signal d'(t) may be subtracted from the remote signal x(t) to generate an error signal e(t). The error signal e(t) is used by the update algorithm 134 to adjust the filter coefficients wt2. In filter 130, a cross-correlation analysis CCA adjusts the update algorithm 134 such that the variable filter 132 attempts to reduce the cross-correlation between the prediction signal e2(t) and the speaker signal x(t).
[0036]
When e2(t) and x(t) are very strongly correlated, the filtering process is said to be under-predicting, and the update algorithm 134 is adjusted to increase Δwt2. When the cross-correlation between e2(t) and x(t) is below a threshold, the filtering process is said to be over-predicting, and the update algorithm 134 is adjusted to reduce Δwt2.
[0037]
An adaptive filter that uses cross-correlation analysis (also referred to as cross-spectrum analysis) of the type shown in FIG. 1C adapts relatively quickly. However, this filter is also unstable in that it often produces false positives. Thus, filter 120 and filter 130 are examples of complementary filters.
[0038]
Refer again to FIG. 1A. The integrator 106 is connected to the first adaptive filter EC(1) and the second adaptive filter EC(2). The integrator 106 is configured to determine a minimum echo output e3(t) from the respective outputs e1(t), e2(t) of the first and second adaptive filters. The minimum echo output e3(t) is whichever of e1(t) and e2(t) has the smaller energy and the smaller correlation with the speaker signal x(t). When one of e1(t) and e2(t) has the smaller energy but the other has the smaller correlation with x(t), the output with the smaller correlation is used as the minimum echo output e3(t). For example, if one of the filters is over-predicting (i.e., its energy output is small because it tends to cancel the target speech), the correlation should decide regardless of the energy. The minimum energy may be determined by finding the minimum of E{e1(t)} and E{e2(t)}, where E{} denotes an operation that determines the expected value of the quantity in braces. Referring again to FIGS. 2A-2B, in step 206, a cross-correlation analysis may be performed on e1(t) and e2(t) to determine which of e1(t) and e2(t) is less cross-correlated with the speaker signal x(t). The cross-correlation analysis may include determining the minimum of Equations 5 and 6 below:
(Equation 5) E{e1(t) ⊗ x(t)}
(Equation 6) E{e2(t) ⊗ x(t)}
Here, the operator "⊗" of Equation 7 represents, for example, an operation that cross-correlates the quantities on both sides of the operator, as defined above. The minimum echo output e3(t) may be used as the filtered output of the microphone 104.
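The integrator's selection rule above, in which the smaller cross-correlation with the loudspeaker signal takes precedence over the smaller energy, could be sketched as follows. The zero-lag correlation and the toy vectors are illustrative assumptions:

```python
def select_min_echo(e1, e2, x):
    """Return the minimum echo output e3(t): the candidate whose
    cross-correlation with the loudspeaker signal x(t) is smaller.
    Per the text, the correlation decides even when the energies disagree."""
    def xcorr(e):
        return abs(sum(e_t * x_t for e_t, x_t in zip(e, x)))
    return e1 if xcorr(e1) <= xcorr(e2) else e2

x = [1.0, -1.0, 1.0, -1.0]       # loudspeaker signal
e1 = [0.9, -0.9, 0.9, -0.9]      # still strongly echo-correlated
e2 = [0.1, 0.1, 0.1, 0.1]        # little correlation with x(t)
e3 = select_min_echo(e1, e2, x)  # picks e2
```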
[0039]
In some situations, one of the filters EC (1), EC (2) may over-filter the local signal. In such situations, the filter is said to be "divergent". This can happen in practice, especially when EC (2) is a cross-correlation filter of the type shown, for example, in FIG. 1C. To address this possibility, it is determined in step 208 whether EC (2) has diverged. As an example, the integrator 106 may be configured to determine whether the second adaptive echo cancellation filter has removed the local signal s (t) by filtering excessively. This can be done by examining the expected value of the cross-correlation between e2 (t) and the loudspeaker signal x1 (t), that is, Expression 8. (Expression 8) E{e2 (t) ⊗ x1 (t)} Typically, Expression 9 holds. (Expression 9) [equation not reproduced] However, when Expression 10 falls below a certain threshold (for example, about 0.2), EC (2) is filtering excessively and thereby eliminating the local signal s (t). (Expression 10) [equation not reproduced] In such a situation, the integrator 106 may select e1 (t) as the minimum echo output e3 (t). To stabilize the adaptive filtering process, at step 212 the filter coefficients wt2 of EC (2) may be set to the filter coefficients wt1 of EC (1). Then, at step 215, EC (2) may be re-initialized to zero or to a state known to have been previously well adapted. For example, the filter coefficients may be stored at regular intervals (e.g., about every 10 to 20 seconds) and used to re-initialize EC (2) when it diverges.
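The divergence check and recovery of steps 208-215 can be sketched as below. Because the Expression 10 image is not reproduced, the test statistic here (the fraction of the microphone signal's energy surviving in EC (2)'s output) is an assumed stand-in, as is the 0.2 threshold's exact meaning.

```python
import numpy as np

def ec2_diverged(e2, d, threshold=0.2):
    """Assumed stand-in for the Expression 10 test: if EC(2)'s output
    retains almost none of the microphone signal's energy, the filter
    has probably over-filtered and removed the local signal s(t)."""
    return np.mean(e2 ** 2) / (np.mean(d ** 2) + 1e-12) < threshold

def recover_ec2(wt2_saved=None, taps=64):
    """Step 215: re-initialize EC(2) to zero, or to a snapshot of
    coefficients saved while the filter was known to be well adapted."""
    return np.zeros(taps) if wt2_saved is None else wt2_saved.copy()

rng = np.random.default_rng(2)
d = rng.standard_normal(2048)          # microphone frame
diverged = ec2_diverged(0.01 * d, d)   # output nearly silenced -> divergent
healthy = ec2_diverged(0.8 * d, d)     # output keeps most energy -> fine
```

A practical implementation would also save `wt2_saved` on the 10-to-20-second schedule the text suggests, so recovery restarts from a recently good state rather than from zero.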
[0040]
When the cross-correlation filter does not diverge, it is usually said to be well adapted. Since EC (2) and EC (1) have complementary filtering properties, EC (1) will be under-predicting when EC (2) is well adapted. To stabilize the adaptive filtering process, as shown in step 214, the filter coefficients wt1 of the first adaptive filter EC (1) are replaced with the filter coefficients wt2 of the second adaptive filter EC (2). When the filters are implemented in software, the coefficients wt1, wt2 may be stored in memory at locations specified by pointers. The coefficients wt1, wt2 may then be exchanged, for example, by switching the pointers to wt1 and wt2.
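The pointer exchange described above amounts to swapping references rather than copying buffer contents. A minimal sketch, with lists standing in for coefficient arrays and purely illustrative tap values:

```python
# EC(1) and EC(2) tap weights; the values are purely illustrative.
wt1 = [0.5, -0.25, 0.125]
wt2 = [0.9, -0.10, 0.05]

# Exchange the references, not the contents -- the analogue of
# switching the two pointers in a C implementation.
wt1, wt2 = wt2, wt1
```

After the swap, each name refers to the other filter's coefficient buffer; no element-by-element copy takes place, which is what makes the exchange cheap regardless of filter length.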
[0041]
The minimum echo output e3 (t) may include some residual echo xe (t) from the loudspeaker signal x (t). The device 100 may therefore optionally include first and second echo residual predictors ER (1) and ER (2) coupled to the integrator 106, and a residual echo cancellation module 108 coupled to the echo residual predictors ER (1) and ER (2).
[0042]
The first echo residual prediction unit ER (1) may be configured to generate a first residual echo prediction ER1 (t) from a cross-correlation analysis between the minimum echo output e3 (t) and the speaker signal x (t). As shown in step 222 of FIG. 2B, the first residual echo prediction ER1 (t) may be determined from this cross-correlation analysis, for example, by determining the value of Expression 11. (Expression 11) argmin(E{e3 (t) ⊗ x (t)}) Here, Expression 11 is satisfied when e3 (t) minimizes the expected value of the cross-correlation of Expression 12. (Expression 12) e3 (t) ⊗ x (t) This minimization may be realized essentially by adaptation. For example, assume that the echo residual prediction unit ER (1) is initially a unit filter (all values "1"). At each frame, the first residual echo prediction ER1 (t) may be improved by moving along the tangent direction of the search surface. This may be realized with a Newton-type solver algorithm. The second echo residual prediction unit ER (2) may be configured to determine a second residual echo prediction ER2 (t) from an echo-distance mismatch between the minimum echo output e3 (t) and the speaker signal x (t). As shown in step 224 of FIG. 2B, the second residual echo prediction ER2 (t) may be determined from this echo-distance mismatch, for example, by determining argmin(E{(e3 (t))^2 / (x (t))^2}). Here, argmin(E{(e3 (t))^2 / (x (t))^2}) is satisfied when e3 (t) minimizes the expected value of the quotient (e3 (t))^2 / (x (t))^2. Again, the minimization may be realized using a Newton-type solver algorithm.
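The adaptation behind ER (1) can be illustrated with a deliberately reduced model: a single gain g in place of a full predictor filter, a toy signal mix, and a Newton step that divides the correlation gradient by the curvature E{x^2}. All of these reductions are assumptions of this sketch, not the patent's construction.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_normal(4096)          # speaker signal
s = 0.3 * rng.standard_normal(4096)    # local speech
e3 = s + 0.2 * x                       # minimum echo output with residual echo

# ER(1) reduced to a single gain g: drive the cross-correlation
# E{(e3 - g*x) * x} toward zero.  The cost is quadratic in g, so a
# Newton step (gradient divided by the curvature E{x^2}) converges
# immediately; the loop mirrors the per-frame adaptation in the text.
g = 1.0                                # "unit filter" initial state
for _ in range(3):
    corr = np.mean((e3 - g * x) * x)   # residual correlation with x
    g += corr / np.mean(x ** 2)        # Newton-style update
er1 = g * x                            # first residual echo prediction
residual = e3 - er1                    # now decorrelated from x
```

The learned gain lands near the true residual-echo coupling (0.2 here), and the corrected signal's correlation with the speaker signal is driven to numerical zero, which is exactly the Expression 12 criterion in one-tap form.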
[0043]
The residual echo cancellation module 108 determines the minimum residual echo prediction ER3 (t) of the two residual echo predictions ER1 (t) and ER2 (t), and the filtered signal e3 (t) may be adjusted according to that minimum value ER3 (t). As an example, the minimum residual echo prediction ER3 (t) may be whichever of ER1 (t) and ER2 (t) has the lower energy and the lower correlation with x (t). For example, as shown in step 226 of FIG. 2B, ER3 (t) is set to the minimum of ER1 (t) and ER2 (t), and, as shown in step 228, the resulting value of ER3 (t) is applied to e3 (t) to generate the residual-echo-cancellation filtered signal e3' (t). If ER3 (t) equals ER1 (t), the residual echo xe (t) is minimized when the intensity of the local speech signal s (t) is not zero. If ER3 (t) equals ER2 (t), the residual echo xe (t) is maximally removed when only the far-end echo x (t) is present (i.e., during far-end-only speech).
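Steps 226 and 228 can be sketched as follows. The combined score used to rank the two predictions is an assumption of this sketch (the text names the two criteria, lower energy and lower correlation, but not how they are combined), and the toy predictions are scaled copies of the speaker signal purely for illustration.

```python
import numpy as np

def apply_min_residual(e3, er1, er2, x):
    """Select whichever residual-echo prediction has lower energy and
    lower correlation with the speaker signal (assumed combined score),
    then remove it from e3 to form the filtered signal e3'."""
    def score(er):
        return np.mean(er ** 2) * (abs(np.mean(er * x)) + 1e-12)
    er3 = er1 if score(er1) <= score(er2) else er2
    return e3 - er3

rng = np.random.default_rng(3)
x = rng.standard_normal(2048)
s = 0.3 * rng.standard_normal(2048)
e3 = s + 0.2 * x                       # residual echo still present
# ER1 matches the true residual coupling; ER2 over-predicts it.
e3p = apply_min_residual(e3, 0.2 * x, 0.5 * x, x)
```

Here the accurate prediction wins on both criteria, and subtracting it leaves just the local speech in e3'.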
[0044]
As an example, second-order (L2) norms N (1) and N (2) may be calculated for the two echo residual predictors ER (1) and ER (2), respectively: N (1) = ‖ER (1)‖, N (2) = ‖ER (2)‖.
[0045]
Under double-talk conditions, the echo residual predictor with the smaller norm may be applied to e3 (t) to remove residual echo noise. Under single-talk conditions, the echo residual predictor with the larger norm may be applied to e3 (t) to remove residual echo noise.
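The norm rule of the preceding two paragraphs reduces to a few lines; the example tap values are illustrative.

```python
import numpy as np

def pick_predictor(er1_taps, er2_taps, double_talk):
    """Norm rule from the text: smaller L2 norm under double talk
    (gentler on local speech), larger norm under single talk
    (stronger residual-echo removal)."""
    n1 = np.linalg.norm(er1_taps)
    n2 = np.linalg.norm(er2_taps)
    if double_talk:
        return er1_taps if n1 <= n2 else er2_taps
    return er1_taps if n1 > n2 else er2_taps

gentle = [0.1, 0.1]   # low-norm predictor taps (illustrative)
strong = [0.5, 0.5]   # high-norm predictor taps (illustrative)
```

The intuition behind the rule: a larger-norm predictor subtracts more aggressively, which is safe when only far-end echo is present but risks carving into local speech during double talk.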
[0046]
During echo cancellation, noise n (t) may be removed from the filtered signal e3 (t) or from the residual-echo-cancellation filtered signal e3' (t). However, complete noise removal may not be desirable: the remote recipient of the signal e3 (t) or e3' (t) may interpret the total absence of noise as an indication that all communication from the microphone 104 has been lost. To address this issue, the apparatus 100 may optionally include a noise cancellation module 110. The noise cancellation module 110 may be configured to calculate a predicted noise signal n' (t) from the microphone signal d (t), for example as shown in step 217 of FIGS. 2A-2B. The predicted noise signal n' (t) may be attenuated by an attenuation factor α to form a reduced noise signal n'' (t) = αn' (t). The attenuated noise signal n'' (t) may then be added to e3 (t), as shown in step 218 of FIG. 2A, or to e3' (t), as shown in step 230 of FIG. 2B, and thereby incorporated into the microphone output signal s' (t).
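The comfort-noise mixing of steps 218/230 is a one-line operation once n' (t) is available. In this sketch the value of α and the toy noise estimate are assumptions; the patent does not fix either.

```python
import numpy as np

def with_comfort_noise(e3, n_pred, alpha=0.1):
    """Attenuate the predicted noise n'(t) by alpha and mix it back so
    the far end still hears a faint noise floor (alpha is an assumed
    value; the text leaves it open)."""
    return e3 + alpha * n_pred

frame = np.zeros(8)                    # perfectly cleaned frame
noise = np.ones(8)                     # predicted noise estimate n'(t)
out = with_comfort_noise(frame, noise)
```

Even a fully cleaned frame thus carries a low, controlled noise floor, signalling to the far end that the channel is still live.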
[0047]
In embodiments of the present invention, the apparatus described in connection with FIGS. 1A-1C and the methods described in connection with FIGS. 2A-2C may be implemented as software on a system having a programmable processor and memory.
[0048]
According to an embodiment of the present invention, a signal processing method operating as described above, of the type described in connection with FIGS. 1 and 2A-2B, may be implemented as part of a signal processing system 300 as shown in FIG. 3. The system 300 may include a processor 301 and a memory 302 (e.g., RAM, DRAM, ROM, and the like). The signal processing system 300 may further comprise multiple processors 301 if parallel processing is implemented. The memory 302 contains data and code configured as described above. Specifically, program code 304 and signal data 306 may be stored in the memory 302. The code 304 may implement the echo-cancelling adaptive filters EC (1), EC (2), the integrator 106, the echo residual predictors ER (1), ER (2), the residual echo cancellation module 108, and the noise cancellation module 110 described above. The signal data 306 may include the microphone signal d (t) and/or a digital representation of the speaker signal x (t).
[0049]
The system 300 may also include well-known support functions 310, such as input/output (I/O) elements 311, power supplies (P/S) 312, a clock (CLK) 313, and cache memory 314. The system 300 may optionally include a mass storage device 315, such as a disk drive, CD-ROM drive, or tape drive, for storing programs and/or data. The system 300 may also optionally include a display unit 316 and a user interface unit 318 to facilitate interaction between the system 300 and a user. The display unit 316 may be a cathode ray tube or flat panel screen that displays text, numbers, graphic symbols, and images. The user interface 318 may include a keyboard, mouse, joystick, light pen, or other devices. Further, a speaker 322 and a microphone 324 may be connected to the processor 301 via the I/O elements 311. The processor 301, memory 302, and other components of the system 300 may exchange signals (e.g., code instructions and data) with one another via a system bus 320, as shown in FIG. 3.
[0050]
As used herein, the term input/output generally refers to any program, operation, or device that transfers data to or from the system 300 and to or from peripheral devices. Every data transfer may be regarded as an output from one device and an input to another. Peripheral devices include input-only devices such as a keyboard and mouse, output-only devices such as a printer, and devices that operate as both input and output devices, such as a writable CD-ROM. The term "peripheral device" includes external devices such as a mouse, keyboard, printer, monitor, microphone, game controller, camera, external Zip drive, or scanner; internal devices such as a CD-ROM drive, CD-R drive, or internal modem; and other peripherals such as a flash memory reader/writer and a hard drive.
[0051]
The processor 301 performs digital signal processing on the signal data 306 in response to program code instructions of the program 304 that are stored in the memory 302 and retrieved and executed by the processor module 301. Portions of the code of the program 304 may conform to any of a number of different programming languages, such as assembly, C++, Java, or many other languages. The processor module 301 forms a general-purpose computer that becomes a special-purpose computer when executing programs such as the program code 304. Although the program code 304 is described herein as being implemented in software and executed on a general-purpose computer, those skilled in the art will appreciate that the method could alternatively be implemented using hardware such as an application-specific integrated circuit (ASIC). As such, it will be understood that embodiments of the present invention may be implemented, in whole or in part, in software, hardware, or a combination of both.
[0052]
In one embodiment, among others, the program code 304 may include a set of processor-readable instructions that implement a method having features in common with the method 200 of FIG. 2A and the method 220 of FIG. 2B. The program code 304 may generally include instructions that cause the processor 301 to filter the microphone signal d (t) in parallel with first and second adaptive filters having complementary echo cancellation characteristics to produce the echo-cancelled outputs e1 (t) and e2 (t), to determine a minimum echo output e3 (t) from e1 (t) and e2 (t), and to use the minimum echo output to generate a microphone output.
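The first of those instructions presumes some adaptive echo-cancelling filter. The patent does not prescribe the update rule, so the sketch below uses a textbook normalized-LMS (NLMS) canceller as a stand-in for EC (1); the echo path, tap count, and step size are illustrative assumptions.

```python
import numpy as np

def nlms_cancel(x, d, taps=16, mu=0.5, eps=1e-6):
    """Textbook NLMS echo canceller: predict the echo of x present in d
    and return the error (echo-cancelled) signal plus the learned taps.
    A stand-in for EC(1)/EC(2); the patent does not fix the update rule."""
    w = np.zeros(taps)
    e = np.copy(d)
    for n in range(taps - 1, len(d)):
        xv = x[n - taps + 1:n + 1][::-1]     # most recent sample first
        e[n] = d[n] - w @ xv                 # echo-cancelled output
        w += mu * e[n] * xv / (xv @ xv + eps)
    return e, w

rng = np.random.default_rng(4)
x = rng.standard_normal(4000)                # speaker signal
h = np.array([0.5, -0.3, 0.2])               # toy echo path (assumed)
d = np.convolve(x, h)[:len(x)]               # microphone hears pure echo
e1, w = nlms_cancel(x, d)
```

After adaptation, the tail of the error signal holds almost none of the echo energy, and the leading taps of w recover the toy echo path. Running two such filters with different configurations in parallel yields the e1 (t), e2 (t) pair the instructions describe.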
[0053]
Embodiments of the present invention enable echo cancellation that is more robust and more accurate than is possible with cross-correlation analysis alone or with voice activity detection (double-talk detection) alone. Such improved echo cancellation makes it possible to extract the local speech s (t) from a microphone signal d (t) that is mostly occupied by the speaker echo x (t).
[0054]
Embodiments of the present invention may be used as presented herein, or in combination with other user input mechanisms, such as mechanisms that track or measure the azimuthal direction or loudness of sound, mechanisms that actively or passively track the position of an object, mechanisms using machine vision, and combinations of these. Tracked objects may include auxiliary controls or buttons that provide feedback to the system. Such feedback may include, but is not limited to, light emission from a light source, sound distortion means, or other suitable transmitters and modulators, as well as controls, buttons, pressure pads, and the like, which may affect the transmission or modulation thereof, encode state, and/or transmit commands to or from devices tracked by the system. Such devices may be part of, interact with, or otherwise influence systems used in connection with embodiments of the present invention.
[0055]
While the above is a complete description of the preferred embodiments of the present invention, various alternatives, modifications, and equivalents may be substituted. Therefore, the scope of the present invention should be determined not by the above description but by the following claims, along with their full scope of equivalents. Any feature described herein, whether preferred or not, may be combined with any other feature described herein, whether preferred or not. In the following claims, the quantity of each element is one or more, unless explicitly stated otherwise. The appended claims are not to be construed as including means-plus-function limitations unless such a limitation is explicitly recited in a given claim using the phrase "means for."