Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2009094708
An audio signal processing apparatus of a speech communication system cancels both the echo of the other-party speaker's voice and the echo of the self-speech sound with high performance. The output of an adaptive filter system having the function of canceling the echo of the other-party speaker can be combined into the reference-input side. The voice transmitted from the other party of the communication and the self-speech picked up by the microphone can both be amplified and output from the speaker. In the state in which no self-speaker speech is picked up and other-party speech is input, the adaptive filter system executes its adaptive processing and its output signal is not combined into the reference input. In response to the state in which self-speaker speech is picked up, on the other hand, the output signal of the adaptive filter system is combined into the reference-input side so that the self-speech is reproduced from the speaker, the adaptive processing of the adaptive filter system is stopped, and the echo of the self-speech sound is thereby almost completely canceled. [Selected figure] Figure 5
Audio signal processing apparatus, audio signal processing method
[0001]
The present invention relates to an audio signal processing apparatus having a so-called echo cancellation function, and to a corresponding audio signal processing method.
[0002]
Voice communication systems that allow speakers at different places to make calls and hold conversations, such as hands-free telephone calling and the voice transmission/reception systems of voice conference and video conference systems, have already been put to practical use and have become widespread. Such a system is also referred to as a speakerphone system.
In such a voice communication system, communication terminal devices capable of communicating with one another according to a predetermined communication scheme are installed at a plurality of different places. The voice collected by the microphone at one communication terminal is transmitted to the other communication terminal, where the received signal is emitted as sound from the speaker. This makes it possible to hold conversations between speakers at remote places.
[0003]
However, in such a voice communication system, the voice received from the other communication terminal device and emitted from the speaker at one communication terminal device is picked up again by the microphone at that terminal, sent back, and emitted once more from the speaker of the other communication terminal device. This operation is repeated in a loop. As a result, a phenomenon known as echo occurs, in which one's own speech is heard from the speaker, mixed with the other party's speech, with a certain delay. Furthermore, if the echo becomes loud, the loop repeats indefinitely and a phenomenon called howling occurs. Thus, in a voice communication system, the sound collected by the microphone circulates between the communication terminal devices, so that echo and howling degrade the voice quality of the call and make the system difficult to use.
[0004]
It is therefore known to provide a speech communication system with an audio signal processing system for echo cancellation. A common form of such echo cancellation employs an adaptive filter system. The adaptive filter system estimates the impulse response of the transfer path (echo path) between the speaker and the microphone, and convolves this impulse response with the signal to be emitted from the speaker, which serves as its input. The output is a pseudo echo signal. This pseudo echo is subtracted from the voice signal that has been picked up by the microphone and is to be transmitted to the communication terminal apparatus on the opposite side. When the operation of such an adaptive filter system has converged, a voice whose echo component has been canceled is transmitted to the communication terminal on the other party's side, so the echo of the sound emitted from the speaker is no longer heard there (it is canceled).
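As a rough illustration of the adaptive-filter echo canceller described above: an FIR filter driven by the loudspeaker signal produces a pseudo echo that is subtracted from the microphone signal. The patent does not name a particular adaptation algorithm, so the following Python sketch uses NLMS (normalized least mean squares) purely as one common choice; the function and variable names are hypothetical.

import numpy as np

def nlms_echo_cancel(x, d, taps=256, mu=0.5, eps=1e-8):
    """Illustrative NLMS echo canceller (a sketch, not the patent's implementation).

    x : reference signal (sound sent to the loudspeaker)
    d : desired signal (microphone pickup = echo + near-end speech)
    Returns the error signal e (echo-reduced microphone signal).
    """
    w = np.zeros(taps)          # FIR coefficient vector (pseudo impulse response)
    x_buf = np.zeros(taps)      # most recent reference samples
    e = np.zeros(len(d))
    for k in range(len(d)):
        x_buf = np.roll(x_buf, 1)
        x_buf[0] = x[k]
        y_hat = w @ x_buf       # pseudo echo produced by the adaptive filter
        e[k] = d[k] - y_hat     # subtract the pseudo echo from the microphone signal
        # NLMS coefficient update driven by the residual (error) signal
        w += mu * e[k] * x_buf / (x_buf @ x_buf + eps)
    return e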
[0005]
In addition, when such a system is used as a conference system, the venue where a communication terminal device is installed may be very wide. In that case, a conference participant at a distant position within the same venue may find it difficult to hear the voice of another speaker in that venue clearly. To address this situation, it is known, as described for example in Patent Document 1, to add to the communication terminal apparatus the function of a PA system (a self-speech sound output function), which outputs the sound collected by the near-end microphone from the near-end speaker. As a result, the voice of a near-end speaker picked up by the microphone is amplified and output by the speaker in the same conference room, so that the conference participants can hear that voice clearly and loudly. Furthermore, Patent Document 1 describes a configuration intended to cope with both the echo and howling caused by the circulation of the microphone signal through the communication between the terminal devices and the echo and howling of the self-speech produced by the PA system: a frequency dividing unit is provided in the path that outputs the picked-up voice signal to the speaker, and a gain regulator is provided in the path that outputs the picked-up voice signal to the transmission signal encoding circuit for sending it to the other party.
[0006]
JP-A-9-307626
[0007]
The present invention likewise presupposes an audio signal processing apparatus, intended to form part of a speech communication system, to which a self-speech sound output function is added.
Its object is to provide a configuration that cancels, with higher performance than before, both the echo (and howling) caused by the circulation of the microphone signal through the communication between the communication terminal devices and the echo (and howling) of the self-speech sound.
[0008]
In consideration of the problems described above, the present invention is configured as an audio signal processing apparatus as follows. The apparatus comprises: adaptive filter means that takes as its desired signal the collected voice signal picked up by the microphone, takes as its reference signal the speaker-output audio signal, that is, the audio signal obtained after the voice signal of the other-party speaker received from the communication partner has passed through the predetermined processing steps up to being emitted as sound from the speaker, and executes adaptive processing so that the output signal obtained by subtracting, from the desired signal, a cancellation signal generated on the basis of the reference signal is minimized, this output signal being the audio signal to be transmitted to the other-party communication apparatus; combining means for combining the output signal of the adaptive filter means so that it is included as a component of the speaker-output audio signal; attenuation factor variable means, placed in the path along which the output signal of the adaptive filter means travels before it is combined by the combining means, for variably setting the attenuation factor applied to the passing signal; and control means which, in response to a first voice state in which the self-speaker's voice is not being picked up by the microphone and the voice signal of the other-party speaker is assumed to be contained in the reference signal, places the adaptive processing of the adaptive filter means in an active tendency and controls the attenuation factor in the attenuation factor variable means to be set to at least a certain level, and which, in response to a second voice state in which the self-speaker's voice is being picked up by the microphone, places the adaptive processing of the adaptive filter means in a stopping tendency and controls the attenuation factor in the attenuation factor variable means to be set below a certain level.
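The two-state control rule stated above can be paraphrased, not as the patent's literal implementation but as a minimal sketch with hypothetical names and values, as follows: the first voice state keeps adaptation active and blocks the self-speech path, while the second voice state lets the self-speech pass and stops adaptation.

from dataclasses import dataclass

@dataclass
class ControlDecision:
    adapt_active: bool    # adaptive processing in an "active tendency"
    attenuation: float    # attenuation of the attenuation factor variable means (1.0 = block, 0.0 = pass)

def control(self_speech_picked_up: bool) -> ControlDecision:
    if not self_speech_picked_up:
        # First voice state: only the other-party speaker's voice is assumed present.
        # Adapt actively and block the self-speech path (attenuation at or above a set level).
        return ControlDecision(adapt_active=True, attenuation=1.0)
    # Second voice state: self-speech is being picked up.
    # Stop (or slow) adaptation and let the self-speech pass (attenuation below the set level).
    return ControlDecision(adapt_active=False, attenuation=0.0)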
[0009]
In the audio signal processing apparatus having the above configuration, the adaptive filter means provides the function of canceling, by adaptive processing, the echo of the other-party speaker's voice caused by the circulation of the microphone signal through the communication with the other party. Moreover, since the combining means is provided, not only can the voice received from the communication partner be amplified and output from the speaker, but a self-speech sound output function is also provided whereby the sound collected by the microphone on the apparatus side is amplified and output from the same speaker. In response to the first voice state, in which the self-speaker's voice (the voice of the near-end speaker input directly into the microphone) is not being picked up and the voice signal of the other-party speaker is assumed to be contained in the reference signal, the adaptive processing of the adaptive filter means is placed in an active tendency and the attenuation factor in the attenuation factor variable means is set to at least a certain level. This means that the adaptive processing is effectively executed while the self-speech output from the speaker is suppressed, which is equivalent to the operation of an echo cancellation system without a self-speech sound output function. On the other hand, in response to the second voice state, in which the self-speaker's voice is being picked up by the microphone, the attenuation factor in the attenuation factor variable means is first set below a certain level. Setting the attenuation factor below this level means that the self-speech sound is actively output from the speaker; as a result, the self-speaker's voice is emitted as sound from the speaker and the self-speech sound output function is performed. At the same time, the adaptive processing of the adaptive filter means is placed in a stopping tendency, that is, its responsiveness is made duller by at least a certain degree. Nevertheless, as long as the adaptive filter means is in a converged state at this time, the self-speech arriving at the microphone (that is, the echo of the self-speech sound) is almost properly canceled. This is because the adaptive processing carried out while the self-speaker's voice was not being picked up converged so as to cancel the echo of the other party, that is, the sound arriving via the spatial path from the speaker to the microphone, and the echo of the self-speech travels over that same path.
[0010]
Thus, according to the present invention, in an audio signal processing apparatus of a speech communication system to which a self-speech sound output function is added, the adaptive filter can cancel not only the echo of the other-party speaker's voice but also the echo of the self-speech sound. Since an adaptive filter yields a high-quality echo cancellation effect, high-quality echo cancellation is realized for both the echo of the other-party speaker's voice and the echo of the self-speech sound. Furthermore, because a single adaptive filter means is shared between the cancellation of the other-party echo and the cancellation of the self-speech echo, the amount of computation and the required resources are correspondingly reduced, and reductions in cost and circuit scale can be expected.
[0011]
As the best mode for carrying out the present invention (hereinafter referred to as the embodiment), the present invention is applied to the audio transmission/reception system of a video conference (television conference) system. In a video conference system, communication terminal devices are generally installed in conference halls at different places; each communication terminal device transmits the image photographed by a camera device and the voice collected by a microphone to the other communication terminal devices, and receives the images and sounds transmitted from them, outputting them on a display device and from a speaker, respectively. That is, the video conference system includes a video transmission/reception system that mutually transmits and receives images and an audio transmission/reception system that mutually transmits and receives audio. The present embodiment provides a communication terminal apparatus (voice communication terminal apparatus) which transmits and receives voice as this audio transmission/reception system and which is capable of outputting amplified sound from a speaker.
[0012]
FIG. 1 shows an example of the system configuration of the audio transmission/reception system of a video conference system, which is the basis of the present embodiment. In this case, two places A and B, separated from each other, are used as conference halls, and the voice communication terminal devices 1-1 and 1-2 forming the audio transmission/reception system are installed at places A and B, respectively. Places A and B are assumed to be so large that, for example, persons in the same room but at a distance from each other find it difficult to hear each other's utterances clearly at a normal conversational volume. The voice communication terminal devices 1-1 and 1-2 are connected by a communication line corresponding to a predetermined communication scheme so as to be able to communicate with each other. Further, microphones 2-1 and 2-2 and speakers 3-1 and 3-2 are installed at places A and B, respectively. The microphones 2-1 and 2-2 collect the voices of the conference participants present at places A and B, respectively, and are provided at appropriate positions in each place. The speakers 3-1 and 3-2 amplify and output both the voices of the conference participants at the other place (the other-party speaker voices) and the voices of the conference participants using the microphone 2-1 or 2-2 at the same place (the self-speaker voices) so that the conference participants at that place can hear them; they too are provided at appropriate positions in each place. In the following description, when it is not necessary to distinguish between the devices installed at the different places, they are simply referred to as the voice communication terminal device 1, the microphone 2, and the speaker 3.
[0013]
First, at place A, the audio signal obtained by collecting sound with the microphone 2-1 is input to the voice communication terminal device 1-1. The voice communication terminal device 1-1 transmits the input voice signal to the voice communication terminal device 1-2 via the communication line. The voice communication terminal device 1-2 receives the transmitted voice signal and outputs it from the speaker 3-2. As a result, the conference participants at place B can listen to the voices of the conference participants at place A. Similarly, the voice collected by the microphone 2-2 at place B is transmitted by the voice communication terminal device 1-2 to the voice communication terminal device 1-1, which outputs the received voice signal from the speaker 3-1. In this manner, the audio transmission/reception system of the video conference system performs two-way voice communication, whereby, for example, a conference participant at one place can have a conversation with a conference participant (other-party speaker) at the other place. Furthermore, since this video conference system assumes a plurality of conference participants at each place, the speaker 3 is provided so that all the conference participants at each place can listen to the other party's voice. A system that outputs and monitors the other party's speaker voice from a speaker when performing such interactive voice exchange is also referred to as a loudspeaker communication system.
[0014]
Further, the voice communication terminal device 1 on which the present embodiment is based is capable of causing the speaker 3 to output, as amplified sound, the voice signal acquired by the microphone 2. That is, in addition to the voice transmitted from the other voice communication terminal device, the voice communication terminal device 1 can also output from the speaker 3 the voice collected by the microphone connected to itself; in other words, it has a self-speech sound output function. As mentioned above, the conference halls at places A and B are assumed to be considerably wide, and the self-speech sound output function is provided so that, when a conference participant (the self-speaker) speaks, the other conference participants in the same hall can listen to that voice (the self-speaker's voice) at a sufficiently loud level.
[0015]
FIG. 2 shows a configuration example of the voice communication terminal device 1. For
confirmation, the voice communication terminal devices 1-1 and 1-2 shown in FIG. 1 have the
configuration shown in FIG. 2 in common. The voice communication terminal device 1 includes,
for example, a voice signal processing unit 11, a codec unit 12, and a communication unit 15, as
shown in this figure.
[0016]
The audio signal obtained by collecting sound with the microphone 2 and the audio signal output from the decoder 14 of the codec unit 12, described later, are input to the audio signal processing unit 11. The audio signal processing unit 11 outputs the audio signal after the echo cancellation processing, described later, to the encoder 13 of the codec unit 12, and also outputs the audio signal to be reproduced as amplified sound to the speaker 3. Note that, in practice, an A/D converter for converting the analog audio signal collected by the microphone 2 into a digital signal, as well as a D/A converter and an amplification circuit for converting the digital audio signal output from the audio signal processing unit 11 into an analog signal, amplifying it, and outputting it to the speaker 3, would also be provided; they are omitted from the illustration for simplicity of explanation. Each of these parts may be provided inside the voice communication terminal device 1 or as a device external to the voice communication terminal device 1.
[0017]
As described above, a speech communication system used as it is causes phenomena such as echo and howling. As shown in FIG. 2, the sound emitted into space from the speaker 3 reaches the microphone 2 through the spatial propagation path (echo path) E as direct and indirect sound. That is, the voice of the other party (the other-party speaker's voice), transmitted from the voice communication terminal on the other side of the communication and emitted from the speaker 3, is collected by the microphone 2 and sent back to the other party's voice communication terminal. Likewise, at the other end of the communication, the sound emitted from the speaker is collected by the microphone and transmitted back to this voice communication terminal. In other words, in the loudspeaker communication system the sound once emitted into space is transmitted and received so as to circulate between the voice communication terminals. As a result, the sound emitted from the speaker contains one's own current speech, heard like an echo with a certain delay. This is the echo, and if the loop is repeated beyond a certain degree, howling occurs. Therefore, in the speech communication system it is common practice to provide an echo cancellation system for eliminating and suppressing this echo phenomenon, and the audio signal processing unit 11 is configured to have the signal processing function of such an echo cancellation system. In addition, corresponding to the self-speech sound output function mentioned above, the audio signal processing unit 11 is also provided with a signal path for the self-speech sound, which causes the speaker 3 to output the audio signal collected by the microphone 2. The audio signal processing unit 11 is actually configured, for example, as a DSP (Digital Signal Processor). The configuration for echo cancellation and the signal path for the self-speech sound within the audio signal processing unit 11 are described later.
[0018]
The audio signal subjected to the echo cancellation processing by the audio signal processing unit 11 is input to the encoder 13 of the codec unit 12. The encoder 13 performs signal processing such as audio compression encoding according to a predetermined method on the input audio signal and outputs the result to the communication unit 15. The communication unit 15 outputs the input transmission voice signal to the other voice communication terminal device via the communication line in accordance with a predetermined communication scheme.
[0019]
The communication unit 15 also receives a transmission voice signal transmitted from the other voice communication terminal device, restores it to a voice signal in the predetermined compression encoding format, and outputs it to the decoder 14 of the codec unit 12. The decoder 14 decodes the compression-encoded audio signal input from the communication unit 15, converts it into a digital audio signal in a predetermined PCM format, and outputs it to the audio signal processing unit 11. The audio signal component output to the audio signal processing unit 11 in this manner eventually becomes the other-party speaker audio output from the speaker 3.
[0020]
FIG. 3 shows an example of a configuration that might straightforwardly be considered for the audio signal processing unit 11 when a self-speech amplification function is added to the echo cancellation function. In this figure, the microphone 2, the speaker 3, and the codec unit 12 (the encoder 13 and the decoder 14) are shown together with the audio signal processing unit 11.
[0021]
The audio signal processing unit 11 shown in FIG. 3 includes an adaptive filter system 20, a
transmission sound suppressor 23, a volume unit 24, and an adder 25. The adaptive filter system
20 and the transmission sound suppressor 23 correspond to the echo cancellation function, and
the volume unit 24 and the adder 25 correspond to the self-speech sound amplification function.
[0022]
First, the main content of the collected voice signal obtained by the microphone 2 consists of the sound emitted from the speaker 3 (that is, the other-party speaker's voice as echo sound) and the self-speaker's voice, which is picked up when a conference participant at this location speaks into the microphone 2. The adaptive filter system 20 removes (cancels) the other-party speaker's voice, i.e. the above echo sound, from the collected sound signal, and includes an adaptive filter 21 and a subtractor 22. The reference signal of the adaptive filter system 20 is input to the input terminal of the adaptive filter 21; in this case it is the output signal of the decoder 14, that is, the audio signal corresponding to the other-party speaker's voice transmitted from the other-party communication terminal apparatus. The subtractor 22 subtracts the output signal (cancellation signal) of the adaptive filter 21 from the collected voice signal of the microphone 2. Therefore, in the adaptive filter system 20, the signal to be processed, that is, the signal (desired signal) containing the component to be canceled, is the collected voice signal picked up by the microphone 2. The output signal of the adaptive filter system 20 is the output of the subtractor 22; the output of the subtractor 22 fed back to the adaptive filter 21 is called the error signal or residual signal.
[0023]
In this configuration, the adaptive filter 21 of the adaptive filter system 20 receives, as the reference signal, the voice signal of the other-party speaker's voice output from the decoder 14. Although its internal structure is not shown in the drawings, the adaptive filter 21 comprises an FIR (Finite Impulse Response) digital filter of the required order, through which the reference signal passes, and a coefficient setting circuit capable of variably setting the coefficients (filter coefficients) of this digital filter. The output of the digital filter is the output signal (cancellation signal) of the adaptive filter 21. In the adaptive filter 21, the coefficient setting circuit continually changes and sets the filter coefficients over the required order so that a cancellation signal is always obtained with which the residual indicated by the error signal becomes minimum. As a result, the coefficient vector of the adaptive filter 21 (the array of coefficients over the filter order) comes to form a pseudo impulse response representing the transfer function of the path (hereinafter also referred to as the cancellation sound transmission path) along which the voice signal of the other-party speaker's voice from the decoder 14 is output from the speaker 3, picked up by the microphone 2, and finally input as the processing target signal (desired signal) to the subtractor 22 of the adaptive filter system 20. This operation adaptively cancels the signal component of the sound obtained via the cancellation sound transmission path according to the state of the processing target signal at that time. The sound passing through this cancellation sound transmission path is the echo component of the other-party speaker's voice, as can be seen from the fact that it passes through the spatial propagation path E, which is an echo path. Therefore the output signal (cancellation signal) of the adaptive filter 21 can be regarded as a pseudo echo of the other-party speaker's voice. In the adaptive filter system 20, the subtractor 22 subtracts this pseudo echo, corresponding to the voice transmitted from the other party, from the voice collected by the microphone 2 on the near-end side. Thus the audio signal processing unit 11 adaptively removes the echo component from the near-end audio signal, and the voice communication terminal device 1 transmits the voice signal from which the echo component has been removed to the voice communication terminal device on the other side. As a result, the echo sound is also absent from the sound emitted from the speaker at the other-party communication terminal apparatus. In this way, an echo cancellation effect is produced.
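In signal terms, the operation just described can be summarized as follows, with m(k) the microphone (desired) signal, x(k) the reference signal, and w(k) the coefficient vector of the N-tap FIR filter; the update rule in the last line is shown only as one common choice (NLMS), since the patent text does not fix a specific adaptation algorithm.

\hat{y}(k) = \sum_{i=0}^{N-1} w_i(k)\, x(k-i) \qquad \text{(pseudo echo / cancellation signal)}
e(k) = m(k) - \hat{y}(k) \qquad \text{(error / residual signal)}
w(k+1) = w(k) + \mu\, \frac{e(k)\, \mathbf{x}(k)}{\lVert \mathbf{x}(k) \rVert^{2} + \delta} \qquad \text{(e.g., NLMS coefficient update)}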
[0024]
In this case, the audio signal from which the echo component has been removed as described above, that is, the output signal of the subtractor 22, is input to the encoder 13 via the transmission sound suppressor 23. The transmission sound suppressor 23 attenuates the level (gain) of the input audio signal with a settable attenuation factor and is provided, for example, to reinforce the echo cancellation effect described above: even when the adaptive filter 21 has converged and a sufficiently effective echo cancellation effect is considered to be obtained, some echo components may in reality remain. When the transmission sound suppressor 23 detects the so-called single-talk state, in which the adaptive filter 21 has converged and the collected voice signal contains no self-speaker's voice but possibly the echo of the other-party speaker's voice, it applies an attenuation factor of almost 100% so that the input signal is not output, or applies an attenuation factor above a certain level before outputting it. As a result, the residual echo component is less likely, or unlikely, to be heard at the other party's communication terminal apparatus.
[0025]
The transmission voice signal to be input from the transmission sound suppressor 23 to the encoder 13 is branched and input to the adder 25 via the volume unit 24. The adder 25 combines this transmission voice signal with the other-party speaker voice output from the decoder 14 and outputs the combined audio signal to the speaker 3. A typical situation handled by the self-speech sound output function is that a conference participant at the near-end location is speaking into the microphone 2 while the other-party speaker at the far-end communication terminal device is not speaking into its microphone. In this situation, the content of the collected voice signal obtained by the microphone 2 is only the self-speaker's voice, with no echo of the other-party speaker's voice. The collected voice signal is then input from the volume unit 24 to the adder 25 and output to the speaker 3 side, so that the self-speaker's voice is emitted as sound from the speaker 3. In this way the self-speech sound output function is realized. The volume unit 24 changes the level of the passing audio signal according to, for example, a manual operation.
[0026]
In practice, however, it is difficult for the configuration shown in FIG. 3 to steadily obtain a sufficient echo cancellation effect. As described above, the adaptive filter system 20 is configured to cancel the other-party speaker's voice (echo sound) that reaches the microphone 2 from the speaker 3 via the spatial propagation path E. Therefore, as long as no self-speaker's voice is collected and only the other-party speaker's voice is collected by the microphone 2, the adaptive processing of the adaptive filter system 20 yields proper echo cancellation. However, while the self-speaker's voice is being collected, the microphone 2 picks up a self-speaker voice that does not pass through the spatial propagation path E, and the adaptive filter system 20 performs an operation that tries to minimize a desired signal containing this self-speaker voice, which is not a cancellation target, using a reference signal corresponding to the other-party speaker's voice. As a result, the filter coefficients of the adaptive filter system 20 move away from the settings needed to cancel the other-party speaker's voice, which is the original cancellation target. Moreover, when the self-speech sound output function is provided, the component of the self-speaker's voice also travels from the speaker 3 to the microphone 2 via the same spatial propagation path E, so echo and howling can also occur for the self-speaker's voice. Yet, as described above, while the self-speaker's voice is being picked up by the microphone 2, the echo arriving via the spatial propagation path E cannot be canceled, and a good echo cancellation effect can no longer be expected. Thus, the configuration shown in FIG. 3 has the problem that providing the self-speech amplification function impairs the proper echo cancellation effect of the adaptive filter system 20.
[0027]
Therefore, in the present embodiment, even when the self-speech sound output function is provided, echo cancellation is performed well for both the echo of the other-party speaker's voice and the echo of the self-speaker's voice. The configuration for this is described below.
[0028]
FIG. 4 shows a configuration example of the audio signal processing unit 11 according to the present embodiment. In this figure, parts that are the same as in FIG. 3 are given the same reference numerals, and their description is omitted. The configuration for canceling the echo of the other-party speaker's voice consists of the adaptive filter system 20 and the transmission sound suppressor 23, as in FIG. 3. In addition, an adder 25 and a self-speech sound suppressor 26 are provided corresponding to the self-speech sound output function. The volume unit 24 provided in FIG. 3 is omitted in this case.
[0029]
The self-speech sound suppressor 26 receives, as the output signal of the adaptive filter system 20 (subtractor 22), the audio signal Y(k) (where (k) indicates time) at the stage where it is input to the transmission sound suppressor 23, applies a variably set attenuation factor to it as described later, and outputs it as the audio signal Ys(k). The audio signal Ys(k) is input to the adder 25. The adder 25 receives the audio signal Ys(k) and the audio signal Xd(k) output from the decoder 14, adds (combines) them, and outputs the result as the audio signal X(k). The audio signal X(k) is input as the reference signal to the adaptive filter system 20 (adaptive filter 21) and is also branched and output to the speaker 3 side. In FIG. 3, the signal input to the adaptive filter system 20 (adaptive filter 21) as the reference signal was the audio signal from the decoder 14 itself, and the adder 25 combined it with the output signal from the adaptive filter system 20 side (the output of the transmission sound suppressor 23) only for the speaker output. In FIG. 4, by contrast, the signal obtained by combining the output of the decoder 14 and the output of the adaptive filter system 20 in the adder 25 becomes both the reference signal of the adaptive filter system 20 and the output signal sent to the speaker 3.
[0030]
The signal path corresponding to the self-speech sound output function in this case is as follows. When a self-speaker voice is input to the microphone 2, its voice signal component is input to the self-speech sound suppressor 26 via the adaptive filter system 20. The self-speaker voice component that has passed through the self-speech sound suppressor 26 is output from the adder 25 to the speaker 3. As a result, the self-speaker voice collected by the microphone 2 is emitted as sound from the speaker 3 at the same place; that is, the self-speech sound output function is realized.
[0031]
FIG. 5 shows an example of the processing procedure executed by the audio signal processing unit 11 of FIG. 4 during its operation. When the audio signal processing unit 11 is configured as a DSP, the process shown in this figure is realized by a program, that is, by instructions given to the DSP. When the process shown in this figure is first started, the adaptive processing of the adaptive filter system 20 is also started. For confirmation, while the adaptive filter system 20 is executing the adaptive processing, the signal X(k) output from the adder 25 at that time is input as the reference signal, the collected voice signal M(k) from the microphone 2 is input to the subtractor 22 as the desired signal, and the coefficient vector of the FIR filter is varied so that the error signal Y(k), which is the output of the subtractor 22, is minimized.
[0032]
First, in step S101, it is determined whether or not the level (value) of the error signal (audio signal) Y(k) output from the subtractor 22 is at or below a certain ratio of the level (value) of the audio signal Xd(k) input from the decoder 14 to the adder 25, that is, whether Y(k) ≤ Xd(k) * m holds (where m is a predetermined positive number less than 1). A state in which the level of the audio signal Y(k) is at or below this fixed ratio of the audio signal Xd(k) corresponds to a state in which an other-party speaker voice signal at or above the level regarded as valid is being input from the decoder 14 while no self-speaker voice at or above the level regarded as valid is being picked up by the microphone 2 (the other-party single-talk state, corresponding to the first voice state). That is, when an audio signal of the other-party speaker's voice considered to be valid is being input from the decoder 14, the audio signal Xd(k) has a large level (amplitude) value, at or above a predetermined level. On the other hand, if the adaptive filter system 20 has converged to the state of canceling the echo of the other-party speaker's voice, the echo of the other-party speaker's voice arriving at the microphone 2 from the speaker 3 via the spatial propagation path E is properly canceled in the audio signal Y(k), which therefore has a very small level.
[0033]
On the other hand, the talk states other than the above other-party single-talk state are the following three:
a. A state in which no other-party speaker voice signal considered to be valid is output from the decoder 14, while a self-speaker voice at or above the level considered valid is picked up by the microphone 2 (the self-side single-talk state, referred to as a second voice state).
b. A state in which an other-party speaker voice signal considered to be valid is output from the decoder 14 and a self-speaker voice at or above the level considered valid is also picked up by the microphone 2 (the double-talk state, also referred to as a second voice state).
c. A state in which no other-party speaker voice signal considered to be valid is output from the decoder 14 and no self-speaker voice at or above the level considered valid is picked up by the microphone 2 (the non-talk state).
In these states, the level of the audio signal Y(k) exceeds the above fixed ratio of the level of the audio signal Xd(k). First, in the self-side single-talk state, the self-speaker voice signal collected by the microphone 2 passes through the adaptive filter system 20 without being canceled, so the level of the audio signal Y(k) is correspondingly large, while the audio signal Xd(k) has a very small level because no valid audio signal is output from the decoder 14; the error signal Y(k) therefore becomes larger than the signal Xd(k) and exceeds the fixed ratio. In the double-talk state, both the self-speaker voice signal collected by the microphone 2 and the other-party speaker voice signal from the decoder 14 are, to a varying degree, at or above the level considered valid, so the level difference between the audio signal Y(k) and the audio signal Xd(k) is smaller than in the other-party single-talk state and the fixed ratio is again exceeded. In the non-talk state, neither the self-speaker voice signal collected by the microphone 2 nor the other-party speaker voice signal from the decoder 14 reaches the level considered valid; in this case too, the level difference between the audio signal Y(k) and the audio signal Xd(k) is smaller than in the other-party single-talk state, and the fixed ratio is therefore exceeded.
[0034]
If the determination result in step S101 is affirmative, that is, the other-party single-talk state holds, the process proceeds to step S102. In step S102, the attenuation factor of the self-speech sound suppressor 26 is set to at least the certain level, so that the input signal is effectively cut off in the self-speech sound suppressor 26 and not output.
[0035]
In step S103, which follows step S102, it is determined whether or not the adaptive filter system 20 has sufficiently converged. For example, if it is determined that the coefficient vector of the FIR filter of the adaptive filter 21 has reached a predetermined state regarded as sufficient convergence, a positive determination result is obtained here. Alternatively, the adaptive filter 21 may be configured to output an evaluation value representing its own degree of convergence, and step S103 can then be realized by referring to this evaluation value.
[0036]
If it is determined in step S103 that the adaptive filter system 20 has not yet converged and a negative determination result is obtained, the process proceeds to step S104, where the adaptive filter system 20 is controlled to execute the adaptive processing (an active-tendency state). For example, if the adaptive processing of the adaptive filter system 20 has been executed up to the time of step S104, the adaptive processing simply continues; if it has been stopped, execution of the adaptive processing is started in step S104. Note, for confirmation, that the signal-blocking state has been set for the self-speech sound suppressor 26 in step S102, so the adaptive processing executed in step S104 yields proper, good results, as described later.
[0037]
On the other hand, if it is determined in step S103 that the adaptive filter system 20 has converged and a positive determination result is obtained, the process proceeds to step S105, where the adaptive filter system 20 stops executing the adaptive processing (a stopping-tendency state). In this case too, if the adaptive processing of the adaptive filter system 20 has been executed up to step S105, it is changed to the stopped state in step S105; if it has already been stopped, that state is continued. When, for example, the state in which the adaptive processing has been performed is changed to the stopped state in step S105, the coefficient vector of the FIR filter in the adaptive filter 21 of the adaptive filter system 20 is held fixed at its setting immediately before the stop. That is, with the coefficient vector thus fixed, the cancellation signal Ep(k) output from the adaptive filter 21 is subtracted in the subtractor 22 from the audio signal M(k) input to the adaptive filter system 20, and the result is output as the audio signal Y(k). In the other-party single-talk state there would be no particular problem in continuing the adaptive processing even though the adaptive filter system has converged; however, stopping it as in step S105 makes it unnecessary to execute the computations required for the adaptive processing, so the processing load and resources can be reduced. After the procedure of step S104 or S105 has been performed, the process returns to step S101.
[0038]
If a negative determination result is obtained in step S101, that is, in the case of the self-side single-talk state, the double-talk state, or the non-talk state, the process proceeds to step S106. In step S106, as in step S103, it is determined whether or not the adaptive filter system 20 has converged. However, different conditions may be set in steps S103 and S106 as to what degree of convergence is judged to constitute the converged state, corresponding to the other-party single-talk state on the one hand and the other talk states on the other. Furthermore, in practice, the determination in step S106 may be performed after setting convergence conditions adapted to each of the self-side single-talk, double-talk, and non-talk states.
[0039]
If an affirmative determination result is obtained in step S106, the process proceeds to step S107, where an attenuation factor at or below the predetermined value is set for the self-speech sound suppressor 26 so that it effectively passes the input signal. On the other hand, if a negative determination result is obtained in step S106, a predetermined attenuation factor at or above a certain level (not necessarily the same attenuation factor as in step S102) is set in step S108; in this case the input signal is effectively cut off in the self-speech sound suppressor 26 and not output.
[0040]
After the procedure of step S107 or S108 has been executed, the adaptive processing of the adaptive filter system 20 is stopped in step S109 in the same manner as in step S105, and the process returns to step S101. Note, for confirmation, that when adaptive processing that has been running is stopped in step S109, the coefficient vector of the FIR filter in the adaptive filter 21 of the adaptive filter system 20 is, as in step S105, held fixed at its setting immediately before the stop.
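Putting steps S101 to S109 together, the control loop of FIG. 5 can be paraphrased as the following sketch; it is an illustrative paraphrase rather than the patent's program, with attenuation 1.0 standing for the blocking state and 0.0 for the passing state.

def fig5_control_step(other_party_single_talk: bool, converged: bool):
    """One pass of the FIG. 5 control procedure (illustrative paraphrase).

    Returns (adapt, self_suppressor_attenuation).
    """
    if other_party_single_talk:                  # step S101 affirmative
        attenuation = 1.0                        # S102: block the self-speech path
        adapt = not converged                    # S103 -> S104 (adapt) or S105 (stop)
    else:                                        # self-side single talk, double talk, or non-talk
        attenuation = 0.0 if converged else 1.0  # S106 -> S107 (pass) or S108 (block)
        adapt = False                            # S109: adaptation stopped in all three states
    return adapt, attenuation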
[0041]
According to the processing of FIG. 5 described so far, the execution of the adaptive processing of the adaptive filter system 20 and the state of the self-speech sound suppressor 26 are controlled as follows, depending on the call state (talk state) at the near-end voice communication terminal device 1. First, in the other-party single-talk state, the processing of step S104 or step S105 is performed via steps S102 and S103. As a result, the self-speech sound suppressor 26 is first placed, in step S102, in the state in which the signal is blocked and not output.
[0042]
The reason for setting the signal-blocking state for the self-speech sound suppressor 26 in the other-party single-talk state is as follows. First, in the other-party single-talk state no self-speaker voice considered valid is being picked up, that is, no voice signal requiring self-speech amplification is obtained, so there is no problem in setting the signal-blocking state for the self-speech sound suppressor 26. Note that when the self-speech sound suppressor 26 is in the signal-blocking state in this way, the audio signal processing unit 11 forms a circuit configuration equivalent to an ordinary echo cancellation system that has no self-speech sound output function. Moreover, if the self-speech sound suppressor 26 remained in the signal-passing state and the adaptive filter system 20 were not sufficiently converged in the other-party single-talk state, the residual echo component contained in the audio signal Y(k) would be re-input to the adaptive filter 21 and to the speaker 3 via the self-speech output path (the self-speech sound suppressor 26 and the adder 25). In that situation the reference signal of the adaptive filter system 20 should be only the audio signal from the decoder 14, and the re-input audio signal component is not a component that should be included in the reference signal. Consequently, if the voice signal passing through the self-speech sound suppressor 26 were input to the adaptive filter system 20 as part of the reference signal, the proper adaptive processing of the adaptive filter system 20 could be disturbed. In reality, even when the adaptive filter system 20 has converged sufficiently, some residual echo component may appear in the error signal Y(k). Therefore, by setting the signal-blocking state for the self-speech sound suppressor 26 in step S102, normal and favorable adaptive processing of the adaptive filter system 20 is ensured.
[0043]
In the other-party single-talk state there may be situations in which, for example, a self-speaker voice is temporarily picked up by the microphone 2 and the state transits to the double-talk state. However, in the other-party single-talk state the conference participants are listening mainly to the other-party speaker voice output from the decoder 14; even if a participant at the same place temporarily utters something at that moment, the other participants feel little inconvenience at not hearing it from the speaker. Therefore, even if such a state transition occurs, setting the signal-blocking state for the self-speech sound suppressor 26 poses no particular problem.
[0044]
Further, as described above, the transmission sound suppressor 23 is used to suppress the residual echo component in the audio signal Y(k) output when the adaptive filter system 20 has converged. The adjustment of the attenuation factor in the transmission sound suppressor 23 is correspondingly delicate, and its control is somewhat sophisticated; for example, if an extreme attenuation factor is set, the voice heard at the other party's voice communication terminal is more likely to sound unnatural. By contrast, in the other-party single-talk state there is no hindrance at all in setting a strong attenuation factor, for example 100% or close to it, for the self-speech sound suppressor 26 so as to cut off its signal output.
[0045]
In the same other-party single-talk state, when the determination result of step S103 is that the adaptive filter system 20 is not in the converged state, the adaptive filter system 20 executes the adaptive processing (steps S103 and S104), and when it is in the converged state, the adaptive processing of the adaptive filter system 20 is stopped (steps S103 and S105). The other-party single-talk state is precisely the state in which a valid voice signal component of the other-party speaker's voice, the original cancellation target, is being input at the near end. This means that if the adaptive filter system 20 has not converged, it is exactly the time to run the adaptive processing actively so that it converges to the state in which the echo of the other-party speaker's voice is canceled; therefore the adaptive processing is performed whenever the adaptive filter system 20 is not in the converged state. Moreover, in the present embodiment, as described above, the processing of step S102 places the self-speech sound suppressor 26 in the signal-blocking state, so the reference signal X(k) consists only of the component of the audio signal Xd(k) from the decoder 14. For this reason, the adaptive processing performed in step S104 is an appropriate operation for canceling the original cancellation target sound.
[0046]
Further, according to the process of FIG. 5, in the self-side single-talk state, the double-talk state, or the non-talk state, that is, when a negative determination result is obtained in step S101, the self-speech sound suppressor 26 is set to the signal-passing state if the adaptive filter system 20 has converged (S106, S107) and to the signal-blocking state if it has not converged (S106, S108). In addition, the adaptive processing of the adaptive filter system 20 is uniformly stopped (S109). The reason for setting the audio signal processing unit 11 to these states is described below for each of the three states.
[0047]
First, consider the double-talk state and the self-side single-talk state together. In the double-talk state, an other-party speaker voice considered valid is obtained as the signal Xd(k), and a self-speaker voice considered valid is obtained in the voice signal (desired signal) M(k). In the self-side single-talk state, a valid self-speaker voice is obtained in the voice signal (desired signal) M(k), although no other-party speaker voice considered valid is obtained as the signal Xd(k); it has in common with the double-talk state that the voice signal of the self-speaker's voice is obtained.
[0048]
Thus, in a state in which at least the voice signal of the self-speaker's voice is obtained, that voice should, as far as possible, be reproduced from the speaker 3 (self-speech amplification), given that the apparatus has the self-speech sound output function. From this point of view it would be sufficient simply to set the signal-passing state for the self-speech sound suppressor 26. However, the adaptive filter system 20 can converge so as to cancel the echo of the other-party speaker's voice only when the reference signal contains just the audio signal component corresponding to the other-party speaker's voice from the decoder 14 and the desired signal M(k) contains just the voice component that has reached the microphone 2 from the speaker 3 via the spatial propagation path E. Suppose the adaptive filter system 20 has not converged and the self-speech sound suppressor 26 is in the signal-passing state. Then, in the double-talk state, the reference signal X(k) of the adaptive filter system 20 contains a substantial amount of the signal Ys(k) output from the self-speech sound suppressor 26, that is, of the self-speaker voice component (in the self-side single-talk state this self-speaker component becomes dominant). The desired signal M(k) likewise contains a considerable amount of the self-speaker voice component uttered into the microphone 2 (again dominant in the self-side single-talk state). If the adaptive processing of the adaptive filter system 20 were performed in this state, it could not converge to a state capable of canceling the echo of the other-party speaker's voice, which is its original purpose, and coefficient vectors far from convergence would be set. In this double-talk state a large amount of the other-party speaker's echo would then remain, the sound heard from the speaker 3 would be very unpleasant, and, when subsequently transitioning to the other-party single-talk state, the time needed to reach convergence would also become correspondingly longer.
[0049]
On this basis, in the double-talk state or the self-side single-talk state, when the adaptive filter system 20 has converged, the self-speech sound suppressor 26 is brought into the signal passing state and the adaptive processing of the adaptive filter system 20 is stopped. As a result, the audio signal of the self-speaker's voice picked up by the microphone 2 passes from the adaptive filter system 20 through the self-speech sound suppressor 26 and is further output as sound from the speaker 3 via the adder 25; that is, it is output as the self-speech sound. At this time, however, the adaptive processing of the adaptive filter system 20 is stopped with the converged state (coefficient vector) obtained so far held fixed. The adaptive filter system 20 therefore does not drift away from convergence even though the audio signal of the self-speaker's voice enters, and may dominate, the reference signal Xd(k). At this time, the transmitted sound E(k) that reaches the microphone 2 from the speaker 3 via the space propagation path E contains a considerable amount of, or is dominated by, the component of the self-speaker's voice, and this appears as an echo sound. However, since this echo sound of the self-speaker's voice is also transmitted from the speaker 3 to the microphone 2 via the space propagation path E, and since the adaptive filter system 20 is held fixed in its converged state, the echo sound of the other-party speaker's voice and the echo sound of the self-speaker's voice are both properly cancelled.
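To make the role of the fixed coefficient vector concrete, the following is a minimal NLMS-style sketch. NLMS is used here only as a common example of an adaptive algorithm; the function and parameter names (nlms_step, mu, eps) are illustrative assumptions and not taken from the patent. When `frozen` is True the echo replica is still generated and subtracted, but the coefficient update is skipped, which corresponds to stopping the adaptive processing while holding the converged state.

```python
import numpy as np

def nlms_step(w, x_buf, d, mu=0.5, eps=1e-8, frozen=False):
    """One step of an NLMS-style echo canceller (illustrative sketch).

    w      -- coefficient vector (estimate of the speaker-to-microphone path)
    x_buf  -- most recent reference samples, newest first, same length as w
    d      -- desired-signal sample (microphone input)
    frozen -- True while the adaptive processing is stopped: the echo replica
              is still subtracted, but the coefficients are left unchanged.
    """
    y_hat = float(np.dot(w, x_buf))   # echo replica from the current model
    e = d - y_hat                     # residual after echo cancellation
    if not frozen:
        norm = float(np.dot(x_buf, x_buf)) + eps
        w = w + (mu / norm) * e * x_buf   # NLMS coefficient update
    return w, e
```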
[0050]
When the adaptive filter system 20 has not converged, on the other hand, the self-speech sound suppressor 26 is set to the signal blocking state. If the self-side speaker's voice were output from the speaker 3 in this condition, a large amount of its echo sound would remain because the adaptive filter system 20 has not converged, making the sound very difficult to hear and increasing the possibility of howling. In this case, therefore, priority is given to suppressing echo sound and howling as far as possible, and the self-side speaker's voice is not output from the speaker 3. In the double-talk state, the echo sound of the other-party speaker's voice that remains according to the degree of convergence will still be heard, but compared with a situation in which the echo sound of the self-side speaker's voice is added on top of it, a state with more of the echo sound suppressed is obtained. In addition, the adaptive processing of the adaptive filter system 20 is stopped at this time, so that even though the desired signal M(k) contains (or is dominated by) the component of the self-speaker's voice, the adaptive filter system 20 no longer changes in the direction away from convergence.
[0051]
In the non-talk state, neither the audio signal of the other-party speaker's voice nor that of the self-speaker's voice is obtained. Consequently, neither a valid reference signal Xd(k) consisting of the audio signal of the other-party speaker's voice nor a valid desired signal M(k) consisting of the audio signal of the echo sound of that voice is available. In this case a converging operation cannot be obtained even if the adaptive processing of the adaptive filter system 20 is performed. Stopping the adaptive processing therefore prevents the adaptive filter system 20 from moving further away from convergence, so that when a transition occurs, for example, to the other-party single-talk state, the adaptive processing can be restarted from a state regarded as being as close to convergence as possible. In addition, when the adaptive filter system 20 is in the converged state, the self-speech sound suppressor 26 is set to the signal passing state. Thus, when a transition occurs from the non-talk state to the self-side single-talk state or the double-talk state, that is, to a state in which the audio signal of the self-speaker's voice is obtained as the signal M(k), the self-speaker's voice can be output from the speaker 3 promptly, without its beginning being cut off. Conversely, if the self-speech sound suppressor 26 is set to the signal blocking state in response to the non-converged state of the adaptive filter system 20, then on a transition to the self-side single-talk state or the double-talk state (a state in which the audio signal of the self-speaker's voice is obtained as the signal M(k)), the audio signal processing unit 11 takes the same state as described above for the self-side single-talk and double-talk states in which the adaptive filter system 20 has not converged.
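As a summary of the branch logic described above, the following is a minimal Python sketch of steps S101 and S106-S109. The level-ratio test used for S101, the threshold value, and the handling of the positive branch (other-party single talk, from step S102 onward, assumed here to block the self-speech sound suppressor and let the adaptive processing run) are illustrative assumptions, not the patented implementation itself.

```python
def set_processing_state(level_y1, level_xd, converged, ratio_threshold=4.0):
    """Return (suppressor_passes, adaptation_runs) for one decision cycle.

    level_y1  -- level of the microphone-side audio signal Y1(k)
    level_xd  -- level of the far-end (decoder) audio signal Xd(k)
    converged -- True if the adaptive filter system 20 is judged converged
    """
    # S101: the other-party single-talk state is assumed when Xd(k) is much
    # larger than Y1(k); otherwise the negative branch (self-side single
    # talk, double talk or non-talk) is taken.
    if level_xd > ratio_threshold * max(level_y1, 1e-12):
        # Positive branch (assumed): block the self-speech suppressor and
        # let the adaptive processing run so that convergence is reached.
        return False, True
    # Negative branch: S106 checks convergence of the adaptive filter system.
    suppressor_passes = converged   # S107 (signal passing) / S108 (blocking)
    adaptation_runs = False         # S109: adaptive processing is stopped
    return suppressor_passes, adaptation_runs
```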
[0052]
By adopting the configuration of the audio signal processing unit 11 according to the present embodiment in this manner, the echo sound of the other-party speaker's voice can be cancelled not only in the other-party single-talk state but also in the double-talk state, as long as the adaptive filter system 20 has reached the converged state. Furthermore, in the double-talk state the echo sound of the self-speaker's voice is also cancelled, and the echo sound of the self-speaker's voice is likewise cancelled in the self-side single-talk state. That is, both the echo sound of the other-party speaker's voice and the echo sound of the self-speaker's voice can be properly cancelled. Moreover, in this case, instead of using a frequency division unit or a gain adjuster as in Patent Document 1, both the echo sound of the other-party speaker's voice and the echo sound of the self-speaker's voice are cancelled by the adaptive filter. The sound obtained after the echo cancellation processing is therefore of high quality, free of unnatural volume changes and of the unnatural sound quality caused by the partial loss of a frequency band. In addition, since the adaptive filter is generally realized by digital signal processing, high sound quality is readily achieved. Thus, the echo cancellation function according to the present embodiment attains high performance in that it deals with both the echo sound of the other-party speaker's voice and the echo sound of the self-speaker's voice. Furthermore, in the configuration of the present embodiment, the adaptive filter originally provided for cancelling the echo sound of the other-party speaker's voice is also used for cancelling the echo sound of the self-speaker's voice; no additional adaptive filter is provided for the latter, and the amount of calculation and the required resources can be reduced accordingly.
[0053]
FIG. 6 shows a configuration example of the voice communication terminal device 1 as a modified example of the embodiment. As in FIG. 4, an internal configuration example of the audio signal processing unit 11 is shown. First, in this figure, two microphones 2A and 2B are connected to the voice communication terminal device 1. That is, as a system configuration, two microphones 2 are provided for one voice communication terminal device 1. In practice this means that two microphones are provided at one site, such as a single conference room. Conference participants can then speak into whichever of the microphones 2A and 2B is closer to them, so that, for example, the microphone needs to be passed around less and the conference proceeds more smoothly. Providing a plurality of microphones in this manner becomes more effective the larger the conference room is.
[0054]
The audio signal processing unit 11 shown in FIG. 6 is configured, relative to that of FIG. 4, to provide an echo cancellation function and a self-speech sound output function corresponding to the connection of the two microphones 2A and 2B. For this purpose, adaptive filter systems 20A and 20B, transmission sound suppressors 23A and 23B, self-speech sound suppressors 26A and 26B, and adders 25A, 25B, 27A, 27B and 28 are provided. If the microphone 2A in FIG. 6 is taken to correspond to the microphone 2 shown in FIG. 4, the adaptive filter system 20A, the self-speech sound suppressor 26A, the transmission sound suppressor 23A and the adder 25A correspond respectively to the adaptive filter system 20, the self-speech sound suppressor 26, the transmission sound suppressor 23 and the adder 25 in FIG. 4. In addition, in the configuration shown in FIG. 6, the adaptive filter system 20B, the self-speech sound suppressor 26B, the transmission sound suppressor 23B and the adders 25B, 27A, 27B and 28 are further provided in response to the addition of the microphone 2B.
[0055]
First, the adaptive filter system 20B, the self-speech sound suppressor 26B, the transmission sound suppressor 23B and the adder 25B, to which the collected audio signal of the microphone 2B is input, are connected in the same manner as the adaptive filter system 20A, the self-speech sound suppressor 26A, the transmission sound suppressor 23A and the adder 25A.
[0056]
Further, in this case, the respective outputs of the transmission sound suppressors 23A and 23B
are synthesized by the adder 28, and then input to the encoder 13 as a transmission signal for
the voice communication terminal apparatus of the other party.
[0057]
In this case, the audio signal output from the decoder 14 is branched and input to the adders
27A and 27B.
In the adder 27A, the audio signal from the decoder 14 and the audio signal from the self-speech sound suppressor 26B are combined and input to the adder 25A.
The adder 25A combines the audio signal from the adder 27A with the audio signal from the self-speech sound suppressor 26A, outputs the result as the reference signal for the adaptive filter system 20A, and also branches it for output to the speaker 3.
[0058]
Further, the adder 27B combines the audio signal from the decoder 14 with the audio signal from the self-speech sound suppressor 26A and outputs the result to the adder 25B. The adder 25B combines the audio signal from the adder 27B with the audio signal from the self-speech sound suppressor 26B and outputs the result as the reference signal of the adaptive filter system 20B.
[0059]
In this configuration, the collected audio signal obtained by the microphone 2A is input to the encoder 13 through the adaptive filter system 20A, the transmission sound suppressor 23A and the adder 28, and the collected audio signal obtained by the microphone 2B is likewise input to the encoder 13 through the adaptive filter system 20B, the transmission sound suppressor 23B and the adder 28. With this signal system, both a self-speaker voice picked up at a valid level by the microphone 2A and a self-speaker voice picked up at a valid level by the microphone 2B can be transmitted to the other party's voice communication terminal device.
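A short sketch of the two transmit paths just described is given below, assuming per-block processing callables for each stage; the function and argument names are illustrative only, and the signal blocks are assumed to support element-wise addition (for example NumPy arrays).

```python
def transmit_mix(mic_a_block, mic_b_block,
                 afs_20a, afs_20b, supp_23a, supp_23b):
    """Combine both microphone paths into the signal fed to the encoder 13.

    afs_20a / afs_20b   -- callables modelling adaptive filter systems 20A/20B
                           (collected signal in, echo-cancelled signal out)
    supp_23a / supp_23b -- callables modelling transmission sound suppressors
    """
    path_a = supp_23a(afs_20a(mic_a_block))   # microphone 2A path
    path_b = supp_23b(afs_20b(mic_b_block))   # microphone 2B path
    return path_a + path_b                    # adder 28 output to encoder 13
```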
[0060]
Further, in the configuration shown in FIG. 6, the speaker 3 emits the other-party speaker voice based on the audio signal from the decoder 14, the self-speaker voice picked up by the microphone 2A (first self-speaker voice), and the self-speaker voice picked up by the microphone 2B (second self-speaker voice). Components of the echo sounds of the other-party speaker voice, the first self-speaker voice and the second self-speaker voice therefore reach the microphone 2A from the speaker 3 via the space propagation path E1. Similarly, components of the echo sounds of the other-party speaker voice, the first self-speaker voice and the second self-speaker voice reach the microphone 2B from the speaker 3 via the space propagation path E2. In this modification, it is therefore necessary to cancel the echo sound components picked up by the microphones 2A and 2B via the space propagation paths E1 and E2.
[0061]
To meet this need, the reference signal for the adaptive filter system 20A, that is, the output of the adder 25A, is configured as a combination of the audio signal of the other-party speaker's voice from the decoder 14, the collected audio signal of the microphone 2A obtained via the self-speech sound suppressor 26A, and the collected audio signal of the microphone 2B obtained via the self-speech sound suppressor 26B. The adaptive filter system 20A receives this reference signal and performs adaptive processing using the signal input from the microphone 2A as the desired signal. Similarly, the reference signal for the adaptive filter system 20B, that is, the output of the adder 25B, is configured as a combination of the audio signal from the decoder 14, the collected audio signal of the microphone 2B obtained via the self-speech sound suppressor 26B, and the collected audio signal of the microphone 2A obtained via the self-speech sound suppressor 26A. The adaptive filter system 20B receives this reference signal and executes adaptive processing using the signal input from the microphone 2B as the desired signal.
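The adder network described above can be summarised by the following sketch, assuming sample blocks that support element-wise addition (for example NumPy arrays); the variable names are illustrative only.

```python
def make_references(decoder_out, supp_26a_out, supp_26b_out):
    """Compose the reference inputs of adaptive filter systems 20A and 20B.

    decoder_out  -- far-end (other party) audio from the decoder 14
    supp_26a_out -- output of self-speech sound suppressor 26A (mic 2A path)
    supp_26b_out -- output of self-speech sound suppressor 26B (mic 2B path)
    """
    adder_27a = decoder_out + supp_26b_out   # decoder 14 + suppressor 26B
    ref_a = adder_27a + supp_26a_out         # adder 25A: reference of 20A,
                                             # also branched to the speaker 3
    adder_27b = decoder_out + supp_26a_out   # decoder 14 + suppressor 26A
    ref_b = adder_27b + supp_26b_out         # adder 25B: reference of 20B
    return ref_a, ref_b
```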
[0062]
In this modification, the set [adaptive filter system 20A, self-speech sound suppressor 26A] corresponding to the microphone 2A and the set [adaptive filter system 20B, self-speech sound suppressor 26B] corresponding to the microphone 2B each execute the procedure shown in FIG. 5 independently.
[0063]
With this configuration, the output of the adaptive filter system 20A (its subtractor 22) is a signal in which the audio signal components corresponding to the following sounds have been properly cancelled from the collected audio signal of the microphone 2A: the echo sound of the other-party speaker's voice, the echo sound of the self-speaker voice that was picked up at a valid level by the microphone 2A and returned, and the self-speaker voice that was picked up at a valid level by the microphone 2B and reached the microphone 2A from the speaker 3. Similarly, the output of the adaptive filter system 20B (its subtractor 22) is a signal in which the audio signal components corresponding to the following sounds have been properly cancelled from the collected audio signal of the microphone 2B: the echo sound of the other-party speaker's voice, the echo sound of the self-side speaker voice picked up at a valid level by the microphone 2B and returned, and the self-side speaker voice that was picked up by the microphone 2A and reached the microphone 2B from the speaker 3.
[0064]
For confirmation, consider the situation in which a self-speaker voice of a valid level is picked up by only one of the microphones 2A and 2B, while no self-speaker voice of a valid level is picked up by the other, and examine how the procedure of FIG. 5 behaves. Assume here that the microphone 2A is used for the input of the self-speaker voice and the microphone 2B is not. In this situation, the first self-speaker voice is obtained at a valid level in the sound collected by the microphone 2A. Seen from the set [adaptive filter system 20A, self-speech sound suppressor 26A], this is a self-side single-talk state or a double-talk state. Accordingly, on that set's side, the ratio of the levels of the audio signal Y1(k) and the audio signal Xd(k) falls within a certain range, and a negative determination result is obtained in step S101 of FIG. 5. The self-speech sound suppressor 26A is therefore set to the signal passing state if the adaptive filter system 20A has converged and to the signal blocking state if it has not, and the adaptive filter system 20A stops its adaptive processing. On the other hand, if the microphones 2A and 2B are spaced suitably apart, the component of the first self-speaker voice spoken into the microphone 2A is hardly picked up by the microphone 2B. That is, even when the conference room as a whole is in the self-side single-talk state or the double-talk state, the system on the microphone 2B side is substantially in the non-talk state (when no audio signal of the other-party speaker voice is obtained) or the other-party single-talk state (when such a signal is obtained). To obtain proper operation of the adaptive filter system and of the self-speech sound output in such a case, the adaptive filter system 20B and the self-speech sound suppressor 26B should preferably operate in accordance with the non-talk state or the other-party single-talk state. In the present embodiment, applying the procedure of FIG. 5 correctly determines that the microphone 2B side is in the substantially non-talk state or the other-party single-talk state.
[0065]
That is, when the set [adaptive filter system 20B, self-speech sound suppressor 26B] corresponding to the microphone 2B side executes step S101 of FIG. 5, the level difference between the audio signal Y1(k) and the audio signal Xd(k) is in fact small in the substantially non-talk state, so that a negative determination result is properly obtained and the procedure from step S106 onward is performed. In other words, the setting state of the adaptive filter system 20B and the self-speech sound suppressor 26B corresponding to the non-talk state is obtained. In the other-party single-talk state, on the other hand, the level of the audio signal Xd(k) actually becomes considerably larger than that of the audio signal Y1(k), so that a positive determination result is obtained and the procedure from step S102 onward is executed; the setting state of the adaptive filter system 20B and the self-speech sound suppressor 26B corresponding to the other-party single-talk state is thus obtained.
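The fact that the step S101 decision is made independently for each set can be illustrated as follows; the threshold and the signal levels are invented purely for illustration.

```python
def s101_positive(level_y1, level_xd, ratio_threshold=4.0):
    """True when Xd(k) dominates Y1(k), i.e. the positive branch of S101
    (other-party single talk); otherwise the negative branch from S106."""
    return level_xd > ratio_threshold * max(level_y1, 1e-12)

# Microphone 2A picks up valid self speech while the other party is talking:
print(s101_positive(level_y1=0.8, level_xd=0.5))    # False -> S106-S109
# Microphone 2B picks up almost nothing of that self speech:
print(s101_positive(level_y1=0.02, level_xd=0.5))   # True  -> from S102
# In the substantially non-talk state both levels are small and comparable:
print(s101_positive(level_y1=0.01, level_xd=0.02))  # False -> from S106
```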
[0066]
In steps S102, S107 and S108 of the process of FIG. 5 executed by the audio signal processing unit 11, the attenuation rate of the self-speech sound suppressor 26 is switched between the two states of signal blocking and signal passing. In practice, however, continuous control that varies the value of the attenuation rate (or a control value on which the attenuation rate is based) may be performed. For example, a control value λ indicating the degree of signal passage in the self-speech sound suppressor 26 is defined, with λ = 1 when the signal passes completely and λ = 0 when the signal is completely blocked. In setting the attenuation rate of the self-speech sound suppressor 26, a value such as λ = max(1, Y/Xd) * w may then be computed (where max(1, Y/Xd) means that the larger of 1 and the ratio of the level of the audio signal Y to the level of the audio signal Xd is selected, and the coefficient w indicates the degree of convergence of the adaptive filter system). The attenuation rate of the self-speech sound suppressor 26 can thus be set more flexibly according to the control value obtained in this way. Similarly, whereas in steps S104, S105 and S109 of FIG. 5 a binary control is performed in which the adaptive processing is either executed or stopped, continuous control can also be applied to the adaptive filter system. In other words, the adaptive processing can be made to transition continuously between a state tending toward activation (active tendency) and a state tending toward, or close to, stopping (stop tendency). For this purpose, for example, the step size parameter μ, one of the parameters of the adaptive filter system which sets the coefficient update amount of the FIR filter, may be computed as μ = (1 − λ) * max(1, Y/Xd), so that the response speed of the adaptive processing of the adaptive filter system 20 is changed by this operation. If such continuous control is performed, a signal processing operation adapted to states intermediate between the other-party single-talk state, the self-side single-talk state, the double-talk state and the non-talk state can also be obtained. For example, even in a double-talk state in which the self-speaker speech is small and the condition is close to the other-party single-talk state, if the adaptive filter system 20 has not converged, the attenuation rate can be increased to some extent so that the self-speech sound is suppressed, while the adaptive processing is activated to a certain extent so as to steer the system toward convergence.
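Under this reading of the formulas (λ = max(1, Y/Xd) * w and μ = (1 − λ) * max(1, Y/Xd)), the continuous control can be sketched as follows. The clamp of λ to the range 0 to 1 is an added assumption (the text only fixes the endpoints λ = 1 for complete passage and λ = 0 for complete blocking), and the function and variable names are illustrative.

```python
def continuous_control(level_y, level_xd, w, eps=1e-12):
    """Return (lam, mu) from the levels of Y and Xd and the convergence degree w.

    lam -- degree of signal passage of the self-speech sound suppressor 26
           (1 = pass completely, 0 = block completely)
    mu  -- step size parameter controlling the coefficient update amount
    """
    ratio = max(1.0, level_y / (level_xd + eps))
    lam = ratio * w                        # lambda = max(1, Y/Xd) * w
    lam = min(max(lam, 0.0), 1.0)          # assumed clamp, not stated in text
    mu = (1.0 - lam) * ratio               # mu = (1 - lambda) * max(1, Y/Xd)
    return lam, mu
```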
[0067]
As the adaptive algorithm adopted for the adaptive filter systems shown in FIG. 4, FIG. 6 and so on, a suitable algorithm should be selected not only from those already known but also from technologies proposed in the future. Also, the adaptive filter systems shown in FIG. 4 and FIG. 6 have the most basic configuration for ease of explanation; in practice, a further developed and improved configuration may be adopted. Further, in the description of the embodiments above, again for ease of explanation, the audio signal processing unit 11 performs audio signal processing over the entire audible band. In practice, however, the collected audio signal obtained by the microphone 2 and the audio signal received by the decoder 14 may each be divided into predetermined frequency bands, and a configuration as shown in FIG. 4 or 5 may be assigned to each frequency band; that is, a so-called filter bank configuration may be adopted.
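A rough sketch of such a filter-bank arrangement is given below. The FFT-based band split and the callable-per-band interface (each band processor takes a microphone band and a received band and returns a processed band of the same length) are illustrative assumptions, not the configuration of the embodiment.

```python
import numpy as np

def split_bands(frame, n_bands):
    """Split a real time-domain frame into n_bands contiguous FFT sub-bands."""
    spec = np.fft.rfft(frame)
    edges = np.linspace(0, len(spec), n_bands + 1, dtype=int)
    return [spec[edges[i]:edges[i + 1]] for i in range(n_bands)]

def process_per_band(mic_frame, rx_frame, band_processors):
    """Run one echo-cancelling instance per frequency band and recombine."""
    n_bands = len(band_processors)
    mic_bands = split_bands(mic_frame, n_bands)
    rx_bands = split_bands(rx_frame, n_bands)
    out_bands = [proc(m, r)
                 for proc, m, r in zip(band_processors, mic_bands, rx_bands)]
    return np.fft.irfft(np.concatenate(out_bands), n=len(mic_frame))
```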
[0068]
In the above embodiments, the audio signal processing unit 11 serving as an echo canceller has been described as performing digital signal processing, but the present invention is also applicable to a case where at least a part of the same echo cancellation operation is implemented by an analog circuit, for example. Moreover, the description so far assumes that two voice communication terminal devices 1-1 and 1-2 communicate in a one-to-one relationship in the video conference system, but this is only because the simplest form of the video conference system has been taken as an example for simplicity of explanation. In reality, a teleconference system may be constructed with three or more voice communication terminal devices performing one-to-many communication, and the present invention is applicable to the individual voice communication terminal devices of such a system configuration as well. Further, although the processing of the transmission audio signal and the reproduction audio signal in the voice communication terminal device 1 is mainly performed by digital signal processing, there is no particular limitation on the signal format used when that digital signal processing is performed. For example, when outputting the reproduction audio signal, a ΔΣ-modulated bit-stream audio signal may in some cases be reproduced by class D amplification. In the embodiments, a voice communication terminal device provided for voice transmission and reception in a video conference system is taken as an example, but the present invention is also applicable to any device that can be regarded as a so-called loud-speaking communication system, including, for example, a voice conference system or a hands-free call function in a telephone device.
[0069]
A block diagram showing a configuration example of the audio transmission and reception system in the video conference system on which the embodiment of the present invention is based. A block diagram showing an internal configuration example of the voice communication terminal device shown in the preceding figure. A diagram showing one configuration of an audio signal processing unit in which a self-voice loud-speaking function is added to the echo cancellation function. A diagram showing a configuration example of the audio signal processing unit according to the embodiment. A flowchart showing an example of the procedure executed by the audio signal processing unit of the embodiment. A diagram showing a configuration example of the audio signal processing unit as a modification of the embodiment.
Explanation of sign
[0070]
1 (1-1, 1-2) voice communication terminal device, 2 (2-1, 2-2) microphone, 3 (3-1, 3-2) speaker, 11 audio signal processing unit, 12 codec unit, 13 encoder, 14 decoder, 15 communication unit, 20 adaptive filter system, 21 adaptive filter, 22 subtractor, 23 (23A, 23B) transmission sound suppressor, 24 volume unit, 25 (25A, 25B), 27A, 27B, 28 adder, 26 (26A, 26B) self-speech sound suppressor