DESCRIPTION JP2011172081

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
The present invention provides a speech communication apparatus, method, and program that
enable automatic volume control for each microphone without degrading tracking performance and
without increasing the computation and memory required by the adaptive filter. A main channel
estimation unit 430 estimates one or more of the microphones as a main channel, an addition unit
440 adds the collected sounds into one sound signal, and an echo cancellation unit 450 cancels
the echo of the added sound signal. The apparatus further includes: a voice detection unit 461
that detects voice in the echo-cancelled sound signal; a superposition gain calculation unit 462
that, when voice is detected, sets an initial value based on the main channel and calculates a
superposition gain from the initial value and the echo-cancelled sound signal; a gain
superposition unit 463 that superimposes the superposition gain on the echo-cancelled sound
signal; and a channel-by-channel superposition gain storage unit 464 that stores the
superposition gain in association with the main channel. [Selected figure] Figure 4
Speech communication method, speech communication device, speech communication program
[0001]
The present invention relates to a speech communication method, a speech communication
system, and a speech communication program that, in telecommunication using sound signals,
automatically control the speech level output from a speaker according to, for example, the
speech level input to a microphone.
[0002]
15-04-2019
1
One function of a loudspeaker communication apparatus used for teleconferencing is an echo
canceller, which uses an adaptive filter to prevent acoustic echo, as shown in FIG. 1.
Some echo cancellers are combined with a voice switch so that the adaptive filter needs no
initial learning, as shown in FIG. 2. In the echo canceller of FIG. 1, for example, the adaptive
filter 914 estimates a transfer path corresponding to the acoustic echo path 913 from the
speaker 911 to the microphone 912. The adaptive filter 914 synthesizes a pseudo echo signal
based on the estimated transfer path and cancels the echo by subtracting the pseudo echo
signal from the echo signal. In the echo canceller of FIG. 2, the BG adaptive filter 924
performs the adaptive processing, and the echo cancellation filter 925 estimates the pseudo
echo using the coefficients transferred from the BG adaptive filter 924. The voice switch
control circuit 926 inserts a loss mainly at power-on and reduces the amount of inserted loss
as the adaptive filter converges (see Non-Patent Document 1).
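The subtraction of a pseudo echo described above can be sketched as follows. This is a minimal illustration only: the source does not say which adaptation algorithm the adaptive filter 914 uses, so the normalized LMS (NLMS) update, the function names, the tap count, the step size `mu`, and the three-tap simulated echo path are all assumptions made for the example.

```python
import random


def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Return the echo-cancelled residual of `mic`, given the far-end signal.

    NLMS is an assumed choice of adaptation rule, not taken from the source.
    """
    w = [0.0] * taps  # estimated impulse response of the echo path
    x = [0.0] * taps  # most recent far-end samples, newest first
    residual = []
    for s, d in zip(far_end, mic):
        x = [s] + x[:-1]
        pseudo_echo = sum(wi * xi for wi, xi in zip(w, x))
        e = d - pseudo_echo  # subtract the pseudo echo from the microphone signal
        norm = sum(xi * xi for xi in x) + eps
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        residual.append(e)
    return residual


def simulate_echo(far_end, path):
    """Hypothetical acoustic echo path: FIR convolution of the far-end signal."""
    return [
        sum(h * far_end[n - k] for k, h in enumerate(path) if n - k >= 0)
        for n in range(len(far_end))
    ]


rng = random.Random(0)
far_end = [rng.uniform(-1.0, 1.0) for _ in range(4000)]
mic = simulate_echo(far_end, [0.6, -0.3, 0.1])  # microphone picks up echo only
residual = nlms_echo_cancel(far_end, mic)
```

Because the simulated microphone signal contains only echo, the residual energy shrinks toward zero once the filter has converged; with near-end speech present, the residual would instead carry that speech.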
[0003]
Another function of a loudspeaker communication device used for teleconferencing is an
automatic volume control device, as shown in FIG. 3. In the automatic volume control device of
FIG. 3, the input signal is first cut out by a cutout window and stored in the buffer 931.
Next, the speech signal identification circuit 932 identifies whether the input signal is a
speech signal. The volume amplification circuit 933 determines the amplification factor of the
input signal based on the identification result of the speech signal identification circuit 932.
The waveform superposition circuit 934 superimposes on the input signal a cosine window based on
the amplification factor determined by the volume amplification circuit 933, and outputs the
result as the output signal. The algorithm used by the speech signal identification circuit 932
is, for example, a hidden Markov model, vector quantization, or a neural network (see Patent
Document 1).
[0004]
Patent Document 1: JP-A-8-250944
[0005]
Non-Patent Document 1: Kitawaki Nobuhiko ed., "Future Network Technology Digital Voice and
Audio Technology," Ohmsha Press, pp. 218-255.
[0006]
For example, when a teleconference has a plurality of speakers and a microphone is installed
for each speaker, the distance between each speaker and the nearest microphone and the loudness
of each speaker's voice differ. As a result, the output sound level varies from one microphone
to another, which makes the output sound difficult to hear.
[0007]
To solve this problem, automatic volume control must be performed individually for each of the
plurality of microphones.
However, the adaptive filter operates by estimating the acoustic echo path as a linear system,
whereas the automatic volume control described above performs non-linear processing, so the
control cannot be placed between the adaptive filter and the acoustic echo path.
Automatic volume control therefore has to be performed on the transmission network side of the
adaptive filter.
Consequently, when automatic volume control is performed individually for each of a plurality of
microphones, as many sound signals as there are microphones must be input to the adaptive
filter, and the computation and memory required by the adaptive filter increase in proportion
to the number of microphones.
[0008]
To solve this new problem, the input sound signals from the plurality of microphones must be
added into one signal before being input to the adaptive filter, which reduces the computation
and memory required by the adaptive filter. However, because the automatic volume control
described above takes time to adjust, a further problem arises: when the microphone that mainly
picks up the sound (the main channel; the same applies hereinafter) changes frequently as the
speaker changes, the tracking performance deteriorates.
[0009]
The present invention has been made to solve these problems. Its object is to provide a speech
communication method, a speech communication system, and a speech communication program that
enable individual automatic volume control for each of a plurality of microphones, without
increasing the computation and memory required by the adaptive filter and without the tracking
performance deteriorating when the main channel changes frequently.
[0010]
The voice communication method according to the present invention includes main channel
estimation processing, addition processing, echo cancellation processing, voice detection
processing, superposition gain calculation processing, gain superposition processing, and
channel-by-channel superposition gain storage processing.
In the main channel estimation processing, one or more of N microphones (where N is an integer
of 2 or more) are estimated as the main channel. In the addition processing, the sounds
collected by the microphones are added into one sound signal. In the echo cancellation
processing, the echo of the added sound signal is cancelled. In the voice detection processing,
voice is detected in the echo-cancelled sound signal. In the superposition gain calculation
processing, when voice is detected, an initial value is set based on the estimated main channel,
and the superposition gain is calculated using this initial value and the echo-cancelled sound
signal. In the gain superposition processing, the superposition gain calculated by the
superposition gain calculation processing is superimposed on the echo-cancelled sound signal. In
the channel-by-channel superposition gain storage processing, the superposition gain calculated
by the superposition gain calculation processing is stored in the channel-by-channel
superposition gain storage unit in association with the estimated main channel. The initial
value set in the superposition gain calculation processing can be a past superposition gain that
is stored for each main channel in the channel-by-channel superposition gain storage unit at the
time the initial value is set. In addition, the microphones can be divided into M groups (where
M is an integer of 2 or more) so that each group has a plurality of microphones, and the above
processing can be performed separately on the sound collected by the microphones of each group.
[0011]
According to the present invention, the input sound signals from a plurality of microphones are
added by the addition processing, and the echo cancellation processing is performed on the added
sound signal, so the computation and memory required by the adaptive filter do not increase.
Furthermore, one or more of the N microphones are estimated as the main channel, the
superposition gain is calculated using the initial value set based on the estimated main channel
and the echo-cancelled sound signal, and this gain is superimposed on the signal, so frequent
changes of the main channel do not degrade the tracking performance.
The speech communication method according to the present invention therefore enables individual
automatic volume control for each of a plurality of microphones, in which the tracking
performance does not deteriorate even if the main channel changes frequently, without
increasing the computation and memory required by the adaptive filter.
[0012]
FIG. 1 is a diagram explaining a prior art example.
FIG. 2 is a diagram explaining a prior art example.
FIG. 3 is a diagram explaining a prior art example.
FIG. 4 is a block diagram showing the configuration of the loudspeaker communication device
according to Embodiment 1 and Modification 1.
FIG. 5 is a flowchart showing the operation of the loudspeaker communication device according to
the first embodiment.
FIG. 6 is a flowchart showing the operation of the loudspeaker communication device according to
Modification 1.
FIG. 7 is a block diagram showing the configuration of the loudspeaker communication device
according to the second embodiment.
FIG. 8 is a flowchart showing the operation of the loudspeaker communication device according to
the second embodiment.
[0013]
Hereinafter, embodiments of the present invention will be described in detail.
[0014]
A loudspeaker communication device and a loudspeaker communication method according to a first
embodiment of the present invention will be described with reference to FIGS. 4 and 5.
FIG. 4 is a block diagram showing the configuration of the loudspeaker communication device
400 according to the first embodiment. FIG. 5 is a flowchart showing the operation of the
loudspeaker communication device 400 according to the first embodiment. The loudspeaker
communication device 400 includes a speaker 911, N microphones 912-1 to 912-N (where N is an
integer of 2 or more; the same applies hereinafter), a main channel estimation unit 430, an
addition unit 440, an echo cancellation unit 450, and an automatic volume adjustment unit 460.
The automatic volume adjustment unit 460 includes a voice detection unit 461, a superposition
gain calculation unit 462, a gain superposition unit 463, and a channel-by-channel superposition
gain storage unit 464.
[0015]
The sound signal input from the network 470 to the loudspeaker communication device 400 is
emitted from the speaker 911 via the echo cancellation unit 450. The microphones 912-1 to 912-N
convert the collected sound into sound signals and input them to the addition unit 440 and the
main channel estimation unit 430. The main channel estimation unit 430 estimates one or more of
the N microphones 912-1 to 912-N as the main channel (S530). One main channel estimation method
is to compare the time-signal powers of the sound signals output from the microphones 912-1 to
912-N and take the microphone with the maximum power as the main channel. The estimation is not
limited to this method, however; any other method may be used as long as it tracks changes of
speaker well. The number of main channels to be estimated may be any number that is 1 or more
and less than N. When the main channels are estimated by comparing time-signal powers, a
predetermined number of microphones may be selected in descending order of time-signal power, or
all microphones whose time-signal power exceeds a predetermined threshold may be selected. In
this case too, any other method may be used as long as it tracks changes of speaker well. The
addition unit 440 adds the sounds collected by the microphones into one sound signal (S540). The
echo cancellation unit 450 cancels the echo of the added sound signal (S550). The echo
cancellation unit 450 may be an echo canceller of the general type shown in FIG. 1 or of the
type combined with a voice switch shown in FIG. 2.
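As an illustration of steps S530 and S540, the power comparison and the addition can be sketched as below. The function names, the frame-based processing, and the use of mean squared amplitude as the "time-signal power" are assumptions for the example; the source only requires that the estimate track speaker changes well.

```python
def signal_power(frame):
    """Mean squared amplitude of one frame (assumed definition of power)."""
    return sum(s * s for s in frame) / len(frame)


def estimate_main_channels(frames, count=1):
    """Indices of the `count` microphones with the highest power, strongest first."""
    powers = [signal_power(f) for f in frames]
    ranked = sorted(range(len(frames)), key=lambda i: powers[i], reverse=True)
    return ranked[:count]


def estimate_main_channels_by_threshold(frames, threshold):
    """Indices of all microphones whose power exceeds `threshold`."""
    return [i for i, f in enumerate(frames) if signal_power(f) > threshold]


def add_channels(frames):
    """Sample-wise sum of all microphone frames (the addition unit)."""
    return [sum(samples) for samples in zip(*frames)]


# Three microphones, one short frame each; microphone 1 is closest to the speaker.
frames = [
    [0.1, -0.1, 0.1, -0.1],
    [0.8, -0.7, 0.9, -0.8],
    [0.3, -0.3, 0.2, -0.2],
]
main = estimate_main_channels(frames)            # single main channel
top_two = estimate_main_channels(frames, 2)      # predetermined number of channels
loud = estimate_main_channels_by_threshold(frames, 0.05)
mixed = add_channels(frames)                     # one signal for the echo canceller
```

Only the mixed signal is passed on to echo cancellation; the per-microphone frames are used solely to decide which channel's gain to apply later.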
[0016]
The voice detection unit 461 detects whether voice is present in the echo-cancelled sound signal
(S561). One voice detection method is to determine that voice is present when a sound signal
level exceeding a threshold is input. The threshold used for detection is determined in advance
from the ambient noise level. The detection is not limited to this method, however; as with the
main channel estimation unit 430, any other method may be used as long as it tracks changes of
speaker well.
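A threshold-based detector of this kind could look like the following sketch. The RMS level measure and the `margin` factor that turns the ambient noise level into a threshold are assumptions; the source states only that the threshold is predetermined from the ambient noise level.

```python
def frame_level(frame):
    """Root-mean-square level of one frame (assumed level measure)."""
    return (sum(s * s for s in frame) / len(frame)) ** 0.5


def detect_voice(frame, noise_level, margin=4.0):
    """True when the frame level exceeds a threshold derived from the noise level.

    `margin` is a hypothetical tuning parameter, not taken from the source.
    """
    return frame_level(frame) > noise_level * margin


quiet_frame = [0.01, -0.01, 0.01, -0.01]
speech_frame = [0.5, -0.4, 0.6, -0.5]
quiet_detected = detect_voice(quiet_frame, noise_level=0.01)
speech_detected = detect_voice(speech_frame, noise_level=0.01)
```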
[0017]
When voice is detected, the superposition gain calculation unit 462 sets an initial value based
on the estimated main channel (S562a) and calculates the superposition gain using this initial
value and the echo-cancelled sound signal (S562b). If no voice is detected, the output sound
level is not adjusted. The method of calculating the superposition gain is described in detail
below. Suppose that at time t the main channel is estimated to be $C_n$, the superposition gain
to be calculated is $G_{C_n}[t]$, the input sound signal level from the main channel $C_n$ is
$V_{C_n}[t]$, the target level of the sound signal output to the network 470 is $V_d$, the
initial value is $G_i$, and the time constant is $\alpha$. The superposition gain $G_{C_n}[t]$
is then calculated by weighting the initial value with the forgetting factor:
$$G_{C_n}[t] = (1-\alpha)\,\frac{V_d}{V_{C_n}[t]} + \alpha\,G_i$$
The above equation uses the ratio $V_d / V_{C_n}[t]$ of the target output sound signal level to
the input sound signal level, but another method may be used.
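The gain update in the equation above is a one-line computation, sketched here with hypothetical parameter names. The guard against a zero input level is an added assumption; the source does not discuss that case.

```python
def superposition_gain(target_level, input_level, initial_gain, alpha):
    """G = (1 - alpha) * Vd / V + alpha * Gi, as in the equation above."""
    if input_level <= 0.0:
        # Assumption: with no measurable input, keep the initial value unchanged.
        return initial_gain
    return (1.0 - alpha) * target_level / input_level + alpha * initial_gain


# Input at a quarter of the target level, starting from unity gain.
gain = superposition_gain(target_level=1.0, input_level=0.25,
                          initial_gain=1.0, alpha=0.9)

# Feeding each result back in as the next initial value moves the gain
# toward the ideal ratio Vd / V = 4.
tracked = 1.0
for _ in range(100):
    tracked = superposition_gain(1.0, 0.25, tracked, 0.9)
```

A value of `alpha` close to 1 makes the gain change slowly and smoothly; a smaller value makes it follow the instantaneous ratio more closely.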
[0018]
Here, the initial value $G_i$ can be the superposition gain $G_{C_n}[t-k]$ of the corresponding
main channel from k time steps earlier, where k is a positive integer. In that case, the
superposition gain at time t is calculated as
$$G_{C_n}[t] = (1-\alpha)\,\frac{V_d}{V_{C_n}[t]} + \alpha\,G_{C_n}[t-k]$$
Further, a plurality of superposition gains from time t−k to time t−1 may be used to calculate
the initial value $G_i$ at time t. In this case, when the main channel is $C_n$ at time t, the
initial value $G_i$ and the superposition gain $G_{C_n}[t]$ are calculated using the time
constants $\alpha_t, \alpha_{t-1}, \alpha_{t-2}, \ldots$
[0019]
[0020]
$$G_{C_n}[t] = \alpha_t\,\frac{V_d}{V_{C_n}[t]} + G_i$$
where
[0021]
[0022]
The method of calculating the superposition gain is not limited to the method represented by the
above equations; a method that smooths the change of the superposition gain over time using a
plurality of superposition gains of the corresponding main channel may also be used.
[0023]
In addition, when the main channel estimation unit 430 estimates a plurality of main channels,
let the n estimated main channels be $C_1, C_2, C_3, \ldots, C_n$ and the input sound signal
levels from these main channels be $V_{C_1}[t], V_{C_2}[t], V_{C_3}[t], \ldots, V_{C_n}[t]$.
The superposition gain $G_{C_1, C_2, C_3, \ldots}$ can then be calculated as
[0024]
[0025]
Here, a(n) is a correction coefficient that depends on the number n of main channels and takes a
value larger than one.
[0026]
The gain superposition unit 463 superimposes the superposition gain calculated by the
superposition gain calculation unit 462 on the echo-cancelled sound signal (S563).
The channel-by-channel superposition gain storage unit 464 stores the superposition gain
calculated by the superposition gain calculation processing (S562a, S562b) in association with
the estimated main channel (S564).
For example, if the main channel at time t is $C_n$, the channel-by-channel superposition gain
storage unit 464 stores the superposition gain $G_{C_n}[t]$ in the storage area corresponding to
the channel $C_n$.
For a channel that is not estimated to be the main channel at time t, the superposition gain
already stored in the channel-by-channel superposition gain storage unit 464 is held as it is
and is read out as needed in calculations after time t.
In addition, when a plurality of superposition gains from time t−k to time t−1 are used to
calculate the initial value $G_i$, the superposition gain $G_{C_n}[t-k]$ is stored, for example,
in the storage area corresponding to the channel $C_n$ and the time t−k.
Superposition gains are therefore stored for N channels and k time steps each, so the total
number of stored values is N × k.
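One possible layout for such a store is sketched below: one bounded history per channel, so that a channel that is not currently the main channel simply keeps its last values. The class and method names are hypothetical, and the initial gain of 1.0 is an assumed default for the example.

```python
from collections import deque


class ChannelGainStore:
    """Keeps the last `history` superposition gains for each of N channels."""

    def __init__(self, num_channels, history=1, initial_gain=1.0):
        # Every slot starts at `initial_gain`, so a lookup never fails.
        self._gains = [
            deque([initial_gain] * history, maxlen=history)
            for _ in range(num_channels)
        ]

    def latest(self, channel):
        """Most recently stored gain for `channel`."""
        return self._gains[channel][-1]

    def history(self, channel):
        """Stored gains for `channel`, oldest first."""
        return list(self._gains[channel])

    def store(self, channel, gain):
        """Store a new gain for `channel`; the oldest entry is discarded."""
        self._gains[channel].append(gain)

    def size(self):
        """Total number of stored values (N x k)."""
        return sum(len(d) for d in self._gains)


store = ChannelGainStore(num_channels=3, history=2)
store.store(1, 2.5)  # channel 1 was the main channel at this time step
```

Channels 0 and 2 are untouched by the update, which matches the requirement that gains of non-main channels are held as they are.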
[0027]
When the number of microphones is N and the number of main channels to be estimated is n (n is
an integer of 2 or more), a superposition gain is calculated for each combination of n main
channels selected from the N microphones, that is, for ${}_N C_n$ combinations. The
channel-by-channel superposition gain storage unit 464 therefore stores a total of ${}_N C_n$
superposition gains, each in the storage area corresponding to its combination of channels. When
a plurality of superposition gains from time t−k to time t−1 are used to calculate the initial
value $G_i$, the total number of stored superposition gains is ${}_N C_n \times k$. The
superposition gains stored in the channel-by-channel superposition gain storage unit 464 are
read out as needed by the superposition gain calculation unit 462 when it calculates a new
initial value and superposition gain. At power-on, the loudspeaker communication device 400
sets the value 1 in each storage area of the channel-by-channel superposition gain storage unit
464 in advance. Each time a superposition gain is calculated, it either overwrites the value in
the storage area corresponding to the main channel or is stored in the storage area
corresponding to the respective time.
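The combination-keyed storage can be illustrated with a dictionary whose keys are sorted tuples of channel indices. The function names and the dictionary layout are assumptions for the sketch; the initial value of 1 follows the power-on behaviour described above.

```python
from itertools import combinations
from math import comb


def make_combination_store(num_mics, num_main, initial_gain=1.0):
    """One gain slot per combination of `num_main` channels out of `num_mics`."""
    return {
        combo: initial_gain
        for combo in combinations(range(num_mics), num_main)
    }


def combination_key(main_channels):
    """Canonical key: the same set of channels always maps to the same slot."""
    return tuple(sorted(main_channels))


gain_store = make_combination_store(num_mics=4, num_main=2)
slot_count = len(gain_store)                  # equals comb(4, 2)
gain_store[combination_key([3, 1])] = 2.5     # update the slot for channels {1, 3}
```

With a history of k time steps per slot, the number of stored values grows to `comb(N, n) * k`, as stated in the text.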
[0028]
According to this embodiment, the addition unit 440 adds the plurality of input sound signals
(S540) and the echo cancellation unit 450 performs echo cancellation on the added sound signal
(S550), so the computation and memory required by the adaptive filter do not increase.
Furthermore, the main channel estimation unit 430 estimates one or more of the N microphones as
the main channel (S530), the superposition gain calculation unit 462 sets an initial value based
on the estimated main channel (S562a) and calculates the superposition gain using the initial
value and the echo-cancelled sound signal (S562b), and the gain superposition unit 463
superimposes the superposition gain on the sound signal (S563), so the tracking performance does
not deteriorate either. The loudspeaker communication device and the loudspeaker communication
method according to the present embodiment therefore enable automatic volume control for each of
the plurality of microphones, in which the tracking performance does not deteriorate when the
main channel changes frequently, without increasing the computation and memory required by the
adaptive filter.
[0029]
Furthermore, in the present embodiment, the main channel estimation unit 430 can estimate a
plurality of microphones as the main channel, so automatic volume control can be performed
appropriately even when a speaker's voice is picked up by several microphones at once. In
addition, because the superposition gain calculation unit 462 uses the superposition gain
calculated k time steps in the past as the initial value when calculating the present
superposition gain, the automatic volume control reflects previously calculated superposition
gains. Moreover, because the superposition gain calculation unit 462 multiplies the initial
value by the forgetting factor, the degree to which previously calculated superposition gains
are reflected can be set arbitrarily.
[0030]
[Modification 1] Modification 1 of Embodiment 1 will be described with reference to FIGS. 4
and 6. In Modification 1, the superposition gain calculation unit 462' uses the superposition
gain stored in the channel-by-channel superposition gain storage unit 464 at the time of initial
value calculation as the initial value (S662a). Specifically, the initial value $G_i$ is set to
the superposition gain $G_{C_n}[t-1]$ stored in the channel-by-channel superposition gain
storage unit 464 at time t−1. The superposition gain $G_{C_n}[t]$ for the main channel $C_n$ at
time t is then calculated by the following equation using the time constant $\alpha$ and the
target output sound signal level $V_d$:
$$G_{C_n}[t] = (1-\alpha)\,\frac{V_d}{V_{C_n}[t]} + \alpha\,G_{C_n}[t-1]$$
The calculated superposition gain $G_{C_n}[t]$ is stored in the storage area of the
channel-by-channel superposition gain storage unit 464 that corresponds to the main channel.
When the main channel is estimated to be $C_n$ again after time t, the stored value is used as
the initial value $G_i$ for the new superposition gain calculation. With this configuration,
Modification 1 provides the same effect as the first embodiment.
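The benefit of restarting from the gain stored for the current main channel can be seen in a small simulation. This is a sketch under assumed conditions: the input levels, the frame counts, `alpha = 0.8`, and the `per_channel` switch (which emulates a single shared gain for comparison) are all hypothetical.

```python
def run_gain_control(levels, main_channels, num_channels,
                     target_level=1.0, alpha=0.8, per_channel=True):
    """Apply G[t] = (1 - alpha) * Vd / V[t] + alpha * G[t-1] frame by frame.

    With per_channel=True, G[t-1] is read from the slot of the current main
    channel; with per_channel=False, all channels share slot 0 (for comparison).
    Returns the output levels and the final stored gains.
    """
    stored = [1.0] * num_channels  # value 1 in every storage area at power-on
    outputs = []
    for level, channel in zip(levels, main_channels):
        slot = channel if per_channel else 0
        gain = (1.0 - alpha) * target_level / level + alpha * stored[slot]
        stored[slot] = gain
        outputs.append(gain * level)
    return outputs, stored


# Speaker A (channel 0, level 0.5), then speaker B (channel 1, level 0.1),
# then one frame of speaker A again.
levels = [0.5] * 60 + [0.1] * 60 + [0.5]
channels = [0] * 60 + [1] * 60 + [0]

per_channel_out, per_channel_gains = run_gain_control(levels, channels, 2)
shared_out, _ = run_gain_control(levels, channels, 2, per_channel=False)
```

In this scenario, the first frame after switching back to channel 0 comes out close to the target level with per-channel storage, whereas the shared gain is still tuned to the quieter speaker and overshoots.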
[0031]
A loudspeaker communication device and a loudspeaker communication method according to a second
embodiment of the present invention will be described with reference to FIGS. 7 and 8.
FIG. 7 is a block diagram showing the configuration of the loudspeaker communication device 700
according to the second embodiment. FIG. 8 is a flowchart showing the operation of the
loudspeaker communication device 700 according to the second embodiment. In the second
embodiment, the microphones are divided into M groups (where M is an integer of 2 or more; the
same applies hereinafter) so that each group has a plurality of microphones, and the processing
described in Embodiment 1 is performed group by group on the sound collected by the microphones
of each group. The loudspeaker communication device 700 includes M microphone groups, the m-th
group having $N_m$ microphones 912-m-1 to 912-m-$N_m$ (where m is a positive integer and $N_m$
is an integer of 2 or more; the same applies hereinafter), main channel estimation units 430-1
to 430-M for the respective microphone groups, addition units 440-1 to 440-M for the respective
microphone groups, an echo cancellation unit 750, and automatic volume adjustment units 460-1 to
460-M for the respective microphone groups.
The automatic volume adjustment units 460-1 to 460-M of the loudspeaker communication device 700
include, for the respective microphone groups, voice detection units 461-1 to 461-M,
superposition gain calculation units 462-1 to 462-M, gain superposition units 463-1 to 463-M,
and channel-by-channel superposition gain storage units 464-1 to 464-M.
[0032]
The sound signal input from the network 470 to the loudspeaker communication device 700 is
emitted from the speaker 911 via the echo cancellation unit 750. The microphones 912-1-1 to
912-M-$N_M$ convert the collected sound into sound signals and input them to the addition units
440-1 to 440-M and the main channel estimation units 430-1 to 430-M, respectively. The main
channel estimation units 430-1 to 430-M each estimate one or more of the microphones of their
group as the main channel (S830-1 to S830-M). The addition units 440-1 to 440-M add the sounds
collected by the microphones into one sound signal for each group (S540-1 to S540-M). The echo
cancellation unit 750 cancels the echo of the sound signal added for each group (S550-1 to
S550-M). The voice detection units 461-1 to 461-M detect, for each group, whether voice is
present in the echo-cancelled sound signal (S561-1 to S561-M). When voice is detected, the
superposition gain calculation units 462-1 to 462-M set an initial value based on the estimated
main channel (S562a-1 to S562a-M) and calculate the superposition gain using this initial value
and the echo-cancelled sound signal (S562b-1 to S562b-M). The gain superposition units 463-1 to
463-M superimpose the superposition gains calculated by the superposition gain calculation units
462-1 to 462-M on the echo-cancelled sound signals for each group (S563-1 to S563-M). The
channel-by-channel superposition gain storage units 464-1 to 464-M store the superposition gains
calculated by the superposition gain calculation processing (S562a-1 to S562a-M, S562b-1 to
S562b-M) in association with the estimated main channel (S564-1 to S564-M).
[0033]
With this configuration, the present embodiment provides the same effect as the first
embodiment. Further, because the processing is performed for each of the M groups, individual
automatic volume control for each of the plurality of microphones can be performed on the input
sound from a larger number of microphones without degrading the tracking performance.
Furthermore, because a sound signal is output for each microphone group, each microphone group
can be output from its own corresponding speaker, and the number of speakers on the output side
can be expanded to M.
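The group-wise processing can be outlined as follows. This is a simplified sketch: echo cancellation and voice detection are omitted, each group estimates a single main channel by maximum power, the RMS level of the main channel stands in for the input sound signal level, and all names and parameter values are hypothetical.

```python
def process_groups(group_frames, gain_stores, target_level=1.0, alpha=0.8):
    """Run main channel estimation, addition, and gain control per group.

    group_frames: one list of microphone frames per group.
    gain_stores:  one list of stored gains per group (one slot per microphone);
                  updated in place for the estimated main channel only.
    Returns one gain-adjusted output frame per group.
    """
    outputs = []
    for frames, store in zip(group_frames, gain_stores):
        powers = [sum(s * s for s in f) / len(f) for f in frames]
        main = max(range(len(frames)), key=powers.__getitem__)
        mixed = [sum(samples) for samples in zip(*frames)]
        level = powers[main] ** 0.5
        if level > 0.0:  # assumption: leave the stored gain untouched on silence
            store[main] = (1.0 - alpha) * target_level / level + alpha * store[main]
        outputs.append([store[main] * s for s in mixed])
    return outputs


# Two groups of two microphones each; all gains start at 1.
group_frames = [
    [[0.5, -0.5], [0.1, -0.1]],    # group 1: microphone 0 is loudest
    [[0.05, -0.05], [0.2, -0.2]],  # group 2: microphone 1 is loudest
]
gain_stores = [[1.0, 1.0], [1.0, 1.0]]
group_outputs = process_groups(group_frames, gain_stores)
```

Each group yields its own output signal, which is what allows one output speaker per microphone group on the receiving side.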