JP2001100785

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2001100785
[0001]
BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a
speech recognition apparatus and, more particularly, to a voice recognition device for AV
equipment that is used in an audiovisual apparatus such as a TV, radio, or audio system that
reproduces multi-channel audio (including 2-channel stereo), and that can control the AV
equipment by voice and input information to the AV equipment by voice even while audio is
being output from the speakers. 2. Description of the Related Art Conventionally, Japanese
Patent Application Laid-Open No. 5-22779 (title of the invention: "voice recognition remote
control device") is known as a document describing a technology for performing speech
recognition while audio is being output from a speaker.
[0002]
FIG. 23 is a block diagram showing the configuration of a conventional voice recognition device
for AV equipment using the technique disclosed in the above-mentioned publication. The voice
recognition device shown in FIG. 23 is used for an AV apparatus having one speaker 201.
Referring to FIG. 23, the conventional voice recognition apparatus for AV equipment includes a
microphone 202, a voice recognition unit 203, and an echo canceller 204.
[0003]
09-05-2019
1
The operation of the conventional voice recognition device for AV equipment configured as
described above is explained with reference to FIG. 24. FIG. 24 is a diagram showing
time waveforms of signals input to or output from the respective components in the speech
recognition apparatus of FIG. 23. In FIG. 24, it is assumed that the user utters a voice command
while the audio signal is being output from the speaker 201.
[0004]
When the user speaks while no audio signal is being output from the speaker
201, the microphone 202 outputs a voice signal with a very good S/N, shown at 211 in FIG. 24.
However, when the audio signal of a TV program, shown at 212 in FIG. 24, is input to the
speaker 201, an echo signal similar to the speaker input 212, shown at 213 in FIG. 24, is
mixed into the output of the microphone 202.
[0005]
Therefore, the microphone 202 outputs the signal shown at 214 in FIG. 24, in which the user's
voice 211 and the echo signal 213 are added together and whose S/N is far too low for
recognizing the user's voice. Naturally, even if the microphone output 214 with its low S/N is
input to the speech recognition unit 203, a satisfactory speech recognition result cannot be
expected.
[0006]
Therefore, in the speech recognition apparatus shown in FIG. 23, the echo signal 213 coming
from the speaker 201 to the microphone 202 is estimated by the adaptive digital filter in the
echo canceller 204. Then, the echo signal 213 is completely canceled by subtracting this
estimated echo signal from the microphone output 214 by a subtraction circuit in the echo
canceller 204, and only the user speech 211 is extracted.
[0007]
The echo canceller 204 is provided with a speaker input 212 which is an input signal to the
speaker 201. The adaptive digital filter in the echo canceller 204 estimates the echo signal 215
from the waveform of the speaker input 212 and the impulse response of the echo path from the
speaker 201 to the microphone 202 stored therein. Next, a subtraction circuit within the echo
canceller 204 subtracts this estimated echo signal 215 from the microphone output 214, thereby
obtaining an echo canceller output 216.
[0008]
As can be seen by comparing the echo canceller output 216 with the waveform
211 of the user's voice, even while audio is being output from the speaker 201, the echo
cancellation performed by the echo canceller 204 as described above means that the speech
recognition unit 203 can be expected to perform accurate speech recognition.
[0009]
However, the voice recognition device shown in FIG. 23 has the major drawback that it
supports only AV devices with a monaural audio system and cannot be used for AV devices with a
multi-channel audio system that uses a plurality of speakers.
[0010]
FIG. 25 is a block diagram showing the configuration of another conventional voice recognition
device for AV equipment.
The speech recognition apparatus shown in FIG. 25 is used in a two-channel audio AV apparatus
having two speakers 221 and 222.
Referring to FIG. 25, another conventional speech recognition apparatus includes a microphone
223, a speech recognition unit 224, and two echo cancellers 225 and 226.
[0011]
In this conventional example, echoed sound coming from the speaker 221 to the microphone
223 and echoed sound coming from the speaker 222 to the microphone 223 are estimated by
the adaptive digital filter in the echo canceller 225 and the adaptive digital filter in the echo
canceller 226. By subtracting these two estimated values from the output signal of the
microphone, only the user voice is extracted. Unlike the speech recognition apparatus shown in
FIG. 23, the speech recognition apparatus shown in FIG. 25 can be applied to stereo AV
equipment.
[0012]
However, since the speech recognition apparatus shown in FIG. 25 requires as many echo
cancellers as there are audio channels, it has the disadvantage of becoming an
extremely expensive speech recognition apparatus when used for multi-channel audio AV
equipment. Furthermore, in a system that uses a plurality of echo cancellers in this way, mutual
interference occurs between the echo cancellers, so the adaptive operation of the echo
cancellers is extremely unstable. A further major drawback was also known: adaptation failure
causes the echo to increase and oscillation to occur.
[0013]
SUMMARY OF THE INVENTION For a voice recognition apparatus for AV equipment, it is strongly
demanded that voice recognition can be performed while audio is being reproduced through the
speakers, that multi-channel audio is supported, that reliability is high, and that cost is low.
[0014]
However, as described above, since the conventional voice recognition device for AV equipment
requires as many echo cancellers as there are audio channels, it had the problem that the cost
becomes extremely high when it is used for AV equipment with a multi-channel audio system.
Furthermore, it had another problem: due to mutual interference between the echo cancellers,
the adaptation operation of the echo cancellers becomes extremely unstable, and adaptation
failure causes the echo to increase and/or oscillation to occur, so that speech recognition
performance is degraded.
[0015]
Therefore, an object of the present invention is to realize a voice recognition device for
multi-channel AV equipment that can perform high-accuracy voice recognition while
multi-channel sound is being output from the speakers, and that is inexpensive.
[0016]
The first aspect of the present invention is a speech recognition apparatus that is used in an
AV device that outputs multi-channel sound through a plurality of speakers, recognizes user
voice input through a microphone, and causes the AV device to perform a predetermined
processing operation. The apparatus comprises: monauralization means for monauralizing the
multi-channel signal directed to the plurality of speakers; one echo canceller that is given the
output of the microphone (hereinafter, microphone output) and the output of the
monauralization means (hereinafter, monaural signal), estimates the echo of the multi-channel
sound based on the monaural signal, and removes the echo from the microphone output; and
voice recognition means for recognizing the user voice based on the output of the one echo
canceller (hereinafter, echo canceller output).
[0017]
In the first aspect of the invention, the multi-channel signal is converted to monaural and given
to one echo canceller, and that one echo canceller removes the echo of the multi-channel
sound from the microphone output. Regardless of the number of channels, therefore, providing
just one echo canceller is enough to perform voice recognition while
multi-channel sound is being output from the speakers.
Further, unlike the case where a plurality of echo cancellers are provided, mutual interference
among echo cancellers does not occur, so the speech recognition performance is not
degraded.
[0018]
According to a second invention, in the first invention, a multi-channel signal is input to the
plurality of speakers.
[0019]
In the second aspect of the invention, since multi-channel sound is output from the plurality of
speakers, the echo cannot be completely canceled by means of the monaural signal.
However, if the monaural degree of the multi-channel signal is close to "1", the echo can be
almost entirely canceled.
At the very least, as long as the monaural degree of the multi-channel signal is not "0", part of
the echo can be canceled. Here, the monaural degree of a multi-channel signal means the ratio of
the components (monaural components) contained in common in all channels of the signal: if the
signals of all channels are completely uncorrelated, the monaural degree is "0", and if they are
identical, the monaural degree is "1".
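The description defines the monaural degree only in words and gives no formula. As a rough illustration for the 2-channel case, the sketch below uses the normalized zero-lag cross-correlation of the two channels, clipped to the range 0 to 1, as a stand-in. This measure, the function name `monaural_degree`, and the handling of silence are all assumptions made for illustration, not the method of the patent.

```python
import math


def monaural_degree(left, right):
    """Rough monaural degree of one block of a 2-channel signal, in [0, 1].

    Assumption: the normalized zero-lag cross-correlation stands in for the
    "ratio of components common to all channels" described in the text.
    """
    cross = sum(l * r for l, r in zip(left, right))
    energy = math.sqrt(sum(l * l for l in left) * sum(r * r for r in right))
    if energy == 0.0:
        # At least one channel is silent, so the measure is undefined.
        # Assumption: report 0 in that case.
        return 0.0
    # Anti-phase content yields a negative correlation; clip it to 0.
    return max(0.0, min(1.0, cross / energy))
```

Under this measure, identical channels give a value of 1 and orthogonal channels give 0, which matches the two end points stated in the definition.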
[0020]
A third aspect of the invention according to the first aspect of the invention further comprises
switching means for inputting any of the multi-channel signal and the monaural signal to a
plurality of speakers.
[0021]
In the third aspect of the present invention, any of multi-channel sound and monaural sound can
be selectively output from a plurality of speakers.
[0022]
A fourth invention according to the third invention further comprises voice detection means for
detecting user voice based on the monaural signal and the echo canceller output, and is
characterized in that the switching means inputs the multi-channel signal to the plurality of
speakers when user voice is not detected by the voice detection means, and inputs the monaural
signal to the plurality of speakers when user voice is detected by the voice detection means.
[0023]
In the fourth aspect of the invention, multi-channel sound is output when voice recognition
does not need to be performed (user voice is not detected), and monaural sound is output
when voice recognition needs to be performed (user voice is detected), so
speech recognition can be performed with sufficiently high accuracy.
[0024]
A fifth invention according to the third invention further comprises start command means for
commanding the start of the speech recognition operation, end command means for commanding
the end of the speech recognition operation, and state setting means for setting the speech
recognition means to either the operating state or the standby state in response to commands
from the start command means and the end command means. The switching means inputs the
multi-channel signal to the plurality of speakers when the speech recognition means is set to the
standby state by the state setting means, and inputs the monaural signal to the plurality of
speakers when the speech recognition means is set to the operating state by the state setting
means.
[0025]
In the fifth aspect of the invention, the multi-channel sound is output when the voice recognition
means is in the standby state ("OFF" state), and the monaural sound is output when the voice
recognition means is in the operating state ("ON" state). Speech recognition can be performed
with high accuracy.
[0026]
A sixth invention according to the fifth invention further comprises monaural degree
determination means for determining the monaural degree of the multi-channel signal, and
arbitrary monauralization means for monauralizing the multi-channel signal to an arbitrary
monaural degree. It is characterized in that the monauralization means converts the
multi-channel signal completely into monaural, and the arbitrary monauralization means
monauralizes the multi-channel signal to a predetermined monaural degree when the degree
determined by the monaural degree determination means is lower than
the predetermined monaural degree.
[0027]
In the sixth aspect of the invention, the monaural degree of the multi-channel signal is
always equal to or higher than the predetermined monaural degree, so speech recognition can be
performed with high accuracy without greatly impairing the stereo effect, even when the voice
recognition means is in the operating state ("ON" state) (that is, the
stereo effect and the speech recognition performance can be balanced).
[0028]
In a seventh aspect according to the fifth aspect, the multi-channel signal is a signal of three or
more channels, and the apparatus further comprises two-channeling means for
converting the multi-channel signal into two channels. The monauralization means converts the
output of the two-channeling means (hereinafter referred to as "two-channel signal") to
monaural, and the switching means is
characterized in that it inputs any one of the multi-channel signal, the two-channel signal, and
the monaural signal to the plurality of speakers.
[0029]
In the seventh aspect, any of multi-channel sound, two-channel sound, and monaural sound can
be selectively output from the plurality of speakers.
[0030]
An eighth invention according to the seventh invention further comprises voice detection means
for detecting user voice based on the monaural signal and the echo canceller output. It is
characterized in that the switching means inputs the multi-channel signal to the plurality of
speakers when the state setting means sets the voice recognition means to the standby state;
inputs the two-channel signal to the plurality of speakers when the state setting means sets the
voice recognition means to the operating state but user voice is not detected by the voice
detection means; and inputs the monaural signal to the plurality of speakers when user voice is
detected by the voice detection means.
[0031]
In the eighth aspect of the invention, multi-channel sound is output when the voice recognition
means is in the standby state ("OFF" state). In the operating state ("ON" state), two-channel
sound is output when voice recognition does not need to be performed (user voice is not
detected), and monaural sound is output when voice recognition needs to be performed (user
voice is detected). Speech recognition can therefore be performed with high accuracy.
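The three-way selection described for the eighth invention can be sketched as a small decision function. The names `SpeakerFeed` and `select_speaker_feed` are hypothetical, and the two boolean inputs simply stand in for the outputs of the state setting means and the voice detection means.

```python
from enum import Enum


class SpeakerFeed(Enum):
    """Signal routed to the speakers by the switching means (names assumed)."""
    MULTI_CHANNEL = "multi-channel"
    TWO_CHANNEL = "two-channel"
    MONAURAL = "monaural"


def select_speaker_feed(recognizer_operating, voice_detected):
    """Pick the speaker feed from the recognizer state and voice detection."""
    if not recognizer_operating:
        # Standby ("OFF") state: full multi-channel sound.
        return SpeakerFeed.MULTI_CHANNEL
    if voice_detected:
        # Operating state and the user is speaking: monaural sound.
        return SpeakerFeed.MONAURAL
    # Operating state but no user voice yet: two-channel sound.
    return SpeakerFeed.TWO_CHANNEL
```

In this sketch a detected voice is ignored while the recognizer is in standby, which is one reading of the claim wording; the source does not spell out that combination.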
[0032]
A ninth invention according to the fifth invention further comprises: cancellation monitoring
means for monitoring, based on the monaural signal and the echo canceller output, whether the
echo is sufficiently canceled in the echo canceller; voice detection means for detecting user
voice based on the monaural signal and the echo canceller output; and attenuating means for
attenuating the multi-channel signal. It is characterized in that the attenuating means attenuates
the multi-channel signal when the voice detection means detects user voice while the monitoring
result of the cancellation monitoring means indicates insufficient cancellation.
[0033]
In the ninth aspect, when the user's voice is detected in a state where the echo is not sufficiently
canceled, the mixing of the echo is suppressed by lowering the level of the sound output from the
plurality of speakers.
As a result, the speech recognition performance in a state where the echo is not sufficiently
canceled is improved.
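The attenuation rule of the ninth aspect amounts to a simple condition on two flags. The sketch below is illustrative only: the function name `speaker_gain` and the attenuated gain of 0.25 are assumptions, since the description does not state how strongly the signal is attenuated.

```python
def speaker_gain(cancellation_sufficient, voice_detected, attenuated_gain=0.25):
    """Gain applied to the multi-channel signal sent to the speakers.

    Assumption: attenuated_gain is an arbitrary illustrative value.
    """
    if voice_detected and not cancellation_sufficient:
        # User is speaking while the echo is not yet well canceled:
        # lower the speaker level so that less echo reaches the microphone.
        return attenuated_gain
    return 1.0
```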
[0034]
In a tenth invention according to the fifth invention, the echo canceller includes an adaptive
digital filter that estimates the impulse response of the echo path between the plurality of
speakers and the microphone and calculates the echo from the estimated impulse response and
the monaural signal, and subtracting means for subtracting the output of the adaptive digital
filter from the microphone output.
[0035]
In the tenth aspect of the invention, echo sound of multi-channel sound can be removed from the
microphone output, and only user speech can be provided to the speech recognition means.
[0036]
An eleventh aspect according to the tenth aspect further comprises adaptive sound generating
means for generating a monaural adaptation sound that promotes the adaptation of the adaptive
digital filter when the switching means switches the input to the plurality of speakers from the
multi-channel signal to the monaural signal.
[0037]
In the eleventh aspect, when the input to the speakers is switched from the multi-channel signal
to the monaural signal, the monaural adaptation sound is output from the plurality of speakers.
Therefore, even when the monaural sound immediately after the switching is silent, the impulse
response held by the adaptive digital filter can be forcibly adapted to the impulse response of
the echo path.
[0038]
A twelfth invention according to the tenth invention further comprises adaptive control means
for controlling the adaptation speed of the adaptive digital filter. The adaptive control
means has a fast adaptation speed for monaural and a slow adaptation speed for
multi-channel, and is characterized in that it selects the fast adaptation speed when the state
setting means sets the speech recognition means to the operating state, and selects the slow
adaptation speed when the state setting means sets the speech recognition means to the standby
state.
[0039]
In the twelfth aspect of the invention, the adaptation speed of the adaptive digital filter in the
echo canceller is set to the fast speed when the speech recognition means is set to the operating
state, and to the slow speed when the speech recognition means is set to the standby state, so
echo cancellation suited to monaural and to multi-channel sound, respectively, can be performed.
That is, in the case where multi-channel sound is output from the speaker, there are a lot of
stereo components that are noise when viewed from the adaptive digital filter, so the noise
resistance is improved by setting the slow adaptation speed. In the case where there is no stereo
component, by making the adaptation speed fast, it is possible to improve the ability to follow
changes in the impulse response of the echo path.
As a result, an excellent echo cancellation effect is realized in the standby state, and the speech
recognition performance immediately after the transition to the operating state is enhanced.
[0040]
In a thirteenth aspect based on the twelfth aspect, the adaptive control means is given an
identification signal indicating whether the signal input to the plurality of speakers is a
multi-channel signal or a monaural signal. It is characterized in that, when the identification
signal indicates monaural, the adaptive control means selects the fast adaptation speed
regardless of whether the state setting means sets the speech recognition means to the operating
state or to the standby state.
[0041]
In the thirteenth aspect of the present invention, whether the signal input to the plurality of
speakers is a multi-channel signal or a monaural signal is determined based on the identification
signal. In the case of a monaural signal, the fast adaptation speed is selected regardless of
whether the state setting means sets the voice recognition means to the operating state or to
the standby state, so the ability to follow changes in the impulse
response of the echo path does not deteriorate. As a result, an excellent echo cancellation effect
is realized in the standby state, and the speech recognition performance immediately after the
transition to the operating state is enhanced.
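Taken together, the selection rules of the twelfth and thirteenth aspects reduce to one condition. In the sketch below the function name and the string labels "fast" and "slow" are hypothetical; `feed_is_monaural` stands in for the identification signal.

```python
def select_adaptation_speed(recognizer_operating, feed_is_monaural):
    """Choose the adaptation speed of the adaptive digital filter.

    Twelfth aspect: fast in the operating state, slow in the standby state.
    Thirteenth aspect: fast whenever the speaker feed is monaural,
    regardless of the recognizer state.
    """
    if recognizer_operating or feed_is_monaural:
        return "fast"
    return "slow"
```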
[0042]
A fourteenth aspect of the invention according to the tenth aspect further comprises monaural
degree determination means for determining the monaural degree of the multi-channel signal, and
adaptive control means for controlling the adaptation speed of the adaptive digital filter based on
the determination result of the monaural degree determination means.
[0043]
In the fourteenth invention, the adaptation speed of the adaptive digital filter is controlled based
on the monaural degree of the multi-channel signal, so that echo cancellation suitable for
multi-channel signals having various monaural degrees can be performed.
That is, when the monaural degree is low, the adaptation speed is reduced to improve the noise
resistance.
On the other hand, when the monaural degree is high, noise resistance is not necessary because
there are few stereo components which are noises when viewed from the adaptive digital filter.
Therefore, as in the fifteenth invention described below, by making the adaptation speed faster, it
is possible to enhance the ability to follow changes in the impulse response of the echo path.
As a result, particularly when the monaural degree is high, an excellent echo cancellation effect
can be realized, and the speech recognition performance immediately after the transition to the
operating state is enhanced.
[0044]
A fifteenth invention is characterized in that, in the fourteenth invention, the adaptive control
means increases the adaptation speed of the adaptive digital filter as the monaural degree of the
multi-channel signal becomes higher.
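One simple way to make the adaptation speed rise with the monaural degree is a linear mapping from the degree to the step size of the adaptive filter. The description states only that the speed increases with the degree, so the linear form, the function name `adaptation_step`, and the end-point values 0.05 and 0.5 in this sketch are all assumptions.

```python
def adaptation_step(monaural_degree, slow=0.05, fast=0.5):
    """Map a monaural degree in [0, 1] to a filter step size.

    Assumption: linear interpolation between an illustrative slow step
    (degree 0) and an illustrative fast step (degree 1).
    """
    degree = max(0.0, min(1.0, monaural_degree))  # guard against bad input
    return slow + (fast - slow) * degree
```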
[0045]
A sixteenth invention according to the tenth invention further comprises a non-volatile memory.
The non-volatile memory acquires and stores the impulse response estimated by the
adaptive digital filter when the power is turned "OFF" and, when the power is turned "ON", gives
the stored estimated impulse response from the time of power "OFF" to the adaptive
digital filter. The adaptive digital filter then starts estimating the impulse response using, as an
initial value, the estimated impulse response given from the non-volatile memory.
[0046]
In the sixteenth invention, the estimated impulse response at the time of power "OFF" is stored,
and when the power is turned "ON", estimation of the impulse response is started with the stored
response as the initial value. Compared with the case where "0" is set as the initial value, the
estimation error immediately after power "ON" is smaller, and as a result the speech recognition
performance is improved.
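The store-and-restore behavior can be illustrated with a file standing in for the non-volatile memory. The JSON file format, the function names, and the fallback to all-zero taps when nothing usable is stored are assumptions of this sketch.

```python
import json


def save_impulse_response(path, taps):
    """Store the current filter taps (to be called at power "OFF")."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump([float(t) for t in taps], f)


def load_impulse_response(path, n_taps):
    """Return the stored taps as the initial value (called at power "ON").

    Falls back to all zeros when no usable data is stored, which corresponds
    to starting the estimation from "0".
    """
    try:
        with open(path, encoding="utf-8") as f:
            taps = json.load(f)
    except (OSError, ValueError):
        # Missing or unreadable file, or contents that are not valid JSON.
        return [0.0] * n_taps
    if not isinstance(taps, list) or len(taps) != n_taps:
        return [0.0] * n_taps
    return [float(t) for t in taps]
```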
[0047]
A seventeenth invention according to the fifth invention further comprises voice detection means
for detecting user voice based on the monaural signal and the echo canceller output. It is
characterized in that the start command means is a button switch that issues a start command to
the state setting means when the button is pressed, and the end command means is a
time-limited switch that issues an end command to the state setting means when the state in
which the voice detection means does not detect user voice continues for a predetermined time
or more.
[0048]
In the seventeenth aspect, the speech recognition operation can be ended automatically.
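The combination of a button switch and a time-limited switch can be sketched as a small state holder. The class name `RecognitionState`, its method names, and the use of caller-supplied timestamps in seconds are hypothetical choices made for illustration.

```python
class RecognitionState:
    """Operating/standby state driven by a button and a silence timeout."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s   # the "predetermined time"
        self.operating = False       # False corresponds to the standby state
        self._last_voice_t = None

    def press_button(self, now):
        """Start command: enter the operating state."""
        self.operating = True
        self._last_voice_t = now

    def update(self, now, voice_detected):
        """Call periodically with the current time and the detector output."""
        if not self.operating:
            return
        if voice_detected:
            self._last_voice_t = now
        elif now - self._last_voice_t >= self.timeout_s:
            # End command: no user voice for the predetermined time or more.
            self.operating = False
```

A caller would press the button once and then call `update` on every processing block; the recognizer returns to standby by itself once the user has been silent for `timeout_s` seconds.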
[0049]
An eighteenth aspect according to the fifth aspect further comprises voice detection means for
detecting user voice based on the monaural signal and the echo canceller output. It is
characterized in that the start command means is a voice switch that issues a start command to
the state setting means when the voice detection means detects user voice, and the end command
means is a time-limited switch that issues an end command to the state setting means when the
state in which the voice detection means does not detect user voice continues for a
predetermined time or more.
[0050]
In the eighteenth aspect, the speech recognition operation can be automatically started and
ended.
[0051]
BEST MODE FOR CARRYING OUT THE INVENTION Embodiments of the present invention will be
described below with reference to the drawings.
First, an AV apparatus in which the present invention is used will be described.
FIG. 1 is a block diagram showing an example of the configuration of an AV apparatus in which
the present invention is used.
The AV device shown in FIG. 1 is a television receiver for receiving a television broadcast.
In television broadcasting here, it is assumed that a multi-channel (including two channels; the
same applies hereinafter) sound system is adopted.
[0052]
In FIG. 1, the AV device includes an antenna 1, a receiving unit 2, an AV processing unit 3, a
controller 4, a control panel 5, a microphone 6, a voice recognition device 7, a display unit 8, and
a speaker unit 9.
[0053]
The antenna 1 captures a radio wave transmitted from a broadcasting station and converts it into
an electric signal.
The receiver 2 extracts a signal included in a specific frequency band from the electrical signal
output from the antenna 1.
The AV processing unit 3 processes a signal output from the receiving unit 2 and outputs a video
signal and a multi-channel audio signal (hereinafter, a multi-channel signal).
[0054]
The controller 4 receives a control signal from the control panel 5 or the speech recognition
device 7 and causes the receiving unit 2 and/or the AV processing unit 3 to execute a
predetermined process such as changing the reception channel, increasing or decreasing the
volume, or turning the main power supply "ON"/"OFF".
The display unit 8 includes a display, and receives an image signal from the AV processing unit 3
to display an image.
The speaker unit 9 includes a plurality of speakers (9a, 9b, ...), receives the multi-channel signal
from the AV processing unit 3, and outputs multi-channel sound.
[0055]
The control panel 5 (which may be provided in the receiver main body or in the remote control)
is constituted by a button or the like, and generates a control signal corresponding to the user's
button operation.
The microphone 6 converts the voice emitted by the user into an electrical signal.
The voice recognition device 7 receives the electrical signal output from the microphone 6 and
generates a control signal corresponding to the user's voice.
[0056]
Here, the signal output from the receiving unit 2 may be an analog signal or a digital signal.
In the former case, the AV processing unit 3 is configured by a circuit that processes the signal
output from the receiving unit 2 in an analog manner.
In the latter case, the AV processing unit 3 is configured by a circuit that digitally processes the
signal output from the receiving unit 2.
[0057]
In the television receiver configured as described above, the antenna 1 captures the radio wave
transmitted from the broadcasting station and converts it into an electric signal, and the
receiving unit 2 extracts a signal of a specific frequency band from the electric signal.
Next, the AV processing unit 3 processes the signal output from the receiving unit 2 and outputs
a video signal and a multi-channel signal.
The video signal output from the AV processing unit 3 is given to the display unit 8 and the video
is displayed on the display.
On the other hand, a multi-channel signal is given to the speaker unit 9, and multi-channel sound
is output from a plurality of speakers.
[0058]
The user can operate the control panel 5 to cause the television receiver to switch reception
channels and the like.
That is, the control panel 5 generates a control signal corresponding to the user's button
operation, and the controller 4 receives the control signal, and causes the receiving unit 2 and /
or the AV processing unit 3 to switch receiving channels.
[0059]
Also, the user can cause the television receiver to switch reception channels and the like by
inputting sound through the microphone 6.
That is, the voice recognition device 7 generates a control signal corresponding to the voice of
the user, and the controller 4 receives the control signal, and causes the receiving unit 2 and / or
the AV processing unit 3 to switch the receiving channel.
[0060]
Note that although a television receiver that outputs multi-channel sound has been described
above as an example of an AV device in which the present invention is used, the present
invention is not limited to television receivers and may also be used, for example, in a radio
receiver that outputs multi-channel sound.
Alternatively, it can be used in any device or system having a function of outputting
multi-channel sound, such as a multi-channel audio system that includes a player for reproducing
media such as CDs and DVDs on which multi-channel signals are recorded, an amplifier, and a
speaker unit 9.
[0061]
First Embodiment FIG. 2 is a block diagram showing the configuration of a voice recognition
device for AV equipment according to a first embodiment of the present invention.
The speech recognition device 7 of FIG. 2 corresponds to the speech recognition device 7
provided in the AV device of FIG.
However, in the present embodiment, in the AV apparatus, a 2-channel signal is output from the
AV processing unit 3, and 2-channel sound is output through the two speakers 9 a and 9 b
included in the speaker unit 9.
[0062]
In FIG. 2, the speech recognition device 7 includes a monaural unit 13, one echo canceller 14,
and a speech recognition unit 15.
The signals input to the speakers 9a and 9b are two-channel signals output from the AV
processing unit 3 of FIG.
[0063]
The 2-channel signal directed to the speakers 9a and 9b is branched and input to the monaural
unit 13, and the monaural unit 13 monauralizes the 2-channel signal.
A signal output from the microphone 6 (hereinafter referred to as a microphone output) and a
signal output from the monaural unit 13 (hereinafter referred to as a monaural signal) are
supplied to the echo canceller 14, which extracts only the signal corresponding to the user's
voice (hereinafter, user voice).
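As a minimal sketch of what the monaural unit 13 does, the channels can be added sample by sample, which matches the later statement that the left and right speaker inputs "are added" to obtain the monaural signal. The function name `monauralize` and the list-of-lists signal layout are assumptions, and any scaling of the sum is left out because the description does not mention it.

```python
def monauralize(channels):
    """Add all channels sample by sample to form one monaural signal.

    `channels` is a list of equal-length sample lists, one per channel.
    """
    return [sum(frame) for frame in zip(*channels)]
```

For example, `monauralize([[1, 2, 3], [4, 5, 6]])` returns `[5, 7, 9]`.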
[0064]
Here, the operation principle of the echo canceller 14 will be briefly described. The echo
canceller 14 includes an adaptive digital filter 14a and a subtraction circuit 14b. The microphone
output includes, in addition to the user's voice, a signal (hereinafter, echo signal) generated as a
result of the sound output from the speakers 9a and 9b reflecting indoors and getting into the
microphone 6.
[0065]
The monaural signal is input to the adaptive digital filter 14a, and the signal output from
the subtraction circuit 14b is fed back to it; the adaptive digital filter 14a estimates the
echo signal based on these two signals. The estimated echo signal thus obtained and the
microphone output are supplied to the subtraction circuit 14b, and the subtraction circuit 14b
subtracts the estimated echo signal from the microphone output. As a result, the echo canceller
14 outputs the user voice from which the echo signal has been removed.
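The feedback loop between the adaptive digital filter 14a and the subtraction circuit 14b can be illustrated with a normalized LMS (NLMS) filter. The description does not name the adaptation algorithm, so NLMS is an assumption here, as are the function name `nlms_echo_cancel`, the tap count, and the step size `mu`.

```python
def nlms_echo_cancel(mono, mic, n_taps=8, mu=0.5, eps=1e-8):
    """Remove the echo of `mono` from `mic` with an NLMS adaptive filter.

    Assumption: NLMS is used only as a common example of an adaptive
    digital filter; the source does not specify the algorithm.
    Returns (echo canceller output, estimated impulse response).
    """
    w = [0.0] * n_taps   # estimated impulse response of the echo path
    x = [0.0] * n_taps   # most recent reference samples, newest first
    out = []
    for ref, d in zip(mono, mic):
        x = [ref] + x[:-1]
        echo_hat = sum(wi * xi for wi, xi in zip(w, x))  # estimated echo
        e = d - echo_hat                                 # subtraction circuit
        norm = eps + sum(xi * xi for xi in x)
        # The residual e is fed back to update the filter coefficients.
        w = [wi + mu * e * xi / norm for wi, xi in zip(w, x)]
        out.append(e)
    return out, w
```

When the microphone picks up only the echo, the output decays toward zero as the taps converge to the echo path. When the user speaks, the speech is not predictable from the monaural signal and therefore remains in the output.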
[0066]
The voice recognition unit 15 recognizes the user voice from the echo canceller 14 and
generates the control signal indicated by that voice. The control signal generated in this manner
is transmitted to the controller 4 of FIG. 1, and the controller 4 controls the receiving unit 2 and
the AV processing unit 3 so that processing such as switching of reception channels is executed in
the television receiver.
[0067]
FIG. 3 is a block diagram showing a hardware configuration of the speech recognition device 7 of
FIG. 2. In FIG. 3, the speech recognition device 7 includes a CPU 10, a RAM 11, and a ROM 12. A
predetermined program is stored in advance in the ROM 12. This program describes (a) an
algorithm for monauralizing a two-channel signal, (b) an algorithm for removing an echo signal
from a microphone output, and (c) an algorithm for recognizing a user voice and generating a
control signal. The CPU 10 operates in accordance with this program while
using the RAM 11 as a work area. In this way, the function of each block shown in FIG. 2 is
realized. Note that the function of each block may be realized by a dedicated hardware circuit
instead of software.
The operation of the voice recognition device 7 for AV equipment configured as described above
will be described below using FIG. 4. FIG. 4 is a diagram showing time waveforms of signals input to
or output from the respective components in the speech recognition device 7 of FIG. 2. First,
consider the case where the user utters the voice shown at 23 in FIG. 4 while the left speaker input
21 and the right speaker input 22 shown in FIG. 4 are being input to the speakers 9a and 9b. At this
time, the microphone 6 outputs the microphone output signal shown at 24 in FIG. 4, which is the sum
of the echo of the left speaker input 21, the echo of the right speaker input 22, and the user voice
23. Meanwhile, the left speaker input 21 and the right speaker input 22 are also input to the
monaural unit 13, where they are added to obtain the monaural signal shown at 25 in FIG. 4.
[0069]
The monaural signal 25 is input to the echo canceller 14, and the echo canceller 14 computes the
estimated echo signal shown at 26 in FIG. 4 from the monaural signal 25 and the estimated impulse
response stored internally. Inside the echo canceller 14, the estimated echo signal 26 is
subtracted from the microphone output signal 24 to obtain the echo canceller output signal 27 shown
in FIG. 4, which is input to the speech recognition unit 15. A comparison of the echo canceller
output signal 27 with the user voice 23 and the microphone output signal 24 shows that the echo
signal is cancelled quite effectively.
[0070]
Next, the reason why the echo of a stereo signal (hereinafter, a two-channel signal is referred to
as a stereo signal where appropriate) can be cancelled by one echo canceller 14 will be described.
Let Hr be the transfer characteristic (impulse response) from the right channel speaker 9a to the
microphone 6, Hl the transfer characteristic from the left channel speaker 9b to the microphone 6,
Sr the right channel signal, and Sl the left channel signal. The echo signal Se mixed into the
output of the microphone 6 is then Se = (Sr * Hr + Sl * Hl).
[0071]
At this time, if the left and right transfer characteristics are substantially equal (Hr ≒ Hl ≒ H),
then Se ≒ (Sr + Sl) * H, and if the left and right channel signals are almost equal (Sr ≒ Sl ≒ S),
then Se ≒ S * (Hr + Hl). In either case the echo is the monaural signal passed through a single
transfer characteristic, so one echo canceller 14 can cancel it if either assumption holds.
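The two conditions can be checked numerically with a short sketch. It treats "*" as convolution with an impulse response, which is an interpretation rather than something stated explicitly, and every name and value in it (`convolve`, `residual_power`, the three-tap paths `h1` and `h2`) is hypothetical. The single filter is assumed to be ideal instead of adaptive, so the residual shows only what one filter on the monaural signal Sr + Sl can achieve in principle.

```python
import random

def convolve(signal, impulse):
    """Causal FIR filtering: out[n] = sum_k impulse[k] * signal[n - k]."""
    return [sum(h * signal[n - k] for k, h in enumerate(impulse) if n - k >= 0)
            for n in range(len(signal))]

def residual_power(sr, sl, hr, hl, h_single):
    """Mean square error between the true echo and a single-filter estimate."""
    echo = [a + b for a, b in zip(convolve(sr, hr), convolve(sl, hl))]  # Se
    mono = [a + b for a, b in zip(sr, sl)]                              # Sr + Sl
    estimate = convolve(mono, h_single)
    return sum((e - g) ** 2 for e, g in zip(echo, estimate)) / len(echo)

rng = random.Random(1)
a = [rng.uniform(-1, 1) for _ in range(500)]
b = [rng.uniform(-1, 1) for _ in range(500)]   # independent of a
h1 = [0.5, 0.2, -0.1]
h2 = [0.3, -0.4, 0.2]
h_avg = [(x + y) / 2 for x, y in zip(h1, h2)]

equal_paths = residual_power(a, b, h1, h1, h1)        # Hr = Hl = H
equal_signals = residual_power(a, a, h1, h2, h_avg)   # Sr = Sl = S
neither = residual_power(a, b, h1, h2, h_avg)         # both assumptions fail
```

In this sketch `equal_paths` and `equal_signals` vanish up to rounding error, while `neither` leaves a clearly non-zero residual. With Sr = Sl = S the monaural signal is 2S, which is why the matching single filter is (Hr + Hl) / 2.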
[0072]
The biggest factors that determine the transfer characteristics Hr and Hl are the distances between
the speakers 9a and 9b and the microphone 6 and the reflection structure of the room. In an actual
listening situation, the distances from the microphone 6 for speech recognition to the right channel
speaker 9a and to the left channel speaker 9b are approximately equal when, for example, the
microphone 6 is placed near the user. Even when the microphone 6 is installed on the TV, the
distances are equal if it is installed at the center of the TV. Furthermore, the reverberation
structure of the room is naturally almost the same for both channels.
[0073]
At high frequencies the wavelength is short, so a slight distance difference causes phase
inversion, and even if the distances are approximately equal, the transfer characteristics
including the phase do not match sufficiently. At low and middle frequencies, however, the transfer
characteristics often match fairly well, so the assumption Hr ≒ Hl holds, and a certain degree of
cancellation can be expected even with one echo canceller 14.
[0074]
Furthermore, when sound is produced for an actual TV program or the like, the center-localized
sound (monaural component) is mixed equally into the left and right channels at a relatively high
level, and the sound to be localized to the left and right (stereo component) is often mixed with
this monaural component at a relatively low level. That is, the mix is built mainly around the
center sound source, and a considerable part of the left and right speaker inputs shown at 21 and 22
is the monaural component. For such an audio signal based mainly on a center sound source, the
assumption Sr ≒ Sl holds, and echo can be cancelled effectively even in a system using one echo
canceller 14. For the above reasons, it has been confirmed that a considerable echo cancellation
effect can be obtained in an actual TV viewing state even with the voice recognition device 7
configured as shown in FIG. 2.
[0075]
As described above, according to this embodiment, a single echo canceller can cope with a stereo
source (two-channel signal), so an inexpensive voice recognition apparatus for AV equipment can be
realized. In addition, since only one echo canceller is used, there is no mutual interference among
echo cancellers, which yields the practically very important effect that stable operation can be
guaranteed.
[0076]
In the first embodiment (and in the second to fourth, sixth, and thirteenth embodiments described
below), the AV processing unit 3 of the AV apparatus of FIG. 1 outputs a 2-channel signal, and
2-channel sound is output through the speaker unit 9. Instead, multi-channel signals such as
4-channel or 6-channel signals may be output from the AV processing unit 3, and multi-channel audio
such as 4-channel or 6-channel sound may be output through the speaker unit 9. In this case, the
program stored in the ROM 12 (or the configuration of the dedicated hardware circuit) may be
partially modified so that the monaural unit 13 of FIG. 2 monauralizes the multi-channel signal.
The monaural unit 13 may then add the signals of all the channels, or may add only the main channel
signals such as front left, front right, and center. In addition, the channels may be weighted
before being added instead of being added equally.
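The three monauralization options mentioned above (add all channels, add only the main channels, or add with weights) can be expressed with one small function. This is a minimal sketch: the name `monauralize` and the sample weights are hypothetical, and a weight of zero is used to leave a channel out.

```python
def monauralize(channels, weights=None):
    """Sum equally long channel sample lists into one monaural signal.

    weights=None adds all channels equally; a weight of 0 drops a channel.
    """
    if weights is None:
        weights = [1.0] * len(channels)
    if len(weights) != len(channels):
        raise ValueError("one weight per channel is required")
    length = len(channels[0])
    return [sum(w * ch[i] for w, ch in zip(weights, channels))
            for i in range(length)]

front_left = [1.0, 2.0]
front_right = [3.0, 4.0]
rear = [10.0, 10.0]

equal_sum = monauralize([front_left, front_right])                  # plain addition
main_only = monauralize([front_left, front_right, rear], [1, 1, 0])  # rear dropped
weighted = monauralize([front_left, front_right, rear], [1, 1, 0.5])
```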
[0077]
Second Embodiment FIG. 5 is a block diagram showing the configuration of a voice recognition
device for AV equipment according to a second embodiment of the present invention. The speech
recognition device 7 of FIG. 5 corresponds to the speech recognition device 7 provided in the AV
device of FIG. 1. In the present embodiment, however, the AV processing unit 3 of the AV apparatus
outputs a 2-channel signal, and 2-channel sound is output through the two speakers 9a and 9b
included in the speaker unit 9.
[0078]
In FIG. 5, the speech recognition device 7 includes a monaural unit 33, one echo canceller 34, a
speech recognition unit 35, a speech detection unit 37, and a switching unit 36. That is, the voice
recognition device 7 of FIG. 5 is obtained by adding the voice detection unit 37 and the switching
unit 36 to the voice recognition device 7 of the first embodiment shown in FIG. 2. The signals input
to the speakers 9a and 9b are the two-channel signals output from the AV processing unit 3 of FIG. 1.
[0079]
The 2-channel signal directed to the speakers 9a and 9b is branched and input to the monaural
unit 33, and the monaural unit 33 monauralizes the 2-channel signal. The signal output from the
microphone 6 (microphone output) and the signal output from the monaural unit 33 (monaural signal)
are supplied to the echo canceller 34, and the echo canceller 34 extracts from the microphone output
only the signal corresponding to the user's voice (hereinafter, user voice). The operation principle
of the echo canceller 34 is as described in the first embodiment.
[0080]
The voice detection unit 37 receives the output of the monaural unit 33 (monaural signal) and the
output of the echo canceller 34 (user voice), and detects the user voice based on the level ratio
of the two outputs. When the voice detection unit 37 detects user voice, the switching unit 36
switches the input to the speakers 9a and 9b from the two-channel signal (used while no voice is
detected) to the monaural signal. When the voice detection unit 37 changes from the state in which
user voice is detected to the state in which it is not detected, the input to the speakers 9a and
9b is switched from the monaural signal (used during detection) back to the two-channel signal.
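The detection and switching behaviour can be sketched per frame as follows. The description says only that detection is based on the level ratio of the two outputs, so the RMS level measure, the `ratio_threshold` and `floor` values, and all function names are assumptions made for illustration.

```python
import math

def rms(frame):
    """Root mean square level of one frame of samples."""
    return math.sqrt(sum(v * v for v in frame) / len(frame))

def user_voice_detected(mono_frame, canceller_frame,
                        ratio_threshold=0.1, floor=1e-6):
    """True when the echo canceller output is louder than the residual echo
    expected for the current monaural level (assumed fixed ratio)."""
    return rms(canceller_frame) > ratio_threshold * rms(mono_frame) + floor

def speaker_input(stereo_frame, mono_frame, detected):
    """Switching unit: monaural to both speakers while voice is detected."""
    return (mono_frame, mono_frame) if detected else stereo_frame

mono = [0.5, -0.5, 0.5, -0.5]
quiet_output = [0.01, -0.01, 0.01, -0.01]   # residual echo only
loud_output = [0.2, -0.2, 0.2, -0.2]        # user is speaking
stereo = ([0.3, -0.1, 0.3, -0.1], [0.2, -0.4, 0.2, -0.4])

idle = speaker_input(stereo, mono, user_voice_detected(mono, quiet_output))
talking = speaker_input(stereo, mono, user_voice_detected(mono, loud_output))
```

A practical detector would also need some hold time so that the speaker input does not toggle between frames, which this sketch leaves out.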
[0081]
The speech recognition unit 35 starts a speech recognition operation when the speech detection
unit 37 detects a user speech. That is, it recognizes the user voice from the echo canceller 34 and
generates a control signal indicated by the voice. The control signal generated in this manner is
transmitted to the controller 4 of FIG. 1, and the controller 4 controls the receiving unit 2 and the
AV processing unit 3 to execute processing such as switching of reception channels in the
television receiver.
[0082]
The hardware configuration of the speech recognition device 7 of FIG. 5 is the same as that of
FIG. 3. In FIG. 3, a predetermined program is stored in advance in the ROM 12. In addition to the
algorithms (a) to (c) described in the first embodiment, this program describes (d) an algorithm
for detecting user speech and (e) an algorithm for switching the input signals to the speakers 9a
and 9b. The CPU 10 operates in accordance with this program while using the RAM 11 as a work area,
thereby realizing the function of each block shown in FIG. 5. Note that the function of each block
may be realized by a dedicated hardware circuit instead of software.
[0083]
The operation of the voice recognition device 7 for AV equipment configured as described above
will be described below using FIG. 6. FIG. 6 is a diagram showing time waveforms of signals input to
or output from the respective components in the speech recognition device 7 of FIG. 5. First, as in
the first embodiment, consider the case where the user utters the voice shown at 43 in FIG. 6 while
the left speaker input shown at 41 and the right speaker input shown at 42 in FIG. 6 are being input
to the speakers 9a and 9b. At this time, the signal shown at 44 in FIG. 6 is output from the
monaural unit 33. In the speech recognition device 7 of FIG. 5, the speech detection unit 37
determines whether or not the user is speaking. During the periods A to B and C to D shown in FIG.
6, in which the user is not speaking, the switching unit 36 switches the input to the speakers 9a
and 9b to the stereo signal side, and during the period B to C, in which the user is speaking, it
switches the input to the monaural signal side. The signal output from the microphone 6 at this time
is shown at 45 in FIG. 6.
[0084]
The monaural signal 44 of FIG. 6 is always input to the echo canceller 34, and the echo canceller
34 computes the estimated echo signal 46 shown in FIG. 6 from the monaural signal 44 and the
estimated impulse response stored internally. Inside the echo canceller 34, the estimated echo
signal 46 is subtracted from the microphone output signal 45 to obtain the echo canceller output
signal shown at 47 in FIG. 6.
[0085]
In the speech recognition apparatus 7 of FIG. 5, the speech detection unit 37 monitors the level
ratio between the monaural signal 44 and the echo canceller output signal 47. When the level of the
echo canceller output signal 47 rises above the level of the echo signal expected from the level of
the monaural signal 44 and the transfer characteristics of the echo path, it is determined that the
user has spoken, and the switching unit 36 switches the input to the speakers 9a and 9b to the
monaural signal 44. When the input to the speakers 9a and 9b is switched to the monaural signal
(Sr + Sl), the echo signal Se becomes Se = (Sr + Sl) * (Hr + Hl), and in principle the echo signal
can be completely cancelled by one echo canceller 34. In the configuration of FIG. 2 described
above, for an audio signal with a strong stereo component for which the assumption Sr ≒ Sl does not
hold, the cancellation effect of the echo canceller is naturally degraded, and the echo signal mixes
into the speech input to the speech recognition unit and degrades speech recognition performance. In
the configuration of FIG. 5, the echo signal can be completely cancelled even in this case, and the
speech recognition unit 35 can perform speech recognition with high accuracy.
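The relation Se = (Sr + Sl) * (Hr + Hl) for monaural playback can be verified with a few lines. As before, "*" is read as convolution with an impulse response, which is an interpretation, and the helper `convolve` and the three-tap paths are hypothetical. The channel signals are independent and the two paths differ, so neither Sr ≒ Sl nor Hr ≒ Hl holds here.

```python
import random

def convolve(signal, impulse):
    """Causal FIR filtering: out[n] = sum_k impulse[k] * signal[n - k]."""
    return [sum(h * signal[n - k] for k, h in enumerate(impulse) if n - k >= 0)
            for n in range(len(signal))]

rng = random.Random(2)
sr = [rng.uniform(-1, 1) for _ in range(300)]
sl = [rng.uniform(-1, 1) for _ in range(300)]   # independent of sr
hr = [0.5, 0.2, -0.1]                           # assumed right echo path
hl = [0.3, -0.4, 0.2]                           # assumed left echo path

mono = [r + l for r, l in zip(sr, sl)]          # Sr + Sl fed to both speakers
echo = [x + y for x, y in zip(convolve(mono, hr), convolve(mono, hl))]

h_sum = [x + y for x, y in zip(hr, hl)]         # single filter Hr + Hl
estimate = convolve(mono, h_sum)
worst_error = max(abs(e - g) for e, g in zip(echo, estimate))
```

`worst_error` is at rounding-error level, which matches the statement that one echo canceller suffices in principle once both speakers reproduce the same monaural signal.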
[0086]
The echo signal contained in the echo canceller output signal 47 in the speech recognition
apparatus 7 of FIG. 5 is extracted and shown at 48 in FIG. 6, and the echo signal contained in the
echo canceller output signal 27 of FIG. 4 is extracted and shown at 49 in FIG. 6. Comparing 48 and
49 shows that, in the present embodiment, the echo signal is cancelled more effectively during the
period B to C in which the user voice is input, and the S/N for voice recognition is greatly
improved.
[0087]
As described above, according to the present embodiment, stereo playback is normally used and
playback is switched to monaural only while the user is speaking, so that user voice with a better
S/N than in the first embodiment can be extracted and the recognition performance can be improved.
[0088]
Third Embodiment FIG. 7 is a block diagram showing the configuration of a voice recognition
device for AV equipment according to a third embodiment of the present invention.
The speech recognition device 7 of FIG. 7 corresponds to the speech recognition device 7
provided in the AV device of FIG. 1. In the present embodiment, however, the AV processing unit 3
of the AV apparatus outputs a 2-channel signal, and 2-channel sound is output through the two
speakers 9a and 9b included in the speaker unit 9.
[0089]
In FIG. 7, the voice recognition device 7 includes a monaural unit 53, one echo canceller 54, a
voice recognition unit 55, a start instruction unit 581, an end instruction unit 582, a state
setting unit 57, and a switching unit 56. That is, the voice recognition device 7 of FIG. 7 is
obtained by adding the start instruction unit 581, the end instruction unit 582, the state setting
unit 57, and the switching unit 56 to the voice recognition device 7 of FIG. 2 (first embodiment).
The signals input to the speakers 9a and 9b are the two-channel signals output from the AV
processing unit 3 of FIG. 1.
[0090]
The 2-channel signal directed to the speakers 9a and 9b is branched and input to the monaural
unit 53, and the monaural unit 53 monauralizes the 2-channel signal. The signal output from the
microphone 6 (microphone output) and the signal output from the monaural unit 53 (monaural signal)
are supplied to the echo canceller 54, and the echo canceller 54 extracts from the microphone output
only the signal corresponding to the user's voice (hereinafter, user voice). The operation principle
of the echo canceller 54 is as described in the first embodiment.
[0091]
The start instruction unit 581 instructs the start of the speech recognition operation. The end
instruction unit 582 instructs the end of the speech recognition operation. The state setting unit
57 receives instructions from the start instruction unit 581 and the end instruction unit 582 and
sets the operation state of the speech recognition unit 55 (that is, sets the speech recognition
operation to "ON" or "OFF"). When the state setting unit 57 sets the voice recognition operation to
the "ON" state, the switching unit 56 switches the input to the speakers 9a and 9b from the
2-channel signal (used in the "OFF" state) to the monaural signal. When the operation is set to the
"OFF" state, the input to the speakers 9a and 9b is switched from the monaural signal (used in the
"ON" state) back to the 2-channel signal.
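The "ON"/"OFF" control described above amounts to a two-state machine driving a selector. The following minimal sketch uses hypothetical names (`RecognitionState`, `StateSettingUnit`, `switching_unit`); the `start` and `end` methods stand in for the instructions received from the two instruction units.

```python
from enum import Enum

class RecognitionState(Enum):
    OFF = "OFF"   # standby: stereo playback
    ON = "ON"     # operation: monaural playback

class StateSettingUnit:
    """Holds the operation state of the speech recognition unit."""
    def __init__(self):
        self.state = RecognitionState.OFF

    def start(self):
        """Called on an instruction from the start instruction unit."""
        self.state = RecognitionState.ON

    def end(self):
        """Called on an instruction from the end instruction unit."""
        self.state = RecognitionState.OFF

def switching_unit(state, stereo_frame, mono_frame):
    """Select the speaker input according to the operation state."""
    if state is RecognitionState.ON:
        return (mono_frame, mono_frame)
    return stereo_frame

unit = StateSettingUnit()
stereo = ([0.3, -0.1], [0.2, -0.4])
mono = [0.5, -0.5]

before = switching_unit(unit.state, stereo, mono)
unit.start()
during = switching_unit(unit.state, stereo, mono)
unit.end()
after = switching_unit(unit.state, stereo, mono)
```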
[0092]
The voice recognition unit 55 starts or stops voice recognition in accordance with the setting of
the state setting unit 57. While in operation, it recognizes the user voice from the echo canceller
54 and generates the control signal indicated by the voice. The control signal generated in this
manner is transmitted to the controller 4 of FIG. 1, and the controller 4 controls the receiving
unit 2 and the AV processing unit 3 to execute processing such as switching of reception channels in
the television receiver.
[0093]
The hardware configuration of the speech recognition device 7 of FIG. 7 is the same as that of
FIG. 3. In FIG. 3, a predetermined program is stored in advance in the ROM 12. In addition to the
algorithms (a) to (c) described in the first embodiment and the algorithm (e) described in the
second embodiment, this program describes (f) an algorithm for setting the operation state of the
speech recognition unit 55. The CPU 10 operates in accordance with this program while using the RAM
11 as a work area. Thus, the function of each block shown in FIG. 7 is realized.
[0094]
The start instruction unit 581 and the end instruction unit 582 are realized by the buttons
constituting the control panel of FIG. Further, the functions of the blocks other than the start
instruction unit 581 and the end instruction unit 582 may be realized by dedicated hardware
circuits instead of software.
[0095]
The operation of the speech recognition device 7 for AV equipment configured as described above
will be described below. In the speech recognition apparatus 7 of FIG. 7, the control of the
switching unit that is performed by the speech detection unit 37 in the speech recognition apparatus
7 of FIG. 5 is instead performed by the start instruction unit 581, the end instruction unit 582,
and the state setting unit 57. When the voice recognition function is to be used, the start
instruction unit 581 first sends a start signal of the voice recognition function to the state
setting unit 57, and the state setting unit 57 controls the switching unit 56 to switch the input
signal to the speakers 9a and 9b from the stereo signal to the monaural signal. The state of voice
recognition thereby shifts from the standby state, in which the stereo signal is input to the
speakers 9a and 9b, to the operation state, in which the monaural signal is input to the speakers
9a and 9b.
[0096]
In the operation state, the cancellation effect for the echo signal is at its best and
high-accuracy speech recognition can be expected; the sense of stereo is lost, but this causes no
great problem in listening to the audio signal. When the voice recognition function is no longer
needed, the end instruction unit 582 sends an end signal of the voice recognition function to the
state setting unit 57, and the state setting unit 57 controls the switching unit 56 to switch the
input signal to the speakers 9a and 9b from the monaural signal to the stereo signal. The state
thereby shifts from the operation state, in which the monaural signal is input to the speakers 9a
and 9b, to the standby state, in which the stereo signal is input to the speakers 9a and 9b.
[0097]
FIG. 8 is a diagram showing time waveforms of signals input to or output from the respective
components in the speech recognition device 7 of FIG. 7. In FIG. 8, 61 indicates the input signal to
the speech recognition unit 55 in the operation state, and 62 indicates the echo signal included in
that signal. Comparing the signals 61 and 62 in FIG. 8 with the signals 47 and 48 in FIG. 6 above
shows that the S/N at the head and tail portions of the utterance is significantly better in the
speech recognition device 7 of FIG. 7 than in the speech recognition device 7 of FIG. 5. In the
configuration of FIG. 5 above, speech detection requires a detection time of several tens of msec,
so the S/N during the first several tens of msec is poor, with the disadvantage that the consonant
at the beginning of an utterance is difficult to recognize. The present configuration completely
eliminates this drawback.
[0098]
As described above, according to the present embodiment, stereo playback is normally used and
playback is switched to monaural only when the voice recognition function is required, so that user
voice with a better S/N than in the second embodiment is extracted. This can further improve the
recognition performance.
[0099]
Fourth Embodiment FIG. 9 is a block diagram showing the configuration of a voice recognition
device for AV equipment according to a fourth embodiment of the present invention.
The speech recognition device 7 of FIG. 9 corresponds to the speech recognition device 7
provided in the AV device of FIG. 1. In the present embodiment, however, the AV processing unit 3
of the AV apparatus outputs a 2-channel signal, and 2-channel sound is output through the two
speakers 9a and 9b included in the speaker unit 9.
[0100]
In FIG. 9, the speech recognition device 7 includes a complete monaural unit 75, a monaural degree
determination unit 76, an arbitrary degree monaural unit 77, an echo canceller 73, a speech
recognition unit 74, a start instruction unit 792, an end instruction unit 793, a state setting unit
791, and a switching unit 78. That is, the speech recognition device 7 of FIG. 9 is obtained by
adding the monaural degree determination unit 76 and the arbitrary degree monaural unit 77 to the
speech recognition device 7 of FIG. 7 (third embodiment). (The complete monaural unit 75 is called
“complete” to distinguish it from the arbitrary degree monaural unit 77, but is similar to the
monaural unit 53 in FIG. 7.) The signals input to the speakers 9a and 9b are the two-channel
signals output from the AV processing unit 3 of FIG. 1.
[0101]
The two-channel signal directed to the speakers 9a and 9b is branched and input to the complete
monaural unit 75, which converts the two-channel signal completely into monaural. The 2-channel
signal directed to the speakers 9a and 9b is also branched and input to the monaural degree
determination unit 76 and the arbitrary degree monaural unit 77. The monaural degree determination
unit 76 determines the monaural degree of the 2-channel signal, and in response to the determination
result the arbitrary degree monaural unit 77 monauralizes the two-channel signal to an arbitrary
degree.
[0102]
That is, the arbitrary degree monaural unit 77 performs processing to increase the monaural
degree of the two-channel signal in accordance with the monaural degree of that signal. For that
purpose, the arbitrary degree monaural unit 77 holds a function (a processing strength
determination characteristic, represented by reference numeral 101 in FIG. 12A) that determines,
based on the monaural degree, the strength at which the processing to increase the monaural degree
should be performed.
[0103]
Here, the monaural degree of the two-channel signal means the proportion of the signal occupied by
the component contained in common in both channels (monaural component). If the signals of both
channels are completely uncorrelated with each other, the monaural degree is "0", and if the signals
of both channels are identical, the monaural degree is "1".
[0104]
The signal output from the microphone 6 (microphone output) and the signal output from the complete
monaural unit 75 (complete monaural signal) are supplied to the echo canceller 73, and the echo
canceller 73 extracts from the microphone output only the signal corresponding to the user's voice
(hereinafter, user voice).
The operation principle of the echo canceller 73 is as described in the first embodiment.
[0105]
The start instruction unit 792 instructs the start of the speech recognition operation. The end
instruction unit 793 instructs the end of the speech recognition operation. The state setting
unit 791 receives instructions from the start instruction unit 792 and the end instruction unit
793 and sets the operation state of the voice recognition unit 74 (that is, sets the voice
recognition operation to "ON" or "OFF").
[0106]
The signal output from the arbitrary degree monaural unit 77 (hereinafter, arbitrary degree
monaural signal) and the two-channel signal from the AV processing unit 3 of FIG. 1 are supplied to
the switching unit 78. When the state setting unit 791 sets the voice recognition operation to the
"ON" state, the switching unit 78 switches the input to the speakers 9a and 9b from the 2-channel
signal (used in the "OFF" state) to the arbitrary degree monaural signal. When the operation is set
to the "OFF" state, the input to the speakers 9a and 9b is switched from the arbitrary degree
monaural signal (used in the "ON" state) back to the two-channel signal.
[0107]
The voice recognition unit 74 starts or stops voice recognition according to the setting of the
state setting unit 791. While in operation, it recognizes the user voice from the echo canceller 73
and generates the control signal indicated by the voice. The control signal generated in this manner
is transmitted to the controller 4 of FIG. 1, and the controller 4 controls the receiving unit 2 and
the AV processing unit 3 to execute processing such as switching of reception channels in the
television receiver.
[0108]
The hardware configuration of the speech recognition device 7 of FIG. 9 is the same as that of
FIG. 3. In FIG. 3, a predetermined program is stored in advance in the ROM 12. In addition to the
algorithms (a) to (c) described in the first embodiment, the algorithm (e) described in the second
embodiment, and the algorithm (f) described in the third embodiment, this program describes (g) an
algorithm for determining the monaural degree of the two-channel signal and (h) an algorithm for
monauralizing the two-channel signal to an arbitrary degree. The CPU 10 operates in accordance with
this program while using the RAM 11 as a work area. Thus, the function of each block shown in FIG. 9
is realized.
[0109]
The start instruction unit 792 and the end instruction unit 793 are realized by the buttons
constituting the control panel of FIG. In addition, the functions of the blocks other than the start
instruction unit 792 and the end instruction unit 793 may be realized by dedicated hardware
circuits instead of software.
[0110]
The operation of the speech recognition device 7 for AV equipment configured as described above
will be described below. In the speech recognition device 7 of FIG. 7, the stereo signal is
completely monauralized and then reproduced by the speakers 9a and 9b in the speech recognition
operation state ("ON" state), so there was the problem that no sense of stereo remained at all. On
the other hand, in the voice recognition device 7 of FIG. 2, which performs stereo reproduction even
in the voice recognition operation state, the problem is that the amount of echo cancellation of the
echo canceller 14 deteriorates when an audio signal with a low monaural degree, for which the
assumption Sr ≒ Sl fails badly, is input. However, as described above, in ordinary stereo programs
such as TV broadcasts, audio signals whose left and right channels are so weakly correlated that the
cancellation amount of the echo canceller 14 deteriorates greatly are rarely mixed. In most cases,
the left and right sound sources are mixed at a relatively weak level into a center sound source
that is mixed equally into both channels. For this reason, even in stereo broadcasting, the
correlation between the left and right channels is often extremely strong and the assumption Sr ≒ Sl
holds. The problem is how to get through the times when the correlation is low.
[0111]
Therefore, in the present embodiment, the signal completely monauralized by the complete monaural
unit 75 is input to the echo canceller 73, while the speakers 9a and 9b receive the stereo signal in
the speech recognition standby state ("OFF" state) and the output of the arbitrary degree monaural
unit 77 in the operation state. The monaural degree determination unit 76 monitors the monaural
degree of the signal, and only when it determines that the monaural degree is low does the
arbitrary degree monaural unit 77 strengthen the monaural processing. As a result, at least a
certain level of correlation can always be maintained between the left and right channels.
[0112]
When the voice recognition function is to be used, the start instruction unit 792 of FIG. 9 sends
the start signal of the voice recognition function to the state setting unit 791, as in the voice
recognition device 7 of FIG. 7. The state setting unit 791 controls the switching unit 78 to switch
the input signal to the speakers 9a and 9b from the stereo signal to the output of the arbitrary
degree monaural unit 77, and the state of voice recognition shifts from the standby state, in which
the stereo signal is input to the speakers 9a and 9b, to the operation state, in which the arbitrary
degree monaural signal is input to the speakers 9a and 9b. The monaural degree determination unit 76
constantly monitors the monaural degree of the audio signal, and only when it determines that the
monaural degree is low does the arbitrary degree monaural unit 77 perform monaural processing to the
required degree. The sense of stereo of the audio signal is impaired only during the brief moments
in which the monaural degree is low, while a sufficient amount of echo cancellation is obtained.
[0113]
FIG. 10 shows details of the monaural degree determination unit 76 of FIG. 9. In FIG. 10, the
monaural degree determination unit 76 includes an adder 81, a subtractor 82, a level
comparator 83, and a monaural degree calculation unit 84.
[0114]
In the case of a perfect monaural signal, Sr = Sl, so the output of the adder 81 is 2Sr and the
output of the subtractor 82 is "0"; the output of the level comparator 83, which obtains {(output
level of subtractor 82) / (output level of adder 81)}, is therefore also "0". On the other hand, in
the case of a perfect stereo signal, that is, when Sr and Sl are completely uncorrelated, the output
of the adder 81 is Sr + Sl and the output of the subtractor 82 is Sr - Sl. Since Sr and Sl are
completely uncorrelated, the level of Sr + Sl and the level of Sr - Sl are equal, and the output of
the level comparator 83 becomes "1". Next, the monaural degree calculation unit 84 calculates {1 -
(level comparator output)}, so that the monaural degree determination unit 76 outputs "1" for a
perfect monaural signal and "0" for a perfect stereo signal.
[0115]
As described above, the monaural degree determination unit 76 outputs a value between 1 and 0
according to the monaural degree of the input signal, and the monaural degree of the input signal
can be determined by monitoring this value.
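The chain of adder 81, subtractor 82, level comparator 83 and monaural degree calculation unit 84 can be sketched as below. The description does not define "level", so the RMS measure is an assumption, as are the function names, the small `eps` guard against division by zero, and the clamping of the ratio to at most 1 (which keeps anti-phase input from producing a negative degree).

```python
import math

def level(x):
    """Signal level, taken here as RMS (an assumption)."""
    return math.sqrt(sum(v * v for v in x) / len(x))

def monaural_degree(sr, sl, eps=1e-12):
    added = [r + l for r, l in zip(sr, sl)]        # adder 81
    subtracted = [r - l for r, l in zip(sr, sl)]   # subtractor 82
    ratio = level(subtracted) / (level(added) + eps)   # level comparator 83
    return 1.0 - min(1.0, ratio)                   # monaural degree calculation unit 84

same = [0.5, -0.2, 0.1, 0.4]
perfect_mono = monaural_degree(same, same)

# Two short sequences with zero cross-correlation and equal power.
u = [1.0, 1.0, -1.0, -1.0]
v = [1.0, -1.0, 1.0, -1.0]
perfect_stereo = monaural_degree(u, v)
```

In this sketch identical channels give a degree of 1, and the two orthogonal test sequences give a degree of practically 0, in line with the two limiting cases described above.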
[0116]
FIG. 11 shows details of the arbitrary degree monaural unit 77 of FIG. 9.
Referring to FIG. 11, the arbitrary degree monaural unit 77 includes a processing strength
determination unit 91, attenuators 921 to 924, and adders 931 and 932. The output of the
monaural degree calculation unit 84 of FIG. 10 is input to the processing strength determination
unit 91 of FIG. 11, and the processing strength determination unit 91 determines the processing
strength of monaural processing according to this value. The amount of attenuation of the
attenuators 921 to 924 is controlled according to the processing intensity.
[0117]
FIG. 12 is a diagram showing the strength of monaural processing determined by the processing
strength determination unit 91 of FIG. 11 and the gains (attenuation amounts) realized through
the attenuators 921 to 924 of FIG. 11. In FIG. 12A, a characteristic 101 indicates the relationship
between the monaural degree input to the processing strength determination unit 91 in FIG. 11
and the processing strength output from the processing strength determination unit 91. In FIG.
12B, the characteristic 102 and the characteristic 103 indicate how the gains of the attenuators
921 to 924 are controlled by the processing strength output from the processing strength
determination unit 91. Characteristic 102 shows the gains of attenuator 921 and attenuator 924,
and characteristic 103 shows the gains of attenuator 922 and attenuator 923.
[0118]
In the present embodiment, when the monaural degree of the input signal is in the range of 1.0
to 0.5, as indicated by the characteristic 101, the processing strength determination unit 91
outputs "0" to the attenuators 921 to 924 as the monaural processing strength. When the monaural
processing strength is "0", the arbitrary degree monaural unit 77 does not perform the
monaural processing, as shown by the characteristic 102 and the characteristic 103.
[0119]
The processing strength determination unit 91 outputs a monaural processing strength above "0"
only when the monaural degree of the input signal is 0.5 or less. For example, when a complete
stereo signal of monaural degree "0" is input, the processing strength determination unit 91
outputs "0.5" as the monaural processing strength to the attenuators 921 to 924, and at this
time the arbitrary degree monaural unit 77 outputs a signal having a monaural degree of "0.5".
[0120]
According to the control method shown in FIG. 12, when the processing strength of monaural
processing is "0", the right channel signal is Sr and the left channel signal is Sl; that is, the
stereo signal from the AV processing unit 3 of FIG. 1 is input unchanged to the speakers 9a and
9b. When the processing strength is "1", both channels are {(Sr + Sl) / 2}, and a completely
monaural signal is input to the speakers 9a and 9b. In the characteristics shown in FIG. 12, the
maximum value of the processing strength is limited to 0.5. The reason for this limit is to
obtain a practically sufficient amount of echo cancellation while keeping the sound natural to
the listener.
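The control described for FIG. 12 can be sketched as follows. The text fixes only the end points
(strength 0 for a monaural degree of 0.5 or more, strength 0.5 for a monaural degree of 0; the
unchanged stereo signal at strength 0 and (Sr + Sl) / 2 on both channels at strength 1). The
linear interpolation between these points, the assignment of the "direct" gain to the attenuators
of characteristic 102, and all function names are assumptions made for illustration.

```python
def processing_strength(monaural_degree):
    """Characteristic 101: 0 for monaural degree >= 0.5, rising to 0.5 at degree 0.
    The linear ramp between those two points is an assumption."""
    return max(0.0, 0.5 - monaural_degree)

def attenuator_gains(strength):
    """Characteristics 102 and 103, assumed linear in the processing strength.
    direct: gain applied to a channel's own signal (assumed attenuators 921 and 924)
    cross:  gain applied to the opposite channel (assumed attenuators 922 and 923)"""
    direct = 1.0 - strength / 2.0
    cross = strength / 2.0
    return direct, cross

def partial_mono(right, left, strength):
    """Mix the two channels with the given processing strength (adders 931 and 932)."""
    direct, cross = attenuator_gains(strength)
    out_r = [direct * r + cross * l for r, l in zip(right, left)]
    out_l = [direct * l + cross * r for r, l in zip(right, left)]
    return out_r, out_l

unchanged = partial_mono([1.0], [3.0], 0.0)   # strength 0: stereo signal passes as it is
full_mono = partial_mono([1.0], [3.0], 1.0)   # strength 1: both channels are (Sr + Sl) / 2
```

Under these assumptions a completely uncorrelated input mixed at strength 0.5 has its difference
component halved relative to its sum component, which agrees with the output monaural degree of
"0.5" stated above.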
[0121]
Even if the degree of monaural conversion is limited in this way, the stereo feeling of the audio
signal is impaired in the operation state of speech recognition, though only for a short time.
Therefore, at the same time as the use of the voice recognition function ends, an end signal of the
voice recognition function is sent from the end instruction unit 793 to the state setting unit 791,
and the state setting unit 791 controls the switching unit 78 so that the audio signal supplied to
the speakers 9a and 9b is switched from the output of the arbitrary degree monaural unit 77 to the
stereo signal. The state of voice recognition thereby changes from the operation state, in which
the monauralized signal is input to the speakers 9a and 9b, to the standby state, in which the
stereo signal is input to the speakers 9a and 9b.
As a result, it is possible to always obtain a sufficient amount of echo cancellation while securing
at least a certain degree of stereo feeling.
[0122]
As described above, according to the present embodiment, even in the operation state of the
voice recognition function, an ordinary stereo signal is reproduced as it is, and
monauralization processing is applied only to stereo signals with an extremely low monaural
degree. As a result, although the echo cancellation effect is slightly reduced compared to the
third embodiment, the deterioration of the stereo feeling can be suppressed to a much smaller
level while an echo cancellation amount of a certain level or more is always secured.
[0123]
Fifth Embodiment FIG. 13 is a block diagram showing the configuration of a voice recognition
device for AV equipment according to a fifth embodiment of the present invention.
The speech recognition device 7 of FIG. 13 corresponds to the speech recognition device 7
provided in the AV device of FIG. However, in the present embodiment, in the AV apparatus, a
4-channel signal is output from the AV processing unit 3, and 4-channel sound is output through
the four speakers 9a to 9d included in the speaker unit 9.
[0124]
In FIG. 13, the speech recognition device 7 includes a two-channelization unit 115, a
monauralization unit 116, one echo canceller 113, a speech recognition unit 114, a start
instruction unit 1192, an end instruction unit 1193, a voice detection unit 117, a state
setting unit 1191, and a switching unit 118. That is, the speech recognition device 7 of FIG. 13
is obtained from the speech recognition device 7 of FIG. 7 (third embodiment) by replacing the
switching unit 56, which selects between two inputs, with the switching unit 118, which selects
among three inputs, and by adding the two-channelization unit 115 and the voice detection unit
117. The voice detection unit 117 is the same as the voice detection unit 37 (see the second
embodiment) of FIG. 5. The signals input to the speakers 9a to 9d are 4-channel signals output
from the AV processing unit 3 of FIG. 1.
[0125]
The four-channel signal directed to the speakers 9a to 9d is branched and input to the
two-channelization unit 115, and the two-channelization unit 115 converts the four-channel signal
into two channels. The output of the two-channelization unit 115 (hereinafter referred to as the
two-channelized signal) is input to the monauralization unit 116, and the monauralization unit 116
converts the two-channelized signal into a monaural signal.
[0126]
The signal (microphone output) output from the microphone 6 and the signal (monaural signal)
output from the monauralization unit 116 are supplied to the echo canceller 113, and the echo
canceller 113 extracts from the microphone output only the signal corresponding to the user's
voice (hereinafter, user voice). The operation principle of the echo canceller 113 has
been described in the first embodiment.
[0127]
The start instruction unit 1192 instructs the start of the speech recognition operation. The
end instruction unit 1193 instructs the end of the speech recognition operation. The state setting
unit 1191 receives instructions from the start instruction unit 1192 and the end instruction
unit 1193, and sets the operation state of the speech recognition unit 114 (that is, whether the
speech recognition operation is "ON" or "OFF"). The voice detection unit 117 receives the output
(monaural signal) of the monauralization unit 116 and the output (user voice) of the echo
canceller 113, and detects the user's voice based on the level ratio of the two outputs.
[0128]
A signal (monaural signal) output from the monauralization unit 116, a signal (two-channelized
signal) output from the two-channelization unit 115, and a 4-channel signal from the AV processing
unit 3 of FIG. 1 are input to the switching unit 118. When the state setting unit 1191 sets the
voice recognition operation to the "ON" state, the switching unit 118 switches the input to the
speakers 9a to 9d from the 4-channel signal (used in the "OFF" state) to the two-channelized
signal. Furthermore, in the "ON" state, when the voice detection unit 117 detects user voice, the
input to the speakers 9a to 9d is switched from the two-channelized signal to the monaural signal.
Also, when the state setting unit 1191 sets the voice recognition operation to the "OFF" state,
the switching unit 118 switches the input to the speakers 9a to 9d from the two-channelized signal
or the monaural signal (used in the "ON" state) back to the 4-channel signal.
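The three-way selection performed by the switching unit 118 can be summarized by the following
sketch. The names are hypothetical, and the function is written as a stateless mapping, which
assumes that the speaker input returns to the two-channelized signal when user voice is no longer
detected; the text does not state when that return takes place.

```python
from enum import Enum

class SpeakerInput(Enum):
    FOUR_CHANNEL = "4ch"    # output of the AV processing unit 3
    TWO_CHANNEL = "2ch"     # output of the two-channelization unit 115
    MONAURAL = "mono"       # output of the monauralization unit 116

def select_speaker_input(recognition_on, user_voice_detected):
    """Choose the signal routed to the speakers 9a to 9d."""
    if not recognition_on:
        return SpeakerInput.FOUR_CHANNEL     # "OFF" state: full 4-channel reproduction
    if user_voice_detected:
        return SpeakerInput.MONAURAL         # "ON" state while the user is speaking
    return SpeakerInput.TWO_CHANNEL          # "ON" state, no user voice

standby = select_speaker_input(False, False)
operating = select_speaker_input(True, False)
speaking = select_speaker_input(True, True)
```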
[0129]
The voice recognition unit 114 executes or ends voice recognition in accordance with the setting
of the state setting unit 1191. That is, it recognizes the user's voice from the echo canceller 113
and generates a control signal indicated by the voice. The control signal generated in this manner
is transmitted to the controller 4 of FIG. 1, and the controller 4 controls the receiving unit 2 and
the AV processing unit 3 to execute processing such as switching of reception channels in the
television receiver.
[0130]
The hardware configuration of the speech recognition device 7 of FIG. 13 is the same as that of
FIG. In FIG. 3, a predetermined program is stored in advance in the ROM 12. This program
describes the algorithms (a) to (c) described in the first embodiment, the algorithms (d) and (e)
described in the second embodiment (except for the way the input signal to the speakers is
switched), and the algorithm (f) described in the third embodiment, and additionally describes
(i) an algorithm for converting the 4-channel signal into two channels. The CPU 10
operates in accordance with the above program while using the RAM 11 as a work area. Thus,
the function of each block shown in FIG. 13 is realized.
[0131]
The start instruction unit 1192 and the end instruction unit 1193 are realized by the buttons
constituting the control panel of FIG. Further, the functions of the blocks other than the start
instruction unit 1192 and the end instruction unit 1193 can be realized by dedicated hardware
circuits instead of software.
[0132]
The operation of the speech recognition device 7 for AV device configured as described above
will be described below. In 4-channel stereo, the sound is produced with sound image localization
over a full 360 degrees, so the correlation between channels is extremely weak.
Therefore, when reproducing 4-channel stereo signals through the four speakers 9a to 9d
included in the speaker unit 9 of FIG. 1, the voice recognition device 7 of FIG. 2 cannot obtain a
sufficient amount of echo cancellation, and as a result, accurate speech recognition often cannot
be performed. Therefore, as in the speech recognition device 7 of FIG. 5, the speech recognition
device 7 of FIG. 7, and the speech recognition device 7 of FIG. 9, the sound from the speakers 9a
to 9d is made monaural only in the operation state of speech recognition, or only while the user
is speaking, to ensure the required echo cancellation amount.
[0133]
However, when a 4-channel stereo signal is converted to a monaural signal in a single step and the
user is made to hear it, the user's dissatisfaction due to the loss of stereo feeling
(three-dimensional effect) is extremely large. Therefore, in the present embodiment, in the
operation state of voice recognition, the 4-channel signal is converted into two channels so that
the user hears 2-channel stereo sound, and monaural conversion is performed, so that the user
hears monaural sound, only while the user is speaking in this operation state. As a result, even
in the operating state, a sufficient echo cancellation amount can be secured while maintaining a
reasonable sense of stereo.
[0134]
In FIG. 13, a two-channelized signal is generated from the input 4-channel signal by the
two-channelization unit 115, and a monaural signal is generated by the monauralization
unit 116. The echo canceller 113 always receives the monaural signal. In the standby state, in
which the voice recognition function is not used, the 4-channel signal is input to the speakers
9a to 9d.
[0135]
When the voice recognition function is to be used, first, the start instruction unit 1192 of FIG.
13 sends a start signal of the voice recognition function to the state setting unit 1191, and the
state setting unit 1191 controls the switching unit 118 so that the input signals to the speakers
9a to 9d are switched from the 4-channel signal to the two-channelized signal. The voice
recognition state thereby moves from the standby state, in which four channels are input to the
speakers 9a to 9d, to the operating state, in which two channels are input to the speakers 9a to
9d. Conversion from 4 channels to 2 channels is possible by adding the right front and right rear
channel signals to obtain a right channel signal and adding the left front and left rear channel
signals to obtain a left channel signal. The monaural conversion can be performed by adding the
four channel signals or by adding the left and right two-channelized signals.
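The two conversions can be sketched as follows, reading the passage above as "right = right front
+ right rear" and "left = left front + left rear". The function names are hypothetical, and plain
addition is used as in the text; any level scaling after the addition is omitted.

```python
def to_two_channel(front_r, rear_r, front_l, rear_l):
    """Two-channelization unit 115: fold front and rear into right and left."""
    right = [f + r for f, r in zip(front_r, rear_r)]
    left = [f + r for f, r in zip(front_l, rear_l)]
    return right, left

def to_monaural(right, left):
    """Monauralization unit 116: add the two-channelized signals."""
    return [r + l for r, l in zip(right, left)]

# One sample per channel, chosen so that every sum is exact.
right, left = to_two_channel([1.0], [2.0], [4.0], [8.0])
mono = to_monaural(right, left)
```

Adding the two-channelized signals gives the same monaural signal as adding all four channel
signals directly, which is why either route can be used.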
[0136]
In the operating state, the voice detection unit 117 monitors the levels of the monaural signal
and the echo canceller output signal, and when the level of the echo canceller output signal rises
above the level expected from the monaural signal, it determines that the user has spoken, and the
switching unit 118 switches the input of the speakers 9a to 9d from the two-channelized signal to
the monaural signal.
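A minimal sketch of this level-based detection is given below. The text does not specify how the
expected level is derived; here it is assumed to be a fixed fraction of the monaural signal level
(the residual echo that the echo canceller is expected to leave), with a margin and a small noise
floor. All parameter names and values are hypothetical.

```python
def detect_user_voice(mono_level, canceller_output_level,
                      expected_ratio=0.1, margin=2.0, floor=1e-4):
    """Return True when the echo canceller output is louder than residual echo
    alone would explain.
    expected_ratio: assumed residual-echo level relative to the monaural level
    margin:         safety factor applied to the expected residual level
    floor:          minimum threshold, so that silence never counts as voice"""
    threshold = max(margin * expected_ratio * mono_level, floor)
    return canceller_output_level > threshold

echo_only = detect_user_voice(mono_level=1.0, canceller_output_level=0.05)
with_voice = detect_user_voice(mono_level=1.0, canceller_output_level=0.5)
```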
[0137]
As described above, according to the present embodiment, the reproduction mode is switched in
stages: 4-channel reproduction when the speech recognition function is not used, 2-channel
reproduction when the speech recognition function is activated, and monaural reproduction when
speech is input. This makes it possible to ensure a proper stereo feeling even in the
standby state, and to obtain a sufficient amount of echo cancellation.
[0138]
Sixth Embodiment FIG. 14 is a block diagram showing a configuration of a voice recognition
device for AV equipment according to a sixth embodiment of the present invention.
The speech recognition device 7 of FIG. 14 corresponds to the speech recognition device 7
provided in the AV device of FIG.
However, in the present embodiment, in the AV apparatus, a 2-channel signal is output from the
AV processing unit 3, and 2-channel sound is output through the two speakers 9a and 9b
included in the speaker unit 9.
[0139]
In FIG. 14, the voice recognition device 7 includes a monaural unit 125, one echo canceller 123,
a voice recognition unit 124, a start instruction unit 1282, an end instruction unit 1283, a state
setting unit 1281, a switching unit 127, and an adaptive sound generation unit 126. That is, the
speech recognition device 7 of FIG. 14 is obtained by adding an adaptive sound generation unit
126 to the speech recognition device 7 (third embodiment) of FIG. 7. The signals input to the
speakers 9a and 9b are two-channel signals output from the AV processing unit 3 of FIG. 1.
[0140]
The adaptive sound generation unit 126 generates a monaural adaptive sound in association
with the setting of the state setting unit 1281. That is, in response to the transition of the speech
recognition operation from the “OFF” state to the “ON” state by the setting of the state
setting unit 1281, the adaptive sound generation unit 126 generates a monaural adaptive sound.
[0141]
The above-mentioned adaptive sound has the effect of promoting the adaptive operation of the
echo canceller 123. That is, as the voice recognition operation shifts from the "OFF" state to the
"ON" state, the outputs from the speakers 9a and 9b are switched from the two-channel sound to
the monaural sound. If the level of the output from the speakers 9a and 9b is then 0 (that is,
silence) or a value close to 0, the digital filter 123a in the echo canceller 123, which had been
adapted to two channels, does not proceed with its adaptation to monaural.
[0142]
If a high-level monaural sound is output from the speakers 9a and 9b in this condition, the echo
canceller 123 cannot cancel the sound. As a result, echo may be mixed into the input to the speech
recognition unit 124, and the user's speech may not be recognized correctly. Therefore, when the
speech recognition operation shifts from the "OFF" state to the "ON" state, the monaural adaptive
sound is output from the speakers 9a and 9b, and the digital filter 123a is forcibly adapted to
monaural. The operations of the components other than the adaptive sound generation unit 126
are the same as in the third embodiment, and thus the description thereof is omitted.
[0143]
The hardware configuration of the speech recognition device 7 of FIG. 14 is the same as that of
FIG. In FIG. 3, a predetermined program is stored in advance in the ROM 12. This program
describes the algorithms (a) to (c) described in the first embodiment, the algorithm (e) described
in the second embodiment, and the algorithm (f) described in the third embodiment, and
additionally describes (j) an algorithm for generating the adaptive sound (or sampled data of the
adaptive sound). The CPU 10 operates in accordance with the above program while using the RAM 11
as a work area. Thus, the function of each block shown in FIG. 14 is realized.
[0144]
The start instruction unit 1282 and the end instruction unit 1283 are realized by the buttons
constituting the control panel of FIG. Also, the functions of the blocks other than the start
instruction unit 1282 and the end instruction unit 1283 can be realized by dedicated hardware
circuits instead of software.
[0145]
The operation of the speech recognition device 7 for AV device configured as described above
will be described below. The speech recognition device 7 shown in FIG. 14 solves a drawback of
the speech recognition device 7 shown in FIG. 7, namely that the echo cancellation amount of
the echo canceller 54 is not sufficient immediately after the transition from the speech
recognition standby state to the operation state.
[0146]
In the speech recognition device 7 of FIG. 7, in the standby state of speech recognition, stereo
signals are input to the speakers 9a and 9b while a monaural signal is input to the echo canceller
54, so complete echo cancellation cannot be performed. Therefore, when the voice recognition
function is to be used, the state is switched to the operating state and the input signals to the
speakers 9a and 9b are switched to monaural signals, so that the echo canceller 54 adapts fully
and complete echo cancellation is performed. However, even if this is done, the adaptation of
the echo canceller 54 does not proceed unless the speakers 9a and 9b make a sound. Therefore,
when a long silent interval continues immediately after switching, and the speakers then start to
output sound while the user is inputting a voice, the echo from the speakers 9a and 9b cannot be
sufficiently canceled.
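The reason adaptation cannot proceed during silence can be illustrated with a small adaptive
filter. The sketch below uses the normalized LMS (NLMS) update as a stand-in; the text does not
name the adaptation algorithm of the echo canceller, so NLMS, the two-tap echo path, and all names
and values here are assumptions chosen for illustration.

```python
import random

def nlms_update(weights, history, mic_sample, mu=0.5, eps=1e-8):
    """One NLMS step: returns the corrected weights and the error (canceller output)."""
    estimate = sum(w * x for w, x in zip(weights, history))
    error = mic_sample - estimate
    power = sum(x * x for x in history) + eps
    return [w + mu * error * x / power for w, x in zip(weights, history)], error

def adapt(weights, reference, echo_path):
    """Run the filter over a reference (speaker) signal; the microphone hears echo only."""
    history = [0.0] * len(weights)
    for x in reference:
        history = [x] + history[:-1]
        mic = sum(h * s for h, s in zip(echo_path, history))
        weights, _ = nlms_update(weights, history, mic)
    return weights

echo_path = [0.6, 0.3]                      # hypothetical speaker-to-microphone response
rng = random.Random(0)
after_silence = adapt([0.0, 0.0], [0.0] * 200, echo_path)
sound = [rng.uniform(-1.0, 1.0) for _ in range(500)]
after_sound = adapt(after_silence, sound, echo_path)
```

While the reference signal is silent the correction term is zero, so the estimated response stays
where it started; only when the speakers emit sound does the estimate move toward the actual echo
path.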
[0147]
Therefore, the voice recognition device 7 of FIG. 14 is configured so that a monaural adaptive
sound for promoting the adaptation of the echo canceller 123 is input from the adaptive sound
generation unit 126 to the speakers 9a and 9b for a few seconds immediately after the transition
from the standby state to the operation state. Synthetic speech such as "Please input speech" is
conceivable as the adaptive sound.
[0148]
As described above, according to the present embodiment, by causing the monaural adaptive
sound to be output from the speakers 9a and 9b immediately after transitioning from the
standby state to the operating state, a sufficient amount of echo cancellation can be secured
even immediately after the transition.
[0149]
Seventh Embodiment FIG. 15 is a block diagram showing the configuration of a voice recognition
device for AV equipment according to a seventh embodiment of the present invention.
The speech recognition device 7 of FIG. 15 corresponds to the speech recognition device 7
provided in the AV device of FIG. However, in the present embodiment, in the AV apparatus, a
2-channel signal is output from the AV processing unit 3, and 2-channel sound is output through
the two speakers 9a and 9b included in the speaker unit 9.
[0150]
In FIG. 15, the speech recognition device 7 includes a monaural unit 135, one echo canceller
133, a speech recognition unit 134, a start instruction unit 1382, an end instruction unit 1383, a
state setting unit 1381, a switching unit 136, a cancellation monitoring unit 1371, a voice
detection unit 1372, and an attenuation unit 1373. That is, the voice recognition device 7 of FIG.
15 is obtained by adding a cancellation monitoring unit 1371, a voice detection unit 1372, and
an attenuation unit 1373 to the voice recognition device 7 of the third embodiment (FIG. 7). The
signals input to the speakers 9a and 9b are two-channel signals output from the AV processing
unit 3 of FIG. 1.
[0151]
The cancellation monitoring unit 1371 receives the output (monaural signal) of the monaural
unit 135 and the output (user voice) of the echo canceller 133, and monitors the level fluctuation
of each output, thereby determining whether the echo is sufficiently canceled in the echo
canceller 133 (that is, whether the adaptation of the digital filter 133a to
monaural is sufficiently advanced). That is, when the level of the monaural signal rises sharply,
if the level of the echo canceller output also rises sharply, the echo is not sufficiently
canceled. Conversely, if it hardly rises, it can be said that the echo is sufficiently canceled.
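The judgement made by the cancellation monitoring unit 1371 can be sketched as a comparison of
level changes between two successive measurements. What counts as a "sharp" rise is not stated in
the text; the factor of 2 below, the level floor, and the function name are assumptions.

```python
def echo_sufficiently_cancelled(mono_prev, mono_now, out_prev, out_now,
                                rise_factor=2.0, floor=1e-6):
    """Judge cancellation from how the two levels change.
    Returns None when the monaural level did not rise sharply, because no
    judgement can be made from such an interval."""
    mono_rose = mono_now > rise_factor * max(mono_prev, floor)
    if not mono_rose:
        return None
    out_rose = out_now > rise_factor * max(out_prev, floor)
    return not out_rose

adapted = echo_sufficiently_cancelled(0.1, 1.0, 0.010, 0.012)   # output hardly rises
unadapted = echo_sufficiently_cancelled(0.1, 1.0, 0.010, 0.2)   # output rises with the echo
no_event = echo_sufficiently_cancelled(0.5, 0.6, 0.010, 0.2)    # monaural level is steady
```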
[0152]
The attenuation unit 1373 attenuates the two-channel signal input from the AV processing unit 3
of FIG. 1 in relation to the monitoring result of the cancellation monitoring unit 1371 and the
setting of the state setting unit 1381. That is, when the echo is not sufficiently canceled in the
echo canceller 133, the attenuation unit 1373 temporarily attenuates the two-channel signal in
response to the speech recognition operation transitioning from the "OFF" state to the "ON" state
under the setting of the state setting unit 1381.
[0153]
By attenuating the two-channel signal as described above, it is possible to prevent echo from
mixing into the input to the speech recognition unit 134. That is, as the voice recognition
operation shifts from the "OFF" state to the "ON" state, the outputs from the speakers 9a and 9b
are switched from the two-channel sound to the monaural sound. If the output level is then 0 (that
is, silence) or a value close to 0, the digital filter 133a in the echo canceller 133, which had
been adapted to two channels, does not proceed with its adaptation to monaural.
[0154]
If a high-level monaural sound is output from the speakers 9a and 9b in this condition, the echo
canceller 133 cannot cancel the sound. Therefore, when the echo is not sufficiently
canceled, the level of the monauralized signal output from the speakers 9a and 9b is reduced when
the speech recognition operation shifts from the "OFF" state to the "ON" state, thereby preventing
echo from mixing into the input to the speech recognition unit 134. The
operations of components other than the cancellation monitoring unit 1371, the voice detection
unit 1372 and the attenuation unit 1373 are the same as in the third embodiment, and thus the
description thereof is omitted.
[0155]
The hardware configuration of the speech recognition device 7 of FIG. 15 is the same as that of
FIG. In FIG. 3, a predetermined program is stored in advance in the ROM 12. This program
describes the algorithms (a) to (c) described in the first embodiment, the algorithm (e) described
in the second embodiment, and the algorithm (f) described in the third embodiment, and
additionally describes (k) an algorithm for monitoring whether echoes are sufficiently canceled
and (l) an algorithm for attenuating the two-channel signal to the speakers. The CPU 10 operates
in accordance with the above program while using the RAM 11 as a work area. Thereby, the
function of each block shown in FIG. 15 is realized.
[0156]
The start instruction unit 1382 and the end instruction unit 1383 are realized by the buttons
constituting the control panel of FIG. Further, the functions of the blocks other than the start
instruction unit 1382 and the end instruction unit 1383 can be realized by dedicated hardware
circuits instead of software.
[0157]
The operation of the speech recognition device 7 for AV device configured as described above
will be described below. Like the voice recognition device 7 of FIG. 14 above, the voice
recognition device 7 of FIG. 15 solves the drawback of the voice recognition device 7 of FIG. 7,
namely that the echo cancellation amount of the echo canceller 54 is not sufficient immediately
after the transition from the standby state to the operating state.
[0158]
As described above, the voice recognition device 7 shown in FIG. 15 includes a cancellation
monitoring unit 1371 that monitors the output signal level of the monaural unit 135 and the output
signal level of the echo canceller 133 to determine whether the echo is sufficiently canceled, a
voice detection unit 1372 that monitors the output signal level of the monaural unit 135 and the
output signal level of the echo canceller 133 to determine whether the user has spoken, and an
attenuation unit 1373 that attenuates the input signals to the speakers 9a and 9b.
Immediately after the transition from the standby state to the operating state, the adaptation of
the echo canceller 133 is not complete, and the echo cancellation effect of the echo
canceller 133 is correspondingly poor. If, after the transition to the operating state, a silent
section continues in the input signals to the speakers 9a and 9b so that the echo canceller 133
cannot adapt, and sound then starts to be output from the speakers 9a and 9b while the user is
speaking, the echo canceller 133 cannot sufficiently cancel the echo from the speakers 9a and
9b, and the echo from the speakers 9a and 9b is mixed into the input to the voice recognition
unit 134.
[0159]
Therefore, in the present embodiment, a voice detection unit 1372 and an attenuation unit 1373
are provided. When a silent section has continued in the monauralized signal, the cancellation
monitoring unit 1371 has determined that the echo is not sufficiently canceled by the echo
canceller 133, and the voice detection unit 1372 detects user voice, the attenuation unit
1373 attenuates the input signals to the speakers 9a and 9b, thereby reducing the mixing of
echo into the user voice. When the monaural signal to the speakers 9a and 9b changes
from silence to sound while the user is not speaking, the attenuation amount of the
attenuation unit 1373 is set to "0", and the adaptation of the echo canceller 133 is promoted by
using the monaural signal output from the speakers 9a and 9b as the adaptation sound. In
addition, once the adaptation of the echo canceller 133 has progressed and the residual
echo has become small, the attenuation amount is controlled to "0" even at the time of speech
detection.
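The control of the attenuation unit 1373 described above reduces to a simple rule, sketched
below. The attenuation value of 20 dB and the function names are hypothetical; the text only says
that an appropriate attenuation is applied.

```python
def attenuation_db(echo_sufficiently_cancelled, user_voice_detected,
                   max_attenuation_db=20.0):
    """Attenuation to apply to the speaker input signals, in dB.
    Attenuate only while the user speaks and the echo canceller has not yet adapted;
    otherwise pass the signal at full level so that it can serve as adaptation sound."""
    if user_voice_detected and not echo_sufficiently_cancelled:
        return max_attenuation_db
    return 0.0

def to_gain(db):
    """Convert an attenuation in dB to a linear gain factor."""
    return 10.0 ** (-db / 20.0)

speaking_unadapted = attenuation_db(False, True)    # attenuate the speaker input
quiet_unadapted = attenuation_db(False, False)      # let the speakers drive adaptation
speaking_adapted = attenuation_db(True, True)       # residual echo already small
```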
[0160]
As described above, according to the present embodiment, when the user speaks while the amount of
echo cancellation is still insufficient immediately after the transition from the standby state
(the speech recognition operation is "OFF") to the operation state (the "ON" state), this is
detected and an appropriate attenuation is inserted in the input signals to the
speakers 9a and 9b to reduce the level of sound from the speakers 9a and 9b, thereby preventing
the mixing of echo. The speech recognition performance in the state where the echo
cancellation amount is not sufficient is thus improved.
[0161]
Eighth Embodiment FIG. 16 is a block diagram showing the configuration of a voice recognition
device for AV equipment according to an eighth embodiment of the present invention.
The speech recognition device 7 of FIG. 16 corresponds to the speech recognition device 7
provided in the AV device of FIG. However, in the present embodiment, in the AV apparatus, a
2-channel signal is output from the AV processing unit 3, and 2-channel sound is output through
the two speakers 9a and 9b included in the speaker unit 9.
[0162]
In FIG. 16, the speech recognition device 7 includes a monaural unit 145, one echo canceller
143, a speech recognition unit 144, a start instruction unit 1482, an end instruction unit 1483, a
state setting unit 1481, a switching unit 146, and an adaptive control unit 147. That is, the
speech recognition apparatus 7 of FIG. 16 is obtained by adding an adaptive control unit 147 to
the speech recognition apparatus 7 (third embodiment) of FIG. 7. The signals input to the
speakers 9a and 9b are two-channel signals output from the AV processing unit 3 of FIG. 1.
[0163]
The adaptive control unit 147 controls the adaptation speed of the adaptive digital filter 143a in
the echo canceller 143 in relation to the setting of the state setting unit 1481. That is, the
digital filter 143a has a variable adaptation speed to the input signal, and the adaptive control
unit 147 stores in advance a fast adaptation speed for monaural and a slow adaptation speed for
two channels. In response to the voice recognition operation transitioning from the "OFF" state to
the "ON" state under the setting of the state setting unit 1481 (at the same time, the speaker
output switches from 2-channel sound to monaural sound), the adaptive control unit 147 changes the
adaptation speed of the digital filter 143a from the slow adaptation speed to the fast adaptation
speed. Also, in response to the speech recognition operation transitioning from the "ON" state to
the "OFF" state, it changes the adaptation speed of the digital filter 143a from the fast
adaptation speed to the slow adaptation speed. The operations of the components other than the
adaptive control unit 147 are the same as in the third embodiment, and thus the description
thereof is omitted.
[0164]
The hardware configuration of the speech recognition device 7 of FIG. 16 is the same as that of
FIG. In FIG. 3, a predetermined program is stored in advance in the ROM 12. This program
describes the algorithms (a) to (c) described in the first embodiment, the algorithm (e) described
in the second embodiment, and the algorithm (f) described in the third embodiment, and
additionally describes (m) an algorithm for controlling the adaptation speed of the echo
canceller. The CPU
10 operates in accordance with the above program while using the RAM 11 as a work area.
Thereby, the function of each block shown in FIG. 16 is realized.
[0165]
The start instruction unit 1482 and the end instruction unit 1483 are realized by the buttons
constituting the control panel of FIG. Also, the functions of the blocks other than the start
instruction unit 1482 and the end instruction unit 1483 can be realized by dedicated hardware
circuits instead of software.
[0166]
The operation of the speech recognition device 7 for AV device configured as described above
will be described below. As is well known, the adaptation of the echo canceller 143 consists of
sequentially correcting the estimated impulse response in the direction in which the output
approaches "0". The impulse response of the system (echo path) between the speakers 9a and 9b and
the microphone 6 changes from moment to moment under the influence of furniture, people, windows,
curtains, etc., so a sufficient amount of cancellation cannot be obtained without the adaptive
operation. However, if the input signal to the echo canceller 143 includes a signal that cannot be
canceled no matter how the impulse response is corrected, such as noise, an error occurs in the
estimated impulse response, and this error degrades the echo cancellation amount.
[0167]
In the sequential correction of the estimated impulse response, the adaptation speed can be
controlled by changing the amount of correction per update. When the correction amount per update
is large, the adaptation speed is fast, and when the correction amount is small, the adaptation
speed is slow. When the adaptation speed is increased, the filter copes well with system
fluctuation (i.e., fluctuations in the impulse response of the echo path can be followed quickly)
but is vulnerable to noise (i.e., the noise tends to make the adaptive operation unstable).
Conversely, if the adaptation speed is reduced, the filter becomes more robust against noise but
follows system fluctuations more slowly.
Therefore, in an actual apparatus, the adaptation speed is selected so as to satisfy both the
ability to track system fluctuations and noise resistance.
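This trade-off can be observed in a small simulation. The sketch below uses a regularized NLMS
filter as a stand-in for the adaptive digital filter; the algorithm, the three-tap echo path, the
noise level, the step sizes 1.0 and 0.05, and all names are assumptions chosen only to make the
effect visible.

```python
import random

def run_nlms(mu, steps=3000, seed=0, eps=0.1):
    """Adapt to a fixed echo path in the presence of uncancellable noise.
    mu is the correction amount per update (adaptation speed); eps regularizes
    the normalization. Returns the squared estimation error of the impulse
    response after every update."""
    rng = random.Random(seed)
    echo_path = [0.5, -0.3, 0.2]            # hypothetical room response
    weights = [0.0] * len(echo_path)
    history = [0.0] * len(echo_path)
    misalignment = []
    for _ in range(steps):
        history = [rng.uniform(-1.0, 1.0)] + history[:-1]
        noise = rng.gauss(0.0, 0.05)        # component the filter can never cancel
        mic = sum(h * x for h, x in zip(echo_path, history)) + noise
        error = mic - sum(w * x for w, x in zip(weights, history))
        power = sum(x * x for x in history) + eps
        weights = [w + mu * error * x / power for w, x in zip(weights, history)]
        misalignment.append(sum((w - h) ** 2 for w, h in zip(weights, echo_path)))
    return misalignment

def mean(values):
    return sum(values) / len(values)

fast = run_nlms(mu=1.0)
slow = run_nlms(mu=0.05)
fast_early, slow_early = mean(fast[10:30]), mean(slow[10:30])      # shortly after start
fast_late, slow_late = mean(fast[-1000:]), mean(slow[-1000:])      # after convergence
```

Shortly after the start the fast setting has the smaller estimation error because it follows the
echo path quickly, whereas after convergence the slow setting has the smaller error because each
update lets less of the noise into the estimate.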
[0168]
In the speech recognition device 7 of FIG. 7, in the speech recognition operating state, monaural
sound is output from the speakers 9a and 9b, and the echo of the monaural sound is
canceled by the monaural signal, so good operation is possible even with a relatively fast
adaptation speed. However, in the standby state, since the echo of stereophonic
sound is canceled by the monauralized signal, the error included in the estimated impulse
response becomes extremely large at the same adaptation speed as in the operating state. Since the
echo canceller 143 repeats the adaptive operation in an attempt to cancel signal components that
cannot be canceled in principle, the impulse response that has been estimated is destroyed. As
described above, the voice recognition device 7 of FIG. 7 has the disadvantage that the amount of
echo cancellation immediately after the transition from the standby state to the operation state is
extremely small because the adaptation performance in the standby state is poor.
[0169]
Therefore, in the speech recognition device 7 of FIG. 16, the adaptive control unit 147 for
controlling the adaptation speed of the echo canceller 143 is provided so that a sufficient
amount of echo cancellation can be obtained immediately after switching from the standby state to
the operating state. That is, the adaptive control unit 147 sets, to the echo canceller 143, different
adaptation speeds in a standby state in which stereo signals are input to the speakers 9a and 9b
and an operation state in which monaural signals are input. Specifically, in the standby state, the
adaptive control unit 147 reduces the adaptation speed to secure the estimation accuracy of the
impulse response. On the other hand, in the operating state, by increasing the adaptation speed, a
sufficient echo cancellation effect can be obtained immediately after the transition from the
standby state to the operating state.
[0170]
As described above, according to the present embodiment, the adaptation speed of the echo
canceller 143 (the adaptive digital filter 143a therein) is controlled to be fast when the voice
recognition unit 144 is in the operating state and slow when it is in the standby state, so echo
cancellation suitable for each of monaural and multichannel output can be performed. That is, when
multi-channel sound is output from the speakers 9a and 9b, there are many stereo components that
act as noise when viewed from the adaptive digital filter 143a, so the noise resistance is
improved by setting the slow adaptation speed. In the case of monauralized sound, on the other
hand, since there is no stereo component, making the adaptation speed fast improves the
followability to fluctuations of the impulse response of the echo path.
[0171]
Further, by changing the adaptation speed of the echo canceller 143 according to the state of the
speech recognition operation as described above, an excellent echo cancellation effect can be
realized even immediately after the transition from the standby state to the operation state.
[0172]
Ninth Embodiment FIG. 17 is a block diagram showing the configuration of a speech recognition
system for AV equipment according to a ninth embodiment of the present invention.
The speech recognition device 7 in FIG. 17 corresponds to the speech recognition device 7
provided in the AV device of FIG. However, in the present embodiment, in the AV apparatus, a
2-channel signal is output from the AV processing unit 3, and 2-channel sound is output through
the two speakers 9a and 9b included in the speaker unit 9.
[0173]
In FIG. 17, the speech recognition device 7 includes a monauralization unit 155, one echo
canceller 153, a speech recognition unit 154, a start instruction unit 1582, an end instruction
unit 1583, a state setting unit 1581, a switching unit, and an adaptive control unit 157. That is,
the speech recognition device 7 of FIG. 17 has the same configuration as that of the speech
recognition device 7 (eighth embodiment) of FIG. The difference from the speech recognition
device 7 of FIG. 16 is the following point. That is, the sound signal from the AV processing unit
3 in FIG. 1 is either two-channel (stereo) or monaural, and a monaural / stereo identification
signal is further given from the AV processing unit 3. The signals input to the speakers 9a and
9b are two-channel or monaural signals output from the AV processing unit 3 of FIG.
[0174]
The adaptive control unit 157 controls the adaptation speed of the adaptive digital filter 153 a in
the echo canceller 153 in association with the setting of the state setting unit 1581 and the
monaural / stereo identification signal. That is, the digital filter 153a has a variable adaptation
speed to the input signal, and the adaptive control unit 157 previously stores a fast adaptation
speed for monaural and a slow adaptation speed for two channels. In response to the voice
recognition operation transitioning from the “OFF” state to the “ON” state by the setting of the
state setting unit 1581 (at the same time, the speaker output switches from 2-channel sound to
monaural sound), the adaptive control unit 157 changes the adaptation speed of the digital filter
153a from the slow adaptation speed to the fast adaptation speed. Also, in response to the speech
recognition operation transitioning from the "ON" state to the "OFF" state, the adaptation speed
of the digital filter 153a is changed from the fast adaptation speed to the slow adaptation speed.
[0175]
However, the adaptive control unit 157 changes the adaptation speed as described above only
when the monaural / stereo identification signal indicates stereo. When it indicates monaural,
the adaptation speed of the digital filter 153a is set to the fast adaptation speed regardless of
the setting of the state setting unit 1581. The operations of components other than the
adaptive control unit 157 are the same as in the eighth embodiment, and thus the description
thereof is omitted.
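The selection rule of the adaptive control unit 157 can be sketched as a small decision function.
This is an illustrative sketch only: the function name select_adaptation_step and the numeric
values SLOW_STEP and FAST_STEP are hypothetical, since the publication gives no concrete values
for the two stored adaptation speeds.

```python
# Hypothetical values; the publication only says that a "slow" speed for
# two channels and a "fast" speed for monaural are stored in advance.
SLOW_STEP = 0.01
FAST_STEP = 0.5


def select_adaptation_step(recognition_on, broadcast_is_stereo):
    """Return the adaptation speed to set on the adaptive digital filter.

    recognition_on      : True when the speech recognition operation is "ON"
    broadcast_is_stereo : True when the identification signal indicates stereo
    """
    if not broadcast_is_stereo:
        # Monaural program: the speaker output is monaural in either state,
        # so the fast speed is used regardless of the state setting.
        return FAST_STEP
    # Stereo program: the output is monauralized only while recognition is
    # "ON", so the slow speed is used in the standby state.
    return FAST_STEP if recognition_on else SLOW_STEP
```

Only the combination of a stereo program and the standby state selects the slow speed; the other
three combinations select the fast speed.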
[0176]
The hardware configuration of the speech recognition device 7 of FIG. 17 is the same as that of
FIG. In FIG. 3, the program stored in the ROM 12 is the same as that of the eighth embodiment.
However, the algorithm in (m) above, that is, the algorithm for controlling the adaptation speed
of the echo canceller, has been changed so that control refers not only to the "ON" / "OFF" state
of the speech recognition operation but also to the monaural / stereo identification signal. The
CPU 10 operates in accordance with the above program while using the RAM 11 as a work area.
Thereby, the function of each block shown in FIG. 17 is realized.
[0177]
The start instruction unit 1582 and the end instruction unit 1583 are realized by the buttons
constituting the control panel of FIG. Also, the functions of the blocks other than the start
instruction unit 1582 and the end instruction unit 1583 may be realized by dedicated hardware
circuits instead of software.
[0178]
The operation of the speech recognition device 7 for AV device configured as described above
will be described below. In general TV broadcasting, there are two kinds of programs, stereo
programs and monaural programs, and an identification signal for identifying a stereo program or
a monaural program is broadcast together with the video / audio signal. On the receiving side,
this identification signal makes it possible to know whether the current program is a stereo
program or a monaural program. In the voice recognition device 7 shown in FIG. 16 above, in the
standby state where the signal processed by the monauralization unit 155 is not input to the
speakers 9a and 9b, the adaptation speed of the echo canceller 153 is reduced regardless of
whether the currently received program is a stereo program or a monaural program. Naturally,
however, it is preferable not to reduce the adaptation speed even in the standby state.
[0179]
This is because, in the state where the adaptation speed is reduced, the echo canceller 153 may
not be able to follow the fluctuation of the system, and if the transition to the operating state
occurs at such a time, a sufficient amount of echo cancellation cannot be obtained. On the other
hand, if the adaptation speed is not reduced even in the standby state, the echo canceller 153
can always follow the system fluctuation, so a sufficient amount of echo cancellation can be
secured whenever the transition to the operating state occurs.
[0180]
If the broadcast itself is a monaural program, it is possible to increase the adaptation speed even
in a standby state in which the monauralization unit 155 is not performing monauralization.
Therefore, in the voice recognition device 7 of FIG. 17, the adaptive control unit 157 first
checks the identification signal, and when the currently received program is a stereo program,
the adaptation speed of the echo canceller 153 is slowed down in the standby state. In the case
of a monaural program, however, the adaptation speed is kept high as in the operating state even
in the standby state.
[0181]
As described above, according to the present embodiment, it is determined based on the stereo /
monaural identification signal whether the sound of the program currently being received is
stereo or monaural. In the case of monaural, the adaptation speed of the echo canceller 153 is
not reduced even when the voice recognition operation is in the standby state, so the ability to
follow changes in the impulse response of the echo path does not deteriorate. As a result, an
excellent echo cancellation effect can be realized even in the standby state, and speech
recognition performance immediately after transition to the operating state is enhanced.
[0182]
Tenth Embodiment FIG. 18 is a block diagram showing a configuration of a voice recognition
device for AV equipment according to a tenth embodiment of the present invention.
The speech recognition device 7 of FIG. 18 corresponds to the speech recognition device 7
provided in the AV device of FIG. However, in the present embodiment, in the AV apparatus, a
2-channel signal is output from the AV processing unit 3, and 2-channel sound is output through
the two speakers 9a and 9b included in the speaker unit 9.
[0183]
In FIG. 18, the speech recognition device 7 includes a monauralization unit 165, one echo
canceller 163, a speech recognition unit 164, a start instruction unit 1682, an end instruction
unit 1683, a state setting unit 1681, a switching unit, a monaural degree determination unit
1671, and an adaptive control unit 1672. That is, the speech recognition device 7 of FIG. 18 is
obtained by adding a monaural degree determination unit 1671 to the speech recognition device 7
(eighth embodiment) of FIG. The monaural degree determination unit 1671 is the same as the
monaural degree determination unit 76 (see the fourth embodiment) of FIG. 9. The signals input to
the speakers 9a and 9b are two-channel signals output from the AV processing unit 3 of
FIG.
[0184]
The above-described two-channel signal is branched and input to the monaural degree
determination unit 1671, and the monaural degree determination unit 1671 determines the
monaural degree of the two-channel signal. The adaptive control unit 1672 controls the
adaptation speed of the adaptive digital filter 163a in the echo canceller 163 in accordance with
the determination result of the monaural degree determination unit 1671.
[0185]
That is, the adaptive control unit 1672 changes the adaptation speed of the digital filter 163a
according to the monaural degree of the two-channel signal. Preferably, the higher the monaural
degree, the faster the adaptation speed. To that end, the adaptive control unit 1672 stores a
function (processing strength determination characteristic; indicated by reference numeral 104
in FIG. 19) for determining, based on the monaural degree, how strongly the processing to
accelerate the adaptation speed should be performed. The operations of the
components other than the monaural degree determining unit 1671 and the adaptive control
unit 1672 are the same as in the eighth embodiment, and thus the description thereof is omitted.
[0186]
The hardware configuration of the speech recognition device 7 of FIG. 18 is the same as that of
FIG. In FIG. 3, a predetermined program is stored in advance in the ROM 12. In this program, the
algorithm of (a) to (c) described in the first embodiment, the algorithm of (e) described in the
second embodiment, the algorithm of (f) described in the third embodiment, the algorithm of (g)
described in the fourth embodiment, and the algorithm of (m) described in the eighth embodiment
are described.
[0187]
However, the algorithm of the above (m), that is, the algorithm for controlling the adaptation
speed of the echo canceller, has been changed so that, rather than controlling based on the "ON"
/ "OFF" state of the speech recognition operation (eighth embodiment), it controls based on the
monaural degree of the two-channel signal to the speakers. The CPU 10 operates in
accordance with the above program while using the RAM 11 as a work area. Thus, the function
of each block shown in FIG. 18 is realized.
[0188]
The start instruction unit 1682 and the end instruction unit 1683 are realized by the buttons
constituting the control panel of FIG. Also, the functions of the blocks other than the start
instruction unit 1682 and the end instruction unit 1683 can be realized by dedicated hardware
circuits instead of software.
[0189]
The operation of the speech recognition device 7 for AV device configured as described above
will be described below. The speech recognition device 7 shown in FIG. 18 solves the drawback
that the adaptation accuracy of the echo canceller 163 is degraded when a signal with a low
monaural degree is input to the speech recognition device 7 shown in FIG. As described above,
the adaptation of the echo canceller 163 sequentially corrects the estimated impulse response in
the direction in which the output becomes "0". If the input signal to the echo canceller 163
includes a signal, such as noise, that cannot be canceled no matter how the impulse response is
corrected, an error occurs in the estimated impulse response, and this error degrades the echo
cancellation amount.
[0190]
The same thing happens when the echo of the stereo signal is canceled by the monauralized
signal. That is, when canceling the echo sound of a stereo signal with a monaural signal, in
principle, there remains a component that can not be canceled even if the impulse response is
corrected. When there are many components that cannot be canceled (stereo components), that is,
in the case of a stereo signal with a low monaural degree, the echo canceller 163 repeatedly
performs the adaptive operation to cancel a signal that cannot be canceled in principle, and as
a result greatly destroys the estimated impulse response.
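The reason a component remains "in principle" can be written out as a short worked derivation.
The symbols are assumptions introduced here for illustration: x_L and x_R are the two speaker
signals, h_L and h_R are the echo paths from each speaker to the microphone 6, y is the echo at
the microphone, * denotes convolution, and the monauralized reference m is taken to be the
average of the two channels (the publication does not restate the monauralization formula in
this passage).

```latex
\begin{align}
y(n) &= (h_L * x_L)(n) + (h_R * x_R)(n), \\
m(n) &= \tfrac{1}{2}\bigl(x_L(n) + x_R(n)\bigr), \qquad
s(n) = \tfrac{1}{2}\bigl(x_L(n) - x_R(n)\bigr), \\
y(n) &= \bigl((h_L + h_R) * m\bigr)(n) + \bigl((h_L - h_R) * s\bigr)(n).
\end{align}
```

The last line follows by substituting x_L = m + s and x_R = m - s. A single filter driven by m
can at best reproduce the first term. The second term depends on the stereo component s and, to
the extent that s is unrelated to m, it cannot be removed however the estimated impulse response
is corrected. When the two channels are identical, s is zero and the second term vanishes, which
matches the statement that a signal with a high monaural degree is well suited to adaptation.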
[0191]
Therefore, in the speech recognition device 7 of FIG. 18, the stereo signal from the AV processing
unit 3 is analyzed to determine whether echo cancellation can in principle be performed
accurately, that is, whether the signal is suitable for the adaptive operation. When the signal
is determined to be suitable, the echo canceller 163 is caused to perform the adaptive operation.
[0192]
In the speech recognition device 7 of FIG. 18, it is judged by the monaural degree of the signal
whether the signal is suitable for adaptation.
As described above, the higher the monaural degree of the signal, the higher the echo cancellation
effect and the better the impulse response can be estimated. Therefore, first, the monaural degree
determination unit 1671 obtains the monaural degree of the stereo signal. Next, the adaptive
control unit 1672 controls the adaptation speed of the echo canceller 163 according to this
monaural degree.
[0193]
FIG. 19 is a diagram showing the characteristic of the adaptive speed control process performed
by the adaptive control unit 1672 of FIG. In FIG. 19, the characteristic 191 shows the
relationship between the monaural degree of the stereo signal going to the speakers 9a and 9b in
FIG. 18 and the adaptation speed of the echo canceller 163. As can be seen from FIG. 19, when it
is determined that the monaural degree of the stereo signal is high and suitable for adaptation,
the adaptive control unit 1672 increases the adaptation speed so as to always obtain the best
estimated impulse response. On the other hand, when it is judged that the monaural degree is
low and not suitable for adaptation, the adaptation speed is lowered to prevent the destruction of
the estimated impulse response.
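The relationship between monaural degree and adaptation speed can be sketched as a monotonic
mapping. The shape of the characteristic in FIG. 19 is not described in this passage, so the
linear interpolation below is an assumption, as are the normalization of the monaural degree to
the range 0 to 1, the function name step_from_monaural_degree, and the default values.

```python
def step_from_monaural_degree(degree, slow_step=0.01, fast_step=0.5):
    """Map a monaural degree to an adaptation speed (step size).

    degree    : monaural degree, assumed normalized so that 0.0 means
                fully stereo and 1.0 means fully monaural
    slow_step : hypothetical speed used when the degree is lowest
    fast_step : hypothetical speed used when the degree is highest
    """
    # Clamp out-of-range inputs so the result stays between the two limits.
    clamped = min(1.0, max(0.0, degree))
    # Linear interpolation: a higher monaural degree gives a faster speed.
    return slow_step + (fast_step - slow_step) * clamped
```

Any non-decreasing function would satisfy the stated preference that a higher monaural degree
gives a faster adaptation speed; a stepwise or saturating curve could be substituted without
changing the rest of the sketch.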
[0194]
As described above, according to the present embodiment, since the adaptation speed of the
adaptive digital filter 163a is controlled based on the monaural degree of the two-channel signal
(stereo signal), echo cancellation suited to two-channel signals having various monaural degrees
can be performed. That is, when the monaural degree is low, the adaptation speed is
reduced to improve the noise resistance. On the other hand, when the monaural degree is high,
noise resistance is not necessary because the stereo component as noise is small when viewed
from the adaptive digital filter 163a. Therefore, by making the adaptation speed faster, it is
possible to improve the followability to the fluctuation of the impulse response of the echo path.
As a result, particularly when the monaural degree is high, an excellent echo cancellation effect
can be realized, and the speech recognition performance immediately after the transition to the
operating state is enhanced.
[0195]
Eleventh Embodiment FIG. 20 is a block diagram showing a configuration of a voice recognition
device for AV equipment according to an eleventh embodiment of the present invention. The
speech recognition device 7 of FIG. 20 corresponds to the speech recognition device 7 provided
in the AV device of FIG. However, in the present embodiment, in the AV apparatus, a 2-channel
signal is output from the AV processing unit 3, and 2-channel sound is output through the two
speakers 9a and 9b included in the speaker unit 9.
[0196]
In FIG. 20, the speech recognition device 7 includes a monauralization unit 175, one echo
canceller 173, a speech recognition unit 174, a start instruction unit 1782, an end instruction
unit 1783, a state setting unit 1781, a switching unit, and a non-volatile memory 177. That is,
the voice recognition device 7 of FIG. 20 is obtained by adding a non-volatile memory 177 to
the voice recognition device 7 (third embodiment) of FIG. 7. The signals input to the speakers 9a
and 9b are two-channel signals output from the AV processing unit 3 of FIG.
[0197]
The non-volatile memory 177 receives the power ON / OFF signal from the control panel 5 of
FIG. 1. When the power is turned "OFF", the non-volatile memory 177 acquires and stores the
estimated impulse response held by the echo canceller 173. Then, when the power is turned "ON",
the stored estimated impulse response is given to the echo canceller 173 (the adaptive digital
filter 173a therein). The echo canceller 173 uses the estimated impulse response given from the
non-volatile memory 177 as an initial value when starting an operation of canceling echo sound.
That is, the adaptive digital filter 173a starts estimation of the impulse response with the value
supplied from the non-volatile memory 177 as an initial value.
[0198]
The echo canceller 173 performs the same operation as the echo canceller 54 (third
embodiment) of FIG. 7 except for the difference in the initial value used when the power is ON. In
the case of the echo canceller 54, since “0” is used as the initial value when starting to cancel
the echo sound, there was a problem that the echo sound could not be sufficiently canceled from
immediately after the power is turned “ON” until the adaptation of the digital filter 54a
progresses. The operations of the components other than the non-volatile memory 177
and the echo canceller 173 are the same as in the third embodiment, so the description will be
omitted.
[0199]
The hardware configuration of the speech recognition device 7 of FIG. 20 is obtained by adding a
non-volatile memory 177 to the configuration of FIG. A predetermined program is stored in
advance in the ROM 12. In this program, in addition to the algorithm of (a) to (c) described in
the first embodiment, the algorithm of (e) described in the second embodiment, and the algorithm
of (f) described in the third embodiment, (n) a procedure is described that writes the estimated
impulse response held by the echo canceller 173 to the non-volatile memory 177 when the power is
turned "OFF" and gives that estimated impulse response to the echo canceller 173 when the power
is turned "ON". The CPU 10
operates in accordance with the above program while using the RAM 11 as a work area. Thereby,
the function of each block shown in FIG. 20 is realized.
[0200]
The start instruction unit 1782 and the end instruction unit 1783 are realized by the buttons
constituting the control panel of FIG. Also, the functions of the blocks other than the start
instruction unit 1782 and the end instruction unit 1783 can be realized by dedicated hardware
circuits instead of software.
[0201]
The operation of the speech recognition device 7 for AV device configured as described above
will be described below. The impulse response of the echo path from the speakers 9a and 9b to
the microphone 6 is determined by the state of acoustic reflection on walls, ceilings, floors,
furniture, people, windows, curtains and the like. Even with the same AV equipment, various
impulse responses can be obtained depending on the installation environment. Moreover, it
changes from moment to moment due to the movement of AV equipment, the movement of furniture, the
movement of people, the opening and closing of windows, and the like. With a fixed impulse
response, a sufficient echo cancellation effect can not be obtained. For this reason, the echo
canceller 173 of the speech recognition apparatus 7 of FIG. 7 performs adaptation sequentially
and always estimates the latest impulse response. However, the adaptation method in which the
initial value of the impulse response is "0" has a drawback that a sufficient echo cancellation
amount can not be obtained immediately after the power "ON".
[0202]
Apart from minor changes caused by people, windows, and the like, the rough impulse response
determined by the installation position of the AV device and the room shape does not change
significantly between yesterday and today unless the room's furniture is rearranged. Even if the
estimated impulse response at the time of yesterday's power "OFF" is used at the time of today's
power "ON", a reasonable amount of echo cancellation can be obtained in many cases.
[0203]
Therefore, in the voice recognition device 7 of FIG. 20, the non-volatile memory 177 is provided,
and the estimated impulse response held by the echo canceller 173 at the time of the power
"OFF" is stored in the non-volatile memory 177. At the next power "ON", the echo canceller 173
is started with the estimated impulse response stored in the non-volatile memory 177 as its
initial value.
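The save-and-restore behavior can be sketched as follows. A JSON file stands in for the
non-volatile memory 177 purely for illustration; the function names, the file format, and the
fallback to an all-zero initial value when nothing usable is stored are assumptions and are not
taken from the publication.

```python
import json


def save_impulse_response(path, weights):
    """At power OFF: persist the current estimated impulse response."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(list(weights), f)


def load_initial_weights(path, num_taps):
    """At power ON: return the stored estimate, or zeros if none is usable."""
    zeros = [0.0] * num_taps
    try:
        with open(path, "r", encoding="utf-8") as f:
            stored = json.load(f)
    except (OSError, ValueError):
        # Nothing stored yet, or the stored data is unreadable.
        return zeros
    if not isinstance(stored, list) or len(stored) != num_taps:
        # Stored data does not match the current filter length.
        return zeros
    try:
        return [float(w) for w in stored]
    except (TypeError, ValueError):
        return zeros
```

On the first power ON there is nothing to restore, so the sketch falls back to the all-zero
start used by the echo canceller 54 of the third embodiment.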
[0204]
As described above, according to the present embodiment, the estimated impulse response at the
time of the power "OFF" is stored, and at the power "ON" the estimation of the impulse response
is started with the stored value as the initial value. Compared with the case where "0" is set as
the initial value, the estimation error immediately after the power "ON" becomes smaller, and as
a result, the speech recognition performance is improved.
[0205]
(Twelfth Embodiment) FIG. 21 is a block diagram showing the configuration of a voice
recognition device for AV equipment according to a twelfth embodiment of the present invention.
The speech recognition device 7 of FIG. 21 corresponds to the speech recognition device 7
provided in the AV device of FIG.
However, in the present embodiment, in the AV apparatus, a 2-channel signal is output from the
AV processing unit 3, and 2-channel sound is output through the two speakers 9a and 9b
included in the speaker unit 9.
[0206]
In FIG. 21, the speech recognition device 7 includes a monauralization unit 185, one echo
canceller 183, a speech recognition unit 184, a speech detection unit 187, a button switch 1882
as a start instruction unit, a time switch 1883 as an end instruction unit, a state setting unit
1881, and a switching unit 186. That is, the voice recognition device 7 in FIG. 21 is obtained by
adding the voice detection unit 187 to the voice recognition device 7 (third embodiment) in FIG.
7, with the start instruction unit 581 specifically implemented as the button switch 1882 and the
end instruction unit 582 specifically implemented as the time switch 1883. The voice detection
unit 187 is the same as the voice detection unit 37 of FIG.
5 (see the second embodiment). The signals input to the speakers 9a and 9b are two-channel
signals output from the AV processing unit 3 of FIG.
[0207]
When the button switch 1882 is pressed, a signal instructing the activation of the voice
recognition operation is sent from the button switch 1882 to the state setting unit 1881. The
voice detection unit 187 detects the presence or absence of the user's voice and notifies the
time switch 1883 of the detection result. The time switch 1883 starts timing at the moment when
the user's voice transitions from present to absent. Then, when a predetermined time has elapsed
from the start of timing, it sends a signal instructing the end of the voice recognition
operation to the state setting unit 1881.
[0208]
The state setting unit 1881 receives an instruction signal from the button switch 1882 and the
time switch 1883 and sets the operation state of the speech recognition unit 184 (that is, the
speech recognition operation is "ON" / "OFF"). The operations of components other than the
voice detection unit 187, the button switch 1882, the time switch 1883, and the state setting
unit 1881 are the same as in the third embodiment, and thus the description thereof is omitted.
[0209]
The hardware configuration of the speech recognition device 7 of FIG. 21 is the same as that of
FIG. In FIG. 3, a predetermined program is stored in advance in the ROM 12. In this program, in
addition to the algorithm of (a) to (c) described in the first embodiment, the algorithm of (e)
described in the second embodiment, and the algorithm of (f) described in the third embodiment,
(o) a procedure is described that performs clocking and transmits an end command signal when a
predetermined time has elapsed from the start of clocking. The CPU 10 operates in
accordance with the above program while using the RAM 11 as a work area. Thereby, the
function of each block shown in FIG. 21 is realized.
[0210]
The button switch 1882 is realized by any of the buttons constituting the control panel of FIG.
Also, the function of each block other than the button switch 1882 can be realized by a
dedicated hardware circuit instead of software.
[0211]
The operation of the speech recognition device 7 for AV device configured as described above
will be described below. In this embodiment, specific examples of the start instruction unit 581
and the end instruction unit 582 in the speech recognition device 7 of FIG. 7 are shown. When
the user intends to use the voice recognition function, the user first presses the button switch
1882 corresponding to the start instruction unit 581 of FIG. 7. Then, an instruction to switch
from the standby state (in which the voice recognition operation is "OFF") to the operating state
("ON" state) is issued to the state setting unit 1881, and an instruction to start time
measurement is issued to the time switch 1883.
[0212]
In the operating state, the voice detection unit 187 checks whether or not the user voice is input,
and when the voice is detected, the time switch 1883 resets the measurement time (that is,
returns the measurement time to 0). When a state in which no voice is detected continues and
the measurement time of the time switch 1883 exceeds a predetermined value, the time switch
1883 instructs the state setting unit 1881 to switch from the operating state to the standby state.
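The behavior of the time switch 1883 in the operating state can be sketched as a resettable
timer. The class name TimeSwitch, the tick-based interface, and the use of seconds are
assumptions made for illustration; the publication only specifies that the measurement time is
reset when voice is detected and that the end is instructed when the measurement time exceeds a
predetermined value.

```python
class TimeSwitch:
    """Resettable timer that signals when silence has lasted too long."""

    def __init__(self, timeout_s):
        self.timeout_s = timeout_s  # the "predetermined value"
        self.elapsed_s = 0.0        # the "measurement time"
        self.running = False

    def start(self):
        """Begin time measurement from zero."""
        self.elapsed_s = 0.0
        self.running = True

    def tick(self, dt_s, voice_detected):
        """Advance by dt_s seconds; return True when recognition should end."""
        if not self.running:
            return False
        if voice_detected:
            # Voice detected: return the measurement time to 0.
            self.elapsed_s = 0.0
            return False
        self.elapsed_s += dt_s
        if self.elapsed_s > self.timeout_s:
            # Silence exceeded the predetermined value: instruct standby.
            self.running = False
            return True
        return False
```

In use, start() would be called when the operating state is entered, and tick() would be called
once per processing frame with the current output of the voice detection unit 187.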
[0213]
As described above, according to this embodiment, the voice recognition function can be ended
automatically.
[0214]
Thirteenth Embodiment FIG. 22 is a block diagram showing the configuration of a speech
recognition system for AV equipment according to a thirteenth embodiment of the present
invention.
The voice recognition device 7 of FIG. 22 corresponds to the voice recognition device 7 provided
in the AV device of FIG. However, in the present embodiment, in the AV apparatus, a 2-channel
signal is output from the AV processing unit 3, and 2-channel sound is output through the two
speakers 9a and 9b included in the speaker unit 9.
[0215]
In FIG. 22, the voice recognition device 7 includes a monauralization unit 195, one echo
canceller 193, a voice recognition unit 194, a voice detection unit 197, a voice switch 1982 as a
start instruction unit, a time switch 1983 as an end instruction unit, a state setting unit 1981,
and a switching unit 196. That is, the speech recognition apparatus 7 of FIG. 22 is obtained by
adding the speech detection unit 197 to the speech recognition apparatus 7 of FIG. 7 (third
embodiment), with the start instruction unit 581 specifically implemented as the voice switch
1982 and the end instruction unit 582 specifically implemented as the time switch 1983. The
speech detection unit 197 is the same as the speech detection unit 37 in
FIG. 5 (see the second embodiment). The signals input to the speakers 9a and 9b are two-channel
signals output from the AV processing unit 3 of FIG.
[0216]
The voice detection unit 197 detects the presence or absence of the user voice, and notifies the
voice switch 1982 and the time switch 1983 of the detection result. The voice switch 1982
captures the moment when the user's voice transitions from absent to present, and sends a signal
instructing the state setting unit 1981 to activate the voice recognition operation. The time
switch 1983 captures the moment when the user's voice transitions from present to absent, and
starts the timekeeping process. Then, when a predetermined time has elapsed
from the start of clocking, a signal instructing the end of the voice recognition operation is sent
to the state setting unit 1981.
[0217]
The state setting unit 1981 receives command signals from the voice switch 1982 and the time
switch 1983, and sets the operation state of the speech recognition unit 194 (that is, whether
the speech recognition operation is "ON" or "OFF"). The operations of components other than the
voice detection unit 197, the voice switch 1982, the time switch 1983, and the state setting unit
1981 are the same as in the third embodiment, and thus the description thereof is omitted.
[0218]
The hardware configuration of the speech recognition device 7 of FIG. 22 is the same as that of
FIG. In FIG. 3, a predetermined program is stored in advance in the ROM 12. In this program, in
addition to the algorithm of (a) to (c) described in the first embodiment, the algorithm of (e)
described in the second embodiment, the algorithm of (f) described in the third embodiment, and
the procedure of (o) described in the twelfth embodiment, (p) a procedure of transmitting an
activation command signal when speech is detected is described. The CPU 10 operates in accordance
with the above program while using the RAM 11 as a work area. Thereby, the function of each block
shown in FIG. 22 is realized.
[0219]
Note that the function of each block may be realized by a dedicated hardware circuit instead of
software.
[0220]
The operation of the speech recognition device 7 for AV device configured as described above
will be described below.
In the speech recognition device 7 of FIG. 22, the speech detection unit 197 detects the speech of
the user even in the standby state. When the user tries to use the speech recognition function,
first, a relatively loud voice is emitted. The voice detection unit 197 detects this voice and sends
the detection result to the voice switch 1982. When the detection result indicates that a voice
above the level set in advance has been detected, the voice switch 1982 sends a voice recognition
start instruction to the state setting unit 1981, directing it to switch from the standby state
to the operating state.
[0221]
The detection result by the voice detection unit 197 is also sent to the time switch 1983,
and in response, the time switch 1983 starts time measurement. In the operating state, the
voice detection unit 197 checks whether or not the user voice is input, and when the voice is
detected, the time switch 1983 resets the measurement time (that is, returns the measurement
time to 0). When no sound is detected and the measurement time of the time switch 1983
exceeds a predetermined value, the time switch 1983 instructs the state setting unit 1981 to
switch from the operating state to the standby state.
[0222]
The voice level at which the above-mentioned voice switch 1982 turns "ON" is set considerably
higher than the voice level at which the time switch 1983 is reset. This is to prevent the
relatively large unerased echo sound that is generated in the standby state, in which the
cancellation effect of the echo canceller 193 is not good, from being erroneously detected as the
user's voice and causing an unintended transition to the operating state.
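The use of two different levels can be sketched as a small state update. All names and numeric
defaults here (update_state, start_level, reset_level, timeout_s, and the level scale itself) are
hypothetical; the publication states only that the level that turns the voice switch 1982 "ON"
is considerably higher than the level that resets the time switch 1983.

```python
STANDBY, OPERATING = "standby", "operating"


def update_state(state, silent_s, level, dt_s,
                 start_level=0.6, reset_level=0.2, timeout_s=3.0):
    """Advance the ON/OFF state by one time step.

    Returns (new_state, new_silent_s), where silent_s is the time measured
    by the time switch. All numeric defaults are hypothetical.
    """
    if state == STANDBY:
        # Only a loud voice starts recognition, so residual echo that stays
        # below start_level cannot trigger a false start.
        if level > start_level:
            return OPERATING, 0.0
        return STANDBY, 0.0
    # Operating state: any voice above the lower level resets the timer.
    if level > reset_level:
        return OPERATING, 0.0
    silent_s += dt_s
    if silent_s > timeout_s:
        # Silence lasted longer than the predetermined time: end recognition.
        return STANDBY, 0.0
    return OPERATING, silent_s
```

A level between the two thresholds illustrates the point of the paragraph: in the standby state
it is ignored, while in the operating state, where echo is better canceled, it is enough to keep
recognition active.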
[0223]
As described above, according to this embodiment, the speech recognition function can be
automatically started and ended.