DESCRIPTION JP2002198870

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
[0001]
BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to an echo processing apparatus that reduces the echo contained in
a transmitted voice signal. The echo arises in the communication path or in the echo path between
a loudspeaker and a microphone during voice communication over fixed telephones, car telephones,
and mobile telephones.
[0002]
2. Description of the Related Art
A conventional echo processing apparatus is disclosed in Japanese Patent Application Laid-Open
No. 11-17589. FIG. 7 is a block diagram schematically showing that apparatus. For the purpose of
explanation, the figure adds the following to the main body of the echo processing apparatus: a
speech encoding means for encoding the transmission signal, a speech decoding means for decoding
the received signal, a far-end speech encoding means for encoding the far-end speech signal, and
a far-end speech decoding means for decoding the speech signal.
[0003]
In FIG. 7, 1 is a pseudo echo generation means, 2 and 7 are addition means, 3 is an echo
canceller means, 4 and 5 are sound / silence determination means, 6 is an echo suppression
means, and 8 is a pseudo background noise generation means. These constitute the main body of the
echo processing apparatus. Reference numeral 9 denotes a speech coding means and 10 denotes a
speech decoding means, which together constitute a speech coding / decoding means 11. Further,
18 is a far-end speech decoding means, and 19 is a far-end speech coding means.
[0004]
Next, the operation will be described. The speech signal uttered by the far-end speaker is
encoded by the speech encoding means 19 and is input to the near-end speech decoding means 10
as encoded data CR via the communication path. The speech decoding means 10 decodes the
encoded data CR and outputs a received signal RI (i) as an analog signal. Based on the power of
the received signal RI (i), the sound / silence determination means 5 determines whether the
received signal RI (i) is sound or silence, and outputs the result to the echo suppression
means 6 and the pseudo background noise generation means 8. The received signal RI (i) is also
passed on unchanged as the received signal RO (i) and output to the outside, which includes the
echo path.
[0005]
Meanwhile, the transmission signal SI (i) input to the echo canceller means 3 contains an echo
signal, produced when the received signal RO (i), converted to an analog signal, passes through
the echo path, together with the speech uttered by the near-end speaker and the near-end
background noise. The echo canceller means 3 estimates the transfer characteristic of the echo
path, generates a pseudo echo signal with the pseudo echo generation means 1, and subtracts the
pseudo echo signal from the transmission signal SI (i) using the addition means 2 to obtain a
residual signal SA (i). Based on the power of the residual signal SA (i), the sound / silence
determination means 4 determines whether the residual signal SA (i) is sound or silence, and
outputs the result to the echo suppression means 6 and the pseudo background noise generation
means 8. The power is calculated as the sum of the squares of the samples.
[0006]
The echo suppression means 6 judges that the residual signal SA (i) contains only the echo
signal when the sound / silence determination means 5 determines that the received signal RI (i)
is sound and the sound / silence determination means 4 determines that the residual signal
SA (i) is silence. In that case it suppresses the echo signal by attenuating the amplitude of
the residual signal SA (i). If the results of both determination means 4 and 5 are sound, or if
the result of the sound / silence determination means 5 is silence, the amplitude of the
residual signal SA (i) is not suppressed.
[0007]
The pseudo background noise generation means 8 calculates and stores the spectral parameters
(linear prediction coefficients) of the sections in which the sound / silence determination
means 5 determines the received signal RI (i) to be silence. Then, in a section in which the
echo suppression means 6 suppresses the amplitude of the residual signal SA (i), it generates
pseudo background noise from the spectral parameters obtained in the silent section and white
noise (noise whose frequency spectrum is flat). The pseudo background noise is added by the
addition means 7 to the signal whose amplitude has been suppressed by the echo suppression
means 6, and the transmission signal SO (i) is obtained from the addition means 7.
[0008]
Through this processing, the background noise that was suppressed together with the echo signal
by the echo suppression means 6 is compensated by pseudo background noise whose spectral
characteristic is close to that of the actual background noise, so the loss of background noise
caused by the suppression becomes less noticeable. Next, the transmission signal SO (i) is
encoded by the speech encoding means 9 and output as encoded data CS. The encoded data CS is
input to the far-end voice decoding means 18 via the communication path. The voice decoding
means 18 decodes the encoded data and outputs a voice signal.
[0009]
SUMMARY OF THE INVENTION
Because the conventional echo processing apparatus is configured as described above, generating
the pseudo background noise requires both a process for calculating the spectral parameters and
a synthesis filter process for generating the pseudo background noise from the spectral
parameters and the white noise. The processing is therefore complicated and the amount of
calculation is large.
[0010]
The present invention has been made to solve the above-described conventional problems, and
an object of the present invention is to provide an echo processing apparatus that generates
high-quality simulated background noise with simple processing.
[0011]
The echo processing apparatus according to the present invention comprises: echo determination
means for determining that the transmission signal contains only the near-end background noise
component and an echo signal produced when the received signal passes through the echo path; and
speech coding means that encodes the transmission signal and, in an analysis frame in which the
echo determination means has determined that only the near-end background noise component and
the echo signal are present in the transmission signal, generates and outputs coded data
corresponding to background noise using coding analysis parameters obtained in analysis frames
previously determined to be noise sections.
[0012]
In the echo processing apparatus according to the present invention, the speech coding system of
the speech coding means is the CELP system, and the coded data corresponding to the background
noise consists of linear prediction coefficients, an adaptive excitation code, a driving
excitation code, and the gains of both excitation codes.
[0013]
In the echo processing apparatus according to the present invention, in an analysis frame in
which the echo determination means has determined that only the near-end background noise
component and the echo signal are present in the transmission signal, a plurality of encoded
spectral parameters obtained in a plurality of analysis frames previously determined to be noise
sections are randomly selected and sequentially output.
[0014]
In the echo processing apparatus according to the present invention, in an analysis frame in
which the echo determination means has determined that only the near-end background noise
component and the echo signal are present in the transmission signal, a plurality of spectral
parameters obtained in a plurality of analysis frames previously determined to be noise sections
are randomly selected, encoded, and sequentially output.
[0015]
In the echo processing apparatus according to the present invention, in an analysis frame in
which the echo determination means has determined that only the near-end background noise
component and the echo signal are present in the transmission signal, the spectral parameters
obtained in a plurality of analysis frames previously determined to be noise sections are
averaged, and the value of each dimension of the averaged spectral parameters is perturbed (made
to fluctuate) for each analysis frame, then encoded and sequentially output.
[0016]
The echo processing apparatus according to the present invention selects and sequentially
outputs the adaptive excitation code, the driving excitation code, and the encoded gains of both
excitation codes obtained in the same analysis frame as the one in which the selected spectral
parameter was obtained.
[0017]
In the echo processing apparatus according to the present invention, in an analysis frame in
which the echo determination means has determined that only the near-end background noise
component and the echo signal are present in the transmission signal, sets each consisting of an
adaptive excitation code and its encoded gain, obtained in a plurality of analysis frames
previously determined to be noise sections, are randomly selected and sequentially output.
Likewise, sets each consisting of a driving excitation code and its encoded gain, obtained in a
plurality of analysis frames previously determined to be noise sections, are randomly selected
and sequentially output.
[0018]
In the echo processing apparatus according to the present invention, the gain of the adaptive
excitation code and the gain of the driving excitation code are obtained and held as values
before encoding, and are encoded and output each time they are selected.
[0019]
The echo processing apparatus according to the present invention corrects the gains of the
selected adaptive excitation code and driving excitation code based on the average power of a
target signal obtained beforehand in a plurality of analysis frames previously determined to be
noise sections, and then encodes and outputs them.
[0020]
In the echo processing apparatus according to the present invention, when the gain of the
driving excitation code is obtained in an analysis frame previously determined to be a noise
section, the gain of the adaptive excitation code is first replaced with zero or a value close
to zero, and the gain of the driving excitation code is then calculated.
[0021]
The echo processing apparatus according to the present invention comprises speech coding means
that, in an analysis frame in which the echo determination means has determined that only the
near-end background noise component and the echo signal exist in the transmission signal,
outputs a silence flag for DTX (Discontinuous Transmission) and operates internally in the DTX
silent interval processing mode.
[0022]
An echo processing apparatus according to the present invention comprises speech encoding means
that randomly selects a plurality of spectral parameters obtained beforehand in a plurality of
analysis frames previously determined to be noise sections, averages them, encodes the result,
and sequentially outputs it.
[0023]
The echo processing apparatus according to the present invention includes encoding means
provided with a background noise generation unit which, when the determination result of the
echo determination means indicates that the transmission signal contains a large echo component,
outputs as transmission data encoded data stored in advance corresponding to background noise
instead of the encoded data of the transmission signal.
[0024]
DESCRIPTION OF THE PREFERRED EMBODIMENTS
First Embodiment
FIG. 1 is a block diagram showing the configuration of an echo processing apparatus according
to the present invention.
FIG. 2 is a block diagram showing a detailed configuration of its speech coding means.
The conventional echo processing apparatus is provided with pseudo background noise generation
means and performs processing to generate pseudo background noise. In the echo processing
apparatus according to the first embodiment, by contrast, the speech coding means generates
encoded data corresponding to the pseudo background noise.
[0025]
The first embodiment of the present invention will be described below with reference to FIGS. 1
and 2.
In FIG. 1, the same reference numerals as the reference numerals shown in FIG. 7 indicate the
same or corresponding parts, and therefore the description of the same operations will be
omitted.
Reference numeral 12 denotes an echo determination means, and 13 denotes a speech encoding means
that receives the determination result of the echo determination means 12 and that of the
sound / silence determination means 4.
[0026]
Further, in FIG. 2, which shows the specific configuration of the speech encoding means 13, 51
is an adaptive excitation codebook, 52 is a driving excitation codebook, 53 and 54 are
amplifiers, 55 is an addition means, 56 is a linear prediction analysis means, 57 is a synthesis
filter means, 58 is an addition means, 59 is a sound source searching means, 60 is a background
noise generation means, and 61 is a multiplexing means.
[0027]
In FIG. 1, the echo determination means 12 receives the received signal RI (i) output from the
speech decoding means 10 and the residual signal SA (i) output from the echo canceller means 3,
and determines their powers, RP and SAP respectively, for each analysis frame.
It then determines whether, for example, the condition shown in equation (1) holds.
[0028]
RP > SAP and SAP < TH   (1)
Here RP and SAP are each a single symbol, and TH is a preset fixed threshold.
In an analysis frame in which the condition of equation (1) holds, it is determined that the
residual signal SA (i) contains only the echo signal (such a frame is hereinafter called an
"echo residual frame"), and the determination result is output to the speech coding means 13.
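The decision of equation (1) can be sketched in a few lines of Python. This is only an
illustrative sketch: the function names frame_power and is_echo_residual_frame are hypothetical,
and the power is taken as the sum of squared samples, as the description states.

```python
from typing import Sequence


def frame_power(frame: Sequence[float]) -> float:
    """Power of one analysis frame: the sum of the squares of its samples."""
    return sum(x * x for x in frame)


def is_echo_residual_frame(ri_frame: Sequence[float],
                           sa_frame: Sequence[float],
                           th: float) -> bool:
    """Equation (1): RP > SAP and SAP < TH.

    ri_frame holds the received signal RI(i) and sa_frame the residual
    signal SA(i) for the same analysis frame; th is the fixed threshold TH.
    """
    rp = frame_power(ri_frame)
    sap = frame_power(sa_frame)
    return rp > sap and sap < th
```

The threshold value is left to the caller because the description only says that TH is preset.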
[0029]
The sound / silence determination means 4 determines whether the residual signal SA (i) is sound
or silence, for example from the power or spectrum of the residual signal SA (i), and outputs
the result to the speech coding means 13.
The power is calculated as the sum of the squares of the samples.
[0030]
The speech coding means 13 is based, for example, on the CELP coding system shown in FIG. 2,
which is very widely used in standard speech codecs such as those of portable telephones, and
encodes the residual signal SA (i) for each analysis frame. It also generates encoded data
corresponding to the pseudo background noise according to the sound / silence determination
result output from the sound / silence determination means 4 and the echo determination result
output from the echo determination means 12.
[0031]
Hereinafter, with reference to FIG. 2, an operation of the speech encoding unit 13 generating
encoded data corresponding to the pseudo background noise will be described in detail.
[0032]
In FIG. 2, the configuration other than the background noise generation means 60 is the same as
that of the encoder of a general CELP coding system. The operation other than that of the
background noise generation means 60 (that is, the basic operation of the CELP coding system) is
therefore briefly explained first.
[0033]
The linear prediction analysis unit 56 receives the residual signal SA (i), performs linear
prediction analysis on an analysis frame basis, obtains LSP (Line Spectrum Pair) as a spectral
parameter, quantizes it, and outputs it to the synthesis filter unit 57.
In addition, the quantized LSP is encoded and output to the background noise generation unit 60
and the multiplexing unit 61.
The adaptive excitation codebook 51 stores past excitation signals.
The excitation signal in the adaptive excitation codebook 51 is cut out with a length (variable
length) called a lag, and the cut-out signal is repeated at a lag cycle until it becomes a subframe
length to generate an adaptive sound source.
Further, the driving excitation codebook 52 is composed of a plurality of noise signal vectors.
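The construction of the adaptive sound source from a lag can be sketched as follows. This is a
minimal illustration, not the codec's actual routine: the function name adaptive_excitation is
hypothetical, and taking the most recent lag samples of the stored excitation is an assumption
based on common CELP practice.

```python
from typing import List, Sequence


def adaptive_excitation(past_excitation: Sequence[float],
                        lag: int,
                        subframe_len: int) -> List[float]:
    """Cut `lag` samples out of the past excitation and repeat them
    periodically until one subframe is filled.

    Assumption: the segment is the most recent `lag` samples.
    """
    if not 0 < lag <= len(past_excitation):
        raise ValueError("lag must be between 1 and the codebook length")
    segment = past_excitation[-lag:]
    return [segment[i % lag] for i in range(subframe_len)]
```

When the lag is at least the subframe length, the result is simply the first subframe_len samples
of the cut-out segment, with no repetition.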
[0034]
The adaptive excitation from the adaptive excitation codebook 51 and the driving excitation from
the driving excitation codebook 52 are read out sequentially in units of the subframe length,
multiplied by their gains in the amplifiers 53 and 54, and then added by the addition means 55
to form the excitation signal (sound source signal).
The synthesis filter means 57 generates synthesized speech from the LSP supplied by the linear
prediction analysis means 56 and the excitation signal supplied by the addition means 55.
The addition means 58 obtains the distortion as the difference between the synthesized speech
produced by the synthesis filter means 57 and the residual signal SA (i).
The sound source searching means 59 searches for the lag of the adaptive excitation, the driving
excitation code, and their gains that minimize the distortion, encodes them, and outputs them to
the background noise generation means 60 and the multiplexing means 61.
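The analysis-by-synthesis loop described above can be illustrated with a heavily simplified
sketch. All names here are hypothetical. As assumptions for brevity, the synthesis filter is
driven directly by linear prediction coefficients rather than by a quantized LSP, the gains are
given rather than searched, and no perceptual weighting is applied.

```python
from typing import List, Sequence


def mix_excitation(adaptive: Sequence[float], fixed: Sequence[float],
                   g_adaptive: float, g_fixed: float) -> List[float]:
    """Amplifiers 53/54 followed by adder 55: weighted sum of both excitations."""
    return [g_adaptive * a + g_fixed * f for a, f in zip(adaptive, fixed)]


def synthesize(excitation: Sequence[float], lpc: Sequence[float]) -> List[float]:
    """All-pole synthesis filter: s[n] = e[n] - sum_k lpc[k] * s[n-1-k]."""
    out: List[float] = []
    for n, e in enumerate(excitation):
        acc = e
        for k, a in enumerate(lpc):
            if n - 1 - k >= 0:
                acc -= a * out[n - 1 - k]
        out.append(acc)
    return out


def distortion(target: Sequence[float], synthesized: Sequence[float]) -> float:
    """Squared error between the input signal and the synthesized signal."""
    return sum((t - s) ** 2 for t, s in zip(target, synthesized))


def search_driving_code(target: Sequence[float], adaptive: Sequence[float],
                        g_adaptive: float, codebook: Sequence[Sequence[float]],
                        g_fixed: float, lpc: Sequence[float]) -> int:
    """Return the index of the driving excitation vector with least distortion."""
    def cost(index: int) -> float:
        excitation = mix_excitation(adaptive, codebook[index], g_adaptive, g_fixed)
        return distortion(target, synthesize(excitation, lpc))
    return min(range(len(codebook)), key=cost)
```

A real CELP encoder searches the lag, the driving code, and both gains jointly or in stages; the
exhaustive search over one codebook above only shows the principle of choosing the entry that
minimizes the distortion.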
[0035]
The multiplexing means 61 multiplexes these encoded data and outputs the result as encoded
data CS to the communication channel.
The coded data CS is transmitted to the voice decoding means 18 on the far end side via a
communication channel, and is decoded by the voice decoding means 18 to obtain a decoded
signal.
[0036]
The above is the operation of the CELP coding system.
Next, the operation of the background noise generation means 60 will be described.
The background noise generation means 60 inputs, in addition to the encoded data described
above, the sound / silence determination result output from the sound / silence determination
means 4 of FIG. 1 and the echo determination result output from the echo determination means
12.
[0037]
First, as shown for example in FIG. 3 (a), the background noise generation means 60 always
stores the encoded LSPs of the most recent six analysis frames in which the sound / silence
determination result is silence. In addition, as shown in FIG. 3 (b) and (c), it always stores
the codes of the adaptive excitation and its gain and the codes of the driving excitation and
its gain obtained in those same silent analysis frames, held per subframe (one analysis frame
corresponds to two subframes).
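The storage described here behaves like a fixed-size buffer of the latest silent frames. The
following sketch shows one possible layout; the class and field names are hypothetical, and the
codes are represented as plain integers purely for illustration.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, List, Tuple


@dataclass(frozen=True)
class SubframeCodes:
    adaptive_code: int       # lag code of the adaptive excitation
    adaptive_gain_code: int
    driving_code: int        # index into the driving excitation codebook
    driving_gain_code: int


@dataclass(frozen=True)
class NoiseFrame:
    lsp_code: int                                   # encoded LSP of the frame
    subframes: Tuple[SubframeCodes, SubframeCodes]  # two subframes per frame


class NoiseFrameStore:
    """Keeps the encoded data of the most recent silent analysis frames."""

    def __init__(self, capacity: int = 6) -> None:
        self._frames: Deque[NoiseFrame] = deque(maxlen=capacity)

    def update(self, frame: NoiseFrame, is_silent: bool) -> None:
        # Only frames judged silent are stored; the oldest one is dropped
        # automatically once the capacity is reached.
        if is_silent:
            self._frames.append(frame)

    def frames(self) -> List[NoiseFrame]:
        return list(self._frames)
```

Keeping the LSP code and the subframe codes in one record makes it straightforward to output
data that all came from the same analysis frame.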
[0038]
Then, in the analysis frame in which the determination result of the echo determination means
12 is an echo residual frame, the background noise generation means 60 performs the operation
described below. At this time, the speech coding process according to the basic operation of
CELP coding described above is not performed.
[0039]
In an echo residual frame, the background noise generation means 60 first selects one of the six
stored LSPs at random using random numbers (see FIG. 3) and outputs it to the multiplexing
means 61. Next, it selects the set of adaptive excitation code and its gain code and the set of
driving excitation code and its gain code for the two subframes of the analysis frame in which
the selected LSP was obtained (see the dotted lines in FIG. 3 (b) and (c)), and outputs them to
the multiplexing means 61.
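The frame-linked selection of this embodiment can be sketched as below. The function name
select_linked and the list-based layout are assumptions made for illustration; the only point
shown is that a single random index picks both the LSP and the excitation data of the same
stored frame.

```python
import random
from typing import Sequence, Tuple, TypeVar

L = TypeVar("L")
E = TypeVar("E")


def select_linked(lsp_codes: Sequence[L],
                  excitation_sets: Sequence[E],
                  rng: random.Random) -> Tuple[L, E]:
    """Pick one stored silent frame at random and return its LSP code
    together with the excitation and gain codes of that same frame.

    lsp_codes[k] and excitation_sets[k] must come from the same frame.
    """
    if not lsp_codes or len(lsp_codes) != len(excitation_sets):
        raise ValueError("need equally sized, non-empty stores")
    k = rng.randrange(len(lsp_codes))
    return lsp_codes[k], excitation_sets[k]
```

Passing the random number generator in explicitly keeps the selection reproducible in tests.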
[0040]
Because encoded data obtained together in a single analysis frame of a past silent section
(background noise section) is output to the multiplexing means 61, the signal obtained by
decoding it has the frequency characteristics and power of the actual background noise and is of
good quality. In addition, because the encoded data of different analysis frames is output
sequentially in random order, rather than the same data being repeated, the signal decoded by
the decoding means that receives this encoded data has no periodicity, which would be
inappropriate for background noise.
[0041]
Further, the composition of the coded data CS output from the background noise generation
means 60 is the same as that of the coded data CS described above for the basic operation of
CELP coding. Therefore, the far-end voice decoding means 18, which receives the coded data CS
via the communication path, decodes it by the same CELP decoding process regardless of whether
it was generated by the background noise generation means 60, producing the speech signal in
voice sections and the background noise in echo residual frame sections.
[0042]
As described above, according to the first embodiment, in an echo residual frame a set
consisting of the LSP, the adaptive excitation code, the driving excitation code, and the
encoded gains, all obtained beforehand by the CELP-based speech encoding means 13 in the same
analysis frame of a silent section, is randomly selected and sequentially output to generate
encoded data corresponding to background noise. Pseudo background noise can therefore be
generated without providing any new means for pseudo background noise generation. With a simple
configuration and method based on the widely used CELP system, good-quality pseudo background
noise that has no periodicity and has the characteristics of background noise can be generated
in sections where only an echo is present in the transmission signal.
[0043]
Second Embodiment
The background noise generation means 60 of the first embodiment selects the adaptive excitation
code, the driving excitation code, and their gains according to the analysis frame from which
the LSP was selected. Alternatively, the set of the adaptive excitation code and its gain and
the set of the driving excitation code and its gain may each be randomly selected separately.
[0044]
As described above, according to the second embodiment, the LSP, the adaptive excitation code,
the driving excitation code, and their gains are randomly selected separately, so the pseudo
background noise generated by the background noise generation means 60 becomes more random and
does not have biased characteristics.
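In contrast to the frame-linked choice of the first embodiment, the separate selection can be
sketched as three independent draws. The function name select_separately is hypothetical; each
adaptive or driving entry is assumed to be a (code, gain code) pair that stays together.

```python
import random
from typing import Sequence, Tuple, TypeVar

L = TypeVar("L")
A = TypeVar("A")
D = TypeVar("D")


def select_separately(lsp_codes: Sequence[L],
                      adaptive_sets: Sequence[A],
                      driving_sets: Sequence[D],
                      rng: random.Random) -> Tuple[L, A, D]:
    """Draw the LSP, the (adaptive code, gain) set and the
    (driving code, gain) set independently of one another."""
    return (rng.choice(lsp_codes),
            rng.choice(adaptive_sets),
            rng.choice(driving_sets))
```

Each excitation code still travels with its own gain code; only the link between the LSP, the
adaptive set, and the driving set is dropped.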
[0045]
Third Embodiment
The background noise generation means 60 of the first embodiment holds the LSP and gains in
already-encoded form and outputs them. Alternatively, it may hold the actual LSP or gain values
before encoding and encode them when they are output.
[0046]
As described above, according to the third embodiment, the actual LSP or gain values are held
and re-encoded. This makes it possible to handle coding methods that encode a parameter using
its relationship with other parameters, for example methods that encode the difference between
the LSP (or gain) of the previous analysis frame and the LSP (or gain) of the current frame.
[0047]
Fourth Embodiment
The background noise generation means 60 according to the first embodiment outputs the gains of
the adaptive excitation code and the driving excitation code obtained beforehand in the silent
section as they are. Alternatively, with the excitation signal (held in the adaptive excitation
codebook) obtained in the analysis frames of the silent section as the target signal, the
average of its power may be determined as the reference excitation power. The gains of both
excitation codes are then corrected and encoded so that the power of the excitation signal
generated from the adaptive excitation code and driving excitation code selected by the
background noise generation means 60, together with the gains of both codes, matches this
reference excitation power. In doing so, the gain of the adaptive excitation code may be set to
zero or a value close to zero, and the gain of the driving excitation determined so that the
power of the excitation signal generated from the driving excitation code alone matches the
reference excitation power.
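The gain correction can be illustrated numerically. The description does not say how the two
gains are corrected, so scaling both by one common factor is an assumption of this sketch, and
all function names are hypothetical.

```python
import math
from typing import Sequence, Tuple


def power(signal: Sequence[float]) -> float:
    """Sum of the squares of the samples."""
    return sum(x * x for x in signal)


def reference_power(silent_excitations: Sequence[Sequence[float]]) -> float:
    """Average excitation power over the stored silent analysis frames."""
    return sum(power(e) for e in silent_excitations) / len(silent_excitations)


def corrected_gains(adaptive: Sequence[float], driving: Sequence[float],
                    g_adaptive: float, g_driving: float,
                    ref_power: float) -> Tuple[float, float]:
    """Scale both gains by one common factor (an assumption) so that the
    mixed excitation has exactly the reference power."""
    mixed = [g_adaptive * a + g_driving * d for a, d in zip(adaptive, driving)]
    p = power(mixed)
    if p == 0.0:
        raise ValueError("mixed excitation has zero power")
    scale = math.sqrt(ref_power / p)
    return g_adaptive * scale, g_driving * scale


def driving_only_gain(driving: Sequence[float], ref_power: float) -> float:
    """Variant with the adaptive gain set to zero: the driving excitation
    alone is scaled to the reference power."""
    p = power(driving)
    if p == 0.0:
        raise ValueError("driving excitation has zero power")
    return math.sqrt(ref_power / p)
```

Because power grows with the square of the gain, the scale factor is the square root of the
ratio between the reference power and the current power.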
[0048]
As described above, according to the fourth embodiment, the power of the excitation signal
generated from the encoded data produced by the background noise generation means 60 matches the
reference excitation power, so stable pseudo background noise without large power fluctuations
can be generated. In addition, because setting the gain of the adaptive excitation to zero means
the adaptive excitation is not used, good-quality pseudo background noise can be generated
regardless of the content of the adaptive excitation codebook, even if that content has
erroneously taken on the characteristics of a sound section.
[0049]
Fifth Embodiment
The background noise generation means 60 of the fourth embodiment determines the reference
excitation power with the excitation signal of the silent section as the target signal, and
makes the power of the excitation signal generated from its encoded data match this reference
excitation power. Alternatively, with the residual signal SA (i) of the silent section as the
target signal, the average of its power may be determined as the reference speech power, and the
gains of both excitation codes may be corrected so that the power of the signal synthesized from
the LSP, the adaptive excitation code, the driving excitation code, and the gains of both
excitation codes selected by the background noise generation means 60 matches the reference
speech power.
[0050]
As described above, according to the fifth embodiment, the power of the synthesized signal
generated from the encoded data produced by the background noise generation means 60 matches the
reference speech power, so stable pseudo background noise without power fluctuation can be
generated.
[0051]
Sixth Embodiment
FIG. 4 is a block diagram showing the configuration of an echo processing apparatus according
to a sixth embodiment of the present invention. FIG. 5 is a block diagram showing the detailed
configuration of its speech encoding means 14. In FIG. 1, described in the first embodiment, the
determination result of the echo determination means 12 is input directly to the speech coding
means 13. In the echo processing apparatus according to the sixth embodiment, it is instead
input via the DTX (Discontinuous Transmission) control means 15, which is generally used in
voice processing in wireless communication terminals such as mobile phones. When the
transmission signal is in a silent section, the DTX control means 15 turns off the radio
transmission output to reduce power consumption in that section. The transmission control method
and the comfort noise used by the DTX control means 15 are described, for example, in 3G TS
26.093 "AMR Speech Codec; Source Controlled Rate Operation", a standard for third-generation
mobile phones.
[0052]
The operation of the sixth embodiment according to the present invention will be described
below with reference to FIGS. 4 and 5. In FIGS. 4 and 5, the same reference numerals as those
shown in FIGS. 1 and 2 indicate the same or corresponding parts, so the description of the same
operations is omitted. In FIG. 4, reference numeral 14 denotes speech coding means that receives
the control information output from the DTX control means 15, 16 denotes modulation means that
receives the coded data CS and the flag information DTX and modulates them into a radio signal,
and 17 denotes demodulation means that demodulates a radio signal and outputs the encoded data
CR. In FIG. 5, reference numeral 62 denotes background noise generation means that receives the
flag information DTX from the DTX control means 15. Reference numeral 20 denotes far-end
demodulation means that demodulates the coded data and the flag information DTX from the radio
signal, and 21 denotes far-end modulation means that modulates the far-end coded data and
outputs the result by radio.
[0053]
The DTX control means 15 receives the determination results of the sound / silence determination
means 4 and the echo determination means 12. When the result of the echo determination means 12
indicates an echo residual frame, or when the result of the sound / silence determination
means 4 is silence, it sets the flag information DTX to "0", meaning radio transmission off, and
outputs it to the speech encoding means 14 and the modulation means 16. When the result of the
echo determination means 12 is not an echo residual frame and the result of the sound / silence
determination means 4 is sound, the DTX control means 15 sets the flag information DTX to "1",
meaning radio transmission on, and outputs it to the speech encoding means 14 and the modulation
means 16.
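The flag logic of the DTX control means 15 reduces to a small truth table. The sketch below uses
a hypothetical function name and boolean inputs for the two determination results.

```python
def dtx_flag(is_echo_residual_frame: bool, is_silent: bool) -> int:
    """Flag information DTX as set by the DTX control means 15.

    0 = radio transmission off: echo residual frame, or silence.
    1 = radio transmission on: not an echo residual frame, and sound.
    """
    return 0 if (is_echo_residual_frame or is_silent) else 1
```

Only the combination "not an echo residual frame" and "sound" keeps the transmitter on.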
[0054]
The speech encoding means 14 has a function of encoding the residual signal SA (i) and a
function of generating encoded data corresponding to background noise according to the flag
information DTX from the DTX control means 15, the sound / silence determination result from the
sound / silence determination means 4, and the echo determination result from the echo
determination means 12. The operation of the speech encoding means 14 is described in detail
below.
[0055]
When the flag information DTX is "1" (radio transmission on), the speech encoding means 14
encodes the residual signal SA (i) according to the CELP method described in the first
embodiment and outputs the encoded data CS to the modulation means 16. The modulation means 16
continuously modulates this encoded data and outputs it by radio.
[0056]
When the flag information DTX is "0" and the echo determination result is not an echo residual
frame, the speech encoding means 14 enters a DTX processing mode and encodes the background
noise intermittently, for example in the manner described in 3G TS 26.092 "AMR Speech Codec;
Comfort noise aspects". In this mode, the background noise generation means 62 outputs the
encoded LSP input from the linear prediction analysis means 56 to the multiplexing means 61, for
example every six frames. The power of the residual signal SA (i) is likewise obtained for each
of the same six frames, encoded, and output to the multiplexing means 61. The multiplexing
means 61 multiplexes the encoded data and outputs it to the modulation means 16, and the
modulation means 16 modulates the encoded data intermittently (every six frames) and outputs the
modulated data by radio. The modulation means 16 always modulates the flag information DTX and
outputs it by radio. This reduces the power consumption of the modulation means 16.
[0057]
The radio signal modulated by the modulation means 16 is input to the far-end demodulation
means 20 via the communication path, demodulated there into encoded data and flag information
DTX, and input to the speech decoding means 18. At this time, the encoded data is input to the
speech decoding means 18 intermittently. When the input flag information DTX is "0", the speech
decoding means 18 enters the DTX processing mode and decodes a signal corresponding to the
background noise from the received LSP and power, for example by the method shown in 3G TS
26.092 "AMR Speech Codec; Comfort noise aspects".
[0058]
Also, when the flag information DTX is "0" and the echo determination result indicates an echo
residual frame, the speech encoding means 14 enters a DTX processing mode accompanied by
background noise generation. In this DTX processing mode, as shown in FIG. 6, the background
noise generating means 62 uses random numbers to select a plurality of sets from the sets of LSP
and power obtained and stored in advance in silent intervals, averages the LSPs and the powers
respectively in the averaging means 63, encodes the result in the encoding means 64, and outputs
it to the multiplexing means 61 every six frames. At this time, for example, when N sets of LSP
and power are stored in total, M sets, where M is smaller than N, are randomly selected.
The multiplexing means 61 multiplexes the encoded data and outputs the multiplexed data to the
modulation means 16, and the modulation means 16 modulates the encoded data intermittently
(every six frames) and outputs the modulated data by radio.
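The selection and averaging step performed by the background noise generating means 62 and the
averaging means 63 can be sketched as follows. This is only an illustrative sketch: the function
name, the data layout (a list of (LSP vector, power) pairs) and the use of a plain arithmetic mean
are assumptions, since the description does not specify them.

```python
import random

def average_random_subset(stored, m, rng=None):
    """Randomly pick m of the stored (lsp, power) sets and average them.

    stored: list of (lsp_vector, power) pairs saved during silent frames
            (hypothetical layout, not taken from the patent text).
    m:      number of sets to select; must be smaller than len(stored),
            matching "a number M smaller than N".
    """
    rng = rng or random.Random()
    if not 0 < m < len(stored):
        raise ValueError("m must satisfy 0 < m < N")
    chosen = rng.sample(stored, m)          # selection without repetition
    dims = len(chosen[0][0])
    # Average each LSP dimension and the power separately.
    avg_lsp = [sum(lsp[d] for lsp, _ in chosen) / m for d in range(dims)]
    avg_power = sum(power for _, power in chosen) / m
    return avg_lsp, avg_power
```

In the DTX mode described above, such a function would be called once per six-frame update, and
its result passed on for encoding.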
[0059]
Regardless of the value of the flag information DTX and the echo determination result, whenever
the voice presence/absence determination result is silence, the background noise generation
means 62 stores the LSPs input from the linear prediction analysis means 56 for the most recent
several frames. Likewise, it obtains and stores the power of the reception signal SA(i) in the
same analysis frames for which the LSPs are stored.
[0060]
The speech decoding means 18, to which the encoded data is input intermittently via the
communication path and the demodulation means 20, enters the DTX processing mode because
the input flag information DTX is "0", and decodes a signal corresponding to the background
noise using the received LSP and power.
At this time, since the LSP and power were randomly selected and averaged from those obtained
in advance in analysis frames of silent sections, the decoded signal produced by the speech
decoding means 18 on the far end side has the spectral characteristics of background noise that
changes moderately, and is of good quality.
[0061]
As described above, according to the sixth embodiment, when the echo determination result is an
echo residual frame, the speech encoding means 14 uses the DTX control method generally used
for speech coding in wireless communication, and generates the encoded data corresponding to
the background noise by randomly selecting and sequentially outputting the LSP and power of the
received signal obtained in advance in silent intervals. Consequently, high-quality pseudo
background noise can be generated in a section where only an echo exists in the transmission
signal, with a simple configuration and method and without providing new means for generating
pseudo background noise.
[0062]
Embodiment 7
The echo processing apparatus according to the sixth embodiment randomly selects and averages,
every six frames in an echo residual frame, LSPs and powers obtained in advance in analysis
frames of silent sections. Alternatively, new LSPs may be generated every six frames by using
random numbers to slightly vary the value of each dimension of the LSP obtained in the first
analysis frame determined to be an echo residual frame.
[0063]
As described above, according to the seventh embodiment, the intermittently transmitted LSPs
change only slightly, so that pseudo background noise without large fluctuation in quality can be
generated in a section where only an echo exists in the transmission signal.
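The seventh embodiment's approach of varying each LSP dimension slightly with random numbers
can be sketched as follows. The function name, the uniform distribution and the `max_delta`
bound are assumptions made for illustration; the description does not state the size or
distribution of the variation. The final sort is likewise an added safeguard, reflecting the fact
that LSP vectors are normally kept in ascending order, and is not something the text specifies.

```python
import random

def perturb_lsp(base_lsp, max_delta=0.01, rng=None):
    """Return a new LSP vector with each dimension shifted by a small random amount.

    base_lsp:  LSP vector from the first frame judged to be an echo residual frame.
    max_delta: hypothetical bound on the per-dimension shift (an assumption).
    """
    rng = rng or random.Random()
    shifted = [v + rng.uniform(-max_delta, max_delta) for v in base_lsp]
    # Assumption: keep the vector in ascending order after the perturbation.
    return sorted(shifted)
```

A caller would invoke this once per six-frame update with the same `base_lsp`, so that each
transmitted LSP stays close to the original.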
[0064]
As described above, according to the present invention, there are provided echo determination
means for determining that only the near-end background noise component and an echo signal,
produced when the received signal passes through the echo path, are present in the transmission
signal, and speech coding means for encoding the transmission signal and, in an analysis frame in
which the echo determination means determines that only the near-end background noise
component and the echo signal are present in the transmission signal, generating and outputting
coded data corresponding to background noise using the analysis parameters for encoding
obtained in analysis frames determined in advance to be noise sections. Consequently, there is an
effect that high-quality pseudo background noise can be generated in a section where only an
echo exists in the transmission signal, with a simple configuration and without providing new
means for generating pseudo background noise.
[0065]
According to the present invention, the speech coding system is the CELP system, and speech
coding means is provided in which the coded data corresponding to the background noise consists
of spectral parameters, an adaptive excitation code, a driving excitation code, and the gains of
these two excitation codes. Consequently, there is an effect that the invention can easily be
applied to the CELP system, which is widely used as a standard speech codec system for mobile
phones and the like.
[0066]
According to the present invention, speech coding means is provided which, in an analysis frame
in which the echo determination means determines that only the near-end background noise
component and the echo signal exist in the transmission signal, randomly selects and sequentially
outputs coded spectral parameters obtained in a plurality of analysis frames determined in
advance to be noise sections. Consequently, there is an effect that high-quality pseudo
background noise, which has the spectral features of background noise, has no periodicity, and
has the character of noise, can be generated in a section where only an echo is present in the
transmission signal.
[0067]
According to the present invention, speech coding means is provided which, in an analysis frame
in which the echo determination means determines that only the near-end background noise
component and the echo signal exist in the transmission signal, randomly selects spectral
parameters obtained in a plurality of analysis frames determined in advance to be noise sections,
encodes them, and outputs them sequentially. Consequently, there is an effect that it is possible
to cope with a case in which a scheme is used that encodes spectral parameters using their
relationship with spectral parameters obtained in different analysis frames or with other
parameters.
[0068]
According to the present invention, speech coding means is provided which, in an analysis frame
in which the echo determination means determines that only the near-end background noise
component and the echo signal are present in the transmission signal, averages the spectral
parameters of a plurality of analysis frames determined in advance to be noise sections, slightly
varies the value of each dimension of the averaged spectral parameter, encodes the result, and
outputs it sequentially. Consequently, there is an effect that good-quality background noise
without large fluctuation can be generated.
[0069]
According to the present invention, speech encoding means is provided which selects and
sequentially outputs the adaptive excitation code, the driving excitation code, and the coded
gains corresponding to both excitation codes that were obtained in the same analysis frame as the
randomly selected spectral parameter. Consequently, there is an effect that good-quality pseudo
background noise having the spectral features and power of the actual background noise can be
generated.
[0070]
According to the present invention, speech coding means is provided which, in an analysis frame
in which the echo determination means determines that only the near-end background noise
component and the echo signal are present in the transmission signal, randomly selects and
sequentially outputs sets of adaptive excitation codes and the coded gains corresponding to
them, obtained in a plurality of analysis frames determined in advance to be noise sections, and
also randomly selects and sequentially outputs sets of driving excitation codes and the coded
gains corresponding to them, obtained in a plurality of analysis frames determined in advance to
be noise sections. Consequently, there is an effect that high-quality pseudo background noise
that is random and free of biased characteristics can be generated.
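The independent random selection of the adaptive and driving excitation sets described in this
paragraph can be sketched as follows. The function name and the representation of each stored
set as a (code, coded gain) pair are assumptions for illustration only.

```python
import random

def pick_excitation(adaptive_sets, driving_sets, rng=None):
    """Pick one adaptive set and one driving set independently at random.

    adaptive_sets: list of (adaptive_code, coded_gain) pairs from noise frames.
    driving_sets:  list of (driving_code, coded_gain) pairs from noise frames.
    Both layouts are hypothetical; the two draws are independent, so the
    selected pairs may come from different analysis frames.
    """
    rng = rng or random.Random()
    adaptive_code, adaptive_gain = rng.choice(adaptive_sets)
    driving_code, driving_gain = rng.choice(driving_sets)
    return adaptive_code, adaptive_gain, driving_code, driving_gain
```

Drawing the two sets separately is what distinguishes this variant from the one in which all
parameters are taken from the same analysis frame as the selected spectral parameter.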
[0071]
According to the present invention, the gain of the adaptive excitation code and the gain of the
driving excitation code are obtained and held as values before encoding, and speech encoding
means is provided which encodes and outputs them each time they are selected. Consequently,
there is an effect that it is possible to cope with a case in which a gain is encoded using its
relationship with gains obtained in different analysis frames or with other parameters.
[0072]
According to the present invention, speech coding means is provided which corrects the gains of
the selected adaptive excitation code and driving excitation code based on the average power of
the target signal obtained in advance over the plurality of analysis frames determined to be noise
sections, and then encodes and outputs them. Consequently, there is an effect that stable pseudo
background noise without large power fluctuation can be generated.
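One possible form of the gain correction is sketched below. The description does not give the
correction formula, so the rule used here is an assumption: both gains are scaled by the square
root of the ratio between the average target-signal power and the target-signal power of the
frame from which the gains were taken, since signal power scales with the square of the gain.
All names are hypothetical.

```python
import math

def correct_gains(adaptive_gain, driving_gain, frame_power, average_power):
    """Scale both gains so the frame's power matches the average noise power.

    frame_power:   target-signal power of the frame the gains came from.
    average_power: average target-signal power over the stored noise frames.
    The square-root scaling rule is an assumption, not taken from the patent.
    """
    if frame_power <= 0.0:
        # Nothing meaningful to scale against; leave the gains unchanged.
        return adaptive_gain, driving_gain
    scale = math.sqrt(average_power / frame_power)
    return adaptive_gain * scale, driving_gain * scale
```

For instance, gains taken from a frame whose power is four times the average would be halved
before encoding under this assumed rule.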
[0073]
According to the present invention, speech coding means is provided which, when obtaining the
gain of the driving excitation code in an analysis frame determined in advance to be a noise
section, obtains the gain of the driving excitation code after replacing the gain of the adaptive
excitation code with zero or a value close to zero. Consequently, there is an effect that
good-quality pseudo background noise can be generated regardless of the contents of the
adaptive excitation codebook.
[0074]
According to the present invention, in an analysis frame in which the echo determination means
determines that only the near-end background noise component and the echo signal exist in the
transmission signal, the silence flag in DTX is output and the speech encoding means is operated
internally in the DTX silent-interval processing mode. Consequently, the generally used DTX
control method is utilized, and there is an effect that high-quality pseudo background noise can
be generated with a simple configuration and method, without providing new means for
generating pseudo background noise.
[0075]
According to the present invention, speech encoding means is provided which randomly selects a
plurality of spectral parameters obtained in advance in the plurality of analysis frames
determined to be noise sections, averages and encodes them, and outputs them sequentially.
Consequently, there is an effect that good-quality pseudo background noise having the spectral
characteristics of background noise that changes moderately can be generated.
[0076]
According to the present invention, the encoding means is provided with background noise
generating means which, based on the determination result of the echo determination means,
outputs as transmission data, when the transmission signal contains a large amount of echo
components, encoded data corresponding to background noise stored in advance instead of the
encoded data of the transmission signal. Consequently, there is an effect that, even when the
transmission signal contains a large amount of echo components, encoded data corresponding to
background noise can be output as transmission data.