DESCRIPTION JP2011023959

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
[Purpose] To provide a speech intelligibility improvement system and a speech intelligibility improvement method that perform speech intelligibility improvement control taking into account noise mixed into the call voice sent from the other party.
[Composition] (1) A first gain G0 for loudness compensation is determined from the power of the synthetic noise, obtained by combining the receiving-side noise and the transmitting-side noise, and from the power of the call voice. (2) The gain at which the noise increase caused by applying a given gain to the transmitting-side noise balances the loudness compensation provided by that gain is determined as a second gain GH. (3) A third gain GL is determined as the gain G that minimizes the sum of the error powers between, on the one hand, the call voice and the synthetic noise at the receiving-side listening position when a given gain G is applied to both the call voice and the transmitting-side noise, and, on the other hand, the call voice and the synthetic noise at that position when the first gain G0 is applied to the call voice only. The optimum gain is then calculated from the gains GH and GL according to the power ratio (S/N ratio) of the call voice to the transmitting-side noise. [Selected figure] FIG. 1
Speech intelligibility improvement system and speech intelligibility improvement method
[0001]
The present invention relates to a speech intelligibility improvement system and a speech intelligibility improvement method, and more particularly to a system and method that control the gain of call voice based on the power of the call voice and of noise.
[0002]
10-04-2019
1
Speech intelligibility improvement systems have been commercialized that adjust the volume of voice output (navigation guidance, read-aloud news and e-mail, etc.) according to the ambient noise in order to keep that voice clear. The principle is to control the gain of the audio signal based on the noise level and the level of the sound output from the speaker, so that a listener perceives the speaker output at a constant loudness regardless of the noise level.
[0003]
FIG. 12 is a block diagram of a previously proposed speech intelligibility improvement system that adjusts the volume of navigation voice (see Patent Document 1). Its main components are a power estimation unit 50 and a loudness compensation control unit 70. In normal operation, the guidance voice generation unit 51 of the navigation device generates a guidance voice signal when the vehicle approaches an intersection. The audio unit 52 applies tone control, volume control and the like to the guidance voice signal, then amplifies and outputs it. The gain adjustment unit (volume correction unit) 53 multiplies the audio signal output from the audio unit 52 by the gain G determined by the loudness compensation control unit (gain control unit) 70, described later, to correct the volume, and feeds the result to the speaker 54. The speaker 54 converts the input voice signal to sound and outputs the guidance voice into the vehicle cabin. The microphone 55 picks up the combined sound of the guidance voice A and the ambient noise N (engine sound, road noise, etc.) and feeds it through the auditory correction filter 56 to the power calculation unit 57. The power calculation unit 57 squares the amplitude of the microphone signal to obtain its power and passes the power to the switching unit 58. In sections where no guidance voice is output, that is, when the power of the voice signal (voice power) is below a set value, the switching unit 58 routes the power calculated by the power calculation unit 57 to fixed contact A, where it is treated as noise power. In sections where the guidance voice is output, that is, when the voice power exceeds the set value, the switching unit 58 routes the power to contact B, which is connected to no unit. In sections without guidance voice, the noise power averaging unit 59 regards the power output from the power calculation unit 57 as noise power and stores the moving average of the latest M (for example, 256) power values in the power storage unit 60 as the noise power. Consequently, when the guidance voice is output, the power storage unit 60 holds the latest noise power from the immediately preceding section without guidance voice. That is, the noise power during guidance voice output is taken to be the noise power stored in the power storage unit 60, and this stored value is input to the loudness compensation control unit 70.
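The noise path just described is a gated moving average with a hold. The sketch below assumes per-sample power (the square of each amplitude), an input that has already passed through the auditory correction filter 56, an initial stored value of zero, and a voice/no-voice decision supplied from outside as a boolean. The class and method names are hypothetical.

```python
from collections import deque


class GatedNoiseEstimator:
    """Hypothetical sketch of the noise path of FIG. 12 (units 57 to 60)."""

    def __init__(self, window_len: int = 256):
        # Latest M microphone powers (M = 256 in the example).
        self.window = deque(maxlen=window_len)
        # Plays the role of power storage unit 60; starting at 0.0 is an assumption.
        self.stored_noise_power = 0.0

    def update(self, mic_sample: float, voice_active: bool) -> float:
        # Power calculation unit 57: square of the (filtered) amplitude.
        power = mic_sample * mic_sample
        if not voice_active:
            # Switching unit 58 on contact A: noise power averaging unit 59
            # keeps a moving average of the latest M powers.
            self.window.append(power)
            self.stored_noise_power = sum(self.window) / len(self.window)
        # Contact B (guidance voice playing): the power goes nowhere and the
        # value from the preceding noise-only section is held.
        return self.stored_noise_power
```

Calling `update(x, voice_active=True)` leaves the stored value unchanged. This corresponds to contact B of the switching unit 58 being connected to nothing.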
[0004]
In parallel with the above, the audio signal output from the audio unit 52 is fed through the auditory correction filter 61 to the voice power calculation unit 62. The voice power calculation unit 62 squares the amplitude of the input voice signal to obtain the voice power and passes it to the determination unit 63 and the voice power averaging unit 64. The determination unit 63 compares the input voice power with a set level. When the voice power is below the set level, it judges the current section to be one without guidance voice. When the voice power is above the set level, it judges the section to be one with guidance voice. In a section without guidance voice, the determination unit 63 controls the switching unit 58 so that the power calculated by the power calculation unit 57 is fed to the noise power averaging unit 59. In a section with guidance voice, the power is fed to no unit. The voice power averaging unit 64 calculates the average of L (for example, 1024) voice powers output from the voice power calculation unit 62 and passes it to the variable gain unit 65. The variable gain unit 65 multiplies the average voice power by its preset gain H and feeds the result to the loudness compensation control unit 70. The gain H rests on the assumption that the propagation characteristic from the input terminal of the speaker 54 to the output terminal of the microphone 55 can be approximated by the single gain H. It is identified in advance by a characteristic identification unit (not shown) and set in the variable gain unit 65.
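The voice path can be sketched in the same way, as a threshold decision plus a moving average scaled by H. This sketch assumes per-sample power, input already passed through the auditory correction filter 61, and H treated as a single power gain identified beforehand. The names are hypothetical.

```python
from collections import deque


class VoicePowerPath:
    """Hypothetical sketch of the voice path of FIG. 12 (units 62 to 65)."""

    def __init__(self, set_level: float, h: float, avg_len: int = 1024):
        self.set_level = set_level             # threshold of determination unit 63
        self.h = h                             # pre-identified gain H (power domain)
        self.window = deque(maxlen=avg_len)    # latest L voice powers (L = 1024)

    def update(self, voice_sample: float):
        """Return (voice_active, H x average voice power) for one sample."""
        power = voice_sample * voice_sample        # voice power calculation unit 62
        voice_active = power > self.set_level      # determination unit 63
        self.window.append(power)                  # voice power averaging unit 64
        avg_power = sum(self.window) / len(self.window)
        return voice_active, self.h * avg_power    # variable gain unit 65
```

The `voice_active` flag is the decision that the determination unit 63 uses to operate the switching unit 58. The second value is what reaches the loudness compensation control unit 70.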
[0005]
In sections with guidance voice, the loudness compensation control unit 70 determines a gain G that keeps the guidance voice clearly audible regardless of the noise level. It bases this gain on human loudness characteristics, using the voice power input from the variable gain unit 65 and the noise power input from the power storage unit 60, and inputs it to the gain adjustment unit 53. The gain adjustment unit 53 multiplies the guidance voice signal by the received gain G and outputs the result. The loudness compensation control unit 70 does not determine the gain G in sections without guidance voice. With the speech intelligibility improvement system of FIG. 12, the noise power can be calculated and stored in sections without guidance voice, and the stored value can be used as the noise power during voice output. There is no need to compute the noise power by subtracting the voice power from the power of the microphone signal (the power of the combined noise and voice), nor to use an adaptive filter. The configuration is therefore simple and can be realized even with devices other than DSPs.
[0006]
Recently, it has been proposed to apply the above speech intelligibility improvement system to an in-vehicle hands-free telephone (HFT). FIG. 13 is a block diagram of a speech intelligibility improvement system that adjusts the volume of hands-free call voice. It includes a speaker 81 that outputs the other party's call voice, a microphone 82 that picks up the driver's voice, a communication network 83 that exchanges voice signals with the communication destination, and a speech intelligibility improvement unit 91. The speech intelligibility improvement unit 91 is placed between the communication network 83 and the speaker 81 and microphone 82, and has the same configuration as that shown in FIG. 12. That is, the speech intelligibility improvement unit 91 includes a power estimation unit 50, a loudness compensation control unit 70, and a gain adjustment unit 53 that controls the gain of the call voice. As in FIG. 12, it adjusts the volume based on the powers of the call voice and the noise. Reference numerals 84 and 85 denote the speaker and microphone at the communication destination, and H denotes the propagation characteristic from the speaker 81 to the microphone 82. Conventionally, the input voice signal is assumed to have a high S/N ratio. That is, the microphone 85 at the communication destination is assumed to pick up only the other party's voice and no noise, so that N1 = 0. Under this assumption, the gain G is controlled so that the perceived loudness of the sound HS heard at the listening position (the position of the microphone 82) is the same with the receiving-side noise N2 present as in silence. In other words, a correction gain corresponding to the receiving-side noise N2 is calculated, and the received signal S + N1 is multiplied by that gain and output from the speaker.
[0007]
However, in an HFT call, noise may be mixed into the other party's transmitted voice (N1 ≠ 0), depending on the talker's surroundings. In that case the voice HS should be raised by an appropriate amount relative to the noise HN1 + N2. The conventional system, however, controls the volume according to the noise N2 only, and the gain adjustment unit 53 amplifies not only the transmitted voice S but also the transmitting-side noise N1. The conventional system therefore does not make the transmitted voice S more audible against the noise HN1 + N2. Because N1 is also amplified, the noise changes from HN1 + N2 to GHN1 + N2, and the loudness compensation effect is reduced. A speech intelligibility improvement system that also cancels echo in hands-free telephones has been proposed as prior art (see Patent Document 2). It likewise presupposes a high-S/N input signal, that is, N1 = 0, so it too cannot fully achieve the loudness compensation effect.
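The sketch below shows why amplifying S + N1 undercuts loudness compensation. It compares the voice-to-total-noise ratio at the listening position before and after a gain. It uses the levels of the FIG. 4B example (S = 70 dB, N1 = 60 dB, N2 = 50 dB) and the 3.6 dB gain reached there. H is assumed to be 0 dB, and the function names are hypothetical.

```python
import math


def power_sum_db(*levels_db: float) -> float:
    """Combine levels in dB by adding their linear powers (-inf dB means absent)."""
    return 10.0 * math.log10(sum(10.0 ** (lv / 10.0) for lv in levels_db))


def listening_snr_db(s_db: float, n1_db: float, n2_db: float, g_db: float) -> float:
    """Voice-to-total-noise ratio at the listening position, in dB.

    The conventional system applies gain G to the whole received signal
    S + N1, so the voice and the transmitting-side noise rise together.
    H is taken as 0 dB here for simplicity (an illustrative assumption).
    """
    voice = s_db + g_db                          # G*H*S
    noise = power_sum_db(n1_db + g_db, n2_db)    # G*H*N1 + N2
    return voice - noise


NO_N1 = float("-inf")  # N1 = 0 expressed in dB
```

With N1 = 60 dB, a 3.6 dB gain improves the ratio by only about 0.2 dB, because the transmitting-side noise rises with the voice. With N1 absent (`NO_N1`), the full 3.6 dB goes into the ratio.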
[0008]
Patent Document 1: Japanese Patent Application No. 2008-002144
Patent Document 2: Japanese Patent Application Laid-Open No. 2003-264627
[0009]
In view of the above, an object of the present invention is to perform speech intelligibility improvement control that takes into account noise mixed into the call voice sent from the other party. Another object is to achieve the intended loudness compensation effect even when noise is mixed into the call voice sent from the other party. Another object is to determine an appropriate gain based on the power ratio (S/N ratio) of the call voice to the noise sent from the other party. Yet another object is to determine an appropriate gain based on the receiving-side noise and the call voice power when the call voice and the noise sent from the other party have equal power.
[0010]
The present invention provides a speech intelligibility improvement system that controls the gain of call voice based on the power of the call voice and of noise. It also provides a speech intelligibility improvement method that improves the clarity of call voice by controlling its gain in the same way.
· Speech intelligibility improvement system
The speech intelligibility improvement system according to the present invention comprises the following units.
(1) A power estimation unit estimates the power of the call voice, the transmitting-side noise and the receiving-side noise.
(2) A gain determination unit determines three gains. The first gain G0, for loudness compensation, is based on the power of the synthetic noise obtained by combining the receiving-side and transmitting-side noise and on the power of the call voice. The second gain GH is the gain at which the noise increase caused by multiplying the transmitting-side noise by a given gain balances the loudness compensation provided by that gain. The third gain GL is the gain G that minimizes the sum of the error powers between two states. In the first state, a given gain G is applied to both the call voice and the transmitting-side noise, and the call voice and synthetic noise are taken at the receiving-side listening position. In the second state, the first gain G0 is applied to the call voice only, and the call voice and synthetic noise are taken at the same position.
(3) An optimum gain calculation unit calculates an optimum gain from the second and third gains based on the power ratio of the call voice to the transmitting-side noise.
The system further includes a loudness compensation unit that determines a gain for loudness compensation based on the power of the call voice and of noise. The gain determination unit inputs the power of the synthetic noise and the power of the call voice to the loudness compensation unit. The loudness compensation unit calculates the loudness compensation gain from these powers and returns it to the gain determination unit.
The gain determination unit comprises three sub-units. A first gain determination unit determines the first gain, which compensates for loudness in the ideal state, based on the power of the synthetic noise obtained by combining the receiving-side and transmitting-side noise and on the power of the call voice. A second gain determination unit determines as the second gain the gain at which the noise increase caused by multiplying the transmitting-side noise by a given gain equals the loudness compensation provided by that gain. A third gain determination unit determines as the third gain the gain that minimizes the sum of the error powers between two states. In the first state, the call voice and the transmitting-side noise are each multiplied by a given gain, and the synthetic noise and call voice are taken at the receiving-side listening position. The second state is the ideal state, with the synthetic noise and call voice taken at the same position.
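The third-gain criterion can be written as a weighted least-squares problem. The derivation below assumes that G and G0 are linear amplitude factors and that the call voice and noise components are uncorrelated. It also assumes the two error terms are weighted by factors w1 and w2, the weighting factors mentioned for equation (1) in paragraph [0021]. The derivation illustrates the criterion and is not claimed to reproduce equation (1). The errors follow the ideal state (voice G0HS, noise HN1 + N2) and the actual state (voice GHS, noise GHN1 + N2) described in paragraph [0021].

```latex
\begin{aligned}
e_1 &= G\,HS - G_0\,HS = (G - G_0)\,HS, \\
e_2 &= (G\,HN_1 + N_2) - (HN_1 + N_2) = (G - 1)\,HN_1, \\
J(G) &= w_1\,E[e_1^2] + w_2\,E[e_2^2]
      = w_1 (G - G_0)^2 P_{HS} + w_2 (G - 1)^2 P_{HN1}, \\
\frac{dJ}{dG} &= 2 w_1 (G - G_0) P_{HS} + 2 w_2 (G - 1) P_{HN1} = 0 \\
\Rightarrow\; G_L &= \frac{w_1 G_0 P_{HS} + w_2 P_{HN1}}{w_1 P_{HS} + w_2 P_{HN1}}.
\end{aligned}
```

Since J is a convex quadratic in G, this stationary point is its minimum. Under these assumptions GL is a weighted average of G0 and 1 (0 dB). It approaches G0 when the transmitting-side noise power PHN1 is negligible, and approaches 0 dB when PHN1 dominates. This is consistent with using GL when the S/N ratio is low.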
In the speech intelligibility improvement system according to the present invention, when the power ratio is larger than a first set value, the loudness compensation unit obtains a fourth gain. The fourth gain is based on the call voice power and on the noise computed with the transmitting-side noise regarded as zero. The optimum gain calculation unit selects the optimum gain as follows:
(1) when the power ratio is larger than the first set value, it uses the fourth gain;
(2) when the power ratio is smaller than the first set value and larger than a second set value, it uses the average of the second and third gains;
(3) when the power ratio is approximately 1, it determines the optimum gain based on the powers of the call voice and the receiving-side noise;
(4) when the power ratio is smaller than a third set value that is smaller than 1, it uses the third gain.
In case (3), the optimum gain calculation unit determines the optimum gain from the second and third gains so that (a) the gain converges to 0 dB as the call voice power increases, and (b) for the same call voice power, the gain rises above 0 dB as the receiving-side noise power increases.
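The four-way selection of the optimum gain can be sketched as below. The defaults for t1_db, t2_db and t3_db are hypothetical placeholders for the first, second and third set values, which the text does not quantify. The power ratio is expressed in dB, so "approximately 1" becomes a band around 0 dB. The gains are averaged in dB, which is also an assumption. The near-unity gain is passed in because its exact rule is not given here.

```python
def optimum_gain_db(
    sn_ratio_db: float,
    g4_db: float,
    gh_db: float,
    gl_db: float,
    g_near_unity_db: float,
    t1_db: float = 20.0,   # hypothetical first set value
    t2_db: float = 3.0,    # hypothetical second set value
    t3_db: float = -3.0,   # hypothetical third set value (ratio < 1)
) -> float:
    """Select Gopt from the S/N1 power ratio expressed in dB.

    Assumes t1_db > t2_db > 0 > t3_db; a ratio "approximately equal to 1"
    is taken to be the band t3_db <= sn_ratio_db <= t2_db.
    """
    if sn_ratio_db > t1_db:
        return g4_db                    # (1) N1 negligible: fourth gain
    if sn_ratio_db > t2_db:
        return (gh_db + gl_db) / 2.0    # (2) average of GH and GL (in dB)
    if sn_ratio_db >= t3_db:
        return g_near_unity_db          # (3) ratio near 1: gain supplied by caller
    return gl_db                        # (4) N1 dominates: third gain
```

Under these placeholders, a call with S/N1 = 10 dB lands in case (2) and returns the mean of GH and GL.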
[0011]
· Speech intelligibility improvement method
The speech intelligibility improvement method of the present invention comprises the following steps:
a step of estimating the power of the call voice, the transmitting-side noise and the receiving-side noise;
a step of determining a first gain G0 for loudness compensation based on the power of the synthetic noise obtained by combining the receiving-side and transmitting-side noise and on the power of the call voice;
a step of determining as a second gain GH the gain at which the noise increase caused by multiplying the transmitting-side noise by a given gain balances the loudness compensation provided by that gain;
a step of determining as a third gain GL the gain G that minimizes the sum of the error powers between two states: with a given gain G applied to both the call voice and the transmitting-side noise, and with the first gain G0 applied to the call voice only, in each case taking the call voice and synthetic noise at the receiving-side listening position;
a step of calculating an optimum gain from the second and third gains based on the power ratio of the call voice to the transmitting-side noise.
The optimum gain calculation step includes:
(1) when the power ratio is larger than the first set value, determining a fourth gain based on the call voice power and the noise with the transmitting-side noise regarded as zero, and using the fourth gain as the optimum gain;
(2) when the power ratio is smaller than the first set value and larger than the second set value, using the average of the second and third gains as the optimum gain;
(3) when the power ratio is approximately 1, determining the optimum gain based on the powers of the call voice and the receiving-side noise;
(4) when the power ratio is smaller than the third set value, using the third gain as the optimum gain.
When the power ratio is approximately 1, the optimum gain is determined from the second and third gains so that (a) the gain for the call voice converges to 0 dB as the call voice power increases, and (b) for the same call voice power, the gain rises above 0 dB as the receiving-side noise power increases.
[0012]
According to the present invention, four determinations are made:
(1) the first gain for loudness compensation is determined from the power of the synthetic noise obtained by combining the receiving-side and transmitting-side noise and from the power of the call voice;
(2) the gain at which the noise increase caused by applying a given gain to the transmitting-side noise balances the loudness compensation provided by that gain is determined as the second gain;
(3) the gain G that minimizes the sum of the error powers between the call voice and synthetic noise at the receiving-side listening position when a given gain G is applied to both the call voice and the transmitting-side noise, and those at that position when the first gain G0 is applied to the call voice only, is determined as the third gain GL;
(4) the optimum gain is calculated from the second and third gains based on the power ratio of the call voice to the transmitting-side noise.
Speech intelligibility improvement control can therefore take into account noise mixed into the call voice sent from the other party. The intended loudness compensation effect can be fully achieved even when such noise is present. Specifically, the invention determines a second gain GH that achieves the perceived loudness of the voice and a third gain GL that minimizes the deviation from the ideal state. The optimum gain Gopt is then obtained according to the S/N ratio of the call voice S to the transmitting-side noise N1. In this way, the invention controls the level of the call voice S in consideration of the total noise level that the listener ultimately hears. As a result, even for HFT voice in which noise N1 is mixed into the transmitted signal, the system improves speech intelligibility more appropriately than conventional systems, and the other party's voice can be heard at a more suitable volume. Further, according to the present invention, an appropriate optimum gain can be determined from the second gain GH and the third gain GL based on the power ratio (S/N ratio) of the call voice to the noise sent from the other party. Further, when the power ratio is approximately 1, the second and third gains can be used to determine an appropriate gain. This gain converges to 0 dB as the call voice power increases and, for the same call voice power, rises above 0 dB as the receiving-side noise power increases.
[0013]
FIG. 1 is a block diagram of a speech intelligibility improvement system according to the present invention applied to an in-vehicle hands-free telephone (HFT).
FIG. 2 is a block diagram of the optimum gain determination unit.
FIG. 3 is a flowchart of the optimum gain determination process of the present invention.
FIG. 4 is an explanatory diagram of the process of determining the gain GH.
FIG. 5 is a table for determining the optimum gain based on the power ratio (S/N ratio) of the call voice to the transmitting-side noise.
FIG. 6 is a detailed flowchart of step 122.
FIG. 7 is an explanatory diagram of the optimum gain determination table.
FIG. 8 is an explanatory diagram of optimum gain determination for the specific case PS = 70 dB, PN1 = 60 dB, PN2 = 50 dB.
FIG. 9 is an explanatory diagram of the power distribution of the voice.
FIG. 10 is an explanatory diagram of the power distribution of the noise.
FIG. 11 is an explanatory diagram of the power distribution of the voice and the noise.
FIG. 12 is a block diagram of a previously proposed speech intelligibility improvement system that adjusts the volume of navigation voice.
FIG. 13 is a block diagram of a conventional speech intelligibility improvement system that adjusts the volume of hands-free call voice.
[0014]
(A) Speech intelligibility improvement system
FIG. 1 is a block diagram of a speech intelligibility improvement system according to the present invention applied to an in-vehicle hands-free telephone (HFT). The speech intelligibility improvement unit 10 is arranged between the telephone line network 21 and the speaker 22 and microphone 23. The telephone line network 21 establishes a communication link with the other party. It feeds the voice signal from the other party to the speaker 22 via the gain adjustment unit of the speech intelligibility improvement unit 10, and sends the driver's voice signal input from the microphone 23 to the other party. The speaker 22 outputs the other party's call voice input from the telephone line network 21, and the microphone 23 picks up the driver's voice and inputs it to the telephone line network 21. The transfer characteristic from the speaker 22 to the microphone 23 is H. Besides the speaker 22 and the microphone 23, the HFT terminal apparatus includes various circuit units such as a communication interface and audio circuits; these are omitted from FIG. 1. The other party's HFT terminal apparatus is likewise provided with a speaker 31 and a microphone 32.
[0015]
In the speech intelligibility improvement unit 10, the power estimation unit 11 estimates the powers PHS and PHN1 at the receiving-side listening position (microphone position). These are the powers of the call voice S transmitted from the other party and of the transmitting-side noise N1. The unit also estimates and outputs the power PN2 of the receiving-side noise N2 at the listening position. The powers can be calculated, for example, by a previously proposed method described later. To obtain PHN1, the power PN1 of the transmitting-side noise N1 is calculated first. The power PS of the call voice S is then calculated by subtracting PN1 from the power of the received signal S + N1. When the level of S + N1 is small, the signal from the receiving-side microphone can be regarded as the receiving-side noise N2, so PN2 can be obtained independently by computing its power. Multiplying the power PN1 of the transmitting-side noise N1 by the propagation characteristic H gives the power PHN1 of the transmitting-side noise at the listening position. Multiplying the power PS of the call voice S by H gives the power PHS of the call voice at the listening position. The optimum gain determination unit 12 inputs the call voice power PHS and the power PN of the combined noise of N1 and N2 to the loudness compensation unit 13 as needed. It then determines the optimum gain Gopt according to the processing flow of FIG. 3 and sets it in the gain adjustment unit 14. The loudness compensation unit 13 determines a gain from the call voice power PHS and the synthetic noise power PN. This gain produces the compensation amount (loudness compensation amount) needed to give the voice its perceived loudness, and the unit inputs it to the optimum gain determination unit 12. The loudness compensation unit 13 has a gain table GTL and determines the gain by referring to it. The gain adjustment unit 14 applies the optimum gain to the voice signal input from the telephone line network 21 and inputs the result to the speaker 22.
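The arithmetic of the power estimation unit 11 can be sketched in the linear power domain. Subtracting powers there is meaningful only if S and N1 are uncorrelated, which is an assumption. How PN1 itself is estimated is left to the method described later, so it is an input here. The function name, the clamp at zero and the treatment of H as a single power gain are assumptions of this sketch.

```python
def estimate_listening_powers(p_received: float, p_n1: float,
                              p_mic_quiet: float, h: float):
    """Hypothetical sketch of power estimation unit 11 (linear powers, not dB).

    p_received  -- power of the received signal S + N1
    p_n1        -- estimated power PN1 of the transmitting-side noise N1
    p_mic_quiet -- receiving-microphone power while S + N1 is quiet (taken as PN2)
    h           -- propagation characteristic H, treated as a single power gain
    Returns (PHS, PHN1, PN2).
    """
    # PS = P(S + N1) - PN1, valid if S and N1 are uncorrelated; the clamp at
    # zero guards against estimation error and is an assumption of this sketch.
    p_s = max(p_received - p_n1, 0.0)
    p_hs = h * p_s        # PHS: call voice power at the listening position
    p_hn1 = h * p_n1      # PHN1: transmitting-side noise at the listening position
    p_n2 = p_mic_quiet    # PN2: receiving-side noise power
    return p_hs, p_hn1, p_n2
```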
[0016]
FIG. 2 is a block diagram of the optimum gain determination unit 12. It comprises four parts:
an optimum gain calculation control unit 12a that executes the processing of FIG. 3 to determine the optimum gain;
a gain multiplication unit 12b that multiplies the power PHN1 of the transmitting-side noise N1 by the gain G set by the optimum gain calculation control unit 12a;
an adder 12c that calculates and outputs the power PN of the synthetic noise at the receiving-side listening position as G × PHN1 + PN2;
an S/N ratio calculation unit 12d that calculates the power ratio (S/N ratio) of the call voice S to the transmitting-side noise N1 and inputs it to the optimum gain calculation control unit 12a.
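The worked example in paragraph [0020] quotes levels in dB, so the sketch below evaluates units 12b, 12c and 12d from dB inputs. The gain is added to PHN1 in dB (G treated as a power gain). The two noise powers are summed in the linear domain, and the result is converted back to dB. The function names are hypothetical.

```python
import math


def synthetic_noise_db(g_db: float, phn1_db: float, pn2_db: float) -> float:
    """Units 12b and 12c: PN = G x PHN1 + PN2, evaluated from dB inputs."""
    amplified = 10.0 ** ((phn1_db + g_db) / 10.0)   # G x PHN1 (linear)
    receiving = 10.0 ** (pn2_db / 10.0)             # PN2 (linear)
    return 10.0 * math.log10(amplified + receiving)


def sn_ratio_db(phs_db: float, phn1_db: float) -> float:
    """Unit 12d: ratio of call voice S to transmitting-side noise N1, in dB.

    H scales both powers equally, so PHS/PHN1 equals PS/PN1.
    """
    return phs_db - phn1_db
```

For example, `synthetic_noise_db(0.0, 60.0, 50.0)` gives about 60.4 dB, the combined noise level used in paragraph [0020].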
[0017]
(B) Optimum gain determination process
FIG. 3 is a flowchart of the optimum gain determination process of the present invention. The process determines the optimum gain Gopt taking into account not only the receiving-side noise N2 but also the power of the transmitting-side noise N1. It assumes that the powers PHS, PHN1 and PN2 have already been estimated. These are the powers, at the receiving-side listening position, of the call voice S, the transmitting-side noise N1 and the receiving-side noise N2. First, the power of the synthetic noise (N2 + HN1), obtained by combining the receiving-side noise N2 and the transmitting-side noise HN1, is taken as PN. The gain G0 for loudness compensation in the ideal state is then determined from PN and the power PHS of the call voice HS. In the ideal situation considered here, the gain G is not applied to the transmitting-side noise component N1 reaching the receiving-side microphone 23, and G0 is the gain required for loudness compensation in that situation. To this end, the optimum gain calculation control unit 12a of the optimum gain determination unit 12 sets the gain G = 1 (0 dB) in the gain multiplication unit 12b (step 101). The adder 12c calculates the noise PN at the listening position as PN = G·PHN1 + PN2 = PHN1 + PN2 (step 102), and the noise power PN and the call voice power PHS are input to the loudness compensation unit 13. Referring to the gain table GTL, the loudness compensation unit 13 determines from PHS and PN the gain G that produces the compensation amount (loudness compensation amount) needed to give the voice its perceived loudness. It inputs this gain to the optimum gain calculation control unit 12a (step 103). The optimum gain calculation control unit 12a sets the input gain G as the gain G0 for loudness compensation in the ideal state (G0 = G, step 104).
[0018]
Next, the optimum gain determination unit 12 calculates the gain GH. GH is the gain G at which the noise increase ((GHN1 + N2) − (HN1 + N2)) caused by multiplying the transmitting-side noise N1 by G exactly equals the loudness compensation amount required to correct for it. To this end, the optimum gain calculation control unit 12a sets the gain G (= G0) as the initial value of Gold and sets Gold in the gain multiplication unit 12b (step 111). The adder 12c calculates the noise PN at the listening position as PN = Gold·PHN1 + PN2, which is G0·PHN1 + PN2 in the first pass (step 112). The noise power PN and the call voice power PHS are input to the loudness compensation unit 13. Referring to the gain table, the loudness compensation unit 13 determines from PHS and PN the gain G that produces the compensation amount (loudness compensation amount) needed to give the voice its perceived loudness. It inputs this gain to the optimum gain calculation control unit 12a (step 113). The optimum gain calculation control unit 12a compares the input gain G with Gold (step 114). If G > Gold, it sets G as the new Gold and repeats the processing from step 111. When the gain G becomes equal to or less than Gold, it sets GH = G (step 115) and ends the GH determination process. FIG. 4A is an explanatory diagram of the process of determining the gain GH.
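Writing the table lookup of the loudness compensation unit 13 as G = f(PHS, PN), the loop of steps 111 to 115 is a fixed-point iteration on the listening-position noise. The formulation below restates the flowchart, with f standing for the gain table lookup. With a quantized table, the loop stops at the first iterate for which the table no longer returns a larger gain, so GH satisfies the fixed-point relation only approximately.

```latex
\begin{aligned}
G^{(0)} &= G_0 = f\bigl(P_{HS},\; P_{HN1} + P_{N2}\bigr), \\
G^{(k+1)} &= f\bigl(P_{HS},\; G^{(k)} P_{HN1} + P_{N2}\bigr), \qquad k = 0, 1, 2, \dots \\
G_H &= G^{(k+1)} \quad \text{for the first } k \text{ with } G^{(k+1)} \le G^{(k)}, \\
&\text{so that } G_H \approx f\bigl(P_{HS},\; G_H P_{HN1} + P_{N2}\bigr).
\end{aligned}
```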
[0019]
G0, the initial value of Gold, is the ideal gain obtained when the synthetic noise N at the microphone position is HN1 + N2. When the noise N1 is multiplied by the gain Gold, the synthetic noise at the microphone position becomes N' = G0HN1 + N2, which is larger than HN1 + N2. As a result, the gain G obtained for loudness compensation becomes larger than Gold (= G0), and this G becomes the new Gold. Thereafter, each increase in gain raises the synthetic noise at the microphone position, and the gain G increases gradually. At a certain noise level, however, the calculated gain satisfies G = Gold or G < Gold, and that G is taken as GH.
[0020]
FIG. 4B is an example of a gain table illustrating the numerical determination of the gain GH, with S = 70 dB, N1 = 60 dB, and N2 = 50 dB. The optimum gain determination unit 12 obtains the total noise N = N1 + N2 = 60.4 dB (the levels are added as powers, not as dB values) and inputs S and N to the loudness compensation unit 13. The loudness compensation unit 13 refers to the gain table GTL, determines the loudness compensation gain for the ideal state at S = 70 dB and N = 60.4 dB, that is, G = f(S, N) = f(70, 60.4) = 2.9 (= G0), and inputs it to the optimum gain determination unit 12. Next, the optimum gain determination unit 12 sets Gold = 2.9, calculates N1' = N1 + Gold = 62.9 and N' = N1' + N2 = 63.1, and inputs S and N' to the loudness compensation unit 13. The loudness compensation unit 13 refers to the gain table GTL, determines G = f(S, N') = f(70, 63.1) = 3.6, and inputs it to the optimum gain determination unit 12. Since this gain does not satisfy G ≦ Gold, the optimum gain determination unit 12 sets Gold = 3.6, recalculates N1' = N1 + Gold = 63.6 and N' = N1' + N2 = 63.8, and again inputs S and N' to the loudness compensation unit 13, which refers to the gain table GTL, determines G = f(S, N') = f(70, 63.8) = 3.6, and inputs it to the optimum gain determination unit 12. Since this gain satisfies G ≦ Gold, GH = G = 3.6.
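The GH iteration of steps 111 to 115 can be sketched as follows. This is a minimal sketch assuming H = 0 dB (as in the numerical example of FIG. 8) and all levels in dB. The gain table GTL is not given in the text, so `example_gain_table` is a hypothetical stand-in holding only the three values quoted above; all function names are illustrative.

```python
import math
from typing import Callable

# Loudness compensation lookup G = f(S, N); all values in dB.
GainTable = Callable[[float, float], float]


def power_sum_db(*levels_db: float) -> float:
    """Add levels given in dB as powers, e.g. 60 dB + 50 dB -> about 60.4 dB."""
    return 10.0 * math.log10(sum(10.0 ** (lvl / 10.0) for lvl in levels_db))


def find_gain_gh(s_db: float, n1_db: float, n2_db: float,
                 gain_table: GainTable, max_iter: int = 50) -> float:
    """Repeat the loudness lookup until the gain stops growing (assumes H = 0 dB)."""
    # Ideal state: G0 = f(S, N1 + N2) is used as the first Gold.
    g_old = gain_table(s_db, power_sum_db(n1_db, n2_db))
    for _ in range(max_iter):
        # Lifting S + N1 by Gold also lifts N1, so the total noise N' grows.
        n_prime_db = power_sum_db(n1_db + g_old, n2_db)
        g = gain_table(s_db, n_prime_db)
        if g <= g_old:  # gain no longer increases: GH = G
            return g
        g_old = g
    raise RuntimeError("gain did not settle within max_iter iterations")


# Hypothetical stand-in for gain table GTL: only the three entries quoted in the text.
_GTL_EXAMPLE = {(70.0, 60.4): 2.9, (70.0, 63.1): 3.6, (70.0, 63.8): 3.6}


def example_gain_table(s_db: float, n_db: float) -> float:
    # Round to 0.1 dB so computed levels match the table keys.
    return _GTL_EXAMPLE[(round(s_db, 1), round(n_db, 1))]


gh = find_gain_gh(70.0, 60.0, 50.0, example_gain_table)
print(f"GH = {gh} dB")  # prints "GH = 3.6 dB"
```

The `max_iter` guard is a defensive addition; the text only states that the loop ends once G ≦ Gold.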
[0021]
Returning to the processing flow of FIG. 3, once the gain GH has been determined, the optimum gain determination unit 12 calculates a gain GL that takes the overall volume into account, namely the gain that minimizes the deviation from the ideal state, that is, the total error power (step 121). This gain GL can be calculated from PHS, PHN1, and G0 by the following equation, where w1 and w2 are weighting factors and the gains are linear amplitude factors:

$$G_L = \frac{w_1 G_0 P_{HS} + w_2 P_{HN1}}{w_1 P_{HS} + w_2 P_{HN1}} \qquad (1)$$

Equation (1) is derived as follows. Ideally, an appropriate gain G0 would be obtained for the noise HN1 + N2, and only the speech S would be multiplied by G0; that is, the ideal speech at the listening position is G0HS and the ideal noise is HN1 + N2. In practice, however, the gain G is applied as G × (S + N1), so the actual speech at the listening position is GHS and the actual noise is GHN1 + N2. The speech and noise at the listening position therefore deviate from the ideal state by the errors

$$e_1 = GHS - G_0HS = (G - G_0)HS, \qquad e_2 = (GHN_1 + N_2) - (HN_1 + N_2) = (G - 1)HN_1$$

G is determined so as to minimize the evaluation function formed from the weighted error powers,

$$J = w_1 (G - G_0)^2 P_{HS} + w_2 (G - 1)^2 P_{HN1}$$

Setting the derivative dJ/dG to 0 and solving for G gives equation (1). The weighting factors w1 and w2 determine how much emphasis is placed on the errors e1 and e2, respectively.
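The GL calculation can be checked numerically. This sketch assumes the gains in the error terms are linear amplitude factors and that PHS and PHN1 are linear powers converted from dB; under that reading, the values of the numerical example of FIG. 8 (PHS = 70 dB, PHN1 = 60 dB, G0 = 2.9 dB, w1 = w2 = 1) reproduce GL ≈ 2.7 dB. Function names are illustrative.

```python
import math


def db_to_amp(g_db: float) -> float:
    """Convert a gain in dB to a linear amplitude factor."""
    return 10.0 ** (g_db / 20.0)


def amp_to_db(g: float) -> float:
    """Convert a linear amplitude factor to dB."""
    return 20.0 * math.log10(g)


def gain_gl(g0_db: float, phs_db: float, phn1_db: float,
            w1: float = 1.0, w2: float = 1.0) -> float:
    """GL in dB minimizing J = w1*(G - G0)^2*PHS + w2*(G - 1)^2*PHN1."""
    g0 = db_to_amp(g0_db)
    phs = 10.0 ** (phs_db / 10.0)    # speech power at the listening position
    phn1 = 10.0 ** (phn1_db / 10.0)  # transmit-side noise power at the listening position
    gl = (w1 * g0 * phs + w2 * phn1) / (w1 * phs + w2 * phn1)
    return amp_to_db(gl)


print(f"GL = {gain_gl(2.9, 70.0, 60.0):.1f} dB")  # prints "GL = 2.7 dB"
```

Because GL is a power-weighted average of G0 and 1 (0 dB), it always lies between 0 dB and G0; the stronger the transmit-side noise relative to the speech, the closer GL moves to 0 dB.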
[0022]
The optimum gain value Gopt is determined using the gains GL and GH obtained as described above. That is, the power ratio (SN ratio) of the received voice HS to the transmit-side noise N1 is calculated, and Gopt is determined with reference to Tables 1 and 2, prepared in advance and shown in FIGS. 5(A) and 5(B) (step 122). FIG. 6 is a detailed processing flow of step 122. The power ratio (SN ratio) of the received voice HS to the transmit-side noise N1 is calculated (step 201), and it is checked whether the received voice HS and the transmit-side noise N1 are almost equal, that is, whether the SN ratio is about 1 (0 dB), i.e., within a predetermined range (−α dB to +α dB) around 0 dB (step 202). If it is outside this range, the optimum gain Gopt corresponding to the SN ratio is determined according to Table 1 of FIG. 5(A) and set in the gain adjustment unit 14 (step 203). If it is within the range, the optimum gain Gopt corresponding to the speech voice power PHS and the receive-side noise power PN2 is determined according to Table 2 of FIG. 5(B) and set in the gain adjustment unit 14 (step 204).
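The branch of steps 201 to 204 can be sketched as below. This is a minimal sketch: the tables are passed in as callables, and the default α = 5 dB is a hypothetical value, since the text only calls it a predetermined range.

```python
from typing import Callable


def select_gopt(phs_db: float, phn1_db: float, pn2_db: float,
                table1: Callable[[float], float],
                table2: Callable[[float, float], float],
                alpha_db: float = 5.0) -> float:
    """Steps 201-204 of FIG. 6: pick Table 1 or Table 2 from the SN ratio (dB)."""
    sn_db = phs_db - phn1_db            # step 201: SN ratio of HS to transmit-side noise
    if -alpha_db <= sn_db <= alpha_db:  # step 202: SN ratio is about 0 dB
        return table2(phs_db, pn2_db)   # step 204: Table 2, indexed by PHS and PN2
    return table1(sn_db)                # step 203: Table 1, indexed by the SN ratio


# Toy tables that simply reveal which branch was taken (hypothetical values).
print(select_gopt(70.0, 60.0, 50.0, lambda sn: 1.0, lambda s, n: 2.0))  # prints 1.0
print(select_gopt(60.0, 60.0, 50.0, lambda sn: 1.0, lambda s, n: 2.0))  # prints 2.0
```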
[0023]
(C) Determination criteria for Tables 1 and 2
(a) Table 1
FIG. 5(A) shows Table 1, which determines the optimum gain based on the speech voice power PHS and the transmit-side noise power PHN1; the SN ratio decreases toward the upper right of the table. Table 1 is determined in consideration of the following.
1) When the SN ratio is sufficiently high (20 dB or more), the speech intelligibility improvement effect is the same as when the conventional gain value Gorg is used. Therefore, Gorg is used as the optimum gain Gopt. Gorg is the gain obtained from the loudness compensation unit 13 when the speech voice power is PHS and the noise power PN at the receive-side listening position is taken to be the receive-side noise power PN2 alone.
2) When the SN ratio is about 10 dB, the difference is small whichever of GL and GH is selected, so the intermediate value (GL + GH)/2 is used as the optimum gain Gopt. The gain Gorg calculated by the conventional method can also be adopted.
3) When the SN ratio is about 0 dB, the gain is selected according to the receiver noise level N2 and the speech signal level HS (≒ HN1), using Table 2.
4) When the SN ratio is −10 dB or less, the SN ratio is too poor to begin with, and applying gain does not improve audibility. For this reason, emphasis is placed on not increasing the overall volume, and the gain GL, which minimizes the deviation from the ideal state (the total error power), is adopted as the optimum gain. The gain may also be made to converge toward 0 dB as the SN ratio worsens, for example G = 0 dB when the SN ratio is −20 dB or less.
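The Table 1 rules can be sketched as a piecewise function. The text gives only nominal rows (20, 10, 0, −10 and −20 dB); the row boundaries used here, midpoints between those rows, are assumptions, as is choosing GL for the unlisted band between about −5 dB and −10 dB.

```python
def table1_gopt(sn_db: float, gorg_db: float, gl_db: float, gh_db: float) -> float:
    """Rule-of-thumb version of Table 1 (FIG. 5(A)); returns Gopt in dB.

    Boundaries at 15, 5, -5 and -20 dB are hypothetical midpoints between the
    nominal rows named in the text.
    """
    if sn_db >= 15.0:    # about 20 dB or more: conventional gain Gorg
        return gorg_db
    if sn_db >= 5.0:     # about 10 dB: midpoint of GL and GH
        return (gl_db + gh_db) / 2.0
    if sn_db > -5.0:     # about 0 dB: decided by Table 2 instead
        raise ValueError("SN ratio near 0 dB: use Table 2")
    if sn_db > -20.0:    # about -10 dB: keep the overall volume down with GL
        return gl_db
    return 0.0           # -20 dB or less: no correction


# Values of the FIG. 8 example: SN = 10 dB, Gorg = 1.1, GL = 2.7, GH = 3.6 (dB).
print(f"{table1_gopt(10.0, 1.1, 2.7, 3.6):.2f}")  # prints 3.15
```

The 3.15 dB result corresponds to the Gopt = 3.2 dB quoted for this case in the numerical example, after rounding to 0.1 dB.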
[0024]
(b) Table 2
FIG. 5(B) shows Table 2, which determines the optimum gain based on the receiver noise level N2 and the speech signal level HS (≒ HN1) when the SN ratio is about 0 dB. Table 2 is determined in consideration of the following.
1) As the speech voice power PHS increases, the optimum gain converges to 0 dB (that is, toward no correction).
2) For the same PHS, the larger the receive-side noise power PN2, the less readily the optimum gain converges to 0 dB.
For example, when PHS is 50 dB, a gain corresponding to GH is selected for any PN2. As PHS increases to 60 dB and then 70 dB, the gain changes as GL → 0 dB at PN2 = 50 dB, as (GL + GH)/2 → 0 dB at PN2 = 60 dB, and as (GL + GH)/2 → GL at PN2 = 70 dB. That is, the gain is reduced step by step in every case, but the reduction is weaker (the gain is less likely to reach 0 dB) at higher PN2.
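The Table 2 entries described above can be encoded directly. The nine (PHS, PN2) cells come from the text; snapping other levels to the nearest 50/60/70 dB grid point is an assumption of this sketch, and the names are illustrative.

```python
# Table 2 (FIG. 5(B)) as described in the text: gain choice per (PHS, PN2) in dB.
# "GH", "GL", "MID" = (GL + GH) / 2, "ZERO" = 0 dB (no correction).
TABLE2 = {
    (50, 50): "GH",   (50, 60): "GH",   (50, 70): "GH",
    (60, 50): "GL",   (60, 60): "MID",  (60, 70): "MID",
    (70, 50): "ZERO", (70, 60): "ZERO", (70, 70): "GL",
}


def table2_gopt(phs_db: float, pn2_db: float, gl_db: float, gh_db: float) -> float:
    """Return Gopt (dB) for an SN ratio of about 0 dB."""
    grid = (50, 60, 70)
    # Snap to the nearest tabulated level (assumption: only 50/60/70 dB are given).
    phs_key = min(grid, key=lambda g: abs(g - phs_db))
    pn2_key = min(grid, key=lambda g: abs(g - pn2_db))
    choice = TABLE2[(phs_key, pn2_key)]
    gains = {"GH": gh_db, "GL": gl_db, "MID": (gl_db + gh_db) / 2.0, "ZERO": 0.0}
    return gains[choice]


print(table2_gopt(50, 70, 2.7, 3.6))  # prints 3.6 (GH)
```

Reading down each column shows rule 1 (higher PHS moves toward 0 dB); reading across each row shows rule 2 (higher PN2 resists reaching 0 dB).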
[0025]
The reason for determining the optimum gain as in Table 2 is as follows. If the level of the total transmit-side sound H(S + N1) that the receiver hears before the signal S + N1 is raised (G = 1) is small, then even if S + N1 is raised by a large gain G and the level of the sound GH(S + N1) that the receiver finally hears increases slightly, the receiver does not feel much discomfort. Therefore, when the level of H(S + N1) is small, the gain GH, calculated on the principle of loudness equivalence, is used as Gopt = GH to raise the signal S + N1 and improve ease of hearing. Conversely, if the level of H(S + N1) that the receiver hears before S + N1 is raised (G = 1) is large, raising S + N1 by the gain G makes the total transmit-side sound GH(S + N1) that the receiver finally hears loud, clearly louder than the sound H(S + N1) heard before. The receiver clearly perceives this and finds it unpleasant. Therefore, the higher the level of the speech voice HS (and the transmitter noise HN1) heard before S + N1 is raised (G = 1), that is, the higher the level of H(S + N1), the smaller the gain G is set, so that this discomfort is not felt. Also, as the level of the receiver noise N2 increases, the total SN ratio of the voice heard by the receiver worsens. However, the larger N2 is relative to H(S + N1), the greater the improvement in the total SN ratio obtained when H(S + N1) is amplified. To take advantage of this effect, the larger N2 is relative to H(S + N1), the larger the gain G is set, improving ease of hearing.
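The last argument, that amplifying H(S + N1) helps the total SN ratio more when N2 is relatively large, can be checked with a few numbers. This sketch treats the total SN ratio as HS / (HN1 + N2), with levels in dB at the listening position; the levels and the 3 dB gain are illustrative values, not taken from the patent.

```python
import math


def power_sum_db(*levels_db: float) -> float:
    """Add levels given in dB as powers."""
    return 10.0 * math.log10(sum(10.0 ** (lvl / 10.0) for lvl in levels_db))


def total_snr_gain_db(hs_db: float, hn1_db: float, n2_db: float, g_db: float) -> float:
    """Improvement in total SN ratio HS / (HN1 + N2) when S + N1 is lifted by g_db."""
    before = hs_db - power_sum_db(hn1_db, n2_db)
    # The gain lifts speech and transmit-side noise together; N2 is unchanged.
    after = (hs_db + g_db) - power_sum_db(hn1_db + g_db, n2_db)
    return after - before


# HS = HN1 = 60 dB (SN ratio about 0 dB), gain 3 dB, receiver noise N2 varied.
for n2 in (50.0, 60.0, 70.0):
    print(f"N2 = {n2:.0f} dB: improvement {total_snr_gain_db(60.0, 60.0, n2, 3.0):.2f} dB")
```

The improvement grows with N2 and approaches the applied gain only when N2 dominates HN1, because only then does the gain lift the speech without lifting the dominant noise.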
[0026]
(c) Numerical example
Optimum gain determination tables with actual numerical values entered are shown as Tables 3 to 5 in FIGS. 7(A) to 7(C), for N2 (PN2) = 50 dB, 60 dB, and 70 dB, respectively. The numerical examples in each table are partial, but they can be expanded (refined) according to the rules described above. FIG. 8 illustrates a specific optimum gain determination process for PS = 70 dB, PN1 = 60 dB, and PN2 = 50 dB: FIG. 8(A) shows the gain table, and FIGS. 8(B) and 8(C) show the optimum gain determination steps. Here it is assumed that H = 0 dB and w1 = w2 = 1.
(1) The gain Gorg obtained from the gain table with N1 = 0, as in the prior art, is Gorg = 1.1 dB.
(2) The ideal-state gain G0 obtained from the gain table is G0 = 2.9 dB.
(3) Starting from G0, the gain GH that balances the resulting increase in the transmit-side noise N1 is obtained: GH = 3.6 dB.
(4) The gain GL that takes the overall volume into account is then calculated: GL = 2.7 dB.
(5) Since the power ratio (SN ratio) of the received voice power PHS to the transmit-side noise power PN1 is 10 dB, the optimum gain is Gopt = (GL + GH)/2 = 3.2 dB (see the shaded area in FIG. 7(A)).
Other optimum gains can be calculated in the same way with reference to Tables 1 and 2 of FIG. 5.
[0027]
In the speech intelligibility improvement system, the ideal would be to raise only the speech S, by 2.9 dB. In practice, however, the voice S and the noise N1 are raised together. Therefore, taking into account that S and N1 are raised together, the present invention calculates the gain GH based on loudness theory and the gain GL at which the speech and total noise levels come closest to the ideal state, and determines the final optimum gain Gopt from these two values and the SN ratio.
[0028]
(D) Modification
In the embodiment above, G0, GH, and GL are calculated, and the optimum gain Gopt is then determined from GH and GL according to Tables 1 and 2 of FIG. 5 based on the SN ratio. Alternatively, a large number of optimum gain determination tables for different receiver noise levels (for example, Tables 3 to 5 of FIG. 7) can be created and stored beforehand, and the optimum gain Gopt for PS and PN1 can then be determined by a lookup in the determination table corresponding to the actual receiver noise level N2.
[0029]
(E) Power calculation processing for the transmit-side noise HN1
The call voice HS and the transmit-side noise HN1 can be calculated by dividing the input signal (voice + noise) into frames, each a set of a predetermined number of samples, extracting the power of each frame, creating a power distribution that associates each power value with its frequency of occurrence (the number of frames), and estimating the noise power from this distribution (Japanese Patent No. 3888727). This method is described below. First, the power distribution of speech is as shown in FIG. 9, and it is known that the power value Es at which this distribution peaks (hereinafter, the maximum frequency power) is very small. On the other hand, the power distribution of noise can be modeled as a normal distribution, as shown in FIG. 10, so its maximum frequency power En can be regarded as the average noise power. This method estimates the noise power from the input signal (voice + noise) by exploiting this difference between the power distributions of speech and noise.
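The frame-power histogram idea can be sketched as follows. This is a minimal sketch of the mode-of-histogram principle, not the exact procedure of Japanese Patent No. 3888727; the frame length of 160 samples, the 1 dB bin width, and the synthetic test signal are all hypothetical choices, and only the noise power is estimated.

```python
import math
import random
from collections import Counter


def frame_powers_db(samples, frame_len):
    """Mean-square power of each non-overlapping frame, in dB."""
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        power = sum(x * x for x in frame) / frame_len
        powers.append(10.0 * math.log10(power + 1e-12))  # floor avoids log(0)
    return powers


def estimate_noise_power_db(samples, frame_len=160, bin_db=1.0):
    """Estimate noise power as the most frequent frame power (histogram mode)."""
    counts = Counter(math.floor(p / bin_db) for p in frame_powers_db(samples, frame_len))
    mode_bin, _ = counts.most_common(1)[0]
    return (mode_bin + 0.5) * bin_db  # centre of the most populated bin


# Hypothetical test signal: 300 noise-only frames (power 0.01, i.e. -20 dB), then
# 100 frames of a tone whose level rises frame by frame, standing in for speech.
rng = random.Random(0)
frame_len = 160
signal = [rng.gauss(0.0, 0.1) for _ in range(300 * frame_len)]
for k in range(100):
    amp = 0.3 + 0.05 * k
    signal += [amp * math.sin(0.3 * n) + rng.gauss(0.0, 0.1) for n in range(frame_len)]

print(f"estimated noise power: {estimate_noise_power_db(signal, frame_len):.1f} dB")
```

The speech-like frames spread thinly over many bins, while the noise-only frames pile up in one or two bins near −20 dB, so the mode tracks the noise level.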
[0030]
The noise power estimation method is explained below using a specific example. To make the outline of the method easy to grasp, the simple power distributions of speech and noise shown in FIGS. 11(A) and 11(B) are used. Since the vertical-axis value of a power distribution can be regarded as the appearance probability of the power value on the horizontal axis, the vertical axis in FIGS. 11(A), 11(B), and 11(C) is labeled "appearance probability Ps(Esj), Pn(Enj)" instead of "frequency". Because speech and noise are statistically independent, for a speech signal with the power distribution of FIG. 11(A) and a noise signal with the power distribution of FIG. 11(B), the power value Exj of the input signal (voice + noise) (the horizontal-axis value of its power distribution) and its appearance probability Px(Exj) (the vertical-axis value) can be expressed in terms of the power values Esj, Enj and their appearance probabilities Ps(Esj), Pn(Enj) by the following equations:

$$E_{xj} = E_{sj} + E_{nj} \qquad (4)$$

$$P_x(E_{xj}) = P_s(E_{sj}) \times P_n(E_{nj}) \qquad (5)$$

Therefore, the power distribution of the input signal can be obtained from equations (4) and (5), as shown in FIG. 11(C).
[0031]
FIG. 11(C) is briefly described here. For example, when the maximum frequency power Es of the speech signal is 2 (Es = 2) and the maximum frequency power En of the noise signal is 100 (En = 100), the power Ex of the input signal is 102 (Ex = 102) by equation (4). The appearance probability Ps(Es) of the speech power 2 (Es = 2) is the maximum appearance probability of power in the speech signal, and the appearance probability Pn(En) of the noise power 100 (En = 100) is the maximum appearance probability of power in the noise signal, so the appearance probability Px(Ex) of the input power at that point is the maximum appearance probability of power in the input signal. Furthermore, the input power Ex = 102 at this point is close to the average noise power En = 100, because Es = 2 is much smaller than En. It can therefore be said that the noise power (strictly speaking, a value close to it) can be estimated from the power value that gives the maximum appearance probability in the power distribution of the input signal. In summary, where the input signal reaches its maximum appearance probability (maximum frequency), the speech power is at its maximum frequency power and the noise power is at its maximum frequency power. Since the maximum frequency power of speech is very small, the maximum frequency power of the input signal can be approximated by the maximum frequency power of the noise, which in turn can be regarded as the average noise power. Therefore, once the maximum frequency power of the input signal is determined, it can be used as an estimate of the noise power.
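Equations (4) and (5) can be applied to discrete toy distributions to show that the mode of the input distribution lands next to the noise mean. The distributions below are hypothetical, chosen so that speech peaks at Es = 2 and noise at En = 100 as in the example; summing the probabilities of distinct pairs that happen to give the same Ex is an assumption that extends equation (5).

```python
from collections import defaultdict


def input_power_distribution(ps, pn):
    """Combine speech and noise power distributions using Eqs. (4) and (5).

    ps and pn map a power value to its appearance probability. Each pair
    (Es, En) yields Ex = Es + En with probability Ps(Es) * Pn(En); pairs that
    land on the same Ex are summed.
    """
    px = defaultdict(float)
    for es, p_s in ps.items():
        for en, p_n in pn.items():
            px[es + en] += p_s * p_n
    return dict(px)


# Hypothetical toy distributions: speech peaks at Es = 2, noise at En = 100.
ps = {2: 0.5, 20: 0.3, 50: 0.2}
pn = {90: 0.25, 100: 0.5, 110: 0.25}

px = input_power_distribution(ps, pn)
ex_mode = max(px, key=px.get)
print(ex_mode, px[ex_mode])  # prints 102 0.25, close to the noise mean En = 100
```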
[0032]
The noise power PHN1 can also be estimated by either of the following two methods. The first method exploits the periodicity of speech and the aperiodicity of noise, and estimates the noise power from the autocorrelation function of the input signal (see Japanese Patent Laid-Open No. 2005-208152). The second method converts the input signal into the frequency domain for each time interval and updates the noise spectrum estimate based on the power ratio between the input signal spectrum at each frequency and the current noise spectrum estimate (JP-A-10-97288). In this method, the first several tens of milliseconds of the input signal are assumed to be a silent interval, and the spectrum of that interval is used as the initial value of the noise spectrum.
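The second method's general shape, a noise spectrum initialised from leading silence and updated per frequency bin according to the input-to-noise power ratio, can be illustrated with a generic ratio-gated recursive average. This is a stand-in, not the update rule of the cited publication, which the text does not spell out; the threshold, smoothing constant, and number of initial frames are hypothetical, and the per-frame power spectra are assumed to be computed beforehand (for example with an FFT).

```python
def track_noise_spectrum(power_spectra, init_frames=5, ratio_threshold=2.0, smoothing=0.9):
    """Ratio-gated recursive averaging of a noise power spectrum.

    power_spectra: list of frames, each a list of per-bin powers |X(k)|^2.
    The first init_frames frames are assumed to be silence (noise only).
    """
    n_bins = len(power_spectra[0])
    # Initial noise spectrum: average of the assumed-silent leading frames.
    noise = [sum(frame[k] for frame in power_spectra[:init_frames]) / init_frames
             for k in range(n_bins)]
    for frame in power_spectra[init_frames:]:
        for k in range(n_bins):
            ratio = frame[k] / max(noise[k], 1e-12)
            if ratio < ratio_threshold:  # bin looks like noise: update the estimate
                noise[k] = smoothing * noise[k] + (1.0 - smoothing) * frame[k]
    return noise


# Bin 0 carries slowly rising noise; bin 1 carries strong speech-like energy.
frames = [[1.0, 1.0]] * 5 + [[1.2, 50.0]] * 50
print(track_noise_spectrum(frames))
```

Bins whose power greatly exceeds the current estimate are treated as speech and left untouched, so the estimate follows the noise in bin 0 while ignoring the strong component in bin 1.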
[0033]
DESCRIPTION OF SYMBOLS
10 Speech intelligibility improvement unit
11 Power estimation unit
12 Optimum gain determination unit
13 Loudness compensation unit
14 Gain adjustment unit
21 Communication network
22 Speaker
23 Microphone
31 Speaker of the communicating party
32 Microphone of the communicating party