DESCRIPTION JP2014194437

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
Abstract: To detect a desired voice with high accuracy. SOLUTION: A voice processing apparatus
includes deriving means for deriving an attenuation factor ratio between a first attenuation factor
from a noise source to a first microphone and a second attenuation factor from the noise source to
a second microphone; integration means for integrating the attenuation factor ratio and a second
input signal input by the second microphone; subtraction means for subtracting the integration
result by the integration means from a first input signal input by the first microphone; and
determining means for comparing the subtraction result by the subtraction means with a
predetermined threshold value and determining that the desired voice is present when the
subtraction result is larger. [Selected figure] Figure 1
Voice processing apparatus, voice processing method and voice processing program
[0001]
The present invention relates to a speech processing technique for processing a mixed signal in
which desired speech and noise are mixed.
[0002]
In the above technical field, Patent Document 1 discloses the technology of a voice detection
device provided with a plurality of directional microphones.
Patent Document 1 discloses a technique for detecting a desired voice regardless of the
magnitude of noise by using a level difference and a power ratio of signals collected by two
11-04-2019
1
microphones in combination.
[0003]
JP 2008-304498 A
[0004]
However, in the voice detection device of Patent Document 1 described above, when the changes in
the signal level difference and the power ratio between the presence and absence of the desired
voice are small, setting the threshold is difficult and accurate voice detection cannot be
performed.
[0005]
An object of the present invention is to solve the above problems.
[0006]
In order to achieve the above object, an apparatus according to the present invention includes:
deriving means for deriving an attenuation factor ratio between a first attenuation factor from a
noise source to a first microphone and a second attenuation factor from the noise source to a
second microphone; integrating means for integrating (that is, multiplying together) the
attenuation factor ratio and a second input signal input by the second microphone; subtraction
means for subtracting the integration result by the integrating means from a first input signal
input by the first microphone; and determination means for comparing the subtraction result by
the subtraction means with a predetermined threshold value and determining that the desired
voice is present when the subtraction result is larger.
[0007]
In order to achieve the above object, a method according to the present invention includes: a
deriving step of deriving an attenuation factor ratio between a first attenuation factor from a
noise source to a first microphone and a second attenuation factor from the noise source to a
second microphone; an integration step of integrating the attenuation factor ratio and a second
input signal input by the second microphone; a subtraction step of subtracting the integration
result of the integration step from a first input signal input by the first microphone; and a
determination step of comparing the subtraction result of the subtraction step with a
predetermined threshold value and determining that the desired voice is present when the
subtraction result is larger.
[0008]
In order to achieve the above object, a program according to the present invention causes a
computer to execute: a deriving step of deriving an attenuation factor ratio between a first
attenuation factor from a noise source to a first microphone and a second attenuation factor from
the noise source to a second microphone; an integration step of integrating the attenuation
factor ratio and a second input signal input by the second microphone; a subtraction step of
subtracting the integration result of the integration step from a first input signal input by the
first microphone; and a determination step of comparing the subtraction result of the subtraction
step with a predetermined threshold value and determining that the desired voice is present when
the subtraction result is larger.
[0009]
According to the present invention, a desired voice can be detected with high accuracy.
[0010]
A block diagram showing the configuration of the speech processing apparatus according to the
first embodiment of the present invention.
A diagram explaining the speech processing apparatus according to the second embodiment of the
present invention.
A diagram explaining the speech processing apparatus according to the second embodiment of the
present invention.
A diagram explaining the speech processing apparatus according to the second embodiment of the
present invention.
A block diagram showing the configuration of the speech processing apparatus according to the
second embodiment of the present invention.
A flowchart explaining the flow of processing of the speech processing apparatus according to the
second embodiment of the present invention.
A block diagram showing the configuration of the speech processing apparatus according to the
third embodiment of the present invention.
A block diagram showing the configuration of the speech processing apparatus according to the
fourth embodiment of the present invention.
A diagram explaining the structure of the speech processing apparatus according to the fourth
embodiment of the present invention.
[0011]
Hereinafter, embodiments of the present invention will be described in detail by way of example
with reference to the drawings. However, the components described in the following embodiments
are merely examples and are not intended to limit the technical scope of the present invention to
them.
[0012]
First Embodiment A speech processing apparatus 100 according to a first embodiment of the
present invention will be described with reference to FIG. The speech processing apparatus 100
includes a derivation unit 101, an integration unit 102, a subtraction unit 103, and a
determination unit 104.
[0013]
The deriving unit 101 derives an attenuation factor ratio between the first attenuation factor
from the noise source to the first microphone 110 and the second attenuation factor from the
noise source to the second microphone 120.
[0014]
The integration unit 102 integrates the attenuation factor ratio and the second input signal input
by the second microphone 120.
Further, the subtraction unit 103 subtracts the integration result by the integration unit 102
from the first input signal input by the first microphone 110. Then, the determination unit 104
compares the subtraction result by the subtraction unit 103 with a predetermined threshold value,
and determines that the desired voice is present when the subtraction result is larger.
[0015]
According to the configuration as described above, desired voice can be detected with high
accuracy.
[0016]
Second Embodiment (Base Technology) As shown in FIG. 2, it is assumed that there are two
microphones and two sound sources.
[0017]
Of the two sound sources, the sound source 210 is the source of the desired voice and the sound
source 220 is the noise source.
The time series of the power of the sound signal generated by the sound source 210 is PA(t),
and the time series of the power of the sound signal generated by the sound source 220 is PB(t).
PA(t) and PB(t) are values that cannot be observed directly.
[0018]
Of the two microphones, one close to the sound source 210 is referred to as a microphone 201,
and one distant from the sound source 210 is referred to as a microphone 202. The time series
of the power of the sound signal collected by the microphone 201 is P1 (t), and the time series of
the power of the sound signal collected by the microphone 202 is P2 (t). P1 (t) and P2 (t) are
directly observable values.
[0019]
The attenuation factors of the power generated by the sound source 210 on its way to the
microphone 201 and the microphone 202 are dA1 and dA2, respectively, and the attenuation
factors of the power generated by the sound source 220 on its way to the microphone 201 and the
microphone 202 are dB1 and dB2, respectively. When the sound source is considered to be a point
sound source, the power of the sound decreases in inverse proportion to the square of the
distance, so the attenuation factor is the reciprocal of the square of the distance.
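As a minimal sketch of the point-source assumption above, the attenuation factors can be computed from source-to-microphone distances. The function name and the distances are hypothetical and chosen only for illustration; the symbols dA1, dA2, dB1 and dB2 follow the mixing equations of paragraph [0020].

```python
def attenuation_factor(distance_m: float) -> float:
    """Power attenuation of a point source: reciprocal of the squared distance."""
    if distance_m <= 0:
        raise ValueError("distance must be positive")
    return 1.0 / (distance_m ** 2)


# Hypothetical layout; the distances (in metres) are not taken from the source.
dA1 = attenuation_factor(0.1)  # desired sound source 210 -> microphone 201
dA2 = attenuation_factor(0.2)  # desired sound source 210 -> microphone 202
dB1 = attenuation_factor(1.0)  # noise source 220 -> microphone 201
dB2 = attenuation_factor(0.5)  # noise source 220 -> microphone 202
```

With this layout the desired voice is attenuated less on its way to the microphone 201 than to the microphone 202, while the noise behaves the other way round, so dA1 / dA2 is larger than dB1 / dB2.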
[0020]
The time series of the power of the sound signals collected by the microphone 201 and the
microphone 202 satisfy the following relationships.
P1(t) = PA(t) × dA1 + PB(t) × dB1
P2(t) = PA(t) × dA2 + PB(t) × dB2
The time series D(t) of the difference between the powers of the sound signals collected by the
microphone 201 and the microphone 202 is defined by the following equation.
[0021]
D(t) = P1(t) − P2(t)
The time series R(t) of the ratio of the powers of the sound signals collected by the microphone
201 and the microphone 202 is defined by the following equation.
[0022]
R(t) = P1(t) / P2(t)
When there is no desired voice and only noise is present, PA(t) = 0 and PB(t) > 0.
[0023]
At this time, the time series DB (t) of the power difference and the time series RB (t) of the ratio
of the power are calculated as follows.
[0024]
DB(t) = PB(t) × (dB1 − dB2)
RB(t) = dB1 / dB2
When there is only the desired voice and no noise, PA(t) > 0 and PB(t) = 0.
[0025]
At this time, the time series DA (t) of the power difference and the time series RA (t) of the ratio
of the power are calculated as follows.
[0026]
DA(t) = PA(t) × (dA1 − dA2)
RA(t) = dA1 / dA2
D(t) has the following relationship with DA(t) and DB(t).
[0027]
D(t) = DA(t) + DB(t)
FIG. 3 shows an example of temporal change of D(t), DA(t) and DB(t).
[0028]
Also, there is the following relationship between R (t) and RA (t) and RB (t).
[0029]
R(t) = α(t) × RA(t) + (1 − α(t)) × RB(t)
where α(t) = 1 / (1 + PB(t) / PA(t) × dB2 / dA2).
α(t) takes a value between 0 and 1.
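The two decompositions above can be checked numerically. The following is only a sketch with made-up powers and attenuation factors (none of the numbers come from the source); it evaluates the mixing model for one unit time and compares D(t) and R(t) with their decomposed forms.

```python
# Illustrative values only: source powers and attenuation factors are hypothetical.
PA, PB = 2.0, 3.0                            # desired-voice power, noise power
dA1, dA2, dB1, dB2 = 100.0, 25.0, 1.0, 4.0   # attenuation factors

# Observable powers at the two microphones (mixing model of paragraph [0020]).
P1 = PA * dA1 + PB * dB1
P2 = PA * dA2 + PB * dB2

D = P1 - P2                  # power difference
R = P1 / P2                  # power ratio

DA = PA * (dA1 - dA2)        # difference when only the desired voice is present
DB = PB * (dB1 - dB2)        # difference when only noise is present
RA = dA1 / dA2               # ratio when only the desired voice is present
RB = dB1 / dB2               # ratio when only noise is present

alpha = 1.0 / (1.0 + (PB / PA) * (dB2 / dA2))
R_decomposed = alpha * RA + (1.0 - alpha) * RB
```

Because alpha depends on PB(t) / PA(t), R(t) slides between RB(t) and RA(t) as the balance between noise and desired voice changes.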
[0030]
FIG. 4 shows an example of time change of R (t), RA (t) and RB (t).
As shown in FIG. 4, the time series R(t) is the series of values obtained at each time by
internally dividing RA(t) and RB(t), that is, by taking their weighted average with the weights
α(t) and 1 − α(t).
[0031]
Speech detection using the time series D (t) of the power difference and the time series R (t) of
the ratio of the power has a disadvantage depending on the conditions.
[0032]
In voice detection, the presence of a desired voice is determined by comparing a feature
quantity with a threshold value.
For this reason, a feature quantity whose value differs greatly between when the desired voice is
present and when it is absent is a good feature quantity, and a feature quantity whose value
differs only slightly is a bad feature quantity.
[0033]
The conditions under which the time series D (t) of the power difference is a bad feature quantity
are the following four conditions.
[0034]
Condition 1-1: The temporal change (the difference between the maximum value and the
minimum value) of the time series PA (t) of the desired voice power is small.
→ Temporal change of DA (t) becomes small.
[0035]
Condition 1-2: The temporal change in the time series PB (t) of the noise power is large.
→ Temporal change of DB (t) becomes large.
[0036]
Condition 1-3: Desired voices are input to the microphone 1 and the microphone 2 equally.
→ Since dA1 ≒ dA2, DA(t) ≒ 0, and its temporal change becomes small.
[0037]
Condition 1-4: A large amount of noise is input to the microphone 2 and a small amount of noise
is input to the microphone 1.
→ Temporal change of DB (t) becomes large.
[0038]
If the above conditions apply, the time change of DA (t) becomes smaller than the time change of
DB (t), making it difficult to determine the threshold.
[0039]
On the other hand, the conditions under which the voice detection using the time series R (t) of
the ratio of power is a bad feature amount are the following two conditions.
[0040]
Condition 2-1: Desired voices are input to the microphone 1 and the microphone 2 equally.
→ Since dA1 ≒ dA2, RA(t) ≒ 1.
[0041]
Condition 2-2: Noise is uniformly input to the microphone 1 and the microphone 2.
→ Since dB1 ≒ dB2, RB(t) ≒ 1.
[0042]
If the above conditions apply, the difference between RA (t) and RB (t) becomes small, making it
difficult to determine the threshold.
[0043]
When the distance between the microphone and the mouth is long, the levels of the desired voice
input to the microphone 1 and the microphone 2 become close to each other.
For this reason, it is difficult to make the determination with either the power difference D(t) or
the power ratio R(t).
[0044]
In this embodiment, instead of using D (t) and R (t), voice detection is performed using a time
series E (t) of power obtained by suppressing noise from the sound signal of the microphone 1.
[0045]
The noise suppression power time series E (t) is defined by the following equation.
[0046]
E(t) = P1(t) − Q(t) × P2(t)
Here, Q(t) is an estimated value of RB(t).
[0047]
If RB (t) can be estimated correctly, E (t) is calculated as follows.
[0048]
E(t) = P1(t) − RB(t) × P2(t) = PA(t) × dA1 × (1 − dB1 / dB2 × dA2 / dA1)
When dA1 / dA2 > dB1 / dB2, E(t) always has a value of 0 or more, so if the threshold is set to a
value slightly larger than 0, it is possible to determine the presence of a desired voice.
Because the threshold value can be set small, the desired voice can be detected even when its
power is small.
Because this equation does not include the value of the noise power, E(t) does not depend on the
magnitude of the noise.
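The independence from the noise power can be illustrated numerically. The sketch below assumes that Q(t) equals RB(t) exactly and uses hypothetical attenuation factors and powers; it sweeps the noise power PB over several orders of magnitude and compares E(t) with the closed form PA(t) × dA1 × (1 − dB1 / dB2 × dA2 / dA1).

```python
def noise_suppressed_power(p1: float, p2: float, q: float) -> float:
    """E(t) = P1(t) - Q(t) * P2(t)."""
    return p1 - q * p2


# Hypothetical attenuation factors and desired-voice power (illustrative only).
dA1, dA2, dB1, dB2 = 100.0, 25.0, 1.0, 4.0
PA = 2.0
RB = dB1 / dB2  # ideal noise power ratio, assumed to be estimated exactly

# Closed form of E(t) when Q(t) = RB(t).
closed_form = PA * dA1 * (1.0 - (dB1 / dB2) * (dA2 / dA1))

# Sweep the noise power; E(t) should stay at the closed-form value.
values = []
for PB in (0.0, 3.0, 3000.0):
    P1 = PA * dA1 + PB * dB1
    P2 = PA * dA2 + PB * dB2
    values.append(noise_suppressed_power(P1, P2, RB))
```

In practice Q(t) is only an estimate, so some residual noise remains in E(t); that residual is why the threshold is set slightly above 0 rather than at 0.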
[0049]
By using E(t) for the determination of speech, it is sufficient to set the threshold value to a fixed
value slightly larger than 0 regardless of the loudness of the speech, so speech detection that
does not depend on the loudness of the speech can be performed.
In addition, since E(t) does not include a noise term, speech detection independent of the noise
magnitude can be performed.
[0050]
(Device Configuration) A voice processing device 500 according to a second embodiment of the
present invention will be described with reference to FIG.
[0051]
As shown in FIG. 5, the speech processing apparatus 500 includes a microphone 201, a
microphone 202, a power calculation unit 503, a power calculation unit 504, a noise power ratio
estimation unit 505, a noise power estimation unit 506, a noise suppression power estimation
unit 507, and a threshold comparison unit 508.
It is desirable that the microphone 201 be closer to a desired sound source than the microphone
202.
The microphone 201 acquires a first mixed signal in which desired voice and noise are mixed.
The microphone 202 acquires a second mixed signal in which desired voice and noise are mixed
at a rate different from that of the first mixed signal.
The power calculation unit 503 receives the first mixed signal as input, calculates and outputs
power.
The power calculation unit 504 receives the second mixed signal as input, calculates and outputs
power. The noise power ratio estimation unit 505 receives the power of the first mixed signal
and the power of the second mixed signal as input, and estimates and outputs a noise power
ratio. The noise power estimation unit 506 receives the power of the second mixed signal and the
noise power ratio, estimates the noise power included in the first mixed signal, and outputs the
estimated noise power. The noise suppression power estimation unit 507 receives the power of
the first mixed signal and the estimated value of the noise power included in the first mixed
signal as input, and estimates and outputs the noise suppression power. The threshold
comparison unit 508 receives the noise suppression power and a threshold set in advance, and
compares the magnitude relationship to determine whether a desired voice is present.
[0052]
Next, the entire operation of the present embodiment will be described in detail with reference to
the flowcharts of FIGS. 6 and 7.
[0053]
First, the microphone 201 acquires a first mixed signal in which a desired voice and noise are
mixed (step S601).
Further, the microphone 202 acquires a second mixed signal in which the desired voice and noise
are mixed at a rate different from that of the first mixed signal. The first mixed signal and the
second mixed signal are acquired by converting a time series of analog data, such as a potential
difference, with an AD converter into digital data having, for example, a quantization size of
16 bits and a sampling frequency of 44 kHz.
[0054]
The power calculation unit 503 calculates a time series of power from the first mixed signal.
Further, the power calculation unit 504 calculates a time series of power from the second mixed
signal (step S602). The power is calculated for each frame cut out in short time units such as
20 milliseconds. The values of the time series of the power of the first mixed signal and the
power of the second mixed signal calculated for the unit time t are P1(t) and P2(t),
respectively. As a method of calculating the power, for example, the input waveform data is
squared sample by sample and averaged over the samples in the unit time. Alternatively, a
short-time Fourier transform may be performed, the squared magnitude of the spectrum may be
calculated for each frequency, and its average in the frequency direction may be used. The
subsequent processing is performed every unit time.
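The time-domain power calculation described above can be sketched as follows. The function name, the use of non-overlapping frames and the dropping of a trailing partial frame are assumptions made for this illustration; the source only specifies short units such as 20 milliseconds and the mean of the squared samples.

```python
def frame_powers(samples, sample_rate_hz, frame_ms=20.0):
    """Mean squared amplitude of each non-overlapping frame.

    A trailing partial frame is dropped (an assumption of this sketch).
    """
    frame_len = int(sample_rate_hz * frame_ms / 1000.0)
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    powers = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        powers.append(sum(x * x for x in frame) / frame_len)
    return powers


# Toy input: 100 samples of amplitude 1 followed by 100 samples of amplitude 2,
# sampled at a hypothetical 5 kHz so that one 20 ms frame holds 100 samples.
example = frame_powers([1.0] * 100 + [2.0] * 100, sample_rate_hz=5000)
```

Applying the same function to the first and second mixed signals yields the series P1(t) and P2(t) used in the later steps.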
[0055]
The noise power ratio estimation unit 505 estimates the ratio Q(t) of the power of the noise
contained in the first mixed signal to the power of the noise contained in the second mixed signal
(step S603). The following methods can be considered for estimating Q(t).
[0056]
In an ideal environment where a single noise source does not move, this ratio does not depend on
the value of the power generated by the noise source but only on the positional relationship, and
is therefore constant. Accordingly, the ratio of the power P1(t) of the first mixed signal to the
power P2(t) of the second mixed signal is determined for a plurality of unit times before the user
speaks, and the average of these ratios is used as the estimate of RB(t).
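A minimal sketch of this averaging method follows. The function name, the `n_noise_frames` parameter (how many leading unit times are assumed to contain noise only) and the small `eps` guard against division by zero are assumptions of the sketch, not details from the source.

```python
def estimate_noise_ratio(p1, p2, n_noise_frames, eps=1e-12):
    """Average of P1(t) / P2(t) over the leading frames assumed to be noise only."""
    ratios = [a / max(b, eps)
              for a, b in zip(p1[:n_noise_frames], p2[:n_noise_frames])]
    if not ratios:
        raise ValueError("at least one noise-only frame is required")
    return sum(ratios) / len(ratios)


# Toy powers: three noise-only frames followed by one frame containing speech.
p1 = [1.0, 1.1, 0.9, 50.0]
p2 = [4.0, 4.4, 3.6, 20.0]
q = estimate_noise_ratio(p1, p2, n_noise_frames=3)
```

Here the noise level fluctuates from frame to frame, but the ratio stays at about 0.25, which is the property the estimate relies on.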
[0057]
Alternatively, Q(t) may be estimated using an average that rises slowly and falls quickly.
Specifically, the following equation is used.
[0058]
Q(t) = β × P1(t) / P2(t) + (1 − β) × Q(t − 1)
where β is a value from 0 to 1. When P1(t) / P2(t) > Q(t − 1), a value close to 0 is used for β.
When P1(t) / P2(t) ≦ Q(t − 1), a value close to 1 is used for β.
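The recursion can be sketched as a single update function. The two β values (0.01 when the ratio rises, 0.9 when it falls) and the `eps` guard are hypothetical choices for illustration; the source only states that β is close to 0 in the first case and close to 1 in the second.

```python
def update_ratio(q_prev, p1, p2, beta_rise=0.01, beta_fall=0.9, eps=1e-12):
    """One step of Q(t) = beta * P1(t)/P2(t) + (1 - beta) * Q(t-1).

    beta is small when the observed ratio exceeds Q(t-1) (slow rise)
    and large otherwise (fast fall).
    """
    ratio = p1 / max(p2, eps)
    beta = beta_rise if ratio > q_prev else beta_fall
    return beta * ratio + (1.0 - beta) * q_prev


# The ratio jumps up (as when the desired voice starts): Q barely moves.
q_after_rise = update_ratio(0.25, 4.0, 1.0)
# The ratio drops below the current estimate: Q follows quickly.
q_after_fall = update_ratio(0.25, 0.1, 1.0)
```

The asymmetry keeps Q(t) near the noise-only ratio while the desired voice temporarily raises P1(t) / P2(t).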
[0059]
It is also conceivable to estimate Q(t) using the same procedure as other general noise
estimation methods. When using a general noise estimation method, P1(t) / P2(t) is regarded as
the power of an input signal in which the desired voice and noise are mixed, the power of the
noise is estimated from the power of this signal using the noise estimation method, and the
estimate is used as Q(t).
As an example of a general noise estimation method, a method may be considered in which the
minimum value of the power of the input signal over a fixed period is stored and taken as the
power of the noise.
[0060]
The noise power estimation unit 506 estimates the power of the noise included in the first mixed
signal (step S604). The power of the noise is estimated by multiplying the power P2(t) of the
second mixed signal by the noise power ratio Q(t).
[0061]
Compared to directly estimating the power of the noise contained in the first mixed signal,
multiplying the power P2(t) of the second mixed signal by the noise power ratio Q(t) in this way
allows the noise to be estimated more accurately. This is because the value of the noise power
ratio Q(t) hardly depends on the magnitude of the noise.
[0062]
The noise suppression power estimation unit 507 suppresses the noise included in the first
mixed signal and estimates the noise suppression power E(t) (step S605). Specifically, the
estimated noise power is subtracted from the power of the first mixed signal.
[0063]
E(t) = P1(t) − Q(t) × P2(t)
Alternatively, the estimated noise power may be multiplied by a constant before being subtracted
from the power of the first mixed signal. It is also conceivable to estimate the noise
suppression power E(t) using a general noise removal method. When using a general noise
removal method, P1(t) is regarded as the power of the input signal in which the desired voice
and noise are mixed, Q(t) × P2(t) is regarded as the estimated noise power, and the noise
removal method is used to remove the estimated noise power from the power of the input signal.
As an example of a general noise removal method, in addition to simple subtraction, a noise
reduction filter may be calculated and multiplied by the power of the input signal to suppress the
noise power.
[0064]
The threshold comparison unit 508 compares the noise suppression power E(t) with a preset
threshold Θ to determine whether a desired voice is present (step S606). If E(t) is greater than
the threshold Θ, it is determined that there is a voice; otherwise it is determined that there is no
voice. The value of the threshold Θ is set to a value slightly larger than zero.
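Steps S604 to S606 can be sketched together as a per-frame decision. The function name, the toy powers and the threshold value of 1.0 are assumptions made for this illustration; the source only says that Θ is slightly larger than zero.

```python
def detect_voice(p1, p2, q, threshold):
    """Per-frame decision: voice is present where E(t) = P1(t) - Q * P2(t) > threshold."""
    return [(a - q * b) > threshold for a, b in zip(p1, p2)]


# Toy powers: two noise-only frames with different noise levels,
# then two frames that also contain the desired voice.
p1 = [3.0, 6.0, 203.0, 206.0]
p2 = [12.0, 24.0, 62.0, 74.0]
decisions = detect_voice(p1, p2, q=0.25, threshold=1.0)
```

Although the noise level doubles between the first and second frames and again between the third and fourth, the same fixed threshold separates the frames correctly in this example.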
[0065]
In the noise suppression power E(t), the noise is almost completely removed regardless of the
magnitude of the noise. When the second mixed signal includes the desired voice, part of the
desired voice is suppressed together with the noise. However, if the desired voice enters the
microphone 201 even slightly more strongly than it enters the microphone 202, the desired voice
is not entirely erased.
Therefore, the presence of a desired voice can be detected by comparing the noise suppression
power E(t) with the threshold Θ. Further, since the value of the threshold Θ does not depend on
the magnitude of noise, a constant value independent of noise can be used. Therefore, the object
of the present invention can be achieved by using this configuration.
[0066]
Further, the voice detection with the above configuration may be performed separately for each
of a plurality of divided frequency bands. In this case, the noise suppression power E(t) may be
determined for each frequency band and the average or sum over the bands may be compared
with the threshold, or E(t) may be compared with the threshold in each frequency band and the
results may be integrated using a majority decision or the like.
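The two ways of combining per-band results can be sketched as follows. The function names, the strict-majority rule (a tie counts as no voice) and the toy values are assumptions of this sketch.

```python
def detect_by_band_average(e_bands, threshold):
    """Compare the average of the per-band E(t) values with one threshold."""
    return sum(e_bands) / len(e_bands) > threshold


def detect_by_majority(e_bands, thresholds):
    """Compare E(t) with a threshold in each band, then take a strict majority vote."""
    votes = [e > th for e, th in zip(e_bands, thresholds)]
    return sum(votes) * 2 > len(votes)


# Toy per-band noise suppression powers for one unit time (three bands).
e_bands = [5.0, 0.1, 0.2]
by_average = detect_by_band_average(e_bands, threshold=1.0)
by_majority = detect_by_majority(e_bands, thresholds=[1.0, 1.0, 1.0])
```

The two rules can disagree: here one strong band lifts the average above the threshold, while only one of three bands votes for voice.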
[0067]
Third Embodiment A speech processing apparatus 700 according to a third embodiment of the
present invention will be described with reference to FIG.
[0068]
As shown in FIG. 7, this embodiment is characterized by including an adaptive filter 701.
[0069]
The adaptive filter 701 receives the second mixed signal as input, and generates a pseudo noise
signal by approximating an impulse response of a path (noise path) until noise contained in the
second mixed signal reaches the first mixed signal.
A pseudo emphasis signal is obtained by subtracting the pseudo noise signal from the first mixed
signal.
As the adaptive filter 701, it is conceivable to use the adaptive filter described in the
conventional example of Japanese Patent Application Laid-Open No. 08-056180.
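The adaptive filter of the cited publication is not reproduced here. As a stand-in, the sketch below uses a normalized LMS (NLMS) filter, a common general-purpose adaptive filter, to show how a pseudo noise signal and a pseudo-emphasis signal could be produced from the two mixed signals. The function name, the number of taps, the step size and the toy noise path are all hypothetical.

```python
import random


def nlms_cancel(primary, reference, taps=8, mu=0.5, eps=1e-8):
    """NLMS sketch: returns (pseudo_noise, pseudo_emphasis).

    primary   : first mixed signal (samples)
    reference : second mixed signal (samples), used as the filter input
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference samples, newest first
    pseudo_noise, pseudo_emphasis = [], []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        y = sum(w * h for w, h in zip(weights, history))  # pseudo noise sample
        e = d - y                                         # pseudo-emphasis sample
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * e * h / norm for w, h in zip(weights, history)]
        pseudo_noise.append(y)
        pseudo_emphasis.append(e)
    return pseudo_noise, pseudo_emphasis


# Noise-only toy case: the noise reaches the first microphone delayed by one
# sample and scaled by 0.5 (a made-up noise path).
rng = random.Random(0)
reference = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
primary = [0.0] + [0.5 * x for x in reference[:-1]]
pseudo_noise, pseudo_emphasis = nlms_cancel(primary, reference)
residual_power = sum(e * e for e in pseudo_emphasis[-100:]) / 100
```

In this noise-only case the residual in the pseudo-emphasis signal becomes negligible once the filter has converged. A practical system would also need to control adaptation while the desired voice is present, which this sketch does not attempt.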
[0070]
The pseudo-emphasis signal is input to the power calculation unit 503, and the pseudo noise
signal is input to the power calculation unit 504, and the same processing as in the second
embodiment is performed.
[0071]
When a large amount of noise is mixed in the first mixed signal or a large amount of voice is
mixed in the second mixed signal, the noise suppression power E (t) removes not only the noise
but also part of the voice.
By using a pseudo-emphasis signal instead of the first mixed signal and using a pseudo noise
signal instead of the second mixed signal, the noise suppression power E (t) can be made close to
a value in which only noise is suppressed. Therefore, voice detection with fewer errors can be
performed compared to the first embodiment.
[0072]
A microphone arrangement suitable for the present embodiment is shown in FIG. It is desirable
that the desired sound source 210 is closer to the microphone 201 and farther from the
microphone 202, and the noise source 220 is closer to the microphone 202 and farther from the
microphone 201. The distances from the desired sound generation source 210 to the
microphone 201 and the microphone 202 are rA1 and rB1, respectively, and the distances from
the noise source 220 to the microphone 201 and the microphone 202 are rA2 and rB2,
respectively. At this time, it is desirable that the value of rA1 / rB1 be smaller than the value of
rA2 / rB2.
[0073]
Fourth Embodiment A voice processing apparatus 800 according to a fourth embodiment of the
present invention will be described with reference to FIG. As shown in FIG. 8, the present
embodiment is characterized in that a beam former 801 and a beam former 802 are provided at
the front stage of the second embodiment.
[0074]
The beam former 801 calculates the sum of the first mixed signal and the second mixed signal in
the time waveform area to obtain a sum signal. The beam former 802 calculates the difference
between the first mixed signal and the second mixed signal in the time waveform area to obtain a
difference signal.
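The sum and difference operations on the time waveforms can be sketched as below. The function name and the toy waveforms are assumptions; the example places the desired voice equally in both microphones and the noise in only one of them.

```python
def sum_and_difference(x1, x2):
    """Sample-wise sum and difference of two time waveforms of equal length."""
    if len(x1) != len(x2):
        raise ValueError("signals must have the same length")
    sum_signal = [a + b for a, b in zip(x1, x2)]
    diff_signal = [a - b for a, b in zip(x1, x2)]
    return sum_signal, diff_signal


# Toy case: the desired voice arrives identically at both microphones,
# while the noise is picked up by the second microphone only.
voice = [1.0, -1.0, 2.0]
noise = [0.5, 0.5, -0.5]
x1 = voice
x2 = [v + n for v, n in zip(voice, noise)]
sum_signal, diff_signal = sum_and_difference(x1, x2)
```

In this idealized case the difference signal contains no desired voice at all, so it can play the role of the noise-dominated second input, while the sum signal keeps the voice.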
[0075]
The sum signal is input to the power calculation unit 503, and the difference signal is input to
the power calculation unit 504, and the same processing as in the second embodiment is
performed.
[0076]
A microphone arrangement suitable for the present embodiment is shown in FIG.
It is desirable that the desired sound source 210 be equidistant from the microphone 201 and the
microphone 202, and that the noise source 220 be closer to either the microphone 201 or the
microphone 202.
[0077]
Also, if the desired sound source 210 is close to either the microphone 201 or the
microphone 202, and the noise source 220 is equidistant from the microphone 201 and the
microphone 202, the beam former 801 calculates the difference signal and the beam former 802
calculates the sum signal; the difference signal is input to the power calculation unit 503, and
the sum signal is input to the power calculation unit 504.
[0078]
It is also conceivable to use adaptive beam formers, with the beam former 801 directing a beam
toward the desired voice and the beam former 802 directing a beam toward the noise.
[0079]
Other Embodiments Although the embodiments of the present invention have been described in
detail, systems or devices that combine the different features included in each embodiment are
also included in the scope of the present invention.
[0080]
Furthermore, the present invention may be applied to a system configured of a plurality of
devices or to a single device.
Furthermore, the present invention is also applicable to the case where an information
processing program for realizing the functions of the embodiments is supplied to a system or
apparatus directly or remotely.
Therefore, in order to realize the functions of the present invention on a computer, a program
installed on the computer, a medium storing the program, and a WWW (World Wide Web) server
for downloading the program are also included in the scope of the present invention.
[Other Expressions of Embodiment] Some or all of the above-described embodiments can also be
described as in the following supplementary notes, but are not limited thereto.
(Supplementary Note 1) A voice processing apparatus comprising: derivation means for deriving an
attenuation factor ratio between a first attenuation factor from a noise source to a first
microphone and a second attenuation factor from the noise source to a second microphone;
integration means for integrating the attenuation factor ratio and a second input signal input by
the second microphone; subtraction means for subtracting the integration result by the
integration means from a first input signal input by the first microphone; and determination
means for comparing the subtraction result by the subtraction means with a predetermined
threshold value and determining that a desired voice is present when the subtraction result is
large.
(Supplementary Note 2) The voice processing apparatus according to Supplementary Note 1,
further comprising input signal ratio calculation means for calculating an input signal ratio of the
first input signal and the second input signal in a state where the desired voice is not generated,
wherein the derivation means derives the attenuation factor ratio using the input signal ratio.
(Supplementary Note 3) The voice processing apparatus according to Supplementary Note 1,
wherein the input signal ratio calculation means calculates an average value of the ratio of the
first input signal and the second input signal input in a predetermined period before the desired
voice is generated, and uses the average value as the input signal ratio.
(Supplementary Note 4) The voice processing apparatus according to Supplementary Note 3,
wherein the input signal ratio calculation means calculates, as the input signal ratio, an average
value of the ratio of the first input signal to the second input signal input during a predetermined
period before the desired voice is generated, over the period excluding portions where the ratio
rises early.
(Supplementary Note 5) The voice processing apparatus according to Supplementary Note 3 or 4,
wherein the input signal ratio calculation means calculates, as the input signal ratio, an average
value of the ratio of the first input signal to the second input signal input during a predetermined
period before the desired voice is generated, over the period excluding portions where the ratio
falls late.
(Supplementary Note 6) The voice processing apparatus according to any one of Supplementary
Notes 1 to 5, further comprising: a linear filter that generates, from the second input signal, a
pseudo noise signal mixed with the first input signal; and means for obtaining a pseudo voice
signal by subtracting the pseudo noise signal from the first input signal, wherein the integration
means integrates the attenuation factor ratio and the pseudo noise signal, and the subtraction
means subtracts the integration result by the integration means from the pseudo voice signal.
(Supplementary Note 7) The voice processing apparatus according to any one of Supplementary
Notes 1 to 6, further comprising: a first beam former that generates a sum signal of the first input
signal and the second input signal; and a second beam former that generates a difference signal
of the first input signal and the second input signal, wherein the integration means integrates the
attenuation factor ratio and the difference signal, and the subtraction means subtracts the
integration result by the integration means from the sum signal.
(Supplementary Note 8) The voice processing apparatus according to any one of Supplementary
Notes 1 to 7, further comprising: a first beam former that generates a pseudo voice signal from
the first input signal and the second input signal by directing a beam to a sound source of the
desired voice; and a second beam former that generates a pseudo noise signal from the first input
signal and the second input signal by directing a beam to a noise source, wherein the integration
means integrates the attenuation factor ratio and the pseudo voice signal, and the subtraction
means subtracts the integration result by the integration means from the pseudo noise signal.
(Supplementary Note 9) A voice processing method comprising: a derivation step of deriving an
attenuation factor ratio between a first attenuation factor from a noise source to a first
microphone and a second attenuation factor from the noise source to a second microphone; an
integration step of integrating the attenuation factor ratio and a second input signal input by the
second microphone; a subtraction step of subtracting the integration result of the integration step
from a first input signal input by the first microphone; and a determination step of comparing the
subtraction result of the subtraction step with a predetermined threshold value and determining
that a desired voice is present when the subtraction result is large.
(Supplementary Note 10) A voice processing program causing a computer to execute: a derivation
step of deriving an attenuation factor ratio between a first attenuation factor from a noise source
to a first microphone and a second attenuation factor from the noise source to a second
microphone; an integration step of integrating the attenuation factor ratio and a second input
signal input by the second microphone; a subtraction step of subtracting the integration result of
the integration step from a first input signal input by the first microphone; and a determination
step of comparing the subtraction result of the subtraction step with a predetermined threshold
value and determining that a desired voice is present when the subtraction result is large.