Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2016127458
Abstract: [PROBLEM] To provide a sound collection device and method that suppress the collection
of background noise components even when strong background noise exists around the sound
source of a target sound. [SOLUTION] A sound collection device 100 includes: directivity forming
units 2-1 and 2-2 that form directivity in the direction of a target area with respect to the outputs
of microphone arrays MA1 and MA2; a target area sound extraction unit 6 that extracts the target
area sound by suppressing the non-target area sound components in the outputs of the directivity
forming units 2-1 and 2-2; a coherence calculation unit 7 that calculates the coherence for each
frequency from the outputs of the directivity forming units 2-1 and 2-2 and adds them; an area
sound determination unit 8 that determines the presence or absence of the target area sound using
the coherence addition value; and an output unit that outputs the target area sound when the
target area sound is determined to be present and does not output it otherwise. [Selected figure]
Figure 1
Sound collecting device, program and method
[0001]
The present invention relates to a sound collection device and program, and can be applied to,
for example, a sound collection device and program that emphasizes sound in a specific area and
suppresses sound in other areas.
[0002]
Conventionally, a beam former (hereinafter referred to as "BF") is known as a technology for
separating and collecting only the sound in a specific direction (hereinafter also referred to as the
"target direction") in an environment where a plurality of sound sources exist (see Non-Patent
Document 1).
BF is a technology for forming directivity by using the time difference between the signals arriving
at each microphone.
11-04-2019
[0003]
Conventional BFs can be roughly divided into two types: additive and subtractive. In particular,
the subtractive BF has the advantage that directivity can be formed with fewer microphones than
the additive BF. An apparatus to which the conventional subtractive BF is applied is described in
Patent Document 1.
[0004]
Hereinafter, a configuration example of a conventional subtractive BF will be described.
[0005]
FIG. 7 is an explanatory view showing a configuration example of the sound collection device PS
to which the conventional subtraction type BF is applied.
[0006]
The sound collection device PS shown in FIG. 7 extracts a target sound (sound in a target
direction) from the output of the microphone array MA configured using two microphones M1
and M2.
[0007]
In FIG. 7, the sound signals captured by the microphones M1 and M2 are shown as x 1 (t) and x
2 (t), respectively.
Also, the sound collection device PS shown in FIG. 7 has a delay device DEL and a subtractor SUB.
[0008]
The delay unit DEL calculates a time difference τ L between the signals x 1 (t) and x 2 (t)
arriving at each of the microphones M1 and M2, and adds a delay to match the phase difference
of the target sound.
Hereinafter, a signal obtained by adding a delay of a time difference τ L to x 1 (t) will be
denoted as x 1 (t−τ L).
[0009]
The delay unit DEL calculates the time difference τ L according to the following equation (1),
where d represents the distance between the microphones M1 and M2, c represents the speed of
sound, and τ L represents the delay amount. In equation (1), θ L indicates the angle of the target
direction measured from the direction perpendicular to the straight line connecting the
microphones M1 and M2. τ L = (d sin θ L) / c (1)
[0010]
Here, when the blind spot is set in the direction of the microphone M1 with respect to the center
(midpoint) of the microphones M1 and M2, delay processing is applied to the input signal x 1 (t)
of the microphone M1. The subtractor SUB then performs the processing of subtracting
x 1 (t−τ L) from x 2 (t), for example according to the following equation (2). α (t) = x 2 (t) −
x 1 (t−τ L) (2)
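As an illustration, the delay of equation (1) and the subtraction of equation (2) can be sketched as follows, with the delay rounded to whole samples; the sampling rate, microphone spacing, and signal values are illustrative and not taken from the patent:

```python
import numpy as np

def delay_samples(d: float, theta_l: float, fs: int, c: float = 340.0) -> int:
    """Equation (1): tau_L = d * sin(theta_L) / c, rounded to whole samples."""
    tau_l = d * np.sin(theta_l) / c
    return int(round(tau_l * fs))

def subtractive_bf(x1: np.ndarray, x2: np.ndarray, n_delay: int) -> np.ndarray:
    """Equation (2): a(t) = x2(t) - x1(t - tau_L), with zero padding at the start."""
    x1_delayed = np.concatenate([np.zeros(n_delay), x1[:len(x1) - n_delay]])
    return x2 - x1_delayed

# Illustrative example: d = 3 cm, theta_L = pi/2 (cardioid case), fs = 16 kHz
fs = 16000
n = delay_samples(d=0.03, theta_l=np.pi / 2, fs=fs)
x1 = np.random.randn(1024)
x2 = np.random.randn(1024)
a = subtractive_bf(x1, x2, n)
```

Rounding the delay to whole samples is a simplification; a fractional delay filter would be used for exact phase alignment.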
[0011]
The subtractor SUB can also perform the subtraction processing in the frequency domain. In that
case, the above equation (2) can be expressed as the following equation (3).
A (ω) = X 2 (ω) − e^(−jωτ L) X 1 (ω) (3)
[0012]
Here, when θ L = ±π/2, the directivity formed by the microphone array MA is a cardioid
unidirectional pattern as shown in FIG. 8A. On the other hand, when θ L = 0 or π, the directivity
formed by the microphone array MA is a figure-eight bidirectional pattern as shown in FIG. 8B.
Hereinafter, a filter that forms unidirectivity from the input signals is referred to as a
unidirectional filter, and a filter that forms bidirectivity is referred to as a bidirectional filter.
In addition, by using spectral subtraction (hereinafter also simply referred to as "SS") in the
subtractor SUB, it is possible to form sharp directivity toward the blind spot of the bidirectional
pattern.
[0013]
When the directivity is formed by SS, the subtractor SUB can perform the subtraction using the
following equation (4). Although the input signal X 1 of the microphone M1 is used in equation
(4), the same effect can be obtained with the input signal X 2 of the microphone M2. In equation
(4), β is a coefficient for adjusting the intensity of the SS. When the result of the subtraction in
equation (4) becomes negative, the subtractor SUB may perform flooring processing, replacing the
negative value with zero or with a reduced version of the original value. By extracting, with the
SS-based subtraction, the sounds existing outside the direction of the target area, and then
subtracting the amplitude spectrum of those extracted sounds from the amplitude spectrum of the
input signal, the target area sound can be emphasized. | Y (ω) | = | X 1 (ω) | − β | A (ω) | (4)
[0014]
When it is desired to pick up only the sound present in a specific area (hereinafter referred to as
the "target area sound") with a conventional sound pickup apparatus, using the subtractive BF
alone may also collect the sound of sound sources present around the target area (hereinafter
referred to as the "non-target area sound").
[0015]
Therefore, Patent Document 1 proposes, for example as shown in FIG. 9, a process (hereinafter
referred to as the "target area sound collection process") in which a plurality of microphone arrays
direct their directivities toward the target area from different directions so that the directivities
intersect at the target area.
In this method, the ratio of the power of the target area sound included in the BF output of each
microphone array is first estimated and used as a correction coefficient.
[0016]
FIG. 9 shows an example of the prior art in which a target area sound is picked up using two
microphone arrays MA1 and MA2. When the target area sound is picked up using the two
microphone arrays MA1 and MA2, the correction coefficients for the target area sound power are
calculated by, for example, the following equations (5) and (6), or the following equations (7)
and (8).
[0017]
In the above equations (5) to (8), Y 1k (n) and Y 2k (n) are the amplitude spectra of the BF
outputs of the microphone arrays MA1 and MA2, N is the total number of frequency bins, k is the
frequency index, and α 1 (n) and α 2 (n) represent the power correction coefficients for the
respective BF outputs. Further, in equations (5) to (8), mode represents the mode and median
represents the median. Thereafter, by correcting each BF output with its correction coefficient and
performing SS, the non-target area sound existing in the target direction can be extracted.
Furthermore, the target area sound can be extracted by subtracting (SS) the extracted non-target
area sound from the output of each BF. To extract the non-target area sound N 1 (n) present in
the target direction as viewed from the microphone array MA1, the product of the BF output
Y 2 (n) of the microphone array MA2 and the power correction coefficient α 2 (n) is subtracted
from the BF output Y 1 (n) of the microphone array MA1, as in the following equation (9).
Similarly, the non-target area sound N 2 (n) present in the target direction as viewed from the
microphone array MA2 is extracted according to the following equation (10).
N 1 (n) = Y 1 (n) − α 2 (n) Y 2 (n) (9) N 2 (n) = Y 2 (n) − α 1 (n) Y 1 (n) (10)
[0018]
Then, according to the following equations (11) and (12), the non-target area sounds are
subtracted (SS) from the BF outputs Y 1 (n) and Y 2 (n) to extract the target area sound
collection signals Z 1 (n) and Z 2 (n). In equations (11) and (12), γ 1 (n) and γ 2 (n) are
coefficients for adjusting the intensity of the SS.
Z 1 (n) = Y 1 (n) − γ 1 (n) N 1 (n) (11) Z 2 (n) = Y 2 (n) − γ 2 (n) N 2 (n) (12)
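The extraction steps of equations (9) to (12) can be sketched as follows. Since equations (5) to (8) for the correction coefficients are not reproduced in this text, the coefficients are assumed here to be precomputed scalars, and flooring at zero is applied by analogy with the flooring described for equation (4):

```python
import numpy as np

def extract_target_area_sound(y1, y2, alpha1, alpha2, gamma1=1.0, gamma2=1.0):
    """Equations (9)-(12): estimate the non-target area sounds N1, N2, then
    spectrally subtract them from each BF output to obtain Z1, Z2.
    alpha1/alpha2 are the power correction coefficients of equations (5)-(8),
    which are not reproduced here and are assumed precomputed."""
    n1 = np.maximum(y1 - alpha2 * y2, 0.0)  # eq (9), floored at zero
    n2 = np.maximum(y2 - alpha1 * y1, 0.0)  # eq (10), floored at zero
    z1 = np.maximum(y1 - gamma1 * n1, 0.0)  # eq (11)
    z2 = np.maximum(y2 - gamma2 * n2, 0.0)  # eq (12)
    return z1, z2

# Illustrative single-bin amplitude spectra
y1 = np.array([1.0])
y2 = np.array([0.8])
z1, z2 = extract_target_area_sound(y1, y2, alpha1=0.5, alpha2=0.5)
```

The gamma coefficients play the same intensity-adjusting role for this second SS stage as β does in equation (4).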
[0019]
As described above, using the technology described in Patent Document 1, the target area sound
can be collected even when non-target area sound exists around the target area.
[0020]
JP 2014-72708 A
[0021]
Futoshi Asano, "Acoustic Technology Series 16: Array Signal Processing of Sound — Localization,
Tracking and Separation of Sound Sources", edited by the Acoustical Society of Japan, Corona
Publishing, February 25, 2011
[0022]
However, even with the technique described in Patent Document 1, if the background noise is
strong (for example, when the target area is a place with many people, such as an event hall, or a
place where music or the like is playing nearby), noise that cannot be removed by the target area
sound collection processing produces unpleasant artifacts such as musical noise.
In a conventional sound pickup apparatus, these abnormal sounds are masked to some extent by
the target area sound; however, when no target area sound is present, only the abnormal sounds
are heard, which may make the listener uncomfortable.
[0023]
Therefore, there is a demand for a sound collection device, program and method that suppress the
collection of background noise components even when strong background noise exists around the
sound source of the target sound.
[0024]
A sound collection apparatus according to a first aspect of the present invention comprises: (1)
directivity forming means for forming directivity in the direction of a target area with respect to
the outputs of microphone arrays; (2) target area sound extraction means for extracting, from the
outputs of the directivity forming means, the non-target area sound present in the direction of the
target area, and for suppressing the extracted non-target area sound components in the outputs of
the directivity forming means to extract the target area sound; (3) coherence calculation means
for calculating the coherence for each frequency from the outputs of the directivity forming means
and adding the coherence of each frequency to calculate a coherence addition value; (4) area
sound determination means for determining the presence or absence of the target area sound
using the coherence addition value calculated by the coherence calculation means; and (5) output
means for outputting the target area sound extracted by the target area sound extraction means
when the area sound determination means determines that the target area sound is present, and
for not outputting it when the area sound determination means determines that the target area
sound is absent.
[0025]
A sound collection program according to a second aspect of the present invention causes a
computer to function as: (1) directivity forming means for forming directivity in the direction of a
target area with respect to the outputs of microphone arrays; (2) target area sound extraction
means for extracting, from the outputs of the directivity forming means, the non-target area sound
present in the direction of the target area, and for suppressing the extracted non-target area sound
components in the outputs of the directivity forming means to extract the target area sound; (3)
coherence calculation means for calculating the coherence for each frequency from the outputs of
the directivity forming means and adding the coherence of each frequency to calculate a coherence
addition value; (4) area sound determination means for determining the presence or absence of
the target area sound using the coherence addition value calculated by the coherence calculation
means; and (5) output means for outputting the target area sound extracted by the target area
sound extraction means when the area sound determination means determines that the target
area sound is present, and for not outputting it when the area sound determination means
determines that the target area sound is absent.
[0026]
A third aspect of the present invention is a sound collection method performed by a sound
collection device comprising directivity forming means, target area sound extraction means,
coherence calculation means, area sound determination means, and output means, wherein: (1)
the directivity forming means forms directivity in the direction of the target area with respect to
the outputs of the microphone arrays; (2) the target area sound extraction means extracts, from
the outputs of the directivity forming means, the non-target area sound present in the direction of
the target area, and suppresses the extracted non-target area sound components in the outputs of
the directivity forming means to extract the target area sound; (3) the coherence calculation
means calculates the coherence for each frequency from the outputs of the directivity forming
means and adds the coherence of each frequency to calculate a coherence addition value; (4) the
area sound determination means determines the presence or absence of the target area sound
using the coherence addition value calculated by the coherence calculation means; and (5) the
output means outputs the target area sound extracted by the target area sound extraction means
when the area sound determination means determines that the target area sound is present, and
does not output it when the area sound determination means determines that the target area
sound is absent.
[0027]
According to the present invention, even when strong background noise exists around the sound
source of the target sound, it is possible to suppress the collection of the background noise
component.
[0028]
A block diagram showing the functional configuration of the sound collection device according to
the first embodiment.
An explanatory diagram showing an example of the positional relationship of the microphones
constituting the microphone array according to the first embodiment.
An explanatory diagram showing the directional characteristics that the sound collection device
according to the first embodiment forms using a microphone array.
An explanatory diagram showing an example of the positional relationship between the
microphone arrays according to the first embodiment and the target area.
An explanatory diagram showing the time change of the coherence addition value of an input
sound in which a target area sound and non-target area sounds exist.
A block diagram showing the functional configuration of the sound collection device according to
the second embodiment.
A diagram showing the directivity characteristic formed by a subtractive beam former using two
microphones in a conventional sound collection apparatus.
An explanatory diagram explaining an example of the directional characteristics formed by
conventional directional filters.
An explanatory diagram showing a configuration example in which the directivities of the beam
formers (BFs) of two microphone arrays are aimed at the target area from different directions in a
conventional sound collection apparatus.
[0029]
(A) First Embodiment A first embodiment of the sound collection device, program and method
according to the present invention will be described in detail with reference to the drawings.
[0030]
(A-1) Configuration of First Embodiment FIG. 1 is a block diagram showing a functional
configuration of the sound collection device 100 of the first embodiment.
[0031]
The sound collection device 100 performs target area sound collection processing for collecting
a target area sound from a sound source of a target area using the two microphone arrays MA1
and MA2.
[0032]
The microphone arrays MA1 and MA2 are disposed at arbitrary positions in the space in which
the target area exists.
The position of each microphone array MA with respect to the target area may be anywhere, as
long as the directivities of the microphone arrays MA overlap only in the target area, as shown in
FIG. 4.
Each microphone array MA is composed of two or more microphones 21, and each microphone 21
picks up an acoustic signal.
In this embodiment, three microphones M1, M2, and M3 are arranged in each microphone array
MA; that is, each microphone array MA constitutes a 3-ch microphone array.
[0033]
FIG. 2 is an explanatory view showing the positional relationship between the microphones M1,
M2, and M3 in each microphone array MA.
[0034]
As shown in FIG. 2, in each microphone array MA, the two microphones M1 and M2 are arranged
horizontally with respect to the direction of the target area, and the microphone M3 is disposed
on a straight line that is orthogonal to the straight line connecting the microphones M1 and M2
and passes through one of them.
Here, the distance between the microphones M3 and M2 is the same as the distance between the
microphones M1 and M2. That is, the three microphones M1, M2, and M3 are arranged at the
vertices of a right isosceles triangle.
[0035]
The sound collection device 100 includes data input units 1 (1-1, 1-2), directivity forming units
2 (2-1, 2-2), a delay correction unit 3, a space coordinate data storage unit 4, a power correction
coefficient calculation unit 5, a target area sound extraction unit 6, a coherence calculation unit 7,
and an area sound determination unit 8. The detailed processing of each functional block
constituting the sound collection device 100 will be described later.
[0036]
The sound collection device 100 may be implemented entirely in hardware (for example, a
dedicated chip), or partly or entirely as software (a program). For example, the sound collection
device 100 may be configured by installing the sound collection program according to the
embodiment on a computer having a processor and a memory.
[0037]
(A-2) Operation of the First Embodiment Next, the operation (the sound collecting method of the
embodiment) of the sound collecting device 100 of the first embodiment having the
configuration as described above will be described.
[0038]
The data input units 1-1 and 1-2 receive the analog acoustic signals captured by the microphone
arrays MA1 and MA2, respectively, convert them into digital signals, and supply them to the
directivity forming units 2-1 and 2-2.
[0039]
The directivity forming units 2-1 and 2-2 perform processing for forming the directivity of the
microphone arrays MA1 and MA2, respectively (that is, for forming directivity from the signals
supplied from the microphone arrays MA1 and MA2).
[0040]
The directivity forming unit 2 converts each signal from the time domain to the frequency domain
using the fast Fourier transform.
In this embodiment, each directivity forming unit 2 forms the bidirectional filter using the
microphones M1 and M2, which are arranged on a line orthogonal to the direction of the target
area, and forms the unidirectional filter, which directs its blind spot in the target direction, using
the microphones M1 and M3, which are arranged on a line parallel to the target direction.
[0041]
Specifically, the directivity forming unit 2 sets θ L = 0 and applies the above equations (1) and
(3) to the outputs of the microphones M1 and M2 to form the bidirectional filter.
Further, the directivity forming unit 2 sets θ L = −π/2 and applies the above equations (1) and
(3) to the outputs of the microphones M1 and M3 to form the unidirectional filter.
[0042]
FIG. 3 shows the directivity characteristics formed at the output of the microphone array MA by
the bidirectional filter and the unidirectional filter described above.
In FIG. 3, the hatched area indicates the portion where the bidirectional filter and the
unidirectional filter overlap. As shown in FIG. 3, although the bidirectional filter and a part of the
unidirectional filter overlap, performing SS makes it possible to eliminate this overlapping portion.
Specifically, the directivity forming unit 2 can remove the overlapping portion by performing SS
according to the following equation (13). In equation (13), A BD represents the bidirectional
amplitude spectrum, A UD represents the unidirectional amplitude spectrum, and A UD'
represents the amplitude spectrum from which the components common to A UD and A BD have
been removed. The directivity forming unit 2 may perform the flooring processing when A UD'
becomes negative as a result of the SS using equation (13).
[0043]
Then, by subtracting (SS) these two directivities A BD and A UD' from the input signal according
to the following equation (14), the directivity forming unit 2 can obtain a signal Y (hereinafter
also referred to as the "BF output") that has sharp directivity only toward the target direction in
front of the microphone array MA. In equation (14), X DS represents the amplitude spectrum
obtained by averaging the input signals (the outputs of the microphones M1, M2, and M3), and
β 1 and β 2 are coefficients for adjusting the intensity of the SS.
Hereinafter, the BF output based on the output of the microphone array MA1 is denoted Y 1, and
the BF output based on the output of the microphone array MA2 is denoted Y 2.
Y = X DS − β 1 A BD − β 2 A UD' (14)
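The BF-output computation of equation (14) can be sketched as follows. Because equation (13) is not reproduced in this text, the overlap removal is assumed here, purely for illustration, to be a plain floored spectral subtraction of A BD from A UD:

```python
import numpy as np

def bf_output(x_ds, a_bd, a_ud, beta1=1.0, beta2=1.0):
    """Sketch of equations (13)-(14). The form of equation (13) is an
    assumption: overlap removal as floored subtraction of A_BD from A_UD."""
    a_ud_prime = np.maximum(a_ud - a_bd, 0.0)     # assumed form of eq (13)
    y = x_ds - beta1 * a_bd - beta2 * a_ud_prime  # eq (14)
    return np.maximum(y, 0.0)                     # flooring, as for eq (4)

# Illustrative single-bin amplitude spectra
y = bf_output(np.array([1.0]), np.array([0.2]), np.array([0.5]))
```

X DS here would come from averaging the three microphone spectra, and β1, β2 are tuned like β in equation (4).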
[0044]
The directivity forming units 2-1 and 2-2 perform the BF processing described above to form
directivity in the direction of the target area for the microphone arrays MA1 and MA2,
respectively. Because this BF processing forms the directivity of each microphone array MA only
toward the front, it reduces the influence of reverberation arriving from the direction opposite the
target area as seen from the microphone array MA. In addition, the BF processing suppresses in
advance the non-target area sound located behind each microphone array, which improves the SN
ratio of the target area sound collection processing.
[0045]
The space coordinate data storage unit 4 holds the position information of all target areas (the
position information of the extent of each target area) and the position information of each
microphone array MA (the position information of each microphone 21 constituting each
microphone array MA).
The specific format and units of the position information stored in the space coordinate data
storage unit 4 are not limited, as long as the relative positional relationship between the target
areas and the microphone arrays MA can be recognized.
[0046]
The delay correction unit 3 calculates and corrects a delay generated due to a difference in
distance between the target area and each microphone array MA.
[0047]
The delay correction unit 3 first obtains the position of the target area and the position of each
microphone array MA from the position information held in the space coordinate data storage
unit 4, and calculates the difference in the arrival time of the target area sound at each
microphone array MA.
Next, taking the microphone array MA arranged farthest from the target area as the reference, the
delay correction unit 3 adds delays so that the target area sound reaches all the microphone
arrays MA simultaneously. Specifically, the delay correction unit 3 adds a delay to one of Y 1 and
Y 2 to make their phases coincide.
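The delay alignment described above can be sketched as follows, with the farthest array taken as the reference; the positions, sampling rate, and speed of sound are illustrative values, not from the patent:

```python
import numpy as np

def arrival_delays(target_pos, array_positions, fs, c=340.0):
    """Per-array delays (in samples) that align the target area sound with its
    arrival time at the farthest microphone array."""
    dists = np.array([np.linalg.norm(np.asarray(p) - np.asarray(target_pos))
                      for p in array_positions])
    # delay each nearer array by the extra travel time to the farthest one
    extra = (dists.max() - dists) / c
    return np.round(extra * fs).astype(int)

# Illustrative geometry: target at the origin, arrays 1 m and 2 m away, fs = 16 kHz
delays = arrival_delays((0.0, 0.0), [(1.0, 0.0), (2.0, 0.0)], fs=16000)
```

The farthest array receives zero delay; every nearer array is delayed so that the target area sound lines up in phase across all BF outputs before the coherence calculation.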
[0048]
The power correction coefficient calculation unit 5 calculates correction coefficients for bringing
the power of the target area sound component included in each BF output (Y 1, Y 2) to the same
level. Specifically, the power correction coefficient calculation unit 5 calculates the correction
coefficients according to the above equations (5) and (6), or (7) and (8).
[0049]
The target area sound extraction unit 6 corrects the BF outputs Y 1 and Y 2 using the correction
coefficients calculated by the power correction coefficient calculation unit 5. Specifically, the
target area sound extraction unit 6 applies the above equations (9) and (10) to the BF outputs
Y 1 and Y 2 to obtain the corrected values N 1 and N 2.
[0050]
In addition, the target area sound extraction unit 6 subtracts (SS) the non-target area sound
(noise) using N 1 and N 2 obtained with the correction coefficients, and obtains the target area
sound collection signals Z 1 and Z 2 (signals in which the area sound has been picked up).
Specifically, the target area sound extraction unit 6 obtains Z 1 and Z 2 according to the above
equations (11) and (12).
[0051]
Next, the processing outline of the coherence calculation unit 7 and the area sound
determination unit 8 will be described.
[0052]
In the sound collection device 100, the coherence calculation unit 7 calculates the coherence
between the BF outputs in order to determine whether the target area sound is present.
Coherence is a feature quantity that indicates the relationship between two signals and takes a
value between 0 and 1; the closer the value is to 1, the stronger the relationship between the two
signals.
[0053]
For example, as shown in FIG. 9, when a sound source exists in the target area, the target area
sound is commonly included in each BF output, so the coherence of the target area sound
components becomes large. Conversely, when no target area sound exists in the target area (when
no sound source exists there), the non-target area sounds included in the BF outputs differ from
each other, so the coherence becomes small. Furthermore, since the two microphone arrays MA1
and MA2 are separated, the background noise components in the BF outputs also differ, and the
coherence becomes small. Because of this property, when the coherences obtained at each
frequency are all added, a large difference arises between the case where the target area sound is
present and the case where it is absent.
[0054]
FIG. 5 shows the time change of the coherence addition value when a target area sound and two
non-target area sounds actually exist. The waveform W1 in FIG. 5 is the waveform of the input
sound in which all sound sources are mixed. The waveform W2 in FIG. 5 is the waveform of the
target area sound contained in the input sound. Furthermore, W3 in FIG. 5 indicates the
coherence addition value. As shown in FIG. 5, the coherence addition value is large in the sections
in which the target area sound exists. Therefore, in the sound collection device 100, the area
sound determination unit 8 evaluates the coherence addition value against a preset threshold,
and when it determines that the target area sound does not exist, it outputs silence or a sound
with the gain of the input sound reduced, instead of outputting the output from which the target
area sound was extracted (hereinafter referred to as "area sound output data").
[0055]
Next, an example of specific processing of the coherence calculation unit 7 will be described.
[0056]
The coherence calculation unit 7 obtains the BF outputs Y 1 and Y 2 of the microphone arrays
from the directivity forming units 2-1 and 2-2, calculates the coherence for each frequency, and
adds the coherences over all frequencies to obtain the coherence addition value.
[0057]
For example, the coherence calculation unit 7 calculates the coherence from Y 1 and Y 2 using the
following equation (15).
Then, the coherence calculation unit 7 adds the calculated coherences according to the following
equation (16).
[0058]
The coherence calculation unit 7 uses the phase of the input signals of the microphone arrays MA
as the phase information of the BF outputs Y 1 and Y 2 required when calculating the coherence.
At this time, the coherence calculation unit 7 may limit the frequency band; for example, it may
acquire the phase of the input signal of the microphone array MA only in a frequency band that
sufficiently contains speech information (for example, a range of about 100 Hz to 6 kHz).
[0059]
In the following equations (15) and (16), C represents the coherence, and P y1y2 represents the
cross spectrum of the BF outputs Y 1 and Y 2 of the microphone arrays. Furthermore, P y1y1 and
P y2y2 represent the power spectra of Y 1 and Y 2, respectively, m and n represent the lower and
upper limits of the frequency, and H represents the value obtained by adding the coherence of
each frequency.
[0060]
The coherence calculation unit 7 may use past information for the values of Y 1 and Y 2 used to
calculate the cross spectrum and the power spectra. In this case, Y 1 and Y 2 are obtained by the
following equations (17) and (18), respectively. In equations (17) and (18), α is a coefficient
between 0 and 1 that determines how much past information is used; a suitable value must be
obtained in advance, for example by experiment, and set in the coherence calculation unit 7.
Y 1 (t) = αY 1 (t) + (1−α) Y 1 (t−1) (17) Y 2 (t) = αY 2 (t) + (1−α) Y 2 (t−1) (18)
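Since equations (15) and (16) are not reproduced in this text, the following sketch assumes the standard magnitude-squared coherence summed over frequency bins m to n, together with the recursive smoothing of equations (17) and (18). Note that without smoothing or averaging, the magnitude-squared coherence of single-frame spectra is identically 1, which is one reason the smoothing of equations (17) and (18) matters in practice:

```python
import numpy as np

def coherence_sum(y1, y2, m, n):
    """Assumed form of equations (15)-(16): magnitude-squared coherence
    C(k) = |P_y1y2(k)|^2 / (P_y1y1(k) * P_y2y2(k)), summed over bins m..n."""
    p12 = y1 * np.conj(y2)          # cross spectrum P_y1y2
    p11 = np.abs(y1) ** 2           # power spectrum P_y1y1
    p22 = np.abs(y2) ** 2           # power spectrum P_y2y2
    c = np.abs(p12) ** 2 / (p11 * p22 + 1e-12)  # small epsilon avoids 0/0
    return float(np.sum(c[m:n + 1]))            # eq (16): H = sum of C(k)

def smooth(y_cur, y_prev, alpha):
    """Equations (17)-(18): first-order recursive smoothing of the spectra."""
    return alpha * y_cur + (1 - alpha) * y_prev
```

In a real implementation the spectra fed to `coherence_sum` would be the recursively smoothed (or frame-averaged) values, so that the coherence actually discriminates between correlated and uncorrelated BF outputs.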
[0061]
Next, an example of specific processing of the area sound determination unit 8 will be described.
[0062]
The area sound determination unit 8 compares the coherence addition value calculated by
the coherence calculation unit 7 with a preset threshold to determine whether the target
area sound is present.
When it determines that the target area sound exists, the area sound determination unit 8
outputs the target area sound pickup signals (Z 1, Z 2) as they are; when it determines
that the target area sound does not exist, it outputs silence data (for example, preset
dummy data) instead of the target area sound pickup signals (Z 1, Z 2). The area sound
determination unit 8 may output a signal obtained by weakening the gain of the input
signal instead of the silence data. Furthermore, a hangover function may be added, in
which the area sound determination unit 8 determines that the target area sound exists,
regardless of the coherence addition value, for a few seconds after the coherence addition
value has exceeded the threshold by a predetermined amount or more.
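The gating behavior just described can be sketched as follows. This is a frame-based simplification under stated assumptions: the class and parameter names are invented, the hangover is counted in frames rather than seconds, and the trigger condition is simplified to exceeding the threshold (the text's "by a predetermined amount or more" margin would be one extra parameter).

```python
import numpy as np

class AreaSoundGate:
    """Sketch of the area sound determination unit 8's output logic.

    Passes the area pickup frame when the coherence addition value H
    exceeds the threshold; otherwise outputs silence (floor_gain=0.0)
    or a gain-weakened copy of the input. A hangover keeps the gate
    open for `hangover_frames` frames after a detection.
    """

    def __init__(self, threshold, hangover_frames=10, floor_gain=0.0):
        self.threshold = threshold
        self.hangover_frames = hangover_frames
        self.floor_gain = floor_gain   # 0.0 -> silence data stand-in
        self._hang = 0                 # remaining hangover frames

    def process(self, H, z):
        """H: coherence addition value; z: area pickup frame."""
        if H > self.threshold:
            self._hang = self.hangover_frames  # (re)arm the hangover
            return z
        if self._hang > 0:
            # within the hangover window: still treated as "present"
            self._hang -= 1
            return z
        return self.floor_gain * z             # silence or weakened gain
```

Setting `floor_gain` to a small nonzero value reproduces the gain-weakening variant mentioned above.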
[0063]
The format of the signal output from the area sound determination unit 8 is not limited. For
example, the target area sound collection signals Z 1 and Z 2 may be output based on the outputs
of all the microphone arrays MA. Alternatively, only a part of the target area sound pickup signal
(for example, one of Z 1 and Z 2) may be output.
[0064]
(A-3) Effects of the First Embodiment According to the first embodiment, the following effects
can be achieved.
[0065]
The sound collection device 100 according to the first embodiment distinguishes sections
in which the target area sound is present from sections in which it is absent, and does
not output the area pickup sound in the absent sections, thereby suppressing the
generation of abnormal noise.
Further, in the sound collection device 100 according to the first embodiment, the
coherence addition value is judged against a preset threshold, and when it is determined
that the target area sound does not exist, silence or a sound with a weakened input gain
is output instead of the output from which the target area sound is extracted
(hereinafter referred to as "area sound output data").
As described above, the sound collection device 100 according to the first embodiment
determines the presence or absence of the target area sound and does not output the area
sound output data when the target area sound is absent, making it possible to suppress
the generation of abnormal noise when there is no target area sound.
[0066]
(B) Second Embodiment A second embodiment of the sound collection device, program and
method according to the present invention will be described in detail with reference to
the drawings.
[0067]
(B-1) Configuration and Operation of Second Embodiment FIG. 6 is a block diagram showing a
functional configuration of the sound collection device 100A of the second embodiment.
[0068]
The sound collection device 100A of the second embodiment differs from the first
embodiment in that a noise suppression unit 9 is added.
The noise suppression unit 9 is inserted between the directivity forming units 2-1 and
2-2 and the delay correction unit 3.
[0069]
The noise suppression unit 9 uses the determination result of the area sound
determination unit 8 (the detection result of the sections in which the target area sound
exists) to suppress noise (sounds other than the target area sound) in the BF outputs
Y 1 and Y 2 output from the directivity forming units 2-1 and 2-2 (the BF output results
of the microphone arrays MA1 and MA2), and supplies the results to the delay correction
unit 3.
[0070]
The noise suppression unit 9 uses the result of the area sound determination unit 8, in
the same way as voice activity detection (VAD), to adjust its noise suppression
processing.
Usually, when noise suppression is performed in a sound collection device, the input
signal is divided into voice sections and noise sections using VAD, and a filter is
formed by learning in the noise sections.
When the non-target area sound in the input signal is voice, normal VAD processing
determines it to be a voice section; in the determination of the area sound determination
unit 8 of this embodiment, however, even voice is treated as noise if it is not the
target area sound.
Therefore, the noise suppression unit 9 uses the determination result of the area sound
determination unit 8 to detect target area sound sections (sections in which the target
area sound exists) and non-target area sound sections (sections in which the target area
sound does not exist and only non-target area sound exists). For example, the noise
suppression unit 9 can recognize a sound section other than a target area sound section
as a non-target area sound section. Then, the noise suppression unit 9 treats the
non-target area sound sections as noise sections and performs filter learning and
adjustment of the filter gain by the same processing as existing VAD-based methods.
[0071]
For example, when it is determined that the target area sound does not exist, the noise
suppression unit 9 can perform additional filter learning. The noise suppression unit 9
may also apply a stronger filter gain when the target area sound does not exist than when
it does.
[0072]
The determination that the noise suppression unit 9 receives from the area sound
determination unit 8 is the immediately preceding processing result in the time series
(the result for frame n−1), but it is also possible to receive the current processing
result (the result for frame n), perform the noise suppression processing, and then
perform the area sound collection processing again. As the noise suppression method,
various methods such as spectral subtraction (SS), the Wiener filter, and the Minimum
Mean Square Error Short-Time Spectral Amplitude (MMSE-STSA) method can be used.
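Of the methods named above, spectral subtraction is the simplest to sketch. The following is an illustrative implementation only, with invented names and parameters; the key point it shows is the one described in this embodiment, namely that the noise spectrum is learned only in the sections the area sound determination unit marks as non-target area sound sections.

```python
import numpy as np

class SpectralSubtractor:
    """Minimal spectral-subtraction (SS) sketch for the noise
    suppression unit 9. update_noise() is the filter learning step,
    called only in non-target area sound sections; suppress() applies
    the learned noise estimate to a frame spectrum.
    """

    def __init__(self, n_bins, learn_rate=0.1, over_sub=1.0):
        self.noise_power = np.zeros(n_bins)  # learned noise power spectrum
        self.learn_rate = learn_rate
        # over_sub could be raised while no target area sound is present,
        # mirroring the stronger filter gain mentioned above
        self.over_sub = over_sub

    def update_noise(self, Y):
        """Recursively average the noise power (non-target sections)."""
        p = np.abs(Y) ** 2
        self.noise_power = ((1 - self.learn_rate) * self.noise_power
                            + self.learn_rate * p)

    def suppress(self, Y):
        """Subtract the learned noise power from frame spectrum Y."""
        p = np.abs(Y) ** 2
        clean = np.maximum(p - self.over_sub * self.noise_power, 0.0)
        gain = np.sqrt(clean / (p + 1e-12))  # per-bin attenuation
        return gain * Y
```

After the noise estimate has converged on stationary noise, suppressing a noise-only frame drives its spectrum toward zero.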
[0073]
(B-2) Effects of the Second Embodiment According to the second embodiment, the following
effects can be achieved in addition to the effects of the first embodiment.
[0074]
In the second embodiment, by providing the noise suppression unit 9, it is possible to collect the
target area sound with higher accuracy than in the first embodiment.
[0075]
Further, since the noise suppression unit 9 can perform noise suppression processing
using the determination result of the area sound determination unit 8 (the non-target
area sound sections), noise suppression better suited to collecting the target area sound
than conventional noise suppression processing can be performed.
[0076]
(C) Other Embodiments The present invention is not limited to the above-described embodiments,
and may include modified embodiments as exemplified below.
[0077]
(C-1) In each of the above embodiments, the acoustic signal acquired by the microphones
is processed in real time, but the acquired acoustic signal may instead be stored in a
storage medium and later read from the medium and processed to obtain an emphasis signal
of the target sound or the target area sound.
When a storage medium is used in this way, the place where the microphones are set and
the place where the target sound and the target area sound are extracted may be
separated.
Similarly, even when performing real-time processing, the place where the microphones are
set and the place where the extraction processing of the target sound and the target area
sound is performed may be separated, in which case the signal may be supplied to the
remote place by communication.
[0078]
(C-2) Although the microphone array MA used in the above-described sound collection device
has been described as a 3-ch microphone array, a 2-ch microphone array (a microphone array
including two microphones) may be applied.
In this case, the directivity forming process by the directivity forming unit can be replaced with
various existing filter processes.
[0079]
(C-3) In the above sound collection device, the configuration for collecting the target
area sound from the outputs of two microphone arrays has been described, but a
configuration that collects the target area sound from the outputs of three or more
microphone arrays may also be used.
In that case, the coherence calculation unit 7 may calculate the coherence addition value
after matching the phases of the BF outputs of all the microphone arrays.
[0080]
100 … sound collection device; 1, 1-1, 1-2 … data input unit; 2, 2-1, 2-2 … directivity
forming unit; 3 … delay correction unit; 4 … spatial coordinate data storage unit; 5 …
power correction coefficient calculation unit; 6 … target area sound extraction unit; 7 …
coherence calculation unit; 8 … area sound determination unit; MA, MA1, MA2 … microphone
array; M, M1, M2, M3 … microphone.