DESCRIPTION JP2013179388

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
Abstract: The present invention provides an acoustic signal enhancement device that enhances the
acoustic signal of a sound source by accurately estimating the direct-to-indirect ratio of the
acoustic signal. A direct sound direction power estimation unit obtains a power estimate of a
direct sound direction signal, which is obtained by passing only the signal components arriving
from the direct sound source direction through a predetermined beamformer realized by a
microphone array. A reverberation direction power estimation unit obtains a power estimate of a
reverberation direction signal, which is obtained by passing the signal components arriving from
directions other than the direct sound source direction through a beamformer that has the same
directivity shape as the above beamformer but whose main beam avoids the direct sound source
direction. A direct-to-indirect ratio estimation unit uses the power estimate of the frequency
domain signal and the power estimate of the reverberation direction signal to obtain the estimate
DRR, which represents the ratio of the direct sound power estimate to the power estimate of the
reverberation direction signal. [Selected figure] Figure 6
Acoustic signal enhancement device, distance determination device, methods thereof, and
program
[0001]
TECHNICAL FIELD The present invention relates to a technique for estimating the
direct-to-indirect ratio of an acoustic signal.
[0002]
11-04-2019
1
In the prior art shown in Patent Document 1, the sound reception signal of the microphone array
is converted to the frequency domain and, to obtain the direct-to-indirect ratio, the powers of
the direct sound and the indirect sound are calculated using the spatial correlation matrix
obtained from that signal (see, for example, paragraphs [0034] to [0061] of the first embodiment).
[0003]
Patent Document 1: JP 2009-201724 A
[0004]
The method disclosed in Patent Document 1 cannot distinguish between direct sound and indirect
sound arriving from the same direction, so all sound arriving from the direction of the direct
sound is judged to be direct sound.
As a result, the direct sound power is overestimated (or the indirect sound power is
underestimated), and the direct-to-indirect ratio finally obtained is larger than the true
value.
[0005]
The present invention has been made in view of this problem. Its object is to provide an acoustic
signal enhancement device and a distance determination device, together with methods and programs
therefor, that distinguish reverberant sound arriving from the direction of the direct sound and
estimate the direct sound power and the reverberant sound power, thereby obtaining a
direct-to-reverberation energy ratio (DRR) closer to the true value than the conventional method,
and that reproduce the acoustic signal of a sound source with high accuracy based on that
accurate ratio.
[0006]
An acoustic signal enhancement device according to the present invention includes a received
sound power estimation unit, a direct sound direction power estimation unit, a reverberation
direction power estimation unit, a subtraction unit, a direct-to-indirect ratio calculation unit,
and a target signal adjustment unit.
The received sound power estimation unit obtains a power estimated value of the frequency
domain signal using a frequency domain signal obtained by converting a received sound signal
received by a plurality of microphones included in the microphone array into a frequency
domain.
The direct sound direction power estimation unit obtains a power estimate of a direct sound
direction signal. This signal is obtained either by applying to the frequency domain signal a
process that mainly passes signal components arriving from the direct sound source direction, or
by applying that process to the sound reception signal and converting the result into the
frequency domain.
The reverberation direction power estimation unit obtains a power estimate of a reverberation
direction signal. This signal is obtained either by applying to the frequency domain signal a
process that has the same directivity shape as the process used by the direct sound direction
power estimation unit but mainly passes signal components arriving from directions other than the
direct sound source direction, or by applying that process to the sound reception signal and
converting the result into the frequency domain.
The subtraction unit outputs a direct sound power estimate obtained by subtracting the power
estimate of the reverberation direction signal from the power estimate of the direct sound
direction signal.
The direct-to-indirect ratio calculation unit uses the power estimate of the frequency domain
signal and the power estimate of the reverberation direction signal to obtain a
direct-to-indirect ratio estimate, which represents the ratio of the direct sound power estimate
to the power estimate of the reverberation direction signal.
The target signal adjustment unit obtains a processed signal by multiplying a processing target
signal, obtained from the sound reception signal, by a gain that depends on the
direct-to-indirect ratio estimate. The gain applied to a processing target signal whose ratio, as
represented by the estimate, is larger than a predetermined threshold is larger than the gain
applied to a processing target signal whose ratio is smaller than the threshold.
[0007]
Further, the distance determination device of the present invention includes, like the
above-described acoustic signal enhancement device, the received sound power estimation unit, the
direct sound direction power estimation unit, the reverberation direction power estimation unit,
the subtraction unit, and the direct-to-indirect ratio calculation unit, and additionally
includes a distance determination unit. The distance determination unit determines the distance
of the sound source in a determination section by comparing a determination value with a
plurality of reference values. The determination value corresponds to the direct-to-indirect
ratio estimate obtained from the sound reception signal received in the determination section,
which includes one or more frames. The reference values correspond to the plurality of
direct-to-indirect ratio estimates obtained from the sound reception signal received in a
reference section, which includes more frames than the determination section.
[0008]
The acoustic signal enhancement device of the present invention enhances the acoustic signal of a
sound source by using the direct-to-indirect ratio estimate obtained by the direct-to-indirect
ratio estimation method of the present invention. This estimation method is a new method that
focuses on the isotropy of the arrival direction of reverberation, which results from its strong
diffuseness. Using two or more beamformers with the same directivity shape realized by the
microphone array, it distinguishes the direct sound from the reverberant sound among the signals
arriving from the direct sound direction and correctly estimates their respective powers. As a
result, the estimation accuracy of the direct-to-indirect ratio improves, which makes it possible
to enhance the acoustic signal of the sound source accurately.
[0009]
Further, since the distance determination device of the present invention determines the distance
of a sound source based on the direct-to-indirect ratio estimate obtained by the estimation
method of the present invention, it can make an accurate determination even for sounds uttered at
different times.
[0010]
FIG. 1 is a diagram showing an example of a scene in which the acoustic signal enhancement device 400 is used.
FIG. 2 is a diagram showing the propagation paths of sound indoors.
FIG. 3 is a diagram showing the relationship between the direct-to-indirect ratio and the distance from the microphone.
FIG. 4 is a diagram conceptually showing the principle corresponding to each embodiment.
FIG. 5 is a diagram showing two beamformers that have the same directivity shape and whose main beams point in different directions; (a) shows a beamformer whose beam points toward the sound source, and (b) shows a beamformer whose null points toward the sound source.
FIG. 6 is a diagram showing an example of a functional configuration of the acoustic signal enhancement device 400 of the first embodiment.
FIG. 7 is a diagram showing an operation flow of the acoustic signal enhancement device 400.
FIG. 8 is a diagram showing an example of a functional configuration of the processing target signal generation unit 43.
FIG. 9 is a diagram showing an example of a functional configuration of the direct-to-indirect ratio calculation unit 44.
FIG. 10 is a diagram showing an example of a functional configuration of the direct-to-indirect ratio calculation unit 44'.
FIG. 11 is a diagram schematically showing an example of the directivity shapes of the reverberation directivity forming units 4431_1 to 4431_N.
FIG. 12 is a diagram showing an example of a functional configuration of the direct-to-indirect ratio calculation unit 44''.
FIG. 13 is a diagram showing an example of a functional configuration of the distance determination device 130 according to the second embodiment.
FIG. 14 is a diagram showing the experimental conditions of an experiment confirming the effect.
FIG. 15 is a diagram showing the simulation results of direct-to-indirect ratio estimation.
FIG. 16 is a diagram showing an example of a functional configuration of the direct-to-indirect ratio estimation device 160.
[0011]
Hereinafter, embodiments of the present invention will be described with reference to the
drawings. The same reference numerals are given to the same components in the drawings, and
the description will not be repeated. Also, in the following description, symbols such as “¯”
and “^” used in the text should properly be written directly above the letter that follows them,
but due to the limitations of text notation they are written immediately before that letter. In
formulas, these symbols are written in their proper positions.
[0012]
Before describing the embodiments, the principle corresponding to each embodiment will be
described. [Principle] The acoustic signal enhancement device of the first embodiment uses a
single microphone array to enhance or suppress only the sound within a specific distance range
from the microphone array, with the aim of picking up the sound of sound sources within a
predetermined range. The distance determination device of the second embodiment determines the
distance of the sound source of the sound reception signal.
[0013]
FIG. 1 illustrates a scene in which the acoustic signal enhancement device 400 of the first
embodiment is used. For example, assume that a small microphone array 11 is surrounded by four
talkers 12 to 14 in a conference room in which a television 16, a telephone 17, and a loudspeaker
18 for in-house broadcasting are also installed. In such a scene, one wants to pick up only the
utterances of the talkers 12 to 14, who are located within a predetermined distance range (inside
the circle indicated by a broken line) centered on the small microphone array 11, without picking
up the in-house broadcast, the sound of the telephone, and so on.
[0014]
Therefore, in order to identify the distance from the microphone array to the sound source,
attention is paid to the ratio of direct sound to indirect sound (also referred to as
reverberation) contained in the received sound (hereinafter referred to as the direct-to-indirect
ratio). FIG. 2 shows the propagation paths of sound from the sound source 21 to the microphone 22
when the microphone is placed indoors and sound is recorded. The direct sound is the sound wave,
indicated by a thick solid line, that travels directly from the sound source 21 to the microphone
22. The indirect sound, on the other hand, is the sound wave, indicated by broken lines, that
reaches the microphone 22 after the sound emitted from the sound source 21 has been reflected by
a wall, the floor, the ceiling, or the like.
[0015]
FIG. 3 shows the relationship between the direct-to-indirect ratio and the distance from the
microphone. The horizontal axis in FIG. 3 is the distance from the microphone to the sound
source, and the vertical axis is the direct-to-indirect ratio. In general, the indirect sound has
a roughly constant magnitude that does not depend on the distance from the microphone. The direct
sound, in contrast, decreases monotonically as the distance from the microphone increases. The
direct-to-indirect ratio, obtained by dividing the direct sound by the indirect sound, therefore
decreases monotonically with increasing distance in the same way as the direct sound.
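As a rough numerical illustration of this behaviour, the sketch below assumes a point source whose direct sound power falls off with the inverse square of the distance and an indirect sound power that stays constant. The inverse-square law and all numeric values are assumptions made for illustration only; the text states only that the direct sound decreases monotonically with distance.

```python
import math


def direct_to_indirect_ratio_db(distance_m, direct_power_at_1m=1.0, indirect_power=0.1):
    """Direct-to-indirect ratio in dB under an assumed inverse-square law.

    direct_power_at_1m and indirect_power are hypothetical values.
    """
    direct_power = direct_power_at_1m / distance_m ** 2  # decays with distance
    return 10.0 * math.log10(direct_power / indirect_power)  # indirect part is constant


distances_m = [0.5, 1.0, 2.0, 4.0]
ratios_db = [direct_to_indirect_ratio_db(d) for d in distances_m]
```

Because the ratio decreases monotonically under these assumptions, a threshold on the ratio corresponds to a distance boundary around the microphone.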
[0016]
From this direct-to-indirect ratio, it is possible to estimate whether a sound source lies within
a predetermined distance range around the microphone array 11. Using the direct-to-indirect ratio
therefore makes it possible to enhance only the acoustic signal from a desired sound source.
[0017]
FIG. 4 conceptually illustrates the principle of the direct-to-indirect ratio estimation of the
present invention. It is generally known that, when there is sufficient reverberation, the
reverberation can be assumed to be diffuse and can be modeled as sound arriving with the same
magnitude from all directions as seen from a microphone. When an arbitrary beamformer BF1 is
applied to the output signals of the small microphone array 11, the reverberation direction power
23 is received with a predetermined directivity shape D1. The three arrows of the reverberation
direction power 23 schematically represent the magnitude of the reverberation picked up with the
directivity shape D1.
[0018]
Assume that the position of the sound source 21 is already known. For the direct sound power 25
arriving directly from the sound source 21 at the small microphone array 11, the directivity
shape D0 of a beamformer BF0 is made the same as D1 and its directivity is pointed toward the
sound source 21. The array then receives the direct sound direction power 26, which includes
reverberation power of the same magnitude as the reverberation direction power 23.
[0019]
The direct sound power 25 can be obtained by subtracting the reverberation direction power 23
from the direct sound direction power 26 including the same reverberation component as the
reverberation direction power 23.
Next, this principle will be described theoretically.
[0020]
<Isotropic Arrival Model of Reverberation> In the proposed method, a model considering the
isotropy of reverberation is introduced. Here, although an example using a power spectral
density or its estimated value as a power estimated value is described, this does not limit the
present invention.
[0021]
When the sound reception signal at the m-th microphone of a microphone array consisting of
M (M ≥ 2) microphones is converted into the frequency domain by a short-time Fourier transform
or the like, the following frequency domain signal X^(m)(ω, t) is obtained.
[0022]
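Equation (1) is not reproduced in this text. Based on the symbol definitions given in the next paragraph, it presumably has the following form; this is a reconstruction, not the original typesetting.

```latex
X^{(m)}(\omega, t) = \left\{ H_D^{(m)}(\omega) + H_R^{(m)}(\omega) \right\} S(\omega, t) \qquad (1)
```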
[0023]
Where ω is the frequency, H_D^(m)(ω) is the transfer function of the direct sound from the sound
source to the m-th microphone, H_R^(m)(ω) is the transfer function of the indirect sound from the
sound source to the m-th microphone, and S(ω, t) is the signal obtained by converting the sound
of the sound source into the frequency domain.
t is the index of the time frame.
[0024]
Here, it is assumed that direct sound is coherent, while indirect sound is diffuse because its main
component is reverberation.
That is, when focusing on each direction of arrival, the direct sound only comes from the
direction of the sound source, while the indirect sound has the property of coming with uniform
power from all directions (hereinafter referred to as isotropy). In the proposed method, the
direct sound power and the indirect sound power are estimated by focusing on this difference in
spatial arrival characteristics, and the direct-to-indirect ratio is obtained from them.
[0025]
As preconditions, it is assumed that the arrival direction of the direct sound (hereinafter
referred to as the “direct sound source direction”) is known, that direct sound and indirect
sound arriving from any direction can be regarded as plane waves, and that the direct sound and
the indirect sound are uncorrelated with each other. Under these assumptions, the transfer
functions H_D^(m)(ω) and H_R^(m)(ω) of the direct sound and the indirect sound from the sound
source to the m-th microphone can be expressed as follows.
[0026]
[0027]
Where H_Dref(ω) is the direct sound component of the transfer function from the sound source to
the reference point of the microphone array (hereinafter referred to as the “reference point”),
and H_Rref,θ(ω) is the indirect sound component arriving from direction θ as seen from the
reference point.
The reference point may be anywhere inside the microphone array or at the position of any
microphone of the microphone array.
[0028]
Each of the transfer functions H_D^(m)(ω) and H_R^(m)(ω) of the direct and indirect sound is thus
expressed as the product of a transfer function component from the sound source to the reference
point and a phase difference component due to the propagation delay from the reference point to
the m-th microphone. Therefore, the microphone array input vector
→x(ω, t) = [X^(1)(ω, t), ..., X^(M)(ω, t)]^T, whose elements are the frequency domain signals
X^(m)(ω, t) (m ∈ {1, ..., M}), is expressed by the following formula. T denotes transposition.
[0029]
[0030]
Here, S_D(ω, t) = H_Dref(ω) S(ω, t) and S_R,θ(ω, t) = H_Rref,θ(ω) S(ω, t).
→a_θ(ω) is the array manifold vector for direction θ, expressed by equation (5). Each element of
the array manifold vector depends on the propagation delay τ_θ^(m). When direct sound and
indirect sound can be regarded as plane waves, the propagation delay τ_θ^(m) depends on the
direction θ and on the position of each microphone relative to the reference point of the
microphone array. For details of the array manifold vector, see, for example, Reference 1:
F. Asano, "Array signal processing of sound: localization, tracking and separation of sound
sources" (The Acoustical Society of Japan, Acoustic Technology Series), Corona Publishing Co.,
Ltd., February 25, 2011, ISBN 978-4-339-01116-6, Chapter 1 (pp. 1-26).
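The following sketch shows one way to compute such an array manifold vector. Equation (5) is not reproduced in this text, so the phase sign convention exp(-jωτ), the linear array geometry along one axis, the reference point at the origin, and the speed of sound of 343 m/s are all assumptions made for illustration.

```python
import cmath
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed value, not given in the text


def propagation_delays(mic_positions_m, theta_rad):
    """Delay of a plane wave from direction theta at each microphone,
    relative to a reference point at the origin of a linear array."""
    return [-x * math.cos(theta_rad) / SPEED_OF_SOUND_M_S for x in mic_positions_m]


def array_manifold_vector(mic_positions_m, theta_rad, omega):
    """One unit-magnitude phase term per microphone for angular frequency omega."""
    delays = propagation_delays(mic_positions_m, theta_rad)
    return [cmath.exp(-1j * omega * tau) for tau in delays]


mic_positions_m = [-0.03, -0.01, 0.01, 0.03]  # hypothetical 4-microphone array
omega = 2.0 * math.pi * 1000.0                # 1 kHz
broadside = array_manifold_vector(mic_positions_m, math.pi / 2.0, omega)
endfire = array_manifold_vector(mic_positions_m, 0.0, omega)
```

For a wave arriving from broadside all microphones see the same phase, so every element is close to 1; for other directions the elements differ only in phase.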
[0031]
When an arbitrary beamformer (BF) is applied to this microphone array input, its output power
spectral density (PSD) is, as shown in equation (6), the sum of the power spectral densities of
the direct sound and of the indirect sound, each multiplied by the power gain |D_θ(ω)|^2 of the
beamformer.
[0032]
[0033]
Here, P_D(ω) = E[|S_D(ω, t)|^2]_t, P_R,θ(ω) = E[|S_R,θ(ω, t)|^2]_t, →w(ω) is the filter
coefficient vector of the beamformer (BF), and R(ω) is the spatial correlation matrix of the
microphone array input signal, whose (i, j) component is R_ij(ω) = E[X_i(ω, t) X_j^*(ω, t)]_t.
E[•] denotes the expectation.
[0034]
In a sound field where the indirect sound can be assumed to arrive isotropically, the reverberant
sound power P_R,θ(ω) in equation (6) is a constant that does not depend on the direction θ. It
can therefore be replaced by P_R(ω), and the output power spectral density can be expressed by
equation (7).
[0035]
[0036]
Here, consider two beamformers BF0 and BF1 that have the same directivity shape but whose main
beams point in different directions, as shown in FIG. 5. For these two beamformers, the integral
∫ |D_θ(ω)|^2 dθ in the second term on the right side of equation (7) is equal, so their outputs
differ only in the first term on the right side, that is, in the power gain of the beamformer for
the direct sound.
[0037]
Therefore, the direct sound power 25 can be determined by subtracting the output power spectral
density P_1(ω) of the beamformer BF1, which points a null (a direction of low directional
sensitivity) toward the sound source, from the output power spectral density P_0(ω) of the
beamformer BF0, which points its beam toward the sound source.
[0038]
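The equations for this step are not reproduced in this text. Assuming that equation (7) has the form described in the preceding paragraphs, writing it once for each beamformer and subtracting gives the result below. The symbols D_{0,θ} and D_{1,θ} (directivities of BF0 and BF1) and θ_D (direct sound source direction) follow the notation of the later paragraphs; the exact form is a reconstruction.

```latex
\begin{aligned}
P_0(\omega) &= |D_{0,\theta_D}(\omega)|^2 P_D(\omega) + P_R(\omega) \int |D_{0,\theta}(\omega)|^2 \, d\theta \\
P_1(\omega) &= |D_{1,\theta_D}(\omega)|^2 P_D(\omega) + P_R(\omega) \int |D_{1,\theta}(\omega)|^2 \, d\theta \\
\int |D_{0,\theta}(\omega)|^2 \, d\theta &= \int |D_{1,\theta}(\omega)|^2 \, d\theta
\;\Longrightarrow\;
P_D(\omega) = \frac{P_0(\omega) - P_1(\omega)}{|D_{0,\theta_D}(\omega)|^2 - |D_{1,\theta_D}(\omega)|^2}
\end{aligned}
```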
[0039]
According to the above principle, reverberant sound arriving from the direct sound source
direction can be distinguished from the direct sound, and as a result the estimation accuracy of
the direct-to-indirect ratio can be improved.
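The principle can be checked numerically with a small sketch. It assumes a hypothetical four-microphone circular array, delay-and-sum beamformers, a two-dimensional isotropic reverberant field approximated on a 1-degree grid, and made-up power values; none of these choices come from the text. Rotating the steering direction by one microphone spacing (90 degrees) yields a beam of identical shape, so the reverberation term is the same for both beamformers. In this simple setup BF1 has low sensitivity toward the source rather than an exact null.

```python
import cmath
import math

SPEED_OF_SOUND_M_S = 343.0  # assumed
NUM_MICS = 4                # hypothetical circular array
RADIUS_M = 0.05
OMEGA = 2.0 * math.pi * 2000.0
NUM_DIRECTIONS = 360        # grid over which the diffuse field is averaged


def manifold(theta):
    """Array manifold vector of the circular array for a plane wave from theta."""
    return [cmath.exp(1j * OMEGA * RADIUS_M
                      * math.cos(theta - 2.0 * math.pi * m / NUM_MICS)
                      / SPEED_OF_SOUND_M_S)
            for m in range(NUM_MICS)]


def power_gain(steer_theta, arrival_theta):
    """|D_theta|^2 of a delay-and-sum beamformer steered to steer_theta."""
    w = manifold(steer_theta)
    a = manifold(arrival_theta)
    response = sum(wm.conjugate() * am for wm, am in zip(w, a)) / NUM_MICS
    return abs(response) ** 2


def diffuse_gain(steer_theta):
    """Power gain averaged over all arrival directions (isotropic reverberation)."""
    grid = [2.0 * math.pi * i / NUM_DIRECTIONS for i in range(NUM_DIRECTIONS)]
    return sum(power_gain(steer_theta, th) for th in grid) / NUM_DIRECTIONS


def output_power(steer_theta, source_theta, direct_power, reverb_power):
    """Beamformer output power: direct term plus isotropic reverberation term."""
    return (power_gain(steer_theta, source_theta) * direct_power
            + diffuse_gain(steer_theta) * reverb_power)


source_theta = 0.0
bf0_theta = source_theta      # BF0: main beam toward the source
bf1_theta = math.pi / 2.0     # BF1: same shape, rotated by one microphone spacing
true_direct, true_reverb = 2.0, 5.0  # made-up powers

p0 = output_power(bf0_theta, source_theta, true_direct, true_reverb)
p1 = output_power(bf1_theta, source_theta, true_direct, true_reverb)
estimated_direct = (p0 - p1) / (power_gain(bf0_theta, source_theta)
                                - power_gain(bf1_theta, source_theta))
```

Under these assumptions `estimated_direct` recovers `true_direct`, because the reverberation terms of `p0` and `p1` cancel in the subtraction.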
[0040]
FIG. 6 shows a functional configuration example of the acoustic signal enhancement device 400
of the first embodiment.
The operation flow is shown in FIG. 7.
The acoustic signal enhancement device 400 includes a microphone array 41, a plurality of
frequency domain conversion units 42_1 to 42_M, a processing target signal generation unit 43, a
direct-to-indirect ratio calculation unit 44, a target signal adjustment unit 45, and an inverse
frequency domain conversion unit 46.
Each functional component except the microphone array 41 is realized by, for example, a
predetermined program being read into a computer including a ROM, a RAM, a CPU, and the like,
and the CPU executing the program.
[0041]
The microphone array 41 comprises a plurality of microphones m_1, ..., m_M.
The frequency domain conversion units 42_1, ..., 42_M receive the sound reception signals x_m(n)
picked up by the microphones m_1, ..., m_M, respectively, and convert each of them into a signal
in the frequency domain (step S42).
Specifically, the frequency domain conversion units 42_1, ..., 42_M sample the sound reception
signal x_m(n) at a sampling frequency of, for example, 16 kHz to convert it into a digital
signal, group, for example, 256 samples into one frame, and apply a discrete Fourier transform to
each frame to output the frequency components X_m(ω, t).
ω is the frequency and t is the frame number.
The A/D converter that converts the sound reception signal x_m(n) into a digital signal is
omitted from the figure.
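The framing and transform of step S42 can be sketched as follows, using the example values of 16 kHz and 256 samples per frame. The use of non-overlapping frames and the absence of a window function are assumptions of this sketch, and the function names are hypothetical; the text does not specify these details.

```python
import cmath
import math

SAMPLING_RATE_HZ = 16000  # example value from the text
FRAME_LENGTH = 256        # example value from the text


def dft(frame):
    """Plain discrete Fourier transform of one frame (O(N^2), for clarity)."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(frame))
            for k in range(n)]


def to_frequency_domain(samples, frame_length=FRAME_LENGTH):
    """Split samples into non-overlapping frames and transform each one.

    Returns spectra[t][k], the component of frame t at frequency bin k.
    """
    num_frames = len(samples) // frame_length
    return [dft(samples[t * frame_length:(t + 1) * frame_length])
            for t in range(num_frames)]


# A 1 kHz tone falls into bin 1000 * 256 / 16000 = 16.
tone = [math.sin(2.0 * math.pi * 1000.0 * n / SAMPLING_RATE_HZ)
        for n in range(2 * FRAME_LENGTH)]
spectra = to_frequency_domain(tone)
peak_bin = max(range(FRAME_LENGTH // 2), key=lambda k: abs(spectra[0][k]))
```

In practice a fast Fourier transform would replace the quadratic-time `dft`; the plain form is used here only to keep the sketch self-contained.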
[0042]
The processing target signal generation unit 43 combines the frequency domain signals X_m(ω, t)
output by the frequency domain conversion units 42_1, ..., 42_M to generate the processing target
signal Y(ω, t) (step S43).
[0043]
The direct-to-indirect ratio calculation unit 44 receives the frequency domain signals X_m(ω, t)
output by the frequency domain conversion units 42_1, ..., 42_M as input and calculates the
direct-to-indirect ratio estimate DRR(ω, t) of the sound reception signal (step S44).
The operation of the direct-to-indirect ratio calculation unit 44 is described in detail later.
[0044]
The target signal adjustment unit 45 receives the processing target signal Y(ω, t) and the
direct-to-indirect ratio estimate DRR(ω, t) as inputs, adjusts the amplitude of the processing
target signal Y(ω, t) according to the value of the estimate, and generates the processed signal
Z(ω, t) (step S45).
[0045]
The inverse frequency domain conversion unit 46 converts the processed signal Z (ω, t) into a
time domain signal z (n) (step S46).
The operations from step S41 to step S46 are repeated until all the sound reception signals
x_m(n) have been processed.
[0046]
Here, adjustment according to the value of the direct-to-indirect ratio estimate DRR(ω, t) means
processing such as thresholding DRR(ω, t), increasing the amplitude of the processed signal
Z(ω, t) as the value becomes larger, or reducing the amplitude of the processed signal Z(ω, t) as
the value becomes larger. Details will be described later.
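A minimal sketch of the threshold variant of this adjustment is given below. It follows the rule stated earlier for the target signal adjustment unit, namely that a bin whose ratio exceeds the threshold receives a larger gain than a bin whose ratio is below it. The threshold, the two gain values, and the function names are hypothetical.

```python
def adjustment_gain(drr, threshold=1.0, high_gain=1.0, low_gain=0.1):
    """Gain for one time-frequency bin; all default values are hypothetical."""
    return high_gain if drr > threshold else low_gain


def adjust_target_signal(target_bins, drr_bins, threshold=1.0):
    """Multiply each bin Y(omega, t) by the gain chosen from DRR(omega, t)."""
    return [y * adjustment_gain(drr, threshold) for y, drr in zip(target_bins, drr_bins)]


target_bins = [1.0 + 1.0j, 2.0 + 0.0j, 0.5j]  # made-up values of Y(omega, t)
drr_bins = [3.0, 0.2, 1.5]                    # second bin is dominated by indirect sound
processed_bins = adjust_target_signal(target_bins, drr_bins)
```

Swapping `high_gain` and `low_gain` would give the suppressing variant, in which sound with a large ratio is attenuated instead.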
[0047]
By the above operation, for example, only the sound within a specific distance range from the
microphone array is enhanced, and noise is removed by suppressing the sound outside that range.
Hereinafter, the present invention will be described in more detail by showing a more specific
functional configuration example of each part.
[0048]
[Processing Target Signal Generation Unit] FIG. 8 shows a more specific functional configuration
example of the processing target signal generation unit 43. The processing target signal
generation unit 43 includes a plurality of weight multiplication units 431_1 to 431_M and an
addition unit 432. The weight multiplication units 431_1 to 431_M multiply the respective
frequency components X_1(ω, t), ..., X_M(ω, t) of the sound reception signals x_m(n) received by
the M microphones by weighting coefficients w_m(ω).
[0049]
As for the weights used by the weight multiplication units 431_1 to 431_M, if the M microphones
are omnidirectional, for example, the processing target signal Y(ω, t) can be stabilized by
averaging all the frequency components X_1(ω, t), ..., X_M(ω, t). When the M microphones are
directional, only the signal of a specific microphone can be used by setting w_1 = 1 and w_m = 0
(m ∈ {2, ..., M}). The microphone array can also form an arbitrary directivity if beamforming
filter coefficients, obtained for example by the method described in Reference 2 (Oga, Yamazaki,
Kanada, “Acoustic system and digital signal processing”, The Institute of Electronics,
Information and Communication Engineers), are used as the weights.
[0050]
The addition unit 432 adds all the weighted frequency components X_1(ω, t), ..., X_M(ω, t) and
outputs the processing target signal Y(ω, t).
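The weighting and addition can be sketched for a single time-frequency bin as follows. The function name and the sample values are made up for illustration; the two weight sets correspond to the averaging case and the single-microphone case mentioned above.

```python
def processing_target_signal(frequency_components, weights):
    """Y(omega, t) = sum over m of w_m(omega) * X_m(omega, t) for one bin."""
    return sum(w * x for w, x in zip(weights, frequency_components))


components = [1.0 + 2.0j, 3.0 - 1.0j, 2.0 + 0.5j, 2.0 + 2.5j]  # X_1..X_4, made-up values
uniform = [1.0 / len(components)] * len(components)  # averaging (omnidirectional case)
first_only = [1.0, 0.0, 0.0, 0.0]                    # w_1 = 1, w_m = 0 otherwise

averaged = processing_target_signal(components, uniform)
selected = processing_target_signal(components, first_only)
```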
[0051]
Alternatively, instead of using the addition unit, a microphone may be installed separately from
the microphone array at a position close to the sound source, and the signal picked up by that
microphone may be used as the processing target signal Y(ω, t).
[0052]
[Direct-to-Indirect Ratio Calculation Unit] FIG. 9 shows an example of a functional configuration
of the direct-to-indirect ratio calculation unit 44.
The direct-to-indirect ratio calculation unit 44 includes a received sound power estimation unit
441, a direct sound direction power estimation unit 442, a reverberation direction power
estimation unit 443, a subtraction unit 444, and a direct-to-indirect ratio calculation unit 445.
[0053]
The received sound power estimation unit 441 uses the frequency domain signals X_1(ω, t), ...,
X_M(ω, t), obtained by converting the sound reception signals received by the microphones of the
microphone array 41 into the frequency domain, to generate and output a power estimate of the
frequency domain signal corresponding to the sound reception signal.
This power estimate may be the power estimate of the frequency domain signal X_m(ω, t)
corresponding to any one microphone m (m ∈ {1, ..., M}), as in equation (9). Alternatively, the
power estimates of the frequency domain signals X_1(ω, t), ..., X_M(ω, t) may be weighted and
averaged, as in equation (10). In the first embodiment, the power spectral density P_X,L(ω) is
obtained as the power estimate of the frequency domain signal corresponding to the sound
reception signal.
[0054]
[0055]
Here, L is the number of frames, and α_m is a non-negative weight for microphone m, set so as to
satisfy equation (11).
E[•] denotes the expectation.
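Equations (9) to (11) are not reproduced in this text. The sketch below assumes that the expectation is approximated by an average over L frames and that the weights α_m sum to one; both are assumptions, and the function names are hypothetical.

```python
def frame_averaged_power(bins_over_frames):
    """Average of |X_m(omega, t)|^2 over L frames for one microphone and one frequency."""
    return sum(abs(x) ** 2 for x in bins_over_frames) / len(bins_over_frames)


def weighted_power(bins_per_microphone, weights):
    """Weighted average of the per-microphone power estimates.

    The weights are assumed to be non-negative and to sum to one.
    """
    if any(a < 0 for a in weights) or abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must be non-negative and sum to one")
    return sum(a * frame_averaged_power(bins)
               for a, bins in zip(weights, bins_per_microphone))


# Two microphones, L = 2 frames, one frequency bin (made-up values).
bins_per_microphone = [[3.0 + 4.0j, 0.0 + 5.0j], [1.0 + 0.0j, 0.0 - 1.0j]]
single = frame_averaged_power(bins_per_microphone[0])
combined = weighted_power(bins_per_microphone, [0.5, 0.5])
```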
[0056]
[0057]
The direct sound direction power estimation unit 442 obtains the power estimate P_DD(ω) of the
direct sound direction signal. This signal is obtained either by applying to the frequency domain
signals X_1(ω, t), ..., X_M(ω, t) a process that passes only the signal components arriving from
the direct sound source direction, or by applying that process to the sound reception signals and
converting the result into the frequency domain.
The power P_DD(ω) of the direct sound direction signal is the same as P_0(ω) in the
above-mentioned equation (8).
[0058]
The direct sound direction power estimation unit 442 includes a directivity forming unit 4421
and a power estimation unit 4422. The directivity forming unit 4421 forms directivity so that a
directional beam is directed in a predetermined direction, and outputs a signal that has passed
the directivity. The directivity of the directivity forming unit 4421 is set so that the main beam of
directivity is directed to the direct sound direction. As a method of directivity formation, for
example, delay-and-sum beamforming as described in Reference 1 (F. Asano, "Array signal
processing of sound: localization, tracking and separation of sound sources", Corona, pp. 70-79)
can be used.
[0059]
When the output of the directivity forming unit 4421 is expressed as YBF (ω, t), the power
estimated value PDD (ω) of the direct sound direction signal output from the power estimating
unit 4422 is obtained by Expression (12).
[0060]
[0061]
Also, the power estimate P_DD(ω) of the direct sound direction signal, being an output power
spectral density of a beamformer, is expressed by equation (13).
[0062]
[0063]
Here, |D_0,θ(ω)|^2 corresponds to the power gain of the beamformer BF0 described in FIG.
[0064]
The reverberation direction power estimation unit 443 obtains a power estimate of the
reverberation direction signal. This signal is obtained either by applying to the frequency
domain signals a process that has the same directivity shape as the process used by the direct
sound direction power estimation unit 442 but mainly passes the signal components arriving from
directions other than the direct sound source direction, or by applying that process to the
sound reception signals and converting the result into the frequency domain.
[0065]
Ideally, the reverberation direction power estimation unit 443 includes a reverberation directivity
formation unit 4431 and a reverberation power estimation unit 4432.
The directivity of the reverberation directivity forming unit 4431 is set so that its main beam
avoids the direct sound direction.
Its directivity shape is set to be the same as that of the directivity forming unit 4421; it is
desirable that the two shapes match as closely as possible.
Setting the directivity shape in this way can easily be realized with the prior art.
Estimation of the sound source direction is described, for example, in chapter 7.2 of Reference 2
(Oga, Yamazaki, Kanada, “Acoustic system and digital signal processing”, The Institute of
Electronics, Information and Communication Engineers).
[0066]
The reverberation power estimation unit 4432 receives as input the reverberation direction
signal, picked up so as to avoid the direct sound direction, and outputs the power estimate
P_RD(ω) of the reverberation direction signal (equation (14)).
Because this signal is picked up so as to avoid the direct sound direction, setting
|D_1,θD(ω)|^2 << 1 makes the direct sound component |D_1,θD(ω)|^2 P_D(ω) contained in P_RD(ω)
sufficiently small.
[0067]
[0068]
Here, |D_1,θ(ω)|^2 corresponds to the power gain of the beamformer BF1 described in FIG.
[0069]
The subtraction unit 444 subtracts the power estimate PRD (ω) of the reverberation direction signal,
output by the reverberation power estimation unit 4432, from the power estimate PDD (ω) of the
direct sound direction signal, output by the direct sound direction power estimation unit 442, and
outputs the resulting direct sound power estimate ^PD (ω) (Equation (15)).
[0070]
[0071]
The denominator of Equation (15) is a term that normalizes the direct sound power estimate ^PD (ω)
by the difference between the power gains of the beam formers (BF) of the directivity forming unit
4421 and the reverberation directivity forming unit 4431.
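The subtraction and normalization described above can be sketched as follows. Equation (15) itself is not reproduced in this text, so the form used here, (PDD − PRD) / (|D0|² − |D1|²), is inferred from the surrounding description and should be read as an assumption. The function name, the argument names, and the clamping of negative differences to zero are illustrative choices, not part of the original.

```python
def direct_power_estimate(p_dd, p_rd, gain_bf0, gain_bf1):
    """Estimate the direct sound power per frequency bin.

    p_dd, p_rd : lists of power estimates PDD(w) and PRD(w), one value per bin.
    gain_bf0   : power gain |D0|^2 of the direct-direction beam former (assumed scalar).
    gain_bf1   : power gain |D1|^2 of the reverberation beam former toward the
                 direct sound direction (assumed scalar).
    """
    denom = gain_bf0 - gain_bf1
    if denom <= 0.0:
        raise ValueError("BF0 must have a larger gain toward the source than BF1")
    # Clamping at zero is an illustrative safeguard against negative power estimates.
    return [max(dd - rd, 0.0) / denom for dd, rd in zip(p_dd, p_rd)]


# Toy check: true direct power 2.0, reverberant power 5.0 seen equally by both beams.
p_dd = [1.0 * 2.0 + 5.0]   # |D0|^2 * PD + PR
p_rd = [0.1 * 2.0 + 5.0]   # |D1|^2 * PD + PR
print(direct_power_estimate(p_dd, p_rd, gain_bf0=1.0, gain_bf1=0.1))
```

Because both beam formers have the same directivity shape, an isotropic reverberant field contributes the same power to each output, which is why it cancels in the subtraction.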
[0072]
The direct-to-indirect ratio calculation unit 445 uses the power spectral density PX,L (ω) output
from the received sound power estimation unit 441 and the direct sound power estimate ^PD (ω) to
obtain the direct-to-indirect ratio estimate DRR (ω), which is the ratio of the direct sound power
estimate to the power estimate of the reverberant sound (Equation (16)).
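As a rough illustration of this step, the sketch below computes a per-bin ratio in decibels. Equation (16) is not reproduced in this text, so the exact expression is an assumption: the reverberant power is taken here as the received power minus the direct sound power estimate. The function name and the small floor `eps` are hypothetical.

```python
import math


def drr_db(p_x, p_d_hat, eps=1e-12):
    """Direct-to-indirect ratio per frequency bin, in dB.

    p_x     : received sound power PX(w) per bin.
    p_d_hat : direct sound power estimate ^PD(w) per bin.
    Assumption: indirect power = PX(w) - ^PD(w), floored at eps to keep the log finite.
    """
    result = []
    for px, pd in zip(p_x, p_d_hat):
        p_direct = max(pd, eps)
        p_indirect = max(px - pd, eps)
        result.append(10.0 * math.log10(p_direct / p_indirect))
    return result


print(drr_db([10.0, 11.0], [5.0, 10.0]))  # equal powers, then direct sound 10x stronger
```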
[0073]
[0074]
Further, if the received sound power output from the received sound power estimation unit 441 is
expressed by Equation (9) for any one microphone m (m ∈ {1, ..., M}), the direct-to-indirect ratio
can also be estimated by Equation (17).
[0075]
[0076]
Further, a direct-to-indirect ratio that does not depend on frequency can also be estimated by
Equations (18) and (19).
Note that a value obtained over L frames is written DRR (ω), whereas a value obtained for each
frequency in each single frame is written DRR (ω, t).
[0077]
[0078]
The direct-to-indirect ratio estimation method described above is a new method that exploits the
fact that reverberant sound, being a strongly diffuse signal, arrives at the microphone array
isotropically.
Two beam formers with the same directivity shape, both realized by the microphone array, yield a
signal containing direct sound and reverberation and a signal containing only reverberation. This
allows the direct sound component and the indirect sound component to be separated correctly and,
as a result, improves the estimation accuracy of the direct-to-indirect ratio.
[0079]
Equations (16), (17), (18), and (19) may instead give a direct-to-indirect ratio estimate DRR that
is not expressed in decibels, as follows.
[0080]
[0081]
[Modification 1] FIG. 10 shows an example of the functional configuration of a direct-to-indirect
ratio calculation unit 44', in which the functional configuration of the reverberation direction
power estimation unit 443 of the direct-to-indirect ratio calculation unit 44 is modified.
The direct-to-indirect ratio calculation unit 44' calculates the reverberation direction power
PRD (ω) by averaging the reverberation direction powers PRD1 (ω) to PRDN (ω) obtained for a
plurality of (two or more) directivity directions.
[0082]
The reverberation direction power estimation unit 443 'of the in-between ratio calculation unit
44' includes two or more reverberation directivity formation units 44311 to 4431N, two or more
reverberation power estimation units 44321 to 4432N, and a reverberation direction power
11-04-2019
20
calculation unit. It differs from the ratio calculation unit 44 in that the unit 4433 is provided.
The direction of the main beam of the beam former of the reverberation directivity forming unit
44311 is, for example, the direction θ1 from the reference point.
The direction of the main beam of the beam former of the reverberation directivity forming unit
44312 is the direction θ1, and the direction of the main beam of the beam former of the
reverberation directivity forming unit 4431N is the direction θN.
[0083]
FIG. 11 schematically shows the directivity shape of each reverberation directivity forming unit
44311 to 4431N.
The directivity shapes of the reverberation directivity forming units 44311 to 4431N are identical
and differ only in the direction θ of the main beam.
The reverberation power estimation units 44321 to 4432N, each connected to one of the reverberation
directivity forming units 44311 to 4431N, obtain the reverberant sound power estimates PRD1 (ω) to
PRDN (ω) for the respective directivity directions from the signals that have passed through the
corresponding directivity.
[0084]
The reverberation direction power calculation unit 4433 calculates a reverberation direction
power PRD (ω) by performing weighted averaging (equation 20) on a plurality of power
estimated values PRD1 (ω) to PRDN (ω).
[0085]
[0086]
Here, βn is a non-negative weighting coefficient, which is set in advance so as to satisfy
Equation (21).
Because the reverberation direction power PRD (ω) obtained in this way is an average of the
reverberation direction powers in a plurality of directions, its accuracy is improved.
As a result, the accuracy of the direct-to-indirect ratio estimate DRR (ω) is also improved.
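The weighted averaging over N reverberation directions can be sketched as below. Equations (20) and (21) are not reproduced in this text; the sketch assumes that Equation (21) requires the weights βn to sum to one, which is a common normalization but is not confirmed by the source. The function and argument names are hypothetical.

```python
def reverberation_direction_power(p_rd_per_direction, weights, tol=1e-9):
    """Weighted average of PRD1(w)..PRDN(w) across N beam directions.

    p_rd_per_direction : list of N lists, each holding K per-bin power estimates.
    weights            : N non-negative weights (assumed to sum to 1).
    """
    if len(weights) != len(p_rd_per_direction):
        raise ValueError("one weight per direction is required")
    if any(w < 0.0 for w in weights):
        raise ValueError("weights must be non-negative")
    if abs(sum(weights) - 1.0) > tol:
        raise ValueError("weights are assumed to sum to 1")
    num_bins = len(p_rd_per_direction[0])
    return [
        sum(w * p[k] for w, p in zip(weights, p_rd_per_direction))
        for k in range(num_bins)
    ]


# Two directions, two frequency bins, equal weights.
print(reverberation_direction_power([[1.0, 2.0], [3.0, 4.0]], [0.5, 0.5]))
```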
[0087]
[Modification 2] FIG. 12 shows a functional configuration example of a direct-to-indirect ratio
calculation unit 44'', in which the functional configuration of the reverberation direction power
estimation unit 443 of the direct-to-indirect ratio calculation unit 44 is changed.
The direct-to-indirect ratio calculation unit 44'' is configured so that the main beam directions
of the beam formers of the directivity forming unit 4421 and the reverberation directivity forming
unit 4431 can be set automatically.
[0088]
The direct-to-indirect ratio calculation unit 44'' differs from the direct-to-indirect ratio
calculation unit 44 in that it includes a sound source direction estimation unit 446 and a beam
former generation unit 447.
The sound source direction estimation unit 446 estimates the direction of the sound source based on
the frequency domain signals X1 (ω, t), ..., XM (ω, t), which are obtained by converting the sound
reception signals received by the plurality of microphones of the microphone array 41 into the
frequency domain, and outputs a sound source direction signal.
The direction of the sound source can be determined, for example, from the phase difference of
the frequency domain signals X1 (ω, t), ..., XM (ω, t).
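One way to picture direction estimation from a phase difference is the two-microphone, far-field sketch below. The source only states that the phase difference can be used; the plane-wave model, the use of a single microphone pair and a single frequency, the speed of sound, and all names are assumptions made for illustration.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def estimate_direction_deg(x1, x2, freq_hz, mic_spacing_m, c=SPEED_OF_SOUND):
    """Estimate the arrival angle (degrees from broadside) from two complex bins.

    Assumes a far-field plane wave that reaches microphone 2 later than
    microphone 1 by tau = d * sin(theta) / c, and no spatial aliasing
    (2 * pi * freq_hz * d / c < pi).
    """
    omega = 2.0 * math.pi * freq_hz
    phase_diff = cmath.phase(x1 * x2.conjugate())   # equals omega * tau
    sin_theta = c * phase_diff / (omega * mic_spacing_m)
    sin_theta = max(-1.0, min(1.0, sin_theta))      # guard against rounding
    return math.degrees(math.asin(sin_theta))


# Synthetic check: a 1 kHz plane wave arriving from 30 degrees, 5 cm spacing.
tau = 0.05 * math.sin(math.radians(30.0)) / SPEED_OF_SOUND
x2 = cmath.exp(-1j * 2.0 * math.pi * 1000.0 * tau)
print(estimate_direction_deg(1 + 0j, x2, 1000.0, 0.05))
```

In practice the estimate would be combined over several microphone pairs and frequency bins; that combination is outside the scope of this sketch.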
[0089]
The beam former generation unit 447 receives the sound source direction signal and generates a beam
former BF0 having its main beam in the sound source direction and a beam former BF1 whose main beam
is set so as to avoid the sound source direction. The beam former BF0 is output to the direct sound
direction power estimation unit 442, and the beam former BF1 is output to the reverberation
direction power estimation unit 443.
The directivity forming unit 4421 of the direct sound direction power estimation unit 442
applies the beam former BF0 and outputs the above-mentioned output signal YBF (ω, t).
The reverberation direction power estimation unit 443 applies a beam former BF1 to output
reverberation direction power PRD (ω).
[0090]
Thus, the direct-to-indirect ratio calculation unit 44'' can automatically set the directivity of
the direct sound direction power estimation unit 442 and the reverberation direction power
estimation unit 443. The operation of the direct-to-indirect ratio calculation units 44, 44', and
44'' has been described above using frequency-domain operation as an example, but the technical
concept of the present invention, including the modifications, can be applied as it is to
time-domain operation. The concept of the direct-to-indirect ratio calculation unit 44'' can also
be applied to the direct-to-indirect ratio calculation unit 44'.
[0091]
[Target Signal Adjustment Unit] The target signal adjustment unit 45 receives the processing
target signal Y (ω, t) and the direct-to-indirect ratio estimate DRR (ω, t) as inputs, adjusts the
amplitude of the processing target signal Y (ω, t) according to the direct-to-indirect ratio
estimate DRR (ω, t), and generates and outputs the processed signal Z (ω, t). In other words, the
target signal adjustment unit 45 multiplies the processing target signal Y (ω, t) by a gain (filter
coefficient) that depends on the direct-to-indirect ratio estimate DRR (ω, t), and thereby
generates and outputs the processed signal Z (ω, t) (step S45).
[0092]
The magnitude of the gain determined from the direct-to-indirect ratio estimate DRR depends on the
distance range from the microphone array 41 within which the sound emitted from the direct sound
source is to be enhanced. For example, when sound emitted from a direct sound source close to the
microphone array 41 is to be emphasized, the gain applied to the processing target signal when the
ratio of the direct sound power estimate to the indirect sound power estimate, as represented by
the direct-to-indirect ratio estimate DRR, is a first value is larger than the gain applied when
that ratio is a second value smaller than the first value. Conversely, when sound emitted from a
direct sound source far from the microphone array 41 is to be emphasized, the gain G (ω, t) applied
to the processing target signal when the ratio is the first value is smaller than the gain applied
when the ratio is the second value smaller than the first value.
[0093]
The target signal adjustment unit 45 can be configured by, for example, a filter coefficient
calculation unit 451 and a multiplication unit 452 (FIG. 6). The filter coefficient calculation
unit 451 receives the direct-to-indirect ratio estimate DRR (ω, t) as input and calculates and
outputs a filter coefficient G (ω, t). The filter coefficient G (ω, t) is calculated using, for
example, a binary filter with a threshold, as shown in Equations (22) and (23).
[0094]
[0095]
The threshold value Th1 can be set to any value between the minimum value and the maximum
value of the direct-to-indirect ratio estimate DRR (ω, t).
When the threshold value Th1 approaches the minimum value (0), the sound quality is improved.
Conversely, when the threshold value Th1 approaches the maximum value, the noise suppression
effect is enhanced, but the distortion of the sound reception signal becomes large and the sound
quality is degraded.
[0096]
As described above, the threshold Th1 has a trade-off relationship between the sound quality and
the noise suppression. Therefore, the threshold value Th1 is empirically determined according to
the purpose of use in consideration of the trade-off relationship.
[0097]
In addition, if the filter coefficient G (ω, t) is calculated as shown in Equations (24) and (25),
so that the time-frequency bins in which the direct-to-indirect ratio estimate falls below the
threshold Th2 are emphasized, sound sources farther away than a specific distance range can be
emphasized.
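The two thresholding rules can be sketched with one small function. Equations (22) to (25) are not reproduced in this text, so the exact comparison operators, in particular the handling of a value exactly equal to the threshold, are assumptions; the function name and the `emphasize` parameter are hypothetical.

```python
def binary_gain(drr, threshold, emphasize="near"):
    """Binary filter coefficient G(w, t) for one time-frequency bin.

    emphasize="near": pass bins whose DRR is at or above the threshold (Th1 case).
    emphasize="far" : pass bins whose DRR is below the threshold (Th2 case).
    """
    if emphasize == "near":
        return 1.0 if drr >= threshold else 0.0
    if emphasize == "far":
        return 1.0 if drr < threshold else 0.0
    raise ValueError("emphasize must be 'near' or 'far'")


frame_drr = [6.0, -3.0, 0.5]
print([binary_gain(v, threshold=0.0) for v in frame_drr])
print([binary_gain(v, threshold=0.0, emphasize="far") for v in frame_drr])
```

Raising the threshold in the "near" case passes fewer bins, which corresponds to the stronger suppression and greater distortion described for Th1.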
[0098]
[0099]
Although a binary filter taking the values 0 and 1 has been given as an example of the filter
coefficient G (ω, t), the two values do not necessarily have to be 0 and 1. Any two sufficiently
different values, for example 0.1 and 0.9, may be used.
[0100]
Further, a real number of 1 or more may be set as the filter coefficient G (ω, t).
That is, a gain G (ω, t) that amplifies the processing target signal Y (ω, t) may be determined.
Further, a gain G (ω, t) (for example, a value of 0.1 or less) that largely suppresses the
processing target signal Y (ω, t) may be determined.
Further, instead of determining the gain G (ω, t) by threshold determination, the
direct-to-indirect ratio estimate or a function value of it may be used as the gain G (ω, t). For
example, the gain G (ω, t) may be determined as in the following Equations (26) to (29).
[0101]
[0102]
Here, F is a function such as a monotonically increasing function or a monotonically decreasing
function.
[0103]
The multiplication unit 452 multiplies the processing target signal Y (ω, t) by the filter
coefficient G (ω, t) thus obtained, and generates the processed signal
Z (ω, t) = G (ω, t) · Y (ω, t).
Therefore, the processed signal Z (ω, t) can be composed only of those components of the
processing target signal Y (ω, t) for which the direct-to-indirect ratio estimate DRR (ω, t) is
large.
That is, only the direct sound can be extracted.
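A minimal sketch of the multiplication Z (ω, t) = G (ω, t) · Y (ω, t) with a smooth gain is shown below. Equations (26) to (29) are not reproduced in this text; the logistic function used for F is only one example of a monotonically increasing function, and `midpoint_db`, `slope`, and both function names are hypothetical.

```python
import math


def soft_gain(drr_db, midpoint_db=0.0, slope=0.5):
    """Monotonically increasing gain in (0, 1) computed from a DRR value in dB.

    The logistic shape is an illustrative choice for F, not the patent's formula.
    """
    return 1.0 / (1.0 + math.exp(-slope * (drr_db - midpoint_db)))


def apply_gain(y_frame, gains):
    """Multiply each complex bin Y(w, t) by its gain G(w, t) to obtain Z(w, t)."""
    return [g * y for g, y in zip(gains, y_frame)]


y_frame = [1.0 + 1.0j, 0.5 - 0.2j, 2.0j]   # one frame of complex spectrum values
drr_frame = [12.0, 0.0, -12.0]             # per-bin DRR estimates in dB
gains = [soft_gain(v) for v in drr_frame]
print(apply_gain(y_frame, gains))
```

A monotonically decreasing F would emphasize far sources instead, mirroring the Th2 case of the binary filter.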
[0104]
As a second embodiment, a perspective determination device 120 that determines whether a sound
source is near or far using the direct-to-indirect ratio estimate DRR (ω, t) described in the first
embodiment will be described. FIG. 13 shows a functional configuration example of the perspective
determination device 120. The perspective determination device 120 includes a microphone array 41,
a plurality of frequency domain conversion units 421 to 42M, a direct-to-indirect ratio calculation
unit 44, and a distance determination unit 121. The microphone array 41, the plurality of
frequency domain conversion units 421 to 42M, and the direct-to-indirect ratio calculation unit 44
are the same as those of the noise removal device 400. The perspective determination device 120 is
also realized by, for example, a predetermined program being read into a computer including a
ROM, a RAM, a CPU, and the like, and the CPU executing the program.
[0105]
When sound sources at a plurality of different distances emit sound at different times, the
perspective determination device 120 determines whether the source of the sound received at a
given time is far or near. The distance determination unit 121 of the perspective determination
device 120 includes a frequency averaging unit 1210, an accumulation unit 1211, and a
determination unit 1212.
[0106]
The distance determination unit 121 determines whether the direct sound source in a determination
section is far or near by comparing a determination value with a reference value. The determination
value corresponds to the direct-to-indirect ratio estimate obtained from the sound reception signal
received in the determination section, which consists of one or more frames. The reference value
corresponds to a plurality of direct-to-indirect ratio estimates obtained from the sound reception
signal received in a reference section, which contains more frames than the determination section.
[0107]
The frequency averaging unit 1210 receives the direct-to-indirect ratio estimate DRR (ω, t) as
input, averages its values in the frequency direction, and outputs the frequency-averaged
direct-to-indirect ratio estimate ¯Et (Equation (30)).
[0108]
[0109]
Here, K is the total number of frequency bins of the Fourier transform performed by the
frequency domain transform units 421 to 42M.
[0110]
The accumulation unit 1211 accumulates the frequency-averaged direct-to-indirect ratio estimates
¯Et for the past L time frames and outputs a comparison-target direct-to-indirect ratio estimate
^E.
For the comparison-target estimate ^E, for example, the average of the accumulated
frequency-averaged estimates, ^E = (1/L) Σt ¯Et, or the mean of their minimum and maximum values,
^E = 1/2 (max ¯Et + min ¯Et), is used.
[0111]
The determination unit 1212 compares the frequency-averaged direct-to-indirect ratio estimate ¯Et
with the comparison-target estimate ^E. When ¯Et > ^E, it outputs, for example, 1 as the
perspective determination result Yt, indicating that the source is near; when ¯Et < ^E, it outputs,
for example, 0 as the perspective determination result Yt, indicating that the source is far.
The perspective determination result Yt indicates whether, relative to the sound reception signal
of the most recent L time frames, the sound comes from a relatively near sound source or from a
relatively far sound source.
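The three steps of the distance determination unit 121 (frequency averaging, accumulation over L frames, and comparison) can be sketched as a small class. The class and method names are hypothetical, the reference value uses the plain average option, and two details that the text leaves open are treated as assumptions: the current frame is included in the history before comparison, and a tie counts as far.

```python
from collections import deque


class PerspectiveJudge:
    """Near/far decision from per-frame DRR estimates (illustrative sketch)."""

    def __init__(self, history_len):
        # Keeps the frequency-averaged estimates of the most recent L frames.
        self.history = deque(maxlen=history_len)

    def update(self, drr_frame):
        """drr_frame: DRR(w, t) for the K frequency bins of one frame.

        Returns 1 (near) if the frame average exceeds the reference, else 0 (far).
        """
        e_bar = sum(drr_frame) / len(drr_frame)          # frequency average
        self.history.append(e_bar)                       # accumulation
        e_hat = sum(self.history) / len(self.history)    # reference value
        return 1 if e_bar > e_hat else 0


judge = PerspectiveJudge(history_len=4)
for frame in ([0.0, 0.0], [10.0, 10.0], [-10.0, -10.0]):
    print(judge.update(frame))
```

Replacing the average with the midpoint of the minimum and maximum, the other option named in the text, only changes the line that computes `e_hat`.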
[0112]
By using this perspective determination result Yt, it is possible to divide the sound reception
signal that is sequentially input according to the distance between the microphone and its sound
source.
That is, the sounds of a plurality of sound sources can be selected according to the distance from
the microphone.
[0113]
[Experimental Results] For the purpose of confirming the effects of the present invention,
simulation experiments using the mirror image method were performed.
[0114]
The simulation conditions are shown in FIG.
FIG. 14 is a plan view, assuming a room with a width of 4 m, a depth of 6 m, and a height of 2.7
m.
The sound absorption coefficient of the wall was set to α = 0.05 (reverberation time T60 = 1.8
seconds). The height of the reference point was 1.5 m using a microphone array in which eight
microphones are arranged in a circle. The height of the sound source was also 1.5 m.
[0115]
Under these conditions, FIG. 15 shows the result of comparison between the measured value
DRRactual (□) of DRR estimated from the impulse response, the present invention (▽), and the
conventional method (○). The DRR (▽) estimated by the method of the present invention is
closer to the measured value DRRactual (□) compared to the conventional method, and improves
by about 3 dB particularly when the sound source is at a distance.
[0116]
In general, the power of the indirect component is constant regardless of the distance of the
sound source, while the power of the direct component is inversely proportional to the square of
the distance. Therefore, for a distant sound source, the power of the direct component becomes
smaller than that of the indirect component, and even a small error in the estimated direct
component greatly affects the DRR estimation result. In the method of the present invention, the
directivity control of the microphone array minimizes the influence of the signal coming from the
sound source direction when the power of the indirect sound is obtained. This makes more accurate
estimation possible, so that the DRR of a far sound source can be estimated correctly.
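The distance dependence described here can be illustrated numerically. The sketch assumes an idealized model in which the indirect power is constant and the direct power follows the inverse-square law; the critical distance (where both powers are equal), the error size, and the function names are hypothetical values chosen for illustration and are not taken from the experiment.

```python
import math


def drr_db_at(distance_m, critical_distance_m=1.0):
    """Ideal DRR in dB: direct power ~ 1/r^2, indirect power constant."""
    return 20.0 * math.log10(critical_distance_m / distance_m)


def drr_error_db(distance_m, abs_error, critical_distance_m=1.0, p_indirect=1.0):
    """DRR deviation in dB caused by a fixed absolute error in the direct power."""
    p_direct = p_indirect * (critical_distance_m / distance_m) ** 2
    return 10.0 * math.log10((p_direct + abs_error) / p_direct)


for r in (1.0, 2.0, 4.0):
    print(r, round(drr_db_at(r), 2), round(drr_error_db(r, abs_error=0.1), 2))
```

Under this model the DRR falls by about 6 dB for each doubling of distance, and the same absolute error in the direct power produces a larger deviation in decibels as the source moves away.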
[0117]
As described above, the direct-to-indirect ratio estimation method of the present invention is a
new method that assumes that reverberant sound, being a strongly diffuse signal, arrives at the
microphone array isotropically. Two beam formers with the same directivity shape are realized by
the microphone array: one whose main beam is directed at the direct sound source, and one whose
main beam is set so as to avoid the direct sound source direction. With these, the direct component
coming from the sound source direction and the indirect component can be separated correctly, and
as a result the accuracy of the direct-to-indirect ratio estimate can be increased.
[0118]
In the above description, the direct-to-indirect ratio estimation method of the present invention
has been described as incorporated in the acoustic signal enhancement device 400 or the
perspective determination device 120, but as shown in FIG. it may also be configured as a
direct-to-indirect ratio estimation device 160 that realizes only the direct-to-indirect ratio
estimation. In that case, the direct-to-indirect ratio estimation device 160 can be configured by
the microphone array 41, the plurality of frequency domain conversion units 421 to 42M, and the
direct-to-indirect ratio calculation unit 44.
[0119]
Although Equations (16) to (19) show examples in which the direct-to-indirect ratio estimate DRR
is expressed in decibels, it goes without saying that the direct-to-indirect ratio estimate may
instead be obtained as a ratio of power spectral densities. The DRR given by the above equations
multiplied by an arbitrary constant may be used as the direct-to-indirect ratio estimate, or the
reciprocal of the DRR given by the above equations multiplied by a constant may be used. Also, the
constant may be a monotonically increasing function value. That is, the direct-to-indirect ratio
estimate DRR of the present invention is not limited to those represented by the above-described
Equations (16) to (19).
[0120]
Note that the processes described for the above method and apparatus need not be performed only in
chronological order according to the order of description; they may also be performed in parallel
or individually, depending on the processing capability of the apparatus that executes them or as
needed.
[0121]
Further, when the processing means in the above-mentioned device is realized by a computer, the
processing content of the function that each device should have is described by a program.
Then, by executing this program on a computer, the processing means in each device is realized
on the computer.
[0122]
The program describing the processing content can be recorded in a computer readable
recording medium. As the computer readable recording medium, any medium such as a magnetic
recording device, an optical disc, a magneto-optical recording medium, a semiconductor memory,
etc. may be used. Specifically, for example, a hard disk device, a flexible disk, a magnetic tape,
or the like can be used as the magnetic recording device; a DVD (Digital Versatile Disc), a DVD-RAM
(Random Access Memory), a CD-ROM (Compact Disc Read Only Memory), a CD-R (Recordable) / RW
(Rewritable), or the like as the optical disc; an MO (Magneto-Optical disc) or the like as the
magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read
Only Memory) or the like as the semiconductor memory.
[0123]
Further, this program is distributed, for example, by selling, transferring, lending, etc. a portable
recording medium such as a DVD, a CD-ROM or the like in which the program is recorded.
Furthermore, the program may be stored in a storage device of a server computer, and the
program may be distributed by transferring the program from the server computer to another
computer via a network.
[0124]
Further, each means may be configured by executing a predetermined program on a computer,
or at least a part of the processing content may be realized as hardware.