Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2017503388
The method includes estimating spatial coherence between a first diffuse sound portion in a first
microphone signal and a second diffuse sound portion in a second microphone signal. The first
microphone signal is captured by a first microphone, and the second microphone signal is
captured by a second microphone spaced apart from the first microphone in a known manner.
The method further comprises defining a linear constraint on the filter coefficients of the diffuse
sound filter, the linear constraint being based on spatial coherence. The method also includes
calculating at least one of signal statistics and noise statistics for the first microphone signal and
the second microphone signal. The method also includes determining the filter coefficients of the diffuse sound filter by solving an optimization problem based on at least one of the signal statistics and the noise statistics, taking the linear constraint on the filter coefficients into account.
[0005] A method is provided that includes defining a linear constraint on the filter coefficients of the diffuse sound filter. The linear constraint is based on the spatial coherence between a first diffuse sound portion in a first microphone signal and a second diffuse sound portion in a second microphone signal. The first microphone signal is captured by a first microphone, and the second microphone signal is captured by a second microphone spaced apart from the first microphone in a known manner. The method also includes calculating at least one of: a direction of arrival of at least one direct sound, signal statistics of the first microphone signal and the second microphone signal, and noise statistics of the first microphone signal and the second microphone signal. The method further includes determining the filter coefficients of the diffuse sound filter by solving an optimization problem based on at least one of the direction of arrival of the at least one direct sound, the signal statistics and the noise statistics, taking the linear constraint on the filter coefficients into account.
[0007] A further embodiment provides an apparatus comprising a constraint calculator configured to define a linear constraint on the filter coefficients of the diffuse sound filter. The linear constraint is based on the spatial coherence between a first diffuse sound portion in a first microphone signal and a second diffuse sound portion in a second microphone signal. The first microphone signal is or has been captured by a first microphone, and the second microphone signal is or has been captured by a second microphone spaced apart from the first microphone in a known manner. The apparatus also comprises a statistics calculator configured to calculate at least one of: a direction of arrival of at least one direct sound, signal statistics of the first microphone signal and the second microphone signal, and noise statistics of the first microphone signal and the second microphone signal. The apparatus further comprises a filter coefficient calculator configured to determine the filter coefficients of the diffuse sound filter by solving an optimization problem based on at least one of the direction of arrival of the at least one direct sound, the signal statistics and the noise statistics, taking the linear constraint on the filter coefficients into account.
[0008] Embodiments are based on the insight that a diffuse sound filter can be determined by taking into account at least one linear constraint relating to the diffuse sound portions of the microphone signals.
[0020] A simple way to find an appropriate filter is to calculate the weights wm such that the L plane waves are suppressed while the stationary noise Xn(k, n, dm) contained in the microphone signals is minimized. Expressed mathematically, the filter weights are given by:
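A plausible form of this minimization, consistent with the description above and with the quantities defined in the next paragraph, is

wm(k, n) = argmin over w of w^H Φn(k, n) w, subject to w^H al(k, n) = 0 for l = 1, ..., L.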
[0021] Here, Φn is the PSD matrix (power spectral density matrix) of the stationary noise, which can be estimated, for example, using known techniques during periods in which neither diffuse nor direct sound is present. Besides, al is a so-called propagation vector. Its elements are the relative transfer functions of the l-th plane wave from the m-th microphone to the other microphones. Thus, al is a column vector of length M (recall that only the diffuse sound at the m-th microphone is estimated by wm, i.e., by a weighted linear combination of the M microphone signals; the diffuse sounds at the other microphones are essentially redundant, since they can be related to the diffuse sound at the m-th microphone via the relative transfer functions from the m-th microphone to the other microphones and can be calculated this way if needed). The elements of al depend on the DOA of the l-th plane wave. This means that al is a function of the l-th plane wave DOA, i.e., al = f(nl). Since al depends on the direct sound (i.e., the plane waves), it will be referred to as a direct-sound constraint in the following. This spatial filter essentially realizes a beamformer whose sound collection pattern has nulls in the directions of the L plane waves. As a result, all plane waves are suppressed. Unfortunately, since only zero constraints are present, solving the minimization problem yields the trivial solution wm = 0, i.e., the diffuse sound cannot be extracted.
[0023] An example of the resulting sound collection pattern is shown in FIG. Here, two direct sounds arrive from the azimuths 51° and 97°. The figure shows the resulting sound collection pattern at a frequency of 2.8 kHz when using a uniform linear array of 16 microphones with 5 cm microphone spacing. The sound collection pattern has exact nulls at 51° and 97° and a high gain at 180°, which corresponds to the direction n0. Moreover, the sound collection pattern has several further spatial nulls or low gains for almost all other directions. This sound collection pattern is not suitable for capturing diffuse sound arriving from all directions. Again, note that the direct-sound constraints al are directly related to the DOAs of the direct sounds. A desired sound collection pattern, which cannot be achieved with the spatial filter of this subsection, is shown in FIG. That sound collection pattern has two spatial nulls at the direct sound DOAs but is otherwise nearly omnidirectional. This sound collection pattern is achieved by using the proposed filter described below in connection with FIG.
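For illustration, the following sketch evaluates such a sound collection pattern, |w^H a(θ)|, for the array of the example (a 16-microphone uniform linear array with 5 cm spacing at 2.8 kHz); the weight vector used here is only a placeholder, since the actual filter weights are computed as described in the remaining sections.

import numpy as np

# Array and frequency from the example: 16-mic ULA, 5 cm spacing, 2.8 kHz.
M, d, f, c = 16, 0.05, 2800.0, 343.0

def steering_vector(theta_deg):
    # Far-field steering vector of the uniform linear array for azimuth theta.
    m = np.arange(M)
    delays = m * d * np.cos(np.deg2rad(theta_deg)) / c
    return np.exp(-2j * np.pi * f * delays)

def collection_pattern_db(w, thetas_deg):
    # Magnitude response |w^H a(theta)| in dB over a set of azimuths.
    resp = np.array([np.abs(np.conj(w) @ steering_vector(t)) for t in thetas_deg])
    return 20.0 * np.log10(resp + 1e-12)

# Placeholder weights (uniform); real weights come from the spatial filters below.
thetas = np.linspace(0.0, 180.0, 361)
pattern = collection_pattern_db(np.ones(M, dtype=complex) / M, thetas)
print(pattern[thetas == 180.0])  # gain toward 180 degrees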
[0024] A closed-form solution for computing the filter weights wm given the above constraints can be found in [VanTrees 2002]. In order to calculate the spatial filter, the DOAs of the L plane waves must be known, i.e., the direct-sound constraints al and a0 must be calculated. This DOA information may be determined using known narrowband DOA estimators such as Root MUSIC or ESPRIT. Note further that a0 generally needs to be recalculated for each k and n, since the elements of a0 are in general complex numbers and the DOAs of the plane waves must be assumed to be highly time-variant. A rapidly varying a0 can lead to audible artifacts.
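The closed-form solution referenced here is the standard linearly constrained minimum variance (LCMV) solution from [VanTrees 2002]; in the notation used in the later paragraphs it reads

w = Φn^-1 C (C^H Φn^-1 C)^-1 g,

where C collects the constraint vectors (here a0 and a1, ..., aL) as columns and g is the corresponding response vector (one for a0 and zeros for the suppressed plane waves).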
[0025] An exemplary system for extracting diffuse sound using the presented multi-channel filter is shown in FIG. After transforming the microphone signals into the time-frequency domain, the stationary noise and the DOAs of the L plane waves are estimated. From the DOA information, the linear direct-sound constraints (al and a0) are then obtained. Based on this information, the filter weights can be calculated. Applying these weights to the microphone signals provides the desired estimate of the diffuse sound. From this description it is clear that the resulting filter depends only on the direct sound (i.e., on the DOAs of the plane waves and the corresponding relative transfer functions of the plane waves between the microphones, respectively) and not on the diffuse sound. This means that the filter does not take into account any information that may be available about the diffuse sound, even though it is used to estimate the diffuse sound.
[0028] The weight vector wm proposed below is obtained, as for the multi-channel filter described above, by minimizing a specific cost function.
[0029] However, in contrast to the multi-channel filter described above, the present invention proposes to use a linear constraint that does not depend on the direct sound (i.e., on the L plane waves). More precisely, the proposed new constraint is not a function of the DOAs of the plane waves or of the corresponding relative transfer functions of the plane waves between the microphones.
[0031] The proposed spatial filter is obtained by minimizing a specific cost function while satisfying a distortionless constraint on the diffuse sound. This constraint corresponds to the relative transfer functions of the diffuse sound between the microphones. Expressed mathematically, the filter is calculated subject to the linear constraint (Equation 10) w^H bm(k, n) = 1.
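Written out with the cost function J introduced in the next paragraph, the proposed filter therefore solves

wm(k, n) = argmin over w of J(w), subject to w^H bm(k, n) = 1.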
[0032] Here, J is the cost function to be minimized by the filter. The cost function may be, for example, the stationary noise power at the filter output, the interference energy at the filter output, or the squared error of the estimated diffuse sound. Examples of J are given in the embodiments below. The constraint vector bm is given by bm(k, n) = [B1,m(k, n), B2,m(k, n), ..., BM,m(k, n)]^T. The m'-th element Bm',m is the relative transfer function of the diffuse sound between the microphones m and m'. This relative transfer function is given by:
[0035] This means that the relative transfer function corresponds to the so-called spatial coherence γm',m of the diffuse sound between the microphones m and m'. The spatial coherence is defined as follows, where (·)* denotes complex conjugation. It describes the correlation of the diffuse sound between the microphones m and m' in the frequency domain. The coherence depends on the particular diffuse sound field. It can be measured in advance for a given room. Alternatively, the coherence is known from the theory of particular diffuse sound fields [Elko 2001]. For example, for a spherically isotropic diffuse sound field, which can often be assumed in practice, the coherence is given by γm',m(k) = sinc(2π f rm',m / c), where sinc denotes the sinc function, f is the acoustic frequency of the given frequency band k, c is the speed of sound, and rm',m is the distance between the microphones m and m'. When the spatial coherence is used as the linear constraint Bm',m, it represents the average relative transfer function of the diffuse sound between the microphones; the resulting filter is then equivalent to the sum of many linearly constrained spatial filters, each of which captures a different realization of the random diffuse sound without distortion.
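As a concrete illustration of the spherically isotropic case, the following sketch computes such a coherence-based constraint vector bm from the microphone positions; the array geometry, reference microphone and frequency are merely example assumptions.

import numpy as np

def diffuse_constraint_vector(mic_positions, ref_mic, freq_hz, c=343.0):
    # Constraint vector b_m for a spherically isotropic diffuse field.
    # Element m' is the spatial coherence gamma_{m',m} = sinc(2*pi*f*r/c)
    # between microphone m' and the reference microphone m, with r the
    # microphone distance. np.sinc is the normalized sinc sin(pi*x)/(pi*x),
    # hence the argument 2*f*r/c.
    pos = np.asarray(mic_positions, dtype=float)              # shape (M, 3)
    r = np.linalg.norm(pos - pos[ref_mic], axis=1)
    return np.sinc(2.0 * freq_hz * r / c)                     # real, length M

# Example: 4-microphone uniform linear array, 5 cm spacing, 2 kHz band.
mics = [[0.05 * i, 0.0, 0.0] for i in range(4)]
b_m = diffuse_constraint_vector(mics, ref_mic=0, freq_hz=2000.0)
print(b_m)  # element for the reference microphone is 1 by construction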
[0036] The diffuse-sound constraint introduced above provides a spatial filter that captures the diffuse sound equally well from all directions. This is in contrast to the multi-channel filter described above, which mainly captures sound from one direction, i.e., the direction to which the selected propagation vector a0 corresponds.
[0037] Note that the diffuse-sound constraint bm is conceptually quite different from the direct-sound constraints al and a0. Therefore, the new filters proposed in this section are conceptually quite different from the multi-channel filters described above.
[0038] The proposed invention is shown in block form in FIG. First, the M microphone signals are transformed (101) into the time-frequency domain (or another domain suitable for signal processing) using a filter bank (FB). Second, the linear diffuse-sound constraint vector bm is calculated at block (102). The diffuse-sound constraint vector is either estimated from the signals or corresponds, for example, to the theoretical spatial coherence of a particular hypothesized diffuse field as described above. At block (104), certain statistics (e.g., noise statistics) are estimated from the microphone signals. This information, usually represented as a PSD matrix Φ(k, n), is used to form the cost function J which is to be minimized by the filter. The filter weights are calculated in block (103), which receives the diffuse-sound constraint and minimizes the cost function. Finally, the weights are applied to the microphone signals to provide the desired estimate of the diffuse sound. Specific implementations of the invention are presented in the following embodiments.
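To make this processing chain concrete, the following is a minimal sketch of blocks (102)-(104) and the weight application for signals already transformed into the time-frequency domain; it assumes a recursively averaged signal PSD matrix as the statistics and, as one possible choice of cost function, the output-power minimization of the next embodiment. Array shapes and parameter names are illustrative assumptions.

import numpy as np

def extract_diffuse_sound(X, b, alpha=0.9, diag_load=1e-6):
    # X : STFT of the M microphone signals, shape (M, K, N) = (mics, bins, frames).
    # b : diffuse-sound constraint vector per frequency bin, shape (M, K).
    # Returns the estimated diffuse sound at the reference microphone, shape (K, N).
    M, K, N = X.shape
    Phi = np.tile(diag_load * np.eye(M, dtype=complex), (K, 1, 1))
    D = np.zeros((K, N), dtype=complex)
    for n in range(N):
        for k in range(K):
            x = X[:, k, n][:, None]
            Phi[k] = alpha * Phi[k] + (1.0 - alpha) * (x @ x.conj().T)   # block (104)
            Phi_inv_b = np.linalg.solve(Phi[k] + diag_load * np.eye(M), b[:, k])
            w = Phi_inv_b / (b[:, k].conj() @ Phi_inv_b)                 # block (103)
            D[k, n] = w.conj() @ X[:, k, n]                              # apply weights
    return D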
[0039] Minimizing the Output Power Subject to the Diffuse-Sound Constraint: In this embodiment, a spatial filter is defined that minimizes the overall output power of the filter subject to the diffuse-sound constraint. The diffuse-sound constraint ensures that the diffuse sound is preserved by the spatial filter while the remaining signal parts (undesired stationary noise and plane waves) are minimized. The filter weights wm are calculated subject to the linear constraint (Equation 17) w^H bm(k, n) = 1.
[0041] Here, Φx is the PSD matrix of the microphone signals, which can be calculated as (Equation 20) Φx(k, n) = E{x(k, n) x^H(k, n)}, where x(k, n) is a vector containing the microphone signals. In practice, this expectation is approximated, for example, by temporal averaging. Furthermore, the constraint vector bm(k, n) = [B1,m(k, n), B2,m(k, n), ..., BM,m(k, n)]^T contains the spatial coherence of the diffuse sound between the microphones m and m', that is, (Equation 21) Bm',m(k, n) = γm',m(k, n).
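The closed-form solution of this constrained minimization, i.e., of minimizing w^H Φx w subject to w^H bm = 1, is the familiar MVDR-type expression

wm(k, n) = Φx^-1(k, n) bm(k, n) / (bm^H(k, n) Φx^-1(k, n) bm(k, n)),

which is also the form used in the pipeline sketch above.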
[0044] This embodiment is shown in block form in FIG. After transforming the microphone signals using the filter bank (101), the signal statistics estimation block (104) calculates the signal PSD matrix Φx. Moreover, at block (102), the linear diffuse-sound constraint vector bm is calculated, either from the signals or using a priori information assuming a specific diffuse sound field. The filter weights are then calculated in block (103). Multiplying these weights with the microphone signals provides the desired estimate of the diffuse sound.
[0047] Linearly Constrained Minimum Variance Filter: This embodiment represents a combination of the new approach and the state-of-the-art multi-channel filter described above in connection with FIG. 2. In this embodiment, a linearly constrained spatial filter is defined that minimizes the stationary noise at the filter output subject to the diffuse-sound constraint and additional directional constraints. The filter weights wm are calculated subject to the linear constraints (Equation 24) w^H bm(k, n) = 1 and w^H al(k, n) = 0 for l = 1, ..., L.
[0048] Clearly, the filter minimizes only the stationary noise at the output. Undesired plane waves are suppressed by the second set of linear constraints (as described above for the multi-channel filter of FIG. 2). Compared to the output power minimization filter according to FIG. 3, these additional constraints ensure an even stronger suppression of the interfering plane waves. The resulting filter still preserves the diffuse sound due to the first linear constraint. The closed-form solution for this filter, with which it can actually be calculated, is given by:
[0049] Here, C = [bm, a1, a2, ..., aL] is the constraint matrix containing the linear constraints defined above, and g = [1, 0^T]^T (where 0 is a zero vector of length L) is the corresponding response vector. As for the multi-channel filter shown in FIG. 2, the vectors al depend on the DOAs of the L plane waves and can be calculated as known from the cited reference [VanTrees 2002]; in contrast, the elements of bm describe the correlation or coherence of the diffuse sound between the microphones. The elements of bm are calculated as described in connection with FIG. Moreover, Φn is the PSD matrix of the stationary noise. This PSD matrix can, for example, be estimated during speech pauses. If the stationary noise components in the different microphones are independent of one another, Φn can simply be replaced by an M × M identity matrix.
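A minimal sketch of this computation for one time-frequency bin, assuming the standard LCMV closed form w = Φn^-1 C (C^H Φn^-1 C)^-1 g with the constraint matrix C and response vector g defined above; names and the small diagonal loading are illustrative choices.

import numpy as np

def lcmv_weights(Phi_n, b_m, A, diag_load=1e-6):
    # Phi_n : (M, M) stationary-noise PSD matrix (identity if the noise is
    #         independent across microphones, as noted above).
    # b_m   : (M,) diffuse-sound constraint vector (desired response 1).
    # A     : (M, L) direct-sound constraint vectors a_1, ..., a_L (response 0).
    M, L = A.shape
    C = np.column_stack([b_m, A])                 # constraint matrix [bm, a1, ..., aL]
    g = np.zeros(L + 1, dtype=complex)
    g[0] = 1.0                                    # preserve diffuse sound, null plane waves
    Phi = Phi_n + diag_load * np.eye(M)
    Phi_inv_C = np.linalg.solve(Phi, C)
    return Phi_inv_C @ np.linalg.solve(C.conj().T @ Phi_inv_C, g)

# Example: M = 4 microphones, L = 2 plane waves, independent noise (Phi_n = I).
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2)) + 1j * rng.standard_normal((4, 2))
b_m = np.ones(4, dtype=complex)                   # placeholder constraint vector
w = lcmv_weights(np.eye(4, dtype=complex), b_m, A)
print(np.round(w.conj() @ b_m, 6), np.round(w.conj() @ A, 6))  # approx. 1 and 0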
[0050] This embodiment is shown in block form in FIG. After transforming the microphone signals using the filter bank (101), the noise statistics estimation block (104) calculates the PSD matrix Φn of the stationary noise. Moreover, in block (102), the linear diffuse-sound constraint bm is calculated either from the signals or using a priori information assuming a specific diffuse sound field. In block (105), the DOAs of the L plane waves are estimated. From this information, the direct-sound constraints al are calculated in block (106). The calculated information is provided to the filter calculation block (103), which uses the closed-form solution presented above to calculate the filter weights wm. Multiplying these weights with the microphone signals provides the desired estimate of the diffuse sound.
[0052] The filters calculated in this embodiment have the following advantages compared to other spatial filters (e.g., the filters described in the background art): the plane waves are strongly attenuated due to the direct-sound constraints, and the sound collection pattern is nearly omnidirectional, as desired for capturing diffuse sound.
[0059] This embodiment is shown in block form in FIG. After transforming the microphone signals using the filter bank (101), the microphone PSD matrix Φx and the noise PSD matrix Φn are calculated in block (104). Moreover, in block (102), the linear diffuse-sound constraint bm is calculated either from the signals or using a priori information assuming a specific diffuse sound field. In block (105), the DOAs of the L plane waves are estimated. From this information, the direct-sound constraints al are calculated in block (106). These constraints are used together with Φn at (107) to calculate the auxiliary weights w1. The power of the diffuse sound φd is calculated from w1 and Φn at (108). The final weights wm of the spatial filter can then be calculated at (103) using φd, Φx and bm. The parameter α can be used to scale the spatial filter between the MMSE filter and the PMWF. Multiplying the weights wm with the microphone signals provides the desired estimate of the diffuse sound.
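For block (108), one plausible estimator (stated here as an assumption, since the corresponding equation is not given above) is the output power of the auxiliary filter w1, which nulls the direct sounds, with the residual stationary-noise power removed; in addition to w1 and Φn it uses the signal PSD matrix Φx provided by block (104).

import numpy as np

def diffuse_power_estimate(w1, Phi_x, Phi_n):
    # Assumed estimator for the diffuse sound power phi_d at one bin:
    # power at the output of the auxiliary filter minus its stationary-noise part.
    out_power = np.real(w1.conj() @ Phi_x @ w1)
    noise_power = np.real(w1.conj() @ Phi_n @ w1)
    return max(out_power - noise_power, 0.0)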
[0063] The following list gives a brief overview of some of the aspects mentioned above. Receive at least two microphone signals. Transform the microphone signals into the time-frequency domain or another suitable domain. Compute the linear diffuse-sound constraint as a function of the correlation or coherence of the diffuse sound between the microphones. Calculate signal statistics and/or noise statistics.
[0064] In some embodiments, the DOA of the direct sound is estimated and a direct-sound constraint representing the relative transfer function of the direct sound between the microphones is calculated.
[0065] In some embodiments, an auxiliary filter is calculated to estimate the power of the diffuse sound. Taking the diffuse-sound constraint into account, the spatial filter weights for extracting the diffuse sound are calculated using the obtained signal/noise statistics and, optionally, the diffuse sound power information. The linear combination of the microphone signals is then computed using the calculated spatial filter weights.