Patent Translate
Powered by EPO and Google

Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate, complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or financial decisions, should not be based on machine-translation output.

DESCRIPTION JP2017503388

The method includes estimating the spatial coherence between a first diffuse sound portion in a first microphone signal and a second diffuse sound portion in a second microphone signal. The first microphone signal is captured by a first microphone, and the second microphone signal is captured by a second microphone spaced apart from the first microphone in a known manner. The method further comprises defining a linear constraint on the filter coefficients of the diffuse sound filter, the linear constraint being based on the spatial coherence. The method also includes calculating at least one of signal statistics and noise statistics for the first and second microphone signals. The method also includes determining the filter coefficients of the diffuse sound filter by solving an optimization problem over at least one of the signal statistics and the noise statistics, taking the linear constraint on the filter coefficients into account.

[0005] A method is provided that includes defining a linear constraint on the filter coefficients of a diffuse sound filter. The linear constraint is based on the spatial coherence between a first diffuse sound portion in a first microphone signal and a second diffuse sound portion in a second microphone signal. The first microphone signal is captured by a first microphone, and the second microphone signal is captured by a second microphone spaced apart from the first microphone in a known manner.
The method also includes calculating at least one of: a direction of arrival (DOA) of at least one direct sound, signal statistics for the first and second microphone signals, and noise statistics for the first and second microphone signals. The method further includes determining the filter coefficients of the diffuse sound filter by solving an optimization problem over at least one of the DOA of the at least one direct sound, the signal statistics and the noise statistics, taking the linear constraint on the filter coefficients into account.

[0007] A further embodiment provides an apparatus comprising a linear constraint calculator configured to define a linear constraint on the filter coefficients of a diffuse sound filter. The linear constraint is based on the spatial coherence between the first diffuse sound portion in the first microphone signal and the second diffuse sound portion in the second microphone signal. The first microphone signal is captured by a first microphone, and the second microphone signal is captured by a second microphone spaced apart from the first microphone in a known manner. The apparatus also comprises a statistics calculator configured to calculate at least one of: a DOA of at least one direct sound, signal statistics for the first and second microphone signals, and noise statistics for the first and second microphone signals. The apparatus further comprises a filter coefficient calculator configured to determine the filter coefficients of the diffuse sound filter by solving an optimization problem over at least one of the DOA of the at least one direct sound, the signal statistics and the noise statistics, taking the linear constraint on the filter coefficients into account.

11-04-2019 1
[0008] Embodiments are based on the insight that a diffuse sound filter can be determined by taking into account at least one linear constraint related to the diffuse sound portion of the microphone signals.

[0020] A simple way to find an appropriate filter is to compute the weights wm such that the L plane waves are suppressed while the stationary noise Xn(k, n, dm) contained in the microphone signal is minimized. Expressed mathematically, the filter weights are given by

[0021] Here, Φn is the PSD matrix (power spectral density matrix) of the stationary noise, which can be estimated, for example, with known techniques during an absence of diffuse and direct sound. Moreover, al is a so-called propagation vector. Its elements are the relative transfer functions of the l-th plane wave from the m-th microphone to the other microphones; thus, al is a column vector of length M. (Recall that of the M microphone signals, only the diffuse sound at the m-th microphone is estimated with wm, i.e., with a weighted linear combination; the diffuse sounds at the other microphones are essentially redundant, since they are related to the m-th microphone signal via the relative transfer functions from the m-th microphone to the other microphones and can be computed this way if needed.) The elements of al depend on the DOA of the l-th plane wave; that is, al is a function of the l-th plane wave's DOA, al = f(φl). Since al depends on the direct sound (i.e., a plane wave), it is referred to as a direct-sound constraint in the following. This spatial filter essentially yields a beamformer whose sound collection pattern has nulls in the directions of the L plane waves. As a result, all plane waves are suppressed. Unfortunately, solving the minimization problem with only these null constraints yields zero weights wm, i.e., the diffuse sound cannot be extracted.

[0023] An example of the resulting sound collection pattern is shown in FIG.
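As an illustration of the direct-sound constraint al = f(φl), the propagation vector of a far-field plane wave can be written out for the uniform linear array used in the example below (16 microphones, 5 cm spacing, 2.8 kHz). This is a minimal sketch; the function name, the far-field assumption, and the choice of microphone 0 as the reference are illustrative assumptions, not the patent's exact formulation.

```python
import numpy as np

def steering_vector(doa_deg, n_mics=16, spacing=0.05, freq=2800.0, c=343.0):
    """Relative transfer functions of a far-field plane wave for a
    uniform linear array, referenced to microphone 0 (a_l = f(phi_l))."""
    theta = np.deg2rad(doa_deg)
    positions = np.arange(n_mics) * spacing       # mic positions along the array
    delays = positions * np.cos(theta) / c        # relative time delays
    return np.exp(-2j * np.pi * freq * delays)    # column vector of length M

a = steering_vector(51.0)
print(a.shape)      # (16,)
print(abs(a[0]))    # 1.0 -- reference mic has unit gain
```

Each element has unit magnitude because a plane wave only accumulates phase between the microphones; only the DOA φl enters, which is why al must be recomputed when the DOA changes.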
Here, two direct sounds arrive from the azimuth angles 51° and 97°. The figure shows the resulting sound collection pattern at a frequency of 2.8 kHz when using a uniform linear array of 16 microphones with 5 cm microphone spacing. The pattern exhibits exact nulls at 51° and 97° and a high gain at 180°, which corresponds to the direction associated with a0. Moreover, the pattern has several further spatial nulls or low gains in almost all other directions. Such a pattern is not suitable for capturing diffuse sound arriving from all directions. Note again that the direct-sound constraints al relate directly to the direct sound DOAs. A desired sound collection pattern, which cannot be achieved with the spatial filter of this subsection, is shown in FIG. This pattern has two spatial nulls at the direct sound DOAs but is otherwise nearly omnidirectional. It is achieved with the proposed filter described below in connection with FIG.

[0024] A closed-form solution for computing the filter weights wm under the above constraints can be found in [VanTrees 2002]. To compute the spatial filter, the DOAs of the L plane waves must be known, i.e., the direct-sound constraints al and a0 must be computed. This DOA information can be determined with known narrowband DOA estimators such as Root MUSIC or ESPRIT. Note further that a0 generally needs to be recomputed for each k and n, since the elements of a0 are in general complex numbers and the DOA of a plane wave must be assumed to be highly time-variant. A time-varying a0 can lead to audible artifacts.

[0025] An exemplary system for extracting diffuse sound with the presented multi-channel filter is shown in FIG. After transforming the microphone signals into the time-frequency domain, the stationary noise and the DOAs of the L plane waves are estimated.
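A sound collection (directivity) pattern like the one described above can be evaluated by sweeping a steering vector over azimuth and computing |w^H a(θ)|. The sketch below uses the array geometry stated in the text (16 microphones, 5 cm spacing, 2.8 kHz); the helper names and the delay-and-sum weights used for the check are illustrative assumptions.

```python
import numpy as np

def ula_steering(theta_deg, n_mics=16, spacing=0.05, freq=2800.0, c=343.0):
    theta = np.deg2rad(theta_deg)
    delays = np.arange(n_mics) * spacing * np.cos(theta) / c
    return np.exp(-2j * np.pi * freq * delays)

def collection_pattern(w, angles_deg):
    """Directivity |w^H a(theta)| of a weight vector over a grid of azimuths."""
    return np.array([abs(np.vdot(w, ula_steering(t))) for t in angles_deg])

# Simple check: a delay-and-sum beamformer steered to 180 degrees
w = ula_steering(180.0) / 16.0
angles = np.linspace(0.0, 180.0, 181)
pattern = collection_pattern(w, angles)
print(round(pattern[180], 3))   # unit gain in the steering direction
```

Plotting `pattern` over `angles` reproduces the kind of figure discussed here: sharp lobes and nulls for a directional filter, a flat curve for the nearly omnidirectional pattern the proposed filter aims at.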
Then, from the DOA information, the L + 1 linear direct-sound constraints (al and a0) are obtained. Based on this information, the filter weights can be computed. Applying these weights to the microphone signals provides the desired estimate of the diffuse sound. From this description it is clear that the resulting filter depends only on the direct sound (i.e., on the DOAs of the plane waves and the corresponding relative transfer functions between the microphones, respectively) and not on the diffuse sound. This means that the filter does not take into account any information that may be available about the diffuse sound, even though it is used to estimate the diffuse sound.

[0028] The weight vector wm proposed below, like the multi-channel filter described above, minimizes a specific cost function.

[0029] However, in contrast to the multi-channel filter described above, the present invention proposes to use a linear constraint that does not depend on the direct sound (i.e., on the L plane waves). More precisely, the proposed new constraint is not a function of the DOAs of the plane waves or of the corresponding relative transfer functions of the plane waves between the microphones, respectively.

[0031] The proposed spatial filter is obtained by minimizing a specific cost function while satisfying a distortionless constraint on the diffuse sound. This constraint corresponds to the relative transfer function of the diffuse sound between the microphones. Expressed mathematically, the filter is computed subject to the linear constraint (Equation 10) w^H bm(k, n) = 1.

[0032] Here, J is the cost function minimized by the filter. The cost function may be, for example, the stationary noise power at the filter output, the interference energy at the filter output, or the squared error of the estimated diffuse sound. Examples of J are given in the embodiments. The constraint vector bm is defined as bm(k, n) = [B1,m(k, n), B2,m(k, n), ..., BM,m(k, n)]^T. The m′-th element Bm′,m is the relative transfer function of the diffuse sound between the microphones m and m′. This relative transfer function is given by:

[0035] This corresponds to the so-called spatial coherence γm′,m of the diffuse sound between the microphones m and m′. The spatial coherence is defined as follows, where (·)* denotes complex conjugation. This spatial coherence describes the correlation of the diffuse sound between the microphones m and m′ in the frequency domain. The coherence depends on the particular diffuse sound field. It can be measured in advance for a given room. Alternatively, the coherence is known from the theory of specific diffuse sound fields [Elko 2001]. For example, for a spherically isotropic diffuse sound field, which can often be assumed in practice, γm′,m(k) = sinc(2π f rm′,m / c), where sinc denotes the sinc function, f is the acoustic frequency of the given frequency band k, c is the speed of sound, and rm′,m is the distance between the microphones m and m′. When the spatial coherence, which represents the average relative transfer function of the diffuse sound between the microphones, is used as the linear constraint Bm′,m, the resulting filter is equivalent to the sum of many linearly constrained spatial filters, each of which captures a different realization of the random diffuse sound without distortion.

[0036] The diffuse-sound constraint introduced above provides a spatial filter that captures diffuse sound equally well from all directions. This is in contrast to the multi-channel filter described above, which mainly captures sound from one direction, namely the direction to which the selected propagation vector a0 corresponds.

[0037] Note that the diffuse-sound constraint bm is conceptually quite different from the direct-sound constraints al and a0. Therefore, the new filters proposed in this section are conceptually quite different from the multi-channel filters described above.
[0038] The proposed invention is shown in block form in FIG. First, the M microphone signals are transformed (101) into the time-frequency domain (or another domain suitable for signal processing) using a filter bank (FB). Second, block (102) computes the linear diffuse-sound constraint vector bm. The diffuse-sound constraint vector is either estimated from the signal or corresponds, for example, to the theoretical spatial coherence of a specific assumed diffuse field, as described above. In block (104), certain statistics (e.g., noise statistics) are estimated from the microphone signals. This information, usually represented as a PSD matrix Φ(k, n), is used to form a cost function J that the filter must minimize. The filter weights are computed in block (103), which receives the diffuse-sound constraint and minimizes the cost function. Finally, the weights are applied to the microphone signals to provide the desired diffuse sound estimate. Specific implementations of the invention are presented in the following embodiments.

[0039] Minimizing the Output Power Subject to the Diffuse-Sound Constraint. In this embodiment, we define a spatial filter that minimizes the overall filter output subject to the diffuse-sound constraint. The diffuse-sound constraint ensures that the diffuse sound is preserved by the spatial filter while the remaining signal parts (undesired stationary noise and plane waves) are minimized. The filter weights wm are computed subject to the linear constraint (Equation 17) w^H bm(k, n) = 1.

[0041] Here, Φxx is the PSD matrix of the microphone signals, which can be computed as (Equation 20) Φxx(k, n) = E{x(k, n) x^H(k, n)}, where x(k, n) is a vector containing the microphone signals. In practice, this expectation is approximated by, for example, time averaging. Furthermore, the constraint vector bm(k, n) = [B1,m(k, n), B2,m(k, n), ..., BM,m(k, n)]^T contains the spatial coherence of the diffuse sound between the microphones m and m′, i.e., (Equation 21) Bm′,m(k, n) = γm′,m(k, n).

[0044] This embodiment is shown in block form in FIG. After transforming the microphone signals using the filter bank (101), the signal statistics estimation block (104) computes the signal PSD matrix Φxx. Moreover, block (102) computes the linear diffuse-sound constraint vector bm, either from the signal or using a priori information assuming a specific diffuse sound field. The filter weights are then computed in block (103). Multiplying these weights with the microphone signals provides the desired estimate of the diffuse sound.

[0047] Linearly Constrained Minimum Variance Filter. This embodiment represents a combination of the new approach and the state-of-the-art multi-channel filter described above in connection with FIG. In this embodiment, a linearly constrained spatial filter is defined that minimizes the stationary noise at the filter output subject to the diffuse-sound constraint and additional directional constraints. The filter weights wm are computed subject to the linear constraints (Equation 24) w^H bm(k, n) = 1 and the directional constraints on the L plane waves.

[0048] Clearly, this filter minimizes only the stationary noise at the output. Undesired plane waves are suppressed by the second set of linear constraints (as described above for the multi-channel filter of FIG. 2). Compared to the output power minimization filter according to FIG. 3, these additional constraints ensure an even stronger suppression of the interfering plane waves. The resulting filter still preserves the diffuse sound due to the first linear constraint. The closed-form solution of this filter, which can be computed in practice, is given by:

[0049] Here, C = [bm, a1, a2, ..., aL] is a constraint matrix containing the linear constraints defined above, and g = [1, 0^T]^T (with 0 a zero vector of length L) is the corresponding response vector.
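The constrained minimization with C = [bm, a1, ..., aL] and g = [1, 0, ..., 0]^T has the standard linearly constrained minimum variance closed form w = Φn^-1 C (C^H Φn^-1 C)^-1 g, which satisfies C^H w = g (unit response to the diffuse sound, nulls on the plane waves). A sketch; the ULA geometry, the sinc-based bm, and the identity Φn (the independent-noise simplification mentioned below) are assumptions for the example:

```python
import numpy as np

def lcmv_weights(Phi_n, C, g):
    """w = Phi_n^-1 C (C^H Phi_n^-1 C)^-1 g  -- satisfies C^H w = g."""
    PinvC = np.linalg.solve(Phi_n, C)
    return PinvC @ np.linalg.solve(C.conj().T @ PinvC, g)

M, freq, spacing, c = 16, 2800.0, 0.05, 343.0
pos = np.arange(M) * spacing

def steer(deg):
    # far-field plane-wave propagation vector, mic 0 as reference
    return np.exp(-2j * np.pi * freq * pos * np.cos(np.deg2rad(deg)) / c)

b_m = np.sinc(2.0 * freq * pos / c)                  # diffuse-sound constraint
C = np.column_stack([b_m, steer(51.0), steer(97.0)]) # [b_m, a_1, a_2]
g = np.array([1.0, 0.0, 0.0])
Phi_n = np.eye(M)                # independent stationary noise across mics
w = lcmv_weights(Phi_n, C, g)
print(np.allclose(C.conj().T @ w, g))   # True: diffuse preserved, plane waves nulled
```

The same `lcmv_weights` covers the earlier multi-channel filter as well, by using C = [a0, a1, ..., aL] instead of placing bm in the first column.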
For the multi-channel filter shown in FIG. 2, the vectors al depend on the DOAs of the L plane waves and can be computed as known from the cited reference [VanTrees 2002]; in contrast, the elements of bm describe the correlation, or coherence, of the diffuse sound between the microphones. The elements of bm are computed as described in connection with FIG. Moreover, Φn is the PSD matrix of the stationary noise. This PSD matrix can, for example, be estimated during speech pauses. If the stationary noise is independent across the different microphones, Φn can simply be replaced by an M × M identity matrix.

[0050] This embodiment is shown in block form in FIG. After transforming the microphone signals using the filter bank (101), the noise statistics estimation block (104) computes the PSD matrix Φn of the stationary noise. Moreover, block (102) computes the linear diffuse-sound constraint bm, either from the signal or using a priori information assuming a specific diffuse sound field. In block (105), the DOAs of the L plane waves are estimated. From this information, the direct-sound constraints al are computed in block (106). The computed information is provided to the filter calculation block (103), which uses the closed-form solution presented above to compute the filter weights wm. Multiplying these weights with the microphone signals provides the desired estimate of the diffuse sound.

[0052] The filters computed in this embodiment have the following advantages compared to other spatial filters (e.g., the filters described in the background art): the plane waves are strongly attenuated due to the direct-sound constraints, and the sound collection pattern is nearly omnidirectional, as desired for capturing diffuse sound.

[0059] This embodiment is shown in block form in FIG. After transforming the microphone signals using the filter bank (101), block (104) computes the microphone PSD matrix Φxx and the noise PSD matrix Φn.
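The PSD matrices Φxx and Φn are expectations approximated in practice by time averaging; Φn, for instance, can be updated during speech pauses as noted above. A minimal sketch of one recursive averaging step (the smoothing factor and update rule are illustrative choices, not taken from the patent):

```python
import numpy as np

def recursive_psd(Phi_prev, x, alpha=0.9):
    """One smoothing step of Phi(k, n) ~ E{x x^H}, approximated by
    recursive time averaging over snapshots x of one frequency bin."""
    return alpha * Phi_prev + (1.0 - alpha) * np.outer(x, x.conj())

# With a constant snapshot, the average converges to x x^H
x = np.array([1.0 + 1j, 0.5, -1j])
Phi = np.zeros((3, 3), dtype=complex)
for _ in range(500):
    Phi = recursive_psd(Phi, x)
print(np.allclose(Phi, np.outer(x, x.conj())))   # True after convergence
```

A smaller `alpha` tracks nonstationary statistics faster at the cost of a noisier estimate, which matters here because a0 (and hence the filter) is assumed to be highly time-variant.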
Moreover, block (102) computes the linear diffuse-sound constraint bm, either from the signal or using a priori information assuming a specific diffuse sound field. In block (105), the DOAs of the L plane waves are estimated. From this information, the direct-sound constraints al are computed in block (106). These constraints are used together with Φn at (107) to compute the weights w1. The power of the diffuse sound φd is computed from w1 and Φn at (108). The final weights wm of the spatial filter can then be computed at (103) using φd, Φxx and bm. The parameter α can be used to scale the spatial filter between the MMSE filter and the PMWF. Multiplying the weights wm with the microphone signals provides the desired estimate of the diffuse sound.

[0063] The following list gives a brief overview of some of the aspects mentioned above. Receive at least two microphone signals. Transform the microphone signals into the time-frequency domain or another suitable domain. Compute the linear diffuse-sound constraint as a function of the correlation or coherence of the diffuse sound between the microphones. Calculate signal statistics and/or noise statistics.

[0064] In some embodiments, the DOAs of the direct sounds are estimated and direct-sound constraints representing the relative transfer functions of the direct sounds between the microphones are computed.

[0065] In some embodiments, an auxiliary filter is computed to estimate the power of the diffuse sound. Taking the diffuse-sound constraint into account, the spatial filter weights for extracting the diffuse sound are computed using the obtained signal/noise statistics and the optional diffuse sound power information. The linear combination of the microphone signals is implemented using the computed spatial filter weights.
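The overview steps in [0063] can be sketched end to end for a single frequency bin: compute bm from the diffuse coherence, estimate Φxx by time averaging, solve the output-power-minimization variant subject to w^H bm = 1, and apply the weights. The batch PSD average, the regularization, and the choice of the output-power-minimization variant are my assumptions for illustration:

```python
import numpy as np

C_SOUND = 343.0

def diffuse_coherence_vector(dists_to_ref, freq):
    # spherically isotropic diffuse field: gamma = sinc(2*pi*f*r/c)
    return np.sinc(2.0 * freq * np.asarray(dists_to_ref) / C_SOUND)

def estimate_diffuse_sound(frames, dists_to_ref, freq):
    """frames: (T, M) complex STFT snapshots of one frequency bin.
    Returns the diffuse-sound estimate at the reference microphone
    for every frame (output-power-minimization variant)."""
    T, M = frames.shape
    Phi_xx = frames.T @ frames.conj() / T          # time-averaged PSD matrix
    b_m = diffuse_coherence_vector(dists_to_ref, freq)
    Pinv = np.linalg.inv(Phi_xx + 1e-9 * np.eye(M))
    w = Pinv @ b_m / (b_m.conj() @ Pinv @ b_m)     # satisfies w^H b_m = 1
    return frames @ w.conj()                       # w^H x(k, n) per frame

rng = np.random.default_rng(0)
frames = rng.standard_normal((50, 4)) + 1j * rng.standard_normal((50, 4))
out = estimate_diffuse_sound(frames, [0.0, 0.05, 0.10, 0.15], 1000.0)
print(out.shape)   # (50,)
```

In a full system this would run per frequency band k (with bm and Φxx recomputed for each) before transforming the output back to the time domain.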
