DESCRIPTION JP2005091560

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
When a required signal is separated and extracted from a plurality of signals by independent
component analysis in the frequency domain and the time domain, the separation accuracy is
lowered by a phenomenon called Permutation. This phenomenon is particularly noticeable when
the number of signal sources is smaller than the number of sensors. The present invention
therefore aims to solve the problem of performance deterioration caused by a mismatch between
the number of signal sources and the number of sensors.
SOLUTION: Independent component analysis (ICA) in the frequency domain (FD) and in the time
domain (TD) are performed sequentially, and in particular the signal identification process in
FDICA is divided into a plurality of subblocks. During this process, the number of signal sources
is estimated, and the result is used to substantially match the number of active sensors with the
number of signal sources. [Selected figure] Figure 1
Signal separation method and signal separation apparatus
[0001]
The present invention relates to a method and an apparatus for separating and extracting a
required signal from a plurality of signals detected by a plurality of sensors such as microphones.
[0002]
When a plurality of signals are mixed and observed, a technique for identifying the source signals
using only the observed signals is called Blind Source Separation (hereinafter referred to as
BSS).
In recent years, signal separation methods based on Independent Component Analysis
(hereinafter referred to as ICA) have become mainstream. In such a method, for example, a
plurality of sound signals are received by K microphones (sensors), and the received signals are
processed using the fact that the sound signals arriving from the individual sound sources are
statistically independent. This makes it possible to separate up to K sound sources, that is, no
more sources than there are microphones. At first, ICA-based source separation was difficult to
apply to microphone arrays because the time differences of the sound arriving from each sound
source were not taken into consideration. In recent years, however, many methods have been
proposed in which a plurality of sound signals are observed with a microphone array, the time
differences are taken into account, and the inverse transformation of the mixing process of the
signals arriving from the plurality of sound sources is obtained in the frequency domain.
[0003]
In general, when sound signals coming from L multiple sound sources are linearly mixed and
observed by K microphones, the observed sound signal can be written as follows at a certain
frequency f.
[0004]
Here, S(f) is the vector of sound signals emitted from the sound sources, X(f) is the vector of
signals observed by the microphone array at the sound receiving points, and A(f) is the mixing
matrix composed of propagation vectors that describe the propagation characteristics of the
spatial acoustic system between each sound source and each sound receiving point. They can be
written respectively as follows.
[0005]
[0006]
[0007]
Here, the superscript T denotes transposition of a vector, and the prime symbol (′) denotes an
element (scalar quantity) of each matrix.
If the mixing matrix A(f) is known, then, using the observed signal vector X(f) at the sound
receiving points,
[0008]
the sound signal vector S(f) emitted from the sound sources can be calculated by computing the
generalized inverse matrix of A(f).
In general, however, the mixing matrix A(f) is unknown, and the sound signal vector S(f) must be
obtained using only the observed signal vector X(f).
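The known-mixing case described above can be illustrated numerically. The following is a minimal
sketch and not part of the original description: the matrix sizes, the random complex values that
stand in for A(f) and S(f) at a single frequency, and the use of NumPy's `pinv` are all
assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

K, L = 3, 2  # K microphones, L sound sources (illustrative sizes, not from the text)

# Hypothetical complex mixing matrix A(f) at one frequency f (K rows, L columns).
A = rng.standard_normal((K, L)) + 1j * rng.standard_normal((K, L))

# Source vector S(f) and the resulting observation X(f) = A(f) S(f).
S = rng.standard_normal(L) + 1j * rng.standard_normal(L)
X = A @ S

# With A(f) known, the generalized (Moore-Penrose) inverse recovers S(f).
S_hat = np.linalg.pinv(A) @ X

recovery_error = np.max(np.abs(S_hat - S))
```

In the blind setting A(f) is not available, which is why the description turns to ICA to
estimate an inverse mixing matrix from X(f) alone.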
[0009]
To solve the BSS problem, it is assumed that the sound signal vector S(f) is generated
stochastically and, furthermore, that the components of S(f) are all mutually independent.
Since the observed signal vector X(f) detected by the microphones is then a mixture of a plurality
of sound source signals, the components of X(f) are not independent of one another.
ICA is therefore used to search for the independent components contained in the observed signal
vector X(f), that is, the individual sound source signals that were mixed.
Specifically, a matrix W(f) (hereinafter, the inverse mixing matrix) that converts the observed
signal vector X(f) into independent components is calculated and applied to X(f). An
approximation of the sound signal vector S(f) emitted from the sound sources is thereby obtained.
[0010]
Both a method of analysis in the time domain and a method of analysis in the frequency domain
have been proposed for obtaining the inverse transformation of the mixing process by ICA. Here,
the frequency-domain method is described as an example with reference to FIG. 12.
[0011]
In FIG. 12, after the signal X(f) arriving from the sound sources is detected by the microphones
401 and 402, short-time frame analysis is performed using an orthogonal transform (for example,
the short-time discrete Fourier transform, st-DFT, in FIG. 12). The complex spectral values at a
specific frequency bin of the input of one microphone 401 are then plotted and regarded as a
time series. Here, a frequency bin denotes an individual complex component of the signal vector
frequency-transformed by the short-time discrete Fourier transform. The same operation is
performed on the input of the other microphone 402. The time-frequency signal sequence
obtained in this way can be written as follows.
[0012]
Next, signal separation is performed using the inverse mixing matrix W(f). Letting Y(f, t) denote
the signals separated from the signals input to the microphones 401 and 402, the signal
separation process is expressed as follows.
[0013]
Here, the inverse mixing matrix W(f) is optimized so that the L time-series outputs Y(f, t), that
is, Y1′(f, t) and Y2′(f, t), become independent of each other. These processes are performed for
all frequency bins. Finally, although not shown in FIG. 12, an inverse orthogonal transform is
applied to the separated time series Y(f, t) to restore the time waveforms of the sound source
signals.
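The per-bin structure of this processing can be sketched as follows. This is an illustrative
sketch only: the helper names `stft` and `separate`, the frame length, the hop size, the Hann
window, and the identity matrix used in place of a learned W(f) are assumptions, and the ICA
optimization itself is not implemented.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Short-time DFT of a 1-D signal; returns an array of shape (bins, frames)."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop:i * hop + frame_len] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1).T

def separate(X, W):
    """Apply Y(f, t) = W(f) X(f, t) independently in every frequency bin.

    X: (bins, mics, frames) observed spectra; W: (bins, outputs, mics).
    """
    return np.einsum("fom,fmt->fot", W, X)

rng = np.random.default_rng(0)
mics = rng.standard_normal((2, 4096))             # stand-in for microphones 401 and 402
X = np.stack([stft(ch) for ch in mics], axis=1)   # shape (bins, 2, frames)

# Placeholder W(f): identity in every bin. A real system would learn W(f) by ICA.
W = np.tile(np.eye(2, dtype=complex), (X.shape[0], 1, 1))
Y = separate(X, W)
```

Each frequency bin has its own 2 x 2 matrix, so the optimization in one bin is unaware of the
output ordering chosen in any other bin.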
[0014]
For evaluating independence and optimizing the inverse mixing matrix in the above processing,
unsupervised learning algorithms based on minimization of the Kullback-Leibler divergence and
algorithms that decorrelate second-order or higher-order correlations have been proposed, as
described in Non-Patent Document 1 below.
[0015]
Generally, it is known that the method of analysis in the frequency domain requires less
computation and achieves better separation performance than the method of analysis in the time
domain.
However, in the frequency-domain method, a phenomenon called Permutation may occur, in which
the sound sources separated at each frequency are interchanged between adjacent frequency bins.
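The Permutation phenomenon can be made concrete with a small sketch. The data below are
synthetic and the choice of which bins are swapped is hypothetical; the sketch only shows that
an output channel can end up holding different sources in different frequency bins.

```python
import numpy as np

rng = np.random.default_rng(1)
n_bins, n_frames = 4, 2000

# Two synthetic complex source spectra per bin: S[f, source, frame].
S = rng.laplace(size=(n_bins, 2, n_frames)) + 1j * rng.laplace(size=(n_bins, 2, n_frames))

# Per-bin ICA recovers the sources only up to their ordering.
# Model this by swapping the two outputs in bins 1 and 3 (hypothetical choice).
swapped_bins = [1, 3]
Y = S.copy()
Y[swapped_bins] = Y[swapped_bins][:, ::-1, :]

def matches_source(f, out, src):
    """True if output channel `out` in bin f carries source `src`."""
    return np.allclose(Y[f, out], S[f, src])

# Which source does output channel 0 carry in each bin?
owner_of_output0 = [0 if matches_source(f, 0, 0) else 1 for f in range(n_bins)]
```

If such a spectrum were transformed back to the time domain without correcting the ordering,
output channel 0 would contain a mixture of both sources.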
[0016]
Nishikawa et al., on the other hand, proposed in Non-Patent Document 2 below the processing
method shown in FIG. 13, called MSICA (Multi-Stage ICA). It is a multistage configuration in
which ICA in the frequency domain (hereinafter referred to as FDICA) is placed in the former
stage and ICA in the time domain (hereinafter referred to as TDICA) in the latter stage, so that
the ICAs of the two domains are connected in series. Nishikawa et al. reported that, when the
number of signal sources is two and the number of microphones is two, the separation accuracy of
the target signal is improved compared with the conventionally reported method that uses the
frequency domain only (FDICA). However, no success has been reported for cases in which the
number of signal sources and the number of microphones are greater than two.
[0017]
The ICA processing described above is not limited to sound signal processing. It is also used,
for example, to separate signals that arrive mixed together in mobile communication and, as
reported in Non-Patent Document 3 below, to separate and extract a target signal from
measurement signals when signals generated at various places inside the brain are measured from
the outside using an electroencephalograph, a magnetoencephalograph, fMRI (functional Magnetic
Resonance Imaging), or the like.
[0018]
Non-Patent Document 1: "Basics of blind source separation using array signal processing",
Technical Report of IEICE, EA2001-7.
Non-Patent Document 2: T. Nishikawa, H. Saruwatari and K. Shikano, "Blind source separation of
acoustic signals based on multistage ICA combining frequency-domain ICA and time-domain ICA",
IEICE Trans. Fundamentals, vol. E84-A, no. 1, Jan. 2001.
Non-Patent Document 3: "What is Independent Component Analysis?", Computer Today, pp. 38-43,
198.9, No. 87; "Application to fMRI image analysis", Computer Today, pp. 60-67, 2001.1, No. 95.
[0019]
One problem of the frequency-domain method is the phenomenon called Permutation, in which the
sound sources separated at each frequency are interchanged between adjacent frequency bins. This
phenomenon is pronounced when the number of signal sources is smaller than the number of
sensors, and it significantly reduces the separation accuracy of the target signals. However, it
is difficult to always match the number of signal sources to the number of microphones, and in a
system that is actually built, differences in the number of sound sources cause the separation
accuracy of the target signal to vary.
[0020]
In the method of Nishikawa et al. in Non-Patent Document 2, TDICA is placed in the latter stage,
so some effect can be expected against the Permutation problem that occurs in the former-stage
FDICA. However, when the number of signal sources is smaller than the number of sensors, the
Permutation problem in each frequency bin becomes more complicated, and the drop in separation
accuracy in the former-stage FDICA is expected to be significant. In addition, when the number of
signal sources is smaller than the number of sensors, the optimization learning process of FDICA
is likely to converge to a local solution. As a simple solution to these problems, it is
conceivable to estimate the number of signal sources, select from a plurality of redundantly
arranged sensors as many sensors as the estimated number of signal sources, and thereby match the
number of sensors to the number of signal sources. However, this method is disadvantageous in
cost because not all of the sensors can be used effectively.
[0021]
Therefore, the object of the present invention is to construct a method that secures the maximum
separation performance for the target signal by using all the sensors and that does not lower
the separation performance even when the number of signal sources is smaller than the number of
sensors, and to provide means for realizing that method.
[0022]
In the present invention, independent component analysis (ICA) is adopted to achieve the above
object.
That is, as a first processing step, wave signals from a plurality of signal sources are detected
by a plurality of fixed sensors, and the detected signals of each of the plurality of channels
are subjected to signal detection processing such as amplification and waveform shaping. The
signals are then divided into channels for each frequency band and subjected to independent
component analysis in the frequency domain (FDICA) in a plurality of signal identification
processes 1, which output both a time signal group 1, which is data identified as target signal
parameter values, and a time signal group 2, which is data identified as unnecessary signal
parameter values. The parameter value corresponds to the energy contained in the frequency bin
corresponding to the output of each element of the matrix representing the signals received from
the plurality of sound sources.
[0023]
Next, as a second processing step, a signal identification process 2 is provided in which the
separated signals are further subjected to independent component analysis in the time domain
(TDICA), the temporal characteristics of time signal groups 1 and 2 are statistically analyzed,
and the target signal of at least one signal source is separated. The signal identification
process 2 also includes a second-order attenuation process that attenuates the parameter values
of the unnecessary signals identified above. In particular, in the present invention, in order to
analyze the signals input from the plurality of sensors, signal identification process 1 is
divided into a plurality of subblocks, each smaller than the total number of sensors. When the
signal groups of the plurality of channels divided into frequency bands are input, each signal
group is identified and processed independently in its own subblock.
[0024]
The invention uses independent component analysis in the frequency domain (FDICA) and
independent component analysis in the time domain (TDICA). FDICA includes a first-order
attenuation process that attenuates time signal group 2, which is identified as signals related
to unnecessary signal parameter values, and TDICA likewise includes a second-order attenuation
process that attenuates the unnecessary signal parameter values identified as other than the
target signal parameter values of the signal source. Furthermore, signal identification process 1
is divided into subblocks for signal processing, and the number of sound sources is estimated
from the received signals. As a result, the number of sound sources and the number of receiving
sensors can be made approximately equal, and signal extraction, that is, sound source separation,
can be performed with high accuracy.
[0025]
Hereinafter, the basic configuration of the present invention and its operating principle are
described.
In the present invention, the Permutation problem is solved by dividing into subblocks the FDICA
placed at the front stage in the method of Nishikawa et al. (Non-Patent Document 2). First, the
multistage processing (hereinafter referred to as MSICA) that combines processing in the
frequency domain (hereinafter referred to as FDICA) with processing in the time domain
(hereinafter referred to as TDICA) is described. In the following, with the signals represented
in the time-frequency domain, the vector SL(f, m) of the L source signals, the observed signal
vector XK(f, m), and the output signal vector ZL(f, m) of FDICA can be written as follows, with
the prime symbol (′) appended to the elements of each vector.
[0026]
[0027]
[0028]
The relationship between the source signal vector SL(f, m) and the observed signal vector
XK(f, m) is given as follows.
[0029]
Here, AKL(f) is a mixing matrix of K rows and L columns that gives the spatial propagation
characteristics of the signals, m is the frame number in the short-time discrete Fourier
transform (st-DFT) analysis, and f is the frequency.
In the MSICA procedure, FDICA is first applied to the observed signals.
The output signal ZL(f, m) of FDICA is obtained as follows, using the separation matrix VLL(f)
that relates the input and output signals.
[0030]
In FDICA, VLL(f) is optimized so that the L output signals become mutually independent at each
frequency f, in the same way as W(f) and Y(f, t) in FIG. 12.
[0031]
Second, the individual output signals after source separation by FDICA in the frequency domain,
[0032]
are regarded as the input signals of the next stage, TDICA, and the TDICA processing is executed.
Here, t represents time, and F<-1>[] represents the inverse discrete Fourier transform of the
expression in []. The output signal yL(t) of TDICA, which is the final separated signal, is
written as
[0033]
and is given by
[0034]
Here, wLL(τ) is a separation filter matrix whose elements are FIR filters, and Q is the filter
length. In TDICA, wLL(τ) is optimized so that the L output signals become mutually independent.
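How a separation filter matrix with FIR elements acts on its inputs can be sketched as follows.
The convolution-sum form y(t) = Σ w(τ) z(t - τ) over τ = 0 ... Q-1 is the standard form of a
multichannel FIR filter and is assumed here, since the equation itself is not reproduced in this
text. The function name `apply_fir_matrix` and the test signals are hypothetical, and the
learning of w(τ) is not shown.

```python
import numpy as np

def apply_fir_matrix(w, z):
    """Compute y(t) = sum over tau = 0..Q-1 of w(tau) z(t - tau).

    w: (Q, outputs, inputs) filter taps; z: (inputs, T) signals. Returns (outputs, T).
    """
    Q, n_out, n_in = w.shape
    T = z.shape[1]
    y = np.zeros((n_out, T))
    for o in range(n_out):
        for i in range(n_in):
            # Full convolution truncated to T samples gives the causal FIR output.
            y[o] += np.convolve(z[i], w[:, o, i])[:T]
    return y

rng = np.random.default_rng(0)
z = rng.standard_normal((2, 100))

# Identity filter: a unit tap at tau = 0 on the diagonal passes the input through.
w = np.zeros((8, 2, 2))
w[0] = np.eye(2)
y = apply_fir_matrix(w, z)
```

Because each element is a filter rather than a scalar, the time-domain stage can compensate for
delays and reverberation that a single instantaneous matrix cannot.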
[0035]
Embodiment 1
In the present invention, the observed signals obtained from the K microphones are grouped into
sets of L (< K) observed signals, and each set is treated as a subblock (FDICA1, FDICA2, ...,
FDICAN shown in FIG. 1). N subblocks are thus configured, and FDICA is performed in each
subblock. With the subblock number n attached as a parenthesized superscript, the FDICA
separation process in the n-th subblock can be expressed by the following equation (12).
[0036]
Here, the quantities in this equation are defined as follows.
[0037]
[0038]
Next, the output signals of the N subblocks are regarded as the input signals of the next stage,
TDICA, and the TDICA processing is performed. The TDICA separation process is given as follows.
[0039]
Here,
[0040]
[0041]
and wL(L×N)(τ) is a separation filter matrix of L rows × (L × N) columns.
The separation filter matrix wL(L×N)(τ) of TDICA is optimized using the following iterative
learning.
[0042]
Here, w<i>L(L×N)(τ) is the separation filter matrix at the i-th iteration, and α is the step size
of the iterative learning. In (Equation 22), the number of learning iterations may be fixed;
however, by terminating the learning when identification level 2 exceeds a certain threshold, the
convergence time can be shortened while the separation performance of the separation filter is
guaranteed. For example, the evaluation function J proposed by Nishikawa et al. (T. Nishikawa,
H. Saruwatari and K. Shikano, "Blind source separation of acoustic signals based on multistage
ICA combining frequency-domain ICA and time-domain ICA", IEICE Trans. Fundamentals, vol. E84-A,
no. 1, Jan. 2001) may be used. That is, using the signal YL(f, t) obtained by frequency-
transforming the time segments cut out in short-time frame analysis of the separated signal
yL(t) of (Equation 19), the evaluation function J is defined as
[0043]
and it is preferable that the learning be repeated until the evaluation function J exceeds a
certain threshold. In equation (23), <>t and <>f denote averaging over time and over frequency
of the expression in <>, the superscript H denotes the conjugate transpose of the matrix to
which it is attached, diag denotes a diagonal matrix, and the double vertical bars denote the
Frobenius norm. Φ(YL(f, t)) is the function given by
[0044]
In the present invention, the input signals are divided in each subblock into channel groups
with a small number of channels, and FDICA is applied to each channel group. Therefore, even
when the number of microphones exceeds the number of signal sources when all the microphones are
used, matching the number of channels in each channel group to the number of signal sources
prevents the decrease in FDICA separation accuracy that would otherwise occur. Furthermore,
since the output signals from all subblocks are input to TDICA, the input information from all K
microphones can be used effectively.
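The data flow of the subblock arrangement can be sketched as follows. The grouping rule
(adjacent, non-overlapping channels), the function names `split_into_subblocks` and
`fdica_subblock`, and the identity matrices standing in for learned separation matrices are
assumptions of this sketch; the text does not specify how channels are assigned to subblocks.

```python
import numpy as np

def split_into_subblocks(X, L):
    """Partition K channels into N = K // L groups of L adjacent channels.

    X: (bins, K, frames). Returns a list of N arrays shaped (bins, L, frames).
    The adjacent, non-overlapping grouping is an assumption of this sketch.
    """
    K = X.shape[1]
    if K % L != 0:
        raise ValueError("this sketch requires K to be a multiple of L")
    return [X[:, n * L:(n + 1) * L, :] for n in range(K // L)]

def fdica_subblock(X_n):
    """Placeholder for per-subblock FDICA: applies an identity V(f) in every bin."""
    bins, L, _ = X_n.shape
    V = np.tile(np.eye(L, dtype=complex), (bins, 1, 1))
    return np.einsum("fol,flt->fot", V, X_n)

rng = np.random.default_rng(0)
K, L = 12, 2  # 12 microphones, 2 sources, as in the experiment described later
X = rng.standard_normal((5, K, 40)) + 1j * rng.standard_normal((5, K, 40))

blocks = split_into_subblocks(X, L)
# Outputs of all N subblocks are stacked and passed on as the TDICA input.
Z = np.concatenate([fdica_subblock(b) for b in blocks], axis=1)
```

Each subblock sees as many channels as there are sources, while the stacked output still carries
information from every microphone.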
[0045]
FIG. 2 shows a block diagram of the above process. In FIG. 2, the observation signals are
detected and converted into electrical signals by the sensors 10-1 to 10-n and the detection
process 20. The band division process 30, which is the next step, gives the observed signal
X<(n)>L(f, m) in equation (16). This band-divided signal is input to the signal identification
process 1 indicated by 40, and the separation matrix V<(n)>LL(f) is obtained. Here, signal
identification process 1 consists of a plurality of signal identification processes that operate
on the parameter values indicating the state of the signal in each frequency-analyzed band of
each channel. Each process statistically analyzes the temporal and frequency characteristics of
the detected signals that arise from differences in the spatial positions of the signal sources
relative to the sensors and from differences in the type of signal source (human voice, engine
noise, road noise, etc.). From these parameter values it calculates an identification level 1
for identifying, as inputs from the same signal source, the target signal parameter values of at
least one signal source. Using identification level 1, it then outputs both time signal group 1,
identified as signals related to the target signal parameter values, and time signal group 2,
identified as signals related to the unnecessary signal parameter values.
[0046]
Next, the calculation of equation (16) is performed in the first-order attenuation process 50 to
obtain the output signal Z<(n)>L(f, m) of the n-th FDICA subblock. Based on this result, the
separation filter matrix wL(L×N)(τ) in equation (19) is obtained by the signal identification
process 2 indicated by 60. In signal identification process 2, the signal identification level 2
of the separation filter is calculated, and learning is repeated until signal identification
level 2 reaches a desired level. In the second-order attenuation process 70, equation (19) is
evaluated to obtain the separated signal yL(t). Equations (16) to (21) are merely examples and
do not represent the only calculation method or the whole of the present invention.
[0047]
To show the effect of the present invention, separation experiments on sound source signals were
conducted by off-line simulation. The experiment compares the separation accuracy for sound
signals of MSICA with 2 channels and MSICA with 12 channels, the latter being the configuration
to which the present invention is applied. As sound source signals, reverberant speech (sampling
frequency: 8 kHz) was created by convolving impulse responses with a reverberation time of
300 ms with source signals, using the RWCP database that is often used for this kind of
experiment. The signal sources were speech signals, and experiments were conducted on 12
combinations of speaker and sound source position. The number of microphones was 2 (2ch-MSICA)
or 12 (12ch-MSICA: proposed method), and the microphones were arranged linearly at a height of
1.46 m from the floor at intervals of 2.83 cm. The sound source signals were assumed to arrive
from two different azimuths at a height of 1.72 m from the floor, under two conditions: source
arrangement pattern 1, in which speech is radiated from the two directions of azimuth -60° and
+40°, and source arrangement pattern 2, in which speech is radiated from the two directions of
azimuth -40° and +20°, where 0° is the direction perpendicular to the microphone array. The
distance between each sound source and the center of the microphone array is 2.02 m, and the SNR
when the two signals are mixed is 0 dB.
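The array geometry of this experiment can be sketched numerically. The speed of sound (343 m/s)
and the far-field (plane-wave) approximation are assumptions of this sketch and are not stated
in the text; with sources at 2.02 m the true wavefronts are slightly curved. The function name
`arrival_delays` and the sign convention are likewise hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 343.0   # m/s, assumed room-temperature value (not given in the text)
SPACING = 0.0283         # m, microphone interval from the experiment
N_MICS = 12

# Microphone x-coordinates, centred on the array midpoint.
positions = (np.arange(N_MICS) - (N_MICS - 1) / 2) * SPACING

def arrival_delays(azimuth_deg):
    """Far-field arrival-time offsets [s] at each microphone; 0 deg is broadside."""
    return positions * np.sin(np.deg2rad(azimuth_deg)) / SPEED_OF_SOUND

# Source arrangement pattern 1: azimuths of -60 and +40 degrees.
delays_pattern1 = {az: arrival_delays(az) for az in (-60.0, 40.0)}
aperture = positions[-1] - positions[0]  # total array length in metres
```

The different delay profiles for the two azimuths are what allow a spatial filter to
distinguish the two sources.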
[0048]
The separation method proposed by Saruwatari et al. (H. Saruwatari et al., Proc. Eurospeech
2001, vol. 4, pp. 2603-2606, Sep. 2001) was used for FDICA, and the separation method proposed
by Choi et al. (S. Choi et al., Proc. International Conference on ICA and BSS, pp. 371-376,
Jan. 1999) was used for TDICA. The FDICA separation filter in each subblock had 1024 taps, with
a null-steering beamformer forming nulls toward ±60° as its initial value. The TDICA separation
filter had 2048 taps. In this experiment, the Noise Reduction Rate (NRR; output SNR [dB] - input
SNR [dB]) was used as an objective measure of separation accuracy.
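The NRR measure defined above can be computed as in the following sketch. It assumes that the
target and interfering components are available separately before and after separation, as they
are in a simulation; the function names and the toy separator that scales the interference by
0.1 are hypothetical.

```python
import numpy as np

def snr_db(target, interference):
    """Signal-to-noise ratio in dB from the mean powers of the two components."""
    return 10.0 * np.log10(np.mean(target ** 2) / np.mean(interference ** 2))

def noise_reduction_rate(target_in, interf_in, target_out, interf_out):
    """NRR [dB] = output SNR [dB] - input SNR [dB]."""
    return snr_db(target_out, interf_out) - snr_db(target_in, interf_in)

rng = np.random.default_rng(0)
target = rng.standard_normal(8000)
interference = rng.standard_normal(8000)

# Hypothetical separator: leaves the target untouched, scales the interference by 0.1.
nrr = noise_reduction_rate(target, interference, target, 0.1 * interference)
```

A positive NRR means that the separated output has a higher SNR than the mixture at the input.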
[0049]
FIG. 3 shows the separation accuracy for the two experimental conditions. The convergence of the
separation filters is discussed first. Under both conditions the NRR value is positive, so the
SNR is improved relative to the mixed signals. This shows that the separation filter of
2ch-MSICA converges well and, furthermore, that the separation filter also converges well in
12ch-MSICA, where the number of signal sources is smaller than the number of microphones.
[0050]
Next, the separation accuracy of the two methods is compared. In signal source arrangement
pattern 1, the NRR of 2ch-MSICA was 11.92 dB while that of 12ch-MSICA was 15.06 dB, an
improvement of 3.14 dB. In signal source arrangement pattern 2, the NRR of 2ch-MSICA was
7.92 dB and that of 12ch-MSICA was 10.98 dB, an improvement of 3.06 dB. These results show that
the present invention improves the separation accuracy over the conventional method.
[0051]
Next, the basic configuration of the apparatus corresponding to the above processing procedure
is described with reference to FIG. 4, and the configuration of the central portion of the
processing apparatus is described with reference to FIG. 5. The sensor means 110-1 to 110-n and
the detection means 120 shown in FIG. 4 receive and detect the incoming observation signals.
They can be realized by a sensor group such as microphones, shown as sensors 210-1 to 210-2 in
FIG. 5, a filter 220, and an A/D converter 230. The sensor group 210-1 to 210-2 detects a
plurality of wave signals (light, sound, vibration, magnetic change, magnetic field change,
electricity, radio waves, etc.) and is used at spatially different positions. Specifically, a
single sensor or a plurality of sensors for detecting waves, typified by optical sensors, sound
sensors, microphones, vibration sensors, magnetic sensors, electric sensors, and antennas, are
used.
[0052]
The filter is used to remove noise contained in the electrical signals obtained from the
sensors. It is sufficient to use a band-pass filter that removes, from the electrical signal
detected by each sensor, only those components that cannot be characteristic of the signal
source, and it can be realized with a conventional electrical filter circuit. The A/D converter
may be any device that has a sampling frequency sufficient to discretize the signal band of the
signal source accurately and that can convert continuous electrical signals into discrete
information signals. It can be realized using an A/D conversion circuit or the like.
[0053]
The band dividing means 130 in FIG. 4 converts the detected signals into a mathematically
orthogonal space using the functions of an orthogonal transform. Specifically, frequency
transforms such as the discrete Fourier transform, the Z-transform, and the Laplace transform
may be used, and the calculation can be performed by the arithmetic unit 240 and the storage
unit 250 in FIG. 5. The arithmetic unit 240 is configured by combining one or more main
arithmetic circuits, such as the CPU, MPU, DSP, or FPGA of a general computer, with sub
arithmetic circuits serving as peripheral circuits and with memory circuits. The storage unit
250 can be realized using any device or medium capable of storing electrical signals, typified
by cache memory, main memory, disk memory, compact disk, flash memory, DVD, tape, floppy
(registered trademark) disk, magneto-optical disk, MD, and DAT. The signal identification means
1, indicated by 140, calculates the identification level 1 of the separation filter in each
frequency band and performs the operation of extracting the target signal from the divided
signals. This can also be realized by the arithmetic unit 240 and the storage unit 250 in FIG. 5.
[0054]
The first-order attenuation means 150, which performs the FDICA process, and the second-order
attenuation means 160, which performs the TDICA process, in FIG. 4 extract the desired target
signal from the input signals and attenuate the other, unnecessary signals. This can be realized
by the arithmetic unit 240 and the storage unit 250 in FIG. 5.
[0055]
Second Embodiment
The first-order attenuation process of the present invention aims at separating signals from a
plurality of directional signal sources. In a real environment, however, diffuse signal sources
are also present and degrade the separation performance. For this reason, a processing method,
and a device that realizes it, are required that prevent components other than the signal from
the target signal source from being mixed in within frequency bands in which, owing to the
presence of diffuse signal sources, the signals are difficult to separate even with FDICA. The
second embodiment describes processing that removes diffuse noise by frequency band suppression
(SBE: SubBand Elimination).
[0056]
First, the processing steps of the method according to the present invention are described with
reference to FIG. 6. In FIG. 6, the sensors 10-1 to 10-n, the detection process 20, the band
division process 30 and the first-order attenuation process 50 are as described for FIG. 2. In
FIG. 6, the primary secondary attenuation process 55 regards as unnecessary components the
parameter values indicating the state of the signal in frequency bands in which identification
level 1 did not reach a predetermined level in the primary attenuation process 50 and separation
is therefore difficult, and suppresses time signal group 2. Using at least two sensors,
identification level 1 is raised when the parameter values of signal identification process 1
show that the target signal parameter value of at least one signal source is highly independent
of the unnecessary signal parameter values in time, in frequency, or in geometric space. By
providing the primary secondary attenuation process 55 after the primary attenuation process 50
in this way, it becomes easier to suppress time signal group 2, which identifies the parameter
values indicating the state of the unnecessary signals that were difficult to suppress in those
frequency bands.
[0057]
To determine the frequency bands of the unnecessary signals that were difficult to separate, the
band suppression method using a cosine-distance cost function proposed by Saruwatari et al.
(Saruwatari et al., "Blind source separation and sub-band removal processing in-room sound
recognition", IEICE, EA2002-8) can be introduced, for example. The method of Saruwatari et al.
exploits the fact that the separation accuracy of ICA is worse in bands where the cost function
used in computing FDICA is larger, and raises the SNR improvement by suppressing those bands.
That is, a cosine distance indicating the difference between the target signal parameter values
and the unnecessary signal parameter values is defined as a cost function, and when the value of
the cost function is low, independence is judged to be high and identification level 1 is
raised. Here, the cost function evaluates the independence between the separated signals, and
can be obtained from the higher-order correlation between the separated signals or from the
cosine distance in the matrix space of the signals. The latter method, using the cosine
distance, is considered efficient because it requires little computation. Equation (25) shows a
cost function J(f) based on the cosine distance for two sound sources.
[0058]
In Equation (25), Y1 (f, t) and Y2 (f, t) are separated signals after the unnecessary band is
removed, <> indicates the time average, and * indicates the complex conjugate. To apply the
resulting cost function in practice, processing such as smoothing is also necessary. In any case,
this method can be introduced into the present invention ahead of the secondary attenuation
process 70. By removing the bands containing diffuse noise in advance, the separation performance
of the secondary attenuation process 70 can be expected to improve.
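Equation (25) itself is not reproduced in this text. As a rough illustration only, the sketch
below assumes one common normalized form that is consistent with the symbol definitions above:
the magnitude of the time-averaged product of Y1 (f, t) and the complex conjugate of Y2 (f, t),
divided by the square root of the product of the two time-averaged powers. The function name
`cosine_distance_cost`, the array layout, and the small constant `eps` are assumptions made for
this example and do not come from the specification.

```python
import numpy as np

def cosine_distance_cost(Y1, Y2, eps=1e-12):
    """Assumed form of J(f) for two separated signals.

    Y1, Y2: complex arrays of shape (n_freq, n_frames), one row per frequency bin.
    Returns an array of shape (n_freq,). Values near 0 suggest the two outputs
    are close to orthogonal in that bin; values near 1 suggest poor separation.
    """
    # Time average of Y1(f, t) * conj(Y2(f, t)) for every frequency bin.
    cross = np.abs(np.mean(Y1 * np.conj(Y2), axis=1))
    # Time-averaged power of each output, used for normalization.
    p1 = np.mean(np.abs(Y1) ** 2, axis=1)
    p2 = np.mean(np.abs(Y2) ** 2, axis=1)
    # eps (an assumption) avoids division by zero in silent bins.
    return cross / (np.sqrt(p1 * p2) + eps)
```

With this form, a bin in which the two outputs are identical gives a value close to 1, and a bin
in which they are orthogonal over time gives a value close to 0. Smoothing across neighboring
bins, which the text says is needed in practice, is left out of the sketch.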
[0059]
In the correction process 80 in FIG. 6, each time the primary attenuation process 50 for
separation is calculated in signal identification process 1, indicated by 40, the cost function
J (f) described above is referred to and the frequency bands suppressed by the primary secondary
attenuation process 55 are corrected. Learning is performed in this way.
[0060]
Third Embodiment FIG. 7 shows a third embodiment of the present invention, which will be
described with reference to FIGS. 7 and 5.
[0061]
The primary secondary attenuation means 155 in FIG. 7 suppresses the parameter values of the
frequency bands that are difficult for the primary attenuation means 150 to separate. Here, the
parameter value corresponds to the energy contained in the frequency bin corresponding to the
output of each element of the matrix representing the signals received from the plurality of
sound sources. Specifically, the parameter values may be suppressed by reducing the energy of the
target frequency band, for example by setting the parameter value to 1 / n, or by removing the
target frequency band with a notch filter or a similar method. This can be realized by using the
arithmetic unit 240 and the storage unit 250 in FIG.
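The suppression step described above can be sketched as follows. This is a minimal illustration
under several assumptions: the bands to suppress are chosen by comparing a per-bin cost value
with a threshold, the threshold and n are arbitrary example values, and "setting the parameter
value to 1 / n" is interpreted as scaling the energy of the bin by 1 / n, which means scaling its
amplitude by 1 / sqrt(n). The function name `suppress_bands` is hypothetical.

```python
import numpy as np

def suppress_bands(Y, cost, threshold=0.5, n=10.0):
    """Reduce the energy of frequency bins whose cost exceeds a threshold.

    Y: complex array of shape (n_freq, n_frames) for one output channel.
    cost: real array of shape (n_freq,), one cost value per frequency bin.
    threshold, n: illustrative values, not taken from the specification.
    """
    # Energy is scaled to 1/n, so the amplitude gain is 1/sqrt(n).
    gain = np.where(cost > threshold, 1.0 / np.sqrt(n), 1.0)
    # Apply the per-bin gain to every time frame.
    return Y * gain[:, None]
```

A notch filter, the other option the text mentions, would correspond to a gain of zero in the
selected bins.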
[0062]
The storage means 180 in FIG. 7 stores information on the bandwidth of the signals divided by
the band division means 130 and information on the identification level used for signal
identification in the signal identification means 1 shown at 140. The bandwidth information
includes, for example, the analysis width used when a signal is analyzed and frequency
converted, and the analysis width used when the signal is divided into subbands after frequency
conversion. The identification level information consists of information on the cost function.
The correction means 190 in FIG. 7 refers to the cost function and corrects the primary
secondary attenuation means 155 each time the primary attenuation means 150 for separation is
calculated in the signal identification means 1 indicated by 140.
[0063]
Fourth Embodiment In FDICA, good separation accuracy is obtained when the number of signal
sources matches the number of microphones. A means for matching the number of microphones to the
number of signal sources is therefore needed, and this in turn requires predicting the number of
signal sources. For example, consider the case where the present invention is applied to
hands-free voice communication and voice input devices in a vehicle cabin. In this case, the
target signal is the voice of the user, and the unnecessary signals are the various noises
generated in the car, such as road noise, engine noise, and air conditioner noise. Road noise
can be predicted by detecting the vehicle speed, and engine noise can be predicted from the
vehicle speed and whether the engine is idling. Air conditioner noise can be predicted from the
ON / OFF state of the air conditioner and the switching state of the air outlet. The fourth
embodiment describes a process that predicts the occurrence of these unnecessary signals and
determines the number of subblocks of the signal identification means 1, as well as the channels
and the number of channels corresponding to one subblock.
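The prediction described in this paragraph could look roughly like the sketch below. The
specification names the inputs (vehicle speed, idling, air conditioner state) but gives no
decision rule, so every rule and threshold here is an assumption made for illustration, as are
the names `VehicleState` and `predict_num_sources`. The air outlet switching state is omitted
for brevity.

```python
from dataclasses import dataclass

@dataclass
class VehicleState:
    speed_kmh: float      # detected vehicle speed
    engine_idling: bool   # True while the engine idles
    aircon_on: bool       # air conditioner ON / OFF

def predict_num_sources(state, road_noise_speed_kmh=30.0):
    """Count the target voice plus each noise source predicted to be active.

    road_noise_speed_kmh is a hypothetical threshold above which road noise
    is assumed to be significant.
    """
    count = 1  # the user's voice is always assumed to be present
    if state.speed_kmh >= road_noise_speed_kmh:
        count += 1  # road noise, predicted from vehicle speed
    if state.engine_idling or state.speed_kmh > 0:
        count += 1  # engine noise, predicted from speed and idling
    if state.aircon_on:
        count += 1  # air conditioner noise, predicted from ON / OFF
    return count
```

The returned count is what the prediction step would pass on so that the number of subblocks and
the channels per subblock can be chosen to match it.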
[0064]
FIG. 8 is a block diagram of this processing system. The configuration of the apparatus according
to the present invention will be described using FIGS. 8 and 5. The prediction means 180 in FIG.
8 predicts the occurrence of various unnecessary signals in the real environment and sends the
number of signal sources to the changing means 190 in FIG. According to the number of signal
sources, the changing means 190 determines and changes the number of subblocks included in the
signal identification means 1 shown by 140 in FIG. 8, as well as the channels and the number of
channels corresponding to one subblock.
[0065]
Fifth Embodiment The number of subblocks, and the channels and the number of channels
corresponding to the input and output terminals of one subblock, can be determined as follows.
[0066]
FIG. 9 is a block diagram of a system for determining the above numbers.
Hereinafter, the fifth embodiment will be described using FIG. 10 and FIG. 5 in combination.
[0067]
The table means 200 in FIG. 9 stores a plurality of standard patterns, each describing the
number of subblocks to be set by the changing means 190, the channels assigned to the input /
output terminals of one subblock, and the number of those channels. The table means 200 can be
realized by using the storage means 250 of FIG. An example of the standard pattern is shown in
FIG. 10. For a system with five sensors (microphones) in which the number of signal sources is
predicted to be two, FIG. 10 shows the subblocks, the path from each sensor to each subblock, and
the number of channels input to one subblock after these have been determined. In this case,
three subblocks are generated, and the number of channels input to each subblock is two, the same
as the predicted number of signal sources. As for the paths from the sensors to the subblocks,
sensor 1 and sensor 2 are the inputs of subblock 1, sensor 3 and sensor 4 are the inputs of
subblock 2, and sensor 4 and sensor 5 are the inputs of subblock 3. The number of channels of
the output signal from TDICA, which corresponds to the secondary attenuation means 170, is also
two, the same as the predicted number of signal sources. Because the appropriate number of
subblocks and the standard pattern of channels and channel counts per subblock are influenced by
the positions of the sensors, it is desirable to decide them according to the environment in
which the present invention is applied.
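The standard-pattern table can be pictured as a simple lookup, sketched below. Only the single
entry for five sensors and two predicted sources comes from the FIG. 10 example in the text. The
dictionary layout and the names `STANDARD_PATTERNS`, `select_pattern`, and `route_channels` are
assumptions introduced for this illustration.

```python
# Key: (number of sensors, predicted number of signal sources).
# Value: one tuple of sensor numbers per subblock (the FIG. 10 example).
STANDARD_PATTERNS = {
    (5, 2): [(1, 2), (3, 4), (4, 5)],
}

def select_pattern(num_sensors, num_sources):
    """Return the sensor-to-subblock assignment for the predicted source count."""
    key = (num_sensors, num_sources)
    if key not in STANDARD_PATTERNS:
        raise KeyError(f"no standard pattern stored for {key}")
    return STANDARD_PATTERNS[key]

def route_channels(observations, pattern):
    """Group sensor signals into per-subblock input lists.

    observations: dict mapping sensor number to that sensor's signal.
    pattern: list of sensor-number tuples as returned by select_pattern.
    """
    return [[observations[s] for s in sensors] for sensors in pattern]
```

For the stored example, `select_pattern(5, 2)` yields three subblocks with two input channels
each, and sensor 4 feeds both subblock 2 and subblock 3, as in the description of FIG. 10.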
[0068]
Sixth Embodiment The sixth embodiment discloses, as one method of sensor arrangement, a method
in which sub sensor arrays are constructed and one subblock is associated with each sub sensor
array. Each of the sub sensor arrays 304, 305, and 306 shown in FIG. 11 contains two sensors,
and one subblock is placed downstream of each sub sensor array. When this method is used, two
signal sources are assumed in advance, so it is desirable to arrange the sub sensor arrays so
that they can detect signals spatially independently. By arranging a plurality of microphones,
each serving as a sensor, in this way, the degree of freedom of the sensor arrangement and the
practicality can be improved.
Each of the above-described embodiments is merely an example to which the present invention is
applied, and does not limit the scope of the present invention.
[0069]
FIG. 1 is a system diagram showing the flow of signals in the present invention.
FIG. 2 is a flowchart illustrating the signal processing process according to the first embodiment.
FIG. 3 is a histogram showing signal processing results according to the present invention.
FIG. 4 is a block diagram showing the basic configuration of the signal processing apparatus
according to the first embodiment.
FIG. 5 is a block diagram showing the connections of the main hardware of the signal processing
system.
FIG. 6 is a flowchart illustrating the signal processing process according to the second embodiment.
FIG. 7 is a block diagram of the signal processing apparatus according to the third embodiment.
FIG. 8 is a block diagram of the signal processing apparatus according to the fourth embodiment.
FIG. 9 is a block diagram of the signal processing apparatus according to the fifth embodiment.
FIG. 10 is an interconnection diagram of the subblocks in the fifth embodiment.
FIG. 11 is an interconnection diagram of the subblocks in the sixth embodiment.
FIG. 12 is a system diagram explaining a conventional signal separation process.
FIG. 13 is a system diagram showing another conventional signal separation process.
Explanation of reference numerals
[0070]
10_1 to 10_n, 110_1 to 110_n, 210_1 to 210_n: Sensor
20: Detection process
30: Band division process
40: Signal identification process 1
50: Primary attenuation process
55: Primary secondary attenuation process
60: Signal identification process 2
70: Secondary attenuation process
80: Correction process
120: Detection means
130: Band division means
140: Signal identification means 1
150: Primary attenuation means
155: Primary secondary attenuation means
160: Signal identification means 2
170: Secondary attenuation means
180: Storage means
181: Prediction means
190: Correction means
191: Changing means
200: Table means
220: Filter
230: AD converter
240: Arithmetic unit
250: Storage unit
300: Sub block 1
301: Sub block 2
302: Sub block 3
303: TDICA
304, 305, 306: Sub sensor array