DESCRIPTION JP2017085265

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
Abstract: To generate, for reverberation addition, a direction-specific impulse response that more accurately approximates the spatial impression of a measured sound field. SOLUTION: A virtual sound source information estimation unit 13 of an impulse response generation device 1 uses the impulse responses calculated from the respective signals obtained by picking up the same sound signal with a plurality of microphones installed at different positions in a space to estimate information on virtual sound sources in the space at an arbitrary position surrounded by the plurality of microphones. A comparison/determination unit 14 determines, in each impulse response, the reflected sound arriving from the direction corresponding to the microphone with which that impulse response was measured, based on a comparison between the estimated virtual sound source information and the impulse response. A reflected sound removal unit 15 thins out of the impulse response the reflected sounds other than those that the comparison/determination unit 14 determined to arrive from the direction corresponding to the microphone with which the impulse response was measured, and thereby generates a direction-specific impulse response. [Selected figure] Figure 3
Impulse response generator and program
[0001]
The present invention relates to an impulse response generator and program.
[0002]
In the production of sound such as television programs, music, movies, etc., a technique is used
11-04-2019
to add reverberation to give a sense of sound expanse and presence.
Various reverberation-adding devices have been developed, such as an IIR (infinite impulse response) filter type that synthesizes the reverberation of a space by simulation and an FIR (finite impulse response) filter type that generates reverberation using an impulse response measured in a real space (see, for example, Patent Document 1). In particular, the FIR filter type can faithfully reproduce the reverberation characteristics of the space in which the impulse response was measured.
[0003]
Multi-channel audio systems, such as the 5.1-channel surround system and the 22.2 multichannel system for Super Hi-Vision, can express sounds arriving from various directions using multiple speakers arranged on a plane or in space (see, for example, Non-Patent Document 1). In producing content for these systems as well, a reverberation-adding device is often used to impart rich presence and spatial expanse. Just as reflected sound arrives from walls, floors, ceilings, and other directions in a real space, reverberation in multi-channel audio should ideally reproduce the characteristic reflected sound arriving from each direction (direction-specific reverberation) from the speaker installed in that direction, so that the reverberation characteristics of the space in which the impulse response was measured are reproduced as faithfully as possible. In particular, the early reflections that arrive within about 100 ms (milliseconds) after the direct sound are known to contribute to the perception of the width and direction of the sound image, so reproducing the early reflections separately for each direction is important for reproducing the impression of the space in which the impulse response was measured. Therefore,
techniques have been proposed for generating directional impulse responses by using impulse
responses measured by directional microphones (see, for example, Patent Documents 2 and 3).
[0004]
On the other hand, various techniques have been proposed for estimating the positions of virtual sound sources using a plurality of microphones, such as the proximity four-point method (see, for example, Non-Patent Document 2) and extensions of it (see, for example, Non-Patent Documents 3 and 4). In these methods, the arrival time and arrival direction of a reflected sound at the sound receiving point can be estimated by calculating the cross-correlation and intensity between the microphones.
[0005]
Patent Document 1: JP-A-2014-45282
Patent Document 2: JP-A-2013-238643
Patent Document 3: JP-A-2014-112767
[0006]
Non-Patent Document 1: W. Woszczyk et al., "Space Builder: An impulse response-based tool for immersive 22.2 channel ambiance design", AES 40th International Conference, Japan, 2010
Non-Patent Document 2: Yoshio Yamazaki, "Visualization of spatial information of sound fields by the proximity four-point method", Journal of the Institute of Television Engineers of Japan, 1990, Vol. 44, No. 3, pp. 253-258
Non-Patent Document 3: Masahiro Nishimura and 2 others, "Proposal of an extended algorithm of the proximity four-point method for virtual sound source estimation by a BoSC microphone", Proceedings of the Acoustical Society of Japan, March 2012, pp. 789-790
Non-Patent Document 4: Marie Kobayashi and 2 others, "Source position estimation and sound source separation by proximity-disposed four-point microphones", Acoustical Society of Japan, Lecture Articles, March 2012, pp. 685-686
[0007]
An impulse response generated by the techniques described above also contains reflected sound arriving from directions other than the intended one. Even a directional microphone has directivity of finite width, so a plurality of microphones placed close together pick up the same reflected sound. Consequently, when reverberation is added using these impulse responses, a given reflected sound is reproduced from a plurality of speakers, the density of reflected sound at the listening point becomes high, and the spatial impression of the measured sound field is not reproduced exactly.
[0008]
The present invention has been made in consideration of such circumstances, and provides an
impulse response generation device and program for generating a directional impulse response
that can more closely reproduce the spatial impression of the measurement sound field in
reverberation addition.
[0009]
One aspect of the present invention is an impulse response generation device comprising: a virtual sound source information estimation unit that estimates information on virtual sound sources in a space at an arbitrary position surrounded by a plurality of microphones installed at different positions in the space, using the impulse responses calculated from the respective signals obtained by picking up the same acoustic signal with the plurality of microphones; a comparison/determination unit that determines, in each impulse response, the reflected sound arriving from the direction corresponding to the microphone that picked up the signal from which that impulse response was calculated, based on a comparison between the virtual sound source information estimated by the virtual sound source information estimation unit and the impulse response; and a reflected sound removal unit that generates a direction-specific impulse response by thinning out of the impulse response the reflected sounds other than those that the comparison/determination unit determined to arrive from the direction corresponding to the microphone that picked up the signal from which the impulse response was calculated.
[0010]
One aspect of the present invention is the impulse response generation device described above, wherein the virtual sound source information indicates the arrival time and arrival direction of each virtual sound source at the arbitrary position, and the comparison/determination unit determines, in the impulse response, the reflected sound arriving from the direction corresponding to the microphone that picked up the signal from which the impulse response was calculated, based on a comparison between the arrival time of the virtual sound source and the temporal change in sound magnitude represented by the waveform structure of the impulse response calculated from the signal picked up by the microphone corresponding to the arrival direction of the virtual sound source.
[0011]
One aspect of the present invention is the impulse response generation device described above, wherein the comparison/determination unit performs, for each virtual sound source, a process of selecting from the plurality of microphones the microphone corresponding to the arrival direction of the virtual sound source and identifying, in the impulse response calculated from the signal picked up by the selected microphone, a local peak occurring within a predetermined time range that includes the time obtained by subtracting, from the arrival time of the virtual sound source, a delay corresponding to the distance between the selected microphone and the arbitrary position; and determines that each local peak identified by the process in an impulse response is a reflected sound arriving from the direction corresponding to the microphone that picked up the signal from which that impulse response was calculated.
[0012]
One aspect of the present invention is a program for causing a computer to function as an impulse response generation device comprising: virtual sound source information estimation means for estimating information on virtual sound sources in a space at an arbitrary position surrounded by a plurality of microphones installed at different positions in the space, using the impulse responses calculated from the respective signals obtained by picking up the same acoustic signal with the plurality of microphones; comparison/determination means for determining, in each impulse response, the reflected sound arriving from the direction corresponding to the microphone that picked up the signal from which that impulse response was calculated, based on a comparison between the virtual sound source information estimated by the virtual sound source information estimation means and the impulse response; and reflected sound removal means for generating a direction-specific impulse response by thinning out of the impulse response the reflected sounds other than those that the comparison/determination means determined to arrive from the direction corresponding to the microphone that picked up the signal from which the impulse response was calculated.
[0013]
According to the present invention, in reverberation addition, it is possible to generate directional
impulse responses that can more closely reproduce the spatial impression of the measurement
sound field.
[0014]
FIG. 1 is a diagram showing the directivity characteristics of unidirectional microphones.
FIG. 2 is a diagram showing impulse responses measured by unidirectional microphones.
FIG. 3 is a functional block diagram showing the configuration of an impulse response generation device according to an embodiment of the present invention.
FIG. 4 is a diagram showing an example of the impulse response measurement method according to the embodiment.
FIG. 5 is a flowchart of the direction-specific impulse response generation process in the impulse response generation device according to the embodiment.
FIG. 6 is a diagram showing an example of the impulse response measurement method according to the embodiment.
FIG. 7 is a diagram for explaining the virtual sound source information estimation process using intensity according to the embodiment.
FIG. 8 is a flowchart of the virtual sound source information estimation process using the cross-correlation coefficient according to the embodiment.
FIG. 9 is a diagram showing an example of the comparison/determination process and the reflected sound removal process according to the embodiment.
[0015]
Hereinafter, embodiments of the present invention will be described in detail with reference to
the drawings. First, sound collection by a plurality of unidirectional microphones will be
described using FIGS. 1 and 2. FIG. 1 is a diagram showing the directivity characteristic of a
unidirectional microphone. The microphones M a and M b are unidirectional microphones. In the
drawing, in order to pick up the reflected sound in different directions, the microphone M a is
oriented in the D1 direction, and the microphone M b is oriented in the D2 direction. The
directivity of the microphone M a is R a and the directivity of the microphone M b is R b. The
dotted lines indicate the sound collection range when the microphones M a and M b are
omnidirectional microphones. The reflected sound A comes from the D1 direction, and the
reflected sound B comes from the D2 direction.
[0016]
FIG. 2 is a diagram showing impulse responses measured by the unidirectional microphones shown in FIG. 1. An impulse response is calculated by emitting an acoustic measurement signal, such as a TSP (Time Stretched Pulse) signal, and processing the signal picked up (measured) by a microphone; it represents the temporal change in sound magnitude. In this specification, an impulse response calculated by processing the signal picked up by a microphone is referred to as the impulse response measured by that microphone. Specifically, the impulse response measured by a microphone at an arbitrary position is obtained as information that associates each time t with the amplitude of the sound picked up by the microphone at time t. FIG. 2(a) shows the impulse response h a (t) measured by the microphone M a shown in FIG. 1, and FIG. 2(b) shows the impulse response h b (t) measured by the microphone M b shown in FIG. 1.
[0017]
Ideally, the reflected sound A from the D1 direction would be picked up only by the microphone M a installed facing the D1 direction, and the reflected sound B from the D2 direction would be picked up only by the microphone M b installed facing the D2 direction. In practice, however, the directivity of a directional microphone has a certain width, as shown in FIG. 1. Both reflected sounds A and B are therefore picked up by both microphones M a and M b, giving waveforms such as the impulse responses h a (t) and h b (t) shown in FIG. 2. That is, although the level of the reflected sound B is lower than that of the reflected sound A in the impulse response h a (t), and the level of the reflected sound A is lower than that of the reflected sound B in the impulse response h b (t), each impulse response also contains reflected sound arriving from directions other than the direction corresponding to the microphone (the direction to be measured).
[0018]
Therefore, if reverberation is added using the impulse responses measured by a plurality of directional microphones, reflected sound arriving from directions other than the direction corresponding to each microphone (the direction to be measured) is reproduced from a plurality of speakers, and the impression of the measured space may change. The impulse response generation device of the present embodiment therefore removes from each measured impulse response the reflected sound arriving from directions other than the direction corresponding to the microphone (the direction to be measured), such as the reflected sound B picked up by the microphone M a or the reflected sound A picked up by the microphone M b, and generates a more accurate direction-specific impulse response.
[0019]
In order to generate a direction-specific impulse response, in the present embodiment, a plurality
of microphones are first placed at an arbitrary position in space, and the impulse response is
measured. The impulse response generator estimates the arrival direction and arrival time of a
virtual sound source (reflected sound) in space at an arbitrary position surrounded by the
microphone group using the impulse response measured by each microphone. The impulse
response generator compares the estimated information with the arrival time of the reflected
sound included in the impulse response measured by each microphone. The impulse response
generator specifies the reflected sound coming from other than the direction (the direction to be
measured) corresponding to each microphone as a target to be removed from the impulse
response measured by the microphone based on the result of the comparison. The impulse
response generator removes any reflected sound to be removed from the impulse response
measured by each microphone, and generates a direction-specific impulse response. Note that
the arrival direction to be collected by each microphone may differ depending on the installation
direction of the adjacent microphones in addition to the installation direction of the microphone.
[0020]
FIG. 3 is a functional block diagram showing the configuration of the impulse response generator
1 according to an embodiment of the present invention, and only functional blocks related to the
present embodiment are extracted and shown. The impulse response generation device 1 can be
realized by, for example, a computer device. As shown in the figure, the impulse response generation device 1 includes a storage unit 11, a measurement information acquisition unit 12, a virtual sound source information estimation unit 13, a comparison determination unit 14, and a reflected sound removal unit 15.
[0021]
The storage unit 11 stores various information such as impulse responses measured by the
respective microphones. The measurement information acquisition unit 12 acquires impulse
responses measured by each of the plurality of microphones arranged at arbitrary positions in
the space, and writes the impulse responses in the storage unit 11. Hereinafter, an i-th
microphone of n microphones is described as M i (i is an integer of 1 or more and n or less), and
an impulse response measured by the microphone M i is described as h i (t). The times of the local peaks (maxima of the envelope) in the impulse response h i (t) are denoted t i1, t i2, ....
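As an illustration of how the local peak times t i1, t i2, ... might be extracted, the following minimal sketch picks the local maxima of a rectified, smoothed impulse response. The moving-average envelope, the function names `envelope` and `local_peak_times`, and the `width` and `threshold` parameters are assumptions made for this example; the specification does not state how the envelope is computed at this point.

```python
def envelope(h, width=5):
    """Crude envelope: moving average of |h| over `width` samples (assumed method)."""
    half = width // 2
    mag = [abs(v) for v in h]
    out = []
    for k in range(len(mag)):
        lo, hi = max(0, k - half), min(len(mag), k + half + 1)
        out.append(sum(mag[lo:hi]) / (hi - lo))
    return out


def local_peak_times(h, fs, width=5, threshold=0.0):
    """Return the times (in seconds) at which the envelope of h has a local maximum."""
    env = envelope(h, width)
    times = []
    for k in range(1, len(env) - 1):
        # A peak must exceed the threshold and its left neighbour,
        # and must not be smaller than its right neighbour.
        if env[k] > threshold and env[k] > env[k - 1] and env[k] >= env[k + 1]:
            times.append(k / fs)
    return times
```

For example, `local_peak_times(h_i, fs=48000)` would give a list playing the role of t i1, t i2, ... for one microphone; the sampling rate is likewise only an example value.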
[0022]
The virtual sound source information estimation unit 13 estimates information on the virtual sound sources in the space at an arbitrary point surrounded by the microphone group (microphones M 1 to M n) by calculating the intensity, the cross-correlation coefficient, and the like from the impulse responses measured by the microphones. The targets of estimation are virtual sound sources whose level is considered to contribute substantially to direction perception. These are mainly the virtual sound sources in the early reflection portion, which arrive at the receiving point within about 100 ms (milliseconds) of the direct sound. The comparison determination unit 14 compares the information on the virtual sound sources estimated by the virtual sound source information estimation unit 13 with the waveform structure of the original impulse response. Based on the comparison result, the comparison determination unit 14 determines the reflected sounds (local peaks) to be removed from the original impulse response so that a given reflected sound is not reproduced redundantly from a plurality of speakers when reverberation is added. The reflected sound removal unit 15 removes the reflected sounds (local peaks) determined by the comparison determination unit 14 from the original impulse response and generates a direction-specific impulse response. Reverberation is then added by conventional techniques using the generated direction-specific impulse responses.
[0023]
FIG. 4 is a diagram showing an example of an impulse response measurement method. In the
impulse response measurement space, a speaker for reproducing an impulse response
measurement signal and a plurality of microphones M 0 to M n are arranged. Here, the
microphones M 0 to M n are unidirectional microphones, but omnidirectional microphones may
be used. The position of the speaker and the number and position of the microphones are set
according to the size of the measurement space, the application of the reverberation, and the
reproduction method. For example, in the case of the 22.2 multi-channel system, it is conceivable
to use 24 microphones or 22 microphones except for the LFE (Low Frequency Effect) channel.
Then, an acoustic signal for impulse response measurement, such as a TSP signal, is emitted from a speaker installed at an appropriate position and recorded by the microphone group consisting of the microphones M 0 to M n. In this way, the impulse responses h 0 (t), h 1 (t), h 2 (t), ..., h n (t) measured by the respective microphones M 0, M 1, M 2, ..., M n are obtained. Let P be an arbitrary point in the space surrounded by the microphone group. In FIG. 4(a), the point P is the installation position of the microphone M 0; in FIG. 4(b), the point P is a position where none of the microphones M 0 to M n is installed.
[0024]
FIG. 5 is a flowchart of the direction-specific impulse response generation process in the impulse
response generator 1. First, the measurement information acquisition unit 12 of the impulse response generation device 1 acquires the impulse responses h 0 (t), h 1 (t), h 2 (t), ..., h n (t) measured by the microphones M 0, M 1, M 2, ..., M n (step S10). For example, the measurement information acquisition unit 12 may receive the impulse responses from the microphones M 0 to M n, receive them from another computer device, or read them from a recording medium. The measurement information acquisition unit 12 writes the acquired impulse responses in the storage unit 11.
[0025]
The virtual sound source information estimation unit 13 estimates the information e 1 (r 1, θ 1, φ 1, t 1), e 2 (r 2, θ 2, φ 2, t 2), ..., e m (r m, θ m, φ m, t m) of the virtual sound sources at the point P surrounded by the microphone group, based on the impulse responses h 0 (t), h 1 (t), h 2 (t), ..., h n (t) read from the storage unit 11 (step S20). Here, r is the amplitude, θ is the azimuth angle, φ is the elevation angle, t is the arrival time, m is the number of estimated virtual sound sources, and the subscripts of r, θ, φ, and t indicate which of the m virtual sound sources the value belongs to. The amplitude r represents the level of the virtual sound source, and the azimuth angle θ and the elevation angle φ represent the arrival direction of the virtual sound source at the point P. The three-dimensional coordinate space used to express the azimuth angle θ and the elevation angle φ can be set arbitrarily. Here, a three-dimensional coordinate space is used in which the position of the point P is the origin 0, the x-axis is the straight line connecting the speaker and the point P, the y-axis is the direction perpendicular to the x-axis in a plane parallel to the ground, and the z-axis is the direction perpendicular to the xy plane. The case where the microphone M 0 is installed at the point P is considered here. The virtual sound source information estimation unit 13 estimates the virtual sound source information by calculating the intensity and the cross-correlation between the channels using any existing method, such as the proximity four-point method or an extension of it. Details of the calculation procedure will be described later. The virtual sound source information estimation unit 13 writes the estimated virtual sound source information e 1 to e m in the storage unit 11.
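The virtual sound source information e (r, θ, φ, t) can be pictured as a small record type. The sketch below is illustrative only: the class name `VirtualSource`, the use of degrees and seconds, and the convention that θ is measured from the x-axis in the xy plane and φ upward from that plane are assumptions consistent with, but not stated in, the coordinate description above.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class VirtualSource:
    r: float      # amplitude (level of the virtual sound source)
    theta: float  # azimuth in degrees from the x-axis, in the xy plane (assumed convention)
    phi: float    # elevation in degrees above the xy plane (assumed convention)
    t: float      # arrival time at point P, in seconds

    def direction(self):
        """Unit vector from point P toward the arrival direction."""
        th, ph = math.radians(self.theta), math.radians(self.phi)
        return (math.cos(ph) * math.cos(th),
                math.cos(ph) * math.sin(th),
                math.sin(ph))


def sort_by_arrival(sources):
    """Order virtual sources by arrival time, so that t_1 < t_2 < ... < t_m."""
    return sorted(sources, key=lambda e: e.t)
```

A list of such records, sorted by `t`, corresponds to e 1 to e m as written to the storage unit 11.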
[0026]
Next, the comparison determination unit 14 performs a comparison/determination process to determine the reflected sounds to be removed from each of the impulse responses h 1 (t) to h n (t) (step S30). First, the comparison determination unit 14 reads the impulse responses h 0 (t), h 1 (t), h 2 (t), ..., h n (t) and the virtual sound source information e 1 (r 1, θ 1, φ 1, t 1), e 2 (r 2, θ 2, φ 2, t 2), ..., e m (r m, θ m, φ m, t m) from the storage unit 11. The comparison determination unit 14 extracts the arrival times (t 1, t 2, ..., t m) of the virtual sound sources from the virtual sound source information e 1 to e m at the point P estimated by the virtual sound source information estimation unit 13. The comparison determination unit 14 then compares the arrival times (t 1, t 2, ..., t m) of the virtual sound sources with the local peak times (t 11, t 12, ...), (t 21, t 22, ...), ..., (t n1, t n2, ...) contained in the measured impulse responses h 1 (t), h 2 (t), ..., h n (t), respectively. Based on the comparison result, the comparison determination unit 14 determines the reflected sounds to be removed from each impulse response. A reflected sound to be removed is, among the reflected sounds contained in the impulse response measured by a given microphone, one that arrives from a direction different from the direction corresponding to that microphone (the direction in which the microphone is pointed). In other words, it is a reflected sound that was measured not only by the microphone placed in its arrival direction but also, redundantly, in the impulse responses measured by other microphones placed in other directions. In the comparison, the comparison determination unit 14 also takes into account the delays τ 1, τ 2, ..., τ n corresponding to the distances from the point P to the microphones M 1, M 2, ..., M n. The determination must also allow for measurement error and the like, so the comparison determination unit 14 sets a predetermined window width w. The comparison determination unit 14 makes the determination by comparing the delay-adjusted arrival time of each virtual sound source with the times of the local peaks that occur in the impulse response within a time range of width w centered on that arrival time.
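The comparison in step S30 can be sketched as follows. This is a minimal illustration under stated assumptions, not the device's actual procedure: the microphone "corresponding to" an arrival direction is taken to be the one whose installation azimuth is nearest, only azimuth is considered, the delay τ i is subtracted from the arrival time as described for the aspect in paragraph [0011], and when several local peaks fall inside the window the closest one is chosen. All function and parameter names are hypothetical.

```python
def angular_distance(a, b):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def select_microphone(theta, mic_directions):
    """Index of the microphone whose installation direction is closest to theta."""
    return min(range(len(mic_directions)),
               key=lambda i: angular_distance(theta, mic_directions[i]))


def classify_peaks(sources, peak_times, mic_directions, delays, w):
    """
    sources:        list of (theta_deg, t) for each virtual source at point P
    peak_times:     peak_times[i] is the list of local-peak times of h_i(t)
    mic_directions: installation azimuth of each microphone, in degrees
    delays:         delays[i] is tau_i for microphone i, in seconds
    w:              window width, in seconds
    Returns keep[i]: indices into peak_times[i] judged to arrive from
    microphone i's own direction. Every other peak is a removal target.
    """
    keep = [set() for _ in peak_times]
    for theta, t in sources:
        i = select_microphone(theta, mic_directions)
        expected = t - delays[i]  # delay-adjusted arrival time at microphone i
        candidates = [(abs(tp - expected), j)
                      for j, tp in enumerate(peak_times[i])
                      if abs(tp - expected) <= w / 2]
        if candidates:
            keep[i].add(min(candidates)[1])  # closest peak inside the window
    return keep
```

Peaks that are not in `keep[i]` are the duplicated reflections that step S40 would remove from h i (t).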
[0027]
The reflected sound removal unit 15 performs a reflected sound removal process that removes, from each of the impulse responses h 1 (t) to h n (t), the reflected sounds that the comparison determination unit 14 determined to be removal targets, and generates direction-specific impulse responses (step S40). Specifically, the reflected sound removal unit 15 removes the reflected sounds determined to be removal targets by the comparison determination unit 14 from the impulse responses h 1 (t), h 2 (t), ..., h n (t) read from the storage unit 11, and generates the direction-specific impulse responses h' 1 (t), h' 2 (t), ..., h' n (t).
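The specification does not say at this point how a reflected sound is removed from the waveform. One simple possibility, shown here purely as an assumption, is to zero a short segment of samples around each local peak that was not judged to belong to the microphone's own direction. The function name `remove_reflections` and the `half_width` parameter are hypothetical.

```python
def remove_reflections(h, fs, peak_times, keep, half_width):
    """
    Return a copy of impulse response h in which every local peak that is
    NOT listed in `keep` is zeroed over +/- half_width seconds (assumed method).

    h:          samples of h_i(t)
    fs:         sampling rate in Hz
    peak_times: local-peak times of h_i(t), in seconds
    keep:       set of indices into peak_times to leave untouched
    """
    out = list(h)
    n = round(half_width * fs)
    for j, tp in enumerate(peak_times):
        if j in keep:
            continue
        c = round(tp * fs)  # sample index of the peak
        for k in range(max(0, c - n), min(len(out), c + n + 1)):
            out[k] = 0.0
    return out
```

A practical implementation would probably taper the removed segment rather than cut it abruptly, and would have to avoid damaging a kept peak that lies close to a removed one; this sketch does neither.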
[0028]
Hereinafter, the processing of steps S20 to S40 in FIG. 5 will be described in detail with a specific example. Strictly, obtaining virtual sound source information in a three-dimensional space requires microphones at four or more points that are not in the same plane; here, to explain the principle simply, consider the case where the microphones are placed in a single plane for measurement.
[0029]
FIG. 6 is a diagram showing an example of an impulse response measurement method. The microphones M 1 to M 4 are placed around the microphone M 0 at equal angular intervals and at equal distances from it. The angle that adjacent microphones among M 1 to M 4 subtend at the microphone M 0 is 90 degrees, and the distance between the microphone M 0 and each of the microphones M 1 to M 4 is L. Specifically, the microphones M 1, M 2, M 3, and M 4 are placed on the xy plane of a two-dimensional space with the microphone M 0 as the origin. The installation directions of the microphones M 0, M 1, M 2, M 3, and M 4 are 0 degrees, 0 degrees, 90 degrees, 270 degrees, and 180 degrees with respect to the x-axis, respectively.
[0030]
Then, the measurement signal is reproduced from the speaker, and the impulse responses h 0 (t) to h 4 (t) are calculated from the signals picked up by the microphones M 0 to M 4. As shown in the figure, each impulse response h i (t) (i is an integer from 1 to 4) contains not only the reflected sound arriving from the direction corresponding to the microphone M i (the direction in which the microphone is pointed; solid lines) but also reflected sound arriving from other directions (broken lines). Measuring with directional microphones suppresses reflected sound arriving from directions other than the one corresponding to each microphone better than measuring with omnidirectional microphones. However, even a directional microphone has directivity of finite width, so a plurality of adjacent microphones pick up the same reflected sound. The impulse response generation device 1 therefore estimates the virtual sound source information at the center point of the microphone group (the position of the microphone M 0) by the processing described later and, based on the estimated virtual sound source information, thins out the duplicated reflected sounds from the impulse responses h 1 (t) to h 4 (t) of the microphones M 1 to M 4.
[0031]
Next, processing in which the virtual sound source information estimating unit 13 estimates
virtual sound source information from the measured impulse response in step S20 will be
described. Although various methods are used for estimation, here, a method using intensity and
a method using a cross correlation coefficient are shown as an example.
[0032]
First, estimation processing of virtual sound source information using intensity will be described.
FIG. 7 is a diagram for explaining estimation processing of virtual sound source information
using intensity. First, the virtual sound source information estimation unit 13 obtains the instantaneous intensity I x (t) in the 0-X direction (the M 0 -M 1 direction) from the impulse response h 0 (t) and the impulse response h 1 (t). The virtual sound source information estimation unit 13 likewise obtains the instantaneous intensity I y (t) in the 0-Y direction (the M 0 -M 2 direction) from the impulse response h 0 (t) and the impulse response h 2 (t).
[0033]
The instantaneous intensity I x (t) is obtained by the following equation (1).
[0034]
[0035]
In equation (1), p 1 (t) and p 2 (t) are the sound pressures at time t measured by the two microphones; the impulse responses h 0 (t) and h 1 (t) are used for them, respectively. d is the distance between the microphones; the distance L between the microphone M 0 and the microphone M 1 is used. ρ 0 is the density of the medium. The sign of the instantaneous intensity I x (t) indicates the direction of arrival along the x-axis. The instantaneous intensity I y (t) is obtained in the same way: the impulse responses h 0 (t) and h 2 (t) are used for p 1 (t) and p 2 (t), and the distance L between the microphone M 0 and the microphone M 2 is used for d. The sign of the instantaneous intensity I y (t) indicates the direction of arrival along the y-axis.
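Equation (1) itself is not reproduced in this text. A commonly used two-microphone estimate that involves exactly the quantities named above (p 1, p 2, d, ρ 0) multiplies the mean pressure (p 1 + p 2) / 2 by a particle velocity obtained by integrating the pressure difference: u(t) = -(1 / (ρ 0 d)) ∫ (p 2 - p 1) dτ. The sketch below implements that standard estimate; whether it matches equation (1) term for term, including the sign convention, is an assumption. The default ρ 0 of 1.2 kg/m³ is an approximate value for air.

```python
def instantaneous_intensity(p1, p2, fs, d, rho0=1.2):
    """
    Two-microphone instantaneous intensity along the axis from microphone 1
    to microphone 2 (standard p-p estimate, assumed to correspond to eq. (1)).

    p1, p2: sound pressures at the two microphones (e.g. h_0(t) and h_1(t))
    fs:     sampling rate in Hz
    d:      microphone spacing in metres (the distance L in the text)
    rho0:   density of the medium in kg/m^3
    """
    dt = 1.0 / fs
    integral = 0.0
    out = []
    for a, b in zip(p1, p2):
        integral += (b - a) * dt        # running integral of the pressure difference
        u = -integral / (rho0 * d)      # particle velocity along the microphone axis
        out.append(0.5 * (a + b) * u)   # mean pressure times velocity
    return out
```

With this convention, a pulse that reaches microphone 1 before microphone 2 produces a positive value, so the sign distinguishes the two directions along the axis, as the text describes for I x (t) and I y (t).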
[0036]
The instantaneous intensity I x (t) can be calculated using any combination of impulse responses measured by microphones placed on the x-axis (for example, the impulse responses h 0 (t) and h 4 (t)). Similarly, the instantaneous intensity I y (t) can be calculated using any combination of impulse responses measured by microphones placed on the y-axis (for example, the impulse responses h 0 (t) and h 3 (t)).
[0037]
Subsequently, the virtual sound source information estimation unit 13 obtains the envelope
intensities Ih x (t) and Ih y (t) by taking the envelopes of the instantaneous intensities I x (t)
and I y (t) with the Hilbert transform. The virtual sound source information estimation unit 13
vector-synthesizes the levels of the envelope intensities Ih x (t) and Ih y (t) to obtain the level
Lev_Ih (t) of the synthesized intensity. Vector synthesis of the levels of the envelope intensities
Ih x (t) and Ih y (t) also yields the arrival direction of the virtual sound source at time t, but
the figure shows only the temporal change of the level. The virtual sound source information
estimation unit 13 detects the local peaks P 1, P 2, ..., P m that appear in the range from the
first peak level Lev_Ps (direct sound) of the synthesized intensity level Lev_Ih (t) to an
arbitrary level Lev_Pe. From the synthesized intensities at the detected peaks P 1, P 2, ..., P m,
the virtual sound source information estimation unit 13 obtains the virtual sound source
information e 1 (r 1, θ 1, φ 1, t 1), e 2 (r 2, θ 2, φ 2, t 2), ..., e m (r m, θ m, φ m, t m) in
the room (measurement space). Note that the arrival times satisfy t 1 < t 2 < t 3 < ... < t m. The
information e (t) is obtained by arranging the virtual sound source information e 1 to e m in order
of arrival time.
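As a rough illustration of the vector synthesis and peak detection described above, the sketch
below combines x and y components into a level and an azimuth and then picks local maxima above a
threshold. Several points are assumptions rather than details from the specification: the level is
expressed in decibels, the two input sequences are taken to keep the sign of the underlying
intensity so that an azimuth can be formed, and the function names are hypothetical. The Hilbert
envelope step itself is not shown.

```python
import math


def synthesize_levels(ihx, ihy, eps=1e-12):
    """Return (levels in dB, azimuths in degrees) for each sample of the x/y components."""
    levels, azimuths = [], []
    for x, y in zip(ihx, ihy):
        magnitude = math.hypot(x, y)  # length of the synthesized vector
        levels.append(10.0 * math.log10(max(magnitude, eps)))
        azimuths.append(math.degrees(math.atan2(y, x)) % 360.0)
    return levels, azimuths


def local_peaks(levels, lev_pe):
    """Indices of local maxima whose level is at least lev_pe."""
    return [k for k in range(1, len(levels) - 1)
            if levels[k - 1] < levels[k] >= levels[k + 1] and levels[k] >= lev_pe]


# Toy data: a strong arrival along +x and a weaker, later arrival along +y.
ihx = [0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
ihy = [0.0, 0.0, 0.0, 0.0, 0.1, 0.0]
levels, azimuths = synthesize_levels(ihx, ihy)
peaks = local_peaks(levels, lev_pe=-20.0)
```

Raising `lev_pe` narrows the detection range, so weaker late arrivals are no longer reported as
virtual sound sources.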
[0038]
The above is an example in which the microphones M 0 to M 4 are installed on the same plane
(xy plane), so the Z coordinate of each virtual sound source is 0 and the elevation angles φ 1,
φ 2, ..., φ m are 0. When microphones are also installed on the z-axis, that is, when the
microphones are installed three-dimensionally, the combination of impulse responses measured by
those microphones is additionally used to calculate the synthesized intensity in the same manner
as above, so that the elevation angle of each virtual sound source can be calculated. In other
words, the virtual sound source information estimation unit 13 uses a combination of impulse
responses measured by two microphones placed on the z-axis to calculate the envelope intensity
Ih z (t) of the instantaneous intensity I z (t) in the 0-Z direction. The virtual sound source
information estimation unit 13 vector-synthesizes the levels of the envelope intensities Ih x (t),
Ih y (t) and Ih z (t), and detects the local peaks P 1, P 2, ..., P m appearing in the range from
Lev_Ps to Lev_Pe.
[0039]
Next, the estimation processing of virtual sound source information using the cross-correlation
coefficient will be described. The short-time correlation coefficient S ij (τ) between two impulse
responses h i (t) and h j (t) from time t a to time t a + Δ is given by the following equation (2),
where Δ is the analysis window length and τ is the time difference.
[0040]
[0041]
Because the same reflected sound reaches microphones installed at different positions at different
times, it appears at different times in the impulse responses measured by the respective
microphones.
A large value of the short-time correlation coefficient S ij indicates that the same reflected
sound appears in h i (t) and h j (t) shifted by the time difference τ. That is, the same reflected
sound reaches the microphone M i and the microphone M j with a time offset of τ. In this
embodiment, combinations of impulse responses measured by microphones placed on the x-axis and the
y-axis are used, so i = 0, 1, 2 and j = 1, 2 (i ≠ j). That is, there are three combinations of
(i, j): (0, 1), (0, 2), and (1, 2) or (2, 1). The case of i = 0, 1, 2 and j = 1, 2 is described
below as an example, but other combinations of impulse responses measured by microphones placed on
the x-axis and the y-axis (for example, i = 0, 3, 4 and j = 3, 4) can also be used.
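The idea of a short-time correlation coefficient and of searching for the time difference that
maximizes it can be sketched as follows. This is an illustration under assumptions, not the
specification's equation (2): it uses an ordinary normalized cross-correlation over a window of
samples, takes the lag as a shift applied to the second impulse response, and the function names
are hypothetical.

```python
import math


def short_time_correlation(hi, hj, start, window, lag):
    """Normalized correlation of hi[start:start+window] with hj shifted by `lag` samples."""
    if start < 0 or start + lag < 0:
        return 0.0  # window would start before the data
    a = hi[start:start + window]
    b = hj[start + lag:start + lag + window]
    if len(a) != window or len(b) != window:
        return 0.0  # window would run past the end of the data
    energy = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    if energy == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / energy


def best_lag(hi, hj, start, window, max_lag):
    """Lag in [-max_lag, max_lag] that maximizes the short-time correlation."""
    return max(range(-max_lag, max_lag + 1),
               key=lambda lag: short_time_correlation(hi, hj, start, window, lag))


# The same reflection appears three samples later in the second impulse response.
shape = [1.0, 0.5, -0.3]
h_i = [0.0] * 40
h_j = [0.0] * 40
h_i[10:13] = shape
h_j[13:16] = shape
tau_ij = best_lag(h_i, h_j, start=8, window=8, max_lag=5)
```

In this toy case the coefficient reaches its maximum at a lag of three samples, the offset between
the two copies of the reflection.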
[0042]
FIG. 8 is a flowchart of the virtual sound source information estimation processing using the
cross-correlation coefficient. First, the virtual sound source information estimation unit 13 sets
an initial value for the time t a (step S110). The initial value is, for example, the time at which
the direct sound is detected in the impulse response h 0 (t). For each (i, j) combination, the
virtual sound source information estimation unit 13 finds the time difference τ = τ ij that
maximizes the correlation coefficient S ij (τ) (step S120). For each (i, j) combination, the
virtual sound source information estimation unit 13 then determines the time length (window width)
Δ ij over which the correlation coefficient S ij (τ) is equal to or greater than a predetermined
value (for example, 0.8), and selects the smallest of the obtained values, Δ ij_min (step S130).
The time length Δ ij_min corresponds to the shortest time from the arrival of one reflected sound
to the arrival of the next. Selecting the minimum value of the time length Δ ij ensures that the
next reflected sound is not included in the window.
[0043]
Next, the virtual sound source information estimation unit 13 performs interpolation processing
in the vicinity of the time length Δ ij_min. Here, the interpolation processing increases the
sampling frequency fs several tens of times; in the present embodiment, the sampling frequency fs
is increased 16-fold. Using the impulse responses h i (t) and h j (t) obtained by the
interpolation processing, the virtual sound source information estimation unit 13 performs the
same processing as step S120, finds the time difference that maximizes the correlation coefficient
S ij (τ), and thereby determines a more accurate time difference τ = hτ ij (step S140). The
virtual sound source information estimation unit 13 calculates the coordinates and angles of the
virtual sound source using the obtained time differences hτ ij (step S150).
[0044]
The distance between the virtual sound source and the microphone M 0 is given by (t a − t 0) × c,
where c is the speed of sound and t 0 is the time at which the sound is emitted from the virtual
sound source. Likewise, the distance between the virtual sound source and the microphone M 1 is
given by (t a + hτ 01 − t 0) × c, and the distance between the virtual sound source and the
microphone M 2 by (t a + hτ 02 − t 0) × c. The distance between the microphone M 0 and the
microphone M 1 on the x-axis and the distance between the microphone M 0 and the microphone M 2 on
the y-axis are known (distance L). Therefore, from these inter-microphone distances and the
distances between the virtual sound source and each of the microphones M 0, M 1 and M 2, the
coordinates (X, Y, Z) of the virtual sound source whose sound reached the center point M 0 at time
t a are obtained, and the azimuth and elevation angles are obtained from these coordinates.
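One way to carry out this geometric step is sketched below, assuming M 0 at the origin, M 1 at
(L, 0, 0) and M 2 at (0, L, 0). Subtracting the squared distance to M 1 (or M 2) from the squared
distance to M 0 leaves a linear equation in X (or Y). The function names, the default speed of
sound of 343 m/s and the choice of the non-negative root for Z are assumptions for illustration;
three coplanar microphones cannot resolve the sign of Z.

```python
import math


def distances_from_times(t_a, t_0, tau_01, tau_02, c=343.0):
    """Distances from the virtual source to M0, M1 and M2 (c is an assumed speed of sound)."""
    return ((t_a - t_0) * c,
            (t_a + tau_01 - t_0) * c,
            (t_a + tau_02 - t_0) * c)


def locate_source(r0, r1, r2, L):
    """Position and angles of a source at distances r0, r1, r2 from M0, M1, M2."""
    x = (r0 * r0 - r1 * r1 + L * L) / (2.0 * L)
    y = (r0 * r0 - r2 * r2 + L * L) / (2.0 * L)
    z = math.sqrt(max(r0 * r0 - x * x - y * y, 0.0))  # non-negative root chosen
    azimuth = math.degrees(math.atan2(y, x)) % 360.0
    elevation = math.degrees(math.atan2(z, math.hypot(x, y)))
    return (x, y, z), azimuth, elevation


# A source at (3, 4, 0) metres with microphones 1 metre from the center point.
r0 = math.dist((3.0, 4.0, 0.0), (0.0, 0.0, 0.0))
r1 = math.dist((3.0, 4.0, 0.0), (1.0, 0.0, 0.0))
r2 = math.dist((3.0, 4.0, 0.0), (0.0, 1.0, 0.0))
position, azimuth, elevation = locate_source(r0, r1, r2, L=1.0)
```

Because the result depends on differences of squared distances, small errors in the time
differences matter, which is consistent with refining them by interpolation before this step.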
[0045]
If the time t a has not reached the analysis end time (step S160: NO), the virtual sound source
information estimation unit 13 shifts the time t a, which is the start point of the analysis
window, by Δ ij_min (step S170) and repeats the processing from step S120. The virtual sound
source information estimation unit 13 thus detects virtual sound sources one after another until
the time t a reaches the analysis end time (step S160: YES). By this processing, the virtual sound
source information estimation unit 13 can estimate the virtual sound source information e 1 (r 1,
θ 1, φ 1, t 1), e 2 (r 2, θ 2, φ 2, t 2), ..., e m (r m, θ m, φ m, t m) at the center point M 0 of
the microphone group.
[0046]
In the above, since the microphones M 0 to M 4 are placed on the same plane (xy plane), the Z
coordinate of each virtual sound source is 0, and therefore the elevation angles φ 1, φ 2, ...,
φ m are 0. By performing the same processing as above using an impulse response measured by a
microphone that is not on that plane, such as a microphone on the z-axis, the Z coordinate of each
virtual sound source is determined, and the elevation angle can also be calculated.
[0047]
Next, the comparison determination process in step S30 of FIG. 5 and the reflected sound
removal process in step S40 of FIG. 5 will be described. FIG. 9 is a diagram illustrating an
example of the comparison determination process and the reflected sound removal process.
Here, an example is described in which four pieces of virtual sound source information,
e 1 (r 1, θ 1, φ 1, t 1), e 2 (r 2, θ 2, φ 2, t 2), e 3 (r 3, θ 3, φ 3, t 3) and
e 4 (r 4, θ 4, φ 4, t 4), have been estimated. The comparison determination unit 14 extracts the
arrival times (t 1, t 2, t 3, t 4) of the virtual sound sources from the virtual sound source
information e 1 to e 4 at the center point M 0. The comparison determination unit 14 compares the
arrival time (t 1, t 2, t 3, t 4) of each virtual sound source with the arrival times (t 11,
t 12, ...), (t 21, t 22, ...), (t 31, t 32, ...), (t 41, t 42, ...) of the local peaks contained in
the measured impulse responses h 1 (t), h 2 (t), h 3 (t), h 4 (t), respectively, and thereby
determines the reflected sounds to be removed. In a given impulse response, the reflected sounds
to be removed are those arriving from directions other than the direction corresponding to the
microphone that measured that impulse response (the direction in which the microphone is
pointed).
[0048]
The determination flow is as follows. Here, m is the number of estimated virtual sound sources
(m = 4 in FIG. 9), and n is the number of microphones (n = 4 in FIG. 9).
[0049]
(Procedure 1) For each piece of virtual sound source information e k (k is an integer from 1 to
m), the comparison determination unit 14 identifies, based on the azimuth angle θ k, the
microphone M i (i is an integer from 1 to n) corresponding to the arrival direction of the virtual
sound source, and selects the impulse response h i (t) measured by the identified microphone M i.
Specifically, the comparison determination unit 14 selects the impulse response measured by the
microphone whose installation direction is the azimuth angle θ k. If the azimuth angle θ k does
not coincide with the installation direction of any of the microphones M 1 to M n, the comparison
determination unit 14 selects an impulse response according to either (1) or (2) below.
[0050]
(1) The comparison determination unit 14 selects the impulse response of the microphone whose
installation direction is closer to θ k, out of the microphones on either side of the θ k
direction. In this case, the directions of the reflected sounds to be measured by the microphones
M 1, M 2, M 4 and M 3 shown in FIG. 6 are, for example, azimuth angles of 0 to 45 degrees and 315
to 360 degrees, 45 to 135 degrees, 135 to 225 degrees, and 225 to 315 degrees, respectively.
[0051]
(2) The comparison determination unit 14 selects the two impulse responses measured by the
microphones on either side of the θ k direction. In this case, the directions of the reflected
sounds to be measured by the microphones M 1, M 2, M 4 and M 3 are azimuth angles of 0 to 90
degrees and 270 to 360 degrees, 0 to 180 degrees, 90 to 270 degrees, and 180 to 360 degrees,
respectively.
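Selection rules (1) and (2) of Procedure 1 can be sketched as below. The installation directions
(M 1 at 0 degrees, M 2 at 90 degrees, M 4 at 180 degrees, M 3 at 270 degrees) are inferred from
the angular sectors listed above and are an assumption, as are the function names. Taking the two
nearest microphones for rule (2) matches "both sides" only for equally spaced microphones such as
these.

```python
# Assumed installation azimuths in degrees, inferred from the sectors given for rule (1).
MIC_DIRECTIONS = {1: 0.0, 2: 90.0, 4: 180.0, 3: 270.0}


def angular_gap(a, b):
    """Smallest absolute difference between two azimuths, in degrees."""
    diff = abs(a - b) % 360.0
    return min(diff, 360.0 - diff)


def select_nearest(theta, directions=MIC_DIRECTIONS):
    """Rule (1): the microphone whose installation direction is closest to theta."""
    return min(directions, key=lambda mic: angular_gap(theta, directions[mic]))


def select_both_sides(theta, directions=MIC_DIRECTIONS):
    """Rule (2): the two microphones flanking theta, or one if theta matches a direction."""
    exact = [mic for mic, d in directions.items() if angular_gap(theta, d) == 0.0]
    if exact:
        return exact
    ranked = sorted(directions, key=lambda mic: angular_gap(theta, directions[mic]))
    return ranked[:2]


print(select_nearest(100.0))     # 2, since 100 degrees lies in the 45 to 135 degree sector
print(select_both_sides(200.0))  # [4, 3], the microphones at 180 and 270 degrees
```

Rule (1) assigns each reflection to exactly one impulse response, whereas rule (2) lets a
reflection arriving between two microphones survive in both of their impulse responses.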
[0052]
(Procedure 2) In the impulse response h i (t) selected for the virtual sound source information
e k in Procedure 1, the comparison determination unit 14 selects the time t = t is at which a
local peak occurs within the range t k − τ i − w / 2 ≦ t ≦ t k − τ i + w / 2. Here, t k is the
arrival time of the virtual sound source at the center point M 0, τ i is the delay corresponding
to the distance L from the microphone M i selected in Procedure 1 to the center point M 0, and w
is the window width. The window width w is set in advance, and it is desirable to use a larger
value the greater the distance to the adjacent microphone. In addition, s is the index of a local
peak within each impulse response, and t is denotes the time at which the s-th local peak occurs
in the waveform of the impulse response h i (t).
[0053]
(Procedure 3) The comparison determination unit 14 repeats (Procedure 1) and (Procedure 2) for
each piece of estimated virtual sound source information, and determines that the local peaks not
selected in each impulse response are reflected sounds to be removed from that impulse response.
The reflected sound removal unit 15 removes, from each impulse response, the local peaks that the
comparison determination unit 14 has determined should be removed from that impulse response.
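Procedures 2 and 3 amount to keeping the local peaks that fall inside a window around
t k − τ i and marking every other local peak for removal. A minimal sketch follows; the data
layout (dictionaries keyed by microphone number, a list of (t k, microphone) pairs produced by
Procedure 1) and the function names are assumptions for illustration, and the times are in
arbitrary but consistent units.

```python
def select_peaks(peak_times, t_k, tau_i, w):
    """Procedure 2: peaks with t_k - tau_i - w/2 <= t <= t_k - tau_i + w/2."""
    center = t_k - tau_i
    return [t for t in peak_times if center - w / 2 <= t <= center + w / 2]


def peaks_to_remove(peaks_by_mic, sources, delays, w):
    """Procedure 3: local peaks in each impulse response that no virtual source selected.

    peaks_by_mic: {mic: [local peak times]}
    sources: [(t_k, mic)] pairs, i.e. arrival time and microphone chosen in Procedure 1
    delays: {mic: tau_i}
    """
    kept = {mic: set() for mic in peaks_by_mic}
    for t_k, mic in sources:
        kept[mic].update(select_peaks(peaks_by_mic[mic], t_k, delays[mic], w))
    return {mic: [t for t in times if t not in kept[mic]]
            for mic, times in peaks_by_mic.items()}


peaks_by_mic = {1: [10.0, 14.0], 2: [11.0, 15.0]}
sources = [(11.0, 1), (16.0, 2)]  # (arrival time at M0, selected microphone)
delays = {1: 1.0, 2: 1.0}
removed = peaks_to_remove(peaks_by_mic, sources, delays, w=1.0)
```

Here the peak at 10.0 in the first response and the peak at 15.0 in the second are kept, and the
remaining peaks are marked for removal.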
[0054]
In the example shown in FIG. 9, the comparison determination unit 14 first selects the impulse
response h 1 (t) measured by the microphone M 1, because θ 1, which represents the arrival
direction of the virtual sound source information e 1 (r 1, θ 1, φ 1, t 1), is the installation
direction of the microphone M 1. The comparison determination unit 14 selects the time t = t 11
at which a local peak occurs within the range t 1 − τ 1 − w / 2 ≦ t ≦ t 1 − τ 1 + w / 2 in the
impulse response h 1 (t).
[0055]
Next, θ 2, which represents the arrival direction of the virtual sound source information e 2
(r 2, θ 2, φ 2, t 2), is an azimuth angle between the installation direction of the microphone
M 1 and the installation direction of the microphone M 2; that is, the virtual sound source
arrives from between the microphone M 1 and the microphone M 2. Because the installation
direction of the microphone M 2 is closer to θ 2 than that of the microphone M 1, the comparison
determination unit 14 selects the impulse response h 2 (t) measured by the microphone M 2
(applying (1) of Procedure 1 above). The comparison determination unit 14 selects the time
t = t 22 at which a local peak occurs within the range t 2 − τ 2 − w / 2 ≦ t ≦ t 2 − τ 2 + w / 2
in the impulse response h 2 (t).
[0056]
The comparison determination unit 14 processes the virtual sound source information e 3 (r 3,
θ 3, φ 3, t 3) and e 4 (r 4, θ 4, φ 4, t 4) in the same manner, and selects the time t = t 32 in
the impulse response h 3 (t) and the time t = t 44 in the impulse response h 4 (t).
[0057]
The reflected sound removal unit 15 removes the local peaks other than the peak P (t 11) at time
t 11 of the impulse response h 1 (t), the peak P (t 22) at time t 22 of the impulse response
h 2 (t), the peak P (t 32) at time t 32 of the impulse response h 3 (t), and the peak P (t 44) at
time t 44 of the impulse response h 4 (t).
In this way, the reflected sound removal unit 15 generates the new direction-specific impulse
responses h1 '(t), h2' (t), h3 '(t) and h4' (t).
[0058]
According to the embodiment described above, it is possible to generate direction-specific
impulse responses that are closer to the measured sound field than in the prior art. When a
plurality of microphones that are not on the same plane are used, the comparison determination
unit 14 may select an impulse response based on the arrival direction of the virtual sound source
expressed by the elevation angle φ k in addition to the azimuth angle θ k. That is, the
comparison determination unit 14 identifies the microphone whose main axis points in the arrival
direction expressed by the azimuth angle θ k and the elevation angle φ k, or, among the
microphones on either side of the arrival direction, the microphone closest to the arrival
direction, and selects the impulse response measured by the identified microphone.
In the example shown above, the microphone group is arranged at equal intervals around the center
point M 0, but this is not essential; virtual sound source information can be estimated at any
point in the space surrounded by the microphone group.
[0059]
According to the embodiment described above, the impulse response generation device 1 uses the
impulse responses calculated from the respective signals obtained when a plurality of microphones
installed at different positions in a space pick up the same acoustic signal to estimate
information on virtual sound sources in the space at an arbitrary position surrounded by the
plurality of microphones. Based on a comparison between the virtual sound source information and
an impulse response, the impulse response generation device 1 determines, in that impulse
response, the reflected sounds arriving from the direction corresponding to the microphone that
provided the signal from which the impulse response was calculated. The impulse response
generation device 1 removes from the impulse response the unnecessary reflected sounds arriving
from directions other than the direction corresponding to that microphone, and thereby generates
a direction-specific impulse response. As a result, the impulse response generation device 1 can
generate direction-specific impulse responses that are closer to the measured sound field than
those of the conventional method.
[0060]
The impulse response generation device 1 described above contains a computer system. The
operating procedures of the impulse response generation device 1 are stored on a
computer-readable recording medium in the form of a program, and the above processing is
performed when the computer system reads and executes this program. The computer system referred
to here includes a CPU, various memories, an OS, and hardware such as peripheral devices.
[0061]
The "computer system" also includes a homepage providing environment (or display environment) if
a WWW system is used. The term "computer-readable recording medium" refers to a portable medium
such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or to a storage device such
as a hard disk built into a computer system. Furthermore, the "computer-readable recording
medium" also includes a medium that holds a program dynamically for a short time, like a
communication line used when a program is transmitted via a network such as the Internet or via a
communication line such as a telephone line, and a medium that holds a program for a certain
period of time, like the volatile memory inside the computer system serving as the server or
client in that case. The program may realize only a part of the functions described above, or may
realize those functions in combination with a program already recorded in the computer system.
[0062]
DESCRIPTION OF SYMBOLS
1 Impulse response generation device
11 Storage unit
12 Measurement information acquisition unit
13 Virtual sound source information estimation unit
14 Comparison determination unit
15 Reflected sound removal unit