JPH09284899

код для вставкиСкачать
Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JPH09284899
[0001]
BACKGROUND OF THE INVENTION 1. Field of the Invention The present invention relates to a
signal processing apparatus for localizing a sound image out of the head using binaural
headphones (including an inner earphone type) and an out-of-head sound image localization
filter.
[0002]
2. Description of the Related Art In recent years, devices for enjoying music through headphones, such as headphone stereos and portable CD players, have become widespread. When the reproduced sound of these devices is heard through headphones, there is a problem that the reproduced sound image is localized inside the head and sounds unnatural. To solve this problem, methods using an out-of-head sound localization filter have been proposed. One example of such a method is the "Sound reproduction method by headphones" of Japanese Patent Publication No. 53-283. This method uses the spatial transfer function between a specific sound source, for example a loudspeaker, and the listener's ear, together with the inverse of the real-ear headphone transfer function between the headphone and the listener's ear.
[0003]
When the listener hears sound from a real sound source, the position of the real
10-05-2019
1
sound source does not change even when the head is rotated, so the sound image position perceived by the listener changes relative to the head. With the method using the out-of-head sound image localization filter, by contrast, the headphones rotate together with the head when the listener turns it, so the perceived sound image position does not change relative to the head. That is, when the listener rotates the head, the absolute position of the sound image moves according to the head rotation. Sound image localization through an out-of-head sound image localization filter therefore differs from the experience of listening to a real sound source, which is a problem.
[0004]
As methods for solving this problem, there are those described in Japanese Patent Application Laid-Open No. 54-19242, "Method for preventing change in position of listening phenomenon caused by rotational movement of head in headphone reproduction", Japanese Patent Application Laid-Open No. 3-296400, "Acoustic signal reproduction apparatus", and the like. In these methods, the change in direction of the listener's head is detected, and the spatial transfer function is changed based on the detection result to realize out-of-head sound image localization similar to that of a real sound source.
[0005]
Hereinafter, an example of a signal processing apparatus for realizing conventional out-of-head sound image localization will be described with reference to the drawings. FIG. 8 is a block diagram showing the overall configuration of a conventional signal processing apparatus, FIG. 9 shows the configuration of its angle detection unit, and FIG. 10 shows the configuration of its sound processing unit. As shown in FIG. 8, the signal processing apparatus includes a headphone device 810, a reference signal source 811, an angle detection unit 814, an acoustic processing unit 823, and an acoustic signal supply source 824.
[0006]
The headphone device 810 has a headband 801 worn on the head M of the listener P, and supports a pair of headphone units 802L and 802R near the left and right pinnae of the listener P. Support arms 803L and 803R are slidably attached to the headband 801, and a pair of signal detectors 805L and 805R are attached to the tips of the support arms 803L and 803R in order to sense the position-detection reference signal sent from the reference signal source 811. The mounting positions of the signal detectors 805L and 805R can be adjusted by sliders 804L and 804R slidably mounted on the headband 801.
[0007]
The reference signal source 811 includes an ultrasonic signal source 812 and an ultrasonic
speaker 813. The ultrasonic speaker 813 transmits an ultrasonic signal output from the
ultrasonic signal source 812 as a reference signal. An ultrasonic microphone is provided in the
pair of signal detectors 805L and 805R to sense the reference signal.
[0008]
As shown in FIG. 11A, an ultrasonic wave is transmitted from the ultrasonic speaker 813 at a predetermined level and at predetermined time intervals as the reference signal for position detection. The reference signal may be a burst wave or a level-modulated wave whose level fluctuates in a predetermined cycle; it must be a signal whose phase can be detected. When the signal detectors 805L and 805R shown in FIG. 8 receive such a reference signal, they output the detection signals shown in FIGS. 11(b) and 11(c), each having a time delay corresponding to the relative positional relationship between the head direction of the listener P and the ultrasonic speaker 813.
[0009]
Even when the listener P moves or rotates the head M, the signal detectors 805L and 805R can detect the position-detection reference signal stably and accurately without being obstructed by the listener P. Since the shape and size of the head M of the listener P differ between individuals, the positions of the signal detectors 805L and 805R must be adjusted to correspond to the spatial positions of the headphone units 802L and 802R. In this case, the sliders 804L and 804R are slid along the headband 801, and the signal detectors 805L and 805R are finely adjusted to the positions optimal for detecting the reference signal.
[0010]
Each detection signal obtained by the signal detectors 805L and 805R is supplied to the angle detection unit 814 in FIG. 9. The angle detection unit 814 includes: a first edge detection circuit (DET) 815 receiving the detection signal of the signal detector 805L; a second edge detection circuit 816 receiving the detection signal of the signal detector 805R; a third edge detection circuit 817 to which the position-detection reference signal is supplied directly from the ultrasonic signal source 812; a binaural time difference detection circuit 819 for detecting the time difference between the output signals of the first and second edge detection circuits 815 and 816; a distance calculation circuit 818 for calculating the distance between the reference signal source 811 and the listener P based on the signals of the first, second, and third edge detection circuits 815, 816, and 817; and a rotation angle calculation circuit 820 for detecting the head direction of the listener P based on the outputs of the binaural time difference detection circuit 819 and the distance calculation circuit 818.
[0011]
As shown in FIGS. 11(d) and 11(e), the first and second edge detection circuits 815 and 816 detect the rising edge of the detection signal from each of the signal detectors 805L and 805R and output a pulse signal at that edge. These pulse signals are output to the distance calculation circuit 818 and the binaural time difference detection circuit 819. As shown in FIG. 11(f), the third edge detection circuit 817 detects the rising edge of the ultrasonic signal from the ultrasonic signal source 812 and outputs a pulse signal at this edge. The pulse signal obtained by the third edge detection circuit 817 is output to the distance calculation circuit 818.
[0012]
As shown in FIG. 11, the time difference between the pulse signal obtained by the third edge detection circuit 817 and the pulse signal obtained by the first edge detection circuit 815 is t1, and the time difference between the pulse signal obtained by the third edge detection circuit 817 and the pulse signal obtained by the second edge detection circuit 816 is t2. With the difference between the time differences t1 and t2 denoted t3, the distance calculation circuit 818 calculates the distance between the ultrasonic speaker 813 and the listener P based on the difference t3 and the sound velocity v. Strictly speaking, this distance is the distance l0 between the center of the head M of the listener P and the ultrasonic speaker 813, as shown in FIG. The sound velocity v may be set in advance in the distance calculation circuit 818 as a constant, or may be changeable in accordance with fluctuations in temperature, humidity, atmospheric pressure, and the like. Furthermore, when calculating the distance l0, a correction may be applied based on the positional relationship between the signal detectors 805L and 805R and the center of the head M, or on the shape or size of the head M.
[0013]
Signals indicating the distance l0 and the time differences t1 and t2 are given to the rotation angle calculation circuit 820. The binaural time difference detection circuit 819 calculates the difference value t3 from the time differences t1 and t2 and outputs it to the rotation angle calculation circuit 820.
[0014]
The rotation angle calculation circuit 820 uses the time differences t1 and t2, the difference value t3, the distance l0, the sound velocity v, and the radius r of the head M to calculate the head direction angle θ0 shown in FIG. The angle θ0 can be calculated by the following equation (1).
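The translation omits equation (1) itself, so only an illustrative reconstruction can be offered here. Under a far-field assumption, with interaural path difference v·t3 (t3 = t1 − t2) and head radius r, a standard relation of the kind the circuit could implement is

```latex
\theta_0 \;\approx\; \arcsin\!\left(\frac{v\,t_3}{2r}\right),
\qquad
l_0 \;\approx\; \frac{v\,(t_1 + t_2)}{2}
```

This is an assumption for illustration only; the patent's actual equation (1) may include corrections for the finite distance l0.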
[0015]
Then, the position of the ultrasonic speaker 813 is taken as the reference position of the virtual
sound source 821. The rotation angle information of the head M of the listener P obtained by the
rotation angle calculation circuit 820 is output to the sound processing unit 823 in FIG.
[0016]
An acoustic signal supply source 824 shown in FIG. 8 is connected to the acoustic processing
unit 823, from which acoustic signals SL and SR of the left channel and the right channel are
respectively supplied. The acoustic signal supply source 824 is a device that outputs the acoustic
signals SL and SR, and is, for example, various disk reproducing devices, tape reproducing
devices, radio wave receiving devices, or the like.
[0017]
The sound processing unit 823 is a circuit that applies the transfer characteristics from the virtual sound source 821 to the ears of the listener P to the left-channel and right-channel acoustic signals SL and SR supplied from the acoustic signal supply source 824. As shown in FIG. 10, the sound processing unit 823 includes a memory 825 storing a large number of transfer functions; first to fourth signal processing circuits 826a, 826b, 826c, and 826d for applying specific transfer functions read from the memory 825 to the acoustic signals SL and SR; and first and second signal adders 827L and 827R for generating the acoustic signal EL for the left ear and the acoustic signal ER for the right ear from the signals processed by the first to fourth signal processing circuits 826a to 826d.
[0018]
As shown in FIG. 13, a pair of left-channel and right-channel speaker devices installed facing the listener P serve as virtual sound sources SpL and SpR. The impulse responses of the sound output from these virtual sound sources SpL and SpR reaching the left ear YL and the right ear YR of the listener P are measured at predetermined rotation angles corresponding to the movement of the head M of the listener P. The transfer functions of only the direct sound, determined by excluding the characteristics of the speaker devices used for the measurement and the characteristics of the sound field at the time of measurement, are denoted hLL(t, θ), hLR(t, θ), hRL(t, θ), and hRR(t, θ). These transfer functions are assumed to be stored in advance in the memory 825, using the head rotation angle θ as an address.
[0019]
Then, using the head rotation angle θ obtained by the angle detection unit 814 as a read address, the set of transfer functions hLL(t, θ), hLR(t, θ), hRL(t, θ), and hRR(t, θ) is read out from the memory 825. The transfer functions hLL(t, θ), hLR(t, θ), hRL(t, θ), and hRR(t, θ) read from the memory 825 are set in the first to fourth signal processing circuits 826a, 826b, 826c, and 826d, respectively.
[0020]
That is, in the first signal processing circuit 826a, the transfer function hRR(t, θ), representing the impulse response of the direct sound to the right ear when the right-channel acoustic signal SR is reproduced, is set. In the second signal processing circuit 826b, the transfer function hRL(t, θ), representing the impulse response of the direct sound to the left ear when the right-channel acoustic signal SR is reproduced, is set. In the third signal processing circuit 826c, the transfer function hLR(t, θ), representing the impulse response of the direct sound to the right ear when the left-channel acoustic signal SL is reproduced, is set. And in the fourth signal processing circuit 826d, the transfer function hLL(t, θ), representing the impulse response of the direct sound to the left ear when the left-channel acoustic signal SL is reproduced, is set.
[0021]
The acoustic signal SR output from the acoustic signal supply source 824 is sent to the first and second signal processing circuits 826a and 826b. The first signal processing circuit 826a applies to the acoustic signal SR a convolution that imparts the impulse response of the transfer function hRR(t, θ). Likewise, the second signal processing circuit 826b applies to the acoustic signal SR a convolution that imparts the impulse response of the transfer function hRL(t, θ).
[0022]
Similarly, the acoustic signal SL output from the acoustic signal supply source 824 is sent to the third and fourth signal processing circuits 826c and 826d. The third signal processing circuit 826c applies to the acoustic signal SL a convolution that imparts the impulse response of the transfer function hLR(t, θ). Likewise, the fourth signal processing circuit 826d applies to the acoustic signal SL a convolution that imparts the impulse response of the transfer function hLL(t, θ).
[0023]
Next, the output signals of the first and third signal processing circuits 826a and 826c are supplied to the right-side adder 827R and added together. The output signal of the adder 827R is sent, as the right-ear acoustic signal ER, to the right headphone unit 802R via the right amplifier 828R and reproduced. Similarly, the output signals of the second and fourth signal processing circuits 826b and 826d are supplied to the left-side adder 827L and added together. The output signal of the adder 827L is sent, as the left-ear acoustic signal EL, to the left headphone unit 802L via the left amplifier 828L and reproduced.
[0024]
In such a signal processing apparatus, the current head angle information calculated by the rotation angle calculation circuit 820 is read, and using it as an address, the set of specific transfer functions hLL(t, θ), hLR(t, θ), hRL(t, θ), and hRR(t, θ) is extracted from the memory 825. Based on these transfer functions, the sound processing unit 823 performs signal processing corresponding to the movement of the listener P and to the change of the transfer function accompanying the rotation of the head. In this way, when the listener P wears the headphone device 810, an out-of-head localization and a frontal localization in which the virtual sound source does not move can be obtained. That is, a sense of realism is obtained as if the listener P, without wearing the headphone device 810, were listening to the acoustic signal reproduced from a pair of speaker devices installed facing the listener P.
[0025]
However, in such a conventional configuration, there is a problem that a very large capacity must be secured for the memory 825 in order to store the large number of sets of transfer functions. That is, if transfer functions are stored in the memory at intervals of 0.36 degrees, for example, and the angular interval is then gradually widened, the result remains audibly almost indistinguishable up to about 1.5 degrees, while at 2.0 degrees or more the difference becomes audible. Therefore, the stored transfer functions need to be spaced no more than 1.5 degrees apart.
[0026]
In order to cover all directions over 360 degrees at this spacing, 240 transfer functions must be stored. Assuming that the impulse response of the direct sound converges in about 5 ms, at a sampling frequency of 44.1 kHz, the same as that of a CD, the impulse response requires 220 samples. When the impulse response is quantized to 16 bits, the amount of memory required to store the transfer functions from one sound source to each of the left and right ears is about 1 Kbyte, and the total memory required to store 240 sets of transfer functions is about 240 Kbytes.
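As a rough sanity check, the arithmetic above can be reproduced in a few lines (all values are taken from the text; the 1 Kbyte and 240 Kbyte figures in the text are rounded):

```python
fs = 44_100          # sampling frequency in Hz (same as a CD)
dur = 0.005          # direct-sound impulse response length, about 5 ms
n_angles = 240       # 360 degrees / 1.5-degree spacing
bytes_per_sample = 2 # 16-bit quantization

samples = int(fs * dur)                   # 220 samples per impulse response
per_set = samples * bytes_per_sample * 2  # one sound source to both ears
total = per_set * n_angles                # all 240 sets

print(samples, per_set, total)  # 220 880 211200, i.e. roughly 240 Kbytes as rounded in the text
```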
[0027]
The present invention has been made in view of these conventional problems. Its object is to realize a signal processing device with which, while the headphone apparatus is worn on both ears, the listener perceives the sound image localized outside the head, the sound source position does not change when the head direction changes, and the capacity of the memory storing the transfer functions is greatly reduced.
[0028]
SUMMARY OF THE INVENTION In order to solve the above-mentioned problems, the invention according to claim 1 comprises: a first storage unit for storing a plurality of basis functions obtained by principal component analysis of transfer functions measured from a sound source to the listener's ears at a plurality of angles, together with an average transfer function, independent of the head direction of the listener, obtained when the principal component analysis is performed; a second storage unit for storing a group of weighting coefficients for each of the stored basis functions; and signal processing means for processing the input acoustic signal with the plurality of basis functions held in the first storage unit, and controlling the position of the sound image by weighting the processed signals with the coefficients held in the second storage unit and adding them.
[0029]
According to such a configuration, when the listener wears the headphone device and listens to the sound, the input acoustic signal is processed by the plurality of basis functions, and weighted addition is performed on the converted output of each basis function.
In this way, a predetermined transfer function can be realized with less stored data.
Thus, the sound image is localized outside the head according to the head angle of the listener.
[0030]
The invention according to claim 2 of the present application comprises: first storage means for storing a plurality of basis functions obtained by principal component analysis of transfer functions measured from a sound source to the listener's ears at a plurality of angles, together with an average transfer function independent of the head direction of the listener; second storage means for storing the weighting coefficients of each basis function held in the first storage means in correspondence with the head direction of the listener; angle detection means for detecting the head direction of the listener as a rotation angle from a virtual sound source; and signal processing means for processing the input acoustic signal with the plurality of basis functions held in the first storage means, reading out the weighting coefficients stored in the second storage means based on the head rotation angle detected by the angle detection means, and controlling the sound image position by weighting and adding the processed signals.
[0031]
The invention according to claim 3 of the present application comprises: first storage means for storing a plurality of basis functions obtained by principal component analysis of transfer functions measured from a sound source to the listener's ears at a plurality of angles, together with a plurality of average transfer functions independent of the head direction of the listener; second storage means for storing a group of weighting coefficients for each basis function held in the first storage means in correspondence with the head direction of the listener; selection means for selecting a specific average transfer function from the first storage means; angle detection means for detecting the head direction of the listener as a rotation angle from a virtual sound source; and signal processing means for processing the input acoustic signal with the plurality of basis functions held in the first storage means, reading out the weighting coefficients held in the second storage means based on the head rotation angle detected by the angle detection means, and controlling the sound image position by weighting and adding the processed signals.
[0032]
In the invention according to claim 4 of the present application, the first storage means stores a plurality of average transfer functions generated from data obtained by converting transfer functions measured in advance for a large number of listeners into feature parameter vectors corresponding to human auditory characteristics, then clustering them and aggregating them into a small number of groups.
[0033]
Further, in the invention according to claim 5 of the present application, the signal processing means comprises: n convolution means realizing the first to n-th basis functions obtained by the principal component analysis and held in the first storage means (i being an ordinal from 1 to n); n right-side multiplying means for multiplying the output of the i-th convolution means by the i-th right weighting coefficient held in the second storage means; n left-side multiplying means for multiplying the output of the i-th convolution means by the i-th left weighting coefficient held in the second storage means; right-side adding means for generating the right converted acoustic signal by adding the outputs of the n right-side multiplying means; and left-side adding means for generating the left converted acoustic signal by adding the outputs of the n left-side multiplying means.
[0034]
According to such a configuration of claims 2 to 4, when the listener wears the headphone device and listens to the sound, the input acoustic signal is processed by the plurality of basis functions.
Then, when the rotation angle of the listener's head is detected by the angle detection means, the weighting coefficients of the basis functions corresponding to that angle are selected, and weighted addition is performed on the converted output of each basis function.
In this way, a predetermined transfer function can be realized with less stored data.
Thus, the sound image is localized outside the head according to the head angle of the listener.
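In symbols (the notation here is supplied for illustration and does not appear in the translation): with a stored average transfer function h_avg(t), basis functions b_i(t), and angle-dependent weighting coefficients w_i(θ), the effective transfer function applied to the acoustic signal is

```latex
\hat{h}(t,\theta) \;=\; h_{\mathrm{avg}}(t) \;+\; \sum_{i=1}^{n} w_i(\theta)\, b_i(t)
```

so only the n basis functions and, for each head angle, n scalar weights need to be stored, instead of a full impulse response for every angle.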
[0035]
In particular, according to the inventions of claims 3 and 4, by preparing a plurality of average transfer functions and selecting the one most suitable for the listener, the out-of-head sound image localization best suited to the user can easily be obtained.
[0036]
DESCRIPTION OF THE PREFERRED EMBODIMENTS (First Embodiment) A signal processing apparatus according to a first embodiment of the present invention will be described with reference to formulas and drawings.
FIG. 1 is a block diagram showing the configuration of the signal processing apparatus according to the first embodiment, for the case where one input signal, i.e., one sound image, is localized outside the head.
[0037]
Referring to FIG. 1, the signal processing apparatus according to the present embodiment includes a first storage means 101, a second storage means 102, an angle detection means 103, and a signal processing means 105.
The first storage means 101 stores a set of basis functions obtained by principal component analysis of the transfer functions measured from the sound source position to the listener's ears. The second storage means 102 stores, for each head direction, a group of weighting coefficients for each basis function held by the first storage means 101. The angle detection means 103 detects the head direction (head rotation angle) of the listener with respect to the sound source position, and this angle information is given to the second storage means 102.
[0038]
The signal processing means 105 performs the out-of-head localization processing on the acoustic signal input through the input terminal 104, using the plurality of basis functions held in the first storage means 101. The signal processing means 105 comprises, for example, six convolution means 106a to 106f, right-channel multipliers 107a to 107f, left-channel multipliers 107g to 107l, a right-channel adder 108a, and a left-channel adder 108b.
[0039]
The acoustic signal input from the input terminal 104 is given to each of the convolution means 106a to 106f. The convolution means 106a performs a convolution operation on the acoustic signal using the average basis function BF-0 stored in the first storage means 101. Similarly, the convolution means 106b performs a convolution operation using the first basis function BF-1 stored in the first storage means 101, the convolution means 106c using the second basis function BF-2, the convolution means 106d using the third basis function BF-3, the convolution means 106e using the fourth basis function BF-4, and the convolution means 106f using the fifth basis function BF-5.
[0040]
The multiplier 107a is a circuit that multiplies the output of the convolution means 106a by a specific right-channel weighting coefficient held in the second storage means 102. Similarly, the multipliers 107b, 107c, 107d, 107e, and 107f multiply the outputs of the convolution means 106b, 106c, 106d, 106e, and 106f, respectively, by the corresponding specific weighting coefficients held in the second storage means 102. The outputs of the multipliers 107a to 107f are added by the adder 108a and output through the output terminal 109a of the right channel.
[0041]
The multipliers 107g, 107h, 107i, 107j, 107k, and 107l are circuits that multiply the outputs of the convolution means 106a, 106b, 106c, 106d, 106e, and 106f, respectively, by specific left-channel weighting coefficients held in the second storage means 102. The outputs of the multipliers 107g to 107l are added by the adder 108b and output through the output terminal 109b of the left channel.
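The data path of FIG. 1 (six convolutions, per-channel weighting, summing) can be sketched as follows. All shapes and numerical values here are placeholders for illustration; the real basis functions and coefficients would come from the first and second storage means.

```python
import numpy as np

rng = np.random.default_rng(0)
n_taps, n_basis = 32, 6                 # BF-0 (average) plus BF-1..BF-5

basis = rng.standard_normal((n_basis, n_taps))   # first storage means 101
w_right = rng.standard_normal(n_basis)           # second storage means 102:
w_left = rng.standard_normal(n_basis)            # coefficients for one head angle
x = rng.standard_normal(256)                     # signal from input terminal 104

# Convolution means 106a-106f: convolve the input with each basis function.
conv_out = np.stack([np.convolve(x, b) for b in basis])

# Multipliers 107a-107f / 107g-107l and adders 108a / 108b.
y_right = (w_right[:, None] * conv_out).sum(axis=0)   # output terminal 109a
y_left = (w_left[:, None] * conv_out).sum(axis=0)     # output terminal 109b

# By linearity this equals convolving x once with the weighted sum of the
# basis functions, i.e. one effective transfer function per channel.
h_right = (w_right[:, None] * basis).sum(axis=0)
assert np.allclose(y_right, np.convolve(x, h_right))
```

The final assertion illustrates why the structure works: the weighted sum of the per-basis-function outputs is the same signal that a single angle-dependent filter would produce.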
[0042]
The output signals of the output terminals 109a and 109b are input to the left and right headphone units of the headphone device through amplifiers, as in the conventional example. The details of the angle detection means 103 are also the same as in the conventional example, and their description is therefore omitted here.
[0043]
In the signal processing apparatus according to the first embodiment configured as described above, the method of determining the basis functions by principal component analysis will be described first. The measurement of the head-related transfer functions to be subjected to principal component analysis can be performed by the cross-spectrum method, the time-stretched pulse method, or the like. These methods are described in detail in, for example, Hidaka et al., "Measurement method of impulse response", Material of the Acoustical Society of Japan, No. AA-89-14, and Suzuki et al., "Consideration on design method of time stretched pulse", Technical Report of the Institute of Information and Communication Engineers, No. EA 92-86.
[0044]
To measure the transfer functions, the subject sits in the middle of an anechoic chamber. A test sound is emitted from one of the small loudspeakers arranged on a semicircle in the horizontal plane centered on the subject. The response at the subject's eardrum is obtained as the output of a probe microphone, whose tip is fixed several millimeters from the tympanic membrane. The resulting probe microphone output is converted to a ratio with respect to the input signal in the case of the cross-spectrum method, and inverse-filtered in the case of the time-stretched pulse method. As a result, a head-related transfer function (impulse response) is obtained. This head-related transfer function is measured in 240 horizontal directions (in steps of 1.5 degrees).
[0045]
Next, the principal component analysis of head-related transfer functions will be described. Principal component analysis is a statistical method for efficiently representing a set of correlated measurement values. Its central idea is to reduce the dimensionality of a large set of interrelated measurements while preserving as much as possible of the variation the data represent. Just as Fourier analysis decomposes a waveform into sine and cosine components, principal component analysis decomposes an amplitude spectrum into basis functions. These basis functions can be regarded as the basic spectral shapes from which each spectrum is assembled.
[0046]
Each spectrum is approximated by a weighted addition of these basis functions. The weighting coefficients thus determine the relative contribution of each basis function (basic spectral shape) to a given spectrum. The set of weightings for each amplitude spectrum is called the "principal component" PC. The first basis function (BF-1) and its weighting (PC-1) capture the main part of the common variation represented in the data. The remaining basis functions (BF-2, BF-3, ...) and their weightings (PC-2, PC-3, ...) capture progressively smaller parts of the variation. The number of basis functions required to fully express the data is closely related to the degree of redundancy or correlation contained in the data: the greater the redundancy, the fewer basis functions are needed.
[0047]
The procedure for deriving the basis functions BF and the principal components PC from the
480 amplitude spectra of the head-related transfer functions of the subject's left and right ears
will now be described. Log magnitudes of 150 frequency components in the range of 0.2 to 15 kHz
are used for the analysis. Before performing the principal component analysis, the log magnitude
function of the average head-related transfer function over the 240 directions is calculated for
each of the subject's two ears. The average head-related transfer function is subject-dependent
but direction-independent, and contains the spectral features shared by the 240 head-related
transfer functions recorded from each ear. It includes, for example, the ear canal resonance at
about 2.5 kHz. To eliminate these effects, the average head-related transfer function is
subtracted from each head-related transfer function. Removing the average head-related transfer
function yields 480 log magnitude functions that represent primarily direction-dependent
spectral effects. These functions will be referred to as "direction transfer functions" to
distinguish them from head-related transfer functions, which include both direction-dependent
and direction-independent spectral effects (e.g., the ear canal resonance). These 480 direction
transfer functions are the subject of the principal component analysis.
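As a sketch of this pre-processing step, the per-ear averaging and subtraction can be written as
follows. The array shapes follow the text (240 directions, 150 frequency components per ear);
the random data and all variable names are illustrative, not from the patent:

```python
import numpy as np

rng = np.random.default_rng(1)
# One subject: 240 directions x 150 log-magnitude samples for each ear.
left = rng.normal(size=(240, 150))
right = rng.normal(size=(240, 150))

# Per-ear average HRTF: the direction-independent, subject-dependent features
# (for example, the ear-canal resonance around 2.5 kHz).
avg_left = left.mean(axis=0)
avg_right = right.mean(axis=0)

# Subtracting it leaves 2 x 240 = 480 "direction transfer functions" that
# carry mainly the direction-dependent spectral effects.
direction_tf = np.vstack([left - avg_left, right - avg_right])
```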
[0048]
The first step of the principal component analysis is the calculation of the frequency covariance
matrix. These covariances provide a measure of the similarity of the 480 direction transfer
functions at each pair of frequencies. The covariance S for the frequency pair (i, j) is given by
equation (2), where n is the total number of transfer functions (480 in this embodiment), p is
the total number of frequencies (150 in this embodiment), and Dki is the logarithmic amplitude at
the i-th frequency of the k-th transfer function. Here k indicates the sound source direction;
to cover both ears, indices 1 to 240 are assigned to the left ear and 241 to 480 to the right.
That is, index 1 is the left-ear data when the sound source is at the front, and index 241 is
the corresponding right-ear data. The data for 1.5 degrees counterclockwise are then indices 2
and 242, and the data for 3 degrees are indices 3 and 243.
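The body of equation (2) did not survive the text extraction. Based on the surrounding
definitions (n transfer functions, p frequencies, D_ki the log amplitude at frequency i of the
k-th transfer function), the covariance it describes would, in its standard form, read:

```latex
S_{ij} \;=\; \frac{1}{n}\sum_{k=1}^{n} D_{ki}\,D_{kj},
\qquad i, j = 1, \dots, p
```

Whether the patent normalizes by n or by n - 1 cannot be recovered from the translation; no mean
term appears here because the average transfer functions have already been removed at this
stage. Treat this as a reconstruction, not the original formula.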
[0049]
The basis functions are, more precisely, basis vectors, since they are represented discretely;
in the matrix notation below they are written as basis vectors c_q. They are the q eigenvectors
corresponding to the q largest eigenvalues of the covariance matrix. For a given direction
transfer function, the weighting wk, which represents the contribution of each basis function to
that direction transfer function, is given by equation (3), where C is the matrix whose columns
are the basis vectors, C' is the inverse of C, and dk is the amplitude vector of the k-th
direction transfer function. Rearranging equation (3), the amplitude vector dk of the direction
transfer function equals a weighted sum of the basis vectors: as shown in equation (4), each
direction transfer function is reconstructed as a linear combination of the basis vectors.
Reconstruction of the entire head-related transfer function, however, requires adding the
average transfer function to the reconstructed direction transfer function.
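Under the assumption that the basis vectors are orthonormal eigenvectors of the covariance
matrix (so that the patent's C', its "inverse", reduces to the transpose of C), equations (2)
through (4) can be sketched with NumPy as follows. All sizes, data, and names here are
illustrative stand-ins, not the patent's values:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for the direction transfer functions: n vectors of p
# log-magnitude samples (the patent uses n = 480, p = 150, q = 5).
n, p, q = 40, 15, 5
D = rng.normal(size=(n, p))
D -= D.mean(axis=0)                  # average transfer function already removed

# Equation (2): frequency covariance matrix of the direction transfer functions.
S = (D.T @ D) / n

# Basis vectors: eigenvectors of S for the q largest eigenvalues.
eigvals, eigvecs = np.linalg.eigh(S)  # eigh returns ascending eigenvalues
C = eigvecs[:, ::-1][:, :q]           # columns are basis vectors c_1..c_q

# Equation (3): weights of the k-th direction transfer function, w_k = C' d_k.
W = D @ C                             # row k holds the q weights w_k

# Equation (4): reconstruction as a weighted sum of basis vectors, d_k ~ C w_k.
D_hat = W @ C.T

# With q < p the reconstruction is approximate; with q = p it is exact.
explained = 1.0 - np.sum((D - D_hat) ** 2) / np.sum(D ** 2)
```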
[0050]
How much of the detail of a direction transfer function is reproduced depends on the number of
basis functions used for the reconstruction. The choice of how many basis functions to use is
based on how much of the variation in the original data is to be restored. Here, the number of
basis functions necessary to reconstruct the amplitude spectra of the direction transfer
functions to within about 90% is selected. According to this criterion, five basis functions
suffice. FIG. 2 shows an example of the five basis functions described above.
[0051]
The basis functions BF and principal components PC derived as described above are stored in the
first storage unit 101 and the second storage unit 102 of FIG. 1, respectively. In addition, the
average head-related transfer function is referred to as BF-0 and its weighting coefficient as
PC-0 for convenience; these data are also stored in the first storage unit 101 and the second
storage unit 102, respectively. As shown in FIG. 3, the second storage means 102 stores the data
of PC-0 to PC-5 of the left ear for the front direction (0-degree angle) at addresses #0 to #5,
and the data of PC-0 to PC-5 of the right ear at addresses #6 to #11. Similarly, the left and
right data of PC-0 to PC-5 for 1.5 degrees are stored at addresses #12 to #23.
[0052]
The basis functions BF-0 to BF-5 stored in the first storage means 101 are sent to the
convolution means 106a to 106f and convolved with the input acoustic signal. The detection
result of the angle detection means 103 is quantized in 1.5-degree steps; for example, an angle
within ±0.75 degrees becomes 0 degrees. The quantized angle, counted in 1.5-degree steps, is
multiplied by 12 to give the data read start address of the second storage means 102, from which
12 weighting coefficients are read out. The read weighting factors are sent to the multipliers
107a to 107l to weight the convolution results of the convolution means 106a to 106f. The
multiplication results of the multipliers 107a to 107f are added by the adder 108a and output
from the output terminal 109a of the right channel; those of the multipliers 107g to 107l are
added by the adder 108b and output from the output terminal 109b of the left channel.
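The quantize-and-address step can be sketched as follows. The function and constant names are
illustrative, and the modulo wrap-around for angles near 360 degrees is our assumption, since
the patent does not spell it out:

```python
# Sketch of the second-storage readout described above: detected angles are
# quantized to 1.5-degree steps, and each step owns 12 weighting coefficients
# (PC-0..PC-5 for the left ear followed by PC-0..PC-5 for the right ear).

STEP_DEG = 1.5
COEFFS_PER_STEP = 12          # 6 left-ear + 6 right-ear weights
NUM_STEPS = 240               # 360 / 1.5 directions

def read_start_address(angle_deg: float) -> int:
    """Quantize the head angle and return the first address of its weights."""
    step = round((angle_deg % 360.0) / STEP_DEG) % NUM_STEPS
    return step * COEFFS_PER_STEP

def read_weights(storage, angle_deg: float):
    """Return the 12 weighting coefficients for the quantized angle."""
    a = read_start_address(angle_deg)
    return storage[a:a + COEFFS_PER_STEP]
```

For example, an angle within ±0.75 degrees quantizes to 0 degrees and reads from address 0,
while 1.5 degrees reads from address 12, matching the memory map of FIG. 3.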
[0053]
As described above, in the first embodiment the input acoustic signal is processed with the
plurality of basis functions BF, the head rotation angle θ of the listener P is detected by the
angle detection means 103, and the signals processed with the basis functions BF are weighted
and added using the weighting factors wk of the basis functions corresponding to the detected
angle, yielding an acoustic signal with out-of-head sound image localization. Moreover, in the
present embodiment the capacity of the storage means for the transfer functions can be reduced
significantly compared to holding the data of a large number of transfer functions for every
direction. That is, the storage capacity for the transfer functions is 240 Kbytes in the
conventional example, whereas in the present embodiment the basis functions occupy 1.5 Kbytes
(6 × 0.25 Kbytes) and the weighting coefficients 5.8 Kbytes (12 × 2 bytes × 240 directions),
for a total of 8.7 Kbytes. Furthermore, every time the angle of the listener's head changes, the
data transferred from the second storage means 102 to the signal processing means 105 amount to
only 24 bytes, 1/40 or less of the 1 Kbyte of the conventional example, which is another
excellent practical advantage.
[0054]
Second Embodiment Next, a signal processing apparatus according to a second embodiment of the
present invention will be described. The first embodiment dealt with the case of a one-channel
input signal; the second embodiment is directed to the case of a two-channel input. FIG. 4 is a
block diagram showing the configuration of the signal processing apparatus for a two-channel
input. Parts identical to those of the first embodiment are assigned the same reference
numerals, and their detailed explanation is omitted. As shown in FIG. 4, the present embodiment
provides two systems of signal processing means 105a and 105b; after the two channels of input
acoustic signals are processed independently, the outputs for the right and left channels are
added to form each output signal.
[0055]
The first storage means 101 stores the group of basis functions obtained by principal component
analysis of the transfer functions measured from the sound source position to the listener's
ears. The second storage means 102 stores the group of weighting factors of the basis functions,
indexed by the output data of the angle detection means 103. The angle detection means 103
detects the head rotation angle of the listener with respect to the sound source position. The
signal processing means 105a converts the acoustic signal input from the left input terminal
104a according to the specific basis functions of the left channel held in the first storage
means 101, and the signal processing means 105b converts the acoustic signal input from the
right input terminal 104b according to the specific basis functions of the right channel held in
the first storage means 101.
[0056]
The adder 410a adds the outputs of the signal processing means 105a and 105b, and outputs an
acoustic signal of the left channel from the left output terminal 411a. Similarly, the adder 410b
adds the outputs of the signal processing means 105a and 105b, and outputs an acoustic signal
of the right channel from the right output terminal 411b.
[0057]
With this configuration, when the head rotation angle is given by the angle detection unit 103,
the signal processing units 105a and 105b perform weighted addition of the input acoustic
signals using the plurality of basis functions held in the first storage unit 101 and the
corresponding combination of weighting factors held in the second storage unit 102. Except that
the weighting coefficients in the second storage unit 102 are selected according to the output
of the angle detection unit 103, the operation is the same as in the first embodiment, so its
description is omitted.
[0058]
Here, the method of selecting the weighting coefficients in the second storage unit 102 from the
output of the angle detection unit 103 will be described. Usually, when the audio signals of two
(left and right) channels are reproduced by two speakers, the speakers are arranged at two
vertices of a triangle whose remaining vertex is the listener P, as indicated by the virtual
sound sources SpL and SpR in FIG. 13. Since the angle detection means 103 in FIG. 4 detects the
angle of the listener P with respect to the front direction, the sound source direction must be
corrected when the speaker arrangement of FIG. 13 is assumed.
[0059]
For example, when the detection result of the angle detection means 103 is 0 degrees, the left
input signal corresponds to a direction of 30 degrees and the right signal to -30 degrees
(= 330 degrees). Therefore, when the weighting factors are stored in the second storage means
102 at intervals of 1.5 degrees, 12 weighting coefficients are transferred from address 240
(= 12 × 30 / 1.5) to the signal processing means 105a, which processes the left-channel signal,
and 12 weighting coefficients are transferred from address 2640 (= 12 × 330 / 1.5) to the signal
processing means 105b, which processes the right-channel signal.
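The per-channel address computation can be sketched as follows. The sign convention for
combining the head angle with the speaker offset is our assumption, since the patent only gives
the 0-degree case; names and structure are illustrative:

```python
# Sketch of the address correction above. The virtual left and right speakers
# sit at +30 and -30 degrees; the detected head angle is relative to the
# front, so it is combined with the speaker offset before the table lookup.

STEP_DEG = 1.5
COEFFS_PER_STEP = 12
NUM_STEPS = 240

def channel_address(head_angle_deg: float, speaker_offset_deg: float) -> int:
    """First address of the 12 weights for one channel's source direction."""
    source_deg = (speaker_offset_deg - head_angle_deg) % 360.0
    step = int(round(source_deg / STEP_DEG)) % NUM_STEPS
    return step * COEFFS_PER_STEP
```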
[0060]
Third Embodiment Next, a signal processing apparatus according to a third embodiment of the
present invention will be described. When an unspecified number of listeners is assumed, the
head shape and pinna shape differ from listener to listener, so individual differences arise in
the transfer functions, and processing that accommodates these differences is required. For the
signal processing device of the present embodiment, the head-related transfer functions are
measured for a large number of subjects, and a set of representative average transfer functions
is stored; when a listener wears the headphone device, the average transfer function best suited
to that listener is selected.
[0061]
FIG. 5 is a block diagram showing the configuration of a signal processing apparatus having this
capability. In FIG. 5, the provision of the first storage unit 101, the second storage unit 102,
the angle detection unit 103, and the signal processing unit 105 is the same as in the first
embodiment. The first storage unit 101 stores the group of basis functions obtained by principal
component analysis of the transfer functions measured from the sound source position to the
listeners' ears, together with a plurality of average transfer functions. The selection means
510 is a means by which the listener selects one of the plurality of average transfer functions
held in the first storage means 101.
[0062]
The second storage means 102 stores the group of weighting coefficients of the basis functions,
indexed by the output data of the angle detection means 103. The angle detection means 103
detects the head rotation angle of the listener with respect to the sound source position. When
an acoustic signal is input from the input terminal 104, the signal processing unit 105
processes it with the plurality of basis functions held in the first storage unit 101 and the
average transfer function selected by the selection unit 510, and performs weighted addition
using the weighting factors of the basis functions for the head rotation angle indicated by the
angle detection means 103. The output signals of the signal processing means 105 are output from
the left output terminal 109a and the right output terminal 109b.
[0063]
The operation of the signal processing apparatus of this embodiment configured as described
above will now be described. First, the method of classifying subjects in order to select
representative average transfer functions is described. Let SLdi(t) and SRdi(t) be the transfer
functions in direction d measured for an individual i, where L indicates the left ear and R the
right ear. When a transfer function is represented by an FFT complex spectrum of order n, 2n
points of information are required.
[0064]
The left and right transfer functions in m directions for each individual are regarded as a
vector of order L = m × 2(n + 1); the N such vectors are classified into k clusters by, for
example, the LBG method, and the centroid of each cluster is taken as a representative transfer
function. In the LBG method, the N L-dimensional vectors are first divided into two clusters.
The division is made by the hyperplane that is orthogonal to the eigenvector giving the largest
eigenvalue in the L-dimensional vector space and passes through the center of gravity. The
hyperplane used for this division, however, need not be exact; even a rough choice is corrected
by the subsequent optimization processing. For the two newly obtained clusters, the respective
centroids are determined, and the clusters are revised so that each vector belongs to the
cluster of the nearer centroid. Cluster division is optimized by repeating this process of
recomputing the centroids of the revised clusters. By applying the above splitting procedure to
each cluster optimized in this way, k clusters are obtained.
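The split-and-refine loop described above can be sketched as follows. This is a simplified,
one-dimensional stand-in: the initial split perturbs each centroid instead of using the
eigenvector hyperplane, which the text itself notes need not be exact, and k is assumed to be a
power of two, as in the doubling form of the LBG algorithm:

```python
def centroid(vs):
    return sum(vs) / len(vs)

def refine(data, cents, iters=20):
    """Reassign each vector to the nearest centroid, then recompute centroids."""
    clusters = [[] for _ in cents]
    for _ in range(iters):
        clusters = [[] for _ in cents]
        for x in data:
            i = min(range(len(cents)), key=lambda j: abs(x - cents[j]))
            clusters[i].append(x)
        # An empty cluster keeps its old centroid.
        cents = [centroid(c) if c else cents[i] for i, c in enumerate(clusters)]
    return cents, clusters

def lbg(data, k):
    """Split every centroid in two, then refine, until k clusters exist."""
    cents = [centroid(data)]
    while len(cents) < k:
        cents = [c + eps for c in cents for eps in (-1e-3, 1e-3)]
        cents, _ = refine(data, cents)
    return refine(data, cents)
```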
[0065]
The clustering procedure according to the LBG method is described in detail, for example, in the
document: John Makhoul, Salim Roucos, and Herbert Gish, "Vector Quantization in Speech Coding",
Proceedings of the IEEE, Vol. 73, No. 11, Nov. 1985.
[0066]
This method mechanically classifies the distribution of vectors according to a distance measure,
without considering the relationship between the physical characteristics of the transfer
functions and perception, but in practice it tends to yield a classification that corresponds
well with perception. In general, however, the FFT order n is about 8 to 10, so distances
between many vectors of very high order (for example, L = 4020 when m = 4 and n = 8) must be
calculated, and in some cases the eigenvalues and eigenvectors as well; it is therefore not an
efficient method.
[0067]
Therefore, in order to reduce the amount of calculation and the memory size to realistic values,
clustering is performed after extracting parameters related to the perception of distance and
direction. The cues by which a person perceives the direction and distance of a sound source are
considered to differ depending on the direction. When the sound source is at the front, the same
waveform arrives at the left and right ears, and in this case the cue for direction and distance
is the power spectrum. Therefore, in the present embodiment, the subjects having representative
transfer functions are selected by clustering the transfer functions for the front direction.
[0068]
The power spectrum can be represented with few parameters by, for example, linear prediction
(LPC) analysis. When LPC cepstrum coefficients, which are said to correspond well with
perception, are used, an order of about 12 is sufficient. The calculation proceeds by solving
the normal equations formed from the autocorrelation coefficients of the frontal transfer
function SL0i(t) to obtain the linear prediction coefficients αj (j = 0, ..., J); then, using
the properties of the logarithmic spectrum in the z-domain, the cepstrum coefficients cj
(j = 0, ..., J) are obtained recursively. This method is described in detail in the document:
Sadaoki Furui, "Digital Speech Processing", Tokai University Press.
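The two steps named here (Levinson-Durbin on the autocorrelation, then the z-domain cepstrum
recursion) can be sketched as follows. The sign convention, with prediction polynomial
A(z) = 1 - sum_k a_k z^-k, is one common choice; texts differ, so treat this as illustrative
rather than the patent's exact formulation:

```python
def levinson_durbin(r, order):
    """LPC coefficients a_1..a_order from autocorrelation values r[0..order]."""
    a = [0.0] * (order + 1)
    err = r[0]
    for m in range(1, order + 1):
        # Reflection coefficient for this order.
        k = (r[m] - sum(a[j] * r[m - j] for j in range(1, m))) / err
        new_a = a[:]
        new_a[m] = k
        for j in range(1, m):
            new_a[j] = a[j] - k * a[m - j]
        a, err = new_a, err * (1.0 - k * k)
    return a

def lpc_to_cepstrum(a, n_ceps):
    """Cepstrum c_1..c_n of the all-pole model 1/A(z) by the standard recursion."""
    c = [0.0] * (n_ceps + 1)
    for n in range(1, n_ceps + 1):
        an = a[n] if n < len(a) else 0.0
        c[n] = an + sum((k / n) * c[k] * a[n - k]
                        for k in range(1, n) if n - k < len(a))
    return c
```

For an AR(1) source with coefficient a, the autocorrelation decays as a^k and the cepstrum of
1/A(z) is a^n / n, which gives a simple analytic check of both routines.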
[0069]
As an example, N vector data of 12th-order LPC cepstra are classified into k clusters by the
above-mentioned LBG method. Even with N = 200 and k = 8, the computational load is small.
Although the cepstrum is used as the parameter here, in view of the property that human
perception is sensitive to the peaks of the power spectrum, other LPC parameters such as the
vocal-tract reflection coefficients, the log area ratios, or the linear prediction coefficients
themselves may be used; needless to say, clustering can also be performed using the FFT power
spectrum, with similar effect. As described above, k subjects having the k representative
transfer functions can be selected.
[0070]
Next, for each subject having a representative transfer function, the transfer functions in the
240 directions are measured as in the first embodiment. Then, for each subject's binaural
measurements, the logarithmic amplitude function of the average head-related transfer function
over the 240 directions is calculated. The average head-related transfer function is
subject-dependent and direction-independent; it contains the spectral features shared by the 240
head-related transfer functions recorded from each ear, i.e., the individual differences. The
average head-related transfer function of each subject is therefore stored in the first storage
unit 101 as BF-0i (i = 1, 2, ..., k).
[0071]
Meanwhile, to remove the influence of individual differences, the per-subject average
head-related transfer function is subtracted from each of that subject's head-related transfer
functions. Removing the average head-related transfer function yields 480 log magnitude
functions per subject that represent primarily direction-dependent spectral effects; these
functions are called direction transfer functions. Since there are 480 of them for each of the k
subjects, a total of 480 × k transfer functions are subjected to the principal component
analysis. The method of principal component analysis is the same as in the first embodiment and
is not described here. In this way, the plurality of average transfer functions stored in the
first storage unit 101 and the basis functions obtained by the principal component analysis are
determined.
[0072]
Next, the method by which the listener selects the average transfer function best suited to him
or her from among the plurality of average transfer functions stored in the first storage unit
101 will be described. FIG. 6 shows the appearance of the selection means 510. The selection
means 510 is provided partway along the headphone cable so that the listener can operate it
easily. The headphone cable contains the signal lines for the right and left ears and the signal
lines for selecting the average transfer function. In the figure, a push switch 602 and a
display unit 603 are provided on the main body 601 of the selection means 510.
[0073]
Next, the procedure for using the selection means 510 will be described. FIG. 7 is a PAD
(Problem Analysis Diagram) showing the procedure of use; a PAD is a diagram format used in
program development. Prior to the selection, the first average transfer function BF-01 is
downloaded from the first storage means 101 of FIG. 5 to the signal processing means 105 as the
initial value. At this point the headphone device is in the reproduction mode; in this mode
(step 1), keeping the push switch 602 pressed for 2 seconds or more enters the selection mode,
in which the representative average transfer functions are presented and a specific one can be
selected.
[0074]
For example, the listener first listens to the sound processed with the first average transfer
function; at this time the display unit 603 shows "1" (step 2). Next, the listener operates the
push switch 602 (step 3). Each time the push switch 602 is pressed briefly (within 1 second),
the second average transfer function BF-02, ..., up to the k-th average transfer function BF-0k
is selected in turn; its data are downloaded from the first storage means 101 to the signal
processing means 105, and the sound processed with that average transfer function is presented.
The display unit 603 correspondingly shows "1", "2", ..., "k" in turn.
[0075]
In this way, the listener searches for the average transfer function that gives the best
out-of-head localization. While the listener is deciding, the sound processed with the currently
selected average transfer function is output. Keeping the push switch 602 pressed for 2 seconds
or more at this point confirms the average transfer function and returns the device to the
reproduction mode, and the display unit 603 shows the index number of the selected average
transfer function.
[0076]
When the push switch 602 is kept pressed for 2 seconds or more in the reproduction mode, the
selection mode is entered again and the average transfer function can be reselected. In this way
the listener can choose the average transfer function suited to him or her from among the
plurality of candidates. As described above, according to the present embodiment, a plurality of
average transfer functions are prepared and the listener selects the one most suitable, whereby
the out-of-head sound image localization best suited to that listener is obtained.
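The push-switch behaviour described in the last few paragraphs amounts to a small state machine.
A sketch might look like the following; the class and method names are ours, not the patent's:

```python
# Illustrative state machine for the selection procedure above. A press of
# 2 s or more toggles between the reproduction and selection modes; a short
# press in selection mode steps to the next of the k average transfer
# functions, cyclically.

class TransferFunctionSelector:
    def __init__(self, k: int):
        self.k = k
        self.index = 1            # BF-01 is the initial value
        self.mode = "reproduction"

    def press(self, duration_s: float) -> None:
        if duration_s >= 2.0:
            # Long press: enter selection, or confirm and return to playback.
            self.mode = ("selection" if self.mode == "reproduction"
                         else "reproduction")
        elif self.mode == "selection":
            # Short press: audition the next candidate BF-0i.
            self.index = self.index % self.k + 1

    @property
    def display(self) -> str:
        """Index number shown on the display unit."""
        return str(self.index)
```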
[0077]
As described above, according to the present invention, the input acoustic signal is processed
with a plurality of basis functions, the angle detection means detects the head rotation angle
of the listener, and the processed signals are weighted and added using the basis-function
weighting coefficients corresponding to the detected angle, yielding a signal whose sound image
is localized outside the head. In particular, compared with holding a transfer function for
every direction, the capacity of the storage means for the transfer functions can be reduced
substantially. Furthermore, the data transferred from the second storage means to the signal
processing means whenever the angle of the listener's head changes can be reduced significantly,
which is an excellent practical advantage.
[0078]
Brief Description of the Drawings
[0079]
FIG. 1 is a block diagram showing the configuration of a signal processing apparatus according
to the first embodiment of the present invention.
[0080]
FIG. 2 is a frequency characteristic diagram of the five basis functions used in the signal
processing means of the first embodiment.
[0081]
FIG. 3 is a memory map showing the data arrangement in the second storage means of the first
embodiment.
[0082]
FIG. 4 is a block diagram showing the configuration of a signal processing apparatus according
to the second embodiment of the present invention.
[0083]
FIG. 5 is a block diagram showing the configuration of a signal processing apparatus according
to the third embodiment of the present invention.
[0084]
FIG. 6 is an external view of the selection means in the third embodiment.
[0085]
FIG. 7 is a PAD showing the selection procedure for the average transfer function in the third
embodiment.
[0086]
FIG. 8 is a block diagram showing the overall configuration of a signal processing apparatus of
the prior art.
[0087]
FIG. 9 is a block diagram of the angle detection unit in the conventional signal processing
apparatus.
[0088]
FIG. 10 is a block diagram of the sound processing unit in the conventional signal processing
apparatus.
[0089]
FIG. 11 is a time chart of the signals illustrating the operating principle of the angle
detection unit.
[0090]
FIG. 12 is a schematic diagram showing the geometrical relationship between the head direction
of the listener and the virtual sound source.
[0091]
FIG. 13 is a sketch showing the relative positional relationship between the sound source and
the listener.
[0092]
Description of Reference Numerals
[0093]
101: first storage means; 102: second storage means; 103: angle detection means; 104: audio
signal input terminal; 105: signal processing means; 109a, 411a: left output terminals; 109b,
411b: right output terminals; 106a to 106f: convolution means; 107a to 107l: multipliers; 108a,
108b, 410a, 410b: adders; 510: selection means; 601: main body of the selection means; 602: push
switch; 603: display unit