DESCRIPTION JP2008022069

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
PROBLEM TO BE SOLVED: To provide a voice recording apparatus capable of picking up the voice of
each speaker without being affected by reverberation. SOLUTION: A voice recording device 1
according to the present invention includes: a reference signal generating unit 5 that generates a
reference signal to be input to a speaker 15, which is a voice generating unit placed at a sound
source; an impulse response storage unit 6 that stores voice signals collected by a plurality of
voice collecting units placed at different positions and synchronized in time; a transfer function
calculating unit 7 that obtains, based on the voice signals stored in the impulse response storage
unit 6 and the reference signal, a transfer function from the position of the sound source to the
microphones 11 to 14, which are the voice collecting units; an inverse filter coefficient
calculation unit 8 that calculates inverse filter coefficients for canceling the characteristic of
the obtained transfer function; and an inverse filter processing unit 9 that filters the audio
signals collected by the plurality of voice collecting units based on the inverse filter
coefficients. [Selected figure] Figure 1
Voice recording apparatus and voice recording method
[0001]
The present invention relates to a voice recording device and a voice recording method for
recording a target voice.
[0002]
Heretofore, as methods of individually recording the voice of a speaker, a method using a
close-talking microphone, a headset microphone or the like, and a method of processing an audio
10-04-2019
1
signal based on characteristics of the speaker's voice have been proposed.
Further, as another conventional technique for teleconferences and the like, an apparatus has been
proposed that filters the voice signals received by a plurality of microphones to reduce noise and
distortion, so that the sound emitted from a target sound source can be picked up with high
quality (Patent Document 1). The apparatus described in Patent Document 1 estimates the speaker
position based on the signals received by the microphones. Next, according to the speaker position
estimation result, a delay is set in each delay unit so that the delay-and-sum array is focused on
the estimated speaker position. The signal-to-noise ratio (SN ratio) is estimated for the audio
signal recorded by each microphone, and the estimated SN ratio is used to determine the filter
coefficients. The optimum filter calculation unit calculates a filter that optimizes the SN ratio
of the array output and the distortion of the target sound component, and sets it as the filter.
[Patent Document 1] Japanese Unexamined Patent Application Publication No. 2001-309483
[0003]
However, with the technology described in Patent Document 1, reverberant sound reflected by the
walls of a room or the like enters the microphones, so there was a problem that the voice of each
speaker cannot be picked up without being affected by the reverberant sound.
[0004]
Therefore, the present invention has been made in view of the above problem, and an object of the
present invention is to provide a voice recording apparatus and a voice recording method that can
pick up the voice of each speaker without being affected by reverberation.
[0005]
In order to solve the above problems, the voice recording apparatus according to the present
invention includes: a reference signal generating unit that generates a reference signal to be
input to a voice generating unit placed at a sound source; a storage unit that stores voice signals
obtained by collecting the voice generated by the voice generating unit with a plurality of voice
collecting units placed at different positions and synchronized in time; a transfer function
calculating unit that obtains a transfer function from the position of the sound source to the
voice collecting units based on the voice signals stored in the storage unit and the reference
signal; an inverse filter coefficient calculating unit that calculates inverse filter coefficients
for canceling the characteristic of the obtained transfer function; and a filter processing unit
that filters the audio signals collected by the plurality of voice collecting units based on the
inverse filter coefficients.
[0006]
According to the present invention, a reference signal to be input to a sound generation unit
placed at a sound source is generated; the sound generated by the sound generation unit is
collected by a plurality of sound collection units placed at different positions and synchronized
in time, and the collected voice signals are stored; the transfer function from the position of
the sound source to the voice collection units is determined based on the stored voice signals and
the reference signal; and the voice signal is filtered with inverse filter coefficients calculated
to cancel the characteristic of the transfer function. Because this makes it possible to create a
voice from which reverberation has been removed, the voice of each speaker can be collected
separately without being affected by the reverberation.
[0007]
The filter processing unit picks up a target voice by bringing the sounds arriving at the
plurality of voice collection units into phase and adding them.
Preferably, the reference signal is an impulse input, and the sound signals collected by the
plurality of sound collection units are impulse responses.
The voice recording apparatus according to the present invention further includes a voice
recording unit for recording the voice signal filtered by the filter processing unit as a voice signal
generated by the sound source.
[0008]
The filter processing unit estimates the position of the sound source based on sound source
information indicating the position of the sound source and performs the filtering process.
As a result, the sound source position can be estimated, and the filtering performed, using sound
source information other than images, for example the positions of chairs, or tags that use radio
waves, ultrasonic waves, infrared light, and the like.
In addition, if the room has little reverberation, it is possible to estimate the sound source of the
voice. The sound source information is, for example, an image signal captured by an imaging
unit. As a result, the image signal can be used to narrow down the candidates for estimation of
the sound source, and the accuracy of the filter processing can be improved.
[0009]
The voice recording method of the present invention includes: a step of generating a reference
signal to be input to a voice generating unit placed at a sound source; a step of storing, in a
storage unit, voice signals obtained by collecting the voice generated by the voice generating
unit with a plurality of voice collection units placed at different positions and synchronized in
time; a step of calculating a transfer function from the position of the sound source to the voice
collection units based on the voice signals stored in the storage unit and the reference signal; a
step of calculating inverse filter coefficients that cancel the characteristic of the transfer
function; and a step of filtering the audio signals picked up by the plurality of voice collection
units based on the inverse filter coefficients. According to the present invention, since the
reference signal can be used to create reverberation-removed speech, the speech of the speaker can
be collected without being affected by the reverberation.
[0010]
The reference signal is an impulse input, and the sound signals collected by the plurality of
sound collection units are impulse responses. The filtering step estimates the position of the
sound source based on sound source information indicating the position of the sound source and
performs the filter processing. As a result, the sound source position can be estimated, and the
filtering performed, using sound source information other than images, for example the positions
of chairs, or tags that use radio waves, ultrasonic waves, infrared light, and the like. In addition, if
the room has little reverberation, it is possible to estimate the sound source of the voice. The
sound source information is an image signal captured by an imaging unit. As a result, the image
signal can be used to narrow down the candidates for estimation of the sound source, and the
accuracy of the filter processing can be improved.
[0011]
Further, the accuracy can be improved by optimizing the filter parameters in the filtering process
using blind source separation (BSS), a known technique that applies ICA (independent component
analysis) to speech. BSS is proposed in the following document. H. Saruwatari, T. Kawamura,
T. Nishikawa, K. Shikano, "Fast-Convergence Algorithm for Blind Source Separation Based on Array
Signal Processing," IEICE Trans. Fundamentals, Vol. E86-A, No. 3, pp. 286-291, March 2003.
[0012]
Since BSS calculation is complicated and its accuracy drops when many channels are used, using the
filter parameters obtained by the present invention as the initial values of the BSS allows the
calculation to be performed in a realistic time. A known BSS divides the signal into frequency
bands and filters each band to separate voices, whereas the present invention performs processing
in units of sound wave samples, as in microphone array technology. BSS relies on the fact that
multiple sound sources are independent (not correlated with each other even when more than one
person speaks), and this can also be used to fine-tune and optimize the parameters of the present
invention. The parameters of the present invention can serve as the initial values for
optimization by BSS, and there is no precedent for this. By applying a known BSS with the
parameters of the present invention as initial values, the accuracy can be further improved, and
fine adjustment can be performed when the speaker's body (the sound source) moves.
[0013]
According to the present invention, it is possible to provide a voice recording device and a voice
recording method capable of picking up the voice of each speaker without being affected by
reverberation.
[0014]
The best mode for carrying out the present invention will be described below.
FIG. 1 is a view for explaining a voice recording apparatus which measures in advance a transfer
function to a sound receiving point and calculates an inverse filter. As shown in FIG. 1, the voice
recording device 1 includes first to fourth voice input terminals 21 to 24, a voice output terminal
3, a reference signal output terminal 4, a reference signal generation unit 5, an impulse response
storage unit 6, a transfer function calculation unit 7, an inverse filter coefficient calculation
unit 8, an inverse filter processing unit 9, and a voice recording unit 10.
Reference numerals 11 to 14 indicate first to fourth microphones (a microphone array, that is, the
plurality of sound collecting units), 15 indicates a speaker (a loudspeaker), 16 to 18 indicate
speakers (the persons who speak), and 20 indicates an imaging unit such as a video camera. The
speakers 16 to 18 are present at the positions of the sound sources A to C, respectively.
[0015]
The voice recording device 1 generates a reference signal from the speaker position, measures the
transfer function to the receiving point in advance, calculates the inverse filter of the transfer
function, and convolves it with the received signal to remove reverberation and record the voice.
[0016]
The reference signal generation unit 5 generates a reference signal to be input to the speaker 15
which is a sound generation unit placed in the sound source.
The reference signal generation unit 5 outputs the generated reference signal from the speaker
15 placed at the same position as the speakers 16 to 18 through the reference signal output
terminal 4. The first to fourth microphones 11 to 14 are disposed at arbitrary positions different
from one another. The first to fourth microphones are two-dimensionally arranged, for example,
on the ceiling or floor of the conference room. The generated reference signal is received by the
first to fourth microphones 11 to 14. The impulse response storage unit 6 stores the voice signals
(for example, impulse responses) obtained by collecting the sound generated by the speaker 15 with
the plurality of microphones 11 to 14, which are placed at different positions and synchronized in
time. The impulse response storage unit 6 is formed of, for example, a hard disk or a
semiconductor memory, and stores the impulse response of each received channel.
[0017]
FIG. 2 is a diagram for explaining the contents of the impulse response storage unit 6. FIG. 2(A)
explains the impulse input and impulse response for the sound source A, FIG. 2(B) for the sound
source B, and FIG. 2(C) for the sound source C. In each figure, (a) is the waveform of the impulse
input given to the speaker 15 as the reference signal, (b) is the waveform of the impulse response
received by the first microphone 11, (c) is the waveform of the impulse response received by the
second microphone 12, (d) is the waveform of the impulse response received by the third microphone
13, and (e) is the waveform of the impulse response received by the fourth microphone 14.
[0018]
The transfer function calculation unit 7 obtains a transfer function from the position of the sound
source to the microphones 11 to 14 based on the audio signal and the reference signal stored in
the impulse response storage unit 6. Specifically, based on the received signal of each channel and
the reference signal, the transfer function calculation unit 7 calculates the transfer function
from the positions of the speakers 16 to 18 to the first microphone 11, the transfer function from
those positions to the second microphone 12, the transfer function from those positions to the
third microphone 13, and the transfer function from those positions to the fourth microphone 14.
The inverse filter coefficient calculation unit 8 calculates inverse filter coefficients that
cancel the measured transfer function characteristic of each channel.
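The two calculations described above can be illustrated with a minimal sketch. The description does not specify how the transfer function or the inverse filter coefficients are computed, so everything below is an assumption chosen for illustration: frequency-domain division to estimate the transfer function, a regularized inverse, a modeling delay of half the filter length, and the hypothetical names estimate_transfer_function, inverse_filter_coefficients, eps, and beta.

```python
import numpy as np

def estimate_transfer_function(reference, recorded, n_fft, eps=1e-8):
    """Estimate H(f) = Y(f) / X(f) for one microphone channel."""
    X = np.fft.rfft(reference, n_fft)
    Y = np.fft.rfft(recorded, n_fft)
    # Regularized division: eps guards bins where the reference has no energy.
    return Y * np.conj(X) / (np.abs(X) ** 2 + eps)

def inverse_filter_coefficients(H, n_fft, beta=1e-3):
    """Return FIR taps g such that g convolved with h approximates a delayed impulse."""
    # Regularized inverse; beta is an illustrative choice, not from the description.
    G = np.conj(H) / (np.abs(H) ** 2 + beta)
    g = np.fft.irfft(G, n_fft)
    # Shift by half the filter length so non-causal parts of the inverse fit in.
    return np.roll(g, n_fft // 2)

# Toy check with a synthetic "room": a direct path plus one echo.
h_true = np.array([1.0, 0.0, 0.0, 0.5])
reference = np.zeros(64)
reference[0] = 1.0                      # impulse input as the reference signal
recorded = np.convolve(reference, h_true)[:64]

H = estimate_transfer_function(reference, recorded, n_fft=256)
g = inverse_filter_coefficients(H, n_fft=256, beta=1e-6)
equalised = np.convolve(h_true, g)      # close to an impulse delayed by 128 samples
```

In practice one such filter would be computed per microphone channel and per planned sound source position. Real room responses are much longer than this toy example, so the FFT length and the regularization would need to be chosen accordingly.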
[0019]
The inverse filter processing unit 9 filters the audio signals collected by the plurality of
microphones 11 to 14 based on the inverse filter coefficients, and outputs the resulting
reverberation-removed audio signal from the audio output terminal 3. The voice recording unit 10
records the voice signal filtered by the inverse filter processing unit 9 as the voice signal
generated by the sound source. The voice recording unit 10 comprises, for example, a hard disk
drive or a semiconductor memory, and stores the voice signals output from the voice output
terminal 3 for each of the speakers 16 to 18. Thereby, the sound emitted from the target sound
source can be collected with high quality.
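As a rough illustration of this filtering step, the sketch below convolves each microphone channel with its own inverse filter and averages the results. The averaging step and the name apply_inverse_filters are assumptions made for the example; the description only states that the collected signals are filtered based on the inverse filter coefficients.

```python
import numpy as np

def apply_inverse_filters(mic_signals, inverse_filters):
    """Filter each channel with its inverse filter and average the results.

    mic_signals: array-like of shape (n_mics, n_samples)
    inverse_filters: array-like of shape (n_mics, n_taps)
    """
    mic_signals = np.asarray(mic_signals, dtype=float)
    inverse_filters = np.asarray(inverse_filters, dtype=float)
    filtered = [np.convolve(x, g) for x, g in zip(mic_signals, inverse_filters)]
    return np.mean(filtered, axis=0)

# Toy example: two microphones, each hearing the source plus one echo.
s = np.sin(0.3 * np.arange(50))                  # dry source signal
h1, h2 = np.array([1.0, 0.5]), np.array([1.0, -0.3])
g1 = (-0.5) ** np.arange(40)                     # truncated inverse of h1
g2 = 0.3 ** np.arange(40)                        # truncated inverse of h2
mics = np.stack([np.convolve(s, h1), np.convolve(s, h2)])
dry = apply_inverse_filters(mics, [g1, g2])      # first 50 samples approximate s
```

Here the inverse filters are written out analytically for simple two-tap responses. With measured impulse responses they would come from the inverse filter coefficient calculation instead.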
[0020]
The processing of the reference signal generation unit 5, the transfer function calculation unit 7,
the inverse filter coefficient calculation unit 8, and the inverse filter processing unit 9 is
realized by having a central processing unit (CPU) and its peripheral circuits execute a
predetermined program. It can also be realized by an ASIC (Application Specific Integrated
Circuit) or the like. In
addition, the inverse filter processing unit 9 may estimate the position of the sound source based
on the image signal captured by the imaging unit 20 and perform filter processing. As a result,
the image signal can be used to narrow down the candidates for estimation of the sound source,
and the accuracy of the filter processing can be improved.
[0021]
FIG. 3 is a flowchart of the inverse filter coefficient calculation processing according to the
embodiment of the present invention, which is described next. First, the speaker 15 is placed at
position A, one of the positions at which the speakers 16 to 18 intend to speak (step S11). Next,
the reference signal generation unit 5 outputs the generated reference signal (impulse input)
through the reference signal output terminal 4 from the speaker 15, which is placed at the same
position as the speakers 16 to 18 (step S12). The reference signal emitted from the speaker 15 is
received by the first to fourth microphones 11 to 14 (step S13). The impulse response storage unit
6 stores the received impulse response of each channel (step S14).
[0022]
Next, similarly, the same procedure is performed for positions B and C at which the speakers 16
to 18 intend to speak, and the impulse response storage unit 6 stores the impulse response of
each channel. In the case of a meeting or presentation, such processing is possible because the
position of the sound source is roughly known from the position of the chair and the position of
the presenter.
[0023]
When the transfer function calculation unit 7 determines that impulse responses for all planned
utterance positions have been collected ("Y" in step S15), it calculates, based on the received
signal of each channel and the reference signal, the transfer function from the positions of the
speakers 16 to 18 to the first microphone 11, the transfer function from those positions to the
second microphone 12, the transfer function from those positions to the third microphone 13, and
the transfer function from those positions to the fourth microphone 14 (step S16). The inverse
filter coefficient calculation unit 8 calculates inverse filter coefficients that cancel the
measured transfer function characteristic of each channel (step S17).
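The bookkeeping in steps S14 and S15 (store the impulse responses of every channel for each position, then check that all planned positions are covered) might be organized as in the following sketch. The class ImpulseResponseStore and its methods are hypothetical names introduced for illustration; the description does not say how the storage unit is structured internally.

```python
from dataclasses import dataclass, field

import numpy as np

@dataclass
class ImpulseResponseStore:
    """Holds one (n_channels, n_samples) array per planned utterance position."""
    planned_positions: tuple
    n_channels: int
    responses: dict = field(default_factory=dict)

    def store(self, position, channel_responses):
        """Step S14: keep the impulse responses of all channels for one position."""
        if position not in self.planned_positions:
            raise KeyError(f"unplanned position: {position!r}")
        arr = np.asarray(channel_responses, dtype=float)
        if arr.ndim != 2 or arr.shape[0] != self.n_channels:
            raise ValueError("expected an array of shape (n_channels, n_samples)")
        self.responses[position] = arr

    def is_complete(self):
        """Step S15: True once every planned position has been measured."""
        return all(p in self.responses for p in self.planned_positions)

# Three positions (A to C) and four microphones, as in the embodiment.
store = ImpulseResponseStore(planned_positions=("A", "B", "C"), n_channels=4)
store.store("A", np.zeros((4, 256)))
store.store("B", np.zeros((4, 256)))
complete_after_two = store.is_complete()
store.store("C", np.zeros((4, 256)))
complete_after_three = store.is_complete()
```

Once is_complete() returns True, the stored arrays would be passed on to the transfer function and inverse filter calculations of steps S16 and S17.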
[0024]
Next, a process of individually recording the voice of each speaker will be described. FIG. 4 is a
process flow chart when recording the voice of each speaker individually. For example, it is
assumed that the speaker 16 utters at the sound source position A (step S21). Audio signals are
input to the first to fourth microphones 11 to 14, and the input audio signals are input to the
inverse filter processing unit 9 through the first to fourth audio input terminals 21 to 24 (step
S22).
[0025]
The inverse filter processing unit 9 focuses on the target sound source position using a method
called a delay-and-sum array, which enhances the sensitivity at the focal position by bringing the
sounds arriving at the first to fourth microphones 11 to 14 into phase and adding them. Thereby, noise
existing at other than the target sound source position is suppressed, and an audio signal of
sound source A whose SN ratio is improved is generated (step S23). As a result, it is possible to
generate an audio signal in which the sound from the target sound source is emphasized. Next,
the inverse filter processing unit 9 filters and outputs the audio signal received by the plurality of
microphones 11 to 14 using inverse filter coefficients (step S24). Thus, the audio signal from
which the reverberation has been removed is output from the audio output terminal 3.
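The delay-and-sum step (S23) can be sketched as follows, under the simplifying assumption that the arrival delay of the target source at each microphone is known as a whole number of samples. The function name delay_and_sum and the integer-delay restriction are illustrative choices; a practical system would derive the delays from the estimated source position and might use fractional delays.

```python
import numpy as np

def delay_and_sum(mic_signals, delays):
    """Advance each channel by its arrival delay (in samples) and average.

    mic_signals: array-like of shape (n_mics, n_samples)
    delays: non-negative integer delay of the target source at each microphone
    """
    mic_signals = np.asarray(mic_signals, dtype=float)
    n_mics, n_samples = mic_signals.shape
    out = np.zeros(n_samples)
    for x, d in zip(mic_signals, delays):
        aligned = np.zeros(n_samples)
        aligned[: n_samples - d] = x[d:]   # shift left so the arrivals line up
        out += aligned
    return out / n_mics

# Toy example: one pulse reaching four microphones at different times.
s = np.zeros(50)
s[10] = 1.0
delays = [0, 3, 5, 7]
mics = np.stack([np.roll(s, d) for d in delays])
focused = delay_and_sum(mics, delays)      # the pulse is restored at index 10
```

Sound arriving from the focal position adds coherently after alignment, while sound from other positions stays misaligned and is attenuated by the averaging. This is the SN ratio improvement the paragraph above refers to.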
[0026]
The voice recording unit 10 stores the voice signal output from the voice output terminal 3 for
each of the speakers 16 to 18 and separately records the voice of the speaker (step S25). The
voice recording apparatus ends the process when an instruction to end voice recording is input.
As a result, the voice of the speaker emitted from the target sound source can be individually
recorded with high quality.
[0027]
According to the voice recording apparatus of the above embodiment, since a voice from which the
reverberation is removed can be created using the reference signal, the voice of each speaker can
be collected separately without being affected by the reverberation.
[0028]
Although the preferred embodiments of the present invention have been described in detail, the
present invention is not limited to the specific embodiments, and various modifications and
changes are possible within the scope of the subject matter of the present invention described in
the claims.
Although three sound source positions have been described above as an example, the present
invention is not limited to this and can be applied even when there are more sound source
positions. Further, although the example using four microphones has been described above, it is
better to use more microphones in practice to improve the accuracy. Further, although the
example using the impulse input has been described above as an example of the reference signal,
the present invention is not limited to this, and any signal may be used as long as it can serve
as a reference. Moreover, the present invention can be applied not only to conferences but also to
various other settings. Further, although the example using the image signal captured by the
imaging unit as the sound source information indicating the position of the sound source has
been described above, the present invention is not limited to the sound source information using
the image signal. That is, the inverse filter processing unit can estimate and filter the position of
the sound source based on the sound source information indicating the position of the sound
source.
[0029]
FIG. 1 is a diagram explaining a voice recording device that measures the transfer function to a
sound receiving point in advance and calculates an inverse filter. FIG. 2 is a diagram for
explaining the contents of the impulse response storage unit. FIG. 3 is a flowchart of the inverse
filter coefficient calculation processing according to an embodiment of the present invention.
FIG. 4 is a process flowchart for recording the voice of each speaker individually.
Explanation of reference signs
[0030]
1 voice recording apparatus
21-24 voice input terminals
3 voice output terminal
4 reference signal output terminal
5 reference signal generation unit
6 impulse response storage unit
7 transfer function calculation unit
8 inverse filter coefficient calculation unit
9 inverse filter processing unit