JP2007142595

код для вставкиСкачать
Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2007142595
In a teleconference apparatus, the position of a sound source is estimated correctly even when sound output from the speakers may be picked up by the microphones. A speaker array SPA is provided with microphone arrays MR and ML on both sides, and a plurality of focal points are set as candidate speaker positions, symmetric with respect to the center line 101 of the speaker array SPA, in front of each of the microphone arrays MR and ML. A bundle of sound collection beams directed at those focal points is output. The difference between the sound collection beams directed at mutually symmetric focal points with respect to the center line 101 is calculated, canceling the sound component that has entered the microphones from the speaker array SPA. From the sum over a specific time of the squared peak values of the difference signal, it is estimated which pair of the set focal points the sound source is nearest; the squared peak values of the sound collection beams directed at those symmetric focal points are then compared to determine which of the speakers 998 and 999 is the source. [Selected figure] Figure 1
Teleconferencing device
[0001]
The present invention relates to a microphone array and a speaker array, and to an apparatus for
reproducing received voice and its sound field, and more particularly to identifying a speaker or
a sound source from the microphone array.
[0002]
10-05-2019
1
Conventionally, means have been proposed for receiving voice on the transmission side and reproducing the transmission-side sound field on the reception side (see Patent Documents 1 to 3).
In such devices, sound signals collected by a plurality of microphones are transmitted, and the sound field on the transmission side is reproduced using a plurality of speakers on the reception side. This has the advantage that the position of the speaker can be identified by voice.
[0003]
Patent Document 1 discloses a method of creating stereophonic sound information for reproducing the sound field of the transmission source by transmitting voice information received by a plurality of microphone arrays and outputting it from the same number of speaker arrays.
[0004]
According to the method of Patent Document 1, although the sound field of the transmission source itself can be transmitted and the position of the speaker can be identified by voice, the method has problems such as consuming many line resources. A means has therefore been disclosed for specifying and transmitting the position information of a speaker (see, for example, Patent Document 2).
[0005]
In Patent Document 2, a microphone captures the speaker's voice, speaker position information is generated from the information obtained from the microphone, and this position information is multiplexed and transmitted together with the voice information. On the receiving side, an apparatus is disclosed that switches which loudspeaker is sounded according to the incoming speaker position information, reproducing the speaker's voice and position on the receiving side.
[0006]
In Patent Document 3, since it is not realistic to provide every speaker with a dedicated microphone in a multi-person conference system, a microphone control unit shifts the phase of the audio signal input to each microphone and synthesizes the results; a conference system that identifies the speaker in this way is described.
In Patent Document 3, the phase shift pattern corresponding to the seat position of each speaker is varied to determine the pattern that maximizes the voice level, and the speaker position is identified from that pattern. JP-A-2-114799, JP-A-9-261351, JP-A-10-145763
[0007]
However, the above patent documents have the following problems.
[0008]
As described above, the method of Patent Document 1 has problems such as using a lot of line
resources.
[0009]
In the methods of Patent Documents 2 and 3, although speaker position information can be generated from the information obtained from the microphone, this position detection is disturbed by the voice of the loudspeaker that outputs the sound transmitted from the partner device. There is thus a problem that the microphone array (the camera, in Patent Document 3) is misdirected, misidentifying a sound source in a direction different from the actual one.
[0010]
Therefore, an object of the present invention is to make it possible to estimate the true sound source in a teleconference device even if sound output from a loudspeaker, reproducing voice transmitted from the partner device, is picked up by a microphone.
[0011]
In the present invention, means for solving the above-mentioned problems are configured as
follows.
[0012]
(1) According to the present invention, there is provided: a speaker array comprising a plurality of speakers that output sound upward or downward; first and second microphone arrays provided to pick up sound on both sides of the speaker array in its longitudinal direction; first beam generation means for generating a plurality of first sound collection beams, each focused on one of a plurality of predetermined first sound collection areas on the first microphone array side, by applying predetermined delay amounts to the audio signals collected by the microphones of the first microphone array and then synthesizing them; second beam generation means for generating a plurality of second sound collection beams, each focused on one of a plurality of predetermined second sound collection areas on the second microphone array side, by applying predetermined delay amounts to the audio signals collected by the microphones of the second microphone array and then synthesizing them; difference signal calculation means for calculating, among the plurality of first and second sound collection areas, difference signals of the audio signals collected from pairs of sound collection areas at mutually symmetric positions with respect to the longitudinal center line of the speaker array; first sound source position estimation means for selecting the sound collection area pair whose difference signal has a large signal strength; and second sound source position estimation means for selecting, within the pair selected by the first sound source position estimation means, the sound collection area whose collected signal strength is larger, and estimating that the sound source is located in that area.
[0013]
The first beam generating means and the second beam generating means each take mutually symmetric positions as sound collection areas and focus on those areas to generate the first and second sound collection beams.
The sound transmitted from the partner apparatus and output from the speaker array reaches the two microphone arrays substantially symmetrically.
Accordingly, the voice output from the speaker array is input substantially equally to the first and second sound collection beams, and because the difference signal calculation means calculates the difference of the first and second sound collection beams, the sound output from the speaker array can be canceled.
Even if the difference is taken between the effective values of the sound collection beams, the sound output from the speaker array is input substantially equally at the focal points of the beams, so it can be canceled in the same way.
[0014]
Voice input to the microphone array other than the voice output from the speaker array does not disappear when such a difference is taken.
For example, typically, when a speaker talks on only one side of the device and a sound collection beam directed toward that speaker is generated, the speaker's voice is input to one of the paired sound collection beams and no voice is input to the opposite one, so the speaker's voice (or its inverse phase) remains after the difference calculation.
Even if there are sound sources on both sides, their sounds differ, so in most cases the sound input to the pair of microphone arrays is asymmetric.
Therefore, even if such a difference is taken, the speaker's speech remains. The presence of the speaker's voice can likewise be extracted by calculating the effective value.
[0015]
The first sound source position estimating means estimates that the sound source lies in one of the sound collection area pairs with a large difference signal. The second sound source position estimating means compares the sound signals collected in each area of that pair and estimates in which of the two the sound source is present. As described above, according to the present invention, the position of the sound source (including the speaker's voice; the same applies hereinafter) can be estimated correctly even if there is a possibility that sound output from the speakers gets into the microphones and is collected.
[0016]
The effective value of the audio signal can be obtained by calculating, in real time, the time average of the square of the peak values over a specific time. The signal strengths of the difference signals are compared using the time average of the square of the peak values over a predetermined time, the sum of squares of the gains at a plurality of predetermined frequencies obtained by FFT, or the like. The signal strength of the difference signal of the rms values can be calculated by using data over a predetermined period longer than that used for the rms calculation and taking the time average of the difference signal of the rms values, or the time average of the square of this difference signal. The same applies hereinafter.
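A minimal sketch of these computations in Python (the frame handling and function names are illustrative assumptions, not taken from the document):

```python
import math

def effective_value(samples):
    """Effective (rms) value: square root of the time average of the
    squared sample values over the given frame."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def difference_strength(rms_left, rms_right):
    """Signal strength of the difference of rms-value sequences: the
    time average of the square of the per-frame rms difference,
    computed over a period longer than the rms window itself."""
    diffs = [r - l for l, r in zip(rms_left, rms_right)]
    return sum(d * d for d in diffs) / len(diffs)
```

For a symmetric wraparound component, the left and right rms sequences are nearly equal, so `difference_strength` stays near zero; a speaker on one side makes it large.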
[0017]
(2) In the invention of (1), the first beam generating means and the second beam generating means further have a function of setting a plurality of narrowed sound collection areas within the sound collection area selected by the second sound source position estimating means, and of generating a plurality of narrowed sound collection beams, each focused on one of those areas; and third sound source position estimating means is provided for estimating that the sound source position is in the narrowed area in which the strength of the collected sound signal is large.
[0018]
In this invention, a plurality of narrowed sound collection areas are further set within the sound collection area in which the sound source is estimated to be present by the second sound source position estimating means, and a narrowed sound collection beam is generated for each of them.
The third sound source position estimating means selects the narrowed area with the large signal intensity, thereby narrowing down the position of the sound source stepwise; this allows the position to be estimated in a shorter time than searching at fine resolution from the beginning.
[0019]
(3) According to the present invention, there is provided: a speaker array consisting of a plurality of speakers outputting sound upward or downward; first and second microphone arrays in which a plurality of microphones are arranged on both sides of the speaker array, symmetric with respect to its longitudinal center line; difference signal calculation means for calculating difference signals by subtracting, for each pair of microphones located at symmetric positions, the sound signals collected by the microphones of the first and second microphone arrays; third beam generation means for generating a plurality of third sound collection beams focused on a plurality of predetermined positions by adjusting the delay amounts of the difference signals and combining them; first sound source position estimation means for selecting, among the plurality of sound collection area pairs, the pair whose difference signal has a large signal strength; fourth and fifth beam forming means for forming, based on the audio signals collected by the microphones of the first and second microphone arrays, sound collection beams that collect the sound signals of the respective areas of the pair selected by the first sound source position estimation means; and second sound source position estimation means for selecting, among the sound signals collected by the beams formed by the fourth and fifth beam forming means, the sound collection area with the larger signal intensity, and estimating that the sound source is located in that area.
[0020]
According to the present invention, first, the audio signals collected by each pair of microphones at symmetric positions in the two microphone arrays are subtracted to calculate difference signals, and beams are generated in a plurality of predetermined directions using these difference signals.
Since the microphone arrays on both sides are arranged symmetrically with respect to the speaker array, the sound that has wrapped around from the speaker array is already canceled in these difference signals. The sound source position estimating means then estimates the sound source position based on the difference signals. In this estimation, it suffices to select, among the plurality of formed sound collection beams, the one with a large signal strength; the sound source position is estimated to be at one of the pair of focal positions that the corresponding beams would have if formed by the first and second microphone arrays individually.
[0021]
According to the present invention, it is possible to correctly estimate the position of the sound
source even if there is a possibility that the sound output from the speaker in the teleconference
device may be picked up by the microphone.
[0022]
First Embodiment The configuration and usage of a teleconference device according to a first embodiment of the present invention will be described with reference to FIG. 1.
The teleconferencing device of the first embodiment outputs the voice transmitted from the partner device using the speaker array, reproducing the position of the speaker on the partner-device side; it also collects the speaker's voice using the microphone arrays, detects the speaker's position, and transmits the collected voice and position information to the partner device.
[0023]
FIG. 1 shows the appearance and usage of this teleconferencing device: FIG. 1(A) is an external perspective view of the device, FIG. 1(B) is a bottom view (A-A) of the device, and FIG. 1(C) shows how the device is used.
[0024]
As shown in FIG. 1A, the teleconference device 1 is provided with a long rectangular
parallelepiped device main body and legs 11. The main body of the teleconferencing device 1 is
supported by the legs 11 so as to float upward from the installation surface by a predetermined
distance. On the bottom surface of the teleconference device 1, a speaker array SPA in which a
plurality of speakers SP1 to SP4 are linearly arranged in the longitudinal direction of the device
main body which is a rectangular parallelepiped is provided downward. Audio is output
downward from the bottom of the teleconference device 1 by the speaker array SPA, and the
audio is reflected by the installation surface of the conference desk or the like to reach the
conference participants (see FIG. 1C).
[0025]
In addition, as shown in FIGS. 1A and 1B, a microphone array in which microphones are arranged linearly is provided on each side surface of the device main body in its longitudinal direction (hereinafter, these are referred to as the right side surface (upper side of FIG. 1B) and the left side surface (lower side of FIG. 1B)). That is, on the right side of the device body a microphone array MR consisting of the microphones MR1 to MR4 is provided, and on the left side a microphone array ML consisting of the microphones ML1 to ML4 is provided. The teleconferencing device 1 picks up the speech of the conference participant who is speaking and detects the speaker's position using the microphone arrays MR and ML.
[0026]
Although not shown in FIG. 1, the teleconferencing device 1 includes a transmitter 2 (see FIG. 4) that processes the voice picked up by the microphone arrays MR and ML to estimate the position of the speaker (not only a human voice but also sound emitted from an object; the same applies hereinafter) and multiplexes this position information with the voice collected from the microphone arrays MR and ML, and a receiver 3 (see FIG. 6) that outputs the voice received from the partner device as a beam from the speakers SP1 to SP4.
[0027]
Although the microphone arrays MR and ML are provided at symmetric positions with respect to the center line 101 of the speaker array SPA in FIG. 1, the device of the first embodiment does not necessarily require a symmetric arrangement. Even if the microphone arrays MR and ML are asymmetric, the transmitting unit (see FIG. 4) may perform signal processing so that the left and right sound collection areas (see FIG. 3) are formed symmetrically.
[0028]
Next, the usage of the teleconference device 1 will be described using FIG. 1(C). The teleconferencing device 1 is usually placed at the center of the conference desk 100. A speaker 998 and/or a speaker 999 sit on the left and right sides, or on one side, of the conference desk 100. The sound output from the speaker array SPA is reflected by the conference desk 100 and reaches the left and right speakers; because the speaker array SPA beam-forms the output sound, the sound for each of the left and right speakers can be localized at a specific position. The details of the sound beam forming by the speaker array SPA will be described later.
[0029]
The microphone arrays MR and ML also pick up the speaker's voice. A signal processing unit (transmission unit) connected to the microphone arrays MR and ML detects the position of the speaker based on the differences in the timing at which the speech arrives at each of the microphone units MR1 to MR4 and ML1 to ML4.
[0030]
Further, although the number of speakers and the number of microphones are four in FIG. 1 for ease of illustration, the apparatus of the first embodiment is not limited to four of each; more may be provided, and a plurality of microphone arrays MR and ML and speaker arrays SPA may be provided instead of one. Therefore, in the following description, the individual speakers and microphones are denoted using a subscript i, in the manner SPi (i = 1 to N) for the speakers SP1 to SPN and MLi (i = 1 to N) for the microphones ML1 to MLN. For example, SPi with i = 1 corresponds to SP1.
[0031]
Here, with reference to FIG. 2, the beam forming of sound by the speaker array SPA, that is, the sound beam, and the sound collection beams formed by the microphone arrays ML and MR will be described.
[0032]
FIG. 2(A) illustrates the sound beam.
A signal processing unit (receiving unit) that supplies an audio signal to each of the speaker units SP1 to SPN of the speaker array SPA delays the audio signal received from the partner device by delay times DS1 to DSN, as shown in the figure, and supplies it to the speaker units SP1 to SPN. In this figure, a delay pattern is given in which the speaker closest to the virtual sound source position (focal point FS) emits sound without delay, and each speaker farther from the virtual sound source position emits sound after a delay corresponding to its distance. Due to this delay pattern, the sound output from the speaker units SP1 to SPN forms and spreads a wave front similar to that of sound emitted from the virtual sound source, and to the conference attendee who is the user, the voice sounds as if the partner-side speaker were at the position of the virtual sound source.
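Such a delay pattern follows from the geometry: the speaker nearest the virtual sound source emits with zero delay, and each other speaker is delayed by its extra propagation distance divided by the speed of sound. A sketch, with assumed coordinates and an assumed sound speed:

```python
import math

SOUND_SPEED = 343.0  # m/s, an assumed value

def speaker_delays(speaker_xy, virtual_source_xy):
    """Delay DSi for each speaker of the array: the speaker nearest the
    virtual sound source gets zero delay; farther speakers are delayed
    by their extra distance from the source, divided by the sound
    speed, so the emitted wave front matches that of the virtual
    source."""
    sx, sy = virtual_source_xy
    dists = [math.hypot(x - sx, y - sy) for x, y in speaker_xy]
    d_min = min(dists)
    return [(d - d_min) / SOUND_SPEED for d in dists]
```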
[0033]
FIG. 2(B) illustrates the sound collection beam. As shown in the figure, the voice signals input to the microphone units MR1 to MRN are delayed by delay times DM1 to DMN and then added. In this figure, a delay pattern is given in which the signal collected by the microphone farthest from the sound collection area (focal point FM) is input to the adder without delay, and signals from microphones closer to the collection area are input after delays corresponding to their distances. Due to this delay pattern, the audio signals become equidistant, in terms of sound wave propagation, from the sound collection area (focal point FM); the synthesized signal therefore emphasizes, in phase, the audio from this collection area, while audio from other areas is canceled by phase shifts. Thus, by delaying and synthesizing the voices input to the plurality of microphones so that they are equidistant in sound wave propagation from a certain sound collection area, it is possible to pick up only the sound from that area.
[0034]
In the teleconferencing device of this embodiment, the microphone arrays MR and ML
simultaneously form a sound collection beam for a plurality of (four in FIG. 3) sound collection
areas. Thus, the voice can be collected wherever the speaker is in the sound collection area, and
the position of the speaker can be detected by the sound collection area where the voice is
collected.
[0035]
Next, with reference to FIG. 3, the detection of the sound source position by the sound collection beams and the sound collection from that position will be described. FIG. 3 is a plan view of the teleconferencing device and the speakers seen from above, that is, a B-B arrow view of FIG. 1(C), and illustrates how the sound collection beams are formed by the microphone arrays.
[0036]
<< Description of the Sound Source Position Detection and Sound Collection Method, Excluding the Daemon Sound Source >> First, the principle of the sound source position detection and sound collection method of this teleconference device will be described. In this description, it is assumed that no sound beam is being output from the speaker array SPA.
[0037]
Here, the processing of the sound collection signal of the right microphone array MR will be described. The transmission unit 2 (see FIG. 4) of the teleconference device 1 forms sound collection beams focused on the four sound collection areas 411R to 414R by the delay synthesis described above. The plurality of sound collection areas is determined on the assumption of where speakers may be present at a conference using the teleconference device 1.
[0038]
It is considered that a speaker (sound source) is present in the area where the level of the collected voice signal is the largest among the collection areas 411R to 414R. For example, as shown in FIG. 3, when the sound source 999 is present in the sound collection area 414R, the level of the sound signal collected from the sound collection area 414R is larger than that of the signals collected from the other sound collection areas 411R to 413R.
[0039]
Similarly, for the microphone array ML on the left side, four sound collection beams are formed substantially in line symmetry with those on the right side, and among the sound collection areas 411L to 414L, the area with the largest collected audio signal level is detected. The axis of this line symmetry is made to substantially coincide with the axis of the speaker array SPA.
[0040]
The above is the principle of the sound source position detection and sound collection method of
the teleconference device of this embodiment.
[0041]
In a state where no sound is output from the speaker array SPA and the microphone arrays MR and ML do not pick up wraparound sound, correct sound source position detection and sound collection can be performed according to this principle. However, the teleconferencing device transmits and receives voice signals bidirectionally, and sound is emitted from the speaker array SPA in parallel with the sound collection by the microphone arrays MR and ML.
[0042]
The sound signal supplied to each speaker of the speaker array SPA is given a delay pattern, as shown in FIG. 2A, so as to form the same wave front as if the sound came from a virtual sound source position set behind the speaker array.
On the other hand, the audio signals picked up by the microphone array MR are delayed and synthesized with a pattern as shown in FIG. 2(B), so that the timings of the audio signals arriving from a predetermined sound collection area coincide.
[0043]
Here, when the virtual sound source position of the speaker array SPA coincides with any one of the sound collection areas of the microphone array MR, the delay pattern given to the speakers SP1 to SPN of the speaker array SPA and the delay pattern given for that sound collection area of the microphone array MR are exactly inverse to each other, and the sound emitted from the speaker array SPA and collected by the microphone array MR is synthesized at a large level.
[0044]
When processed by the general sound source position detection method described above, there is a problem that this wraparound speech signal, synthesized at a large level, is erroneously recognized as a sound source (a daemon sound source) that does not actually exist.
[0045]
Therefore, if this daemon sound source is not canceled, the voice signal coming from the partner device is returned as it is, causing an echo, and it becomes impossible to detect and pick up the voice of the true sound source (the speaker).
[0046]
The above is a description of the microphone array MR, but the same applies to the microphone
array ML (because it is symmetrical).
[0047]
That is, the daemon sound source is generated symmetrically and identically in the right microphone array MR and the left microphone array ML, because the voice beam is reflected by the conference desk 100 and emitted symmetrically.
[0048]
Therefore, the volume levels of the left sound collection areas 411L to 414L and the right sound collection areas 411R to 414R are compared. Even if a volume level is large and a sound source would be estimated to be present there, if the levels are similarly large in the corresponding areas on the left and right, the source is judged to be a daemon sound source caused by the speaker array SPA; it is detected as such and excluded from the sound collection targets. This makes it possible to detect and collect the sound of the true sound source and to prevent echo due to the wraparound voice.
[0049]
Therefore, the transmission unit of the teleconference device compares the audio signal levels collected from the areas 411L to 414L of the left microphone array ML with the audio signal levels collected from the areas 411R to 414R of the right microphone array MR, and when the levels differ greatly between corresponding left and right sound collection areas, it determines that the sound source is present in the area with the larger level.
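The left/right level comparison of the two preceding paragraphs can be sketched as a simple per-pair decision. The ratio threshold below is an assumed illustrative value, not specified in the document:

```python
def classify_areas(levels_left, levels_right, ratio_threshold=2.0):
    """For each symmetric sound collection area pair, decide: similar
    left/right levels -> daemon sound source (wraparound from the
    speaker array, excluded from collection); clearly different
    levels -> true source on the louder side."""
    results = []
    for pl, pr in zip(levels_left, levels_right):
        big, small = max(pl, pr), min(pl, pr)
        if small > 0 and big / small < ratio_threshold:
            results.append("daemon")  # symmetric levels: wraparound
        elif pr > pl:
            results.append("right")
        else:
            results.append("left")
    return results
```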
[0050]
Then, only the larger audio signal is transmitted to the partner apparatus, and position information indicating the sound collection area where the audio signal was detected is added to the subcode of the (digital) signal or the like.
[0051]
The configuration of the signal processing unit (transmission unit) that executes the above daemon sound source exclusion process is described below.
The narrowed sound collection beams 431 to 434 in FIG. 3 are described in the explanation of the second embodiment with FIG. 7.
[0052]
<< Configuration of Transmission Unit for Forming Sound Collection Beam >> FIG. 4 is a block
diagram showing a configuration of the transmission unit 2 of the teleconference device 1.
Here, thick arrows indicate that audio signals of a plurality of systems are being transmitted, and
thin arrows indicate that one audio signal is being transmitted.
Also, the broken arrow indicates that the instruction input is being transmitted.
[0053]
The first beam generation unit 231 and the second beam generation unit 232 in the figure are signal processing units that form four sound collection beams focused on the left and right sound collection areas 411R to 414R and 411L to 414L shown in FIG. 3.
[0054]
The first beam generation unit 231 receives, via the A / D converter 211, audio signals collected
by the microphone units MR1 to MRN of the right microphone array MR.
Similarly, the second beam generation unit 232 receives, via the A / D converter 212, an audio
signal collected by each of the microphone units ML1 to MLN of the left microphone array ML.
[0055]
The first beam generation unit 231 and the second beam generation unit 232 each form four sound collection beams, collecting sound from the four sound collection areas 411R to 414R and 411L to 414L, and the collected sound signals are output to the difference value calculation circuit 22 and the selectors 271 and 272.
[0056]
FIG. 5 is a diagram showing the detailed configuration of the first beam generation unit 231.
The first beam generation unit 231 has a plurality of delay processing units 45j, one for each sound collection area 41j (j = 1 to K).
Each delay processing unit 45j delays the audio signal of each microphone output based on delay pattern data 40j, in order to generate a sound collection beam output MBj focused on the corresponding sound collection area 41j.
Each delay processing unit 45j reads the delay pattern data 40j stored in ROM and sets the delay amounts in the delays 46ji (j = 1 to K, i = 1 to N).
[0057]
Then, the adder 47j adds the delayed digital audio signals and outputs the result as the sound collection beam output MBj (j = 1 to K).
Each sound collection beam output MBj is a beam focused on the sound collection area 41j shown in FIG. 3.
The sound collection beam output MBj calculated by each delay processing unit 45j is output to the difference value calculation circuit 22 and the like.
[0058]
Although FIG. 5 illustrates the first beam generation unit 231, the second beam generation unit 232 has a similar structure.
[0059]
In FIG. 4, the difference value calculation circuit 22 compares the volume levels of the audio signals collected in the left and right sound collection areas and calculates their difference values.
That is, with P(A) denoting the signal level of sound collection area A, the difference value calculation circuit 22 calculates:
D(411) = |P(411R) − P(411L)|
D(412) = |P(412R) − P(412L)|
D(413) = |P(413R) − P(413L)|
D(414) = |P(414R) − P(414L)|
The calculated difference values D (411) to D (414) are output to the first estimation unit 251.
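The level-difference calculation of the difference value calculation circuit 22 can be sketched as follows. Here the volume level P(A) is taken as an RMS value over the analysis window, which is one plausible reading of the text, and the function names are illustrative.

```python
import math

def level(signal):
    """Volume level P(A), taken here as RMS over the analysis window."""
    return math.sqrt(sum(x * x for x in signal) / len(signal))

def difference_values(right_beams, left_beams):
    """D(j) = |P(jR) - P(jL)| for each symmetric pair of collection areas.

    right_beams / left_beams -- lists of beam-output sample lists
    (corresponding to 411R..414R and 411L..414L in the text).
    """
    return [abs(level(r) - level(l)) for r, l in zip(right_beams, left_beams)]
```

A symmetric wraparound component from the speaker array contributes equally to both levels of a pair and so yields D(j) near zero, while a talker on one side produces a large D(j).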
[0060]
Note that the difference value calculation circuit 22 may be configured to directly subtract the signal waveforms of the audio signals collected in the left and right sound collection areas and output a difference value signal, or it may output, at fixed intervals, the difference between volume level values obtained by integrating the effective values of the audio signals collected in the left and right sound collection areas over a fixed time.
[0061]
When the difference value calculation circuit 22 outputs a difference value signal, the BPF 241 may be inserted between the difference value calculation circuit 22 and the first estimation unit 251 to facilitate the estimation by the first estimation unit 251.
The BPF 241 is set to pass, from the difference value signal, a band of around 1 kHz to 2 kHz, a range within the frequency domain of conversational speech where directivity control by the sound collection beam can be performed favorably.
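A band-pass stage of the kind described for the BPF 241 could be sketched as below. The filter is a standard second-order (biquad) band-pass from the RBJ audio-EQ cookbook; the sample rate, center frequency, and Q are assumed values chosen to land in the stated 1 kHz to 2 kHz band, not parameters from the patent.

```python
import math

def bandpass_biquad(x, fs=8000.0, f0=1400.0, q=0.7):
    """Second-order band-pass (RBJ cookbook, constant 0 dB peak gain)
    around f0. fs, f0, and q are illustrative assumptions."""
    w0 = 2.0 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2.0 * q)
    # Coefficients, normalized by a0
    b0, b1, b2 = alpha, 0.0, -alpha
    a0, a1, a2 = 1.0 + alpha, -2.0 * math.cos(w0), 1.0 - alpha
    b0, b1, b2, a1, a2 = b0 / a0, b1 / a0, b2 / a0, a1 / a0, a2 / a0
    # Direct form I filtering
    y = []
    x1 = x2 = y1 = y2 = 0.0
    for v in x:
        out = b0 * v + b1 * x1 + b2 * x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, v
        y2, y1 = y1, out
        y.append(out)
    return y
```

A tone in the passband passes nearly unchanged, while low-frequency content far below the band is strongly attenuated, which is what makes the difference value signal easier to judge.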
[0062]
As described above, by taking the difference of the volume levels of the sound collection signals of the left and right sound collection areas located symmetrically with respect to the center line of the speaker array SPA, the audio components that wrap around symmetrically from the speaker array SPA to the left and right microphone arrays MR and ML are canceled, so that the wraparound audio signal is not erroneously recognized as a ghost sound source.
[0063]
The first estimation unit 251 selects the largest of the difference values input from the difference value calculation circuit 22, and selects the pair of sound collection areas for which that largest difference value was calculated.
To supply the audio signals of those sound collection areas to the second estimation unit 252, the first estimation unit 251 outputs a selection signal to the selectors 271 and 272.
[0064]
Based on the selection signal, the selector 271 selects, from among the signals of the four sound collection areas collected by the first beam generation unit 231, the signal of the sound collection area selected by the first estimation unit 251, and supplies it to the second estimation unit 252 and the signal selection unit 26.
Similarly, based on the selection signal, the selector 272 selects, from among the signals of the four sound collection areas collected by the second beam generation unit 232, the signal of the sound collection area selected by the first estimation unit 251, and supplies it to the second estimation unit 252 and the signal selection unit 26.
[0065]
The second estimation unit 252 receives the audio signals of the sound collection areas estimated by the first estimation unit 251 and selectively output from the selectors 271 and 272. The second estimation unit 252 compares the input audio signals of the left and right sound collection areas, and determines that the one with the larger level is the audio signal of the true sound source. The second estimation unit 252 outputs information indicating the direction and distance of the sound collection area where the true sound source exists to the multiplexing unit 28 as the position information 2522, and causes the signal selection unit 26 to selectively input the audio signal of the true sound source to the multiplexing unit 28.
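The two-stage estimation flow of the first estimation unit 251 and the second estimation unit 252 can be sketched as follows; the inputs and names are illustrative stand-ins for the circuit outputs described in the text.

```python
def estimate_sound_source(diff_values, right_levels, left_levels):
    """Two-stage sound source estimation sketch.

    diff_values  -- D(411)..D(414) from the difference value circuit
    right_levels -- P(411R)..P(414R); left_levels -- P(411L)..P(414L)
    Returns (area_index, side), side being 'R' or 'L'.
    """
    # Stage 1 (first estimation unit): pick the area pair with the
    # largest difference value.
    j = max(range(len(diff_values)), key=lambda k: diff_values[k])
    # Stage 2 (second estimation unit): the louder side of that pair
    # holds the true sound source.
    side = 'R' if right_levels[j] >= left_levels[j] else 'L'
    return j, side
```

The symmetric wraparound from the speaker array keeps D small for pairs without a talker, so stage 1 reliably picks the pair containing the true source before stage 2 resolves the side.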
[0066]
The multiplexing unit 28 multiplexes the position information 2522 input from the second estimation unit 252 with the audio signal 261 of the true sound source selected by the signal selection unit 26, and transmits the multiplexed signal to the partner apparatus.
[0067]
These estimation units 251 and 252 repeatedly estimate the sound source position at fixed intervals.
For example, the estimation may repeat every 0.5 seconds, in which case signal waveforms or effective amplitude values over 0.5 seconds may be compared. By repeatedly estimating the sound source position at each predetermined period and switching the sound collection area in this way, sound collection can follow the movement of the speaker.
[0068]
If the true sound source position and a ghost sound source position due to wraparound overlap, a difference signal obtained by subtracting the left and right signal waveforms may be output to the partner apparatus as the collected signal. In the difference signal, only the ghost sound source waveform is canceled while the signal waveform of the true sound source is preserved.
[0069]
Further, to handle the case where the speaker straddles two sound collection areas or the case where the speaker moves, the following alternative form is also conceivable.
The first estimation unit 251 selects two sound collection areas in descending order of difference signal strength and outputs their strength ratio. The second estimation unit 252 compares the strongest pair, or both pairs, of signal strengths to estimate which side the true sound source is on. The signal selection unit 26 synthesizes the two audio signals on the side selected by the first estimation unit 251 and the second estimation unit 252, weighting them by the indicated strength ratio, and outputs the result as the output signal 261. If the audio at the two positions is always synthesized with weights given by the signal strength ratio in this way, a crossfade is continuously applied as the speaker moves, and the sound image localization moves naturally.
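The weighted synthesis described above can be sketched as follows, with weights proportional to the reported signal strengths; the function name and interface are illustrative.

```python
def crossfade_by_strength(sig_a, sig_b, strength_a, strength_b):
    """Mix two collection-area signals with weights proportional to
    their signal strengths, so a moving speaker is crossfaded smoothly."""
    total = strength_a + strength_b
    wa, wb = strength_a / total, strength_b / total
    return [wa * a + wb * b for a, b in zip(sig_a, sig_b)]
```

As the speaker moves between areas, the strength ratio shifts gradually, so the mix (and with it the transmitted sound image) slides from one focal point to the other without an abrupt switch.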
[0070]
<< Configuration of Reception Unit 3 Forming Voice Beam >> Next, the internal configuration of the reception unit 3 will be described with reference to FIG. 6. The reception unit 3 includes: an audio signal receiving unit 31 that receives an audio signal from the partner apparatus and separates the position information from the subcode of the audio signal; a parameter calculation unit 32 that determines the sound image localization position from the position information separated by the audio signal receiving unit 31 and calculates directivity control parameters for localizing the sound image at that position; a directivity control unit 33 that controls the directivity of the received audio signal based on the parameters input from the parameter calculation unit 32; D/A converters 34i (i = 1 to N) that convert the directivity-controlled audio signals to analog signals; and amplifiers 35i (i = 1 to N) that amplify the analog audio signals output from the D/A converters 34i (i = 1 to N). The analog audio signal output from each amplifier 35i is supplied to the external speaker SPi (i = 1 to N) shown in FIG.
[0071]
The audio signal receiving unit 31 is a functional unit that communicates with the partner apparatus via the Internet, a public telephone line, or the like, and includes a communication interface, a buffer memory, and so on. The audio signal receiving unit 31 receives from the partner apparatus an audio signal 30 that includes the position information 2522 as a subcode. The position information separated from the subcode of the received audio signal is input to the parameter calculation unit 32, and the audio signal is input to the directivity control unit 33.
[0072]
The parameter calculation unit 32 is a calculation unit that calculates the parameters used in the directivity control unit 33. The parameter calculation unit 32 calculates the delay amount to be given to the audio signal supplied to each speaker unit so that a focal point is generated at the position indicated by the received position information and the emitted sound is given directivity as if it were emitted from that focal point.
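The delay calculation of the parameter calculation unit 32 can be sketched as follows: each speaker unit is delayed by the propagation time from the focal point to that unit, relative to the nearest unit, so the combined wavefront resembles that of a point source at the focus. The 2-D geometry, coordinate units, and speed of sound are assumptions, not values from the patent.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, a room-temperature assumption

def focal_delays(speaker_positions, focus, c=SPEED_OF_SOUND):
    """Per-speaker delay in seconds for localizing a sound image at `focus`.

    speaker_positions -- list of (x, y) speaker unit coordinates in metres
    focus             -- (x, y) of the virtual sound source position

    A point source at the focus would reach speaker i after d_i / c, so
    emitting from speaker i with that (relative) delay reproduces the
    same wavefront.
    """
    dists = [math.hypot(sx - focus[0], sy - focus[1])
             for sx, sy in speaker_positions]
    d_min = min(dists)
    return [(d - d_min) / c for d in dists]
```

These relative delays are what the directivity control unit 33 would then apply to the audio signal of each output system.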
[0073]
The directivity control unit 33 processes the audio signal received by the audio signal receiving
unit 31 for each output system of the speaker SPi (i = 1 to N) based on the parameter set by the
parameter calculating unit 32.
That is, a plurality of processing units corresponding to the respective speakers SPi (i = 1 to N) are provided in parallel. Each processing unit applies the delay amount and the like to the audio signal based on the parameters (delay amount parameter and the like) calculated by the parameter calculation unit 32, and outputs the result to the D/A converter 34i (i = 1 to N).
[0074]
The D/A converter 34i (i = 1 to N) converts the digital audio signal of each output system output from the directivity control unit 33 into an analog signal and outputs it. The amplifier 35i (i = 1 to N) amplifies the analog audio signal output from the D/A converter 34i (i = 1 to N) and outputs it to the speaker SPi (i = 1 to N).
[0075]
As described above, the reception unit 3 outputs a beam from the speaker array SPA installed at the bottom of the main body of the device based on the position information accompanying the audio signal received from the partner apparatus, reproducing the positional relationship of the sound source at the partner apparatus and the directivity as if the sound were output from the virtual sound source position.
[0076]
Second Embodiment Next, a teleconference device according to a second embodiment will be described with reference to FIG. 7.
This embodiment is an application of the first embodiment shown in FIG. 4, and the same parts
are denoted by the same reference numerals and the explanation is applied mutatis mutandis.
Further, FIG. 3 will be supplementarily referred to in the explanation of the sound collection
beam.
[0077]
In the first embodiment, it is assumed that the true sound source exists in one of the sound collection areas of the pair with the large difference signal, and the second estimation unit 252 estimates in which of them the true sound source exists. In this embodiment, detailed position search beam (narrow beam) generation functions 2313 and 2323 are further provided, which detect the sound source position accurately by searching in finer detail within the sound collection area where the second estimation unit 252 estimated the true sound source to exist.
[0078]
When the second estimation unit 252 estimates that the true sound source 999 exists in the sound collection area 414R as illustrated in FIG. 3, the second estimation unit 252 notifies the first beam generation unit 231 of this estimation result.
As described above, since the second estimation unit 252 estimates on which side of the microphone arrays MR and ML the true sound source exists, the notification 2523 or 2524 of the estimation result is input to only one of them. If it is estimated that the true sound source exists in the left area, the second estimation unit 252 notifies the second beam generation unit 232 of the estimation result.
[0079]
Based on the notification, the first beam generation unit 231 operates the detailed position search beam generation function 2313 to generate narrow beams focused on the narrow sound collection areas 431 to 434 in FIG. 3, and searches for the position of the sound source 999.
[0080]
Further, the apparatus of the second embodiment includes a third estimation unit 253 and a fourth estimation unit 254.
From the sound collection beams of the detailed position search beam generation functions 2313 and 2323, two are selected in descending order of signal strength. Of the estimation units 253 and 254, however, only the one on the side estimated by the second estimation unit 252 operates.
[0081]
In the example of FIG. 3, audio signals are collected from the sound collection beams directed at the narrow sound collection areas 431 to 434, and the true sound source 999 exists at a position straddling the sound collection areas 434 and 433. In this case, the third estimation unit 253 selects the audio signals collected from the sound collection areas 434 and 433 as the two strongest in signal strength. The third estimation unit 253 proportionally distributes the focal positions of the selected sound collection areas according to the signal strengths of the two selected audio signals to estimate and output the position of the speaker, and also weights and synthesizes the two selected audio signals for output as the speech signal.
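The proportional distribution of the focal position between the two strongest narrow-beam areas can be sketched as follows; the interface is illustrative.

```python
def interpolate_position(focus_a, focus_b, strength_a, strength_b):
    """Estimate the speaker position between two narrow-beam focal
    points, weighted toward the focal point with the stronger signal
    (proportional distribution)."""
    total = strength_a + strength_b
    wa, wb = strength_a / total, strength_b / total
    return tuple(wa * a + wb * b for a, b in zip(focus_a, focus_b))
```

A speaker straddling two areas thus yields a position estimate between the two focal points, closer to the area receiving the stronger signal.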
[0082]
Although the first beam generation unit 231 (detailed position search beam generation function 2313) and the third estimation unit 253 of the right area have been described above, the second beam generation unit 232 (detailed position search beam generation function 2323) and the fourth estimation unit 254 of the left area have the same configuration and execute the same processing operation.
[0083]
The detailed position search function of the apparatus of the second embodiment described above may not keep up if the speaker moves frequently.
Therefore, it is conceivable to use this function only when the speaker position output from the second estimation unit 252 remains unchanged for a predetermined time. In this case, if the speaker position output from the second estimation unit 252 moves within the predetermined time, the same operation as in the first embodiment shown in FIG. 4 should be performed even though the configuration shown in FIG. 7 is provided.
[0084]
The estimation units 253 and 254 performing this narrowing estimation correspond to the "third
sound source position estimation means" of the present invention.
[0085]
Third Embodiment Next, the transmission unit of a teleconference device according to a third embodiment of the present invention will be described with reference to FIG. 8.
FIG. 8 is a block diagram of this transmission unit. The transmission unit 2 of the apparatus of this embodiment differs in that the inputs of the difference value calculation circuit 22 are the outputs of the A/D converters 211 and 212, in that a third beam generation unit 237 that generates sound collection beams from the output signals of the difference value calculation circuit 22 is provided, in that a fourth beam generation unit 238 and a fifth beam generation unit 239 are provided, and in that the selectors 271 and 272 are not provided. The other parts are denoted by the same reference numerals and the above description applies correspondingly. Hereinafter, only the differences and important points of the apparatus of this embodiment will be described.
[0086]
As shown in FIG. 8, the outputs of the A/D converters 211 and 212 are directly input to the difference value calculation circuit 22. For this purpose, in the apparatus of the third embodiment, the microphones MRi and the microphones MLi are equal in number N and provided at symmetrical positions. The difference value calculation circuit 22 calculates “(audio signal of the microphone MRi) − (audio signal of the microphone MLi)” (i = 1 to N). As a result, as in the apparatus of the embodiment shown in FIG. 4, the sound component that wraps around from the speaker array SPA into the microphone arrays MR and ML can be canceled.
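The per-pair subtraction performed by the difference value calculation circuit 22 in this embodiment can be sketched as follows. A wraparound component that reaches the symmetric microphones MRi and MLi equally cancels in the difference, while a talker heard mainly on one side survives; names are illustrative.

```python
def mic_pair_difference(right_mics, left_mics):
    """(audio of MRi) - (audio of MLi) for each symmetric microphone pair.

    right_mics / left_mics -- lists of equal-length sample lists,
    index i pairing MRi with its mirror-image MLi.
    """
    return [[r - l for r, l in zip(mr, ml)]
            for mr, ml in zip(right_mics, left_mics)]
```

In the test below, a wraparound waveform [0.5, -0.5] appears identically in both microphones and a talker signal [1.0, 0.0] only on the right; the difference keeps the talker and removes the wraparound.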
[0087]
Here, in the device of the third embodiment, the respective microphones MRi and MLi need to be
substantially symmetrical about the longitudinal center line of the speaker array SPA. This is
because the difference value calculation circuit 22 cancels the wraparound sound between the
microphones. The difference value calculation circuit 22 always performs calculation while the
microphone arrays MR and ML of the teleconference device 1 are activated.
[0088]
Similar to the first beam generation unit 231 and the second beam generation unit 232, the third beam generation unit 237 outputs sound collection beams focused on four virtual sound collection areas based on the bundle of output signals from the difference value calculation circuit 22. These virtual sound collection areas correspond to the pairs of sound collection areas (411R and 411L, 412R and 412L, 413R and 413L, 414R and 414L: see FIG. 3) set symmetrically with respect to the center line 101 of the speaker array SPA. The audio signals output by the third beam generation unit 237 are the same as the difference signals D(411), D(412), D(413), and D(414) of the first embodiment. If these difference signals are output to the first estimation unit 251 through the BPF 241, the sound source position can be estimated in the same way as by the first estimation unit 251 of the apparatus shown in FIG. 4. The estimation results 2511 and 2512 are output to the fourth beam generation unit 238 and the fifth beam generation unit 239.
[0089]
The fourth beam generation unit 238 and the fifth beam generation unit 239 in FIG. 8 will now be described. The digital audio signals output from the A/D converters 211 and 212 are directly input to the fourth beam generation unit 238 and the fifth beam generation unit 239. Based on these digital audio signals, sound collection beams focused on the sound collection areas indicated by the estimation results 2511 and 2512 input from the first estimation unit 251 are generated, and the audio signals of those sound collection areas are extracted. That is, the sound collection beams generated by the fourth beam generation unit 238 and the fifth beam generation unit 239 correspond to the sound collection beams selected by the selectors 271 and 272 in the first embodiment.
[0090]
As described above, the fourth beam generation unit 238 and the fifth beam generation unit 239 each output only the single system of audio collected by the sound collection beam designated by the first estimation unit 251. The audio signals collected by the fourth beam generation unit 238 and the fifth beam generation unit 239 from the sound collection areas at the focal points of their respective sound collection beams are input to the second estimation unit 252.
[0091]
The following operation is similar to that of the first embodiment. The second estimation unit 252 compares the two audio signals and determines that the sound source exists in the sound collection area with the larger level. The second estimation unit 252 outputs information indicating the direction and distance of the sound collection area where the true sound source exists to the multiplexing unit 28 as the position information 2522, and causes the signal selection unit 26 to selectively input the audio signal of the true sound source to the multiplexing unit 28. The multiplexing unit 28 multiplexes the position information 2522 input from the second estimation unit 252 with the audio signal 261 of the true sound source selected by the signal selection unit 26, and transmits the multiplexed signal to the partner apparatus.
[0092]
In the third embodiment shown in FIG. 8 as well, as in the second embodiment, estimation can be performed in multiple stages, narrowing down the sound source position with a finer search after the first. In that case, when the first search is completed, the second estimation unit 252 outputs instruction inputs 2523 and 2524, which instruct a search over a narrower range, to the fourth and fifth beam generation units 238 and 239. This instruction is output only to the beam generation unit on the side where the sound source exists. The beam generation unit that receives this instruction input reads the delay pattern corresponding to the narrower range from the ROM and rewrites the delay pattern data 40j.
[0093]
In the first and third embodiments, the first estimation unit 251 selects one sound collection area (41jR, 41jL) from each of the left and right sound collection areas 411R to 414R and 411L to 414L, and the second estimation unit 252 further estimates in which of 41jR and 41jL the true sound source exists; however, the second estimation unit does not necessarily have to be provided.
[0094]
This is because, when there is no noise source on the side opposite the true sound source, for example when the teleconference device is used only on its right side or only on its left side, there is no problem in outputting a synthesized signal of the two audio signals of the sound collection areas 41jR and 41jL (or their difference signal) as-is to the partner apparatus as the collected signal.
[0095]
The numerical values and the like shown in these embodiments do not limit the present
invention.
Further, even where the above diagrams show signals exchanged between functional blocks, a configuration in which part of the functions of one block is processed by another block may exhibit effects similar to those of the embodiments described above.
[0096]
FIG. 1 is a diagram showing the appearance and usage pattern of the teleconference device according to the first embodiment of the present invention. FIG. 2 is a diagram illustrating the audio beam and the sound collection beam of the teleconference device. FIG. 3 is a diagram showing the sound collection areas set for the microphone arrays of the teleconference device. FIG. 4 is a block diagram of the transmission unit of the teleconference device. FIG. 5 is a block diagram of the first beam generation unit of the teleconference device. FIG. 6 is a block diagram of the reception unit of the teleconference device. FIG. 7 is a block diagram of the transmission unit of the teleconference device of the second embodiment of the present invention. FIG. 8 is a block diagram of the transmission unit of the teleconference device of the third embodiment of the present invention.
Explanation of Symbols
[0097]
DESCRIPTION OF SYMBOLS: 1 ... teleconference device; 2 ... transmission unit; 22 ... difference value calculation circuit; 231 ... first beam generation unit; 232 ... second beam generation unit; 237 ... third beam generation unit; 238 ... fourth beam generation unit; 239 ... fifth beam generation unit; 251 ... first estimation unit; 252 ... second estimation unit; 253 ... third estimation unit; 254 ... fourth estimation unit; 26 ... signal selection unit; 271, 272 ... selector; 28 ... multiplexing unit; 3 ... reception unit; 31 ... audio signal receiving unit; 32 ... parameter calculation unit; 33 ... directivity control unit; 45j (j = 1 to K) ... delay processing unit; 40j (j = 1 to K) ... delay pattern memory; 46ji (i = 1 to N) ... delay; 47j (j = 1 to K) ... microphone input adding unit; SPi (i = 1 to M) ... speaker; SPA ... speaker array; ML, MR ... microphone array; MLi (i = 1 to N), MRi (i = 1 to N) ... microphone; 100 ... desk; 101 ... center line; 411R to 414R, 411L to 414L ... sound collection area; 999 ... sound source (speaker)