JP2011071702

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2011071702
[PROBLEMS] To clearly pick up simultaneous sounding by a plurality of sounding bodies. A sound collection processing unit 30 generates a signal of an output sound whose sound collection directivity is controlled, based on a plurality of collected sound signals picked up by a microphone array 2 having a plurality of microphones whose relative positions are fixed. A sounding body information acquisition unit 10 acquires an input of information on the number and arrangement of sounding bodies present in the sound collection range of the microphone array 2. A sound collection directivity range setting unit 20 sets, for each sounding body, the direction of the sound collection directivity directed from the microphone array 2 to that sounding body, based on the arrangement information acquired by the sounding body information acquisition unit 10. It further sets, for each sounding body, the sharpness of the sound collection directivity directed to that sounding body, based on the information on the number of sounding bodies acquired by the sounding body information acquisition unit 10. The sound collection processing unit 30 generates and outputs a signal of an output sound whose sound collection directivity is controlled to the direction and sharpness set by the sound collection directivity range setting unit 20. [Selected figure] Figure 1
Sound pickup processing device, sound pickup processing method, and program
[0001]
The embodiments discussed herein relate to audio signal processing techniques.
[0002]
Several techniques for controlling the sound collection directivity of a microphone that picks up a speaker's voice are known as
04-05-2019
1
techniques for clearly picking up a speaker's voice.
[0003]
In one such technology, a camera whose angle of view equals the directional beam width of a single directional sound collection microphone is prepared, and the microphone and the camera are integrated so that the directivity and the angle of view coincide. Such camera-integrated microphones are used, for example, for video conferencing.
With this camera-integrated microphone, the image of the speaker's face is detected from the image captured by the camera.
Then, control is performed to direct the camera so that the detected face image is positioned at the center of the captured image, whereby the center of the sound collection directivity of the microphone (hereinafter sometimes simply referred to as "directivity") is directed to the speaker. For this camera-integrated microphone, a technique is also known that controls the directivity direction of the microphone according to the number of face images detected from the captured image. This first technique directs the microphone's directivity to the speaker closest to the microphone when that number is odd, and between the closest and second-closest speakers when the number is even, so as to accurately capture the voices of the speakers at a meeting.
[0004]
As another such technique, there is one that controls the sharpness of the directivity of a microphone based on the size, on a captured image, of the image of a person contained in that image. In this second technique, when the image of the person is large on the captured image, it is determined that the imaging intent emphasizes the person, and the directivity of the microphone is controlled to be sharp in order to pick up the person's voice clearly. On the other hand, when the image of the person is small on the captured image, it is determined that the imaging intent is the entire surrounding environment including the person, and the directivity of the microphone is controlled to be blunt so that ambient sound is collected together with the person's voice.
[0005]
Besides the above, as a technique related to the embodiments discussed in this specification, a third technique is known that, in an audio signal obtained by collecting voices from sound sources present in a plurality of directions, emphasizes the voice emitted from a sound source in a predetermined direction and suppresses ambient noise. In this technique, voices from sound sources present in a plurality of directions are collected by a plurality of microphones (a microphone array), and the time-axis audio signal output from each microphone is converted, for example by a Fourier transform, into an audio signal on the frequency axis. For each audio signal on the frequency axis, the phase difference at each frequency is calculated, and based on the phase difference, the probability that a sound source exists in the predetermined direction is specified for each frequency. Then, based on that probability, a suppression function for suppressing audio signal components originating from sound sources other than the one in the predetermined direction is determined, and the obtained suppression function is multiplied by the audio signal on the frequency axis. Thereafter, when the multiplication result is restored to a signal on the time axis, for example by an inverse Fourier transform, an audio signal based on the sound source in the predetermined direction is obtained.
[0006]
JP 2009-49734 A; JP 2009-65587 A; JP 2007-318528 A
[0007]
When there are a plurality of speakers within the sound pickup range of the microphone, the speaker may not be the one closest to the microphone, so even if the directivity direction of the microphone is controlled as in the first technique described above, the speaker's voice may not be captured accurately.
In addition, with the second technique described above, it is difficult to clearly collect the simultaneous utterances of a plurality of persons.
[0008]
The present invention has been made in view of the above-mentioned problems, and the problem to be solved is to clearly pick up simultaneous sounding by a plurality of sounding bodies.
[0009]
One of the sound collection devices described later in this specification includes sound collection processing means, acquisition means, and sound collection directivity range setting means.
Among these, the sound collection processing means generates an output sound signal whose sound collection directivity is controlled, based on a plurality of collected sound signals picked up by a microphone array having a plurality of microphones whose relative positions are fixed. The acquisition means acquires an input of information on the number and arrangement of sounding bodies present in the sound collection range of the microphone array. The sound collection directivity range setting means then sets, for each sounding body, the direction of the sound collection directivity directed from the microphone array to that sounding body, based on the arrangement information acquired by the acquisition means. The sound collection directivity range setting means also sets, for each sounding body, the sharpness of the sound collection directivity directed to that sounding body, based on the information on the number of sounding bodies acquired by the acquisition means. In this sound collection device, the sound collection processing means generates and outputs an output sound signal whose sound collection directivity is controlled to the direction and sharpness set by the sound collection directivity range setting means.
[0010]
Also, one of the sound collection methods described later in this specification generates an output sound signal whose sound collection directivity is controlled, based on a plurality of collected sound signals picked up by a microphone array provided with a plurality of microphones whose relative positions are fixed.
[0011]
In this method, first, an input of information on the number and arrangement of sounding bodies present in the sound collection range of the microphone array is acquired.
Next, the direction of the sound collection directivity directed from the microphone array to each sounding body is set for each sounding body based on the acquired arrangement information. Together with this, the sharpness of the sound collection directivity to be directed to each sounding body is set for each sounding body based on the acquired information on the number of sounding bodies. Then, a signal of an output sound whose sound collection directivity is controlled to the set direction and sharpness is generated and output.
[0012]
In addition, one of the programs described later in this specification causes a computer to generate an output sound signal whose sound collection directivity is controlled, based on a plurality of collected sound signals picked up by a microphone array including a plurality of microphones whose relative positions are fixed. When executed, the program causes the computer to carry out an acquisition process, a sound collection directivity range setting process, and a sound collection process. Here, the acquisition process acquires an input of information on the number and arrangement of sounding bodies present in the sound collection range of the microphone array. The sound collection directivity range setting process sets, for each sounding body, the direction of the sound collection directivity directed from the microphone array to that sounding body, based on the arrangement information acquired in the acquisition process. The sound collection directivity range setting process also sets, for each sounding body, the sharpness of the sound collection directivity directed to that sounding body, based on the information on the number of sounding bodies acquired in the acquisition process. The sound collection process generates and outputs a signal of an output sound whose sound collection directivity is controlled to the direction and sharpness set in the sound collection directivity range setting process.
[0013]
The sound collection device described later in this specification can clearly pick up simultaneous
sounding by a plurality of sounding bodies.
[0014]
It is a first example of the configuration of a sound collection system.
It is an example of the frequency characteristic of the inter-microphone phase difference range of a collected sound signal. It is an example of data that the face position detection system acquires from a captured image. It is an explanatory diagram (part 1) of the setting of the sharpness of sound collection directivity. It is an explanatory diagram (part 2) of the setting of the sharpness of sound collection directivity. It is an explanatory diagram (part 3) of the setting of the sharpness of sound collection directivity. It is an explanatory diagram (part 1) of the extraction of a sounding body excluded from the sound sources of the output sound. It is an explanatory diagram (part 2) of the extraction of a sounding body excluded from the sound sources of the output sound. It is a second example of the configuration of a sound collection system. It is a configuration of a computer operated as a sound collection device. It is a flowchart illustrating the processing contents of control processing executed by the computer.
[0015]
First, FIG. 1 will be described. FIG. 1 illustrates a first example of the configuration of a sound
collection system. This sound collection system includes a sound collection device 1, a
microphone array 2, a camera 3, and a face position detection system 4.
[0016]
The sound collection device 1 performs signal processing on the sound collection signal of the
microphone array 2 and outputs an output sound whose sound collection directivity is
controlled. The microphone array 2 is configured by arranging a plurality of microphones, for
example, in a horizontal direction. The relative position between the microphones constituting
the microphone array 2 is fixed.
[0017]
The sound collection device 1 includes a sounding body information acquisition unit 10, a sound
collection directivity range setting unit 20, a sound collection processing unit 30, and an
excluded sounding body extraction unit 40. The sounding body information acquisition unit 10
acquires an input of information on the number and arrangement of sounding bodies present in
the sound collection range of the microphone array 2. The sounding body may be anything that emits sound: for example, an animal such as a dog or a cat, a sound emitting device provided with a speaker, or even a machine that emits operating noise not intended as sound production. In the sound collection system of FIG. 1, however, a vocalizing human is assumed as the sounding body present in the sound collection range of the microphone array 2, and the sounding body information acquisition unit 10 acquires an input of information on the number and arrangement of the people present in that range from the face position detection system 4.
[0018]
The sound collection directivity range setting unit 20 sets, for each sounding body (a human in this embodiment), the direction of the sound collection directivity directed from the microphone array 2 to that sounding body, based on the arrangement information acquired by the sounding body information acquisition unit 10. The sound collection directivity range setting unit 20 further sets, for each sounding body, the sharpness of the sound collection directivity to be directed from the microphone array 2 to that sounding body, based on the information on the number of sounding bodies acquired by the sounding body information acquisition unit 10. The method by which the sound collection directivity range setting unit 20 sets the direction and sharpness of the sound collection directivity will be described later.
[0019]
The sound collection processing unit 30 generates an output sound (output voice) signal whose sound collection directivity is controlled to the direction and sharpness set by the sound collection directivity range setting unit 20, based on the plurality of collected sound signals picked up by the microphones of the microphone array 2. In the present embodiment, the sound collection processing unit 30 controls the sound collection directivity using the known method disclosed in Patent Document 3 described above, as follows.
[0020]
The sound collection processing unit 30 includes a directional sound reception processing unit 31 and an output sound signal generation unit 32. The directional sound reception processing unit 31 first performs analog-to-digital conversion on each of the plurality of collected sound signals picked up by the microphone array 2 to obtain collected sound signal data in the time domain. Next, it obtains frequency spectrum data of each collected sound signal by applying a time-frequency transform, such as a fast Fourier transform, to the collected sound signal data.
[0021]
Next, the directional sound reception processing unit 31 performs a process of calculating, for each spectral frequency, the spectral phase difference between the frequency spectrum data of one collected sound signal taken as a reference and the frequency spectrum data of each of the other collected sound signals.
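The per-frequency phase-difference step described above can be sketched as follows. This is a minimal illustration, not the patent's implementation; the function name, FFT length, and the use of the cross-spectrum angle are assumptions.

```python
import numpy as np

def spectral_phase_differences(ref, others, n_fft=512):
    """Phase difference, per FFT bin, between a reference microphone
    signal and each of the other microphone signals."""
    ref_spec = np.fft.rfft(ref, n_fft)
    diffs = []
    for sig in others:
        spec = np.fft.rfft(sig, n_fft)
        # Angle of the cross-spectrum = phase of `spec` minus phase of `ref_spec`
        diffs.append(np.angle(spec * np.conj(ref_spec)))
    return np.array(diffs)
```

A pure tone delayed by τ seconds between two microphones shows a phase difference of about -2πfτ at the bin containing its frequency f.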
[0022]
Next, the directional sound reception processing unit 31 obtains the weighting values to be given to the frequency spectrum of the collected sound signal in order to realize the sound collection directivity of the direction and sharpness set by the sound collection directivity range setting unit 20.
Then, the frequency spectrum of the reference collected sound signal described above is multiplied by this weighting value at each spectral frequency to perform the weighting process. The weighting value for each spectral frequency is determined, for example, as follows.
[0023]
First, based on the information on the direction and sharpness of the sound collection directivity set by the sound collection directivity range setting unit 20, the directional sound reception processing unit 31 obtains the frequency characteristic of the inter-microphone phase difference range of the collected sound signal that can occur when sound arriving from within the range of the sound collection directivity is picked up by the microphone array 2.
[0024]
Here, FIG. 2 will be described.
FIG. 2 shows an example of the frequency characteristic of the inter-microphone phase difference range of a collected sound signal for two microphones: the phase difference range that can occur when sound arriving from within a sound collection directivity range with a sharpness of beam width ±θdef is picked up by the two microphones. In FIG. 2, the horizontal axis is the frequency of the sound emitted from the sound source, and the vertical axis is the phase difference between the two microphones when this sound is collected. The relationship between frequency and phase difference indicated by this characteristic can be calculated geometrically, using as a parameter the direction angle of the sound source about the midpoint between the positions of the two microphones.
[0025]
To obtain this frequency characteristic, for example, a table showing the relationship between the sharpness of the sound collection directivity and the frequency characteristic of the inter-microphone phase difference range of the collected sound signal can be prepared in advance for each direction of the sound collection directivity, compiled into a database, and stored in the directional sound reception processing unit 31. In this case, the directional sound reception processing unit 31 refers to this database and reads from the table the frequency characteristic associated with the direction and sharpness set by the sound collection directivity range setting unit 20, thereby obtaining the frequency characteristic of the phase difference range.
[0026]
Next, weighting values based on the phase difference of each spectral bin, to be given to the frequency spectrum of the collected sound signal, are set for each spectral frequency. The weighting value given to each spectral bin is determined as follows.
[0027]
First, referring to the frequency characteristic of the phase difference range of the collected sound signal obtained above, the phase difference range at the frequency of the spectral bin to be weighted is obtained from that characteristic. Next, the weighting value is set based on the relationship between the phase difference previously calculated for that spectral frequency and this phase difference range. For example, for a spectral bin whose phase difference is within the phase difference range and within a predetermined value of the center of the range, the weighting value is set to "1.0", and for a spectral bin whose phase difference is outside the phase difference range, the weighting value is set to "0.0". When the phase difference is within the phase difference range but separated from the center of the range by the predetermined value or more, the weighting value is set between "1.0" and "0.0" according to the distance from the center, for example by linear interpolation, so that it is continuous with the above-described setting values at the boundary of the range.
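The weighting rule described above can be sketched as a small function. This is a minimal sketch: the function name and the `inner_fraction` parameter (how much of the half-range around the center keeps full weight, standing in for the "predetermined value" of the text) are assumptions.

```python
def spectral_weight(phase_diff, range_lo, range_hi, inner_fraction=0.5):
    """Weight for one spectral bin from its inter-mic phase difference.

    1.0 deep inside the phase-difference range, 0.0 outside it, and
    linearly interpolated in between so the weight is continuous at the
    range boundary.
    """
    center = 0.5 * (range_lo + range_hi)
    half = 0.5 * (range_hi - range_lo)
    dist = abs(phase_diff - center)
    if dist <= inner_fraction * half:
        return 1.0
    if dist >= half:
        return 0.0
    # Linear taper from 1.0 at the inner edge to 0.0 at the range boundary
    return (half - dist) / (half - inner_fraction * half)
```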
[0028]
The weighting value given to each spectral bin is determined as described above. This weighting value corresponds to what is called a "suppression function" in Patent Document 3. Note that a table showing, for each spectral frequency, the relationship between the phase difference, the phase difference range, and the weighting value described above may be compiled into a database in advance and stored in the directional sound reception processing unit 31. In this case, the directional sound reception processing unit 31 refers to this database, reads from the table the weighting value associated with the phase difference and phase difference range of each spectral bin, and sets it.
[0029]
In the present embodiment, the weighting values obtained as described above, one for each of the phase differences calculated above, are averaged at each spectral frequency, and the average is used as the weighting value for that spectral frequency given to the frequency spectrum of the reference collected sound signal.
[0030]
The output sound signal generation unit 32 applies, to the frequency spectrum of the collected sound signal weighted as described above by the directional sound reception processing unit 31, the inverse of the transform used in the directional sound reception processing unit 31 (for example, an inverse fast Fourier transform), thereby converting it into time-domain audio signal data, which it outputs.
This audio signal data is generated based on the plurality of collected sound signals picked up by the microphone array 2, and is the output sound signal whose sound collection directivity is controlled to the direction and sharpness set by the sound collection directivity range setting unit 20. The sound collection processing unit 30 is configured as described above.
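The last stage of the pipeline, multiplying the reference spectrum by the per-bin weights and inverting the transform, can be sketched as follows; the function name and FFT length are assumptions for illustration.

```python
import numpy as np

def directional_output(ref_signal, weights, n_fft=512):
    """Apply per-bin weights to the reference microphone's spectrum and
    return the time-domain output frame (inverse FFT of the product)."""
    spec = np.fft.rfft(ref_signal, n_fft)
    return np.fft.irfft(spec * weights, n_fft)
```

With all weights equal to 1.0 the frame passes through unchanged; with all weights 0.0 the output is silence, and intermediate weights suppress components attributed to sources outside the directivity range.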
[0031]
The excluded sounding body extraction unit 40 extracts, based on the arrangement information acquired by the sounding body information acquisition unit 10, the sounding bodies (humans in this embodiment) to be excluded from the sound sources of the output voice generated by the sound collection processing unit 30. When the excluded sounding body extraction unit 40 performs this extraction, the sound collection directivity range setting unit 20 sets the direction and sharpness of the sound collection directivity to be directed from the microphone array 2 only for those sounding bodies (humans) not extracted by the excluded sounding body extraction unit 40. The method by which the excluded sounding body extraction unit 40 extracts the excluded sounding bodies will be described later. The sound collection device 1 is provided with the above components.
[0032]
The camera 3 repeatedly captures images within the sound collection range of the microphone array 2 at a fixed magnification and at predetermined time intervals. In the present embodiment, the camera 3 is disposed at substantially the same position as the microphone array 2.
[0033]
The face position detection system 4 (hereinafter simply referred to as the "detection system 4") obtains information on the number and arrangement of sounding bodies (humans in this embodiment) present in the sound collection range of the microphone array 2 by performing image processing on the images captured by the camera 3. This information is input to the sound collection device 1 and acquired by the sounding body information acquisition unit 10.
[0034]
Here, the image processing performed by the detection system 4 will be described. First, the detection system 4 detects images of human faces in the image captured by the camera 3. A known technique is used for the face detection. In the present embodiment, the image of a partial region extracted from the captured image is compared with each of the face pattern images read from a database of face patterns prepared in advance, and the degree of correlation between the two is calculated. The degree of correlation is calculated comprehensively based on, for example, the contour of the face, the relative positions of the eyes, nose, and mouth, the color of the face, and so on. If a partial region has a degree of correlation higher than a predetermined value, that region is taken as a detection result for an image of a human face. By performing this process over the entire captured image while changing the position and size of the partial region, the number of face images contained in the captured image, and the position and size of each face image within it, are detected. The detected number of face images contained in the captured image is output from the detection system 4 as the information on the number of sounding bodies present in the sound collection range of the microphone array 2, obtained from the image photographing that range.
[0035]
Next, the detection system 4 performs a process of obtaining, based on the position on the captured image of each partial region (that is, each face image) detected from the captured image, the direction angle from the microphone array 2 to the face represented in that partial region. To obtain this direction angle, for example, a table created by measuring the relationship between the position of the partial region and the direction angle can be stored in the detection system 4 in advance. In this case, the detection system 4 refers to this table and reads the direction angle associated with the position of the partial region, thereby obtaining the direction angle from the microphone array 2 to the face.
[0036]
Next, the detection system 4 performs a process of obtaining, based on the size of each partial region (that is, each face image) detected from the captured image, the distance from the microphone array 2 to the face represented in that partial region. To obtain this distance, for example, a table created by measuring the relationship between the size of the partial region and the distance can be stored in the detection system 4 in advance. In this case, the detection system 4 refers to this table and reads the distance associated with the size of the partial region, thereby obtaining the distance from the microphone array 2 to the face.
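The table lookup described above can be sketched as follows; the calibration values, names, and the use of linear interpolation between calibrated entries are illustrative assumptions.

```python
# Hypothetical calibration table: (face-image width in pixels, distance in m),
# measured in advance as the text describes. The values here are made up.
SIZE_TO_DISTANCE = [(40, 4.0), (80, 2.0), (160, 1.0)]

def distance_from_face_size(width_px, table=SIZE_TO_DISTANCE):
    """Look up distance from face-image size, linearly interpolating
    between the nearest calibrated entries; clamp outside the table."""
    if width_px <= table[0][0]:
        return table[0][1]
    for (w0, d0), (w1, d1) in zip(table, table[1:]):
        if width_px <= w1:
            t = (width_px - w0) / (w1 - w0)
            return d0 + t * (d1 - d0)
    return table[-1][1]
```

The direction-angle table of the preceding paragraph can be handled the same way, keyed on the horizontal position of the partial region instead of its size.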
[0037]
Next, the detection system 4 performs a process of obtaining the direction of the face based on the positional relationship of the eyes, nose, and mouth appearing in each face image detected from the captured image. In this process, for example, based on the positions of both eyes, the nose, and the mouth contained in the face image, the distance from the straight line passing through the positions of the nose and the mouth to the position of the right eye, and the distance from that line to the position of the left eye, are first obtained. Then, an angle indicating the direction of the face is determined based on the ratio of these two distances. To obtain this angle, for example, a table created by measuring the relationship between the distance ratio and the angle can be stored in the detection system 4 in advance. In this case, the detection system 4 refers to this table and reads the angle associated with the distance ratio obtained from the captured image, thereby obtaining the angle indicating the direction of the face.
[0038]
FIG. 3 shows the data acquired by the detection system 4 from the captured image as described above, for the case where two persons (person A and person B) are within the shooting range of the camera 3 (that is, within the sound collection range of the microphone array 2). Here, the detection system 4 determines the direction angle θA, the distance dA, and the face angle θ2A of the person A, and the direction angle θB, the distance dB, and the face angle θ2B of the person B, as shown in FIG. 3.
[0039]
Here, instead of the detection system 4 determining the direction angle θA and the distance dA of the person A and the direction angle θB and the distance dB of the person B, two-dimensional coordinate values (XA, YA) and (XB, YB) of the persons may be obtained from the captured image.
[0040]
For each person whose face image is contained in the captured image, the detection system 4 outputs the data of the direction angle θ, the distance d, and the face angle θ2, obtained as described above from the image photographing the sound collection range of the microphone array 2, as the information on the arrangement of the sounding bodies present in that sound collection range.
[0041]
As described above, each time the camera 3 captures an image of the sound collection range, the detection system 4 obtains from that image the information on the number and arrangement of sounding bodies present in the sound collection range of the microphone array 2 and outputs it to the sound collection device 1.
The sounding body information acquisition unit 10 of the sound collection device 1 acquires the information output from the detection system 4.
The sound collection system of FIG. 1 has the above components.
[0042]
Next, the method of setting the direction and sharpness of the sound collection directivity
performed by the sound collection directivity range setting unit 20 of the sound collection device
1 will be described.
[0043]
The sound collection directivity range setting unit 20 stores in advance an angle table in which the relationship between the number of persons present in the sound collection range of the microphone array 2 and the angle indicating the sharpness of the sound collection directivity is set.
In this embodiment, the angle table associates the maximum (bluntest) sharpness value θMAX (for example, 90°) with the case where only one person is in the sound collection range, and the specified value θdef (for example, 30°) with the case where two or more persons are present.
[0044]
The sound collection directivity range setting unit 20 first performs a process of acquiring from the sounding body information acquisition unit 10, as the information on the number and arrangement of sounding bodies (humans) present in the sound collection range of the microphone array 2, the number of persons described above and, for each person, the data of the direction angle θ, the distance d, and the face angle θ2. Here, the operation of the excluded sounding body extraction unit 40 is not considered.
[0045]
Next, the sound collection directivity range setting unit 20 performs a process of setting the sharpness of the sound collection directivity directed to each person, based on the information on the number of persons. This setting process will be described with reference to FIG. 4. When only one person is in the sound collection range of the microphone array 2, the sound collection directivity range setting unit 20 sets the sharpness of the sound collection directivity to an angle of ±θMAX centered on the direction of that person. In the example of FIG. 4 (1), the sound collection directivity range setting unit 20 sets the sharpness of the sound collection directivity to an angle of ±θMAX for the person A located at the direction angle θA = 0°.
[0046]
On the other hand, when two or more persons are present in the sound collection range, the sound collection directivity range setting unit 20 sets the sharpness of the sound collection directivity to an angle of ±θdef centered on the direction of each person. In the example of (2) of FIG. 4, the sound collection directivity range setting unit 20 sets the sharpness of the sound collection directivity to an angle of ±θdef around the respective direction angles of the person A, located at the direction angle θA (<0), and the person B, located at the direction angle θB (>0).
[0047]
In this manner, the sound collection directivity range setting unit 20 sets the sharpness of the sound collection directivity directed to each sounding body based on the information on the number of sounding bodies present in the sound collection range of the microphone array 2. Since the sound collection processing unit 30 generates a signal of the output sound whose sound collection directivity is controlled to the sharpness set by the sound collection directivity range setting unit 20, this setting allows the sound collection device 1 to clearly pick up simultaneous sounding by a plurality of sounding bodies.
[0048]
Note that, in this setting processing, the sharpness of the sound collection directivity directed to each sounding body (human) may be further set by the sound collection directivity range setting unit 20 based on the information on the arrangement of the sounding bodies (humans) obtained from the detection system 4.
[0049]
For example, as shown in (1) of FIG. 5, when the arrangement interval between the persons A and B present in the sound collection range of the microphone array 2 is wide, part of the range of the sound collection directivity may exceed the sound collection possible range of the microphone array 2, that is, θA − θdef < −θMAX and θB + θdef > θMAX (here the distance from the microphone array 2 to the person A and the distance from the microphone array 2 to the person B are assumed to be equal). In such a case, the sound collection directivity range setting unit 20 executes azimuth setting processing to set the azimuth angles α and β indicating the sound collection directivity angle ranges for the person A and the person B within the ranges given by the following expressions: −θMAX < α < θA + θdef and θB − θdef < β < θMAX.
[0050]
On the other hand, as shown in (2) of FIG. 5, when the arrangement interval between the persons A and B present in the sound collection range of the microphone array 2 is narrow, parts of the sound collection directivity ranges may overlap, that is, −θA + θB < 2θdef (again assuming that the distance from the microphone array 2 to the person A and the distance from the microphone array 2 to the person B are equal). In such a case, the sound collection directivity range setting unit 20 executes the setting processing to set the angle ranges α and β of the sound collection directivity for the person A and the person B within the ranges given by the following expressions: θA − θdef < α < (−θA + θB)/2 and (−θA + θB)/2 < β < θB + θdef.
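Both cases above, clamping to the pickable range and cutting overlapping ranges at the midpoint between neighbours, can be sketched in one small routine. This is an illustration of the expressions in the text; the function signature and names are ours:

```python
def azimuth_range(theta, theta_def, theta_max, neighbors):
    """Sketch of the angle-range setting for one person at direction angle `theta`.

    Returns (low, high) in degrees. The nominal range theta +/- theta_def is
    clamped to the array's sound-collection-possible range +/- theta_max, and
    truncated at the midpoint to any neighbouring person so that the ranges
    of adjacent persons do not overlap.
    """
    low, high = theta - theta_def, theta + theta_def
    # clamp to the sound collection possible range of the microphone array
    low, high = max(low, -theta_max), min(high, theta_max)
    # avoid overlap with neighbours by cutting at the midpoint between them
    for other in neighbors:
        mid = (theta + other) / 2.0
        if other < theta:
            low = max(low, mid)
        else:
            high = min(high, mid)
    return low, high
```

For a person at −60° with θdef = 30° and θMAX = 80°, the nominal lower bound −90° is clamped to −80°; for two persons at ±20° with θdef = 30°, the ranges are cut at the midpoint 0°.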
[0051]
As described above, the sharpness of the sound collection directivity that the sound collection directivity range setting unit 20 directs to each sounding body may be set based on the information on the arrangement interval between the sounding bodies present in the sound collection range of the microphone array 2.
[0052]
Further, in this setting processing, the sharpness of the sound collection directivity directed to each sounding body by the sound collection directivity range setting unit 20 may also be set based on the information on the distance between the sounding body and the microphone array 2, as follows.
[0053]
The sound collection directivity range setting unit 20 stores in advance a reference value ddef of the distance between a person present in the sound collection range of the microphone array 2 and the microphone array 2. When the distance between a person and the microphone array 2 matches the reference distance ddef, the sound collection directivity range setting unit 20 uses the angle value set in the above-described angle table as it is as the sharpness of the sound collection directivity directed to that person.
[0054]
On the other hand, in the example of FIG. 6, for the person A, whose distance dA to the microphone array 2 is shorter than the reference distance ddef, the sound collection directivity range setting unit 20 narrows the sound collection directivity to the range of ±θdef × (dA / ddef) centered on the direction angle θA of the person A.
[0055]
Further, in the example of FIG. 6, for the person B, whose distance dB to the microphone array 2 is longer than the reference distance ddef, the sound collection directivity range setting unit 20 widens the sound collection directivity to the range of ±θdef × (dB / ddef) centered on the direction angle θB of the person B.
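The distance-dependent scaling rule above reduces to a single proportionality. A minimal sketch, with illustrative names:

```python
def scaled_sharpness(theta_def, d, d_ref):
    """Scale the directivity half-angle by distance to the microphone array.

    Closer than the reference distance -> narrower beam (suppresses noise
    other than the target sound); farther than the reference -> wider beam
    (compensates for propagation attenuation). Implements the
    theta_def * (d / d_ref) rule described in the text.
    """
    if d <= 0 or d_ref <= 0:
        raise ValueError("distances must be positive")
    return theta_def * (d / d_ref)
```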
[0056]
The reason the sound collection directivity range setting unit 20 narrows the sound collection directivity as the distance to the microphone array 2 becomes shorter is that good sound collection is already possible at short distances, so noise other than the target sound should be suppressed. Conversely, the reason the sound collection directivity is widened as the distance becomes longer is that, depending on the sound collection frequency band, attenuation due to sound propagation becomes large at long distances, so widening the directivity secures some sound volume.
[0057]
In addition, since the sound collection system according to the present embodiment does not aim to significantly increase the possible sound collection distance, control that narrows the sound collection directivity in order to extend that distance is not performed. The direction and sharpness of the sound collection directivity are set by the sound collection directivity range setting unit 20 as described above.
[0058]
Next, the method performed by the excluded sounding body extraction unit 40 for extracting a sounding body to be excluded from the sound sources of the output sound generated by the sound collection processing unit 30 will be described.
The excluded sounding body extraction unit 40 first acquires, from the sounding body information acquisition unit 10, the data of the direction angle θ, the distance d, and the face angle θ2 for each person described above, as the information on the arrangement of the sounding bodies (humans) present in the sound collection range of the microphone array 2.
[0059]
Next, the excluded sounding body extraction unit 40 performs an extraction process of extracting a sounding body (human) to be excluded from the sound sources of the output sound generated by the sound collection processing unit 30, based on the information on the arrangement. This extraction process will be described below.
[0060]
First, FIG. 7 will be described. FIG. 7 shows a state in which, of the two persons (person A and person B) within the imaging range of the camera 3 (that is, within the sound collection range of the microphone array 2), the person B is moving.
[0061]
In the example of FIG. 7, the value of the information on the person B in the arrangement information acquired by the sounding body information acquisition unit 10 from the detection system 4 changes each time the camera 3 captures an image. The excluded sounding body extraction unit 40 calculates the amount of change in this value, more specifically, the amount of change (distance) in the arrangement position of the person B between captured images, obtained from the direction angle θB and the distance dB for the person B. Then, when the amount of change exceeds a predetermined threshold value, the unit determines that it is difficult to clearly pick up the voice of the person B, and extracts the person B as a sounding body to be excluded from the sound sources of the output sound.
[0062]
As described above, in the case of the example of FIG. 7, the excluded sounding body extraction unit 40 extracts the sounding body to be excluded from the sound sources of the output sound signal based on the amount of change in the information on the arrangement of the sounding bodies acquired by the sounding body information acquisition unit 10.
[0063]
Next, FIG. 8 will be described. FIG. 8 shows a state in which, of the two persons (person A and person B) within the imaging range of the camera 3 (that is, within the sound collection range of the microphone array 2), the face of the person B is turned to the side with respect to the camera 3 (that is, the microphone array 2).
[0064]
In the example of FIG. 8, attention is paid to the face angle θ2B of the person B. When this value is outside a predetermined threshold range within which the person can be said to be facing the camera 3 (that is, the microphone array 2), it is determined that clear sound collection from the person B is difficult, and the person B is extracted as a sounding body to be excluded from the sound sources of the output sound.
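The two exclusion tests described above, movement of the arrangement position and face orientation, can be sketched together. The threshold values and all names below are illustrative assumptions, not values from the embodiment:

```python
import math

def should_exclude(prev_pos, cur_pos, face_angle,
                   move_threshold=0.5, face_limit=45.0):
    """Sketch of the exclusion decision for one sounding body.

    prev_pos/cur_pos: (x, y) arrangement positions derived from the direction
    angle and distance at two successive captured images.
    face_angle: face orientation relative to the camera, in degrees.
    move_threshold and face_limit are hypothetical threshold values.
    """
    moved = math.dist(prev_pos, cur_pos)
    if moved > move_threshold:          # moving too much to pick up clearly
        return True
    if abs(face_angle) > face_limit:    # facing away from the microphone array
        return True
    return False
```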
[0065]
As described above, in the case of the example of FIG. 8, the excluded sounding body extraction unit 40 extracts the sounding body to be excluded from the sound sources of the output sound signal based on the information on the direction of the face among the information on the arrangement of the humans, which are the sounding bodies, acquired by the sounding body information acquisition unit 10.
[0066]
The excluded sounding body extraction unit 40 notifies the sound collection directivity range setting unit 20 of the information on the sounding bodies extracted as described above. The sound collection directivity range setting unit 20 then sets the direction and sharpness of the sound collection directivity directed from the microphone array 2 to each sounding body based on the information on the number or arrangement of the sounding bodies other than those extracted by the excluded sounding body extraction unit 40.
[0067]
The excluded sounding body extraction unit 40 can also use other methods for extracting a sounding body to be excluded from the sound sources of the output sound signal. For example, when the detection system 4 outputs information on the presence or absence of mouth movement in the image of the face of a person detected from the captured image, the sound collection directivity range setting unit 20 can set the direction and sharpness of the sound collection directivity described above based on this information on the presence or absence of mouth movement.
[0068]
For example, the detection system 4 performs, on each captured image taken by the camera 3 at time intervals, a process of extracting the contour shape of the mouth (lips) from the image of the face detected from the captured image, and subsequently a process of calculating the amount of change in that shape between images. Then, if the amount of change exceeds a predetermined threshold, it determines that the mouth is moving, and if the amount of change is below the threshold, it determines that the mouth is not moving. The detection system 4 outputs, to the sound collection device 1, this determination result information on the mouth movement of each person detected from the captured images. The sounding body information acquisition unit 10 of the sound collection device 1 acquires this information output from the detection system 4.
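The mouth-movement decision above compares lip-contour shapes between successive frames and thresholds the total change. A minimal sketch, assuming contours are given as equal-length lists of (x, y) points; the threshold value and names are illustrative:

```python
def mouth_moving(prev_contour, cur_contour, threshold=5.0):
    """Sketch of the mouth-movement decision described in the text.

    Sums the per-point displacement (L1 distance) between the lip contours
    extracted from two successive captured images and compares the sum with
    a hypothetical threshold.
    """
    change = sum(abs(px - cx) + abs(py - cy)
                 for (px, py), (cx, cy) in zip(prev_contour, cur_contour))
    return change > threshold
```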
[0069]
Based on the determination result information acquired by the sounding body information acquisition unit 10, the excluded sounding body extraction unit 40 regards a person whose mouth is determined to have no movement as a person who is not speaking, and extracts that person as a sounding body to be excluded from the sound sources of the output sound signal. The sound collection directivity range setting unit 20 then sets the direction and sharpness of the sound collection directivity directed from the microphone array 2 to each sounding body based on the information on the number or arrangement of the sounding bodies other than those extracted by the excluded sounding body extraction unit 40.
[0070]
As described above, the excluded sounding body extraction unit 40 may extract the sounding body to be excluded from the sound sources of the output sound signal generated by the sound collection processing unit 30 based on the information on the presence or absence of human mouth movement acquired by the sounding body information acquisition unit 10.
[0071]
Alternatively, the excluded sounding body extraction unit 40 may be configured to perform the above-described process of determining the mouth movement of each person instead of the detection system 4. In the sound collection system configured as shown in FIG. 1, each component operates as described above, enabling clear sound collection of simultaneous sounding by a plurality of sounding bodies.
[0072]
Next, FIG. 9 will be described. FIG. 9 illustrates a second example of the configuration of the sound collection system. In FIG. 9, the same reference numerals are assigned to components that perform the same operations as in the first example illustrated in FIG. 1, and detailed descriptions thereof are omitted.
[0073]
Similar to the first example illustrated in FIG. 1, the second example of the sound collection system includes the sound collection device 1, the microphone array 2, the camera 3, and the face position detection system 4. However, the second example differs from the first in that the sound collection processing unit 30 in the sound collection device 1 includes a sound generation detection unit 33 between the directional sound reception processing unit 31 and the output sound signal generation unit 32.
[0074]
The sound generation detection unit 33 detects the presence or absence of sound generation by each of the sounding bodies present in the sound collection range of the microphone array 2 and notifies the output sound signal generation unit 32 of the detection result. The output sound signal generation unit 32 generates a signal of the output sound having as sound sources only those sounding bodies present in the sound collection range of the microphone array 2 for which sound generation is detected by the sound generation detection unit 33.
[0075]
The sound generation detection unit 33 will be further described. The sound generation detection unit 33 detects the presence or absence of sound generation by each sounding body based on the amplitude level of a predetermined frequency band in the output sound. As described above, the directional sound reception processing unit 31 outputs the frequency spectrum of the collected sound signal weighted to obtain the sound collection directivity of the direction and sharpness set by the sound collection directivity range setting unit 20, and the output sound signal generation unit 32 converts this into sound signal data in the time domain. Therefore, the frequency spectrum output from the directional sound reception processing unit 31 is the frequency spectrum of the output sound signal output from the output sound signal generation unit 32.
[0076]
The sound generation detection unit 33 adds up the levels of the spectrum components included in the predetermined frequency band of the frequency spectrum of the output sound signal, and takes the sum as the amplitude level of the predetermined frequency band in the output sound. It then determines whether this amplitude level exceeds a predetermined threshold value: if it does, it determines that there is sound generation by the sounding body; if it does not, it determines that there is no sound generation by the sounding body.
[0077]
In the present embodiment, the frequency band used for obtaining the amplitude level is the frequency band of human speech (about 300 to 3400 Hz). Alternatively, the frequency band of the first formant in the frequency spectrum of human speech (about 300 to 1000 Hz) may be used.
[0078]
Further, instead of detecting the presence or absence of sound generation by each sounding body based on the amplitude level of the predetermined frequency band in the output sound, the sound generation detection unit 33 can also operate as follows.
[0079]
For example, the sound generation detection unit 33 extracts the spectrum components that are equal to or greater than a predetermined value from the frequency spectrum and adds them up. It then determines whether this total exceeds a predetermined threshold value: if it does, it determines that there is sound generation by the sounding body; if it does not, it determines that there is no sound generation. The presence or absence of sound generation by each sounding body can also be detected by the sound generation detection unit 33 in this way.
[0080]
Alternatively, the sound generation detection unit 33 obtains the maximum value of the spectrum components in this frequency spectrum. It then determines whether this maximum value exceeds a predetermined threshold value: if it does, it determines that there is sound generation by the sounding body; if it does not, it determines that there is no sound generation. The presence or absence of sound generation by each sounding body can also be detected by the sound generation detection unit 33 in this way.
[0081]
The sound generation detection unit 33 notifies the output sound signal generation unit 32 of the determination result of the presence or absence of sound generation by each sounding body, determined as described above. Based on the determination result notified from the sound generation detection unit 33, the output sound signal generation unit 32 outputs the output sound signal when it is determined that there is sound generation by the sounding body, and stops the output of the output sound signal when it is determined that there is no sound generation.
[0082]
The output sound signal generation unit 32 may output silence instead of stopping the output of the output sound signal. Also, to reduce the sense of incongruity caused by a sudden silent part, white noise of a predetermined level may be output instead of silence, or the stationary noise that this sound collection system constantly generates may be output. The sound collection system configured as shown in FIG. 9 operates as described above.
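The gating behaviour above, pass the frame while sound generation is detected and otherwise substitute silence or a comfort signal, can be sketched as follows. The frame representation (a list of samples) and all names are illustrative:

```python
def gate_output(frame, is_voiced, fallback="silence", noise=None):
    """Sketch of the output gating described in the text.

    Passes the audio frame through while sound generation is detected;
    otherwise substitutes silence, or a supplied low-level noise frame, so
    that the output does not cut off abruptly.
    """
    if is_voiced:
        return frame
    if fallback == "noise" and noise is not None:
        return noise           # e.g. white noise of a predetermined level
    return [0.0] * len(frame)  # silence of the same length
```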
[0083]
Note that the operation of the sound collection device 1 in the sound collection systems illustrated in FIGS. 1 and 9, that is, the operation of generating the signal of the output sound whose sound collection directivity is controlled based on the plurality of collected sound signals picked up by the microphone array 2, can also be performed by a computer.
[0084]
First, FIG. 10 will be described.
FIG. 10 illustrates the configuration of a computer 50 that is operated as the sound collection device 1. The computer 50 includes an MPU 51, a ROM 52, a RAM 53, a hard disk device 54, an input device 55, a display device 56, an interface device 57, and a recording medium drive device 58.
These components are connected via a bus 59, and can exchange various data with each other
under the control of the MPU 51.
[0085]
An MPU (Micro Processing Unit) 51 is an arithmetic processing unit that controls the overall
operation of the computer 50. A ROM (Read Only Memory) 52 is a read only semiconductor
memory in which a predetermined basic control program is prerecorded. The MPU 51 reads out
and executes this basic control program when the computer 50 starts up, thereby enabling
operation control of each component of the computer 50.
[0086]
A random access memory (RAM) 53 is a writable and readable semiconductor memory used as a working storage area as needed when the MPU 51 executes various control programs.
[0087]
The hard disk device 54 is a storage device that stores various control programs executed by the MPU 51 and various data. The MPU 51 can perform the control processing described later by reading out and executing a predetermined control program stored in the hard disk device 54. In the present embodiment, it is assumed that the hard disk device 54 stores in advance a database of tables, one for each direction (direction angle) of the sound collection directivity, indicating the relationship between the sharpness (angle value) of the sound collection directivity and the frequency characteristic of the inter-microphone phase difference range of the collected sound signal. In addition, it is assumed that the hard disk device 54 stores in advance a database of tables, one for each spectrum frequency, indicating the relationship between the phase difference and phase difference range and the above-described weighting setting value.
[0088]
The input device 55 is, for example, a keyboard device or a mouse device. When operated by the user of the computer 50, the input device 55 acquires the input of various information from the user associated with the operation content and sends the acquired input information to the MPU 51.
[0089]
The display device 56 is, for example, a liquid crystal display, and displays various texts and images according to display data sent from the MPU 51. The interface device 57 manages the exchange of various data with the various devices connected to the computer 50. More specifically, it performs reception of the data sent from the detection system 4, analog-to-digital conversion and temporary buffering of the collected sound signals output from each of the microphones constituting the microphone array 2, transmission of output sound data to subsequent devices, and the like.
[0090]
The recording medium drive device 58 is a device that reads various control programs and data
recorded on the portable recording medium 60. The MPU 51 can also perform various control
processes described later by reading out and executing a predetermined control program
recorded on the portable recording medium 60 via the recording medium drive device 58. The
portable recording medium 60 is, for example, a compact disc read only memory (CD-ROM) or a
digital versatile disc read only memory (DVD-ROM).
[0091]
In order to operate such a computer 50 as the sound collection device 1, first, a control program for causing the MPU 51 to execute the processing content of the control processing described later is created. The created control program is stored in advance in the hard disk device 54 or on the portable recording medium 60. Then, the MPU 51 is given a predetermined instruction to read out and execute this control program. By doing this, the MPU 51 can provide the functions of the sounding body information acquisition unit 10, the sound collection directivity range setting unit 20, the sound collection processing unit 30, and the excluded sounding body extraction unit 40.
[0092]
Next, FIG. 11 will be described. FIG. 11 is a flowchart illustrating the contents of the control processing performed by the MPU 51 in the computer 50 of FIG. 10. In FIG. 11, when execution of this control processing is started, first, in S101, initial setting processing is performed for the angle table defining the relationship between the number of sounding bodies within the sound collection range of the microphone array 2 and the angle indicating the sharpness of the sound collection directivity, for the maximum value of the sharpness of the sound collection directivity, and for the reference distance. In this processing, the above-described angle table, the maximum value θMAX of the sharpness of the sound collection directivity, and the reference value ddef of the distance between the sounding body and the microphone array 2 are acquired from the input device 55 and stored in the hard disk device 54. This processing corresponds to the operation of the sound collection directivity range setting unit 20.
[0093]
Next, in S102, a determination process is performed as to whether the interface device 57 has received data representing the information on the number and arrangement of the sounding bodies present in the sound collection range of the microphone array 2, output by the detection system 4. When the MPU 51 determines that the data has been received (when the determination result is Yes), the process proceeds to S103. On the other hand, when the data has not been received (when the determination result is No), the MPU 51 ends the control processing of FIG. 11.
[0094]
Next, in S103, sounding body information acquisition processing is performed. This processing acquires from the interface device 57 the data from the detection system 4, received by the interface device 57, representing the information on the number and arrangement of the sounding bodies present in the sound collection range of the microphone array 2, and stores it in a predetermined area of the RAM 53. This processing corresponds to the operation of the sounding body information acquisition unit 10. Note that, when the detection system 4 outputs data representing the information on the presence or absence of mouth movement of a human as a sounding body, the MPU 51 also performs processing for storing this data in a predetermined area of the RAM 53.
[0095]
Next, in S104, excluded sounding body extraction processing is performed. This processing extracts a sounding body to be excluded from the sound sources of the output sound generated by the sound collection processing unit 30, based on the information on the arrangement of the sounding bodies acquired by the sounding body information acquisition processing. This processing corresponds to the operation of the excluded sounding body extraction unit 40.
[0096]
In this processing, first, the information on the number and arrangement of the sounding bodies stored in the predetermined area of the RAM 53 is read out. Next, based on the read information, calculation processing of the amount of change in the arrangement position of each sounding body, or acquisition processing of the face direction of each person who is a sounding body, is performed. Then, determination processing is performed as to whether the amount of change exceeds a predetermined threshold, or as to whether the orientation of the face is outside a predetermined threshold range. Note that the amount of change in the arrangement position of a sounding body is calculated from the arrangement information stored in the RAM 53 in the most recently executed process of S103 and the arrangement information stored in the RAM 53 in a process of S103 executed before that.
[0097]
When information on the presence or absence of mouth movement of a human who is a sounding body is stored in the predetermined area of the RAM 53, this information is read out to determine the presence or absence of mouth movement. Here, if the calculated amount of change exceeds the predetermined threshold, if the face orientation is outside the predetermined threshold range, or if there is no mouth movement, processing is performed to extract the sounding body concerned as one to be excluded from the sound sources of the output sound signal.
[0098]
Next, in S105, target sounding body determination processing is performed. This processing corresponds to the operation of the sound collection directivity range setting unit 20. It updates the various information on the sounding bodies stored in the predetermined area of the RAM 53 by the most recent process of S103, based on the information on the exclusion of sounding bodies obtained by the process of S104. In this update, the number of sounding bodies extracted in the process of S104 is subtracted from the number of sounding bodies, and the resulting number of target sounding bodies is substituted into a variable n. Further, regarding the information on the arrangement positions of the sounding bodies and the presence or absence of mouth movement, the information on the sounding bodies extracted in the process of S104 is deleted. The updated information on the target sounding bodies is stored in another predetermined area of the RAM 53.
[0099]
Next, in S106, reading processing of audio data is performed. This processing also corresponds to the operation of the sound collection directivity range setting unit 20. It reads the collected sound signal data output from each of the microphones constituting the microphone array 2 and temporarily buffered by the interface device 57, and stores them together in a predetermined area of the RAM 53.
[0100]
Next, in S107, it is determined whether the current value of the variable n is a positive value. When the MPU 51 determines that the value of the variable n is positive (when the determination result is Yes), the process proceeds to S108. On the other hand, when the MPU 51 determines that the value of the variable n is not positive (when the determination result is No), the process returns to S102, and the processing is executed again on the next collected sound signal data buffered by the interface device 57.
[0101]
The subsequent processes from S108 to S113 are performed on the n-th target sounding body among the sounding bodies (target sounding bodies) present in the sound collection range of the microphone array 2, excluding those extracted by the process of S104.
[0102]
First, in S108, sound collection directivity range setting processing is performed. This processing also corresponds to the operation of the sound collection directivity range setting unit 20. In this processing, the sharpness of the sound collection directivity directed to the n-th target sounding body is set based on the number of target sounding bodies, and the direction of the sound collection directivity is set based on the arrangement information for the n-th target sounding body.
[0103]
In this process, first, the information on the number of target sounding bodies and the data of
the direction angle θ and the distance d for the n-th target sounding body (human) are acquired
from the RAM 53. Next, referring to the angle table acquired in the process of S101, the angle
value associated with the number of target sounding bodies is acquired. The angle value and the
direction angle θ indicate the sharpness and the direction of the sound collection directivity,
respectively.
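The lookup described in this paragraph can be sketched as follows. The table contents and the fallback value are hypothetical, since the patent does not give concrete angle values; only the shape of the mapping (more target sounding bodies, sharper directivity) comes from the description.

```python
# Hypothetical angle table: the more target sounding bodies there are, the
# sharper (narrower) the sound collection directivity directed at each one.
ANGLE_TABLE = {1: 90.0, 2: 60.0, 3: 40.0, 4: 30.0}  # degrees (illustrative)

def set_directivity(num_targets, direction_angle_deg):
    """Sketch of S108: return (direction, sharpness) for one target sounding body.

    The direction is the stored direction angle theta; the sharpness is the
    angle value associated in the table with the number of target bodies.
    """
    sharpness = ANGLE_TABLE.get(num_targets, 20.0)  # fall back to narrowest beam
    return direction_angle_deg, sharpness
```

With two targets, each beam gets the 60-degree table entry regardless of its own direction angle.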
[0104]
At this time, as described above, the sharpness of the sound collection directivity directed to the
n-th target sounding body may be further set based on the information on the arrangement of
the sounding bodies (humans).
[0105]
In this case, first, the data of the direction angle of the target sounding body adjacent to the n-th
target sounding body is acquired from the RAM 53. Then, using the acquired direction angle data
and the maximum value θMAX of the sharpness of the sound collection directivity acquired in
the process of S101, the sharpness of the sound collection directivity directed to the n-th target
sounding body is set as described earlier with reference to the corresponding figure.
[0106]
Furthermore, as described with reference to FIG. 6, the sharpness of the sound collection
directivity directed to the n-th target sounding body may be set based on distance. In this case,
first, the data of the distance for the target sounding body adjacent to the n-th target sounding
body is acquired from the RAM 53. Next, using the acquired distance data and the reference
distance ddef acquired in the process of S101, the sharpness of the sound collection directivity is
set as described with reference to FIG. 6.
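The two optional refinements above can be combined into one small sketch. The exact formulas are assumptions: the patent only states that the sharpness is derived from the adjacent body's direction angle, the cap θMAX, the distance d, and the reference distance ddef, without giving the arithmetic.

```python
def refine_sharpness(theta_max, own_angle, neighbor_angles, distance, d_ref):
    """Sketch of the optional sharpness refinement of S108.

    - Cap the sharpness by the angular gap to the nearest adjacent target
      sounding body, so that neighbouring beams do not overlap (bounded by
      theta_max when there is no neighbour).
    - Narrow the beam further for a body closer than the reference distance
      d_ref (cf. supplementary note 6: shorter distance, narrower angle).
    All values are in degrees and metres; the scaling rule is illustrative.
    """
    if neighbor_angles:
        gap = min(abs(own_angle - a) for a in neighbor_angles)
    else:
        gap = theta_max
    sharpness = min(theta_max, gap)
    if distance < d_ref:
        sharpness *= distance / d_ref  # closer body -> narrower directivity
    return sharpness
```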
[0107]
Note that the angle value and the direction angle, which indicate the sharpness and the direction
of the sound collection directivity set by the sound collection directivity range setting process of
S108, are stored in a predetermined area of the RAM 53.
[0108]
Next, in S109, directional sound reception processing is performed.
This process corresponds to the operation of the directional sound reception processing unit 31
in the sound collection processing unit 30. In this process, first, the collected sound signal data
for each microphone, stored in the RAM 53 by the process of S106, is read out, and each is
subjected to time-frequency conversion (for example, a Fourier transform) to obtain the
frequency spectrum data of each collected sound signal. Next, taking the frequency spectrum
data of one of the collected sound signals as a reference, the phase difference of the spectrum at
each spectrum frequency is calculated between the reference and the frequency spectrum data
of each of the other collected sound signals.
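The two steps of S109 described above (time-frequency conversion, then per-bin phase differences against a reference microphone) can be sketched with a naive DFT. This is an illustrative sketch only; a real implementation would use an FFT over windowed frames.

```python
import cmath
import math

def dft(frame):
    """Naive DFT of one real-valued frame (an FFT would be used in practice)."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def phase_differences(ref_frame, other_frame):
    """Phase of another microphone's spectrum relative to the reference, per bin."""
    diffs = []
    for r, o in zip(dft(ref_frame), dft(other_frame)):
        if abs(r) > 1e-9 and abs(o) > 1e-9:
            diffs.append(cmath.phase(o) - cmath.phase(r))
        else:
            diffs.append(0.0)  # bin carries no energy; phase is undefined
    return diffs
```

A one-sample delay of a bin-1 sinusoid over an 8-sample frame appears as a phase lag of 2π/8 radians at bin 1, which is exactly the kind of inter-microphone phase difference the later steps compare against the expected range.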
[0109]
Next, the angle value and the direction angle stored in the RAM 53 by the process of S108 are
read. Then, referring to the database in the hard disk drive 54, the frequency characteristic of
the inter-microphone phase difference range of the collected sound signal, which is associated
with the read angle value, is read from the table for the read direction angle. From this table, the
phase difference range at each spectrum frequency in the frequency spectrum data of each
collected sound signal is acquired.
[0110]
Next, referring to the database in the hard disk drive 54, the phase difference at each spectrum
frequency and the weighting value associated with the corresponding phase difference range are
acquired from the table for each spectrum frequency. Then, the frequency spectrum of the
reference collected sound signal is weighted by multiplying it by the weighting value for each
spectrum frequency.
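A minimal sketch of this weighting step, assuming the per-bin phase difference ranges and weighting values have already been read from the tables. The two-level weighting (keep in-range bins, attenuate the rest) is an illustrative simplification; the patent only says each bin is multiplied by a table-derived weighting value.

```python
def weight_spectrum(ref_spectrum, phase_diffs, phase_ranges, out_of_range_weight=0.1):
    """Weight each bin of the reference spectrum: bins whose inter-microphone
    phase difference lies inside the range expected for the set direction and
    sharpness keep their level; the other bins are attenuated."""
    weighted = []
    for s, pd, (lo, hi) in zip(ref_spectrum, phase_diffs, phase_ranges):
        w = 1.0 if lo <= pd <= hi else out_of_range_weight
        weighted.append(s * w)
    return weighted
```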
[0111]
The processes of S110, S111, and S113 that follow correspond to the operation of the sound
generation detection unit 33 in the sound collection processing unit 30 of FIG. 1. Therefore,
when the sound collection device 1 of FIG. 1 is realized by the computer 50, the processes of
S110, S111, and S113 need not necessarily be performed; in that case, the process of S112
described later may be executed after S109, followed by the process of S114 described later.
[0112]
First, in S110, a process for acquiring the sound generation detection level is performed. In this
process, the levels of the spectra included in the above-mentioned predetermined frequency
band, among the frequency spectrum weighted by the directional sound reception process of
S109, are added together, and the total value is taken as the amplitude level of the
predetermined frequency band in the output sound. The amplitude level thus obtained is treated
as the sound generation detection level.
[0113]
Next, in S111, it is determined whether the sound generation detection level obtained by the
process of S110 exceeds a predetermined threshold value. When the MPU 51 determines that
the sound generation detection level exceeds the predetermined value (when the determination
result is Yes), the process proceeds to S112; when it determines that the level does not exceed
the predetermined value (when the determination result is No), the process proceeds to S113.
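S110 and S111 together amount to a band-limited energy threshold. A sketch, with the band given as an inclusive bin range; for human speech the band would correspond to the first formant (cf. supplementary note 13). The band and threshold values are placeholders.

```python
def detect_sound_generation(spectrum_levels, band, threshold):
    """Add the spectrum levels inside the detection band (S110) and compare the
    total, used as the sound generation detection level, with a threshold (S111).

    Returns True when sound generation is judged to be present.
    """
    lo, hi = band
    level = sum(m for k, m in enumerate(spectrum_levels) if lo <= k <= hi)
    return level > threshold
```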
[0114]
In the processes of S110 and S111, the presence or absence of sound generation by each of the
sounding bodies can also be detected as follows, instead of based on the sound generation
detection level determined as described above.
[0115]
For example, in S110, the MPU 51 extracts, from the weighted frequency spectrum, the spectra
that are equal to or greater than a predetermined value, and adds the extracted spectra to obtain
their total value.
Then, in the subsequent S111, the MPU 51 determines whether this total value exceeds a
predetermined threshold value. If it is determined that the threshold is exceeded, it is
determined that sound generation by the sounding body is present, and the process proceeds to
S112; if it is determined that the threshold is not exceeded, it is determined that sound
generation by the sounding body is absent, and the process proceeds to S113.
[0116]
Alternatively, in S110, the MPU 51 obtains the maximum value of the spectra in the weighted
frequency spectrum. Then, in the subsequent S111, the MPU 51 determines whether this
maximum value exceeds a predetermined threshold value. If it is determined that the threshold
is exceeded, it is determined that sound generation by the sounding body is present, and the
process proceeds to S112; if it is determined that the threshold is not exceeded, it is determined
that sound generation by the sounding body is absent, and the process proceeds to S113.
[0117]
Even when the processes of S110 and S111 are performed as described above, the presence or
absence of sound generation by each of the sounding bodies can be detected.

In S112, output sound generation processing is performed. This process corresponds to the
operation of the output sound signal generation unit 32. In this process, the frequency spectrum
of the collected sound signal weighted by the directional sound reception process of S109 is
subjected to the inverse of the conversion performed in that process (for example, an inverse
fast Fourier transform), converted into time-domain audio signal data, and output. When the
process of S112 is completed, the MPU 51 advances the processing to S114.
[0118]
On the other hand, in S113, non-voice processing is performed. This process stops the output of
the output sound signal. Note that, instead of stopping the output of the output sound signal, the
MPU 51 may perform a process of outputting silence data. Alternatively, the MPU 51 may
perform a process of outputting white noise data of a predetermined level, or a process of
outputting data corresponding to the stationary noise regularly generated by the sound
collection system.
[0119]
Next, in S114, the value of the variable n is decremented; that is, 1 is subtracted from the
current value of the variable n and the result is substituted into the variable n. The process then
returns to S107, and the processing based on the new value of the variable n is performed again.
[0120]
By causing the MPU 51 to perform the control processing described above, the computer 50 of
FIG. 10 can function as the sound collection device 1.
The present invention is not limited to the embodiments described above; at the implementation
stage, various modifications and combinations can be made without departing from the scope of
the invention.
[0121]
For example, although the camera 3 is disposed at substantially the same position as the
microphone array 2 in the above-described embodiment, the camera 3 and the microphone array
2 may be disposed at separate positions. In that case, for example, a conversion table for
converting between the positional relationships of the camera 3 and the microphone array 2 is
prepared in the sounding body information acquisition unit 10 of the sound collection device 1.
The sounding body information acquisition unit 10 then refers to this conversion table and
converts the arrangement information (the position, angle, and distance detected by the position
detection system 4 from the image captured by the camera 3) into arrangement information
referenced to the position of the microphone array 2.
[0122]
In addition, the following supplementary notes are disclosed regarding the embodiment
described above.

(Supplementary Note 1) A sound collection device comprising: sound collection processing means
for generating a signal of an output sound whose sound collection directivity is controlled, based
on a plurality of collected sound signals collected by a microphone array having a plurality of
microphones whose relative positions are fixed; acquisition means for acquiring information on
the number and arrangement of sounding bodies present in the sound collection range of the
microphone array; and sound collection directivity range setting means that sets, for each of the
sounding bodies, the direction of the sound collection directivity directed from the microphone
array to the sounding body based on the information on the arrangement of the sounding bodies
acquired by the acquisition means, and sets, for each of the sounding bodies, the sharpness of
the sound collection directivity directed to the sounding body based on the information on the
number of the sounding bodies acquired by the acquisition means, wherein the sound collection
processing means generates and outputs a signal of an output sound whose sound collection
directivity is controlled to the direction and sharpness set by the sound collection directivity
range setting means.

(Supplementary Note 2) The sound collection device according to supplementary note 1, wherein
the information on the number and arrangement of the sounding bodies present in the sound
collection range of the microphone array acquired by the acquisition means is obtained from an
image obtained by photographing the sound collection range of the microphone array.

(Supplementary Note 3) The sound collection device according to supplementary note 1 or 2,
wherein the sound collection directivity range setting means further sets, for each of the
sounding bodies, the sharpness of the sound collection directivity directed to the sounding body
based on both the information on the number of the sounding bodies and the information on the
arrangement of the sounding bodies acquired by the acquisition means.

(Supplementary Note 4) The sound collection device according to supplementary note 3, wherein
the information on the arrangement of the sounding bodies on which the sound collection
directivity range setting means bases the sharpness of the sound collection directivity directed
to the sounding body is information on the arrangement interval between the sounding bodies
present in the sound collection range of the microphone array.

(Supplementary Note 5) The sound collection device according to supplementary note 3, wherein
the information on the arrangement of the sounding bodies on which the sound collection
directivity range setting means bases the sharpness of the sound collection directivity is
information on the distance between the sounding body and the microphone array.

(Supplementary Note 6) The sound collection device according to supplementary note 5, wherein
the sound collection directivity range setting means sets the sharpness of the sound collection
directivity to a narrower angle when the distance between the sounding body and the
microphone array is short than when the distance is long.

(Supplementary Note 7) The sound collection device according to any one of supplementary
notes 1 to 6, further comprising excluded sounding body extraction means for extracting, based
on the information on the arrangement of the sounding bodies acquired by the acquisition
means, a sounding body to be excluded from the sound sources of the output sound generated
by the sound collection processing means, wherein the sound collection directivity range setting
means sets the direction and sharpness of the sound collection directivity based on the
information on the sounding bodies other than those extracted by the excluded sounding body
extraction means.

(Supplementary Note 8) The sound collection device according to supplementary note 7, wherein
the excluded sounding body extraction means extracts a sounding body to be excluded from the
sound sources of the signal of the output sound generated by the sound collection processing
means, based on the amount of change in the information on the arrangement of the sounding
bodies acquired by the acquisition means.

(Supplementary Note 9) The sound collection device according to supplementary note 7, wherein
the sounding body is a human, and the excluded sounding body extraction means extracts a
sounding body to be excluded from the sound sources of the signal of the output sound
generated by the sound collection processing means, based on the information on the direction
of the human's face among the information on the arrangement of the humans acquired by the
acquisition means.

(Supplementary Note 10) The sound collection device according to supplementary note 7,
wherein the sounding body is a human, and the excluded sounding body extraction means
extracts a sounding body to be excluded from the sound sources of the signal of the output
sound generated by the sound collection processing means, based on the information on the
presence or absence of movement of the human's mouth acquired by the acquisition means.

(Supplementary Note 11) The sound collection device according to any one of supplementary
notes 1 to 10, further comprising sound generation detection means for detecting the presence
or absence of sound generation by a sounding body present in the sound collection range of the
microphone array, wherein the sound collection processing means outputs the signal of the
output sound when the sound generation detection means detects sound generation.

(Supplementary Note 12) The sound collection device according to supplementary note 11,
wherein the sound generation detection means detects the presence or absence of sound
generation by the sounding body based on the amplitude level of a predetermined frequency
band in the output sound.

(Supplementary Note 13) The sound collection device according to supplementary note 12,
wherein the sounding body is a human, and the predetermined frequency band is set to the
human's first formant frequency band.

(Supplementary Note 14) The sound collection device according to supplementary note 11,
wherein the sound collection processing means weights each spectrum of the frequency
spectrum of the collected sound signal collected by the microphone array so as to obtain the
sound collection directivity of the direction and sharpness set by the sound collection directivity
range setting means, and converts the weighted frequency spectrum into time-axis information
to generate the signal of the output sound, and the sound generation detection means detects
the presence or absence of sound generation by the sounding body based on the total value of
the spectra, among the weighted frequency spectrum, that are equal to or greater than a
predetermined value.

(Supplementary Note 15) The sound collection device according to supplementary note 11,
wherein the sound collection processing means weights each spectrum of the frequency
spectrum of the collected sound signal collected by the microphone array so as to obtain the
sound collection directivity of the direction and sharpness set by the sound collection directivity
range setting means, and converts the weighted frequency spectrum into an audio signal in the
time domain to generate the signal of the output sound, and the sound generation detection
means detects the presence or absence of sound generation by the sounding body based on the
maximum value of the spectra in the weighted frequency spectrum.

(Supplementary Note 16) A sound collection method for generating a signal of an output sound
whose sound collection directivity is controlled, based on a plurality of collected sound signals
collected by a microphone array having a plurality of microphones whose relative positions are
fixed, the method comprising: acquiring input of information on the number and arrangement of
sounding bodies present in the sound collection range of the microphone array; setting, for each
of the sounding bodies, the direction of the sound collection directivity directed from the
microphone array to the sounding body based on the acquired information on the arrangement
of the sounding bodies, and the sharpness of the sound collection directivity directed to the
sounding body based on the acquired information on the number of the sounding bodies; and
generating and outputting a signal of an output sound whose sound collection directivity is
controlled to the set direction and sharpness.

(Supplementary Note 17) A program for causing a computer to generate a signal of an output
sound whose sound collection directivity is controlled, based on a plurality of collected sound
signals collected by a microphone array having a plurality of microphones whose relative
positions are fixed, the program causing the computer to execute: acquisition processing for
acquiring input of information on the number and arrangement of sounding bodies present in
the sound collection range of the microphone array; sound collection directivity range setting
processing for setting, for each of the sounding bodies, the direction of the sound collection
directivity directed from the microphone array to the sounding body based on the information
on the arrangement of the sounding bodies acquired by the acquisition processing, and the
sharpness of the sound collection directivity directed to the sounding body based on the
information on the number of the sounding bodies acquired by the acquisition processing; and
sound collection processing for generating and outputting a signal of an output sound whose
sound collection directivity is controlled to the set direction and sharpness.
[0123]
DESCRIPTION OF SYMBOLS 1 sound collection device, 2 microphone array, 3 camera, 4 face
position detection system, 10 sounding body information acquisition unit, 20 sound collection
directivity range setting unit, 30 sound collection processing unit, 31 directional sound reception
processing unit, 32 output sound signal generation unit, 33 sound generation detection unit, 40
excluded sounding body extraction unit, 50 computer, 51 MPU, 52 ROM, 53 RAM, 54 hard disk
drive, 55 input device, 56 display device, 57 interface device, 58 recording medium drive device,
59 bus, 60 portable recording medium