Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2016025469
An object of the present invention is to allow a user to experience the current situation of various places at a remote site with a full sense of presence. According to the present invention, a microphone array selection unit selects the microphone arrays necessary to pick up sound in each area of a space, and an area sound pickup unit picks up the sound of every area using the microphone arrays selected for each area. An area sound selection unit selects, from the picked-up area sounds and according to the sound reproduction environment, the area sound of the area corresponding to a designated listening position and the area sounds of the surrounding areas according to the listening direction. An area volume adjustment unit adjusts the volume of each selected area sound according to its distance from the designated listening position, and a three-dimensional sound processing unit applies three-dimensional sound processing to each volume-adjusted area sound using a transfer function corresponding to the sound reproduction environment. [Selected figure] Figure 1
Sound pickup reproduction system, sound pickup reproduction apparatus, sound pickup
reproduction method, sound pickup reproduction program, sound pickup system and
reproduction system
[0001]
The present invention relates to a sound pickup reproduction system, a sound pickup
reproduction device, a sound pickup reproduction method, a sound pickup reproduction
program, a sound pickup system, and a reproduction system, and, for example, sounds existing in
a plurality of areas Etc. Can be applied to processing, mixing and three-dimensionally
reproducing the sound of each area.
11-04-2019
[0002]
With the development of ICT, there is a growing demand for technology that uses video and
sound information from remote areas and lets you experience the feeling as if you were at a
remote area.
[0003]
Non-Patent Document 1 proposes a telework system that connects a plurality of offices in remote locations, exchanges video, sound, and various sensor information between them, and enables smooth communication with the remote sites.
In this system, a plurality of cameras and microphones are arranged throughout the office, and the video and sound information obtained from them is transmitted to another remote office. The user can freely switch between the cameras at the remote site, and each time the camera is switched, the sound picked up by the microphone placed near that camera is reproduced, so the situation at the remote site can be followed in real time.
[0004]
Further, Non-Patent Document 2 proposes a system in which a plurality of cameras and microphones are arranged in an array in a room, and a user can freely select a viewing position and view content, such as an orchestra performance, recorded in that room. In this system, the sounds recorded with the microphone array are separated into individual sources by independent component analysis (hereinafter, ICA). Ordinarily, ICA-based separation must solve a permutation problem, in which the components of each separated source are output in a different order for each frequency component. In this system, frequency components are grouped based on spatial similarity, so that sources located close to one another are separated together. Although several sources may remain mixed in a separated sound, the influence is small because all sources are ultimately reproduced. By estimating the position of each separated source, applying a stereophonic effect according to the angle of view of the selected camera, and reproducing the result, the user can hear sound with a sense of presence.
[0005]
Nonaka et al., "Office Communication System Using Multiple Video, Sound, and Sensor Information," Human Interface Society Research Report, Vol. 13, No. 10, 2011. Niwa et al., "Multi-microphone array signal coding using blind source separation for listening position selective sound field reproduction," IEICE Technical Report, EA, Applied Acoustics 107 (532), 2008.
[0006]
However, even with the systems described in Non-Patent Document 1 and Non-Patent Document 2, the user cannot fully experience the current situation of various places at a remote site with a sense of presence.
[0007]
With the system described in Non-Patent Document 1, the user can view the inside of the remote office from any direction in real time and can also hear the sound of that place.
As for the sound, however, what the microphone picks up is simply reproduced as it is: all the surrounding sounds (voices and other sounds) are mixed together, there is no sense of direction, and the result lacks a sense of presence.
[0008]
Moreover, with the system described in Non-Patent Document 2, stereophonic processing and reproduction of the separated sources lets the user hear the sound of a remote place with a sense of reality.
However, it is difficult to perform sound collection and reproduction simultaneously in real time, because separating the sources requires heavy computation such as ICA and the estimation of virtual source components and position information. In addition, it is difficult to obtain stable performance under arbitrary conditions, since the output changes depending on the assumed number of sources, the number of virtual sources, and the number of groupings, which may differ from what is actually present.
[0009]
Therefore, there is a need for a sound collection and reproduction system, a sound collection and reproduction apparatus, a sound collection and reproduction method, a sound collection and reproduction program, a sound collection system, and a reproduction system that can convey the current situation of various places at a remote site with a rich sense of reality.
[0010]
In order to solve this problem, a sound collection and reproduction system according to a first aspect of the present invention picks up the area sounds of all the areas into which a space is divided, using a plurality of microphone arrays arranged in the space, and reproduces three-dimensional sound. The system comprises: (1) a microphone array selection unit that selects the microphone arrays necessary to pick up sound in each area of the space; (2) an area sound pickup unit that picks up the sound of every area using the microphone arrays selected for each area by the microphone array selection unit; (3) an area sound selection unit that selects, from the area sounds of all the areas picked up by the area sound pickup unit and according to the sound reproduction environment, the area sound of the area corresponding to a designated listening position and the area sounds of the surrounding areas according to the listening direction; (4) an area volume adjustment unit that adjusts the volume of each area sound selected by the area sound selection unit according to its distance from the designated listening position; and (5) a three-dimensional sound processing unit that applies three-dimensional sound processing to each area sound volume-adjusted by the area volume adjustment unit, using a transfer function corresponding to the sound reproduction environment.
[0011]
A sound collection and reproduction apparatus according to a second aspect of the present invention picks up the area sounds of all the areas into which a space is divided, using a plurality of microphone arrays arranged in the space, and reproduces three-dimensional sound. The apparatus comprises: (1) a microphone array selection unit that selects the microphone arrays necessary to pick up sound in each area of the space; (2) an area sound pickup unit that picks up the sound of every area using the microphone arrays selected for each area by the microphone array selection unit; (3) an area sound selection unit that selects, from the area sounds of all the areas picked up by the area sound pickup unit and according to the sound reproduction environment, the area sound of the area corresponding to a designated listening position and the area sounds of the surrounding areas according to the listening direction; (4) an area volume adjustment unit that adjusts the volume of each area sound selected by the area sound selection unit according to its distance from the designated listening position; and (5) a three-dimensional sound processing unit that applies three-dimensional sound processing to each area sound volume-adjusted by the area volume adjustment unit, using a transfer function corresponding to the sound reproduction environment.
[0012]
A sound collection and reproduction method according to a third aspect of the present invention picks up the area sounds of all the areas into which a space is divided, using a plurality of microphone arrays arranged in the space, and reproduces three-dimensional sound. In the method, (1) a microphone array selection unit selects the microphone arrays necessary to pick up sound in each area of the space; (2) an area sound pickup unit picks up the sound of every area using the microphone arrays selected for each area by the microphone array selection unit; (3) an area sound selection unit selects, from the area sounds of all the areas picked up by the area sound pickup unit and according to the sound reproduction environment, the area sound of the area corresponding to a designated listening position and the area sounds of the surrounding areas according to the listening direction; (4) an area volume adjustment unit adjusts the volume of each area sound selected by the area sound selection unit according to its distance from the designated listening position; and (5) a three-dimensional sound processing unit applies three-dimensional sound processing to each area sound volume-adjusted by the area volume adjustment unit, using a transfer function corresponding to the sound reproduction environment.
[0013]
A sound collection and reproduction program according to a fourth aspect of the present invention is a program for a sound collection and reproduction apparatus that picks up the area sounds of all the areas into which a space is divided, using a plurality of microphone arrays arranged in the space, and reproduces three-dimensional sound. The program causes a computer to function as: (1) a microphone array selection unit that selects the microphone arrays necessary to pick up sound in each area of the space; (2) an area sound pickup unit that picks up the sound of every area using the microphone arrays selected for each area by the microphone array selection unit; (3) an area sound selection unit that selects, from the area sounds of all the areas picked up by the area sound pickup unit and according to the sound reproduction environment, the area sound of the area corresponding to a designated listening position and the area sounds of the surrounding areas according to the listening direction; (4) an area volume adjustment unit that adjusts the volume of each area sound selected by the area sound selection unit according to its distance from the designated listening position; and (5) a three-dimensional sound processing unit that applies three-dimensional sound processing to each area sound volume-adjusted by the area volume adjustment unit, using a transfer function corresponding to the sound reproduction environment.
[0014]
A sound collection system according to a fifth aspect of the present invention picks up the area sounds of all the areas into which a space is divided, using a plurality of microphone arrays arranged in the space, and comprises: (1) a microphone array selection unit that selects the microphone arrays necessary to pick up sound in each area of the space; and (2) an area sound pickup unit that picks up the sound of every area using the microphone arrays selected for each area by the microphone array selection unit.
[0015]
A reproduction system according to a sixth aspect of the present invention reproduces three-dimensional sound from the area sounds of all the areas into which a space is divided, picked up using a plurality of microphone arrays arranged in the space. The system comprises: (1) an area sound selection unit that selects, from the area sounds of all the areas and according to the sound reproduction environment, the area sound of the area corresponding to a designated listening position and the area sounds of the surrounding areas according to the listening direction; (2) an area volume adjustment unit that adjusts the volume of each area sound selected by the area sound selection unit according to its distance from the designated listening position; and (3) a three-dimensional sound processing unit that applies three-dimensional sound processing to each volume-adjusted area sound using a transfer function corresponding to the sound reproduction environment.
[0016]
According to the present invention, the user can experience the current situation of various places at a remote site with a full sense of presence.
[0017]
FIG. 1 is a block diagram showing the configuration of a sound collection and reproduction apparatus according to the embodiment.
FIG. 2 is a block diagram showing the internal configuration of the area sound pickup unit according to the embodiment.
FIG. 3 is a diagram illustrating how, according to the embodiment, the space of a remote site divided into nine areas is picked up, and how the area sounds to be reproduced are selected according to the user's designated position and sound reproduction environment.
FIG. 4 is an explanatory diagram illustrating how sound is picked up from two sound pickup areas using two 3-channel microphone arrays according to the embodiment.
[0018]
(A) Main Embodiment In the following, embodiments of the sound collection and reproduction system, the sound collection and reproduction apparatus, the sound collection and reproduction method, the sound collection and reproduction program, the sound collection system, and the reproduction system according to the present invention are described in detail with reference to the drawings.
[0019]
(A-1) Description of Technical Concept of Embodiment First, the technical concept underlying the embodiment of the present invention is described.
The present inventor has proposed a sound collection system that divides the space of a remote site into a plurality of areas and picks up each area using microphone arrays arranged in that space (Reference 1: the specification of Japanese Patent Application No. 2013-179886).
The sound collection and reproduction system according to this embodiment uses this sound collection method proposed by the present inventor.
In this method, the size of the areas to be picked up can be changed by changing the arrangement of the microphone arrays, so the space of the remote site can be divided to suit its environment.
In addition, this method can pick up the area sounds of all the divided areas simultaneously.
[0020]
Accordingly, the sound collection and reproduction system of the embodiment picks up the area sounds of all areas in the remote space simultaneously, selects the area sounds that match the user's sound reproduction environment according to the viewing position and direction the user has chosen at the remote site, applies three-dimensional sound processing to the selected area sounds, and outputs the result.
[0021]
(A-2) Configuration of Embodiment FIG. 1 is a block diagram showing the configuration of a sound collection and reproduction apparatus (sound collection and reproduction system) according to the embodiment.
In FIG. 1, the sound collection and reproduction apparatus 100 according to the embodiment includes microphone arrays MA1 to MAm (m is an integer), a data input unit 1, a space coordinate data holding unit 2, a microphone array selection unit 3, an area sound pickup unit 4, a position/direction information acquisition unit 5, an area sound selection unit 6, an area volume adjustment unit 7, a three-dimensional sound processing unit 8, a speaker output unit 9, a transfer function data holding unit 10, and speaker arrays SA1 to SAn (n is an integer).
[0022]
In the sound collection and reproduction system 100 according to the embodiment, the portions shown in FIG. 1 other than the microphone arrays MA1 to MAm and the speaker arrays SA1 to SAn may be built by connecting various circuits in hardware, or may be realized by a general-purpose device or unit having a CPU, ROM, RAM, and the like that executes a predetermined program; whichever construction is adopted, the functional configuration can be represented as in FIG. 1.
[0023]
In addition, the sound collection and reproduction apparatus 100 may be built as a sound collection and reproduction system that transmits information between the remote site and the location where the user views and listens. For example, the portion that picks up sounds (including voices and other sounds) with the microphone arrays MA1 to MAm may be built at the remote site, and the portion that selects area sounds and reproduces sound according to the user's sound reproduction environment may be built at the viewing location.
In that case, the remote site and the user-side viewing location may each be provided with a communication unit (not shown) for transmitting information between them.
[0024]
The microphone arrays MA1 to MAm are arranged so that they can pick up sounds (including voices and other sounds) from sound sources present in all the areas into which the remote space is divided.
Each of the microphone arrays MA1 to MAm consists of two or more microphones, and each microphone captures an acoustic signal.
Each of the microphone arrays MA1 to MAm is connected to the data input unit 1 and supplies its picked-up acoustic signals to the data input unit 1.
[0025]
The data input unit 1 converts an acoustic signal from the microphone arrays MA1 to MAm from
an analog signal to a digital signal, and outputs the digital signal to the microphone array
selection unit 3.
[0026]
The space coordinate data holding unit 2 holds the position information of (the center of) each area, the position information of each of the microphone arrays MA1 to MAm, the spacing of the microphones constituting each of the microphone arrays MA1 to MAm, and the like.
[0027]
The microphone array selection unit 3 determines the combination of microphone arrays MA1 to MAm used to pick up each area, based on the position information of the areas and the position information of the microphone arrays MA1 to MAm held by the space coordinate data holding unit 2.
Further, when the microphone arrays MA1 to MAm each consist of three or more microphones, the microphone array selection unit 3 also selects the microphones necessary to form directivity.
[0028]
Here, an example of the microphone selection method by which the microphone array selection unit 3 forms the directivity of each microphone array is described.
FIG. 4 illustrates an example of this microphone selection method according to the embodiment.
[0029]
For example, the microphone array MA1 shown in FIG. 4 has microphones M1, M2, and M3
which are three omnidirectional microphones on the same plane. The microphones M1, M2 and
M3 are arranged at the vertices of a right triangle. The distance between the microphones M1
and M2 and the distance between the microphones M2 and M3 are assumed to be the same. The
microphone array MA2 also has the same configuration as the microphone array MA1, and
includes three microphones M4, M5, and M6.
[0030]
For example, in FIG. 4, the microphone array selection unit 3 selects the microphones M2 and M3 of the microphone array MA1 and the microphones M5 and M6 of the microphone array MA2 in order to pick up the sound from a sound source present in sound pickup area A. The directivity of the microphone arrays MA1 and MA2 can thus be formed in the direction of sound pickup area A. When picking up sound from a source present in sound pickup area B, the microphone array selection unit 3 changes the combination of microphones, selecting the microphones M1 and M2 of the microphone array MA1 and the microphones M4 and M5 of the microphone array MA2. The directivity of the microphone arrays MA1 and MA2 can thereby be formed in the direction of sound pickup area B.
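The pair selection illustrated in FIG. 4 can be sketched as follows. Choosing the two microphones whose baseline is most nearly perpendicular to the target-area direction (a broadside pair) is one plausible criterion; this criterion, and all names and shapes, are illustrative assumptions, since the text only states that the selected pair changes with the target area.

```python
import numpy as np

def select_mic_pair(mic_positions, array_center, area_center):
    """Pick the two microphones of a small array whose baseline is most
    nearly perpendicular to the direction of the target area, i.e. the
    pair best placed to steer a broadside beam at that area. Sketch only."""
    target_dir = area_center - array_center
    target_dir = target_dir / np.linalg.norm(target_dir)
    best, best_score = None, np.inf
    for i in range(len(mic_positions)):
        for j in range(i + 1, len(mic_positions)):
            baseline = mic_positions[j] - mic_positions[i]
            baseline = baseline / np.linalg.norm(baseline)
            # |dot| is 0 when the baseline is perpendicular to the target.
            score = abs(np.dot(baseline, target_dir))
            if score < best_score:
                best, best_score = (i, j), score
    return best
```

For the right-triangle layout of FIG. 4, an area directly "above" the array selects the pair lying along the perpendicular baseline, matching the idea that the selected microphones change when the target switches from area A to area B.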
[0031]
The area sound pickup unit 4 picks up the area sound of every area, using the combinations of microphone arrays selected by the microphone array selection unit 3.
[0032]
FIG. 2 is a block diagram showing an internal configuration of the area sound pickup unit 4
according to this embodiment.
As shown in FIG. 2, the area sound collection unit 4 includes a directivity formation unit 41, a
delay correction unit 42, an area sound power correction coefficient calculation unit 43, and an
area sound extraction unit 44.
[0033]
The directivity forming unit 41 applies a beamformer (hereinafter, BF) to each of the microphone arrays MA1 to MAm to form a directional beam in the direction of each sound pickup area. As the beamformer, an additive type such as the delay-and-sum method or a subtractive type such as the spectral subtraction method (hereinafter also referred to as SS) can be used. Further, the directivity forming unit 41 changes the strength of the directivity according to the range of the target sound pickup area.
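The delay-and-sum beamformer mentioned above can be sketched minimally as follows. The function name, array shapes, and the integer-sample delay approximation are assumptions for illustration, not the patent's implementation.

```python
import numpy as np

def delay_and_sum(signals, mic_positions, target, fs, c=343.0):
    """Steer a directional beam toward `target` by delaying and summing.

    signals: (num_mics, num_samples) array of microphone signals.
    mic_positions, target: coordinates in metres; fs: sample rate in Hz;
    c: speed of sound in m/s."""
    dists = np.linalg.norm(mic_positions - target, axis=1)
    # Delay the channels nearer to the target so that wavefronts from
    # the target direction add coherently across all channels.
    delays = (dists.max() - dists) / c             # seconds
    shifts = np.round(delays * fs).astype(int)     # integer-sample approximation
    n = signals.shape[1]
    out = np.zeros(n)
    for sig, s in zip(signals, shifts):
        out[s:] += sig[:n - s] if s > 0 else sig
    return out / len(signals)
```

In practice fractional-delay filters would replace the integer-sample rounding, and a subtractive SS stage (as in the text) could be layered on top of this additive beam.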
[0034]
The delay correction unit 42 calculates the propagation delay times that arise from the differing distances between each area and the microphone arrays used to pick it up, and corrects for them. Specifically, the delay correction unit 42 acquires from the space coordinate data holding unit 2 the position information of the area and of all the microphone arrays MA1 to MAm used to pick it up, and calculates the differences in arrival time (the propagation delays) of the area sound at those arrays. Then, taking the microphone array farthest from the area as the reference, the delay correction unit 42 adds a propagation delay to the beamformer output signals of the other arrays so that the area sound effectively reaches all the arrays simultaneously. The delay correction unit 42 performs this delay correction for every area, on the beamformer output signals of all the microphone arrays used to pick up that area.
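The alignment step above can be sketched as follows, taking the farthest array as the reference as the text specifies. Names, shapes, and the integer-sample rounding are illustrative assumptions.

```python
import numpy as np

def delay_correction(bf_outputs, array_positions, area_center, fs, c=343.0):
    """Align beamformer outputs so the area sound arrives simultaneously.

    bf_outputs: (num_arrays, num_samples) beamformer output signals.
    The array farthest from the area is the reference; nearer arrays are
    delayed by their extra lead time."""
    dists = np.linalg.norm(array_positions - area_center, axis=1)
    extra = (dists.max() - dists) / c           # how much each nearer array leads
    shifts = np.round(extra * fs).astype(int)
    n = bf_outputs.shape[1]
    aligned = np.zeros_like(bf_outputs)
    for i, s in enumerate(shifts):
        aligned[i, s:] = bf_outputs[i, :n - s] if s > 0 else bf_outputs[i]
    return aligned
```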
[0035]
The area sound power correction coefficient calculation unit 43 calculates a power correction coefficient that equalizes the power of the area sound contained in the beamformer output signals of the microphone arrays used to pick up each area. To obtain the power correction coefficient, for example, the area sound power correction coefficient calculation unit 43 calculates the ratio of the amplitude spectra of the beamformer output signals for each frequency, then takes the mode or the median of the per-frequency ratios and uses that value as the power correction coefficient.
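The median variant of this calculation can be sketched in a few lines; the function name, FFT length, and the epsilon guard are assumptions added for illustration.

```python
import numpy as np

def power_correction_coefficient(bf_a, bf_b, n_fft=256):
    """Estimate the gain that equalises the area-sound power in two
    beamformer outputs: take the per-frequency ratio of their amplitude
    spectra and return the median, as one of the options in the text."""
    spec_a = np.abs(np.fft.rfft(bf_a, n_fft))
    spec_b = np.abs(np.fft.rfft(bf_b, n_fft))
    eps = 1e-12                       # avoid division by zero in silent bins
    ratios = spec_a / (spec_b + eps)
    return np.median(ratios)          # median is robust to outlier bins
```

The mode over a histogram of ratios would be the other option named in the text; the median is simply easier to compute robustly.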
[0036]
For every area, the area sound extraction unit 44 performs spectral subtraction between the beamformer outputs corrected with the power correction coefficients from the area sound power correction coefficient calculation unit 43, thereby extracting the noise present in each sound pickup direction. Furthermore, the area sound extraction unit 44 extracts the area sound by subtracting the extracted noise from each beamformer output. The area sound of each area extracted by the area sound extraction unit 44 is passed to the area sound selection unit 6 as the output of the area sound pickup unit 4.
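A loose sketch of the two-step subtraction just described is given below for the two-array case. The exact subtraction order, flooring, and any over-subtraction factors in the patented method may differ; this is only one plausible reading, with all names assumed.

```python
import numpy as np

def extract_area_sound(bf1, bf2, alpha):
    """Extract the area sound shared by two crossing beams (sketch).

    bf1, bf2: beamformer outputs whose beams cross at the target area;
    alpha: power-correction coefficient applied to bf2."""
    S1 = np.fft.rfft(bf1)
    S2 = np.fft.rfft(bf2) * alpha
    # Step 1: estimate the noise in bf1's direction as the part of bf1
    # not shared with the power-corrected bf2 (the shared part is the area sound).
    noise = np.maximum(np.abs(S1) - np.abs(S2), 0.0)
    # Step 2: subtract that noise estimate from bf1's amplitude spectrum.
    area_amp = np.maximum(np.abs(S1) - noise, 0.0)
    # Resynthesise with bf1's phase.
    return np.fft.irfft(area_amp * np.exp(1j * np.angle(S1)), len(bf1))
```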
[0037]
The position/direction information acquisition unit 5 refers to the space coordinate data holding unit 2 to acquire the position (designated listening position) and direction (listening direction) desired by the user. For example, the user may designate a target area with a GUI or the like, or may switch the target area based on an image of the remote site projected at the viewing location, in which case the camera projecting the designated position is switched according to the user's specification. In this case, the position/direction information acquisition unit 5 takes the position of the designated area as the position of the target area, and obtains the direction from which the target area is projected from the position of the camera.
[0038]
The area sound selection unit 6 selects the area sounds to be used for sound reproduction based on the position and direction information acquired by the position/direction information acquisition unit 5. First, the area sound selection unit 6 sets the area sound closest to the position designated by the user as the reference (that is, the central sound source). Next, according to the direction information, it sets the area sounds of the areas surrounding the target area containing the central sound source, including the areas located diagonally from the target area, as surrounding sound sources. Finally, the area sound selection unit 6 selects the area sounds to be used for reproduction in accordance with the sound reproduction environment on the user side.
[0039]
The area volume adjustment unit 7 adjusts the volume of each area sound selected by the area sound selection unit 6 according to its distance from the center position of the target area, which is given by the position (the center of the target area) and direction information designated by the user. As the adjustment method, the volume of an area sound may be decreased as its distance from the center of the target area increases, or the volume of the area sound of the target area (the central sound source) may be maximized while the volumes of the surrounding area sounds are reduced. More specifically, for example, the volume of each surrounding area sound may be multiplied by a predetermined value a (0 < a < 1) so that it is smaller than the volume of the target area sound, or a predetermined value may be subtracted from the volume of each surrounding area sound.
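The multiplicative option above can be sketched as follows. The exponential-in-distance attenuation law is an assumption chosen so that the nearest (target) area keeps full volume; the text only requires volume to fall with distance using a constant a with 0 < a < 1.

```python
import numpy as np

def adjust_area_volumes(area_sounds, area_centers, listening_pos, a=0.6):
    """Scale each selected area sound by its distance from the listening
    position: the nearest area keeps gain 1.0, surrounding areas are
    attenuated by a factor a per unit of extra distance (sketch)."""
    dists = np.linalg.norm(area_centers - listening_pos, axis=1)
    gains = a ** (dists - dists.min())      # 1.0 for the target area
    return [g * s for g, s in zip(gains, area_sounds)]
```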
[0040]
The three-dimensional sound processing unit 8 applies three-dimensional sound processing to each area sound according to the user's environment. The three-dimensional sound processing unit 8 can apply various kinds of three-dimensional sound processing as appropriate for the sound reproduction environment on the user side; that is, the processing it performs is not particularly limited.
[0041]
For example, when the user uses headphones or earphones, the three-dimensional sound processing unit 8 convolves each area sound selected by the area sound selection unit 6 with the head-related transfer function (HRTF) held by the transfer function data holding unit 10 for the corresponding direction from the viewing position, creating a binaural sound source. When stereo speakers are used, the three-dimensional sound processing unit 8 converts the binaural sound source into a transaural sound source with a crosstalk canceller designed using the room transfer functions, held by the transfer function data holding unit 10, between the user and the speakers. When three or more speakers are used, the three-dimensional sound processing unit 8 performs no processing if a speaker position coincides with the position of an area sound; otherwise it combines the area sounds into transaural sound sources, creating as many new sound sources as there are speakers.
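The headphone case, HRTF convolution followed by mixing, can be sketched as below. A real system would look up measured left/right HRTF impulse responses per direction from a database; here the mapping and all shapes are assumptions for illustration.

```python
import numpy as np

def binauralize(area_sounds, hrtfs):
    """Render area sounds binaurally: convolve each sound with the
    (left, right) HRTF impulse responses for its direction, then mix.

    area_sounds: list of 1-D signals; hrtfs: list of (left_ir, right_ir)
    pairs, one per area sound. Sketch with assumed data layout."""
    n = max(len(s) + len(h[0]) - 1 for s, h in zip(area_sounds, hrtfs))
    left = np.zeros(n)
    right = np.zeros(n)
    for s, (hl, hr) in zip(area_sounds, hrtfs):
        l = np.convolve(s, hl)      # left-ear rendering of this area sound
        r = np.convolve(s, hr)      # right-ear rendering
        left[:len(l)] += l
        right[:len(r)] += r
    return left, right
```

For the stereo-speaker case the resulting binaural pair would then be passed through a crosstalk canceller, as the text describes.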
[0042]
The speaker output unit 9 outputs the sound source data subjected to the stereophonic sound
processing in the stereophonic sound processing unit 8 to the corresponding speakers.
[0043]
The transfer function data holding unit 10 holds the transfer functions on the user side needed to perform three-dimensional sound processing.
The transfer function data holding unit 10 holds, for example, head-related transfer functions (HRTFs) for each direction, room transfer functions between the user and the speakers, and the like. The transfer function data holding unit 10 may also hold room transfer function data learned, for example, from changes in the room environment.
[0044]
The speaker arrays SA1 to SAn are the speakers of the sound reproduction system on the user side. The speaker arrays SA1 to SAn enable three-dimensional sound reproduction and may be, for example, earphones, stereo speakers, or three or more speakers. In this embodiment, to reproduce three-dimensional sound, the speaker arrays SA1 to SAn comprise, for example, two or more speakers arranged in front of or around the user.
[0045]
(A-3) Operation of Embodiment Next, the operation of the sound collection and reproduction
apparatus 100 according to the embodiment will be described in detail with reference to the
drawings.
[0046]
Here, the case where the present invention is applied to a remote system in which a user views and listens to the video and audio of a remote space is described as an example.
The space of the remote site is divided into a plurality of parts (in this embodiment, for example, into nine parts). It is assumed that a plurality of cameras and the plurality of microphone arrays MA1 to MAm are arranged so that the image of each divided area and the sound sources present in each area can be captured.
[0047]
The microphone arrays MA1 to MAm are arranged so that all the areas into which the remote space is divided can be picked up. Each microphone array consists of two or more microphones, and each microphone picks up an acoustic signal.
[0048]
The acoustic signals picked up by the microphones constituting each of the microphone arrays MA1 to MAm are supplied to the data input unit 1, which converts them from analog signals to digital signals.
[0049]
The microphone array selection unit 3 acquires the position information of each of the microphone arrays MA1 to MAm and of each area held in the space coordinate data holding unit 2, and determines the combination of microphone arrays used to pick up each area. Furthermore, along with selecting the combination of microphone arrays for each area, the microphone array selection unit 3 selects the microphones necessary for forming directivity in the direction of each area.
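The array-selection step can be sketched as a nearest-array lookup. The pairing rule (two closest arrays per area) and all coordinates below are illustrative assumptions, not taken from the embodiment, which leaves the selection criterion open.

```python
import math

# Hypothetical sketch: for each area, choose the two microphone arrays
# closest to the area centre as the combination used to pick up that area.
def select_arrays(area_positions, array_positions, per_area=2):
    """Return {area_id: [array ids ranked by distance]}, truncated to per_area."""
    selection = {}
    for area_id, (ax, ay) in area_positions.items():
        ranked = sorted(
            array_positions,
            key=lambda m: math.hypot(array_positions[m][0] - ax,
                                     array_positions[m][1] - ay),
        )
        selection[area_id] = ranked[:per_area]
    return selection

# assumed 2-D coordinates for two areas and three arrays
areas = {"E": (1.0, 1.0), "B": (1.0, 0.0)}
arrays = {"MA1": (0.0, 0.0), "MA2": (2.0, 0.0), "MA3": (1.0, 2.0)}
combos = select_arrays(areas, arrays)
```

A real implementation would also account for obstructions and array orientation, which the distance-only rule above ignores.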
[0050]
The area pickup unit 4 picks up sound from all the areas using, for each area, the combination of the microphone arrays MA1 to MAm selected by the microphone array selection unit 3.
[0051]
The combination of microphone arrays selected by the microphone array selection unit 3 for picking up each area, and the information on the microphones for forming directivity in each area direction, are given to the directivity forming unit 41 of the area pickup unit 4.
[0052]
The directivity forming unit 41 acquires, from the space coordinate data holding unit 2, the position information (distances) of the microphones of the microphone arrays MA1 to MAm needed for forming directivity in each area direction.
The directivity forming unit 41 then applies a beamformer (BF) to the outputs (digital signals) from the microphones of the microphone arrays MA1 to MAm to form, for every area, a directional beam directed toward that sound collection area.
That is, the directivity forming unit 41 forms a directional beam for each combination of the microphone arrays MA1 to MAm used to pick up each of the areas at the remote place.
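The beamforming step can be illustrated with a minimal delay-and-sum sketch. The linear-array geometry, far-field steering model, and parameter names are assumptions for illustration only; the embodiment's beamformer (per Reference 1) uses a different, spectral-subtraction-based approach described later.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed

def delay_and_sum(signals, mic_spacing, angle_rad, fs):
    """Steer a directional beam toward angle_rad for a uniform linear array.

    signals: list of equal-length sample lists, one per microphone.
    """
    n_mics = len(signals)
    out = [0.0] * len(signals[0])
    for m, sig in enumerate(signals):
        # integer-sample steering delay for microphone m (far-field model)
        delay = int(round(m * mic_spacing * math.sin(angle_rad)
                          / SPEED_OF_SOUND * fs))
        for n in range(len(out)):
            src = n - delay
            if 0 <= src < len(sig):
                out[n] += sig[src]
    return [v / n_mics for v in out]

# at broadside (angle 0) the output is simply the channel average
out = delay_and_sum([[1.0, 2.0, 3.0], [1.0, 2.0, 3.0]], 0.05, 0.0, 16000)
```

Delay-and-sum is the simplest beamformer; its directivity sharpens with more microphones, which is consistent with [0053]'s note that directivity strength can be tuned.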
[0053]
Further, the directivity forming unit 41 may change the strength of the directivity in accordance with the range of the target sound collection area. For example, when the range of the target sound collection area is wider than a predetermined value, the directivity forming unit 41 may weaken the directivity; conversely, when the range of the sound collection area is narrower than the predetermined value, it may sharpen the directivity.
[0054]
Various methods can be applied for forming a directional beam toward each area in the directivity forming unit 41. For example, the directivity forming unit 41 can apply the method described in Reference 1 (the specification and drawings of Japanese Patent Application No. 2013-179886): noise is extracted using the outputs of three omnidirectional microphones, arranged at the vertices of a right triangle in the same plane, that constitute each of the microphone arrays MA1 to MAm, and the noise is removed from the input signal by spectral subtraction, so that a sharp directional beam is formed only in the target direction.
[0055]
The delay correction unit 42 acquires from the space coordinate data holding unit 2 the position information of each of the microphone arrays MA1 to MAm and of each area, and calculates the difference in the arrival time (propagation delay time) of the area sound at each of the microphone arrays MA1 to MAm. Then, taking as reference the microphone array arranged farthest from the sound collection area according to the position information, the delay correction unit 42 adds a propagation delay time to the beamformer output signal of each microphone array from the directivity forming unit 41 so that the area sound reaches all the microphone arrays MA1 to MAm simultaneously.
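The delay correction can be sketched as follows: each beamformer output is padded with a lag so that all outputs are aligned to the array farthest from the sound collection area. The sampling rate, speed of sound, and distances are illustrative assumptions.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed

def align_to_farthest(outputs, distances, fs):
    """Delay each signal so the area sound arrives simultaneously everywhere.

    outputs: {array_id: list of samples}; distances: {array_id: metres to area}.
    """
    farthest = max(distances.values())
    aligned = {}
    for array_id, samples in outputs.items():
        # extra delay this array's signal needs relative to the farthest array
        lag = int(round((farthest - distances[array_id]) / SPEED_OF_SOUND * fs))
        aligned[array_id] = [0.0] * lag + list(samples)
    return aligned

# MA2 is twice as far from the area as MA1, so MA1's output is delayed
outs = {"MA1": [1.0, 1.0], "MA2": [1.0, 1.0]}
aligned = align_to_farthest(outs, {"MA1": 3.43, "MA2": 6.86}, fs=1000)
```

Aligning to the farthest array means only non-negative (causal) delays are ever added, which is why the embodiment references that array.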
[0056]
The area sound power correction coefficient calculation unit 43 calculates a power correction
coefficient for making the power of the area sound included in each beamformer output signal
the same.
[0057]
First, in order to obtain the power correction coefficient, the area sound power correction coefficient calculation unit 43 obtains the ratio of the amplitude spectra of the beamformer output signals for each frequency.
At this time, if beamforming was performed in the time domain by the directivity forming unit 41, the area sound power correction coefficient calculation unit 43 converts the output into the frequency domain.
[0058]
Next, the area sound power correction coefficient calculation unit 43 calculates the mode of the per-frequency amplitude spectrum ratios according to equation (1) and uses that value as the area sound power correction coefficient. As an alternative, the area sound power correction coefficient calculation unit 43 may calculate the median of the per-frequency amplitude spectrum ratios according to equation (2) and use that as the area sound power correction coefficient.
[0059]
Here, X_ik(n) and X_jk(n) are the beamformer output data of the microphone arrays i and j selected by the microphone array selection unit 3, k is the frequency, N is the total number of frequency bins, and α_ij(n) is the power correction coefficient for the beamformer output data.
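The coefficient calculation of equations (1) and (2) can be sketched as taking the mode or median of the per-frequency amplitude-spectrum ratios. Since the ratios are real-valued, a mode requires quantisation; the bin width used here is an illustrative assumption not specified in the text.

```python
from statistics import median

def power_correction_coeff(spec_i, spec_j, use_median=False, bin_width=0.1):
    """Mode (eq. (1)) or median (eq. (2)) of per-frequency amplitude ratios."""
    ratios = [abs(xi) / abs(xj)
              for xi, xj in zip(spec_i, spec_j) if abs(xj) > 0.0]
    if use_median:
        return median(ratios)
    # mode: most frequent quantised ratio bin (quantisation is assumed)
    counts = {}
    for r in ratios:
        b = round(r / bin_width) * bin_width
        counts[b] = counts.get(b, 0) + 1
    return max(counts, key=counts.get)
```

The intuition: the area sound appears in both beamformer outputs with a roughly constant level ratio across frequency, while interfering sources perturb only some bins, so the mode (or median) is robust to those outliers.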
[0060]
The area sound extraction unit 44 corrects each beamformer output signal using the power
correction coefficient calculated by the area sound power correction coefficient calculation unit
43.
Then, spectral subtraction is performed between the corrected beamformer outputs to extract the noise present in the direction of the sound collection area. Furthermore, the area sound extraction unit 44 extracts the area sound of the target area by subtracting the extracted noise from each beamformer output.
[0061]
In order to extract the noise N_ij(n) present in the sound pickup area direction as viewed from the microphone array i, as shown in equation (3), the beamformer output X_j(n) of the microphone array j, multiplied by the power correction coefficient α_ij, is spectrally subtracted from the beamformer output X_i(n) of the microphone array i. Thereafter, the area sound is extracted by spectrally subtracting this noise from each beamformer output according to equation (4), where γ_ij(n) is a coefficient that adjusts the strength of the spectral subtraction.
[0062]
In equation (3), the area sound extraction unit 44 extracts the noise component N_ij(n) present in the sound collection area direction as viewed from the microphone array i: it subtracts from the beamformer output data X_i(n) of the microphone array i the spectrum obtained by multiplying the beamformer output data X_j(n) of the microphone array j by the power correction coefficient α_ij(n). That is, power correction is applied between the beamformer output X_i(n) of the microphone array i selected to pick up the target area and the beamformer output X_j(n) of the microphone array j, and the noise component is determined as the difference between X_i(n) and the corrected X_j(n).
[0063]
In equation (4), the area sound extraction unit 44 extracts the area sound using the noise component N_ij(n) thus obtained. The area sound extraction unit 44 multiplies the determined noise component N_ij(n) by the coefficient γ_ij(n), which adjusts the strength of the spectral subtraction, and subtracts the result from the beamformer output data X_i(n) of the microphone array i. That is, the area sound of the target area is obtained by subtracting the noise component obtained by equation (3) from the beamformer output X_i(n) of the microphone array i. Although equation (4) obtains the area sound as viewed from the microphone array i, the area sound as viewed from the microphone array j may be obtained instead.
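Equations (3) and (4) can be sketched per frequency bin as follows. Flooring negative magnitudes at zero is a common spectral-subtraction convention assumed here; the text does not state how negative values are handled.

```python
def extract_area_sound(x_i, x_j, alpha, gamma=1.0):
    """Extract the area sound as viewed from microphone array i.

    x_i, x_j: amplitude spectra (lists of magnitudes per frequency bin);
    alpha: power correction coefficient; gamma: subtraction strength.
    """
    # eq. (3): noise in the area direction as seen from array i
    noise = [max(xi - alpha * xj, 0.0) for xi, xj in zip(x_i, x_j)]
    # eq. (4): subtract the scaled noise from array i's output
    area = [max(xi - gamma * n, 0.0) for xi, n in zip(x_i, noise)]
    return area, noise

area, noise = extract_area_sound([3.0, 1.0], [1.0, 1.0], alpha=1.0)
```

In the bins where both arrays see the same (area) energy the noise estimate vanishes, so only the area sound common to both beams survives the second subtraction.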
[0064]
The position / direction information acquisition unit 5 refers to the space coordinate data holding unit 2 to acquire the position and direction of the target area desired by the user. For example, the position / direction information acquisition unit 5 refers to the space coordinate data holding unit 2 and, from the position of the camera whose image the user is currently viewing, the position on which the camera is focused, and the like, acquires the position and direction of the target area that the user wants to view. The position and direction in this case may also be specified by the user through, for example, the GUI of the remote system.
[0065]
The area sound selection unit 6 uses the position information and direction information of the
target area acquired by the position / direction information acquisition unit 5 to select an area
sound to be used for reproduction according to the sound reproduction environment.
[0066]
First, the area sound selection unit 6 sets, for example, the area sound of the area closest to the
user's viewing position as the central sound source.
For example, assuming that "area E" in FIG. 3A is the viewing position, the area sound of "area E"
is the central sound source.
[0067]
In accordance with the direction in which the camera is shooting (for example, the direction from area B toward area E in the example of FIG. 3), the area sound selection unit 6 selects, from among the area sounds of the areas surrounding the central sound source area, the area sound of "area H" as the "front sound source", the area sound of "area B" as the "rear sound source", the area sound of "area F" as the "left sound source", and the area sound of "area D" as the "right sound source". Furthermore, according to the direction information related to the area pickup, the area sound selection unit 6 may set the area sound of "area I" as the "diagonally front-left sound source", the area sound of "area G" as the "diagonally front-right sound source", the area sound of "area C" as the "diagonally rear-left sound source", and the area sound of "area A" as the "diagonally rear-right sound source".
[0068]
Next, the area sound selection unit 6 selects the area sounds to be used for reproduction according to the sound reproduction environment on the user side. In other words, the area sounds used for playback are selected depending on the acoustic environment, such as whether the user reproduces stereophonic sound with headphones or earphones or with stereo speakers, and, in the latter case, how many loudspeakers are used. Here, information on the sound reproduction environment on the user side is set in advance, and the area sound selection unit 6 selects the area sounds according to the set sound reproduction environment. Furthermore, when the information on the sound reproduction environment is changed, the area sound selection unit 6 may select the area sounds based on the changed information.
[0069]
The area volume control unit 7 adjusts the volume of each area sound in accordance with the
distance from the viewing position (the position of the target area). The volume is decreased as
the area is farther from the viewing position. Alternatively, the central area sound may be
maximized and the surrounding area sound may be reduced.
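The volume adjustment can be sketched as a distance-dependent gain per area. The inverse-distance law, the minimum gain, and the grid coordinates below are illustrative assumptions; the embodiment only requires that volume decrease with distance from the viewing position.

```python
import math

def area_gain(area_pos, viewing_pos, min_gain=0.1):
    """Gain for one area sound: loudest at the viewing position, fading with distance."""
    d = math.hypot(area_pos[0] - viewing_pos[0], area_pos[1] - viewing_pos[1])
    return max(1.0 / (1.0 + d), min_gain)

# assumed 3x3 grid with area E at the centre (the viewing position)
viewing = (1.0, 1.0)
gains = {"E": area_gain((1.0, 1.0), viewing),   # central sound source
         "H": area_gain((1.0, 2.0), viewing),   # front area, one cell away
         "B": area_gain((1.0, 0.0), viewing)}   # rear area, one cell away
```

The floor `min_gain` keeps distant areas faintly audible instead of silent, one way to realise the alternative of "maximising the central area sound and reducing the surrounding area sounds".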
[0070]
The three-dimensional sound processing unit 8 acquires the transfer function data held in the transfer function data holding unit 10 according to the user's sound reproduction environment, performs three-dimensional acoustic processing of the area sounds using the transfer function data, and outputs the result.
[0071]
Then, the sound source speaker output unit 9 outputs the sound source data subjected to the
three-dimensional sound processing by the three-dimensional sound processing unit 8 to the
corresponding speaker arrays SA1 to SAn.
[0072]
In the following, the reproduction processing in which the sound collection and reproduction system 100 according to the embodiment selects the area sounds of the remote place and performs stereophonic sound processing will be described.
[0073]
FIG. 3A is a top view of the remote space divided into nine areas.
It is assumed that a plurality of cameras for shooting areas A to I and a plurality of microphone arrays MA1 to MAm are arranged in the remote space so that the sounds of areas A to I can be picked up.
[0074]
For example, when the user selects area E as the viewing position among the plurality of areas in FIG. 3A and the camera shoots area E in the direction from area B toward area E, the area sound selection unit 6 sets the sound present in area E, the viewing position (area sound E), as the central sound source, area sound H as the "front sound source", area sound B as the "rear sound source", area sound D as the "right sound source", and area sound F as the "left sound source".
[0075]
Thereafter, the three-dimensional sound processing unit 8 selects an area sound to be used for
reproduction according to the user's sound reproduction environment, performs threedimensional sound processing on the selected area sound, and outputs it.
[0076]
For example, when the user's sound reproduction environment is a 2-channel reproduction system, the area sound selection unit 6 selects area sound E as the central sound source, area sound D as the right sound source, area sound F as the left sound source, and area sound H as the front sound source.
In addition, control is performed so that the volume of an area sound gradually decreases with its distance from the center of area E, the viewing position.
In this case, for example, the volume of area sound H, which is located farther away than area E, is adjusted to be weak.
In addition, the sound collection and reproduction system creates a binaural sound source by convolving each sound source selected for reproduction with the head-related transfer function (HRTF) corresponding to its direction.
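The binaural rendering step convolves each area sound with a left/right HRTF pair for its direction. The two-tap impulse responses below stand in for measured HRTFs and are purely illustrative; real HRTFs are long filters measured per direction.

```python
def convolve(signal, ir):
    """Direct-form linear convolution of a signal with an impulse response."""
    out = [0.0] * (len(signal) + len(ir) - 1)
    for n, s in enumerate(signal):
        for k, h in enumerate(ir):
            out[n + k] += s * h
    return out

def binauralize(area_sound, hrtf_left, hrtf_right):
    """Render one area sound to a binaural (left, right) pair."""
    return convolve(area_sound, hrtf_left), convolve(area_sound, hrtf_right)

# a source to the user's right: earlier and stronger at the right ear
left_ear, right_ear = binauralize([1.0, 0.0], [0.0, 0.4], [0.8, 0.0])
```

The interaural delay and level difference encoded in the HRTF pair are what let two channels convey direction, which is why headphone playback of this signal needs no further processing while loudspeaker playback needs the crosstalk canceller described next.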
[0077]
More specifically, when the user's sound reproduction environment is a reproduction system such as headphones or earphones, the binaural sound source created by the sound collection and reproduction system is output as it is.
However, in the case of a reproduction system such as the stereo speakers 51 and 52 shown in FIG. 3B, reproducing the binaural sound source as it is degrades the performance of the three-dimensional sound. For example, when the speaker 51 in FIG. 3B (the speaker located on the right as viewed from the user) reproduces the binaural sound source for the right ear, the crosstalk by which that right-ear signal is also heard by the user's left ear degrades the three-dimensional sound performance. Therefore, the sound collection and reproduction system 100 according to this embodiment measures the room transfer functions between the user and each of the speakers 51 and 52 in advance, and designs a crosstalk canceller based on those transfer function values. By applying the crosstalk canceller to the binaural sound source, converting it into a transaural sound source, and reproducing that, the same three-dimensional sound effect as binaural reproduction can be obtained.
[0078]
Also, for example, in the case of a reproduction system with three or more channels (for example, when three or more loudspeakers are used), three-dimensional acoustic processing is applied to the area sounds used for reproduction so as to match the arrangement of the speakers. Furthermore, for example, when the sound reproduction environment is a 4-channel reproduction system (for example, four speakers in total, one each at the front, rear, left, and right of the user), area sound E is reproduced simultaneously from all the speakers, and the front, rear, left, and right area sounds H, B, D, and F are reproduced from the speakers corresponding to the respective directions. Furthermore, area sounds I and G, which lie diagonally in front of area E, and area sounds C and A, which lie diagonally behind it, may be converted into transaural sound sources and reproduced. Thereby, for example, area sound I is reproduced from the speakers located in front of and to the left of the user, so that area sound I is heard from between the front speaker and the left speaker.
[0079]
As described above, since the sound collection and reproduction system 100 according to this embodiment collects sound area by area, the total number of sound sources present in the remote space does not matter. Further, since the positional relationship of the sound collection areas is determined in advance, the directions of the areas can easily be changed according to the user's viewing position. Furthermore, the area sound collection method described in Reference 1, proposed by the present inventors, requires little computation, so the system can operate in real time even with the stereophonic sound processing added.
[0080]
(A-4) Effects of the Embodiment As described above, according to the embodiment, the remote space is divided into a plurality of areas, sound is collected for each area, and each area sound is selected according to the position designated by the user and subjected to three-dimensional sound processing; by reproducing the sound and operating these processes in real time, it is possible to experience the current situation of various remote places with full presence.
[0081]
(B) Other Embodiments Although various modified embodiments are mentioned in the above-described embodiment, the present invention is also applicable to the following modified embodiments.
[0082]
In the embodiment described above, the present invention was illustrated as a remote system in which a plurality of cameras and a plurality of microphone arrays are arranged in a remote space and stereophonic sound is played back in cooperation with the camera images. The present invention can also be applied to a system that reproduces the stereophonic sound of a remote place without cooperation with video.
[0083]
In the embodiment described above, each microphone array used for picking up an area has its microphones arranged at the vertices of a right isosceles triangle, but the microphones may instead be arranged at the vertices of an equilateral triangle.
In that case as well, area pickup can be performed using the method described in Reference 1.
[0084]
The sound collection and reproduction system according to the above-described embodiment may be divided into a sound collection system (sound collection device) provided on the remote side and a reproduction system (reproduction device) provided on the user side, with the sound collection system and the reproduction system connected by a communication line.
In that case, the sound collection system can include the microphone arrays MA1 to MAm, the data input unit 1, the space coordinate data holding unit 2, the microphone array selection unit 3, and the area sound collection unit 4 illustrated in FIG. 1.
The reproduction system may include the position / direction information acquisition unit 5, the area sound selection unit 6, the area volume control unit 7, the three-dimensional sound processing unit 8, and the transfer function data holding unit 10 illustrated in FIG. 1.
[0085]
100: sound collection and reproduction apparatus (sound collection and reproduction system), 1: data input unit, 2: space coordinate data holding unit, 3: microphone array selection unit, 4: area sound collection unit, 5: position / direction information acquisition unit, 6: area sound selection unit, 7: area volume control unit, 8: three-dimensional sound processing unit, 9: speaker output unit, 10: transfer function data holding unit, MA1 to MAm: microphone arrays, SA1 to SAn: speaker arrays, 41: directivity forming unit, 42: delay correction unit, 43: area sound power correction coefficient calculation unit, 44: area sound extraction unit.