DESCRIPTION JP2012242597

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
The present invention supports work that optimizes the arrangement of microphones and speakers
according to the environment, even where CAD data for the space of the acoustic system is not
available. SOLUTION: The acoustic characteristic at the installation position of the microphone
is estimated based on acoustic data actually recorded in the space in which the acoustic system
is constructed, position information based on actual measurement of the acoustic equipment used
when the acoustic data was recorded, position information of the sound source, microphone and
speaker assumed after system construction, and the like, and the estimation result is presented
to the user. [Selected figure] Figure 3
Acoustic simulator, acoustic consulting device and processing method thereof
[0001]
The present invention relates to a simulation technique for supporting the appropriate
installation of audio equipment according to the use environment, and a consulting technique
using the simulation result.
[0002]
In recent years, video conference systems have come into widespread use.
A video conference system makes two-way communication by voice and video possible even between
remote places.
15-04-2019
1
[0003]
In the system, voice is recorded through a microphone, transmitted to the far side, and
reproduced through a speaker. However, various noises may be mixed into the sound recorded
by the microphone, and the mixing of the noises causes the quality deterioration of the sound
reproduced from the speaker.
[0004]
In general, ambient noise and acoustic echo, including wind noise generated by air conditioners
and projectors, are likely to be a problem. Acoustic echo is generated when sound reproduced
from a speaker installed in the same room as the microphone mixes into the microphone.
[0005]
However, if the sound recorded by the microphone is transmitted to the far end without being
processed (that is, with the acoustic echo included), the talker at the far end hears his or her
own voice come back with a delay. In this case, the ease of conversation is greatly impaired. For
this reason, acoustic signal processing systems having a function of removing ambient noise and
acoustic echo have been proposed (see, for example, Non-Patent Document 1).
[0006]
In addition to ambient noise and acoustic echo, the reverberant component of the talker's voice
is also a kind of noise, so suppression of this reverberation component is also required.
However, the reverberation component is highly correlated with the talker's voice, so removing
it risks suppressing the voice itself by mistake. Removal of early reflections is particularly
difficult. Recently, however, a technique has been developed for removing the late reverberation
component, which reaches the microphone about several hundred milliseconds after the sound that
travels directly from the talker's mouth to the microphone (see, for example, Non-Patent
Document 2).
[0007]
Solving these problems requires tuning and optimization before the video conference system is
used. Acoustic characteristics recorded at the actual site are often used for this work. For
example, the acoustic environment is regarded as a kind of system, and the relationship between
its input sound (sound from the talker's mouth or sound emitted from the loudspeaker) and its
output sound (sound picked up by the microphone) is measured as an impulse response and used for
tuning and optimization. A TSP (Time Stretched Pulse) signal, for example, is used to measure
the impulse response.
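If the acoustic environment is treated as a linear time-invariant system (a common
simplification that the text does not state explicitly), the relationship between the input
sound and the output sound can be written as a convolution with the impulse response. The
symbols x (input sound), y (microphone output), h (impulse response) and L (its length in
samples) are illustrative names, not taken from the specification.

```latex
y(t) = \sum_{\tau = 0}^{L-1} h(\tau)\, x(t - \tau)
```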
[0008]
The tuning and optimization techniques proposed so far mainly change the parameters used in
acoustic signal processing; the recorded impulse response is used to simulate the sound in the
corresponding environment and to evaluate it.
[0009]
Meanwhile, for acoustic design at the time of designing or constructing a concert hall or other
building, systems have been developed that simulate the acoustic environment from CAD data of
the target building and let the user know how sound will be heard (see, for example, Patent
Document 1).
[0010]
Patent No. 2846162 specification
[0011]
Togami, M. et al., "A Teleconferencing System with a Tabletop Speech Protrusion Removal
Function Using a Vertically Arranged Microphone Array", Transactions of the Institute of
Electronics, Information and Communication Engineers D, Vol. J93-D, No. 10, pp. 2069-2084,
2010/10.
K. Kinoshita, M. Delcroix, T. Nakatani and M. Miyoshi, "Suppression of late reverberation effect
on speech signal using long-term multiple-step linear prediction", IEEE Transactions on Audio,
Speech, and Language Processing, 17(4), pp. 534-545, 2009.
[0012]
However, the above-described acoustic signal processing techniques are sensitive to the room
environment (acoustic conditions). Specifically, they are easily influenced by the
reverberation time, the type and position of noise sources, the positional relationship between
the speaker and the microphone, the reproduction volume of the speaker, and the like.
[0013]
Therefore, to tune and optimize a video conference system or the like, it is necessary to
optimize not only the parameters used in acoustic signal processing but also the positional
relationship between microphones and speakers and the number of microphones according to the
usage environment.
[0014]
Moreover, existing acoustic simulators require CAD data of the site to be acquired in advance;
when no CAD data exists, the acoustic characteristics cannot be simulated in the first place.
[0015]
Having studied these technical problems in earnest, the inventor of the present invention has
invented a simulation technique capable of optimizing the positional relationship between the
microphone and the speaker according to the usage environment, even in an environment where CAD
data does not exist or cannot be used, and an acoustic consulting technique that uses it.
[0016]
The sound simulator according to the present invention estimates the acoustic characteristic at
the installation position of the microphone based on acoustic data actually recorded in the
space where the acoustic system is constructed, position information based on actual measurement
of the microphone and speaker used when the acoustic data was recorded, position information of
the sound source, the microphone and the speaker assumed after system construction, and the
like.
[0017]
Further, the sound consulting apparatus according to the present invention evaluates whether
the sound characteristics estimated by the sound simulator satisfy predetermined performance,
and presents the evaluation result to the user.
[0018]
ADVANTAGE OF THE INVENTION According to this invention, even when CAD data for the space in
which an acoustic system is constructed does not exist or cannot be used, construction of an
acoustic environment suited to the space can be assisted.
Problems, configurations, and effects other than those described above will be apparent from the
description of the embodiments below.
[0019]
A diagram explaining the installation environment of the video conference system according to
the embodiment.
A diagram showing the hardware configuration of the sound consulting apparatus (sound
simulator) according to the embodiment.
A flowchart showing the processing procedure executed by the sound simulator according to the
embodiment.
A diagram showing an example of the microphone information table.
A diagram showing an example of the speaker information table.
A diagram showing an example of the table of virtual speaker positions.
A diagram showing a configuration example of the acoustic characteristic measurement unit.
A diagram showing a configuration example of the ambient noise measurement unit.
A diagram showing a configuration example of the acoustic simulation unit.
A diagram explaining the direct sound component and the reverberation component of an impulse
response.
A flowchart showing the processing procedure executed by the sound consulting apparatus
according to the embodiment.
A diagram showing an example of the desired performance table.
A diagram showing a configuration example of the performance evaluation unit.
A flowchart explaining the processing procedure executed when optimizing the number and
arrangement of microphones and the arrangement of speakers.
[0020]
Hereinafter, embodiments of the present invention will be described based on the drawings. The
present invention is not limited to the embodiments described below, and various modifications
are possible within the scope of its technical idea.
[0021]
Hereinafter, the structure of a sound consulting apparatus suitable for use in constructing a
video conference system in a space where CAD data does not exist will be described. FIG. 1 shows
an example of an installation environment of a video conference system to which the audio
consulting apparatus is applied. The “television conference system” in the present
specification is also used to include a video conference system and a web conference system.
[0022]
(Components and Arrangement of Video Conference System) In the present embodiment, the
video conference system assumes a general-purpose video conference system provided with a
microphone and a speaker. However, it may be a video conference system optimized for a
specific application.
[0023]
The teleconference system installation environment 101 is not particularly limited as long as it is
a space (environment) for constructing a teleconference system. Here, a conference room is
assumed. In the case of this embodiment, it is assumed that the conference microphone 104 is
installed on the desk 102 disposed in the conference room. In addition, it is assumed that the
conference speakers 103 are disposed in the same conference room. The conference speaker
103 and the conference microphone 104 may always be connected to a general-purpose video
conference system.
[0024]
The video conference system in FIG. 1 assumes two talkers (conference participants), whose
assumed positions are shown as assumed speaker positions 105-1 and 105-2. The system may,
however, be used with one talker or with three or more. Likewise, the number of conference
speakers 103 and conference microphones 104 in a system is not limited to one each and may be
more than one.
[0025]
In FIG. 1, in addition to the conference speaker 103, the conference microphone 104 and the
assumed speaker positions 105-1 and 105-2, which give the acoustic conditions when the video
conference system is used, four measurement microphone arrays 106 and two measurement speaker
arrays 107 used when measuring the acoustic characteristics are illustrated.
[0026]
In this embodiment, the measurement microphone array 106 and the measurement speaker
array 107 are arranged by the measurement user when measuring the acoustic characteristics of
the teleconference system installation environment 101.
In the case of FIG. 1, the measurement microphone arrays 106 are disposed at the four corners
of the desk 102. In addition, the measurement speaker array 107 is disposed behind the
assumed speaker positions 105-1 and 105-2.
[0027]
Here, the measurement speaker array 107 is used to emit a reference sound when measuring
acoustic characteristics. In the case of this embodiment, the measurement speaker array 107 is
an assembly of a plurality of speakers, but may be configured by one speaker.
[0028]
Although FIG. 1 shows a plurality of measurement microphone arrays 106 and a plurality of
measurement speaker arrays 107, only one of each may be arranged. The measurement microphone
array 106 and the measurement speaker array 107 are removed from the conference room after the
acoustic consulting device has measured the acoustic characteristics and output the optimum
conditions. However, some of them may also be used as the conference speaker 103 or the
conference microphone 104.
[0029]
(Hardware Configuration of Sound Consulting Device) FIG. 2 shows the hardware configuration of
the sound consulting device according to the embodiment. The acoustic simulation apparatus is
realized as part of the function of the acoustic consulting apparatus, so the two share the same
hardware configuration. The following describes it as the hardware configuration of the acoustic
consulting apparatus.
[0030]
The sound consulting apparatus according to the embodiment is configured by connecting the
measurement microphone array 106 and the measurement speaker array 107 to a computer.
[0031]
An acoustic signal acquired by the measurement microphone array 106 is converted from an
analog signal to a digital signal by a multi-channel AD (Analog to Digital) converter 202.
The converted digital signal is provided to central processing unit 203.
[0032]
The central processing unit 203 executes various programs. In this embodiment, it executes,
among others, a process of simulating the acoustic characteristics obtained when the conference
microphone 104 and the conference speaker 103 are arranged at arbitrary positions in the
conference room, and a process of evaluating the result. In the present specification, the
program that realizes these processing functions is referred to as the “sound signal processing
program”. The sound signal processing program is stored in the non-volatile memory 204 and read
out by the central processing unit 203 as needed. Work memory for executing the program is
secured in the volatile memory 205.
[0033]
As described above, the acoustic consulting device according to the embodiment measures the
acoustic characteristics of the television conference system installation environment 101 with
the measurement microphone array 106 and the measurement speaker array 107 and uses them
as basic data for acoustic simulation. The acoustic characteristics here are impulse response
characteristics and ambient noise.
[0034]
When measuring acoustic characteristics, the central processing unit 203 sends a reference
signal for impulse response measurement, as a digital signal, to the multi-channel DA (Digital
to Analog) converter 206. The multi-channel DA converter 206 converts the reference signal into
an analog signal and outputs it to the measurement speaker array 107, which emits the
corresponding sound into the video conference system installation environment 101.
[0035]
The central processing unit 203 is provided with a mouse 208 and a keyboard 209 as an
interface for the user. The user uses these interfaces to input information to the central
processing unit 203. Further, the result of the acoustic simulation is displayed on the display
210, and the user can visually confirm the simulation result and the evaluation result.
[0036]
(Process as Sound Simulator) First, the basic function (sound simulation function) of the sound
consulting device according to the present embodiment will be described. Here, the acoustic
simulation function is a function of simulating acoustic characteristics recorded when the
conference speaker 103 and the conference microphone 104 are virtually set at any position in
the teleconference system installation environment 101.
[0037]
However, in the present embodiment, it is assumed that CAD data of the teleconference system
installation environment 101 does not exist. For this reason, the acoustic consulting device
operating as an acoustic simulator measures the acoustic characteristics of the television
conference system installation environment 101 using the measurement microphone array 106 and
the measurement speaker array 107, and uses the measurement results to calculate the acoustic
characteristics at the virtual position. In the following, the entity providing the acoustic
simulation function is referred to as the acoustic simulator.
[0038]
FIG. 3 outlines the processing procedure of the sound simulator. In processes 301 to 303, the
sound simulator registers the position information of the conference microphone 104 and the
conference speaker 103 installed in the teleconference system installation environment 101, and
registers the assumed speaker positions 105-1 and 105-2. The execution order of processes 301 to
303 shown here is only an example; they may be performed in any order.
[0039]
In process 301, the sound simulator executes a process of registering information on the
conference microphone 104. The information here is input through the mouse 208, the keyboard
209 and other input devices so that the central processing unit 203 can process it. Because CAD
data does not exist, the registration (setting) operation is performed manually. The same applies
to registration of position information of other audio devices.
[0040]
The information on the conference microphone 104 includes the installation place in the
teleconference system installation environment 101, the directivity characteristic of the
microphone, the orientation of the microphone, and the like.
[0041]
FIG. 4 shows an example of the registered information for the conference microphone 104.
As described later, the registration items shown in FIG. 4 are the same as those for the
measurement microphones. However, the conference information and the measurement information are
managed in separate tables.
[0042]
Each row in FIG. 4 holds the information for one microphone; FIG. 4 assumes three microphones.
Each row is assigned a microphone ID that uniquely identifies the microphone and stores the
three-dimensional position (x, y, z) and the direction of the microphone. Any unit system may be
used as long as the units are consistent. The coordinates are absolute, and a single coordinate
system is used throughout each teleconference system installation environment 101.
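As a minimal sketch, a table with the layout described for FIG. 4 could be held as a list of
records. The class name, field names, the use of a direction vector, the string label for the
directivity and all numeric values are illustrative assumptions, not taken from the
specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MicrophoneEntry:
    """One row of a microphone information table (hypothetical layout)."""
    mic_id: int          # uniquely identifies the microphone
    position: tuple      # (x, y, z) in the room's absolute coordinate system
    direction: tuple     # orientation as a direction vector (assumed representation)
    directivity: str     # label for the directivity characteristic (assumed)


# Three microphones, as in the example of FIG. 4; the values are made up.
mic_table = [
    MicrophoneEntry(1, (1.0, 2.0, 0.7), (1.0, 0.0, 0.0), "cardioid"),
    MicrophoneEntry(2, (1.5, 2.0, 0.7), (0.0, 1.0, 0.0), "cardioid"),
    MicrophoneEntry(3, (2.0, 2.0, 0.7), (-1.0, 0.0, 0.0), "omnidirectional"),
]


def find_microphone(table, mic_id):
    """Return the row with the given ID, or None if it is not registered."""
    for entry in table:
        if entry.mic_id == mic_id:
            return entry
    return None
```

Keeping the conference table and the measurement table as two separate lists of the same record
type would mirror the separation described in the text.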
[0043]
When the conference microphone 104 has not yet been installed, the coordinates of the planned
installation position may be input. For example, an actually measured value is input as the
coordinates specifying the location of the conference microphone 104. If information on the
installation position is available for reference, that information may be entered manually. In
this embodiment, the conference microphone 104 is already installed and is used as the
reference point for actually measuring the installation positions of the measurement microphone
array 106 and the measurement speaker array 107.
[0044]
In addition, information on the directional characteristics of the microphone is given to each row.
The sound pressure level for each azimuth angle with respect to the front is uniquely determined
from the information of directivity characteristics. The directivity characteristics can generally be
known from a catalog of microphones or the like.
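To illustrate how a directivity characteristic determines the level for each azimuth angle, the
sketch below uses the textbook first-order pattern gain = alpha + (1 - alpha) * cos(theta). This
model, the parameter alpha and the -120 dB floor are assumptions chosen for illustration; the
specification only says that the characteristic can be taken from a catalog.

```python
import math


def relative_level_db(azimuth_deg, alpha=0.5, floor_db=-120.0):
    """Level relative to the front (0 degrees) for a first-order pattern.

    alpha = 1.0 gives an omnidirectional pattern, alpha = 0.5 a cardioid.
    The result is clamped to floor_db so that a perfect null stays finite.
    """
    theta = math.radians(azimuth_deg)
    gain = abs(alpha + (1.0 - alpha) * math.cos(theta))
    if gain <= 0.0:
        return floor_db
    return max(floor_db, 20.0 * math.log10(gain))
```

With alpha = 0.5 the level at 90 degrees is about 6 dB below the front, and the rear direction
falls to the floor value.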
[0045]
In process 302, the sound simulator executes a process of registering information related to the
conference speaker 103. The information here is also input through the mouse 208, the
keyboard 209, and other input devices so that the central processing unit 203 can perform
processing.
[0046]
The information on the conference speaker 103 includes the installation location in the
teleconference system installation environment 101, the radiation characteristic of the speaker,
the orientation of the speaker, and the like.
[0047]
FIG. 5 shows an example of registration of information of the conference speaker 103.
As described later, the registration items shown in FIG. 5 can also be used to register
information on the measurement speaker array. However, the conference information and the
measurement information are managed in separate tables.
[0048]
Each row in FIG. 5 corresponds to the information of each speaker. In the case of FIG. 5, the use
of three speakers is assumed. Each row is assigned a speaker ID that uniquely identifies the
speaker. In each row, the three-dimensional position (x, y, z) and the orientation of the speaker
are stored. The unit system is arbitrary, but uses the same coordinate system as the microphone.
[0049]
When the conference speaker 103 has not yet been installed, the coordinates of the planned
installation position may be input. For example, an actually measured value is input as the
coordinates specifying the location of the conference speaker 103. If information on the
installation position is available for reference, that information may be entered manually.
When the conference speaker 103 is already installed, its position may be used as a reference
point when the installation positions of the measurement microphone array 106 and the
measurement speaker array 107 are actually measured.
[0050]
Besides, information on the radiation characteristic of the speaker is given to each row. The
sound pressure level for each azimuth angle with respect to the front is uniquely determined
from the information of the radiation characteristic. The radiation characteristics can generally
be known from a catalog of speakers or the like.
[0051]
In processing 303, the acoustic simulator executes registration processing of information related
to the assumed speaker position. The assumed speaker position is information for specifying a
range assumed as the seating position of the participant of the video conference. The information
here is also input through the mouse 208, the keyboard 209, and other input devices so that the
central processing unit 203 can perform processing.
[0052]
FIG. 6 shows an example of registered assumed speaker positions. Each row in FIG. 6 holds the
information for one assumed speaker position. If there are several assumed speaker positions,
several rows are set; FIG. 6 shows the case of three. Each row is given an assumed speaker
position ID that uniquely identifies the assumed speaker position, and stores a
three-dimensional position (x, y, z) giving the center of the assumed speaker position and a
radius R giving the range around that center. The coordinates are absolute and use the same
coordinate system as the microphone.
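A minimal sketch of one row of such a table follows. It assumes that the range given by the
radius R is a sphere around the center; the class name, field names and the `contains` helper
are hypothetical.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class AssumedSpeakerPosition:
    """One row of an assumed-speaker-position table (hypothetical layout)."""
    position_id: int   # uniquely identifies the assumed speaker position
    center: tuple      # (x, y, z) center in absolute coordinates
    radius: float      # radius R of the range around the center

    def contains(self, point):
        """True if the point lies within distance R of the center (sphere assumed)."""
        return math.dist(self.center, point) <= self.radius
```

For example, `AssumedSpeakerPosition(1, (1.0, 2.0, 1.2), 0.5).contains((1.3, 2.0, 1.2))`
evaluates to `True`, because the point is 0.3 units from the center.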
[0053]
Next, in the processes 304 to 305, the sound simulator sets the position information and the like
of the measuring device (sound device) for measuring the sound characteristics of the television
conference system installation environment 101. The execution order of the process 304 and the
process 305 is an example, and either may be executed first.
[0054]
In process 304, the acoustic simulator executes a process of registering information regarding
the measurement microphone 106. The information here is input through the mouse 208, the
keyboard 209 and other input devices so that the central processing unit 203 can process it.
[0055]
The information on the measurement microphone 106 includes its installation location in the
teleconference system installation environment 101, the directivity characteristic of the
microphone, the orientation of the microphone, and the like. As described above, this
information is recorded in a table separate from that of the conference microphone 104. The
coordinates are absolute and use the same coordinate system as the conference microphone 104.
The coordinate values of the measurement microphone 106 may instead be input as a position
relative to a reference point (for example, the conference microphone 104) set in the
teleconference system installation environment 101. When this input method is adopted, the
central processing unit 203 converts the values into absolute coordinates.
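The conversion from a relative position to absolute coordinates can be sketched as a plain
translation. This assumes that the relative offsets are measured along axes parallel to those of
the absolute coordinate system, which the specification does not state; the function name is
hypothetical.

```python
def to_absolute(reference_abs, relative):
    """Add a relative (dx, dy, dz) offset to the reference point's absolute (x, y, z).

    Assumes both coordinate systems share the same axis directions and units.
    """
    return tuple(r + d for r, d in zip(reference_abs, relative))
```

With the conference microphone at (1.0, 2.0, 0.7) as the reference point, a measurement
microphone entered as (0.5, -0.5, 0.0) is registered at (1.5, 1.5, 0.7). If the axes were
rotated against each other, a rotation would have to be applied before the translation.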
[0056]
In process 305, the acoustic simulator executes a process of registering information related to
the measurement speaker array 107. The information here is input through the mouse 208, the
keyboard 209 and other input devices so that the central processing unit 203 can process it.
[0057]
The information of the measurement speaker array 107 includes the installation place in the
television conference system installation environment 101, the radiation characteristic of the
speaker, the orientation of the speaker, and the like. As described above, the information on the
measurement speaker array 107 is recorded in a table different from the information on the
conference speaker 103. Note that the coordinate system is an absolute coordinate and uses the
same coordinate system as the microphone. Again, the coordinate values of the measurement
speaker array 107 may be input as relative position information with respect to a reference point
(for example, the conference microphone 104) set in the teleconference system installation
environment 101. When this input method is adopted, the central processing unit 203 executes
processing of converting into absolute coordinates.
[0058]
In process 306, the acoustic simulator measures the acoustic characteristics specific to the
video conference system installation environment 101 using the measurement microphone array 106
and the measurement speaker array 107 installed there. Specifically, the acoustic simulator
measures the transfer characteristics (impulse response) between the measurement speaker array
107 and the measurement microphone array 106, and measures the ambient noise.
[0059]
FIG. 7 shows a functional block configuration of a program for realizing the processing function
corresponding to the process 306. In the following description, the program is called an acoustic
characteristic measurement unit 701. The acoustic characteristic measurement unit 701 includes
an impulse response measurement unit 702 and an ambient noise measurement unit 703.
[0060]
The impulse response measurement unit 702 measures an impulse response in the video conference
system installation environment 101 using, for example, the TSP method (see, for example, Patent
Document 1). Alternatively, the impulse response may be measured by emitting a sound containing
all frequency components, such as white noise, from the measurement speaker array 107, recording
it with the measurement microphone array 106, and examining the correlation coefficient between
the signal recorded by the microphone and the original signal of the emitted sound.
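The correlation-based measurement can be sketched as follows. The sketch relies on the fact that
white noise has an autocorrelation close to a single peak, so the cross-correlation between the
recorded and emitted signals, divided by the energy of the emitted signal, approximates the
impulse response. The function name and the normalization are assumptions, and the result is an
estimate whose error shrinks as the noise signal gets longer.

```python
import numpy as np


def estimate_ir_by_correlation(emitted, recorded, ir_length):
    """Estimate an impulse response from a white-noise excitation.

    emitted   : noise signal sent to the loudspeaker
    recorded  : signal picked up by the microphone
    ir_length : number of impulse response samples to return
    """
    emitted = np.asarray(emitted, dtype=float)
    recorded = np.asarray(recorded, dtype=float)
    n = len(emitted) + len(recorded)        # zero-pad to avoid circular wrap-around
    E = np.fft.rfft(emitted, n)
    R = np.fft.rfft(recorded, n)
    # Cross-correlation for non-negative lags: sum_t recorded[t + lag] * emitted[t]
    cross = np.fft.irfft(R * np.conj(E), n)
    return cross[:ir_length] / np.dot(emitted, emitted)
```

In a simulated check, convolving a long noise sequence with a short known response and passing
both signals to this function returns values close to the known response.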
[0061]
As shown in FIG. 7, when measuring the impulse response, a multi-channel AD converter 202
connected to the measurement microphone array 106 and a multi-channel DA converter 206
connected to the measurement speaker array 107 are used.
[0062]
The multi-channel DA converter 206 receives the white noise signal or TSP signal (acoustic
signal S3) used for impulse response measurement from the impulse response measurement unit 702,
and converts the acoustic signal S3 from a digital signal to an analog signal.
The multi-channel AD converter 202 is controlled in synchronization with the multi-channel DA
converter 206, and converts the acoustic signal recorded during impulse response measurement
from an analog signal to a digital signal (acoustic signals S1 and S2). The converted digital
signal is supplied to the impulse response measurement unit 702 and the ambient noise
measurement unit 703.
[0063]
The impulse response measurement unit 702 applies correlation coefficient estimation
processing or TSP (Time Stretched Pulse) inverse conversion processing to the given signal to
obtain an impulse response S4. Since these processes themselves are known, detailed description
is omitted.
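For reference, one widely used formulation of the TSP method defines the signal in the frequency
domain as H(k) = exp(j 4 m pi k^2 / N^2) and recovers the impulse response by multiplying the
recorded spectrum by the complex conjugate of H(k). The sketch below follows that formulation;
it is not taken from the specification. It assumes that N is even, that the stretch parameter m
is an integer, and that the recording covers exactly one steady-state period of a repeated TSP,
so that the room acts as a circular convolution. All names are hypothetical.

```python
import numpy as np


def tsp_spectrum(n, m):
    """Half spectrum (bins 0..n/2) of a TSP of length n with stretch parameter m."""
    k = np.arange(n // 2 + 1)
    return np.exp(1j * 4.0 * m * np.pi * k ** 2 / n ** 2)


def make_tsp(n, m):
    """Time-domain TSP signal to be sent to the loudspeaker."""
    return np.fft.irfft(tsp_spectrum(n, m), n)


def tsp_to_impulse_response(recorded_period, m):
    """Apply the inverse TSP filter to one recorded period of length n."""
    n = len(recorded_period)
    Y = np.fft.rfft(recorded_period)
    # The TSP has unit magnitude in every bin, so its inverse is its conjugate.
    return np.fft.irfft(Y * np.conj(tsp_spectrum(n, m)), n)
```

Because every frequency bin of the TSP has unit magnitude, the inverse filtering does not
amplify noise in any particular band, which is one reason the method is popular for room
measurements.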
[0064]
The ambient noise measurement unit 703, on the other hand, measures the ambient noise in the
video conference system installation environment 101 from the given signal. When ambient noise
is recorded, the equipment in the teleconference system installation environment 101 is operated
so that the noise is as close as possible to that during an actual video conference. For
example, when an air conditioner or a projector is installed in the television conference system
installation environment 101, ambient noise is recorded with these devices running. Naturally,
no sound is output from the measurement speaker array 107 while ambient noise is recorded.
Likewise, care is taken not to record the voices of people speaking by mistake. However, when
sounds such as paper rustling or tapping on the desktop are assumed to be part of the ambient
noise, the recording may be arranged so that these sounds are generated during recording.
[0065]
FIG. 8 shows a detailed block configuration of the ambient noise measurement unit 703. The
ambient noise measurement unit 703 includes a sound source separation unit 802, which separates
the sound signal S2 collected by the measurement microphone array 106 into signals S11, S12,
..., S1N for each sound source, and sound source localization units 803-1, 803-2, ..., 803-N,
which estimate the volume and spatial position of each sound source from the signal for that
sound source. The sound source localization units 803-1, 803-2, ..., 803-N output information
S5 (S21, S22, ..., S2N) on the volume and position of each sound source.
[0066]
The sound source separation unit 802 uses independent component analysis, a minimum variance
beamformer, non-negative matrix factorization, or another common sound source separation
technique to separate the multi-channel microphone input signals into the signals S11, S12, ...,
S1N corresponding to the respective sound sources.
[0067]
The sound source localization units 803-1, 803-2, ..., 803-N obtain the positions of the respective
sound sources using an SRP-PHAT (Steered Response Power-Phase Transform) method or the
like based on the phase difference.
In addition to this, when the measurement microphone array 106 is disposed in a distributed
manner, a method of estimating the sound source position from the amplitude ratio between the
microphones may be used.
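SRP-PHAT builds on phase-transform-weighted cross-correlations between pairs of microphones. As a
simplified illustration of the phase-difference principle, and not of the full SRP-PHAT search
over candidate positions, the sketch below estimates the arrival-time difference between two
microphone signals with GCC-PHAT. The function name and parameters are hypothetical.

```python
import numpy as np


def gcc_phat_delay(x, y, max_lag):
    """Return the lag (in samples) by which y is delayed relative to x.

    Uses the phase transform: the cross-power spectrum is normalized to unit
    magnitude so that only phase differences contribute to the correlation peak.
    """
    n = len(x) + len(y)                      # zero-pad to avoid circular wrap-around
    X = np.fft.rfft(x, n)
    Y = np.fft.rfft(y, n)
    cross = Y * np.conj(X)
    cross /= np.maximum(np.abs(cross), 1e-12)
    cc = np.fft.irfft(cross, n)
    # Reorder so that index 0 corresponds to lag -max_lag.
    cc = np.concatenate((cc[-max_lag:], cc[:max_lag + 1]))
    return int(np.argmax(cc)) - max_lag
```

Multiplying the returned lag by the speed of sound and dividing by the sampling rate gives the
path-length difference between the two microphones; combining such differences over several
microphone pairs is what allows a source position to be estimated.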
[0068]
In processing 307, the acoustic simulator executes registration processing of virtual information
on the conference microphone 104 and the conference speaker 103 for acoustic simulation. The
acoustic simulator performs acoustic simulation based on the information registered in this
process. The user can register virtual values for the information registered for the conference
microphone 104 and the conference speaker 103, respectively. That is, virtual values regarding
the installation position, the direction, the performance, and the like can be registered. For
example, the position and orientation information registered in the processes 301 and 302 may
be used as it is, and only the directivity characteristic of the conference microphone 104 may be
virtually changed.
[0069]
In the case of this embodiment, the user executes registration (setting) of the information using,
for example, a graphical user interface (GUI). The registration of the information may be
performed by directly inputting numerical values or the like, or a method of selecting from a list
defined in advance may be adopted.
[0070]
In processing 308, the acoustic simulator performs acoustic simulation based on the acoustic
characteristics measured for the video conference system setting environment 101 and the
information regarding the virtually set conference microphone 104 and conference speaker 103,
outputs the simulation result, and ends the process.
[0071]
FIG. 9 shows a functional block configuration of a program for realizing the processing function
corresponding to the process 308.
In the following description, the program is called an acoustic simulation unit 901. The acoustic
simulation unit 901 has a function of estimating the impulse response of the speech of the
conference participant, a function of estimating the amount of residual echo collected by the
conference microphone 104, and a function of simulating the noise at the microphone position
virtually set in the simulation.
[0072]
The acoustic simulation unit 901 includes a direct sound / reverberation sound division unit 902,
an impulse response estimation unit 903 of an assumed speaker position, a residual echo
estimation unit 904, and a noise simulation unit 905 of an assumed microphone position.
[0073]
The direct sound / reverberation sound division unit 902 divides the impulse response S4
measured by the impulse response measurement unit 702 into a direct sound component and a
reverberation sound component.
FIG. 10 shows an example of the impulse response S4. The upper part of the figure is the
waveform of the impulse response acquired when the distance between the measurement
speaker array 107 and the measurement microphone array 106 is 1 m, and the lower part is
the waveform of the impulse response acquired when the distance is 3 m. Here, the
horizontal axis is time, and the vertical axis is signal strength.
[0074]
As shown by the dashed line in the figure, the waveform appearing near the beginning of the
impulse response corresponds to the direct sound component, and the waveform appearing
thereafter corresponds to the reverberation component. As can be seen by comparing the two
waveforms, the direct sound component is clearly affected by the distance. It can be seen that the
peak value of the direct sound component is larger at 1 m than at 3 m. On the other hand, it can
be seen that the volume of the reverberation component does not change significantly.
[0075]
In the case of this example, the ratio of the direct sound levels at distances of 1 m and 3 m was
about 9.5 dB, and the ratio of the reverberation levels was about 2 dB. Because tripling the
distance from 1 m to 3 m reduces the volume by about 9.5 dB (20 log10 3 is approximately 9.5),
the direct sound component can be considered to change in volume in inverse proportion to the
square of the distance. On the other hand, the volume of the reverberation is considered to be
determined almost independently of the change in distance.
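The 9.5 dB figure can be checked with a short calculation. The following is a minimal sketch, assuming the direct sound amplitude falls off as 1/r (so that its power follows the inverse-square law); the function name level_change_db is hypothetical and is not taken from the specification.

```python
import math


def level_change_db(r_near: float, r_far: float) -> float:
    """Level drop in dB of a direct sound whose amplitude falls off as 1/r.

    Assumption: free-field spreading, i.e. power proportional to 1/r**2.
    """
    return 20.0 * math.log10(r_far / r_near)


drop = level_change_db(1.0, 3.0)
print(f"{drop:.2f} dB")  # prints "9.54 dB"
```

The result agrees with the measured ratio of about 9.5 dB between the 1 m and 3 m impulse responses.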
[0076]
First, the direct sound / reverberation sound division unit 902 obtains the start point smax of the
direct sound of the impulse response S4 using the following equation.
[0077]
[0078]
The end point of the direct sound is given by smax + w.
Here, w is the window width and is set to a fixed value.
[0079]
Next, the direct sound / reverberation sound division unit 902 obtains the direct sound
component hdirect of the impulse response from the start point smax using the following
relational expression.
[0080]
[0081]
On the other hand, the direct sound / reverberation sound dividing unit 902 obtains the
reverberation component hreverb using the following equation.
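The equations for smax, hdirect, and hreverb are not reproduced in this text, so the following is only a sketch of one plausible reading. It assumes that smax is the index of the largest-magnitude sample, that the direct sound occupies the window from smax to smax + w, and that everything after the window is reverberation. The function name split_impulse_response and the sample data are hypothetical.

```python
def split_impulse_response(h, w):
    """Split h into (h_direct, h_reverb), each the same length as h.

    Assumptions (not confirmed by the specification):
    - the direct sound starts at the largest-magnitude sample, s_max
    - the direct sound ends at s_max + w, with w a fixed window width
    - samples after s_max + w form the reverberation component
    """
    s_max = max(range(len(h)), key=lambda t: abs(h[t]))
    end = s_max + w
    h_direct = [x if s_max <= t <= end else 0.0 for t, x in enumerate(h)]
    h_reverb = [x if t > end else 0.0 for t, x in enumerate(h)]
    return h_direct, h_reverb


# Toy impulse response: peak at index 2, decaying tail afterwards.
h = [0.0, 0.1, 1.0, 0.6, 0.2, 0.15, 0.1, 0.05]
h_direct, h_reverb = split_impulse_response(h, w=1)
```

With w=1 the direct component keeps samples 2 and 3, and the reverberation component keeps samples 4 onward.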
[0082]
The impulse response estimation unit 903 of the assumed speaker position uses the direct sound
component hdirect and the reverberation component hreverb of the impulse response to
estimate the impulse response obtained when an utterance from the assumed speaker position
is picked up by the virtually arranged conference microphone.
The impulse response estimation unit 903 is provided with information on the assumed speaker
position and information on the assumed conference microphone.
In FIG. 9, these pieces of information are indicated by S41.
The impulse response hsynth is given by the following equation.
[0083]
[0084]
Here, α is the attenuation factor of the direct sound component, which is given by the following
equation.
[0085]
Here, rpre is the distance between the measurement speaker array 107 and the measurement
microphone array 106 used when measuring the impulse response, and rpost is the distance
between the assumed speaker position and the conference microphone 104.
The assumed speaker position is a region with a finite extent rather than a single point.
For this reason, the largest value of rpost within the set range of assumed speaker positions
is used.
[0086]
βpre is a coefficient determined depending on the directivity characteristic of the measurement
microphone array 106 used for measuring the impulse response. In this embodiment, the
direction in which the measurement microphone array 106 is directed is taken as a reference
direction, and the directivity characteristic of the measurement microphone array 106
corresponding to the relative direction of the measurement speaker array 107 with respect to
that direction is taken as βpre.
[0087]
βpost is a coefficient determined depending on the directivity characteristic of the virtually
arranged conference microphone. In the case of this embodiment, the direction in which the
conference microphone is directed is taken as a reference direction, and the directivity
characteristic of the conference microphone corresponding to the relative direction of the
assumed speaker position with respect to that direction is taken as βpost.
[0088]
γpre is a coefficient determined depending on the radiation characteristic of the measurement
speaker array 107 used to measure the impulse response. In this embodiment, the direction in
which the measurement speaker array 107 is directed is taken as a reference direction, and the
radiation characteristic of the measurement speaker array 107 corresponding to the relative
direction of the measurement microphone array 106 with respect to that direction is taken as
γpre.
[0089]
γpost is a coefficient determined depending on the radiation characteristic of the assumed
speaker at the virtually arranged position. In this embodiment, the direction in which the
assumed speaker is facing is taken as a reference direction, and the radiation characteristic of
the assumed speaker corresponding to the relative direction of the conference microphone with
respect to that direction is taken as γpost.
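The equations for hsynth and α are not reproduced in this text. The sketch below shows one plausible way the quantities defined above could combine, and should be read as an assumption rather than as the formula of the specification: the direct sound amplitude is scaled by rpre / rpost, the measurement-side directivity and radiation gains (βpre, γpre) are divided out, the virtual-side gains (βpost, γpost) are applied, and the reverberation component is left unchanged. The function names are hypothetical, and the change in arrival delay with distance is ignored.

```python
def direct_attenuation(r_pre, r_post, beta_pre, beta_post, gamma_pre, gamma_post):
    """Assumed attenuation factor alpha for the direct sound component.

    Assumption: amplitude scales as 1/r, and the gains of the measurement
    equipment are replaced by those of the virtual microphone and speaker.
    """
    return (r_pre / r_post) * (beta_post / beta_pre) * (gamma_post / gamma_pre)


def synthesize_impulse_response(h_direct, h_reverb, alpha):
    """Assumed form: scaled direct sound plus unchanged reverberation."""
    return [alpha * d + r for d, r in zip(h_direct, h_reverb)]


# Measured at 1 m, assumed speaker at 3 m, identical directivity and radiation gains.
alpha = direct_attenuation(1.0, 3.0, 1.0, 1.0, 1.0, 1.0)
h_synth = synthesize_impulse_response([0.0, 0.9, 0.0], [0.0, 0.0, 0.2], alpha)
```

Leaving the reverberation term unscaled reflects the observation that the reverberation volume is almost independent of distance.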
[0090]
In general, the assumed speaker is considered to face a direction from which the display
installed in the conference room can be viewed. Therefore, in the case of this embodiment, the
assumed speaker is set at a position facing the display. Further, it is desirable that the
radiation characteristics of the assumed speaker be measured beforehand using a dummy head
or the like and held in a database.
[0091]
The impulse response estimation unit 903 for the assumed speaker position outputs the impulse
response hsynth generated for each assumed speaker position, and ends the processing. If there
are a plurality of measured impulse responses, the impulse response estimation unit 903 selects
an impulse response that minimizes the difference between rpre and rpost.
[0092]
The residual echo estimation unit 904 estimates residual echo at the position of the virtually
arranged conference microphone. As pre-processing, the residual echo estimation unit 904
generates an impulse response from the virtually arranged conference speaker to the virtually
arranged conference microphone based on the direct sound component and the reverberation
component of the impulse response S4. The generation of the impulse response by the residual
echo estimation unit 904 is performed according to the same processing procedure as the
impulse response estimation unit 903 of the assumed speaker position. Let the generated
impulse response be hecho. The residual echo estimation unit 904 is provided with information
of the assumed speaker position, the assumed conference microphone, and the conference
speaker. In FIG. 9, these pieces of information are indicated by S42.
[0093]
Next, the residual echo estimation unit 904 calculates an impulse response hresidual of the
residual echo from the following equation.
[0094]
[0095]
Here, λ and tspec are parameters determined based on the specifications of the echo canceller
to be used.
For example, in the case of an echo canceller having a suppression performance of 20 dB and an
echo cancellation time of T seconds, λ corresponds to 0.1 (an amplitude ratio of 10^(-20/20))
and tspec corresponds to T seconds.
These pieces of information are indicated by S43 in FIG. 9. The residual echo estimation unit 904
outputs hecho and hresidual and ends the processing.
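The relation between the 20 dB performance figure and λ = 0.1 is a plain decibel-to-amplitude conversion. The sketch below shows that conversion and, as an assumption only, one plausible form of hresidual: the equation itself is not reproduced in this text, so the sketch supposes that echo taps inside the canceller's time span are attenuated by λ while later taps pass through unchanged. The function names and the tap index t_spec_samples are hypothetical.

```python
def suppression_factor(performance_db):
    """Convert an echo canceller performance in dB to an amplitude ratio."""
    return 10.0 ** (-performance_db / 20.0)


def residual_echo(h_echo, lam, t_spec_samples):
    """Assumed residual echo response.

    Assumption: taps with index below t_spec_samples (the canceller's time
    span expressed in samples) are scaled by lam; later taps are not cancelled.
    """
    return [lam * x if t < t_spec_samples else x for t, x in enumerate(h_echo)]


lam = suppression_factor(20.0)  # approximately 0.1
h_residual = residual_echo([1.0, 0.5, 0.2], lam, t_spec_samples=2)
```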
[0096]
The noise simulation unit 905 of the assumed microphone position estimates the noise level
Pnoise of the assumed microphone position using the information S21 to S2N of the sound
volume and position of the sound source separated by the sound source separation.
[0097]
[0098]
Here, N is the number of sound sources.
Pobserved(i) is the volume of the i-th sound source separated by sound source separation.
rpre(i) is the distance between the i-th sound source position and the microphone used for
measuring acoustic characteristics, and rpost(i) is the distance between the i-th sound source
position and the virtually arranged conference microphone. In FIG. 9, information on the
assumed conference microphone is indicated by S44. The noise simulation unit 905 outputs the
estimated noise level Pnoise and ends the processing.
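The equation for Pnoise is not reproduced in this text. As a sketch under stated assumptions, the noise level can be modelled by rescaling each separated source from its measured distance to its distance from the virtual microphone and summing over the N sources. The sketch assumes that Pobserved(i) is a power-like quantity that follows the inverse-square law; the function name noise_level is hypothetical.

```python
def noise_level(p_observed, r_pre, r_post):
    """Assumed noise level at the virtual microphone position.

    Assumption: each source's observed power scales with (r_pre / r_post)**2
    when the receiving point moves from the measurement microphone to the
    virtually arranged conference microphone.
    """
    return sum(p * (rp / rq) ** 2 for p, rp, rq in zip(p_observed, r_pre, r_post))


# Two sources: the first becomes twice as close, the second twice as far.
p_noise = noise_level([1.0, 4.0], r_pre=[2.0, 1.0], r_post=[1.0, 2.0])
print(p_noise)  # prints 5.0
```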
[0099]
Note that, if necessary, the acoustic simulator displays the impulse responses hsynth, hecho,
and hresidual and the noise level Pnoise calculated as a result of the simulation on the
display 210 as characters and figures. By checking the contents of the screen display, the user
can determine in advance what acoustic characteristics will be obtained when the conference
microphone 104 and the conference speaker 103 are used at the virtual positions targeted for
simulation.
[0100]
As described above, the acoustic simulator according to the present embodiment clearly differs
from the prior art, in which the parameters of the audio signal processing are adjusted, in that
the simulation is performed by virtually adjusting the installation position.
[0101]
(Process 1 as Acoustic Consulting Device) Next, the processing operation of the acoustic
consulting device according to the present embodiment will be described.
Here, a case will be described in which the device determines, each time, whether the use
conditions of the conference microphone and the speaker virtually input by the user satisfy the
desired performance, and notifies the user of the determination result.
[0102]
The acoustic consulting device evaluates the processing result of the acoustic simulation
described above (that is, the estimated acoustic signal), determines whether the arrangement
of the conference microphone 104 and the conference speaker 103 is suitable for the video
conference system setting environment 101, and outputs the evaluation result. Of course,
the acoustic consulting device according to the embodiment is premised on CAD data
relating to the video conference system setting environment 101 not being available.
[0103]
FIG. 11 shows an outline of the processing procedure of the sound consulting apparatus. In FIG.
11, the parts corresponding to those in FIG. 3 are shown with the same reference numerals. The
difference between FIG. 11 and FIG. 3 is processing 309, processing 310 and processing 311.
[0104]
In process 309, the desired performance used to evaluate the result of the acoustic simulation
is set. The desired performance is input by the user through the operation of the mouse 208,
the keyboard 209, and other input devices. FIG. 12 shows an example of the desired
performance. In FIG. 12, the reverberation ratio of the speaker's speech collected by the
conference microphone 104, the environmental noise ratio, and the residual echo ratio after the
acoustic echo canceller are defined. All are defined in the form of an SNR (signal-to-noise
ratio).
[0105]
In FIG. 11, the process 309 is arranged between the acoustic characteristic measurement
process (process 306) and the process of setting the virtual information of the microphone and
the speaker (process 307). However, it may be placed at any point before the execution of the
simulation result determination process (process 310).
[0106]
Further, in the case of the audio consulting device shown in FIG. 11, the virtual information of
the conference microphone 104 and the conference speaker 103 registered in the process 307
merely provides an initial condition for evaluating the simulation result.
Therefore, in the case of the present embodiment, the information registered in the process 301
and the process 302 may be read out as it is and registered as virtual information.
[0107]
In process 310, the acoustic consulting device determines whether the simulation result of the
acoustic environment executed for the virtual conference microphone and the conference
speaker satisfies the desired performance preset by the user.
[0108]
Here, when it is determined that the performance is satisfied, the acoustic consulting device
outputs the information virtually set for the conference microphone 104 and the conference
speaker 103 as a condition for satisfying the desired performance, and ends the process.
For example, the position and orientation information of the conference microphone 104 and the
conference speaker 103 with which the desired performance can be obtained is output.
[0109]
On the other hand, if it is determined that the performance is not satisfied, the acoustic
consulting device proceeds to a process 311. In the process 311, the acoustic consulting device
executes a process of accepting changes to the information of the conference microphone 104
and the conference speaker 103 which are virtually registered for simulation. The user interface
used in the process 307 is used to input a change to the registration information. Here, the input
of the change of the registration information may be a method in which the user manually inputs
it individually or a method in which it is automatically set. The automatic setting will be
described later. In any case, when the completion of the change of the setting information is
instructed by the user, the acoustic consulting device returns to the process 308 and executes
the acoustic simulation based on the changed information.
[0110]
FIG. 13 shows a functional block configuration of a program used to determine whether the
acoustic simulation result satisfies the desired performance. In the following description, the
program is referred to as a performance evaluation unit 1301. The performance evaluation unit
1301 is composed of the following three evaluation units and one comparison unit.
[0111]
The direct sound / reverberation sound ratio evaluation unit 1302 evaluates the ratio of the
direct sound component hdirect and the reverberation sound component hreverb from the
impulse response hsynth of the assumed speaker position. First, the acoustic consulting device
separates the input impulse response hsynth into the direct sound component hdirect (t) and the
reverberation sound component hreverb (t) using Equations 2 and 3. When these components
are obtained, the direct sound / reverberation ratio evaluation unit 1302 calculates the ratio
Preverb of the direct sound component and the reverberation based on the following equation.
[0112]
[0113]
The direct sound / reverberation sound ratio evaluation unit 1302 outputs the minimum value of
the calculated ratio Preverb.
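The equation for Preverb is not reproduced in this text. A common way to express a direct-to-reverberant ratio is the energy ratio of the two components in decibels, sketched below. Whether the specification uses exactly this form is an assumption, and the function name direct_to_reverb_ratio_db is hypothetical.

```python
import math


def direct_to_reverb_ratio_db(h_direct, h_reverb):
    """Energy ratio of the direct and reverberation components in dB.

    Assumption: the ratio is formed from the summed squared samples of each
    component, which is the usual direct-to-reverberant ratio definition.
    """
    e_direct = sum(x * x for x in h_direct)
    e_reverb = sum(x * x for x in h_reverb)
    return 10.0 * math.log10(e_direct / e_reverb)


# One value per assumed speaker position; the worst case (minimum) is reported.
ratios = [
    direct_to_reverb_ratio_db([1.0, 0.0], [0.0, 0.1]),
    direct_to_reverb_ratio_db([0.5, 0.0], [0.0, 0.1]),
]
worst = min(ratios)
```

Taking the minimum over the assumed speaker positions matches the statement that the unit outputs the minimum value of the calculated ratio.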
The speaker speech / residual echo ratio evaluation unit 1303 estimates the ratio Pecho based
on Equation 9 for each assumed speaker position.
[0114]
[0115]
Here, ρ is obtained by Equation 10.
[0116]
Note that h1,direct represents an impulse response in the case where the distance between the
virtually set conference microphone position and the assumed speaker position is 1 m.
Also, A is an assumed speaker volume at a distance of 1 m.
μ is obtained by Equation 11.
[0117]
[0118]
Here, hsp represents an impulse response from the position of the virtually set conference
speaker to the assumed speaker position.
B is the sound pressure level of the speaker output signal at the assumed speaker position.
[0119]
The speaker speech / residual echo ratio evaluation unit 1303 obtains and outputs the minimum
value of Pecho. The speaker speech / noise ratio evaluation unit 1304 obtains Pn defined by
Expression 12 for each assumed speaker position, and obtains and outputs the minimum value of
Pn.
[0120]
[0121]
The desired performance comparing unit 1305 compares the desired performance S51 set by the
user with the Preverb, Pecho, and Pn calculated in each unit of the previous stage, and
determines whether each value is within the desired performance.
The comparison result for each value is output as the determination result S52.
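The comparison step can be sketched as a per-metric threshold check. The sketch assumes that each metric is an SNR-style value in dB for which larger is better, so that a metric satisfies the desired performance when it is at least the user-set value. The dictionary keys and the function name compare_with_desired are hypothetical.

```python
def compare_with_desired(measured_db, desired_db):
    """Return a per-metric pass/fail result (the determination result S52).

    Assumption: every metric is an SNR in dB and passes when the simulated
    value is greater than or equal to the desired value.
    """
    return {name: measured_db[name] >= desired_db[name] for name in desired_db}


desired = {"reverb": 10.0, "echo": 20.0, "noise": 20.0}    # S51, set by the user
measured = {"reverb": 12.0, "echo": 18.0, "noise": 25.0}   # Preverb, Pecho, Pn
result = compare_with_desired(measured, desired)
all_satisfied = all(result.values())
```

In this toy case the residual echo metric fails, so the overall determination is negative and the user would be prompted to change the virtual arrangement.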
[0122]
The acoustic consulting device displays the result of the determination on the display 210 as
characters and figures. By checking the contents of the screen display, the user can know
whether the use of the conference microphone 104 and the conference speaker 103 under the
virtually specified conditions satisfies the desired performance. In addition, when the desired
performance is not satisfied, the user can search for a condition under which the desired
performance is obtained by repeatedly specifying new candidates.
[0123]
(Process 2 as a sound consulting apparatus) Here, a processing operation example of the sound
consulting apparatus according to the present embodiment will be described. Here, a case will be
described where the audio consulting device is equipped with a function of automatically
correcting the virtual conditions of the conference microphone and the speaker so as to satisfy
the desired performance.
[0124]
FIG. 14 shows an outline of the processing procedure of the sound consulting apparatus. In FIG.
14, the parts corresponding to those in FIG. 11 are denoted by the same reference numerals, and
the processing contents up to the process 308 are the same as those in FIG. 11. Therefore, the
following description starts from the point after the acoustic simulation of the process 308 has
been performed.
[0125]
In process 311, the acoustic consulting device checks the error between the desired performance
set in advance by the user and the simulation result. This process is executed in the desired
performance comparing unit 1305 shown in FIG. 13. After this process 311, the acoustic
consulting device proceeds to process 312.
[0126]
In process 312, the acoustic consulting device determines whether the difference between the
error in the immediately preceding simulation run and the error in the current simulation run is
less than or equal to a predetermined threshold (i.e., whether the convergence condition is
satisfied). If a negative result is obtained in this process 312, the acoustic consulting device
proceeds to process 313.
[0127]
In processing 313, the acoustic consulting device changes the virtual information of the
conference microphone and the speaker so that the evaluation function C defined in, for example,
Equation 13 transitions in the direction of the minimum gradient.
[0128]
[0129]
Here, a, b, and c represent the weights of the respective performance evaluation measures.
In this specification, the change in the cost function C when the position of the microphone and
the position of the speaker are each shifted by a minute amount is denoted ΔC.
Further, the minute displacements of the microphone and the speaker when moving in the
direction of the minimum gradient are denoted ΔM and ΔS, respectively.
[0130]
ΔM is given by a three-dimensional vector, and represents the amount of change of each of the
coordinate values x, y, z specifying the position of the conference microphone. Similarly, ΔS is
given by a three-dimensional vector, and represents the amount of change of each of the
coordinate values x, y, z specifying the position of the conference speaker.
[0131]
Also, as shown in the following equation, the matrix [MS]new indicates the arrangement of the
virtual conference microphone and the speaker after moving in the direction of the minimum
gradient, and the matrix [MS]old indicates the arrangement of the virtual conference
microphone and the speaker before moving.
[0132]
[0133]
After automatically changing the virtual information in this manner, the acoustic consulting
device returns to performing the acoustic simulation of process 308.
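The automatic update can be sketched as a steepest-descent loop over the six coordinates of the microphone and speaker positions. Equation 13 and the update equation are not reproduced in this text, so the sketch makes several assumptions: the gradient is estimated by central finite differences (shifting each coordinate by a minute amount), the step size is fixed, and iteration stops when the change in cost between successive runs falls to or below a threshold, as in process 312. The toy quadratic cost stands in for the evaluation function C, and all names are hypothetical.

```python
def numerical_gradient(cost, x, eps=1e-4):
    """Estimate the gradient of cost at x by central finite differences."""
    grad = []
    for i in range(len(x)):
        up, down = list(x), list(x)
        up[i] += eps
        down[i] -= eps
        grad.append((cost(up) - cost(down)) / (2.0 * eps))
    return grad


def optimize_layout(cost, x0, step=0.1, tol=1e-9, max_iter=1000):
    """Move [Mx, My, Mz, Sx, Sy, Sz] downhill until the cost stops changing."""
    x, prev = list(x0), cost(x0)
    for _ in range(max_iter):
        grad = numerical_gradient(cost, x)
        x = [xi - step * gi for xi, gi in zip(x, grad)]  # [MS]new from [MS]old
        cur = cost(x)
        if abs(prev - cur) <= tol:  # convergence condition (process 312)
            break
        prev = cur
    return x


# Toy stand-in for C: squared distance to a hypothetical best arrangement.
TARGET = [1.0, 2.0, 0.5, 3.0, 0.0, 1.5]


def toy_cost(x):
    return sum((xi - ti) ** 2 for xi, ti in zip(x, TARGET))


layout = optimize_layout(toy_cost, [0.0] * 6)
```

In an actual device each cost evaluation would require a run of the acoustic simulation of process 308, so the number of iterations matters far more than it does in this toy example.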
[0134]
When a positive result is obtained in the process 312 (when the error is smaller than a
predetermined threshold and the convergence condition is satisfied), the acoustic consulting
device proceeds to a process 310.
That is, the acoustic consulting device determines whether the simulation result satisfies the
desired performance.
This determination process itself is the same as in the case of FIG. 11.
If the simulation result satisfies the desired performance, the acoustic consulting device ends the
process at that point.
[0135]
On the other hand, if a negative result is obtained, the acoustic consulting device increases the
number of conference microphones by one in process 314, and then returns to the execution of
the acoustic simulation of process 308.
[0136]
As described above, in the case of the present embodiment, the acoustic consulting device can
automatically set the position of the conference microphone and the position of the conference
speaker (and, as needed, the number of conference microphones) that are optimal for the video
conference.
Of course, the determination result is displayed on the display 210 as characters and figures.
Therefore, even when the user does not have CAD data of the video conference system setting
environment 101, the user can automatically obtain information on the optimal number and
positions of the conference microphones and the speakers.
[0137]
In the above-described embodiment, when a condition satisfying the desired performance is
found, the information is output, and the virtual information change processing and the
simulation execution and evaluation based on the changed information are stopped.
Alternatively, the change of the virtual information and the simulation execution and evaluation
based on the changed information may be repeated within a variable range preset by the user,
and the spatial arrangements and other conditions satisfying the desired performance within
the variable range may be displayed on the display. In this case, an arrangement reflecting the
user's request can be selected from within the range satisfying the desired performance, and
usability can be improved.
[0138]
(Other Embodiments) The present invention is not limited to the above-described embodiment,
but includes various modifications. For example, the embodiment described above is described in
detail in order to explain the present invention in an easy-to-understand manner, and is not
necessarily limited to one having all the described configurations. In addition, it is possible to
replace part of one embodiment with the configuration of another embodiment, and it is also
possible to add the configuration of another embodiment to the configuration of one
embodiment. Moreover, it is also possible to add, delete, or replace other configurations for part
of the configurations of each embodiment.
[0139]
In addition, each configuration, function, processing unit, processing means, and the like
described above may realize part or all of them as an integrated circuit or other hardware, for
example. Further, each configuration, function, and the like described above may be realized by
the processor interpreting and executing a program that realizes each function. That is, it may be
realized as software. Information such as a program, a table, and a file for realizing each function
can be stored in a memory, a hard disk, a storage device such as a solid state drive (SSD), or a
storage medium such as an IC card, an SD card, or a DVD.
[0140]
Further, the control lines and the information lines indicate what is considered to be necessary
for the description, and do not represent all the control lines and the information lines necessary
for the product. In practice, it can be considered that almost all configurations are mutually
connected.
[0141]
101: video conference system installation environment
102: desk
103: conference speaker
104: conference microphone
105-1: assumed speaker position
105-2: assumed speaker position
106: measurement microphone array
107: measurement speaker array
202: multi-channel AD converter
203: central processing unit
204: non-volatile memory
205: volatile memory
206: multi-channel DA converter
208: mouse
209: keyboard
210: display
701: acoustic characteristic measurement unit
702: impulse response measurement unit
703: ambient noise measurement unit
802: sound source separation unit
803-1, 803-2, 803-N: sound source localization units
901: acoustic simulation unit
902: direct sound / reverberation sound division unit
903: impulse response estimation unit of assumed speaker position
904: residual echo estimation unit
905: noise simulation unit of assumed microphone position
1301: performance evaluation unit
1302: direct sound / reverberation sound ratio evaluation unit
1303: speaker speech / residual echo ratio evaluation unit
1304: speaker speech / noise ratio evaluation unit
1305: desired performance comparison unit