Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2012093705
An audio output device capable of appropriately suppressing the cocktail party effect is provided. A
microphone array 1 picks up the voice of a speaker H1 and outputs it to an audio processing
device 3. The audio processing device 3 detects the position of the speaker H1 based on the voice
of the speaker H1 collected by each microphone of the microphone array 1. Further, the audio
processing device 3 generates a masker sound for masking the voice of the speaker H1 based on
the collected voice, and outputs the masker sound to a speaker array 2. At this time, the audio
processing device 3 controls the delay amount of the audio signal supplied to each speaker of the
speaker array 2 so that the position of the sound source perceived by a third party H3 (the virtual
sound source position) is set to the position of the speaker H1. [Selected figure] Figure 1
Voice output device
[0001]
The present invention relates to an audio output device that outputs a masker sound.
[0002]
Conventionally, a technique has been proposed in which a loudspeaker is attached to a partition in
an office or the like and a sound having low relevance to the speaker's voice is output as a masker
sound, thereby making it difficult for a person present in another space to hear the speaker's voice
(see, for example, Patent Document 1).
This makes it difficult to understand the speaker's remarks, so the privacy of the speaker can be
maintained.
[0003]
Japanese Patent Application Publication No. 06-175666
[0004]
However, in the method of Patent Document 1, since the masker sound and the speaker's voice
are heard from different positions, the listener may still pick out the speaker's voice and
understand the contents of the utterance through the so-called cocktail party effect.
[0005]
Accordingly, an object of the present invention is to provide an audio output device that can
appropriately suppress the cocktail party effect.
[0006]
The audio output device according to the present invention includes a speaker position detection
unit for detecting the position of a speaker, a masker sound generation unit for generating a
masker sound, a plurality of loudspeakers for outputting the masker sound, and a localization
control unit configured to control the localization position so that the speaker position detected by
the speaker position detection unit becomes the virtual sound source position of the masker
sound, and to supply an audio signal of the masker sound to at least one of the plurality of
loudspeakers.
[0007]
Specifically, the localization control unit sets the localization position of the masker sound so that
the masker sound comes from the same direction as the speaker when viewed from the third
party.
More preferably, the localization control unit sets the position of the speaker detected by the
speaker position detection unit and the localization position of the masker sound to the same
position.
As a result, the masker sound and the speaker's voice are no longer heard from different positions,
and the cocktail party effect can be appropriately suppressed.
[0008]
Any method may be used to detect the speaker position. For example, if a microphone array in
which a plurality of microphones for picking up voice are arranged is provided and the phase
differences of the voices collected by the microphones are detected, the position of the speaker
can be detected with high accuracy.
[0009]
In this case, the localization control unit preferably controls the localization position of the
masker sound in consideration of the positional relationship between the speaker array and the
microphone array.
The positional relationship may be a manual input by the user, or may be obtained, for example,
by collecting the sound output from each speaker with a microphone and measuring the arrival
time.
[0010]
In the case where the speaker array and the microphone array are integrated, the positional
relationship between the speaker array and the microphone array is fixed. Therefore, if the
positional relationship measured in advance is stored, there is no need to input or measure the
positional relationship each time.
[0011]
Further, it is desirable that the masker sound generation unit raise the level of the masker sound
when the position of the speaker detected by the speaker position detection unit changes.
When the speaker position changes, the speaker position and the localization position of the
masker sound may momentarily differ from each other.
In this case, the cocktail party effect may occur and the masking effect may decrease, so the
volume of the masker sound is temporarily increased to prevent the decrease in the masking effect.
[0012]
Alternatively, the speaker position detection unit may set the position of the microphone with the
largest collected sound level as the speaker position, and the localization control unit may supply
the audio signal of the masker sound to the loudspeaker closest to that microphone.
[0013]
Further, the audio output device according to the present invention includes a plurality of
microphones for picking up voice, a masker sound generation unit for generating a masker sound,
a plurality of loudspeakers which are supplied with the audio signal of the masker sound and emit
the masker sound, and a localization control unit configured to control the gain of the audio signal
of the masker sound supplied to the plurality of loudspeakers.
The localization control unit multiplies the levels of the collected sound signals of the plurality of
microphones by gain setting coefficients that decrease as the distance between a microphone and
a loudspeaker increases, thereby adjusting the gain of the audio signal of the masker sound
supplied to each loudspeaker.
[0014]
With such a configuration, even if the speaker position is not detected, the masker sound can be
emitted so that it is heard from the direction of the speaker, using only the positional relationship
between the plurality of microphones and the plurality of loudspeakers and the level of the
collected sound signal of each microphone.
[0015]
According to the present invention, since the masker sound and the speaker's voice are heard from
the same direction, it is possible to appropriately suppress the cocktail party effect.
[0016]
FIG. 1 is a block diagram showing the configuration of a masking system.
FIG. 2 is a block diagram showing the configuration of a microphone array, a speaker array, and an audio processing device.
FIG. 3 is a diagram showing a speaker position detection method using a microphone array.
FIG. 4 is a diagram showing a virtual sound source localization method using a speaker array.
FIG. 5 is a diagram showing the positional relationship between a speaker array and a microphone array.
FIG. 6 is a flowchart showing the operation of the audio processing device.
FIG. 7 is a diagram showing the configuration of a masking system according to another embodiment.
FIG. 8 is a block diagram showing the configuration of the microphone array, speaker array, and audio processing device of the masking system shown in FIG. 7.
FIG. 9 is a flowchart showing the operation of the audio processing device in the masking system shown in FIG. 7.
FIG. 10 is a diagram showing the configuration of a masking system according to yet another embodiment.
FIG. 11 is a block diagram showing the configuration of the microphone array, speaker array, and audio processing device of the masking system shown in FIG. 10.
[0017]
FIG. 1 is a block diagram showing the configuration of a masking system provided with the audio
output device of the present invention. The masking system is installed, for example, at a
face-to-face counter such as a bank or a dispensing pharmacy, and emits a masker sound toward
third parties so that they cannot understand the contents of what is being said by the person on
the other side of the counter.
[0018]
In FIG. 1, a speaker H1 and a listener H2 face each other across the counter, and a plurality of
third parties H3 are present at positions away from the counter. Since H1 and H2 are in
conversation, H1 may also be the listener and H2 the speaker. The speaker H1 is, for example, a
pharmacist who explains a medicine, the listener H2 is a patient who listens to the explanation of
the medicine, and the third parties H3 are patients who are waiting their turn.
[0019]
The microphone array 1 is installed on the upper surface of the counter. A plurality of
microphones are arranged in the microphone array 1, and each microphone picks up sound
around the counter. A speaker array 2 for outputting sound toward the third parties is installed
facing the direction in which the third parties are present (downward in the drawing). The
speaker array 2 is installed under the desk or the like so that the listener H2 cannot easily hear
the sound output from the speaker array.
[0020]
The microphone array 1 and the speaker array 2 are connected to the audio processing device 3.
The microphone array 1 picks up the voice of the speaker H1 with each of its microphones and
outputs it to the audio processing device 3. The audio processing device 3 detects the position of
the speaker H1 based on the voice of the speaker H1 collected by each microphone of the
microphone array 1. Further, the audio processing device 3 generates a masker sound for masking
the voice of the speaker H1 based on the collected voice, and outputs the masker sound to the
speaker array 2. At this time, the audio processing device 3 controls the delay amount of the
audio signal supplied to each speaker of the speaker array 2 so that the position of the sound
source perceived by the third party H3 (the virtual sound source position) is set to the position of
the speaker H1. As a result, the third party H3 hears the voice of the speaker H1 and the masker
sound from the same position, and the cocktail party effect is appropriately suppressed.
[0021]
Hereinafter, specific configurations and operations for realizing the above-described masking
system will be described. FIG. 2 is a block diagram showing the configuration of the microphone
array 1, the speaker array 2, and the audio processing device 3. The microphone array 1 includes
seven microphones 11 to 17. The audio processing device 3 includes A/D converters 51 to 57, a
sound pickup signal processing unit 71, a control unit 72, a masker sound generation unit 73, a
delay processing unit 8, and D/A converters 61 to 68. The speaker array 2 includes eight speakers
21 to 28. The number of microphones in the microphone array and the number of speakers in the
speaker array are not limited to this example.
[0022]
The A/D converters 51 to 57 respectively receive the sounds collected by the microphones 11 to
17 and convert them into digital audio signals. The digital audio signals converted by the A/D
converters 51 to 57 are input to the sound pickup signal processing unit 71.
[0023]
The sound pickup signal processing unit 71 detects the position of the speaker by detecting the
phase differences between the digital audio signals. FIG. 3 is a diagram showing an example of a
speaker position detection method. As shown in the figure, when the speaker H1 utters a voice,
the voice first reaches the microphone closest to the speaker H1 (the microphone 17 in the
figure), and then reaches the microphones 16 down to 11 in sequence as time passes. The sound
pickup signal processing unit 71 obtains the correlation between the sounds collected by the
respective microphones, and obtains the difference in arrival timing (phase difference) of the
sound from the same sound source. Then, assuming that microphones are present at virtual
positions that take this phase difference into account (the positions of the circles indicated by
dotted lines in the figure), the sound pickup signal processing unit 71 detects the speaker position
on the assumption that the sound source (the speaker H1) is located at a position equidistant from
these virtual microphone positions. Information on the detected sound source position is output
to the control unit 72. The sound source position information is, for example, information
indicating the distance and direction from the center position of the microphone array 1 (the
deviation angle when the front direction is taken as 0 degrees).
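The patent does not specify the estimation algorithm in detail; the following is a minimal illustrative Python sketch (not part of the patent) of how such a phase-difference based direction estimate could be obtained for a uniform linear microphone array. The function names, the far-field assumption, and the parameters are assumptions made for illustration.

import numpy as np

def estimate_tdoa(ref, sig, fs):
    # Delay (in seconds) of `sig` relative to `ref`, found by cross-correlation.
    corr = np.correlate(sig, ref, mode="full")
    lag = np.argmax(corr) - (len(ref) - 1)   # lag in samples
    return lag / fs

def estimate_direction(channels, fs, mic_spacing, c=343.0):
    # Arrival angle (radians, 0 = broadside) of a far-field source, estimated
    # from the average delay between adjacent microphones of a linear array.
    delays = [estimate_tdoa(channels[0], ch, fs) for ch in channels[1:]]
    tau = np.mean(np.diff([0.0] + delays))   # mean delay per microphone spacing
    sin_theta = np.clip(c * tau / mic_spacing, -1.0, 1.0)
    return float(np.arcsin(sin_theta))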
[0024]
In addition, the sound pickup signal processing unit 71 outputs, to the masker sound generation
unit 73, a digital audio signal of the speaker voice collected from the detected speaker position.
The sound pickup signal processing unit 71 may simply output the sound collected by any one
microphone of the microphone array 1. Alternatively, by delaying the digital audio signals collected
by the respective microphones in accordance with the phase differences described above so that
their phases are aligned, and then combining them, a characteristic with strong sensitivity
(directivity) toward the position of the sound source can be realized, and the combined digital
audio signal may be output. As a result, the speaker voice is picked up with a high S/N ratio, and
unnecessary noise and the wraparound of the masker sound output from the speaker array are
less likely to be picked up.
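The combining step described above corresponds to delay-and-sum beamforming. As a rough sketch only, under the assumption that the per-microphone arrival delays have already been estimated, it could look like the following Python fragment; the array shapes and names are illustrative.

import numpy as np

def delay_and_sum(channels, delays_sec, fs):
    # channels: (num_mics, num_samples); delays_sec[i] is the arrival delay at
    # microphone i relative to the earliest microphone. Aligning and summing the
    # channels emphasizes the speaker voice over uncorrelated noise.
    num_mics, num_samples = channels.shape
    out = np.zeros(num_samples)
    for ch, d in zip(channels, delays_sec):
        shift = int(round(d * fs))           # samples by which this mic lags
        aligned = ch[shift:] if shift > 0 else ch
        out[:len(aligned)] += aligned
    return out / num_mics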
[0025]
Next, the masker sound generation unit 73 generates a masker sound for masking the speaker
voice based on the speaker voice input from the sound pickup signal processing unit 71. The
masker sound may be any sound, but it is preferable to choose one that does not cause discomfort
to the listeners. For example, the speech of the speaker H1 may be held for a predetermined time
and modified on the time axis or on the frequency axis so that it carries no lexical meaning (the
conversation content cannot be understood). Alternatively, general-purpose utterances that carry
no lexical meaning may be stored in a built-in storage unit (not shown) as voices of a plurality of
people, including men and women, and their frequency characteristics may be approximated to the
speech of the speaker H1. In addition, environmental sounds (such as the murmuring of a river)
and effect sounds (such as birdsong) may be added to the masker sound. The generated masker
sound is output to each of the delays 81 to 88 of the delay processing unit 8.
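As one hedged interpretation of the time-axis modification mentioned above, the speaker voice could be buffered, cut into short segments, shuffled, and windowed so that the result keeps the speaker's spectral character while carrying no meaning. The segment length and windowing below are assumptions, not values given in the patent.

import numpy as np

def scramble_masker(voice, fs, seg_ms=100.0, rng=None):
    # Shuffle short windowed segments of the buffered voice to destroy its
    # lexical content while roughly preserving its spectral character.
    rng = rng or np.random.default_rng()
    seg = int(fs * seg_ms / 1000)
    n_full = len(voice) // seg
    segments = [voice[i * seg:(i + 1) * seg] for i in range(n_full)]
    if not segments:
        return voice.copy()
    rng.shuffle(segments)
    window = np.hanning(seg)                 # smooth the segment boundaries
    return np.concatenate([s * window for s in segments])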
[0026]
The delays 81 to 88 of the delay processing unit 8 are provided corresponding to the speakers 21
to 28 of the speaker array 2, respectively, and individually change the delay amount of the audio
signal supplied to each speaker. The control unit 72 controls the delay amounts of the delays 81
to 88.
[0027]
The control unit 72 can set the virtual sound source at a predetermined position by controlling
the delay amounts of the delays 81 to 88. FIG. 4 is a diagram showing a virtual sound source
localization method using a speaker array.
[0028]
As shown in the figure, the control unit 72 sets the virtual sound source V1 at the position of the
speaker H1 input from the sound pickup signal processing unit 71. Since the distances from the
virtual sound source V1 to the speakers of the speaker array 2 differ from one another, the masker
sound is output in sequence starting from the speaker closest to the virtual sound source V1 (the
speaker 21 in the figure). By emitting the sound toward the third party (listener) H3 in this way, it
can be perceived as if loudspeakers were present at positions equidistant from the virtual sound
source position to be localized (the positions of the speakers indicated by the dotted lines in the
figure) and the masker sound were emitted from those positions simultaneously. Therefore, the
third party H3 perceives the masker sound as being emitted from the position of the speaker H1.
As shown in the figure, the position of the speaker H1 and the position of the virtual sound source
V1 do not have to coincide completely. For example, only the arrival directions of the sounds may
be made the same.
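As an illustrative sketch only (not the patent's implementation), the delay given to each loudspeaker can be derived from its extra distance to the virtual sound source, so that the loudspeaker nearest the virtual source fires first; the two-dimensional geometry and the example layout below are assumptions.

import numpy as np

def virtual_source_delays(speaker_xy, source_xy, c=343.0):
    # speaker_xy: (N, 2) loudspeaker positions in metres; source_xy: (2,) virtual
    # source position. Returns per-speaker delays in seconds (closest speaker = 0),
    # so the emitted wavefronts appear to diverge from the virtual source.
    dist = np.linalg.norm(speaker_xy - np.asarray(source_xy), axis=1)
    return (dist - dist.min()) / c

# Example: an 8-speaker line array at 10 cm pitch, virtual source behind speaker 21.
speakers = np.column_stack([np.arange(8) * 0.10, np.zeros(8)])
delays = virtual_source_delays(speakers, [0.0, -0.5])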
[0029]
The control unit 72 may set the delay amount of the audio signal supplied to each speaker on the
assumption that the microphone array and the speaker array are installed at the same position,
but when they are installed at different positions, it is desirable to set the delay amounts in
consideration of their positional relationship. For example, when the microphone array and the
speaker array are arranged in parallel, the control unit 72 receives the distance between the
center positions of the microphone array and the speaker array as input, corrects the positional
deviation of each speaker of the speaker array, and calculates the delay amounts.
[0030]
The positional relationship between the microphone array and the speaker array may be received
as manual input from the user by providing an operation unit (not shown) for the user to operate.
Alternatively, it is also possible to detect the positional relationship by outputting sound from the
speakers of the speaker array 2, collecting the sound with each microphone of the microphone
array 1, and measuring the arrival time. In this case, for example, as shown in FIG. 5, a
measurement sound (an impulse sound or the like) is output from the speakers 21 and 28 at the
ends of the speaker array 2, and the timings at which the sound is collected by the microphones
11 and 17 at the ends of the microphone array 1 are measured. In this way, the distances between
the ends of the microphone array 1 and the speaker array 2 can be measured, and the installation
angle between the microphone array 1 and the speaker array 2 can be detected.
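For illustration only, the distance between a loudspeaker and a microphone follows directly from the measured propagation time of the test sound; the helper below is an assumed sketch, and deriving the installation angle from the two end-to-end distances is left to the geometry of the particular setup.

def distance_from_arrival(t_emit, t_pick, c=343.0):
    # Distance in metres between a loudspeaker and a microphone, from the time
    # the measurement sound was emitted and the time it was picked up.
    return (t_pick - t_emit) * c

# e.g. speaker 21 -> microphone 11 and speaker 28 -> microphone 17 give the
# distances between the array ends.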
[0031]
In the case where the speaker array and the microphone array are integrated, the positional
relationship between the speaker array and the microphone array is fixed. Therefore, if the
positional relationship is stored in advance, there is no need to input or measure it each time.
[0032]
Next, FIG. 6 is a flowchart showing the operation of the audio processing device 3.
The audio processing device 3 starts this operation at the time of first activation (power-on).
First, the audio processing device 3 measures (calibrates) the positional relationship between the
microphone array and the speaker array described above (s11). This process is unnecessary when
the microphone array and the speaker array are integrated into one body.
[0033]
Thereafter, the audio processing device 3 stands by until the speaker voice is collected (s12). For
example, it is determined that the speaker's voice has been picked up when a sound of a
predetermined level or higher that can be judged to be a voice is picked up. When the speaker's
voice is not collected and no conversation is taking place, the masker sound is unnecessary, so the
generation of the masker sound and the localization processing stand by. However, this step may
be omitted, and masker sound generation and localization processing may be performed at all
times.
[0034]
When the speaker's voice is picked up, the audio processing device 3 detects the speaker position
with the sound pickup signal processing unit 71 (s13). The speaker position is determined by
detecting the phase differences of the voices collected by the microphones of the microphone
array, as described above.
[0035]
Then, the audio processing device 3 causes the masker sound generation unit 73 to generate a
masker sound (s14). At this time, it is desirable to generate the masker sound in accordance with
the speaker's voice, using the audio signal input from the sound pickup signal processing unit 71
to the masker sound generation unit 73, that is, the signal combined with the phases of the
microphones aligned so as to have directivity toward the speaker position.
[0036]
In addition, it is preferable that the volume of the masker sound change in accordance with the
level of the collected speaker voice. When the level of the collected speaker voice is low, the
speaker voice reaches the third party H3 only at a low level and it is difficult to grasp the contents
of the conversation, so the level of the masker sound can be lowered. On the other hand, when the
level of the collected speaker voice is high, the speaker voice reaches the third party H3 at a high
level and it is easy to grasp the contents of the conversation, so it is preferable to raise the level of
the masker sound.
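A minimal sketch of such level tracking, assuming a frame-based implementation with an RMS level estimate and simple smoothing (none of which is specified in the patent):

import numpy as np

def masker_gain(voice_frame, prev_level, smoothing=0.9, ratio=1.2):
    # Follow a smoothed RMS estimate of the collected speaker voice, so that
    # louder speech is masked more strongly and quiet speech is not over-masked.
    level = np.sqrt(np.mean(voice_frame ** 2))
    level = smoothing * prev_level + (1.0 - smoothing) * level
    return ratio * level, level              # (masker gain, updated state)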
[0037]
Finally, in the audio processing device 3, the control unit 72 sets the delay amounts so that the
masker sound is localized at the speaker position (s15).
[0038]
Preferably, when the speaker position detected by the sound pickup signal processing unit 71
changes, the masker sound generation unit 73 performs processing to raise the level of the
masker sound.
In this case, when the sound pickup signal processing unit 71 determines that the speaker position
has changed, it outputs a trigger signal to the masker sound generation unit 73, and upon
receiving the trigger signal, the masker sound generation unit 73 temporarily sets the masker
sound level high.
[0039]
When the speaker position changes, the speaker position and the virtual sound source position of
the masker sound may momentarily differ until the control unit 72 finishes recalculating the delay
amounts. In this case, the cocktail party effect may occur and the masking effect may decrease, so
the volume of the masker sound is temporarily increased to prevent the decrease in the masking
effect.
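One possible shape of this temporary boost, sketched with assumed constants and names (the patent does not give a decay law or hold time):

class MaskerLevelBoost:
    # Raise the masker level when a position-change trigger arrives, then let the
    # boost decay while the new localization delays are being recomputed.
    def __init__(self, boost_db=6.0, hold_frames=20):
        self.boost_db = boost_db
        self.hold_frames = hold_frames
        self.remaining = 0

    def trigger(self):
        # Called when the detected speaker position changes.
        self.remaining = self.hold_frames

    def current_boost_db(self):
        if self.remaining <= 0:
            return 0.0
        self.remaining -= 1
        return self.boost_db * self.remaining / self.hold_frames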
[0040]
As described above, the speech processing device 3 causes the third party H3 to hear the voice of
the speaker H1 and the masker sound from the same position by causing the virtual sound
source position of the masker sound to be localized at the detected speaker position. As a result,
the cocktail party effect can be properly suppressed.
[0041]
In the present embodiment, an example in which the speaker position is detected by detecting
the phase difference of each microphone of the microphone array has been described, but the
speaker position detection method is not limited to this example.
For example, the speaker may carry a remote control with a GPS function that transmits position
information to the audio processing device. Alternatively, the remote control may be provided with
a microphone, measurement sounds may be output from a plurality of speakers of the speaker
array, and the audio processing device may detect the speaker position by measuring the arrival
times.
[0042]
Incidentally, the above description shows an example using a speaker array in which a plurality of
speakers are arrayed and a microphone array in which a plurality of microphones are arrayed, but
individual speakers and microphones may instead be disposed at predetermined positions and
used to emit the masker sound.
[0043]
FIG. 7 is a diagram showing the configuration of a masking system according to another
embodiment.
FIG. 8 is a block diagram showing the configuration of the microphone array, the speaker array,
and the audio processing device of the masking system shown in FIG.
[0044]
As shown in FIG. 7, in the masking system of this embodiment, microphones 1A, 1B, and 1C, each
an independent unit, are disposed in the area where the speakers H1A, H1B, and H1C are present.
The microphone 1A is disposed near the speaker H1A, the microphone 1B near the speaker H1B,
and the microphone 1C near the speaker H1C.
[0045]
The speaker 2A is disposed in the vicinity of the microphone 1A, the speaker 2B is disposed in
the vicinity of the microphone 1B, and the speaker 2C is disposed in the vicinity of the
microphone 1C. These speakers 2A, 2B, 2C are installed to emit sound toward the area where the
third party H3 is present.
[0046]
The collected sound signals of the microphones 1A, 1B, and 1C are analog-to-digital converted by
the A/D converters 51 to 53 as in the above embodiment, and are input to the sound collection
signal processing unit 71A. The sound collection signal processing unit 71A detects the
microphone close to the speaker who is producing a sound from the volume level of each collected
sound signal, and outputs the detection information to the control unit 72A.
[0047]
Further, the collected sound signals are supplied to the masker sound generation unit 73A, and the
masker sound generation unit 73A generates a masker sound using the collected sound signals as
described in the above embodiment, and outputs it to the audio signal processing units 801, 802,
and 803.
[0048]
The control unit 72A stores the correspondence between microphones and speakers that are
close to each other.
The control unit 72A controls the audio signal processing units 801, 802, and 803 so as to select
the speaker corresponding to the microphone detected by the sound collection signal processing
unit 71A and emit sound only from that speaker. Specifically, when the speaker H1A produces a
sound and the microphone 1A is detected, the control unit 72A causes only the audio signal
processing unit 801 to output the masker sound so that the masker sound is emitted only from
the speaker 2A close to that microphone. When the speaker H1B produces a sound and the
microphone 1B is detected, the control unit 72A causes only the audio signal processing unit 802
to output the masker sound so that the masker sound is emitted only from the speaker 2B close to
that microphone. When the speaker H1C produces a sound and the microphone 1C is detected,
the control unit 72A causes only the audio signal processing unit 803 to output the masker sound
so that the masker sound is emitted only from the speaker 2C close to that microphone.
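A minimal sketch of this routing logic, with assumed identifiers and callback-style outputs (the patent only describes the behaviour, not an API):

# Each microphone is paired with the loudspeaker installed near it; the masker
# sound is sent only to the loudspeaker paired with the microphone that detected
# the speech.
MIC_TO_SPEAKER = {"1A": "2A", "1B": "2B", "1C": "2C"}

def route_masker(detected_mic, masker_frame, outputs):
    # `outputs` maps speaker ids to callables that accept an audio frame.
    target = MIC_TO_SPEAKER.get(detected_mic)
    if target is not None:
        outputs[target](masker_frame)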
[0049]
FIG. 9 is a flowchart showing the operation of the audio processing device in the masking system
shown in FIG. 7. The audio processing device 3A stands by until the speaker voice is collected
(s101: No). The method of detecting the collected voice is the same as in the flowchart shown in
FIG. 6. When the speaker's voice is detected (s101: Yes), the audio processing device 3A analyzes
the collected sound signals of the microphones 1A, 1B, and 1C, and identifies the microphone that
collected the speaker's voice (s102).
[0050]
Next, the voice processing device 3A detects a speaker corresponding to the specified
microphone (s103). Then, the voice processing device 3A emits the masker sound only from the
detected speaker (s104).
[0051]
Even with such a configuration and processing, the masker sound is emitted from the immediate
vicinity of the position of the speaker who is producing the sound, and the cocktail party effect
can be appropriately suppressed.
[0052]
Alternatively, a masking system having the following configuration may be used.
FIG. 10 is a diagram showing a configuration of a masking system according to an embodiment
different from the above-described masking systems. FIG. 11 is a block diagram showing the
configuration of the microphone array, the speaker array, and the audio processing device of the
masking system shown in FIG.
[0053]
In the masking system shown in FIG. 11, a table on which the microphones 1A, 1B, 1C, 1D, 1E, 1F
are placed is arranged in the area where the speakers H1A, H1B, H1C are present.
[0054]
The microphones 1A, 1B, 1C and the microphones 1D, 1E, 1F are arranged so that their sound
collecting directions are opposite to each other.
Specifically, in the example of FIG. 11, the microphones 1A, 1B, and 1C pick up sound on the side
where the speakers H1A and H1B are present, and the microphones 1D, 1E, and 1F pick up sound
on the side where the speaker H1C is present.
[0055]
The speakers 2A, 2B, 2C, and 2D are arranged between the area where the speakers H1A, H1B,
and H1C are present and the area where the third party H3 is present; the arrangement interval
and the positional relationship do not have to be constant.
[0056]
The collected sound signals of the microphones 1A, 1B, 1C, 1D, 1E, and 1F are analog-to-digital
converted by the A/D converters 51 to 56 in the same manner as in the above embodiment, and
are input to the sound collection signal processing unit 71B.
The sound collection signal processing unit 71B detects the microphone close to the speaker who
is producing a sound from the volume level of each collected sound signal, and outputs the
detection information to the control unit 72B.
[0057]
Further, the collected sound signals are supplied to the masker sound generation unit 73B, and the
masker sound generation unit 73B generates a masker sound using the collected sound signals as
described in the above embodiment, and outputs it to the audio signal processing units 801 to
804.
[0058]
The control unit 72B stores the positional relationship between the microphones 1A, 1B, 1C, 1D,
1E, 1F and the speakers 2A, 2B, 2C, 2D.
This positional relationship can be obtained by the calibration process described in the above
embodiment.
[0059]
The control unit 72B controls the audio signal processing units 801 to 804 so as to select a
speaker closest to the microphone detected by the sound collection signal processing unit 71B
and emit sound only from the speaker.
[0060]
Even with such a configuration and processing, the third party H3 hears the masker sound from
the direction of the speaker, and the cocktail party effect can be appropriately suppressed.
[0061]
Note that the control unit 72B may determine the sound emission level from each of the speakers
2A, 2B, 2C, and 2D by using the distance between each of the speakers 2A, 2B, 2C, 2D and each of
the microphones 1A, 1B, 1C, 1D, 1E, 1F, and may perform control to adjust the gains of the audio
signal processing units 801 to 804 accordingly.
[0062]
In this case, the sound collection signal processing unit 71B detects the level of the sound
collection signal of each of the microphones 1A, 1B, 1C, 1D, 1E, and 1F, and outputs the level to
the control unit 72B.
[0063]
The control unit 72B measures in advance the distance between each of the microphones 1A, 1B,
1C, 1D, 1E, 1F and each of the speakers 2A, 2B, 2C, 2D.
This can be realized by the above-described calibration process.
[0064]
Next, the control unit 72B calculates, for each combination of a microphone 1A, 1B, 1C, 1D, 1E, 1F
and a speaker 2A, 2B, 2C, 2D, a coefficient consisting of the reciprocal of the distance between
them, and stores it for each microphone-speaker pair.
For example, the combination of the speaker 2A and the microphone 1A is stored as the
coefficient A11, and the combination of the speaker 2D and the microphone 1E is stored as the
coefficient A45.
Thereby, the 5 × 4 coefficient matrix A shown below is set.
The coefficient may instead be calculated from the reciprocal of the square of the distance or the
like; it suffices to set the coefficient value so that it decreases as the distance increases.
[0065]
[0066]
Then, the control unit 72B acquires the sound collection signal levels of the microphones 1A, 1B,
1C, 1D, 1E, and 1F as a sound collection signal level sequence Ss = (Ss1, Ss2, Ss3, Ss4, Ss5)^T.
Here, Ss1 is the sound collection signal level of the microphone 1A, Ss2 that of the microphone
1B, Ss3 that of the microphone 1C, Ss4 that of the microphone 1D, and Ss5 that of the
microphone 1E.
[0067]
The control unit 72B calculates a gain sequence G = (Ga, Gb, Gc, Gd) by multiplying the sound
collection signal level sequence Ss by the coefficient matrix A as in the following equation. Here,
Ga is a gain for the speaker 2A, Gb is a gain for the speaker 2B, Gc is a gain for the speaker 2C,
and Gd is a gain for the speaker 2D.
[0068]
[0069]
By performing such processing, the masker sound emitted from each of the speakers 2A, 2B, 2C,
2D sounds to the third party H3 as if it came from the direction of the speaker position.
This makes it possible to appropriately suppress the cocktail party effect.
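For illustration only, the gain calculation of the preceding paragraphs could be sketched numerically as follows. The positions are invented, and the coefficient matrix is built here as (speakers × microphones) from reciprocal distances, which matches the coefficient naming A11 and A45 above; the machine translation leaves the stated matrix dimensions ambiguous, so treat the orientation as an assumption.

import numpy as np

# Assumed positions (metres) of microphones 1A-1E and speakers 2A-2D.
mic_xy = np.array([[0.0, 0.0], [0.5, 0.0], [1.0, 0.0], [1.5, 0.0], [2.0, 0.0]])
spk_xy = np.array([[0.0, 1.0], [0.7, 1.0], [1.4, 1.0], [2.1, 1.0]])

# A[j, i] = 1 / distance(speaker j, microphone i): smaller for larger distances.
dists = np.linalg.norm(spk_xy[:, None, :] - mic_xy[None, :, :], axis=2)
A = 1.0 / dists

Ss = np.array([0.1, 0.8, 0.3, 0.05, 0.02])   # collected signal levels Ss1..Ss5
G = A @ Ss                                    # gains (Ga, Gb, Gc, Gd) for speakers 2A..2D
G = G / G.max()                               # optional normalisation of the output gains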
[0070]
Note that each of the above-described audio processing devices need not be a device dedicated to
the masking system shown in the present embodiment, and can be realized using the hardware
and software of an information processing device such as a general personal computer.
[0071]
H1: speaker, H2: listener, H3: third party, 1: microphone array, 1A, 1B, 1C, 1D, 1E, 1F:
microphones, 2: speaker array, 2A, 2B, 2C, 2D: speakers, 3, 3A, 3B: audio processing device