DESCRIPTION JP2012093594

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
An audio output device capable of appropriately suppressing the cocktail party effect is provided.
A microphone 1 picks up the voice of a speaker H1 and outputs it to a voice processing device 3.
The voice processing device 3 generates a masker sound for masking the voice of the speaker H1
based on the voice collected by the microphone 1, and outputs the masker sound to the speaker
array 2. At this time, the voice processing device 3 dynamically changes the delay amount of the
audio signal supplied to each speaker of the speaker array 2, thereby dynamically changing the
position of the sound source (virtual sound source position) perceived by the third party H3.
[Selected figure] Figure 4
Voice output device
[0001]
The present invention relates to an audio output device that outputs a masker sound.
[0002]
Conventionally, it has been proposed to attach a speaker to a partition in an office or the like
and to output, as a masker sound, a sound having low relevance to the voice of the person
speaking, thereby making it difficult for a person in another space to hear that person's voice
(see, for example, Patent Document 1).
This makes the speaker's remarks difficult to understand, so the privacy of the speaker can be
maintained.
15-04-2019
[0003]
Japanese Patent Application Publication No. 06-175666
[0004]
However, in the method of Patent Document 1, the output position of the masker sound is fixed.
There is therefore a concern that the listener becomes accustomed to the masker sound and,
through the so-called cocktail party effect, picks out the speaker's voice and understands the
content of the conversation.
[0005]
Accordingly, an object of the present invention is to provide an audio output apparatus that can
appropriately suppress the cocktail party effect.
[0006]
An audio output apparatus according to the present invention includes a masker sound
generation unit that generates a masker sound, a plurality of speakers that output the masker
sound, and a localization control unit that controls the localization position of the masker sound
and supplies the audio signal of the masker sound to the plurality of speakers.
The localization control unit is characterized by dynamically changing the localization position
of the masker sound.
Specifically, the localization control unit randomly changes the localization position within a
predetermined range centered on a predetermined position.
The localization position can be changed by changing the delay amount of the audio signal
supplied to each of the plurality of speakers.
[0007]
Further, the localization control unit can change the localization position such that the
center, that is, the predetermined position, is selected with the highest probability and
positions farther from the center are selected with lower probability. For example, the
localization position is changed dynamically with probabilities that follow a Gaussian
distribution. The closer the localization position is to the actual position of the speaker, the
less the sound source position of the masker sound separates from that of the speaker, and the
stronger the masking effect. However, if a third party always hears the masker sound from the
same direction, the third party becomes accustomed to it and, through the cocktail party effect,
separates out the speaker's voice and understands what is said. Therefore, to suppress the
cocktail party effect while keeping the masking effect high, it is preferable to change the sound
source position dynamically while setting the appearance probability of the localization position
high near the speaker position and lower with increasing distance from it.
[0008]
Although any sound may be used as the masker sound, it is desirable to provide a microphone
for picking up the speaker's voice and to generate the masker sound based on the sound
collected by the microphone. For example, the voice of the speaker is held for a predetermined
time and modified on the time axis or on the frequency axis so that it carries no lexical meaning
(the content of the conversation cannot be understood). Alternatively, general-purpose speech
that carries no lexical meaning, recorded in the voices of multiple people including men and
women, may be output, or the frequency characteristics of this general-purpose speech, such as
its formants, may be approximated to those of the speaker's voice.
[0009]
In this case, it is desirable that the audio output apparatus include an echo canceller that
removes a pseudo echo signal from the sound collected by the microphone and supplies the
result to the masker sound generation unit. The pseudo echo signal is the audio signal of the
masker sound processed to simulate the echo component travelling from the speakers to the
microphone. As a result, the masker sound that is output from the speakers and picked up again
by the microphone can be removed, and the masker sound can be generated based only on the
voice of the speaker.
[0010]
According to the present invention, since the output position of the masker sound dynamically
changes, it is possible to appropriately suppress the cocktail party effect.
[0011]
FIG. 1 is a layout view showing the configuration of a masking system.
FIG. 2 is a block diagram showing the configuration of a microphone, a speaker array, and an
audio processing device.
FIG. 3 is a diagram showing the virtual sound source localization method using a speaker array.
FIG. 4 is a diagram explaining the dynamic change of the virtual sound source position.
FIG. 5 is a flowchart showing the operation of the voice processing device.
FIG. 6 is a block diagram showing the configuration of the voice processing device when an echo
canceller is provided.
[0012]
FIG. 1 is a layout view showing the configuration of a masking system provided with the audio
output device of the present invention. The masking system is installed, for example, at a
service counter in a bank or a dispensing pharmacy, and emits a masker sound toward third
parties so that they cannot understand what the person on the other side of the counter is
saying.
[0013]
In FIG. 1, a speaker H1 and a listener H2 exist across the counter, and a plurality of third parties
H3 exist at positions away from the counter. The speaker H1 is, for example, a pharmacist who
explains a medicine, the listener H2 is a patient who listens to the explanation of the medicine,
and the third party H3 is a patient waiting for their turn.
[0014]
A microphone 1 is installed on the upper surface of the counter. The microphone 1 picks up the
sound around the counter, mainly the voice of the speaker H1. On the side of the counter where
the third parties are present (the lower side in the drawing), a speaker array 2 that outputs
sound toward the third parties is installed. The speaker array 2 is installed under the desk or
the like so that the listener H2 cannot easily hear the sound output from the speaker array.
[0015]
The microphone 1 and the speaker array 2 are connected to the audio processing device 3. The
microphone 1 picks up the voice of the speaker H1 and outputs it to the audio processing device
3. The audio processing device 3 generates a masker sound for masking the voice of the speaker
H1 based on the voice collected by the microphone 1 and outputs the masker sound to the
speaker array 2. At this time, the audio processing device 3 dynamically changes the sound
source position (virtual sound source position) of the masker sound perceived by the third party
H3 by controlling the delay amount of the audio signal supplied to each speaker of the speaker
array 2. As a result, the third party H3 hears the sound source of the masker sound as constantly
moving, and the cocktail party effect caused by the ear becoming accustomed to the sound can
be appropriately suppressed.
[0016]
Hereinafter, specific configurations and operations for realizing the above-described masking
system will be described. FIG. 2 is a block diagram showing the configuration of the microphone
1, the speaker array 2, and the audio processing device 3. The audio processing device 3 includes
an A / D converter 51, a control unit 72, a masker sound generation unit 73, a delay processing
unit 8, and D / A converters 61 to 68. The speaker array 2 includes eight speakers 21 to 28. The
number of speakers in the speaker array is not limited to this example.
[0017]
The A / D converter 51 receives the sound collected by the microphone 1 and converts it into a
digital audio signal. The digital audio signal converted by the A / D converter 51 is input to the
masker sound generation unit 73.
[0018]
The masker sound generation unit 73 generates a masker sound for masking the speaker's voice
based on the voice carried by the input digital audio signal. The masker sound may be any
sound, but it is preferably one that does not cause discomfort to the third parties H3 present at
positions away from the counter. For example, the voice of the speaker H1 is held for a
predetermined time and modified on the time axis or on the frequency axis so that it carries no
lexical meaning (the content of the conversation cannot be understood). Alternatively,
general-purpose speech that carries no lexical meaning, recorded in multiple voices including
men and women, may be stored in a built-in storage unit (not shown) and output, or the
frequency characteristics of this general-purpose speech, such as its formants, may be
approximated to those of the voice of the speaker H1. In addition, background sound such as
air-conditioning noise may be mixed with the masker sound. Hearing such a masker sound
together with the voice of the speaker H1 makes it difficult for the third party H3 to understand
what the speaker H1 is saying. The generated masker sound is output to each of the delays 81 to
88 of the delay processing unit 8.
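As a rough illustration of a modification on the time axis, the following Python sketch reverses the samples inside each short frame of the held voice. The function name scramble_time_axis and the parameter frame_len are hypothetical, and frame-wise reversal is only one assumed way of modifying the signal; the description does not specify the actual method, and whether the result is unintelligible would have to be checked by listening.

```python
def scramble_time_axis(samples, frame_len):
    """Reverse the samples inside each consecutive frame of `frame_len` samples.

    Illustrative assumption only: the source says the held voice is modified
    on the time axis but does not say how.
    """
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    out = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        out.extend(reversed(frame))  # the last frame may be shorter than frame_len
    return out


# Toy input standing in for held voice samples.
scrambled = scramble_time_axis([1, 2, 3, 4, 5, 6, 7], 3)
```

The output has the same length and the same sample values as the input, so the overall level of the held voice is preserved while the order within each frame changes.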
[0019]
The delay 81 to the delay 88 of the delay processing unit 8 are provided corresponding to the
speaker 21 to the speaker 28 of the speaker array 2, respectively, and individually change the
delay amount of the audio signal supplied to each speaker. The control unit 72 controls the delay
amounts of the delay 81 to the delay 88.
[0020]
The control unit 72 can set the virtual sound source at a predetermined position by controlling
the delay amounts of the delay 81 to the delay 88. FIG. 3 is a diagram showing a virtual sound
source localization method using a speaker array.
[0021]
As shown in the figure, the control unit 72 sets a virtual sound source V at a predetermined
position (for example, the position of the speaker H1). Because the distances from the virtual
sound source V to the individual speakers of the speaker array 2 differ, the masker sound is
output from the speakers in sequence, starting with the speaker closest to the virtual sound
source V (speaker 21 in the figure). As a result, the third parties H3 at positions away from the
counter perceive the masker sound as if speakers were located at positions equidistant from
the virtual sound source position serving as the focal point (the speaker positions shown by
dotted lines in the figure) and as if the masker sound were emitted simultaneously from these
virtual speakers. The third party H3 therefore perceives the masker sound as being emitted
from the position of the speaker H1.
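The relationship between the virtual sound source position and the delay amounts can be sketched as follows. This is a minimal Python sketch under stated assumptions: the speed of sound (343 m/s), the 2-D coordinates, the 0.1 m speaker spacing, and the function name speaker_delays are all illustrative and do not come from the description.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed room-temperature value


def speaker_delays(virtual_source, speaker_positions, c=SPEED_OF_SOUND):
    """Return one delay in seconds per speaker for a given virtual source.

    The speaker closest to the virtual source gets zero delay; every other
    speaker is delayed by the extra travel time from the virtual source.
    """
    distances = [math.dist(virtual_source, p) for p in speaker_positions]
    nearest = min(distances)
    return [(d - nearest) / c for d in distances]


# Eight speakers spaced 0.1 m apart along the x axis (hypothetical layout).
speakers = [(0.1 * i, 0.0) for i in range(8)]
# Virtual source 0.5 m behind the first speaker.
delays = speaker_delays((0.0, -0.5), speakers)
```

With the virtual source placed behind the first speaker, the delays grow monotonically along the array, which matches the sequential output order described above.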
[0022]
Here, the control unit 72 dynamically changes the position of the virtual sound source V by
dynamically changing the delay amount of the audio signal of the masker sound supplied to each
speaker. FIG. 4 is a diagram for explaining the dynamic change of the virtual sound source
position. The figure shows an example in which the virtual sound source is moved from the
position V1, set on the right side of the speaker H1 as viewed from the third party H3, to the
position V2 on the left side of the speaker H1.
[0023]
The control unit 72 changes the delay amounts of the delays 81 to 88 at predetermined intervals
(for example, every second). For example, as shown in FIG. 4, when setting the virtual sound
source V1 on the right side of the speaker H1 as viewed from the third party H3, the delay
amount of the audio signal supplied to the speaker 21 on the right side is set small and the
delay amount of the audio signal supplied to the speaker 28 on the left side is set large. When
setting the virtual sound source V2 on the left side, the delay amount of the audio signal
supplied to the speaker 21 is set large and the delay amount of the audio signal supplied to the
speaker 28 is set small. The third party H3 then perceives that the output position of the masker
sound has moved from the position of the virtual sound source V1 to the position of the virtual
sound source V2. For this reason, even if the same masker sound is output, the sound source
position changes, and the combined sound heard together with the voice of the speaker H1 also
changes. This prevents the ears of the third parties H3 present at positions away from the
counter from becoming accustomed to the masker sound, and the cocktail party effect can be
appropriately suppressed.
[0024]
Moreover, in the example of the figure, the control unit 72 sets a movement area Z inside a
circle centered on the center position S (in the example of the figure, the position of the
microphone 1) and randomly changes the position of the virtual sound source within the
movement area Z. Of course, a virtual sound source may be set outside this movement area Z,
but the farther the virtual sound source is from the position of the speaker H1, the more likely
the listener is to perceive the localization position of the masker sound and the speaker H1 as
different positions, and the lower the masking effect becomes. It is therefore desirable to
change the position within a certain range near the speaker H1 in order to suppress the cocktail
party effect.
[0025]
Furthermore, the control unit 72 can change the virtual sound source position dynamically such
that the probability of setting it at the center position S is highest and the probability
decreases with distance from the center position S. For example, the virtual sound source
position is changed dynamically with probabilities that follow a Gaussian distribution. In the
example of FIG. 4, within the movement area Z, darker positions are those where the virtual
sound source appears with higher probability and lighter positions are those where it appears
with lower probability. Because the masking effect is greater the closer the virtual sound source
is to the position of the speaker H1, the appearance probability of the virtual sound source is
set high near the position of the speaker H1 and lower as the position becomes farther away.
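One way to draw positions with this kind of distribution is sketched below in Python. The standard deviation sigma, the rejection of draws that fall outside the circle, and the function name sample_virtual_source are assumptions made for illustration; the description only states that the probability follows a Gaussian distribution within the movement area Z.

```python
import math
import random


def sample_virtual_source(center, radius, sigma, rng=random):
    """Draw a virtual sound source position inside the movement area Z.

    Positions follow a 2-D Gaussian centred on `center`; draws that fall
    outside the circle of the given radius are rejected and redrawn.
    """
    while True:
        x = rng.gauss(center[0], sigma)
        y = rng.gauss(center[1], sigma)
        if math.hypot(x - center[0], y - center[1]) <= radius:
            return (x, y)


# Example: centre S 0.5 m behind the array, radius 1 m, assumed sigma of 0.3 m.
rng = random.Random(0)
positions = [sample_virtual_source((0.0, -0.5), 1.0, 0.3, rng) for _ in range(1000)]
```

Each sampled position can then be converted into per-speaker delay amounts, so positions near the center are used most often while positions near the edge of the area still appear occasionally.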
[0026]
The center position S may be set in advance in consideration of the position of the microphone
and the position of the speaker, or it may be set to an arbitrary position behind the speaker
array (for example, about 0.5 m behind the center of the speaker array). The movement area Z
may be set to an arbitrary value, such as a radius of 1 m, or an operation unit (not shown) may
be provided to receive manual input from the user. The area may also be set automatically
according to the width of the speaker array. For example, a straight line connecting the end
speakers 21 and 28 of the speaker array is taken as the long side of a right triangle or an
equilateral triangle whose vertices are the speaker 21, the speaker 28, and the center position
S. The radius of the circle of the movement area Z is then set to the distance between the
speaker 21 (or the speaker 28) and the center position S.
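The automatic setting from the array width can be worked through with a short Python sketch. It assumes that the triangle is isosceles, so that S lies on the perpendicular bisector of the line between the end speakers, that the right angle (when a right triangle is used) is at S, and that "behind" the array is the negative y direction. These conventions, and the function name area_from_array, are illustrative assumptions rather than details given in the description.

```python
import math


def area_from_array(first, last, shape="right"):
    """Derive the centre S and the radius of area Z from the two end speakers.

    Assumes the array lies on a line of constant y and that S is placed
    behind the array in the negative y direction.
    """
    width = math.dist(first, last)
    mid_x = (first[0] + last[0]) / 2
    if shape == "right":            # right isosceles triangle, array line as hypotenuse
        depth = width / 2
    elif shape == "equilateral":    # all three sides equal to the array width
        depth = width * math.sqrt(3) / 2
    else:
        raise ValueError("shape must be 'right' or 'equilateral'")
    center = (mid_x, first[1] - depth)
    radius = math.dist(first, center)  # distance from an end speaker to S
    return center, radius


center, radius = area_from_array((0.0, 0.0), (0.7, 0.0))
```

Under these assumptions an array of width W gives a radius of W divided by the square root of 2 for the right triangle and a radius of W for the equilateral triangle.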
[0027]
Next, FIG. 5 is a flowchart showing the operation of the voice processing device 3. The voice
processing device 3 starts this operation when it is first activated (when the power is turned
on) and thereafter repeats it at predetermined intervals (for example, every second). First, the
voice processing device 3 stands by until the speaker's voice is picked up (s11). For example,
the speaker's voice is judged to have been picked up when sound at or above a predetermined
level, high enough to be regarded as speech, is collected. While no speaker voice is collected
and no conversation is taking place, the masker sound is unnecessary, so masker sound
generation and localization processing remain on standby. However, this step may be omitted,
and masker sound generation and localization processing may be performed at all times.
[0028]
When the speaker's voice is picked up, the voice processing device 3 causes the masker sound
generation unit 73 to generate a masker sound (s12). It is preferable that the volume of the
masker sound change in accordance with the level of the collected speaker voice. When the
level of the collected speaker voice is low, the voice reaches the third party H3 at a low level
and the content of the conversation is difficult to grasp, so the level of the masker sound can
be lowered. Conversely, when the level of the collected speaker voice is high, the voice reaches
the third party H3 at a high level and the content of the conversation is easy to grasp, so it is
preferable to raise the level of the masker sound. When the virtual sound source position is
changed dynamically, it may be switched instantaneously, or it may be moved so that the third
party H3 perceives the position as changing little by little, which reduces discomfort.
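Steps s11 and s12 can be sketched as a level gate followed by a level-dependent gain. In the following Python sketch the RMS level measure, the threshold of 0.01, the proportional mapping, and the names rms and masker_gain are assumptions chosen for illustration; the description only states that the masker sound is on standby below a predetermined level and that its level rises and falls with the speaker voice.

```python
import math


def rms(frame):
    """Root-mean-square level of one non-empty frame of samples in -1..1."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))


def masker_gain(frame, threshold=0.01, ratio=1.0, max_gain=1.0):
    """Return the masker level to use for this frame of microphone input.

    Below `threshold` the speaker is treated as silent and the masker stays
    on standby (s11); otherwise the masker level follows the voice level (s12).
    """
    level = rms(frame)
    if level < threshold:
        return 0.0
    return min(max_gain, ratio * level)


quiet = masker_gain([0.0] * 100)   # no voice picked up
soft = masker_gain([0.1] * 100)    # low-level voice
loud = masker_gain([0.5] * 100)    # high-level voice
```

A practical system would probably smooth the gain over several frames so that the masker level does not jump from frame to frame; that smoothing is omitted here for brevity.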
[0029]
Then, in the voice processing device 3, the control unit 72 sets the delay amounts so that the
localization position of the masker sound changes randomly (s13). For example, as shown in FIG.
4, the delay amount of the audio signal supplied to each speaker is changed dynamically so
that, within the predetermined range (the movement area Z) around the center position S, the
virtual sound source position appears with higher probability the closer it is to the center (the
closer it is to the speaker H1).
[0030]
As described above, the voice processing device 3 dynamically changes the virtual sound source
position of the masker sound, so the third party H3 hears the masker sound as constantly
moving, and the cocktail party effect can be appropriately suppressed.
[0031]
As shown in FIG. 6, the voice processing device 3 may be provided with an echo canceller.
FIG. 6 is a block diagram showing the configuration of the speech processing device 3 when the
echo canceller is provided. The same reference numerals are given to the same components as
those in FIG.
[0032]
The voice processing device 3 in this example includes an echo canceller 75 that receives the
audio signal output from the A / D converter 51. The echo canceller 75 receives the audio signal
of the masker sound from the masker sound generation unit 73 and filters it with an adaptive
filter that simulates the transfer characteristic of the acoustic transmission path from the
speaker to the microphone, producing a pseudo echo signal. The echo component is reduced by
subtracting this pseudo echo signal from the signal input from the A / D converter 51. The echo
canceller 75 may also be provided in the same number as the speaker units of the speaker
array. Because there is one acoustic transmission path (echo path) from speaker to microphone
for each speaker, it is ideally desirable to provide an adaptive filter that estimates the echo
path for each speaker, to filter the audio signal supplied to each speaker so as to estimate its
echo component, and to subtract that component.
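The echo cancellation described above can be illustrated with a single-channel adaptive filter. The following Python sketch uses the normalized LMS (NLMS) update, which is a swapped-in, commonly used algorithm; the description does not say which adaptive algorithm the echo canceller 75 uses. The function name nlms_echo_cancel, the tap count, the step size mu, and the simulated echo path (one sample of delay, gain 0.6) are all assumptions.

```python
import random


def nlms_echo_cancel(mic, masker, taps=4, mu=0.5, eps=1e-8):
    """Subtract an adaptive estimate of the masker echo from the mic signal.

    mic    : samples picked up by the microphone (voice plus echo)
    masker : masker samples sent to the speaker (the reference signal)
    Returns the echo-reduced signal, one sample per input sample.
    """
    weights = [0.0] * taps
    history = [0.0] * taps            # most recent masker samples, newest first
    out = []
    for m, x in zip(mic, masker):
        history = [x] + history[:-1]
        echo_est = sum(w * h for w, h in zip(weights, history))  # pseudo echo
        err = m - echo_est            # mic signal with the estimated echo removed
        norm = sum(h * h for h in history) + eps
        weights = [w + mu * err * h / norm for w, h in zip(weights, history)]
        out.append(err)
    return out


# Simulated test signal: the microphone hears only a delayed, attenuated masker.
rng = random.Random(1)
masker = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.0] + [0.6 * x for x in masker[:-1]]
residual = nlms_echo_cancel(mic, masker)
```

In this echo-only simulation the residual shrinks toward zero as the filter adapts. With a real speaker voice present the residual would contain that voice, and one such filter per speaker would be needed to follow the per-speaker echo paths mentioned above.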
[0033]
Note that the audio processing device 3 can be realized using hardware and software of an
information processing device such as a general personal computer, even if it is not a device
dedicated to the masking system shown in the present embodiment.
[0034]
In the present embodiment, an example in which one microphone for collecting the voice of the
speaker H1 is provided is shown, but a plurality of microphones may be provided.
A microphone array in which a plurality of microphones are arranged may also be provided. In
this case, the position of the speaker H1 can be detected from the phase differences of the
sound collected by the individual microphones of the microphone array, and the center position
S and the movement area Z can be set at the detected position of the speaker H1 (or at a
position near the speaker H1).
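As a hedged illustration of how a phase (time) difference between two microphones relates to the direction of the speaker H1, the following Python sketch estimates the lag between two signals by cross-correlation and converts it into an arrival angle. The far-field assumption, the speed of sound, the sample rate and microphone spacing in the example, and the function names estimate_lag and arrival_angle are all assumptions; a single microphone pair yields only a direction, so locating a position would require more microphones or additional information.

```python
import math


def estimate_lag(sig_a, sig_b, max_lag):
    """Return the lag in samples by which sig_b trails sig_a (equal lengths)."""
    best_lag, best_score = 0, float("-inf")
    n = len(sig_a)
    for lag in range(-max_lag, max_lag + 1):
        # Correlate sig_a[i] with sig_b[i + lag] over the overlapping samples.
        score = sum(sig_a[i] * sig_b[i + lag]
                    for i in range(max(0, -lag), min(n, n - lag)))
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle(lag, sample_rate, mic_spacing, c=343.0):
    """Convert a lag between two microphones into an arrival angle in radians."""
    ratio = max(-1.0, min(1.0, c * lag / (sample_rate * mic_spacing)))
    return math.asin(ratio)


# Toy signals: the second microphone receives the same pulse two samples later.
mic_a = [0.0, 0.0, 1.0, 0.5, 0.0, 0.0, 0.0, 0.0]
mic_b = [0.0, 0.0, 0.0, 0.0, 1.0, 0.5, 0.0, 0.0]
lag = estimate_lag(mic_a, mic_b, 3)
angle = arrival_angle(lag, 16000, 0.1)
```

An angle of zero corresponds to sound arriving from straight ahead of the microphone pair, and the sign of the angle indicates which microphone the sound reached first.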
[0035]
Further, as a means for specifying the position, a method based on information other than voice,
such as image recognition or a sensor, may be used.
[0036]
H1 ... speaker
H2 ... listener
H3 ... third party
1 ... microphone
2 ... speaker array
3 ... voice processing device