Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2017126888
Abstract: To provide a directivity control system and an audio output control method that suppress deterioration of the privacy protection of a person. A display unit displays an image of an imaging area captured by an imaging unit. A memory stores position information of a target area TA and a privacy area PA designated on the image of the imaging area displayed on the display unit. A voice emphasizing unit uses the position information of the target area and the privacy area to emphasize the voice in a first direction from a sound collecting unit toward the target, and further emphasizes the voice in a second direction toward the privacy area. A speech determination unit determines, based on the voice in the first direction and the voice in the second direction emphasized by the voice emphasizing unit, whether an utterance is present in the target and in the privacy area, respectively. When an utterance occurs at least in the privacy area, an output control unit controls the output, from a voice output unit 37, of the voice in the first direction into which the voice of the privacy area leaks. [Selected figure] Figure 6
Directional control system and voice output control method
[0001]
The present invention relates to a directivity control system and an audio output control method
for controlling the output of collected voice.
[0002]
Conventionally, there is known a directivity control system which forms directivity in a pointing direction from a microphone array device toward the sound collection position of a sound (see, for example, Patent Document 1).
03-05-2019
1
In the directivity control system of Patent Document 1, when the designated sound collection position of the voice is within the range of a predetermined privacy protection area (that is, a predetermined area for protecting a person's privacy), the microphone array device suppresses the output of the voice data of the collected voice. The privacy protection area is hereinafter abbreviated as "privacy area".
[0003]
JP, 2015-029241, A
[0004]
However, in Patent Document 1, when a place designated as a position where directivity is to be formed for emphasis (hereinafter referred to as a "target"), for example in order to monitor a situation, is within a privacy area, the output of the audio is suppressed or the sound collection is paused.
For example, if a point close to the privacy area is designated as the target, the sound in the privacy area may leak into the target sound and be heard, and the contents of the conversation may become known to others; there was thus a problem that the privacy protection of the person was insufficient.
[0005]
In order to solve the above-described conventional problems, an object of the present invention is to provide a directivity control system and an audio output control method that suppress deterioration of the privacy protection of a person, so that even if a voice is emitted by a person in a privacy area, the contents of that voice are not known to others.
[0006]
The present invention comprises: an imaging unit for imaging an imaging area; a sound pickup unit for picking up the sound of the imaging area; a display unit for displaying an image of the imaging area imaged by the imaging unit; an audio output unit for outputting the audio of the imaging area picked up by the sound pickup unit; a memory for storing position information of a target and a privacy area designated on the image of the imaging area displayed on the display unit; a voice emphasis unit that uses the position information of the target and the privacy area to emphasize the sound in a first direction from the sound pickup unit toward the target, and further to emphasize the sound in a second direction from the sound pickup unit toward the privacy area; an utterance determination unit that determines, based on the voice in the first direction and the voice in the second direction emphasized by the voice emphasis unit, whether an utterance is present in the target and in the privacy area, respectively; and an output control unit that, when there is an utterance at least in the privacy area, controls the output from the audio output unit of the voice in the first direction into which the voice of the privacy area leaks.
[0007]
The present invention also provides an audio output control method in a directivity control system having an image pickup unit and a sound pickup unit, wherein: the image pickup unit images an imaging area and the sound pickup unit picks up the sound of the imaging area; position information of a target and a privacy area designated on a display unit, on which the image of the imaging area is displayed, is stored in a memory; using the position information of the target and the privacy area stored in the memory, the voice in a first direction from the sound pickup unit toward the target is emphasized, and the voice in a second direction from the sound pickup unit toward the privacy area is emphasized; based on the emphasized voice in the first direction and the emphasized voice in the second direction, it is determined whether an utterance is present in the target and in the privacy area, respectively; and, if there is an utterance at least in the privacy area, the output of the voice in the first direction, into which the sound of the privacy area leaks, is controlled.
[0008]
According to the present invention, even if a voice is emitted by a person in the privacy area, deterioration of the privacy protection of that person can be suppressed, because the contents of the emitted voice do not become known to others.
[0009]
FIG. 1 is a system configuration diagram showing an example of the internal configuration of the directivity control system of the present embodiment.
FIG. 2 is an explanatory diagram of an example of the principle of forming directivity in a specific direction with respect to voice collected by the microphone array device.
FIG. 3 is a block diagram showing an example of the internal configuration of the camera device.
FIG. 4 is a diagram showing, as a comparative example, an example of the voice processing operation when the distance between the person p2 at the position designated in the privacy area and the target person p1 is long.
FIG. 5 is a diagram showing, as a comparative example, an example of the voice processing operation when the distance between the person p2 at the designated position and the target person p1 is short.
FIG. 6 is a diagram showing an example of the voice processing operation in this embodiment when the distance between the person p2 at the position designated in the privacy area and the target person p1 is short.
FIG. 7 is a flowchart explaining an example of the operation procedure of the speech determination of the directivity control device of this embodiment.
FIG. 8 is a flowchart explaining an example of the operation procedure of the voice output control (for example, mask sound addition) of the directivity control device of the present embodiment.
FIG. 9 is a flowchart explaining an example of the operation procedure of the voice output control (for example, substitution with another sound) of the directivity control device of the present embodiment.
FIG. 10 is a flowchart explaining an example of the operation procedure of the voice output control (for example, mute output) of the directivity control device of the present embodiment.
FIG. 11 is a block diagram showing an example of the internal configuration of the microphone array device in a modification of the present embodiment.
[0010]
Hereinafter, an embodiment (hereinafter referred to as “the embodiment”) which specifically
discloses the directivity control system and the audio output control method according to the
present invention will be described in detail with reference to the drawings as appropriate.
However, more detailed description than necessary may be omitted.
For example, detailed description of already well-known matters and redundant description of
substantially the same configuration may be omitted.
This is to avoid unnecessary redundancy in the following description and to facilitate
understanding by those skilled in the art.
It is to be understood that the attached drawings and the following description are provided to
enable those skilled in the art to fully understand the present disclosure, and they are not
intended to limit the claimed subject matter.
[0011]
The directivity control system of the present embodiment is used, for example, as a surveillance system (either a manned or an unmanned surveillance system) installed in a factory, a company, a public facility (e.g., a library or an event site), or a store (e.g., a retail store or a bank).
However, the installation site is not particularly limited.
Hereinafter, in order to make the description of the present embodiment intelligible, the
directivity control system of the present embodiment will be described as being installed in, for
example, a store.
[0012]
(Definition of Terms) In the present embodiment, the "user" refers to a person who operates the directivity control apparatus 30 and who monitors the situation of the imaging area (for example, a store) or the sound collection area (for example, the customer service situation of a store clerk). The "privacy area" is an area within the imaging area and the sound collection area, and is a predetermined area for protecting the privacy of a person (for example, a customer visiting the store).
[0013]
FIG. 1 is a system configuration diagram showing an example of the internal configuration of the
directivity control system 10 of the present embodiment. The directivity control system 10 is
configured to include a camera device CA, a microphone array device MA, a directivity control
device 30, and a recorder RC. The camera device CA, the microphone array device MA, the
directivity control device 30, and the recorder RC are connected so as to be able to mutually
communicate data via the network NW. The network NW may be a wired network (for example, an intranet or the Internet) or a wireless network (for example, a wireless LAN (Local Area Network)). The recorder RC is not essential; it is provided in the directivity control system 10 as necessary, and is required when video captured in the past and the corresponding collected sound are to be used in the directivity control device 30.
[0014]
The camera device CA as an example of the imaging unit is, for example, an omnidirectional
camera installed on a ceiling in a room, and functions as a monitoring camera capable of imaging
a space in which the own device is installed (that is, an imaging area). The camera apparatus CA
is not limited to the omnidirectional camera, and may be, for example, a fixed camera having a
fixed angle of view, or a PTZ (Pan Tilt Zoom) camera capable of pan rotation / tilt rotation / zoom
processing. The camera device CA stores the video data of the imaging area obtained by imaging in association with the imaging time, and periodically transmits the video data, including the imaging time, to the directivity control device 30 and the recorder RC via the network NW. Note that, besides the periodic transmission, the camera device CA may also transmit video data including an imaging time when there is a request from the directivity control device 30 or the recorder RC.
[0015]
The microphone array device MA as an example of the sound collection unit is installed, for
example, on a ceiling in a room, and collects audio in all directions in the space where the own
device is installed (that is, the sound collection area). Here, the imaging area and the sound
collecting area are substantially the same. The microphone array device MA has, for example, a
housing in which an opening is formed at the center, and further includes a plurality of
microphone units concentrically arranged around the opening in the circumferential direction.
For example, a high-quality small electret condenser microphone (ECM: Electret Condenser
Microphone) is used for the microphone unit (hereinafter referred to as "microphone").
The microphone array device MA stores the voice data obtained by sound collection in association with the sound collection time, and periodically transmits the voice data, including the sound collection time, to the directivity control device 30 and the recorder RC via the network NW. Besides the periodic transmission, the microphone array device MA may also transmit voice data including a sound collection time when there is a request from the directivity control device 30 or the recorder RC.
[0016]
The directivity control device 30 is, for example, a stationary PC (Personal Computer) installed
outside the room in which the microphone array device MA and the camera device CA are
installed. The directivity control device 30 uses the voice data transmitted from the microphone array device MA or the recorder RC to form a main beam in a specified direction with respect to the omnidirectional (in other words, non-directional) voice collected by the microphone array device MA (that is, it forms directivity), thereby emphasizing the speech in that specific direction. In the present embodiment, the speech enhancement processing is described as being performed in the directivity control device 30, but it may instead be performed in the microphone array device MA.
[0017]
Further, the directivity control device 30 detects and estimates the position of a sound source in the imaging area (hereinafter referred to as the "voice position"), and performs mask processing when the estimated sound source position is within the range of the privacy area. Details of the mask processing will be described later. The directivity control device 30 may be a portable communication terminal such as a mobile phone, a tablet terminal, or a smartphone instead of a PC.
[0018]
The recorder RC as an example of the recording unit is a storage device having, for example, a large storage capacity, and records the video data with imaging time transmitted from the camera device CA in association with the audio data with sound collection time transmitted from the microphone array device MA. When video data and audio data recorded in the past (for example, captured and picked up one day earlier) are to be reproduced by the directivity control device 30, the recorder RC transmits the video data with imaging time and the audio data with sound collection time to the directivity control device 30 in response to a request sent from the directivity control device 30 based on the user's operation.
[0019]
(Details of Configuration of Directivity Control Device) The directivity control device 30 includes at least a communication unit 31, an operation unit 32, a signal processing unit 33, a display device 36, a speaker device 37, a memory 38, and a setting management unit 39. The signal processing unit 33 includes an utterance determination unit 34 and an output control unit 35.
[0020]
The setting management unit 39 as an example of the position setting unit holds, as an initial setting, the coordinates indicating the positions of the target and the privacy area designated by the user on the display device 36, on which the image of the imaging area captured by the camera device CA is displayed. The coordinates of the target and the privacy area may be changed as appropriate by the user's operation of the operation unit 32; in this case, the setting management unit 39 holds the coordinates indicating the changed positions. In the following description, the target is mainly assumed to be a person, but it is not limited to a person and may be, for example, an electronic device, a speaker, a vehicle, or a robot.
[0021]
When the target in the image displayed on the display device 36 is designated by the user's finger or a stylus pen, the setting management unit 39 calculates and acquires the coordinates indicating the direction (first direction) from the microphone array device MA toward the target corresponding to the designated position on the display device 36. Similarly, when the privacy area in the image displayed on the display device 36 is designated by the user, the setting management unit 39 calculates and acquires the coordinates indicating the direction (second direction) from the microphone array device MA toward the position of the privacy area corresponding to the designated position on the display device 36 (for example, the center position of the privacy area).
[0022]
In this calculation processing, the setting management unit 39 calculates the coordinates
indicating the first direction and the coordinates indicating the second direction as (θMAh1,
θMAv1) and (θMAh2, θMAv2), respectively. The details of the coordinate calculation process
are specifically described in, for example, Patent Document 1, and thus the description thereof is
omitted. θMAh1 (θMAh2) indicates the horizontal angle in the first direction (second direction)
from the microphone array device MA toward the position of the target (privacy area) in the
imaging area. θMAv1 (θMAv2) indicates the vertical angle in the first direction (second
direction) from the microphone array device MA toward the position of the target (privacy area)
in the imaging area. The calculation process may be performed by the signal processing unit 33.
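As an aid to understanding, the mapping from a designated screen position to such an angle pair can be sketched as follows for an omnidirectional (fisheye) ceiling camera. This sketch assumes an equidistant projection model; the actual calculation is the one deferred to Patent Document 1, and the function name and image geometry here are hypothetical.

```python
import math

def screen_to_angles(x: float, y: float, cx: float, cy: float,
                     r_max: float) -> tuple:
    """Convert a designated pixel (x, y) on a fisheye image into a
    (horizontal, vertical) pointing-angle pair in degrees.

    (cx, cy): image center, directly below the ceiling-mounted device.
    r_max: image radius corresponding to the horizon (assumed model).
    """
    dx, dy = x - cx, y - cy
    theta_h = math.degrees(math.atan2(dy, dx))  # horizontal angle thetaMAh
    r = math.hypot(dx, dy)
    # Equidistant model: radial distance maps linearly to vertical angle
    theta_v = 90.0 * min(r / r_max, 1.0)        # vertical angle thetaMAv
    return theta_h, theta_v

# A point at the image center is straight down (vertical angle 0)
print(screen_to_angles(320, 240, 320, 240, 240))  # (0.0, 0.0)
```

A point on the image rim (r = r_max) maps to a vertical angle of 90 degrees, i.e., a direction parallel to the ceiling.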
[0023]
The setting management unit 39 has a memory 39z, and stores in the memory 39z the coordinates indicating the pointing directions from the microphone array device MA toward the target and the privacy area, in correspondence with the position coordinates of the target and the privacy area designated by the user on the image displayed on the display device 36.
[0024]
The setting management unit 39 sets, in the memory 39z, a predetermined sound pressure
threshold sh to be compared with the sound pressure p of the sound collected by the microphone
array device MA.
Here, the sound pressure p indicates the magnitude of the sound collected by the microphone array device MA, and is distinguished from the volume, which represents the loudness of the sound output from the speaker device 37. The sound pressure threshold sh is set, for example, to a value such that the user cannot hear the sound that is collected by the microphone array device MA and output from the speaker device 37, or cannot understand the content of the sound even if it is audible.
[0025]
The communication unit 31 receives the video data with imaging time transmitted from the camera device CA or the recorder RC and the audio data with sound collection time transmitted from the microphone array device MA or the recorder RC, and outputs them to the signal processing unit 33.
[0026]
The operation unit 32 is a user interface (UI: User Interface) for notifying the signal processing unit 33 of the content of the user's input operation, and is configured using, for example, a pointing device such as a mouse and a keyboard.
The operation unit 32 may also be disposed in correspondence with the screen of the display device 36 and configured using a touch panel or a touch pad that allows an input operation with the user's finger or a stylus pen.
[0027]
Further, when the user designates, by operating the operation unit 32, an area TA of a target that the user wants to listen to actively in the image of the imaging area displayed on the display device 36 (see FIGS. 4 to 6), the operation unit 32 acquires coordinates indicating the designated position and outputs them to the signal processing unit 33. Similarly, when the user designates, for privacy protection, a privacy area PA whose sound should not be heard in the image of the imaging area displayed on the display device 36 (see FIGS. 4 to 6), the operation unit 32 acquires coordinate data representing the designated position and outputs it to the signal processing unit 33.
[0028]
The memory 38 is configured using, for example, a RAM (Random Access Memory), and functions as a program memory, a data memory, and a work memory when the directivity control device 30 operates. In addition, the memory 38 stores the voice data of the sound collected by the microphone array device MA in association with the sound collection time, and stores the video data of the imaging area captured by the camera device CA in association with the imaging time. Although the details will be described later, the signal processing unit 33 uses the voice data stored in the memory 38 to determine whether a voice is detected in the target area TA or the privacy area PA designated by the user. Therefore, the sound is reproduced slightly later than the actual collection time of the sound picked up by the microphone array device MA. This delay time is the time required for the processing that determines, after the microphone array device MA picks up the voice, whether the voice has been detected in the target area TA or the privacy area PA. In addition, by storing the voice data in the memory 38 for a certain period, the signal processing unit 33 can also control the output of the voice collected from a predetermined time before the time at which the voice of the target area TA or the privacy area PA is detected. Thus, the memory 38 also functions as a buffer memory that temporarily stores audio data for a fixed period.
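The buffering behavior described above amounts to a fixed-length delay line: each newly collected frame is stored, and the frame released for output lags behind by the number of buffered frames. A minimal sketch (the class and the frame representation are illustrative, not taken from the disclosure):

```python
from collections import deque

class DelayBuffer:
    """Hold the latest delay_frames audio frames so that output lags
    collection by the time needed for the utterance determination."""

    def __init__(self, delay_frames: int):
        self.buf = deque(maxlen=delay_frames)

    def push(self, frame):
        """Store a new frame; return the frame now due for output,
        or None while the buffer is still filling up."""
        out = self.buf[0] if len(self.buf) == self.buf.maxlen else None
        self.buf.append(frame)
        return out
```

With a buffer of two frames, the first two pushes return None and the third push releases the first frame, giving the output a constant two-frame delay during which the determination result can be applied.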
[0029]
In addition, the memory 38 may store a mask sound (see later) prepared in advance.
[0030]
The signal processing unit 33 is configured using, for example, a central processing unit (CPU), a micro processing unit (MPU), or a digital signal processor (DSP), and generally controls the operation of each unit of the directivity control device 30; it performs control processing, data input/output processing with the other units, data calculation processing, and data storage processing.
[0031]
The signal processing unit 33 as an example of the voice emphasizing unit uses the voice data stored in the memory 38 (in other words, the voice data for a fixed period collected by the microphone array device MA) to form the main beam of directivity in the pointing direction from the microphone array device MA toward the sound source position corresponding to the position designated in the image displayed on the display device 36.
More specifically, the signal processing unit 33 forms directivity from the microphone array device MA toward each of the actual target and the privacy area (for example, its center position) corresponding to the area TA and the privacy area PA in the video displayed on the display device 36, thereby emphasizing each of the voices in the target area and the privacy area.
As a result, since the target sound and the sound in the privacy area are each enhanced, the speaker device 37 outputs the sound more clearly.
[0032]
Hereinafter, the target voice after emphasis processing by the signal processing unit 33 is
referred to as "target emphasis voice", and the voice of the privacy area after emphasis
processing by the signal processing unit 33 is referred to as "privacy area emphasis voice".
[0033]
The speech determination unit 34 determines whether or not there is an utterance of the target, based on the emphasized voice of the target.
Specifically, the speech determination unit 34 calculates the sound pressure p of the emphasized voice of the target; it determines that an utterance of the target is present when the sound pressure p exceeds the sound pressure threshold sh stored in the memory 39z, and determines that there is no utterance of the target when the sound pressure p is equal to or lower than the sound pressure threshold sh.
[0034]
Further, the speech determination unit 34 determines whether there is an utterance in the privacy area, based on the emphasized voice of the privacy area. Specifically, the speech determination unit 34 calculates the sound pressure p of the emphasized voice of the privacy area; it determines that an utterance is present in the privacy area when the sound pressure p exceeds the sound pressure threshold sh stored in the memory 39z, and determines that there is no utterance in the privacy area when the sound pressure p is equal to or lower than the sound pressure threshold sh. The speech determination unit 34 holds the determination results of the presence or absence of the target's utterance and of an utterance in the privacy area as a speech determination result (described later). Details of the operation of the speech determination unit 34 will be described later.
[0035]
Alternatively, the speech determination unit 34 may, for example, divide the imaging area into a plurality of blocks, form directivity of sound toward each block, and determine whether there is an utterance of the target or in the privacy area depending on whether a sound whose sound pressure p exceeds the predetermined sound pressure threshold sh exists in each directivity direction. For a method of estimating the sound source by the signal processing unit 33, see, for example, Takanobu Nishiura et al., "Localization of multiple sound sources based on the CSP method using a microphone array," IEICE Transactions, Vol. J83-D-II, No. 8, pp. 1713-1721, August 2000.
[0036]
The output control unit 35 controls the operations of the camera device CA, the microphone array device MA, the display device 36, and the speaker device 37; it outputs the video data transmitted from the camera device CA to the display device 36, and outputs the audio data transmitted from the microphone array device MA from the speaker device 37 as audio.
[0037]
In addition, the output control unit 35 determines, according to the speech determination result of the speech determination unit 34, whether to add a mask sound to the emphasized voice of the target.
The mask sound to be used may be generated, for example, using the emphasized voice of the privacy area PA, or may be a prepared beep sound, noise sound, melody sound, or a combination thereof. Instead of adding the mask sound to the emphasized voice of the target, the output control unit 35 may replace the emphasized voice of the target with a predetermined alternative sound (for example, a beep sound, a noise sound, or a melody sound), or may mute the output (that is, output silence). Details of the operation of the output control unit 35 will be described later.
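The three options above (mask sound addition, substitution with another sound, and mute output) can be sketched as simple operations on the emphasized voice of the target. The mask and substitute signals below are illustrative stand-ins; the disclosure leaves their exact generation open:

```python
import numpy as np

def control_output(target_emph: np.ndarray, mode: str) -> np.ndarray:
    """Apply one of the privacy-protecting output controls.

    mode: "mask" adds a mask sound, "substitute" replaces the voice
    with a beep-like tone, "mute" outputs silence; anything else
    passes the emphasized voice through unchanged.
    """
    n = len(target_emph)
    if mode == "mask":
        rng = np.random.default_rng(0)
        mask = 0.3 * rng.standard_normal(n)        # illustrative noise mask
        return target_emph + mask
    if mode == "substitute":
        t = np.arange(n) / 16000.0
        return 0.3 * np.sin(2 * np.pi * 1000 * t)  # beep replaces the speech
    if mode == "mute":
        return np.zeros(n)
    return target_emph  # no utterance in the privacy area: pass through
```

In the system, the mode would be chosen per frame from the speech determination result: pass-through while the privacy area is silent, and one of the three protective modes while an utterance is detected there.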
[0038]
A display device 36 as an example of a display unit displays an image of an imaging area
captured by the camera device CA.
[0039]
The speaker device 37 as an example of the audio output unit outputs the audio of the voice data collected by the microphone array device MA, or of the voice data after the enhancement processing by the signal processing unit 33.
The display device 36 and the speaker device 37 may be configured as devices separate from the directivity control device 30.
[0040]
FIG. 2 is an explanatory diagram of an example of the principle of forming directivity in a specific
direction with respect to the sound collected by the microphone array device MA. Using the voice data transmitted from the microphone array device MA, the directivity control device 30 adds the voice data collected by the microphones MA1 to MAn through directivity control processing, and generates voice data in which the voice (volume level) in a specific direction from the position of each of the microphones MA1 to MAn of the microphone array device MA is emphasized, that is, voice data having directivity in that specific direction. The specific direction is the direction from the microphone array device MA toward the actual sound source corresponding to the position designated via the operation unit 32. The technique of the directivity control processing of voice data for forming the directivity of the voice collected by the microphone array device MA is a known technique, as disclosed in, for example, JP 2014-143678 A and JP 2015-029241 A (Patent Document 1).
[0041]
In FIG. 2, the microphones are arranged in a linear array on a straight line in order to make the explanation easy to understand. In this case, directivity is formed in a two-dimensional space within a plane; to form directivity in a three-dimensional space, the microphones may be arranged in a two-dimensional array and the same processing method applied.
[0042]
A sound wave emitted from the sound source 80 is incident on each of the microphones MA1, MA2, MA3, ..., MA(n-1), MAn contained in the microphone array device MA at a certain angle (incident angle = (90 - θ) [degrees]). The angle θ may be the horizontal angle θMAh or the vertical angle θMAv of the pointing direction from the microphone array device MA toward the voice position.
[0043]
The sound source 80 is, for example, the conversation of a person who is a subject of the camera device CA (for example, a person in the target area TA or a person in the privacy area PA) present in the sound collection direction of the microphone array device MA, and it exists in the direction of the predetermined angle θ with respect to the surface of the housing 21 of the microphone array device MA. The distance d between the microphones MA1, MA2, MA3, ..., MA(n-1), MAn is constant.
[0044]
The sound wave emitted from the sound source 80 first reaches the microphone MA1 and is picked up, then reaches the microphone MA2 and is picked up, and is similarly picked up one after another until it finally reaches the microphone MAn and is picked up.
[0045]
The microphone array device MA performs AD conversion of the analog voice data collected by the microphones MA1, MA2, MA3, ..., MA(n-1), MAn into digital voice data in the A/D converters 241, 242, 243, ..., 24(n-1), 24n.
[0046]
Furthermore, in the delay units 251, 252, 253, ..., 25(n-1), 25n, the microphone array device MA gives delay times corresponding to the differences in arrival time at the microphones MA1, MA2, MA3, ..., MA(n-1), MAn so as to align the phases of all the sound waves, and the adder 26 then adds the voice data after the delay processing.
As a result, the microphone array device MA can form the directivity of the voice data in the direction of the predetermined angle θ for the microphones MA1, MA2, MA3, ..., MA(n-1), MAn, and directivity-formed audio data 27 can be obtained.
[0047]
Thus, the microphone array device MA can easily form the directivity of the collected voice data by changing the delay times D1, D2, D3, ..., Dn-1, Dn set in the delay units 251, 252, 253, ..., 25(n-1), 25n.
The formation of the directivity of voice data (that is, the enhancement processing of voice data in a specific direction) can also be realized by the directivity control device 30; in this case, the directivity control device 30 may be provided with at least the delay units 251, 252, ..., 25(n-1), 25n and the adder 26 shown in FIG. 2. That is, the directivity control device 30 may give the delay times corresponding to the arrival-time differences at the microphones MA1, MA2, MA3, ..., MA(n-1), MAn to align the phases of all the sound waves, and then add the voice data after the delay processing in the adder 26.
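The delay-and-sum operation performed by the delay units 251 to 25n and the adder 26 can be sketched as follows for a linear array. The sampling rate, the element spacing, and the use of integer-sample delays are illustrative assumptions (a real implementation would use fractional delays):

```python
import numpy as np

C = 343.0   # speed of sound [m/s] (assumed)
FS = 16000  # sampling rate [Hz] (assumed)

def delay_and_sum(channels: np.ndarray, d: float, theta_deg: float) -> np.ndarray:
    """Steer a linear microphone array toward angle theta_deg by
    delaying each channel and adding (cf. delay units 25x and adder 26).

    channels: array of shape (n_mics, n_samples), one row per microphone.
    d: spacing between adjacent microphones [m].
    theta_deg: steering angle from broadside [deg], assumed in [0, 90].
    """
    n_mics, n_samples = channels.shape
    out = np.zeros(n_samples)
    for i in range(n_mics):
        # Delay Di compensates the extra path i*d*sin(theta) to microphone i
        delay = int(round(i * d * np.sin(np.radians(theta_deg)) / C * FS))
        out[delay:] += channels[i, :n_samples - delay]
    return out / n_mics
```

Changing theta_deg re-forms the directivity, which corresponds to changing the delay times D1, ..., Dn set in the delay units of the device.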
[0048]
FIG. 3 is a block diagram showing an example of the internal configuration of the camera
apparatus CA. The camera device CA is configured to include at least a CPU 41, a communication unit 42, a power supply unit 44, an image sensor 45, a memory 46, and a network connector 47. In FIG. 3, a lens for forming an image of the incident light on the image sensor 45 is omitted.
[0049]
The CPU 41 controls each part of the camera device CA in an integrated manner. The CPU 41 may have a motion detection unit 41z that detects the motion of a person in the imaging area SA based on the individual images constituting the video captured by the image sensor 45. Various known techniques exist for detecting the movement of a person. For example, the motion detection unit 41z calculates the difference between captured image frames, and if a motion region obtained from the frame difference has a ratio of vertical length to horizontal length within the range expected of a person, that motion region is detected as the movement of a person.
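The frame-difference detection just described can be sketched as follows. This is a simplified example, not the camera's actual algorithm: the function name, thresholds, and aspect-ratio range are assumptions chosen for illustration, and frames are plain 2D lists of grayscale values.

```python
def detect_person_motion(prev_frame, curr_frame, diff_threshold=30,
                         min_aspect=1.2, max_aspect=4.0):
    """Return the bounding box (top, left, bottom, right) of the motion
    region if its height/width ratio falls within the range expected of
    a person, else None.

    Frames are 2D lists of grayscale pixel values (0-255).
    """
    rows = len(curr_frame)
    cols = len(curr_frame[0])
    # Pixels whose value changed more than the threshold between frames.
    changed = [(r, c) for r in range(rows) for c in range(cols)
               if abs(curr_frame[r][c] - prev_frame[r][c]) > diff_threshold]
    if not changed:
        return None
    top = min(r for r, _ in changed)
    bottom = max(r for r, _ in changed)
    left = min(c for _, c in changed)
    right = max(c for _, c in changed)
    height = bottom - top + 1
    width = right - left + 1
    # A standing person is taller than wide; reject square/flat regions.
    aspect = height / width
    if min_aspect <= aspect <= max_aspect:
        return (top, left, bottom, right)
    return None
```

A tall, narrow motion region (e.g. 2:1) is reported as a person, while a square region such as a swinging door is rejected.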
[0050]
The image sensor 45 captures an image of the imaging area SA to obtain image data; for example, a complementary metal oxide semiconductor (CMOS) or charge coupled device (CCD) sensor is used.
[0051]
The memory 46 is configured using a ROM (Read Only Memory) in which data of operation
programs and setting values in the camera device CA are stored, and a RAM that stores image
data and work data.
Further, the memory 46 may be configured to further include a memory card which is
detachably connected to the camera device CA and in which various data are stored.
[0052]
The communication unit 42 is a network interface that controls data communication with the
network NW connected via the network connector 47.
[0053]
The power supply unit 44 supplies DC power to each part of the camera apparatus CA, and
supplies DC power to devices connected to the network NW through the network connector 47.
[0054]
The network connector 47 is a connector that transmits communication data such as image data and that can be supplied with power via a network cable.
[0055]
Next, an outline of the voice processing operation of the present embodiment will be described
with reference to FIGS.
FIG. 4 is a diagram showing an example of the voice processing operation when the distance
between the person p2 at the position designated in the privacy area and the target person p1 is
long, as a comparative example.
FIG. 5 is a diagram showing an example of the voice processing operation when the distance
between the person p2 at the position designated in the privacy area and the target person p1 is
short, as a comparative example.
FIG. 6 is a diagram showing an example of the voice processing operation when the distance
between the person p2 at the position designated in the privacy area and the target person p1 is
short in this embodiment. FIGS. 4 to 6 show the difference in the operation of the voice output
processing according to the presence or absence of designation of the privacy area when there
are two speakers in a certain space. For example, a person p1 is a store clerk in a store, and a
person p2 is a customer in the store.
[0056]
In FIGS. 4 to 6, the imaging area SA imaged by the camera device CA is the inside of a reception space. In the reception space, two persons p1 and p2 sit facing each other on chairs 73 and 74, respectively, and talk. The person p1 sitting on the chair 73 is designated as the target, and an area including the person p2 sitting on the chair 74 is designated as the privacy area. In other words, the voice of the person p1 can be regarded as the target voice, and the voice of the person p2 can be regarded as the voice of a person whose privacy is to be protected.
[0057]
In FIGS. 4 to 6, since the target area TA is commonly designated for the person p1 in the image displayed on the display device 36 by an operation of the user's finger FG, the emphasized voice of the target (in other words, the emphasized voice of "Hello" uttered by the person p1) is output from the speaker device 37. Here, the target area TA is designated so as to surround the person p1. The designation of the target area TA is not limited to designation by the user's finger FG; position coordinates representing the range may be directly input from a keyboard (not shown), or the signal processing unit 33 may perform image processing such as recognizing a specific face image appearing in the video and setting the target area as a range surrounding that face image.
[0058]
Further, in FIG. 4, since the person p1 as the target and the person p2 as the target of privacy protection are far apart, the uttered voice of the person p2 does not leak into the emphasized voice of the person p1 designated as the target. The sound output from the speaker device 37 is only the emphasized voice of the person p1.
[0059]
Next, in FIG. 5, compared with FIG. 4, the distance between the person p1 as the target and the person p2 as the target of privacy protection is shorter.
In this case, there is a high possibility that the uttered voice of the person p2 (specifically, the voice of "Thanks") leaks into the emphasized voice of the person p1 designated as the target. As a result, the content of the uttered voice of the person p2, whose privacy should originally be protected, is output from the speaker device 37, and the privacy of the person p2 cannot be protected properly.
[0060]
Therefore, in the present embodiment, as shown in FIG. 6, the privacy area PA is designated by an operation of the user's finger FG together with the target area TA. Although details will be described later, when it is determined that there is an utterance in the privacy area PA, a mask sound is added to the emphasized voice of the target (the voice of "Hello" in FIG. 6) and output from the speaker device 37. Thereby, when the persons p1 and p2 are close together and talking, the utterance content of the person p2 could leak into the emphasized voice of the target (that is, the voice in which the utterance content of the person p1 is emphasized), but the addition of the mask sound suppresses the output of the voice of the person p2 from the speaker device 37. In other words, only the voice of the person p1 designated as the target is emphasized and output from the speaker device 37, and the voice of the person p2 is perceived by the user only as an indistinct sound, so that the privacy of the person p2 can be properly protected.
[0061]
(Details of Voice Processing in Directivity Control Device) FIG. 7 is a flowchart for explaining an example of the operation procedure of the speech determination of the directivity control device 30 of the present embodiment. As a premise of the description of FIG. 7, the signal processing unit 33 has already emphasized, using the voice data transmitted from the microphone array device MA or the recorder RC, the voice in the direction (first direction) from the microphone array device MA toward the person p1 corresponding to the target area TA. Similarly, the signal processing unit 33 has already emphasized, using the voice data transmitted from the microphone array device MA or the recorder RC, the voice in the direction (second direction) from the microphone array device MA toward the person p2 corresponding to the privacy area PA.
[0062]
In FIG. 7, the speech determination unit 34 reads the initial setting value held in the setting management unit 39 (S1). Specifically, the speech determination unit 34 reads out, as the initial setting value, the sound pressure threshold sh for determining the presence or absence of a person's speech in the target area TA and the privacy area PA from the memory 39z of the setting management unit 39.
[0063]
The speech determination unit 34 inputs the emphasized voice of the target and the emphasized voice of the privacy area, both based on the voice data transmitted from the microphone array device MA (S2). The speech determination unit 34 calculates the sound pressure of the emphasized voice of the target input in step S2, and further calculates the sound pressure of the emphasized voice of the privacy area input in step S2 (S3).
[0064]
The speech determination unit 34 compares the sound pressure p of the emphasized voice of the privacy area calculated in step S3 with the sound pressure threshold sh acquired in step S1, and determines whether or not there is an utterance of a person (specifically, the person p2 in FIG. 6) in the privacy area PA (S4). When the speech determination unit 34 determines that there is no utterance of a person (specifically, the person p2 in FIG. 6) in the privacy area PA (in other words, the sound pressure p of the emphasized voice of the privacy area is less than the sound pressure threshold sh) (S4, NO), the utterance determination result = 3 is stored in the memory 38 (S5). Thus, the process of the speech determination unit 34 illustrated in FIG. 7 ends.
[0065]
On the other hand, when the speech determination unit 34 determines that there is an utterance of a person (specifically, the person p2 in FIG. 6) in the privacy area PA (in other words, the sound pressure p of the emphasized voice of the privacy area exceeds the sound pressure threshold sh) (S4, YES), a mask sound is generated using the emphasized voice of the privacy area (S6). In the present embodiment, the mask sound is a sound to be added to the emphasized voice of the target; it is a mixed sound for making it difficult to understand the utterance content of the person p2 even when that content is output from the speaker device 37, in order to protect the privacy of the person p2 in the privacy area PA. As a method of generating the mask sound, for example, any of the following known techniques may be used: a method of dividing the emphasized voice of the privacy area into fine time segments (for example, 500 ms) and reassembling them separately; a method of dividing the emphasized voice of the privacy area not into time segments but into phonemes and reassembling the speech phoneme by phoneme; a method of analyzing the frequency characteristics of the emphasized voice of the privacy area and raising or lowering the sound pressure in specific bands; and a method of superimposing a plurality of uttered voices collected in the past within the same privacy area.
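The first of these generation methods (cutting the privacy-area voice into fine time segments and reassembling them) can be sketched as follows. This is an illustrative example only; the function name, the fixed random seed, and the segment handling are assumptions, and a real implementation would also cross-fade segment boundaries to avoid clicks.

```python
import random

def generate_mask_sound(privacy_voice, sample_rate, segment_ms=500, seed=0):
    """Cut the emphasized privacy-area voice into short segments
    (e.g. 500 ms) and reassemble them in random order, so the overall
    spectral character is kept but the utterance content is scrambled.
    """
    seg_len = max(1, int(sample_rate * segment_ms / 1000))
    segments = [privacy_voice[i:i + seg_len]
                for i in range(0, len(privacy_voice), seg_len)]
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    rng.shuffle(segments)
    # Concatenate the shuffled segments back into one sample stream.
    return [s for seg in segments for s in seg]
```

Because the mask is built from the leaked voice itself, it matches that voice in loudness and timbre, which is what makes it effective at covering the leak.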
[0066]
After step S6, the speech determination unit 34 compares the sound pressure p of the emphasized voice of the target calculated in step S3 with the sound pressure threshold sh acquired in step S1, and determines whether or not there is an utterance of a person (specifically, the person p1 in FIG. 6) in the target area TA (S7). When the speech determination unit 34 determines that there is no utterance of a person (specifically, the person p1 in FIG. 6) in the target area TA (in other words, the sound pressure p of the emphasized voice of the target is less than the sound pressure threshold sh) (S7, NO), the utterance determination result = 2 is stored in the memory 38 (S8). Thus, the process of the speech determination unit 34 illustrated in FIG. 7 ends.
[0067]
On the other hand, when the speech determination unit 34 determines that there is an utterance of a person (specifically, the person p1 in FIG. 6) in the target area TA (in other words, the sound pressure p of the emphasized voice of the target exceeds the sound pressure threshold sh) (S7, YES), the utterance determination result = 1 is stored in the memory 38 (S9). Thus, the process of the speech determination unit 34 illustrated in FIG. 7 ends.
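The determination logic of steps S4 to S9 can be summarized in a short sketch. The function names and the use of RMS level as the "sound pressure" measure are assumptions for illustration; only the threshold comparisons and the result codes 1/2/3 come from the flowchart.

```python
def rms_sound_pressure(samples):
    """Root-mean-square level used as a simple sound-pressure measure."""
    return (sum(s * s for s in samples) / len(samples)) ** 0.5

def determine_utterance(target_voice, privacy_voice, threshold_sh):
    """Mirror of the S4/S7 decisions in FIG. 7:
    returns 3 when there is no utterance in the privacy area,
            2 when the privacy area speaks but the target does not,
            1 when both the privacy area and the target speak.
    """
    if rms_sound_pressure(privacy_voice) < threshold_sh:   # S4 NO
        return 3                                           # S5
    if rms_sound_pressure(target_voice) < threshold_sh:    # S7 NO
        return 2                                           # S8
    return 1                                               # S9
```

The result code is what the output control unit later consults to decide whether masking is needed at all.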
[0068]
FIG. 8 is a flowchart for explaining an example of the operation procedure of the voice output control (for example, mask sound addition) of the directivity control device 30 of the present embodiment. The output control unit 35 uses the utterance determination result determined by the speech determination unit 34 to determine whether it is necessary to add a mask sound to the emphasized voice of the target. As a premise of the description of FIG. 8, it is assumed that one of the utterance determination results has been stored in the memory 38 by the speech determination unit 34 shown in FIG. 7.
[0069]
In FIG. 8, the output control unit 35 reads the emphasized voice of the target from the memory 38 and inputs it (S11). The output control unit 35 reads out the utterance determination result from the memory 38 and inputs it (S12). The output control unit 35 reads out and inputs the mask sound generated in step S6 (S13). Note that, in step S13, the output control unit 35 may read out and input a mask sound prepared in advance in the memory 38 instead of the mask sound generated in step S6.
[0070]
The output control unit 35 determines whether the utterance determination result input in step S12 is 3 (S14). When the output control unit 35 determines that the utterance determination result is 3 (S14, YES), there is no utterance of the person p2 in the privacy area PA, so it determines that it is not necessary to mask the emphasized voice of the target. That is, the output control unit 35 causes the speaker device 37 to output the emphasized voice of the target input in step S11 as it is (S15).
[0071]
On the other hand, when the output control unit 35 determines that the utterance determination result is not 3 (S14, NO), it determines that the emphasized voice of the target needs to be masked because there is an utterance of the person p2 in the privacy area PA. The output control unit 35 reads out and acquires the coordinates indicating the position information of the target area TA and the privacy area PA held in the memory 39z of the setting management unit 39 (S16).
[0072]
Further, when the output control unit 35 determines that the emphasized voice of the target needs to be masked, it adjusts the volume of the mask sound input in step S13 based on the position information of the target area TA and the privacy area PA acquired in step S16 (S17). The output control unit 35 calculates and adjusts the volume of the mask sound based on the position of the target and the position of the privacy area. More specifically, the output control unit 35 calculates the angle formed between the target and the privacy area as seen from a specific microphone (for example, the microphone MA1) of the microphone array device MA, calculates the difference between the volume attenuation of sound traveling from the target to the microphone MA1 and the volume attenuation of sound traveling from the privacy area to the microphone MA1, and determines the volume of the mask sound according to that difference.
[0073]
When the utterance determination result is 2 (that is, when there is an utterance of the person p2 in the privacy area PA but no utterance of the person p1 in the target area TA), the output control unit 35 may determine an appropriate volume of the mask sound from the difference between the emphasized voice of the privacy area and the emphasized voice of the target.
[0074]
After step S17, the output control unit 35 adds the mask sound whose volume was adjusted in step S17 to the emphasized voice of the target input in step S11, and causes the speaker device 37 to output the result (S18).
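The volume adjustment of steps S16 to S18 can be sketched under a simple propagation model. This is an assumption-laden illustration, not the patented method: the inverse-distance (1/r) amplitude model, the function names, and the 2D coordinates are all chosen for the example; the patent itself works from the angle and attenuation difference seen from a specific microphone.

```python
def mask_gain_from_positions(mic_pos, target_pos, privacy_pos):
    """S16-S17 sketch: estimate a mask-sound gain from the geometry.
    Under a 1/r amplitude-attenuation model, the leaked privacy-area
    voice reaches the microphone weaker (or stronger) than the target's
    voice by the ratio of the two distances, so the mask sound is
    scaled by that ratio to just cover the leaked component.
    """
    def dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    # amplitude ∝ 1/r, so the leak is attenuated relative to the target
    # voice by dist(mic, target) / dist(mic, privacy).
    return dist(mic_pos, target_pos) / dist(mic_pos, privacy_pos)

def add_mask(target_voice, mask_sound, gain):
    """S18: add the volume-adjusted mask sound to the target's
    emphasized voice before it is sent to the loudspeaker."""
    return [t + gain * m for t, m in zip(target_voice, mask_sound)]
```

With the privacy area twice as far from the microphone as the target, the leak is roughly half as strong, so the mask gain comes out near 0.5 rather than masking at full volume.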
[0075]
FIG. 9 is a flow chart for explaining an example of an operation procedure of voice output control
(for example, substitution to another sound) of the directivity control device 30 of the present
embodiment.
FIG. 10 is a flowchart for explaining an example of the operation procedure of the audio output
control (for example, mute output) of the directivity control device 30 of the present
embodiment.
In the description of FIGS. 9 and 10, the same step number is assigned to the same process as the
process shown in FIG. 8 and the description is omitted, and different contents will be described.
[0076]
In FIG. 9, when the output control unit 35 determines that the utterance determination result is not 3 (S14, NO), it determines that masking is necessary because there is an utterance of the person p2 in the privacy area PA. The output control unit 35 converts the emphasized voice of the target input in step S11 into any one of a beep sound, a melody sound, or a mute output (that is, silence), and outputs it from the speaker device 37 (S19). That is, in the present embodiment, the mask sound does not have to be based on the emphasized voice of the privacy area PA, and may be a beep sound, a melody sound, or the like prepared in advance. As a result, the sound generated in the privacy area is not output from the speaker device 37 at all.
[0077]
In FIG. 10, when the output control unit 35 determines that the utterance determination result is not 3 (S14, NO), it determines that masking is necessary because there is an utterance of the person p2 in the privacy area PA. Further, the output control unit 35 determines whether or not the utterance determination result input in step S12 is 2 (S20).
[0078]
When the output control unit 35 determines that the utterance determination result is 2 (S20, YES), it converts the emphasized voice of the target input in step S11 into a beep sound, a melody sound, or a mute output (that is, silence), and outputs it from the speaker device 37 (S19).
[0079]
On the other hand, when the output control unit 35 determines that the utterance determination result is not 2 (that is, the utterance determination result is 1) (S20, NO), it determines that the emphasized voice of the target needs to be masked because there is an utterance of the person p2 in the privacy area PA.
That is, the processes of steps S16 to S18 shown in FIG. 8 are performed, so their detailed description is omitted.
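The three output behaviors of FIGS. 8 to 10 can be combined into one dispatch sketch. This is a simplified composite for illustration (the function name is assumed, and the position-based gain of S17 is omitted here); it only shows how the determination result selects the output path.

```python
def control_output(result, target_voice, mask_sound, substitute=None):
    """Output dispatch combining FIGS. 8-10 (simplified):
    result 3 -> pass the target's emphasized voice through unchanged,
    result 2 -> replace with a substitute sound (or silence),
    result 1 -> add the mask sound to the target's emphasized voice.
    """
    if result == 3:                       # S14 YES: no privacy utterance
        return list(target_voice)         # S15: output as-is
    if result == 2:                       # S20 YES: privacy area only
        if substitute is not None:
            return list(substitute)       # beep / melody sound
        return [0.0] * len(target_voice)  # mute output
    # result == 1: both speak, so mask the leak (S16-S18, gain omitted)
    return [t + m for t, m in zip(target_voice, mask_sound)]
```

Splitting the decision this way keeps the clear target voice whenever privacy allows it, and degrades the output only as much as protection requires.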
[0080]
As described above, in the directivity control system 10 of the present embodiment, the camera device CA captures an image of the imaging area SA. The microphone array device MA picks up the sound of the imaging area SA. The display device 36 displays the image of the imaging area SA captured by the camera device CA. The speaker device 37 outputs the sound of the imaging area SA collected by the microphone array device MA. The setting management unit 39 stores, in the memory 39z, the position information of the target area TA and the privacy area PA designated for the image of the imaging area displayed on the display device 36.
[0081]
The directivity control device 30 uses the position information of the target area TA to emphasize the voice in the first direction from the microphone array device MA toward the target, generating the emphasized voice of the target. Similarly, using the position information of the privacy area PA, the signal processing unit 33 emphasizes the voice in the second direction from the microphone array device MA toward the privacy area, generating the emphasized voice of the privacy area.
[0082]
The directivity control device 30 determines whether there is an utterance in the target area or the privacy area based on the emphasized voice of the target and the emphasized voice of the privacy area, and when it determines that there is an utterance at least in the privacy area, controls the output of the emphasized voice of the target into which the voice of the privacy area leaks. For example, the output control unit 35 adds the mask sound to the emphasized voice of the target and outputs the result from the speaker device 37.
[0083]
Thereby, even when a person (for example, the person p2 in FIG. 6) utters in the privacy area PA while the target person (for example, the person p1 in FIG. 6) is near the privacy area, in other words, even if the voice of the person in the privacy area leaks into the emphasized voice of the target, the mask sound is added to the emphasized voice of the target, so the content of the voice of the person in the privacy area PA cannot be understood from the emphasized voice of the target. That is, according to the directivity control system 10 of the present embodiment, even if a voice is emitted within the privacy area, its content is not disclosed to others, the voice of the target is emphasized so that it can be heard clearly, and the privacy of the person in the privacy area can be properly protected. Further, since the mask sound is heard from the speaker device 37 in a mixed state, the user who hears the mixed sound can know when an utterance occurred even without knowing the content of the sound in the privacy area PA.
[0084]
Further, when the sound pressure p of the emphasized voice of the privacy area exceeds the sound pressure threshold sh, the directivity control device 30 adds the mask sound to the emphasized voice of the target and outputs it from the speaker device 37; when the sound pressure p is equal to or less than the sound pressure threshold sh, the mask sound is not added and the emphasized voice of the target is output from the speaker device 37 as it is. Thereby, even if a sound is emitted within the range of the privacy area PA, if the sound pressure of that sound is equal to or less than the sound pressure threshold sh, the emphasized voice of the target is output clearly, the unnecessary mask sound addition processing can be omitted, and the processing load of the directivity control device 30 can be reduced.
[0085]
In addition, when the directivity control device 30 determines that there is an utterance in the privacy area, instead of adding the mask sound to the emphasized voice of the target, it may replace the emphasized voice of the target with a predetermined alternative sound (for example, a beep sound or a melody sound) and output it from the speaker device 37. As a result, since the emphasized voice of the target in the imaging area SA is changed to the substitute sound, it is difficult to infer the content of the conversation between the two persons from the voice emitted by a person (for example, the target person p1) outside the privacy area PA, and confidentiality is increased.
[0086]
Also, the directivity control device 30 generates the mask sound using the emphasized voice of the privacy area. Since the directivity control device 30 uses the emphasized voice of the privacy area itself, it can generate a highly accurate mask sound for drowning out the voice of the privacy area that leaks into the emphasized voice of the target.
[0087]
Further, the directivity control device 30 stores the mask sound in the memory 38 in advance, and reads it out from the memory 38 when adding it to the emphasized voice of the target. This eliminates the need for the directivity control device 30 to dynamically generate the mask sound using the emphasized voice of the privacy area, and can reduce the load of the addition processing on the emphasized voice of the target.
[0088]
Further, the directivity control device 30 adjusts the volume of the mask sound to be added to the emphasized voice of the target based on the respective position information of the target and the privacy area. Thereby, the directivity control device 30 can effectively predict the amplitude attenuation of the voice during propagation according to the position of the target and the position of the privacy area as seen from the microphone array device MA, and can thus obtain an appropriate volume for the mask sound.
[0089]
The directivity control device 30 stores, in the memory 38, a plurality of voice data collected in the past in the privacy area, and generates the mask sound by adding the plurality of past collected voices read from the memory 38 to the emphasized voice of the privacy area. Thereby, in view of the fact that conversation content in the privacy area should be concealed, the directivity control device 30 mixes a plurality of contents spoken in the past in the privacy area, and can thus obtain an appropriate mask sound for drowning out the voice of the privacy area that leaks into the emphasized voice of the target.
[0090]
In addition, the directivity control device 30 limits the output of the emphasized voice of the target into which the voice of the privacy area leaks. Thereby, the directivity control device 30 does not output from the speaker device 37 any conversation content of the person who uttered in the privacy area, where the conversation content should be concealed, so the privacy of the person in the privacy area PA (for example, the person p2 in FIG. 6) can be properly protected.
[0091]
Further, the directivity control device 30 processes the emphasized voice of the target into which the voice of the privacy area leaks before outputting it. As a result, the directivity control device 30 completely switches the conversation content of the person who uttered in the privacy area, where the conversation content should be concealed, to another sound and outputs it from the speaker device 37, so the privacy of the person in the privacy area PA (for example, the person p2 in FIG. 6) can be properly protected.
[0092]
When the directivity control device 30 receives a designation operation of the privacy area PA from the operation unit 32 based on the user's operation, it sets the coordinates of the position according to the designation operation as the position information of the privacy area. Thereby, the user can arbitrarily set the privacy area PA by designating it, for example, by tracing the screen with a finger FG or a stylus pen on the image captured by the camera device CA.
[0093]
Further, the directivity control device 30 stores the sound of the imaging area SA collected by the microphone array device MA in the memory 38 together with the collection time. The directivity control device 30 controls the output of the sound stored in the memory 38 that was collected from a predetermined time before the time at which the sound was detected in the privacy area PA. Thereby, since the voice output is controlled in the directivity control device 30 from a predetermined time before the voice from the privacy area PA is detected, it is possible to prevent the beginning (onset) of the sound in the privacy area PA from being output without the masking process during the short time (for example, a few seconds) required for processing from the voice detection in the privacy area to the output of the mask sound. Therefore, it is also avoided that the content can be guessed from the beginning of the voice. This is effective when the voice data collected once is stored in the memory 38 and then played back, or when the voice collected by the microphone array device MA is played back slightly later than the collection time (delayed from real time). The predetermined time is the short time (for example, about three seconds) required to determine whether or not a voice has been detected in the privacy area after the microphone array device MA picks up the voice.
[0094]
(Modification of the Present Embodiment) In the present embodiment described above, the directivity control device 30 stores the position information (that is, position coordinates) of the target and the privacy area, and when it is determined that an utterance has occurred in the privacy area, the mask sound is added to the emphasized voice of the target, or the emphasized voice of the target is replaced with a predetermined alternative sound or silenced. A modification of the present embodiment (hereinafter simply referred to as the "modification") shows a case where the microphone array device performs these processes instead of the directivity control device 30.
[0095]
FIG. 11 is a block diagram showing an example of the internal configuration of the microphone
array device MB in the modification of the present embodiment. In the microphone array device
MB of the modified example, the same components as those of the microphone array device MA
in the above-described embodiment are denoted by the same reference numerals, and the
description thereof is omitted.
[0096]
The microphone array device MB is configured to include a plurality of microphones MB1, MB2, ..., MBn, amplifiers 231, 232, ..., 23n, A/D converters 241, 242, 243, ..., 24n, a CPU 25, an encoding unit 28, and a communication unit 29.
[0097]
The amplifiers 231, 232, ..., 23n amplify the audio signals collected by the plurality of microphones MB1, MB2, ..., MBn.
[0098]
The A / D converters 241, 242, 243, ..., 24n convert the audio signals amplified by the amplifiers
231, 232, ..., 23n into digital audio data.
[0099]
The CPU 25 inputs the voice data collected by the plurality of microphones MB1, MB2, ..., MBn and converted by the A/D converters 241, 242, 243, ..., 24n, and performs various voice output processes based on these voice data.
The CPU 25 stores the voice data collected by the plurality of microphones MB1, MB2, ..., MBn in an internal memory (not shown) in association with the collection time.
[0100]
Further, for example, when the target area TA or the privacy area PA is designated by the user, the CPU 25 receives, at the communication unit 29, the position information of the target or the privacy area transmitted from the directivity control device 30.
Furthermore, using the position information of the target and the privacy area, the CPU 25 emphasizes, for the voice data collected by the microphones MB1, MB2, ..., MBn and converted by the A/D converters 241, 242, 243, ..., 24n, the voice in the direction from the microphone array device MB toward the target and the voice in the direction from the microphone array device MB toward the privacy area.
[0101]
When the utterance determination result of the speech determination unit 34 is transmitted from the directivity control device 30, the CPU 25 receives the utterance determination result at the communication unit 29 and stores it in the internal memory (not shown).
If the utterance determination result stored in the internal memory is not 3 (that is, if the utterance determination result is 1 or 2), the CPU 25 adds the above-described mask sound to the emphasized voice in the direction from the microphone array device MB toward the target, replaces the emphasized voice of the target with a predetermined alternative sound, or silences it. The processes of mask sound addition, replacement with a predetermined alternative sound, and silencing in the CPU 25 are the same as the processes of the output control unit 35 of the present embodiment described above, and thus their detailed description is omitted.
[0102]
The encoding unit 28 encodes the audio data output from the CPU 25 and generates an audio packet that can be transmitted over the network NW.
[0103]
The communication unit 29 transmits the voice data encoded by the encoding unit 28 to the
directivity control device 30 via the network NW.
The communication unit 29 also receives various types of information transmitted from the
directivity control device 30 via the network NW. The various types of information include, for
example, each position information of the target and the privacy area, and a speech
determination result in the speech determination unit 34.
[0104]
As described above, the microphone array device MB of the modification stores the collected voice data in association with the collection time, and transmits the stored voice data and collection time data to the directivity control device 30 via the network NW. Further, when the utterance determination result is transmitted from the directivity control device 30 and the received utterance determination result is not 3, the microphone array device MB adds the above-described mask sound to the emphasized voice in the direction from the microphone array device MB toward the target, replaces the emphasized voice of the target with a predetermined alternative sound, or silences it.
[0105]
Further, with the microphone array device MB of the modification, the audio data transmitted
from the microphone array device MB to the directivity control device 30 has already been
masked, replaced with an alternative sound, or silenced inside the microphone array device MB.
Even if the data is eavesdropped on along the way, the voices of persons in the privacy area
therefore do not leak to the outside, and the voice data can be transmitted safely. In this case,
the fact that the audio data has been masked may be added to the header of the audio data as
attached information, so that the side receiving the audio data can immediately know that it has
been mask-processed. The attached information may include time information, position
information, and the like.
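One way to realize the attached-information idea, sketched with a hypothetical JSON header; the field names `masked`, `time_ms`, and `position` are assumptions, since the patent does not define a concrete format.

```python
import json

def attach_info(audio, masked, time_ms, position):
    """Prepend a small JSON header flagging whether the payload is
    already mask-processed (field names are assumptions, not from
    the patent)."""
    header = json.dumps({"masked": masked, "time_ms": time_ms,
                         "position": list(position)}).encode()
    return len(header).to_bytes(4, "big") + header + audio

def read_info(packet):
    """Receiver side: split the attached header from the audio payload."""
    n = int.from_bytes(packet[:4], "big")
    return json.loads(packet[4:4 + n]), packet[4 + n:]

info, audio = read_info(attach_info(b"\x00\x01", True, 1234, (3.0, 4.5)))
assert info["masked"]  # the receiver knows at once that the data is masked
```

Because the flag travels with the data, the receiving side can tell immediately, without decoding the audio, that the payload has already been protected.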
[0106]
Although the embodiments have been described above with reference to the drawings, it goes
without saying that the present invention is not limited to these examples. It will be apparent to
those skilled in the art that various changes and modifications can be made within the scope of
the appended claims, and it is understood that these naturally fall within the technical scope of
the present invention.
[0107]
For example, in the above embodiment, when the sound position of the voice detected by the
microphone array device is within the privacy area, the voice detected in the imaging area SA
does not always have to be subjected to the mask processing (mask sound addition). For
example, the output control unit 35 may perform the mask processing when the user operating
the directivity control device 30 is a general user, and may omit the mask processing when the
user is an authorized user such as a manager. Which user is operating can be determined, for
example, from the user ID used when logging in to the directivity control device 30. The same
applies not only to performing or omitting the mask processing according to the authority of the
user, but also to the replacement with an alternative sound and the silencing processing.
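The authority check described above might be sketched as follows, assuming a hypothetical table of authorized user IDs obtained at login; none of the names here are prescribed by the patent.

```python
# Hypothetical privilege table; the text only says the decision can be made
# from the user ID obtained at login to the directivity control device 30.
AUTHORIZED_USERS = {"admin01", "manager02"}

def needs_protection(user_id, voice_in_privacy_area):
    """Mask / substitute / silence only for general users; an authorized
    user such as a manager hears the unprocessed voice."""
    return voice_in_privacy_area and user_id not in AUTHORIZED_USERS

assert needs_protection("guest", True)        # general user: protect
assert not needs_protection("admin01", True)  # manager: no processing
```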
[0108]
Further, the output control unit 35 may perform voice change processing (voice-altering
processing) on the voice data of the voice collected by the microphone array device MA. As an
example of the voice change processing, the output control unit 35 changes, for example, the
pitch (frequency) of the voice data of the voice collected by the microphone array device MA.
That is, by shifting the frequency of the sound output from the speaker device 37 to another
frequency that makes the content difficult to understand, the content of the sound heard from
within the privacy area can be obscured, and it becomes difficult to know the content of the
sound collected by the microphone array device MA. As described above, the output control unit
35 processes the sound collected by the microphone array device MA and causes the speaker
device 37 to output the processed sound, so that the privacy of a subject (for example, a person)
present in the privacy area PA can be effectively protected.
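As an illustrative sketch of the pitch-change idea only: this naive resampling scales every frequency but also changes the duration, and a production voice changer would instead use a phase vocoder or similar technique; nothing here is specified by the patent.

```python
import numpy as np

def crude_pitch_shift(samples, factor):
    """Very rough voice-change sketch: resample so every frequency is
    scaled by `factor` (>1 raises the pitch, <1 lowers it). The duration
    changes too; a real voice changer would preserve timing with a
    phase vocoder or similar technique."""
    n_out = int(len(samples) / factor)
    positions = np.arange(n_out) * factor          # fractional read positions
    return np.interp(positions, np.arange(len(samples)), samples)

# A 100 Hz tone resampled with factor 2.0 plays back as a 200 Hz tone,
# shifting speech formants and making the content harder to follow.
sr = 8000
t = np.arange(sr) / sr
tone = np.sin(2 * np.pi * 100 * t)
shifted = crude_pitch_shift(tone, 2.0)
```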
[0109]
Further, the output control unit 35 may explicitly notify the user on the screen that the sound
position corresponding to the position designated on the screen with the user's finger FG or a
stylus pen is included in the privacy area PA. For example, through a notification on a pop-up
screen, a predetermined notification sound from the speaker device 37, or the like, the user can
visually or audibly recognize that the designated position lies within the privacy area.
[0110]
The present invention is useful as a directivity control system and an audio output control
method that suppress deterioration of a person's privacy protection, since even if a person in the
privacy area speaks, the content of the uttered voice is not known to other persons.
[0111]
DESCRIPTION OF SYMBOLS
10 directivity control system
21 case
26 adder
30 directivity control device
31 communication unit
32 operation unit
33 signal processing unit
34 speech determination unit
35 output control unit
36 display device
37 speaker device
38 memory
39 setting management unit
39z memory
73, 74 chair
80 sound source
231, 232, ..., 23n amplifier
241, 242, 243, ..., 24n A/D converter
251, 252, 253, ..., 25n delay device
CA camera device
FG finger
NW network
MA, MB microphone array device
MA1, MA2, ..., MAn, MB1, MB2, ..., MBn microphone
p1, p2 person
RC recorder