DESCRIPTION JP2017208682

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
Abstract: Provided is a target sound collection device capable of estimating an unnecessary
sound direction and performing directional sound collection with a direction other than the
unnecessary sound direction as the sound collection direction. The device includes: a direction
estimation unit that estimates the direction of a sound source based on a plurality of acoustic
signals collected by a plurality of microphones; a sound generation frequency measurement unit
that measures, for each direction, the frequency with which that direction is estimated as the
direction of a sound source; an unnecessary sound direction estimation unit that estimates a
direction as an unnecessary sound direction when the result of comparing the frequency for that
direction with a predetermined threshold satisfies a predetermined condition; a sound collection
direction determination unit that, when a direction different from the unnecessary sound
direction is estimated as the direction of the sound source, determines that direction as the
sound collection direction; and a directional sound collection unit that emphasizes and collects
the sound in the determined sound collection direction.
[Selected figure] Figure 3
Target sound collection device, target sound collection method, program, recording medium
[0001]
The present invention relates to a target sound collection device, a target sound collection
method, a program, and a recording medium used for sound collection technology using a
plurality of microphones.
[0002]
11-04-2019
FIGS. 1 and 2 schematically show the configuration and operation of the target sound collection
device 9 of Patent Document 1.
As shown in FIG. 1, the target sound collection device 9 of Patent Document 1 includes a
direction estimation unit 91, a sound collection direction control unit 92, a directional sound
collection unit 93, and a storage unit 94. The sound collection direction control unit 92
includes a desired direction setting unit 922 and a sound collection direction determination
unit 923. The direction estimation unit 91 estimates the direction of the sound source based on
the plurality of acoustic signals collected by the plurality of microphones 8-1, ..., 8-N (N is
an integer of 2 or more) (S91), using the time differences or amplitude differences that arise
between the microphones 8-1, ..., 8-N as clues. The desired direction setting unit 922 sets in
advance the direction from which sound collection is desired (desired direction) or the angle
range from which sound collection is desired (desired angle range) (S922). The sound collection
direction determination unit 923 determines the direction of the sound source as the sound
collection direction when the direction estimated in step S91 matches the preset desired
direction (or falls within the desired angle range) (S923). The directional sound collection
unit 93 emphasizes the sound in the sound collection direction determined in step S923 and
executes directional sound collection (S93). Step S93 can be realized by the method disclosed in
Patent Document 2 or the like. The acoustic signal obtained by the directional sound collection
in step S93 may be stored in the storage unit 94 as shown in the figure, or may be output to the
outside of the device.
[0003]
Patent Document 1: JP 2005-64968 A
Patent Document 2: JP 2009-44588 A
[0004]
The method of Patent Document 1 is effective if the direction or angle range from which sound is
to be collected is determined in advance.
On the other hand, when the direction or angle range is not determined in advance, the method of
Patent Document 1 treats all directions as the sound collection direction. In this case, if a
person or an object emitting an unnecessary sound is present in the surroundings, that sound is
also collected as a necessary sound. For example, when the method of Patent Document 1 is applied
to a robot that conducts dialogue using voice recognition, or to a remote control that operates a
device using voice recognition, voice recognition may be triggered by an unnecessary sound, and
the robot or the remote control may malfunction.
[0005]
Therefore, it is an object of the present invention to provide a target sound collecting device
capable of performing directional sound collection by estimating an unnecessary sound direction
and setting directions other than the unnecessary sound direction as a sound collecting direction.
[0006]
The target sound collection device of the present invention includes a direction estimation
unit, a sound generation frequency measurement unit, an unnecessary sound direction estimation
unit, a sound collection direction determination unit, and a directional sound collection unit.
[0007]
The direction estimation unit estimates the direction of the sound source based on the plurality
of acoustic signals collected by the plurality of microphones.
The sound generation frequency measurement unit measures, for each direction, the frequency with
which that direction is estimated as the direction of the sound source.
The unnecessary sound direction estimation unit estimates a direction as the unnecessary sound
direction when the result of comparing the frequency for that direction with a predetermined
threshold satisfies a predetermined condition. When a direction different from the unnecessary
sound direction is estimated as the direction of the sound source, the sound collection
direction determination unit determines that direction as the sound collection direction. The
directional sound collection unit emphasizes and collects the sound in the determined sound
collection direction.
[0008]
According to the target sound collection device of the present invention, it is possible to estimate
the unnecessary sound direction and execute directional sound collection with a direction other
than the unnecessary sound direction as the sound collection direction.
[0009]
FIG. 1 is a block diagram showing the configuration of a target sound collection device
according to the prior art.
FIG. 2 is a flowchart showing the operation of the target sound collection device of the prior
art.
FIG. 3 is a block diagram showing the configuration of the target sound collection device of the
first embodiment.
FIG. 4 is a flowchart showing the operation of the target sound collection device of the first
embodiment.
FIG. 5 is a block diagram showing the configuration of the target sound collection device of the
second embodiment.
FIG. 6 is a flowchart showing the synthetic speech generation and reproduction operation of the
target sound collection device of the second embodiment.
FIG. 7 is a flowchart showing the sound collection direction control operation of the target
sound collection device of the second embodiment.
FIG. 8 is a block diagram showing the configuration of the target sound collection device of
Modification 1.
FIG. 9 is a flowchart showing the speech detection operation of the target sound collection
device of Modification 1.
FIG. 10 is a flowchart showing the sound collection direction control operation of the target
sound collection device of Modification 1.
FIG. 11 is a block diagram showing the configuration of the target sound collection device of the
third embodiment.
FIG. 12 is a flowchart showing the operation of the target sound collection device of the third
embodiment.
FIG. 13 is a block diagram showing the configuration of the target sound collection device of
Modification 2.
FIG. 14 is a flowchart showing the operation of the target sound collection device of
Modification 2.
[0010]
Hereinafter, embodiments of the present invention will be described in detail. Note that
components having the same function will be assigned the same reference numerals and
redundant description will be omitted.
[0011]
Hereinafter, the configuration and operation of the target sound collection device according to
the first embodiment will be described with reference to FIGS. 3 and 4. As shown in FIG. 3, the
target sound collection device 1 of this embodiment includes a direction estimation unit 91, a
sound collection direction control unit 12, a directional sound collection unit 93, and a
storage unit 94. The sound collection direction control unit 12 includes a sound generation
frequency measurement unit 121, an unnecessary sound direction estimation unit 122, and a sound
collection direction determination unit 123. The direction estimation unit 91, the directional
sound collection unit 93, and the storage unit 94 have the same functions as the components
having the same names and numbers in the target sound collection device 9 of Patent Document 1,
and thus their description will be omitted.
[0012]
The sound generation frequency measurement unit 121 measures, for each direction, the frequency
with which that direction is estimated as the direction of the sound source (S121). That is, it
measures from which direction, and how often, sound has been generated within a fixed time.
Whether or not sound is being generated can be known from the output of the direction estimation
unit 91. For example, if the total time during the past T seconds in which the direction
estimated by the direction estimation unit 91 was θ is A(θ) seconds, the sound generation
frequency measurement unit 121 can obtain the sound generation frequency in the θ direction as
the ratio D(θ) = A(θ) / T. The unit obtains this frequency for every direction. For example, if
the noise source is a television or a speaker for listening to music, its sound continues to
arrive at the microphones 8-1, ..., 8-N from the same direction. When such a noise source is in
the θ direction, the sound generation frequency D(θ) takes a large value close to 1.
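As an illustration of step S121, the ratio D(θ) = A(θ) / T could be computed along the following lines. This is a minimal sketch, not the implementation of the device: the function name `sound_generation_frequency`, the per-frame input format, and the quantization of directions into fixed-width bins are all assumptions made for the example.

```python
from collections import defaultdict


def sound_generation_frequency(estimates, frame_seconds, window_seconds, bin_degrees=10):
    """Return {direction_bin: D} with D = A(theta) / T (hypothetical helper).

    estimates: one direction estimate in degrees per frame for the past T seconds,
               or None for frames in which no sound source was estimated.
    frame_seconds: duration of one frame (assumed constant).
    window_seconds: the observation window T.
    bin_degrees: width used to quantize directions (an assumption of this sketch).
    """
    active_seconds = defaultdict(float)  # A(theta), accumulated per direction bin
    for theta in estimates:
        if theta is None:
            continue  # no sound generation in this frame
        bin_start = int((theta % 360) // bin_degrees) * bin_degrees
        active_seconds[bin_start] += frame_seconds
    return {b: a / window_seconds for b, a in active_seconds.items()}


# A source near 30 degrees is active for 8 of the last 10 seconds.
frames = [30, 31, 32, 30, 33, 30, 31, 30, 120, None]
d = sound_generation_frequency(frames, frame_seconds=1.0, window_seconds=10.0)
```

With this input the bin starting at 30 degrees receives a frequency of about 0.8, which is the kind of value a continuously playing television or speaker would produce.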
[0013]
The unnecessary sound direction estimation unit 122 estimates a direction as the unnecessary
sound direction when the result of comparing the frequency for that direction with a
predetermined threshold satisfies a predetermined condition (S122). For example, the unnecessary
sound direction estimation unit 122 sets a direction as the unnecessary sound direction when the
above-mentioned sound generation frequency D(θ) exceeds a preset threshold E (0 ≤ E ≤ 1). The
unit performs the same estimation for all directions, and estimates one direction or a plurality
of directions as the unnecessary sound direction. Based on a direction θN set as the unnecessary
sound direction, the unit may also set all directions within a predetermined angle range (for
example, all directions in the range from θN − Δθ to θN + Δθ) as the unnecessary sound
direction. Δθ is a preset width of the unnecessary sound direction, set based on the accuracy of
the direction estimation. For example, when the direction estimation accuracy is 10 degrees,
setting Δθ to 10 degrees or more prevents an unnecessary sound from being judged as a necessary
sound because of a direction estimation error.
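The threshold comparison of step S122, including the optional widening by Δθ, might be sketched as follows. The function name `estimate_unnecessary_directions` and the representation of the result as (low, high) angle ranges are assumptions made for illustration. The input is assumed to be a mapping from direction to D(θ).

```python
def estimate_unnecessary_directions(frequency, threshold, delta_degrees=0.0):
    """Return (low, high) angle ranges, in degrees, treated as unnecessary sound directions.

    frequency: mapping {theta: D(theta)} with 0 <= D(theta) <= 1.
    threshold: the preset threshold E (0 <= E <= 1).
    delta_degrees: the preset width delta-theta around each flagged direction.
    """
    ranges = []
    for theta, d in sorted(frequency.items()):
        if d > threshold:  # the example condition from the text: D(theta) exceeds E
            ranges.append((theta - delta_degrees, theta + delta_degrees))
    return ranges


# A source at 30 degrees is active 80% of the time; one at 120 degrees only 10%.
ranges = estimate_unnecessary_directions({30: 0.8, 120: 0.1}, threshold=0.5, delta_degrees=10)
```

Here only the 30 degree direction exceeds E = 0.5, so the range from 20 to 40 degrees is flagged. A range may extend below 0 or above 360 degrees, so code that consumes it has to handle wrap-around.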
[0014]
When the direction different from the unnecessary sound direction is estimated as the direction
of the sound source, the sound collection direction determination unit 123 determines the
direction of the sound source, which is a direction different from the unnecessary sound
direction, as the sound collection direction (S123).
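Step S123 amounts to rejecting any estimated direction that falls inside an unnecessary sound direction. The sketch below assumes that unnecessary directions are given as (low, high) ranges in degrees and that angles wrap at 360 degrees. Both function names are hypothetical.

```python
def angular_distance(a, b):
    """Smallest absolute difference between two angles in degrees (0 to 180)."""
    return abs((a - b + 180) % 360 - 180)


def determine_sound_collection_direction(estimated_theta, unnecessary_ranges):
    """Return the direction to collect from, or None if nothing should be collected.

    estimated_theta: direction from the direction estimation unit, or None if no source.
    unnecessary_ranges: list of (low, high) ranges in degrees flagged as unnecessary.
    """
    if estimated_theta is None:
        return None
    for low, high in unnecessary_ranges:
        centre = (low + high) / 2
        half_width = (high - low) / 2
        if angular_distance(estimated_theta, centre) <= half_width:
            return None  # the source lies in an unnecessary sound direction
    return estimated_theta
```

For example, with the range (20, 40) flagged, an estimate of 120 degrees is returned as the sound collection direction, while an estimate of 35 degrees is rejected.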
[0015]
According to the target sound collection device 1 of the present embodiment, a sound source that
generates sound continuously is estimated to be an unnecessary sound, and other sound sources are
treated as the target sound. Therefore, even when there is a noise source such as a television or
a speaker for listening to music, the target sound can be properly determined, emphasized, and
collected.
[0016]
The configuration and operation of the target sound collection device of the second embodiment
will be described below with reference to FIGS. 5, 6, and 7.
As shown in FIG. 5, the target sound collection device 2 of this embodiment includes a direction
estimation unit 91, a sound collection direction control unit 22, a directional sound collection
unit 93, a storage unit 94, a speech control unit 24, and a speech synthesis unit 25. The sound
collection direction control unit 22 includes a sound generation frequency measurement unit 221,
an unnecessary sound direction estimation unit 122, and a sound collection direction
determination unit 123. Components other than the sound generation frequency measurement unit
221, the speech control unit 24, and the speech synthesis unit 25 have the same functions as the
components having the same names and numbers in the target sound collection device 1 of the
first embodiment, and thus their description will be omitted. The target sound collection device
2 of the present embodiment is assumed to be a device that interacts with the user, and therefore
includes the speech control unit 24 and the speech synthesis unit 25 to provide speech control
and speech synthesis functions. The speech control unit 24 controls speech (S24). The speech
synthesis unit 25 generates and reproduces synthetic speech (S25). The sound generation
frequency measurement unit 221 of the present embodiment measures the frequency only during the
time in which the synthetic speech is being reproduced (S221).
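The gated measurement of step S221 could be sketched as below. The text only states that the frequency is measured while synthetic speech is being reproduced. Normalizing by the total reproduction time, as well as the function name and the frame format, are assumptions of this example.

```python
from collections import defaultdict


def frequency_during_playback(frames, frame_seconds):
    """Return {theta: frequency}, counting only frames in which synthetic speech plays.

    frames: sequence of (theta, playing) pairs in time order, where theta is the
            estimated direction (or None) and playing is True during reproduction.
    """
    playback_seconds = 0.0
    active_seconds = defaultdict(float)
    for theta, playing in frames:
        if not playing:
            continue  # ignore everything outside synthetic speech reproduction
        playback_seconds += frame_seconds
        if theta is not None:
            active_seconds[theta] += frame_seconds
    if playback_seconds == 0:
        return {}
    return {theta: a / playback_seconds for theta, a in active_seconds.items()}


frames = [(30, True), (30, True), (90, False), (None, True), (30, True)]
gated = frequency_during_playback(frames, frame_seconds=0.5)
```

In this example the source at 90 degrees is ignored because it sounded while nothing was being reproduced, and the source at 30 degrees gets a frequency of 0.75.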
[0017]
During synthetic speech reproduction, it is assumed that the user who is the other party of the
dialogue is likely to be listening, and that the possibility of the user speaking frequently is
low. Therefore, if the sound generation frequency during synthetic speech reproduction is high,
the sound source is likely to be a noise source rather than the user. Based on this assumption,
the target sound collection device 2 of the present embodiment achieves the same effect as the
first embodiment even in a device that interacts with the user.
[0018]
<Modification 1> Hereinafter, with reference to FIGS. 8, 9, and 10, the configuration and
operation of the target sound collection device 2a of Modification 1 will be described, in which
the speech control unit 24 and the speech synthesis unit 25 of the second embodiment are
replaced with a speech detection unit 26a, and the sound generation frequency measurement unit
221 is replaced with a sound generation frequency measurement unit 221a.
[0019]
The speech detection unit 26a detects speech from the speaker reproduction sound signal (S26a).
The speech detection unit 26a compares the level of the speaker reproduction sound signal with a
preset threshold, and judges that speech is present when the level of the speaker reproduction
sound signal exceeds the threshold. The sound generation frequency measurement unit 221a
measures the frequency only during the time in which speech is detected (S221a).
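The level comparison of step S26a might look like the following sketch. The text does not say how the level is defined. Using the root-mean-square amplitude of a frame is an assumption of this example, as is the function name `detect_speech`.

```python
import math


def detect_speech(frame, threshold):
    """Return True when the level of one frame of the speaker signal exceeds the threshold.

    frame: sequence of samples of the speaker reproduction sound signal.
    threshold: the preset level threshold, in the same units as the samples.
    """
    if not frame:
        return False  # an empty frame carries no signal
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))  # assumed level measure
    return rms > threshold


loud = detect_speech([0.5, -0.5, 0.5, -0.5], threshold=0.1)
quiet = detect_speech([0.01, -0.01, 0.01, -0.01], threshold=0.1)
```

The first frame has an RMS level of 0.5 and is judged to contain speech, while the second stays below the threshold.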
[0020]
The configuration and operation of the target sound collection device of the third embodiment
will be described below with reference to FIGS. 11 and 12. As shown in FIG. 11, the target sound
collection device 3 of this embodiment includes a direction estimation unit 91, a sound
collection direction control unit 32, a directional sound collection unit 93, a storage unit 94,
a speech control unit 24, and a speech synthesis unit 25. The sound collection direction control
unit 32 includes a sound generation frequency measurement unit 221, an unnecessary sound
direction estimation unit 122, a sound collection direction determination unit 323, and a sound
generation timing measurement unit 324. Components other than the sound collection direction
determination unit 323 and the sound generation timing measurement unit 324 have the same
functions as the components having the same names and numbers in the target sound collection
device 2 of the second embodiment, and thus their description will be omitted.
[0021]
When reproduction of the synthetic speech pauses, the sound generation timing measurement unit
324 measures the time from immediately after the pause until the direction of a sound source is
first estimated (S324).
[0022]
The sound collection direction determination unit 323 determines the direction of the first
estimated sound source as the sound collection direction when that direction is different from
the unnecessary sound direction and the time measured in step S324 satisfies a predetermined
condition (S323).
Specifically, the sound collection direction determination unit 323 determines the estimated
direction as the sound collection direction when, in addition to the condition that the
direction estimated by the direction estimation unit 91 has not been estimated as the
unnecessary sound direction by the unnecessary sound direction estimation unit 122, the measured
time falls between a preset minimum value and a preset maximum value (S323). Note that the
minimum value and the maximum value may each be negative or positive.
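The two conditions of step S323 can be combined as in the sketch below. The function name, the callback used to test for an unnecessary sound direction, and the choice of inclusive bounds are assumptions made for illustration.

```python
def determine_direction_with_timing(theta, elapsed_seconds, is_unnecessary,
                                    min_seconds, max_seconds):
    """Return theta as the sound collection direction if both conditions hold, else None.

    theta: direction of the first sound source estimated after the pause, or None.
    elapsed_seconds: time from the pause of synthetic speech to that first estimate.
    is_unnecessary: callable returning True if a direction is an unnecessary direction.
    min_seconds, max_seconds: preset bounds; either may be negative or positive.
    """
    if theta is None or is_unnecessary(theta):
        return None
    if min_seconds <= elapsed_seconds <= max_seconds:  # inclusive bounds are assumed
        return theta
    return None


def flagged(theta):
    """Hypothetical check: directions from 20 to 40 degrees are unnecessary."""
    return 20 <= theta <= 40


accepted = determine_direction_with_timing(60, 0.4, flagged, -0.2, 1.5)
too_late = determine_direction_with_timing(60, 3.0, flagged, -0.2, 1.5)
```

A source at 60 degrees that sounds 0.4 seconds after the pause is accepted, while the same source sounding 3.0 seconds after the pause is not.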
[0023]
It is assumed to be natural for a user who is interacting with the device to speak immediately
after reproduction of the dialogue speech (synthetic speech) is completed. Based on this
assumption, the target sound collection device 3 of this embodiment can determine whether or not
a sound source is the user by checking, when reproduction of the dialogue speech (synthetic
speech) pauses, whether the sound is generated within a short time immediately after the pause.
[0024]
<Modification 2> Hereinafter, with reference to FIGS. 13 and 14, the configuration and operation
of the target sound collection device 3a of Modification 2 will be described, in which the
speech control unit 24 and the speech synthesis unit 25 are replaced with the speech detection
unit 26a, the sound generation frequency measurement unit 221 is replaced with the sound
generation frequency measurement unit 221a, and the sound generation timing measurement unit 324
is replaced with a sound generation timing measurement unit 324a. When speech is no longer
detected, the sound generation timing measurement unit 324a measures the time from the point at
which speech ceased to be detected until the direction of a sound source is first estimated
(S324a). In other words, the sound generation timing measurement unit 324a measures the time
from when the output of the speech detection unit 26a changes to "no speech" until the direction
of a sound source is first estimated (S324a).
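The measurement of step S324a might be sketched as follows, assuming the detector output and the direction estimate are available once per fixed-length frame. The function name and frame format are hypothetical. Because it counts forward from the transition, this sketch only yields non-negative times.

```python
def time_to_first_estimate(frames, frame_seconds):
    """Return seconds from the change to "no speech" until the first direction estimate.

    frames: sequence of (speech_detected, theta) pairs in time order, where
            speech_detected is the output of the speech detector and theta is the
            estimated direction of a sound source (or None).
    Returns None if no estimate follows a change to "no speech".
    """
    previous_detected = False
    pause_index = None  # frame at which the detector output changed to "no speech"
    for i, (detected, theta) in enumerate(frames):
        if detected:
            pause_index = None  # speech resumed; wait for the next transition
        elif previous_detected:
            pause_index = i
        if pause_index is not None and theta is not None:
            return (i - pause_index) * frame_seconds
        previous_detected = detected
    return None


frames = [(True, None), (True, 30), (False, None), (False, None), (False, 60)]
elapsed = time_to_first_estimate(frames, frame_seconds=0.1)
```

In this example the detector output changes to "no speech" at the third frame and a source is first estimated two frames later, giving an elapsed time of about 0.2 seconds.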
[0025]
<Supplement> The device of the present invention has, for example, as a single hardware entity,
an input unit to which a keyboard or the like can be connected, an output unit to which a liquid
crystal display or the like can be connected, a communication unit to which a communication
device (for example, a communication cable) capable of communicating outside the hardware entity
can be connected, a CPU (Central Processing Unit, which may include a cache memory, registers,
and the like), a RAM and a ROM as memory, an external storage device such as a hard disk, and a
bus that connects the input unit, output unit, communication unit, CPU, RAM, ROM, and external
storage device so that data can be exchanged among them. If necessary, the hardware entity may
be provided with a device (drive) capable of reading from and writing to a recording medium such
as a CD-ROM. Examples of physical entities provided with such hardware resources include
general-purpose computers.
[0026]
The external storage device of the hardware entity stores the programs necessary for realizing
the above-mentioned functions, the data required for processing by these programs, and the like
(storage is not limited to the external storage device; for example, a program may be stored in
a ROM, which is a read-only storage device). In addition, data and the like obtained by the
processing of these programs are stored as appropriate in the RAM, the external storage device,
and the like.
[0027]
In the hardware entity, each program stored in the external storage device (or ROM etc.) and data
necessary for processing of each program are read into the memory as necessary, and
interpreted and processed appropriately by the CPU. As a result, the CPU realizes predetermined
functions (each component requirement expressed as the above-mentioned,...
[0028]
The present invention is not limited to the above-described embodiments, and various
modifications can be made without departing from the spirit of the present invention. Further,
the processing described in the above embodiments may be performed not only in chronological
order according to the order of description, but also in parallel or individually, depending on
the processing capability of the device that executes the processing or as necessary.
[0029]
As described above, when the processing function in the hardware entity (the apparatus of the
present invention) described in the above embodiment is implemented by a computer, the
processing content of the function that the hardware entity should have is described by a
program. Then, by executing this program on a computer, the processing function of the
hardware entity is realized on the computer.
[0030]
The program describing the processing content can be recorded on a computer-readable recording
medium. Any computer-readable recording medium may be used, such as a magnetic recording device,
an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically,
for example, a hard disk device, a flexible disk, or a magnetic tape can be used as the magnetic
recording device; a DVD (Digital Versatile Disc), a DVD-RAM (Random Access Memory), a CD-ROM
(Compact Disc Read Only Memory), or a CD-R (Recordable) / RW (Rewritable) as the optical disc;
an MO (Magneto-Optical disc) as the magneto-optical recording medium; and an EEP-ROM
(Electronically Erasable and Programmable-Read Only Memory) as the semiconductor memory.
[0031]
Further, this program is distributed, for example, by selling, transferring, lending, etc. a portable
recording medium such as a DVD, a CD-ROM or the like in which the program is recorded.
Furthermore, this program may be stored in a storage device of a server computer, and the
program may be distributed by transferring the program from the server computer to another
computer via a network.
[0032]
For example, a computer that executes such a program first temporarily stores, in its own
storage device, the program recorded on a portable recording medium or the program transferred
from a server computer. Then, when executing the processing, the computer reads the program
stored in its own storage device and executes processing according to the read program. As
another execution form of this program, the computer may read the program directly from the
portable recording medium and execute processing according to the program. Furthermore, each
time the program is transferred from the server computer to this computer, the computer may
sequentially execute processing according to the received program. In addition, the
above-described processing may be executed by a so-called ASP (Application Service Provider)
type service, which realizes the processing functions only by issuing execution instructions and
acquiring results, without transferring the program from the server computer to the computer.
Note that the program in the present embodiment includes information that is provided for
processing by a computer and is equivalent to a program (such as data that is not a direct
command to the computer but has a property that defines the processing of the computer).
[0033]
Further, in this embodiment, the hardware entity is configured by executing a predetermined
program on a computer, but at least a part of the processing content may be realized as
hardware.