Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2016226024
Abstract: A voice analysis device acquires speech with voice acquisition means and identifies the speaker based on the sound pressure ratio of the acquired speech. A terminal device includes a voice analysis unit that identifies, based on the ratio of the sound pressure of the voice acquired by a first microphone 11 to the sound pressure of the voice acquired by a second microphone 12, whether the voice acquired by the first microphone 11 and the second microphone 12 is the voice of the user wearing the first microphone 11 and the second microphone 12 or the voice of a person other than the user. The first microphone 11 and the second microphone 12 have a positional relationship in which the distance between the second microphone 12 and the user's mouth is equal to or less than the distance between the second microphone 12 and the first microphone 11. [Selected figure] Figure 3
Voice analysis device and voice analysis system
[0001]
The present invention relates to a voice analysis device and a voice analysis system.
[0002]
Patent Document 1 discloses the following prior art. This prior art is provided with first and second non-directional microphones, and with voice/non-voice detection means for detecting voice and non-voice based on the output signals from these non-directional microphones. A noise level is estimated from the output of the second microphone during non-voice intervals, and the output is amplified by variable signal amplification means according to that noise level. The output of the variable signal amplification means is then subtracted from the output of the first non-directional microphone. In this way, the directivity is changed according to whether the noise is high or low.
[0003]
Further, Patent Document 2 discloses the following prior art. This prior art configures a headset to produce an acoustically distinct audio signal in a noisy acoustic environment. The headset places a pair of microphones with a predetermined gap near the mouth of the user. Each microphone receives the user's voice and also receives the noise of the acoustic environment. The microphone signals, which contain both noise and information components, are passed to a separation process. The separation process produces a speech signal with substantially reduced noise components, which is then processed for transmission.
[0004]
JP 8-191496 A; JP 2008-507926 A (published Japanese translation of a PCT application)
[0005]
An object of the present invention is to provide a voice analysis device and a voice analysis system that acquire voice by voice acquisition means and identify the speaker based on non-verbal information of the acquired voice.
[0006]
The invention according to claim 1 is a voice analysis device comprising: a first voice acquisition means provided on a strap worn by a user; a second voice acquisition means provided on the strap at a position such that, when the strap is hung on the neck, the distance of the sound wave propagation path from the user's mouth to the second voice acquisition means is smaller than the distance of the sound wave propagation path from the user's mouth to the first voice acquisition means and is equal to or less than the distance between the second voice acquisition means and the first voice acquisition means; and an identification unit that identifies whether a voice is a voice generated by the user or a voice from another sound source other than the user, from the sound pressure ratio of the sound pressure of the sound acquired by the first voice acquisition means to the sound pressure of the sound acquired by the second voice acquisition means, using a result obtained from the distance relationships among the user's mouth, the first voice acquisition means, and the second voice acquisition means.
The invention according to claim 2 is the voice analysis device according to claim 1, characterized in that the strap has a tubular structure, and the first voice acquisition means and the second voice acquisition means provided on the strap are provided inside the strap.
The invention according to claim 3 is the voice analysis device according to claim 1 or 2, characterized in that the first voice acquisition means and the second voice acquisition means are provided in a distance relation of La1 ≒ 1.5×La2 to 4×La2, where La1 is the distance of the sound wave propagation path from the user's mouth to the first voice acquisition means and La2 is the distance of the sound wave propagation path from the user's mouth to the second voice acquisition means.
The invention according to claim 4 is the voice analysis device according to any one of claims 1 to 3, characterized in that, where G1 is the sound pressure of the sound acquired by the first voice acquisition means and G2 is the sound pressure of the sound acquired by the second voice acquisition means, the acquired voice is identified as voice from the other sound source in the case of G2/G1 < 2.
The invention according to claim 5 is a voice analysis device comprising: a first voice acquisition means and a second voice acquisition means arranged so that the distances of the sound wave propagation paths from the user's mouth to each of them differ from each other; and an identification unit that identifies, based on the sound pressure ratio of the sound pressure of the sound acquired by the first voice acquisition means to the sound pressure of the sound acquired by the second voice acquisition means, whether the voice acquired by the first voice acquisition means and the second voice acquisition means is the voice of the user of those means or voice from another sound source other than the user; wherein the second voice acquisition means is provided at a position where the distance between it and the user's mouth is equal to or less than the distance between it and the first voice acquisition means.
The invention according to claim 6 is the voice analysis device according to claim 5, wherein the identification unit performs filtering processing on the audio signals of the voices acquired by the first voice acquisition means and the second voice acquisition means, and compares the sound pressures after removing noise from the acquired voices.
The invention according to claim 7 is the voice analysis device according to claim 5 or 6, characterized in that the first voice acquisition means and the second voice acquisition means are provided in a distance relation of La1 ≒ 1.5×La2 to 4×La2, where La1 is the distance of the sound wave propagation path from the user's mouth to the first voice acquisition means and La2 is the distance of the sound wave propagation path from the user's mouth to the second voice acquisition means.
The invention according to claim 8 is the voice analysis device according to any one of claims 5 to 7, characterized in that, where G1 is the sound pressure of the sound acquired by the first voice acquisition means and G2 is the sound pressure of the sound acquired by the second voice acquisition means, the acquired voice is identified as voice from the other sound source in the case of G2/G1 < 2.
The invention according to claim 9 is a voice analysis system comprising a terminal device worn by a user and a host device that acquires information from the terminal device. The terminal device comprises: a first voice acquisition means; a second voice acquisition means arranged so that the distance of the sound wave propagation path from the user's mouth to it differs from that to the first voice acquisition means and is equal to or less than the distance between the first voice acquisition means and the second voice acquisition means; an identification unit that identifies, with regard to the sound pressure ratio of the sounds acquired by the first voice acquisition means and the second voice acquisition means, whether the acquired voice is the voice of the user wearing the terminal device or voice from another sound source, using a result obtained from the distance relationships among the user's mouth, the first voice acquisition means, and the second voice acquisition means, and between the other sound source and the first voice acquisition means and the second voice acquisition means; and a transmission unit that transmits to the host device utterance information, which is information related to the voice signals acquired by the first voice acquisition means and the second voice acquisition means and includes the identification result by the identification unit. The host device comprises: a reception unit that receives the utterance information transmitted from a plurality of the terminal devices; an accumulation unit that accumulates the utterance information received by the reception unit for each terminal device that transmitted it; an analysis unit that analyzes the utterance information accumulated in the accumulation unit; and an output unit that outputs the analysis result by the analysis unit.
The invention according to claim 10 is the voice analysis system according to claim 9, wherein, as one of its analyses, the analysis unit compares the utterance information acquired from a plurality of the terminal devices and identifies, based on time information on the utterances included in the utterance information, the utterance information of users participating in a specific conversation.
The invention according to claim 11 is the voice analysis system according to claim 9 or 10, wherein the analysis unit extracts a feature of the conversation based on the degree of interaction, represented by the number of times the speaker changes during the conversation and the dispersion of the time until the speaker changes.
The invention according to claim 12 is the voice analysis system according to any one of claims 9 to 11, wherein the analysis unit extracts a feature of the conversation by the degree of listening, represented by the ratio of each conversation participant's own utterance time to the utterance time of others in the utterance information.
The invention according to claim 13 is the voice analysis system according to any one of claims 9 to 12, wherein the analysis unit extracts a feature of the conversation based on the degree of conversation activity, represented by the ratio of the time during which none of the conversation participants is speaking to the whole of the utterance information.
[0007]
According to the invention of claim 1, the speaker can be identified by using the sound pressure ratio as non-verbal information of the recorded voice. According to the invention of claim 2, it is expected that the user can wear the device without being conscious of the presence of the voice acquisition means, as compared with the case where the voice acquisition means is outside the strap. According to the invention of claim 3, the sound pressure ratio of the voices recorded by the first voice acquisition means and the second voice acquisition means is increased, so that the identification performance for the speaker is improved. According to the invention of claim 4, in the configuration in which the first voice acquisition means and the second voice acquisition means are provided on the strap, the speaker can be accurately identified based on the sound pressure ratio of the recorded voice. According to the invention of claim 5, identification of the speaker using the sound pressure ratio as non-verbal information of the recorded voice is realized. According to the invention of claim 6, by removing the noise component from the recorded voice, the performance of discriminating another person's speech is improved. According to the invention of claim 7, the sound pressure ratio of the voices recorded by the first voice acquisition means and the second voice acquisition means is increased, so that the identification performance for the speaker is improved. According to the invention of claim 8, in the configuration having a positional relationship in which the distance between the second voice acquisition means and the user's mouth is equal to or less than the distance between the second voice acquisition means and the first voice acquisition means, the speaker can be accurately identified based on the sound pressure ratio of the recorded voice. According to the invention of claim 9, the utterance situations of a plurality of speakers can be analyzed. According to the invention of claim 10, utterance information concerning a conversation by specific speakers can be extracted from among the utterance information of a plurality of speakers. According to the invention of claim 11, an analysis based on the degree of interaction can be performed on the utterance situations of a plurality of speakers. According to the invention of claim 12, an analysis based on the degree of listening can be performed on the utterance situations of a plurality of speakers. According to the invention of claim 13, an analysis based on the degree of conversation activity can be performed on the utterance situations of a plurality of speakers.
[0008]
FIG. 1 is a diagram showing a configuration example of the voice analysis system according to the present embodiment. FIG. 2 is a diagram showing a configuration example of the terminal device in the present embodiment. FIG. 3 is a diagram showing the positional relationship between the mouths (speaking parts) of the wearer and another person and the microphones. FIG. 4 is a diagram showing the relationship between the distance of the sound wave propagation path between a microphone and a sound source and the sound pressure (input volume). FIG. 5 is a diagram showing a method of distinguishing the wearer's own uttered voice from another person's uttered voice. FIG. 6 is a flowchart showing the operation of the terminal device in the present embodiment. FIG. 7 is a diagram showing a state in which a plurality of wearers, each wearing the terminal device of the present embodiment, are in conversation. FIG. 8 is a diagram showing an example of the utterance information of each terminal device in the conversation situation of FIG. 7. FIG. 9 is a diagram showing an example of the functional configuration of the host device in the present embodiment.
[0009]
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. <System Configuration Example> FIG. 1 is a diagram showing a configuration example of the voice analysis system according to the present embodiment. As shown in FIG. 1, the system of the present embodiment comprises a terminal device 10 and a host device 20. The terminal device 10 and the host device 20 are connected via a wireless communication line. As the type of wireless communication line, a line according to an existing system such as Wi-Fi (trademark) (Wireless Fidelity), Bluetooth (trademark), ZigBee (trademark), or UWB (Ultra Wideband) may be used. Although only one terminal device 10 is shown in the illustrated example, the terminal device 10 is, as described in detail later, worn and used by each user; in practice, as many terminal devices 10 as there are users are prepared. Hereinafter, a user wearing the terminal device 10 is referred to as a wearer.
[0010]
The terminal device 10 includes, as voice acquisition means, at least one set of microphones (a first microphone 11 and a second microphone 12) and amplifiers (a first amplifier 13 and a second amplifier 14). The terminal device 10 also includes, as processing means, a voice analysis unit 15 that analyzes the recorded voices and a data transmission unit 16 that transmits the analysis result to the host device 20, and further includes a power supply unit 17.
[0011]
The first microphone 11 and the second microphone 12 are arranged at positions that differ in the distance of the sound wave propagation path from the wearer's mouth (speaking part) (hereinafter simply referred to as "distance"). Here, the first microphone 11 is placed at a position far from the wearer's mouth (speaking part) (e.g., about 35 cm), and the second microphone 12 at a position near it (e.g., about 10 cm). As the microphones used as the first microphone 11 and the second microphone 12 of the present embodiment, various existing types such as dynamic and condenser microphones may be used. In particular, it is preferable to use non-directional MEMS (Micro Electro Mechanical Systems) microphones.
[0012]
The first amplifier 13 and the second amplifier 14 amplify the electrical signals (audio signals) that the first microphone 11 and the second microphone 12 output in accordance with the acquired voice. As the amplifiers used as the first amplifier 13 and the second amplifier 14 of the present embodiment, existing operational amplifiers or the like may be used.
[0013]
The voice analysis unit 15 analyzes the voice signals output from the first amplifier 13 and the second amplifier 14, and determines whether the voice acquired by the first microphone 11 and the second microphone 12 is a voice uttered by the wearer of the terminal device 10 or a voice uttered by another person. That is, the voice analysis unit 15 functions as an identification unit that identifies the speaker of a voice based on the voices acquired by the first microphone 11 and the second microphone 12. The specific processing for speaker identification will be described later.
[0014]
The data transmission unit 16 transmits the acquired data, including the analysis result by the voice analysis unit 15 and the ID of the terminal device 10, to the host device 20 via the above-described wireless communication line. Depending on the contents of the processing performed in the host device 20, the transmitted information may include, in addition to the analysis result, information such as the acquisition times of the voices by the first microphone 11 and the second microphone 12 and the sound pressures of the acquired voices. The terminal device 10 may also be provided with a data storage unit for storing analysis results by the voice analysis unit 15, and the data stored over a certain period may be transmitted in a batch. The data may also be transmitted over a wired line.
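As an illustration only, the following sketch shows one possible shape for such a transmitted record; the field names and types are hypothetical, since the text above only lists the kinds of information that may be transmitted.

```python
# Sketch: a hypothetical utterance record as transmitted to the host device.
# Field names and types are illustrative assumptions based on the text above.
from dataclasses import dataclass

@dataclass
class UtteranceRecord:
    terminal_id: str         # ID of the terminal device 10
    speaker: str             # identification result: "wearer" or "other"
    acquired_at: float       # acquisition time of the voice (epoch seconds)
    sound_pressure_1: float  # average sound pressure at the first microphone 11
    sound_pressure_2: float  # average sound pressure at the second microphone 12
```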
[0015]
The power supply unit 17 supplies power to the first microphone 11, the second microphone 12,
the first amplifier 13, the second amplifier 14, the voice analysis unit 15, and the data
transmission unit 16 described above. As a power supply, for example, an existing power supply
such as a dry battery or a rechargeable battery is used. Further, the power supply unit 17
includes known circuits such as a voltage conversion circuit and a charge control circuit, as
necessary.
[0016]
The host device 20 includes a data receiving unit 21 that receives the data transmitted from the terminal devices 10, a data storage unit 22 that stores the received data, a data analysis unit 23 that analyzes the stored data, and an output unit 24 that outputs the analysis result. The host device 20 is realized by, for example, an information processing device such as a personal computer. Further, as described above, a plurality of terminal devices 10 are used in the present embodiment, and the host device 20 receives data from each of the plurality of terminal devices 10.
[0017]
The data receiving unit 21 corresponds to the above-described wireless communication line; it receives the data from each terminal device 10 and sends it to the data storage unit 22. The data storage unit 22 is realized by, for example, a storage device such as the magnetic disk device of a personal computer, and stores the received data acquired from the data receiving unit 21 for each speaker. Here, the speaker is identified by collating the terminal ID transmitted from the terminal device 10 with the speaker names and terminal IDs registered in the host device 20 in advance. Instead of the terminal ID, the wearer's name may be transmitted from the terminal device 10.
[0018]
The data analysis unit 23 is realized by, for example, the program-controlled CPU of a personal computer, and analyzes the data stored in the data storage unit 22. The specific analysis contents and methods can take various forms according to the purpose and mode of use of the system of the present embodiment. For example, the frequency of dialogue between wearers of the terminal devices 10 and the tendencies of each wearer's dialogue partners may be analyzed, or the relationships between interlocutors may be inferred from information on the length and sound pressure of each utterance in a dialogue.
[0019]
The output unit 24 outputs the analysis result by the data analysis unit 23 or performs output based on the analysis result. The output means can take various forms, such as display on a screen, print output by a printer, and audio output, depending on the purpose and mode of use of the system and the contents and format of the analysis result.
[0020]
<Configuration Example of Terminal Device> FIG. 2 is a diagram showing a configuration example of the terminal device 10. As described above, the terminal device 10 is worn and used by each user. As shown in FIG. 2, the terminal device 10 of the present embodiment is configured to include an apparatus main body 30 and a strap 40 connected to the apparatus main body 30, so that the user can wear it. In the illustrated configuration, the user hangs the strap 40 around the neck, from which the apparatus main body 30 is suspended.
[0021]
The apparatus main body 30 is configured by housing, in a thin rectangular parallelepiped case 31 formed of metal or resin, at least the circuits realizing the first amplifier 13, the second amplifier 14, the voice analysis unit 15, the data transmission unit 16, and the power supply unit 17, as well as the power supply (battery) of the power supply unit 17. The case 31 may be provided with a pocket into which an ID card or the like displaying ID information such as the name or affiliation of the wearer is inserted. Such ID information may also be printed on the surface of the case 31 itself, or a sticker on which the ID information is written may be attached.
[0022]
The strap 40 is provided with the first microphone 11 and the second microphone 12 (hereinafter referred to as the microphones 11 and 12 when the first microphone 11 and the second microphone 12 need not be distinguished). The microphones 11 and 12 are connected to the first amplifier 13 and the second amplifier 14 housed in the apparatus main body 30 by cables (electric wires or the like) passing through the inside of the strap 40. As the material of the strap 40, various existing materials may be used, such as leather, synthetic leather, natural fibers such as cotton, synthetic fibers such as resin, and metal. A coating treatment using silicone resin, fluororesin, or the like may also be applied.
[0023]
The strap 40 has a tubular structure, and the microphones 11 and 12 are housed inside the strap 40. Providing the microphones 11 and 12 inside the strap 40 prevents them from being damaged or soiled, and keeps the conversation partner from being conscious of their presence. The first microphone 11, which is placed at a position far from the wearer's mouth (speaking part), may instead be incorporated in the case 31 and provided in the apparatus main body 30. In the present embodiment, the case where the first microphone 11 is provided on the strap 40 is described as an example.
[0024]
Referring to FIG. 2, the first microphone 11 is provided at the end of the strap 40 connected to the apparatus main body 30 (for example, at a position within 10 cm of the connection site). As a result, when the wearer hangs the strap 40 around the neck with the apparatus main body 30 suspended, the first microphone 11 is located approximately 30 cm to 40 cm away from the wearer's mouth (speaking part). When the first microphone 11 is provided in the apparatus main body 30 instead, the distance from the wearer's mouth (speaking part) to the first microphone 11 is approximately the same.
[0025]
The second microphone 12 is provided at a position away from the end of the strap 40 connected to the apparatus main body 30 (for example, at a position about 20 cm to 30 cm from the connection site). Thus, when the wearer hangs the strap 40 around the neck with the apparatus main body 30 suspended, the second microphone 12 is located at the wearer's neck (for example, at a position over the clavicle), about 10 cm to 20 cm away from the wearer's mouth (speaking part).
[0026]
The terminal device 10 of the present embodiment is not limited to the configuration shown in FIG. 2. For example, it suffices to specify a positional relationship between the first microphone 11 and the second microphone 12 such that the distance (of the sound wave propagation path) from the first microphone 11 to the wearer's mouth (speaking part) is approximately several times the distance (of the sound wave propagation path) from the second microphone 12 to the wearer's mouth (speaking part). Accordingly, the first microphone 11 may be provided on the part of the strap 40 behind the neck. The microphones 11 and 12 are also not limited to the configuration of being provided on the strap 40 as described above, and may be attached to the wearer by various methods. For example, the first microphone 11 and the second microphone 12 may each be individually fixed to clothing with a pin or the like. A dedicated mounting fixture designed to fix the first microphone 11 and the second microphone 12 in the desired positional relationship may also be prepared and worn.
[0027]
Further, the apparatus main body 30 is not limited to the configuration shown in FIG. 2 of being connected to the strap 40 and hung from the wearer's neck, as long as it can be easily carried. For example, instead of a strap as in the present embodiment, it may be configured to be attached to clothing or the body with a clip or belt, or simply to be carried in a pocket or the like. The function of receiving, amplifying, and analyzing the audio signals from the microphones 11 and 12 may also be realized by a mobile phone or another existing portable electronic information terminal. However, when the first microphone 11 is provided in the apparatus main body 30, the positional relationship between the first microphone 11 and the second microphone 12 must be maintained as described above, so the position of the apparatus main body 30 when carried is specified.
[0028]
Furthermore, the microphones 11 and 12 and the apparatus main body 30 (or the voice analysis unit 15) may be connected by wireless communication rather than by a wired connection. Although the first amplifier 13, the second amplifier 14, the voice analysis unit 15, the data transmission unit 16, and the power supply unit 17 are housed in the single case 31 in the above configuration example, they may be divided among a plurality of units. For example, the power supply unit 17 need not be housed in the case 31, and the device may be connected to and used with an external power supply.
[0029]
<Identification of Speaker (Self or Other) Based on Non-verbal Information of Recorded Voice> Next, the method of identifying a speaker in the present embodiment will be described. The system according to the present embodiment uses information on the voices recorded by the two microphones 11 and 12 provided in the terminal device 10 to discriminate between the uttered voice of the wearer of the terminal device 10 and the uttered voice of another person. In other words, the present embodiment identifies whether the speaker of a recorded voice is the wearer or someone else. Further, in the present embodiment, the speaker is identified based on non-linguistic information such as sound pressure (input volume to the microphones 11 and 12) among the information of the recorded voice, rather than on linguistic information obtained by using morphological analysis or dictionary information. That is, the speaker of the voice is identified from the utterance situation specified by non-linguistic information, not from the utterance content specified by linguistic information.
[0030]
As described with reference to FIGS. 1 and 2, in the present embodiment, the first microphone 11 of the terminal device 10 is placed at a position far from the wearer's mouth (speaking part), and the second microphone 12 at a position close to it. That is, with the wearer's mouth (speaking part) as the sound source, the distance between the first microphone 11 and the sound source and the distance between the second microphone 12 and the sound source differ greatly. Specifically, the distance between the first microphone 11 and the sound source is about 1.5 to 4 times the distance between the second microphone 12 and the sound source. Here, the sound pressure of the voice recorded at the microphones 11 and 12 attenuates with increasing distance between the microphone and the sound source (distance attenuation). Therefore, for the wearer's uttered voice, the sound pressure of the recorded voice at the first microphone 11 differs greatly from the sound pressure of the recorded voice at the second microphone 12.
[0031]
On the other hand, when the mouth (speaking part) of a person other than the wearer is the sound source, the distance between the first microphone 11 and the sound source and the distance between the second microphone 12 and the sound source do not differ significantly, since the other person is at some distance from the wearer. A difference between the two may arise depending on the other person's position relative to the wearer, but the distance between the first microphone 11 and the sound source never reaches several times the distance between the second microphone 12 and the sound source, as it does when the wearer's mouth (speaking part) is the sound source. Therefore, for another person's uttered voice, the sound pressure of the recorded voice at the first microphone 11 does not differ greatly from the sound pressure of the recorded voice at the second microphone 12, unlike the case of the wearer's own uttered voice.
[0032]
FIG. 3 is a diagram showing the positional relationship between the mouths (speaking parts) of the wearer and another person and the microphones 11 and 12. In the relationship shown in FIG. 3, the distance between the sound source a, which is the wearer's mouth (speaking part), and the first microphone 11 is La1, and the distance between the sound source a and the second microphone 12 is La2. The distance between the sound source b, which is the other person's mouth (speaking part), and the first microphone 11 is Lb1, and the distance between the sound source b and the second microphone 12 is Lb2. In this case, the following relationships hold.
La1 > La2 (La1 ≒ 1.5×La2 to 4×La2), Lb1 ≒ Lb2
[0033]
FIG. 4 is a diagram showing the relationship between the distance between the microphones 11 and 12 and a sound source and the sound pressure (input volume). As described above, the sound pressure attenuates according to the distance between the microphone and the sound source. In FIG. 4, when the sound pressure Ga1 at the distance La1 and the sound pressure Ga2 at the distance La2 are compared, the sound pressure Ga2 is about four times the sound pressure Ga1. On the other hand, since the distance Lb1 and the distance Lb2 are approximately equal, the sound pressure Gb1 at the distance Lb1 and the sound pressure Gb2 at the distance Lb2 are also substantially equal. Therefore, in the present embodiment, this difference in sound pressure ratios is used to discriminate between the wearer's own uttered voice and another person's uttered voice in the recorded voice. The distances Lb1 and Lb2 are 60 cm in the example shown in FIG. 4, but this merely illustrates that the sound pressures Gb1 and Gb2 are almost equal; the distances Lb1 and Lb2 are not limited to the values shown in the figure.
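As a rough numerical check of this relationship, the following sketch assumes a simple inverse-distance (1/r) attenuation model; the model and the source level are illustrative assumptions for the distances given above, not specifications from this description.

```python
# Sketch: sound pressure ratios under a simple 1/r distance-attenuation model.
# The 1/r model and source level are illustrative assumptions, not from the text.

def sound_pressure(source_level: float, distance_m: float) -> float:
    """Sound pressure at a microphone, attenuating inversely with distance."""
    return source_level / distance_m

# Wearer's mouth (sound source a): La1 ~ 35 cm, La2 ~ 10 cm (values from the text).
ga1 = sound_pressure(1.0, 0.35)   # first microphone 11
ga2 = sound_pressure(1.0, 0.10)   # second microphone 12
print(ga2 / ga1)  # ~3.5: several times larger, as in FIG. 4

# Another person's mouth (sound source b): Lb1 ~ Lb2 ~ 60 cm.
gb1 = sound_pressure(1.0, 0.60)
gb2 = sound_pressure(1.0, 0.60)
print(gb2 / gb1)  # ~1.0: nearly equal
```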
[0034]
FIG. 5 is a diagram showing the method of discriminating between the wearer's own uttered voice and another person's uttered voice. As described with reference to FIG. 4, for the wearer's uttered voice, the sound pressure Ga2 at the second microphone 12 is several times (for example, about 4 times) the sound pressure Ga1 at the first microphone 11. For another person's uttered voice, the sound pressure Gb2 at the second microphone 12 is substantially equal to (about 1 times) the sound pressure Gb1 at the first microphone 11. Therefore, in the present embodiment, a threshold is set on the ratio of the sound pressure at the second microphone 12 to the sound pressure at the first microphone 11. A voice whose sound pressure ratio is larger than the threshold is judged to be the wearer's own utterance, and a voice whose sound pressure ratio is smaller than the threshold is judged to be another person's utterance. In the example shown in FIG. 5, the threshold is 2: the sound pressure ratio Ga2/Ga1 exceeds the threshold 2, so the voice is judged to be the wearer's utterance, while the sound pressure ratio Gb2/Gb1 is smaller than the threshold 2, so the voice is judged to be another person's utterance.
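The decision rule itself can be stated compactly. A minimal sketch in Python, using the threshold value 2 from the example of FIG. 5 (the function and parameter names are ours):

```python
def classify_speaker(g1: float, g2: float, threshold: float = 2.0) -> str:
    """Classify an utterance from the sound pressures at the two microphones.

    g1: average sound pressure at the first (far) microphone 11
    g2: average sound pressure at the second (near) microphone 12
    """
    if g2 / g1 > threshold:
        return "wearer"   # large ratio: source close to the second microphone
    return "other"        # ratio near 1: distant source, e.g. another speaker

print(classify_speaker(g1=1.0, g2=4.0))  # wearer (Ga2/Ga1 ~ 4)
print(classify_speaker(g1=1.0, g2=1.0))  # other  (Gb2/Gb1 ~ 1)
```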
[0035]
The voices recorded by the microphones 11 and 12 include so-called noise, such as environmental sound, in addition to uttered voices. The distance relationship between a noise source and the microphones 11 and 12 is similar to that for another person's uttered voice. That is, following the examples shown in FIGS. 4 and 5, with the distance between a noise source c and the first microphone 11 denoted Lc1 and the distance between the noise source c and the second microphone 12 denoted Lc2, the distances Lc1 and Lc2 are approximately equal, and the sound pressure ratio Gc2/Gc1 of the recorded sounds at the microphones 11 and 12 is smaller than the threshold 2. However, such noise is separated and removed from the uttered voice by filtering processing using existing techniques such as a band-pass filter or a gain filter.
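As one concrete form such filtering could take, the following sketch applies a Butterworth band-pass filter over a typical speech band using SciPy; the passband edges, filter order, and sampling rate are illustrative assumptions, since the description only names the filter types.

```python
# Sketch: band-pass filtering to suppress out-of-band environmental noise.
# Passband (100 Hz - 4 kHz), order, and sample rate are illustrative choices.
import numpy as np
from scipy.signal import butter, sosfilt

def bandpass_speech(signal: np.ndarray, fs: int = 16000) -> np.ndarray:
    sos = butter(4, [100, 4000], btype="bandpass", fs=fs, output="sos")
    return sosfilt(sos, signal)
```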
[0036]
<Operation Example of Terminal Device> FIG. 6 is a flowchart showing the operation of the terminal device 10 in the present embodiment. As shown in FIG. 6, when the microphones 11 and 12 of the terminal device 10 acquire voice, electrical signals (audio signals) corresponding to the acquired voice are sent from the microphones 11 and 12 to the first amplifier 13 and the second amplifier 14 (step 601). When the first amplifier 13 and the second amplifier 14 acquire the audio signals from the microphones 11 and 12, they amplify the signals and send them to the voice analysis unit 15 (step 602).
[0037]
The voice analysis unit 15 performs filtering processing on the signals amplified by the first amplifier 13 and the second amplifier 14 to remove noise components such as environmental sound from the signals (step 603). Next, for the signals from which the noise components have been removed, the voice analysis unit 15 determines the average sound pressure of the voice recorded by each of the microphones 11 and 12 at predetermined time units (for example, several tenths of a second to several hundredths of a second) (step 604).
[0038]
If a gain in the average sound pressures of the microphones 11 and 12 determined in step 604 is present (Yes in step 605), the voice analysis unit 15 determines that an uttered voice is present (an utterance is being performed), and then determines the ratio (sound pressure ratio) between the average sound pressure at the first microphone 11 and the average sound pressure at the second microphone 12 (step 606). If the sound pressure ratio determined in step 606 is larger than the threshold (Yes in step 607), the voice analysis unit 15 determines that the uttered voice is the wearer's own utterance (step 608). If the sound pressure ratio determined in step 606 is smaller than the threshold (No in step 607), the voice analysis unit 15 determines that the uttered voice is another person's utterance (step 609).
[0039]
On the other hand, when no gain in the average sound pressures of the microphones 11 and 12 determined in step 604 is present (No in step 605), the voice analysis unit 15 determines that no uttered voice is present (no utterance is being performed) (step 610). The determination in step 605 may allow for noise that could not be removed by the filtering processing in step 603 remaining in the signal, and judge that a gain is present only if the value of the average sound pressure gain is a certain value or more.
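Steps 604 to 610 can be combined into a frame-by-frame sketch as follows; the frame length, the minimum-gain value standing in for the step 605 check, and the helper names are illustrative assumptions.

```python
# Sketch of steps 604-610: average sound pressure per frame, presence check,
# then the ratio test. Frame size and min_gain are illustrative assumptions.
import numpy as np

def analyze_frames(sig1: np.ndarray, sig2: np.ndarray, fs: int = 16000,
                   frame_s: float = 0.05, min_gain: float = 0.01,
                   threshold: float = 2.0) -> list:
    n = int(fs * frame_s)
    results = []
    for start in range(0, min(len(sig1), len(sig2)) - n + 1, n):
        g1 = np.abs(sig1[start:start + n]).mean()  # avg sound pressure, mic 11 (step 604)
        g2 = np.abs(sig2[start:start + n]).mean()  # avg sound pressure, mic 12 (step 604)
        if g1 < min_gain or g2 < min_gain:         # step 605: no utterance present
            results.append("silence")              # step 610
        elif g2 / g1 > threshold:                  # steps 606-607: ratio test
            results.append("wearer")               # step 608
        else:
            results.append("other")                # step 609
    return results
```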
[0040]
Thereafter, the voice analysis unit 15 causes the data transmission unit 16 to transmit the information obtained in the processing of steps 604 to 610 (the presence or absence of an utterance and information on the speaker) to the host device 20 as an analysis result (step 611). At this time, additional information such as the length of the utterance time of each speaker (the wearer or another person) and the value of the average sound pressure gain may be transmitted to the host device 20 together with the analysis result.
[0041]
In the present embodiment, whether an uttered voice is the wearer's own utterance or another person's utterance is determined by comparing the sound pressure at the first microphone 11 with the sound pressure at the second microphone 12. However, the speaker identification according to the present embodiment may be performed based on any non-verbal information extracted from the audio signals themselves acquired by the microphones 11 and 12, and is not limited to the comparison of sound pressures. For example, the voice acquisition time (the output time of the audio signal) at the first microphone 11 may be compared with the voice acquisition time at the second microphone 12. In this case, for the wearer's own uttered voice, the difference between the distance from the wearer's mouth (speaking part) to the first microphone 11 and the distance from the wearer's mouth (speaking part) to the second microphone 12 is large, so a certain difference (time difference) arises in the voice acquisition times. On the other hand, for another person's uttered voice, the difference between the distance from the other person's mouth (speaking part) to the first microphone 11 and the distance to the second microphone 12 is small, so the time difference between the voice acquisition times is smaller than in the case of the wearer's own uttered voice. Therefore, a threshold may be set on the time difference between the voice acquisition times: if the time difference is larger than the threshold, the voice is determined to be the wearer's utterance, and if the time difference is smaller than the threshold, it is determined to be another person's utterance.
[0042]
<Application Example of System and Functions of Host Device> In the system according to the present embodiment, information related to utterances obtained as described above by a plurality of terminal devices 10 (hereinafter referred to as utterance information) is collected in the host device 20. Using the information obtained from the plurality of terminal devices 10, the host device 20 performs various analyses according to the purpose and mode of use of the system. Below, an example is described in which the present embodiment is used as a system for acquiring information on the communication of a plurality of wearers.
[0043]
FIG. 7 is a diagram showing a state in which a plurality of wearers, each wearing the terminal device 10 of the present embodiment, are in conversation. FIG. 8 is a diagram showing an example of the utterance information of each of the terminal devices 10A and 10B in the conversation situation of FIG. 7. As shown in FIG. 7, consider the case where two wearers A and B, each wearing the terminal device 10, are in conversation. At this time, a voice recognized as the wearer's utterance by the terminal device 10A of wearer A is recognized as another person's utterance by the terminal device 10B of wearer B. Conversely, a voice recognized as the wearer's utterance by the terminal device 10B is recognized as another person's utterance by the terminal device 10A.
[0044]
Utterance information is sent to the host device 20 independently from the terminal device 10A and the terminal device 10B. At this time, as shown in FIG. 8, the utterance information acquired from the terminal device 10A and the utterance information acquired from the terminal device 10B are opposite in the identification result of the speaker (wearer or other person), but the information indicating the utterance situation, such as the lengths of the utterance times and the timings at which the speaker switches, is closely similar. Therefore, the host device 20 in this application example compares the information acquired from the terminal device 10A with the information acquired from the terminal device 10B, determines that these pieces of information indicate the same utterance situation, and recognizes that wearer A and wearer B are in conversation. Here, as the information indicating the utterance situation, at least time information on the utterances is used, such as the length of the utterance time of each utterance by each speaker, the start and end times of each utterance, and the times (timings) at which the speaker switches. Only part of this time information on the utterances may be used to determine the utterance situation of a particular conversation, or other information may be used additionally.
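As an illustration of this matching idea, the following sketch treats the speaker-switch timings reported by two terminals as belonging to the same conversation when they nearly coincide; the record format and tolerance are assumptions, since the description leaves the comparison method open.

```python
# Sketch: detect that two terminals observed the same conversation by
# comparing speaker-switch timings. Record format and tolerance are assumptions.

def same_conversation(switches_a: list, switches_b: list, tol_s: float = 0.5) -> bool:
    """switches_a/b: sorted times (s) at which each terminal saw the speaker change."""
    if len(switches_a) != len(switches_b):
        return False
    return all(abs(ta - tb) <= tol_s for ta, tb in zip(switches_a, switches_b))

# Terminal 10A and 10B report nearly identical switch times for one conversation.
print(same_conversation([3.1, 8.0, 12.4], [3.2, 8.1, 12.3]))  # True
```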
[0045]
FIG. 9 is a diagram showing an example of the functional configuration of the host device 20 in this application example. In this application example, the host device 20 includes a conversation information detection unit 201 that detects, from among the utterance information acquired from the terminal devices 10, the utterance information (hereinafter, conversation information) from the terminal devices 10 of wearers who are in conversation, and a conversation information analysis unit 202 that analyzes the detected conversation information. The conversation information detection unit 201 and the conversation information analysis unit 202 are realized as functions of the data analysis unit 23.
[0046]
Utterance information is also sent to the host device 20 from terminal devices 10 other than the terminal device 10A and the terminal device 10B. The utterance information from each terminal device 10 received by the data receiving unit 21 is stored in the data storage unit 22. The conversation information detection unit 201 of the data analysis unit 23 then reads out the utterance information of each terminal device 10 stored in the data storage unit 22 and detects the conversation information, that is, the utterance information related to a specific conversation.
[0047]
As shown in FIG. 8 described above, a characteristic correspondence, different from the utterance information of the other terminal devices 10, is extracted between the utterance information of the terminal device 10A and that of the terminal device 10B. The conversation information detection unit 201 compares the utterance information acquired from each terminal device 10 and stored in the data storage unit 22, detects, from among the utterance information acquired from the plurality of terminal devices 10, the pieces having the correspondence described above, and identifies them as utterance information related to the same conversation. Since utterance information is sent to the host device 20 from the plurality of terminal devices 10 continually, the conversation information detection unit 201 performs the above processing while sequentially dividing the utterance information into fixed time segments, for example, and determines whether conversation information related to a specific conversation is included.
[0048]
The conditions under which the conversation information detection unit 201 detects conversation information related to a specific conversation from the utterance information of the plurality of terminal devices 10 are not limited to the correspondence shown in FIG. 8 described above. Conversation information related to a specific conversation may be detected from the plurality of pieces of utterance information by any method.
[0049]
In the above example, two wearers each wearing the terminal device 10 are in conversation, but the number of conversation participants is not limited to two. When three or more wearers are in conversation, the terminal device 10 worn by each wearer recognizes the uttered voice of the wearer of its own device as the wearer's own utterance and distinguishes it from the uttered voices of the others (two or more people). However, the information indicating the utterance situation, such as the utterance times and the timings at which the speaker switches, is closely similar across the information acquired by each terminal device 10. Therefore, as in the case of the two-person conversation above, the conversation information detection unit 201 detects the utterance information acquired from the terminal devices 10 of the wearers participating in the same conversation and distinguishes it from the utterance information acquired from the terminal devices 10 of wearers not participating in the conversation.
[0050]
Next, the conversation information analysis unit 202 analyzes the conversation information detected by the conversation information detection unit 201 and extracts features of the conversation. In the present embodiment, as a specific example, features of the conversation are extracted based on three evaluation criteria: the degree of interaction, the degree of listening, and the degree of conversation activity. Here, the degree of interaction represents the balance of the utterance frequencies of the conversation participants. The degree of listening represents the degree to which each conversation participant listens to the utterances of the others. The degree of conversation activity represents the density of utterances in the conversation as a whole.
[0051]
The degree of interaction is specified by the number of times the speaker changes during a conversation and the variation of the time until the speaker changes (the time during which one speaker speaks continuously). These are obtained from the number of speaker changes and the times at which the speaker changed in the conversation information for a fixed time. The value (level) of the degree of interaction is taken to be larger as the number of speaker changes is larger and the variation of each speaker's continuous utterance time is smaller. This evaluation criterion is common to all the conversation information (the utterance information of each terminal device 10) related to the same conversation.
[0052]
The degree of listening is specified, for each conversation participant, by the ratio of that participant's own utterance time to the utterance time of the others in the utterance information. For example, with the following formula, the value (level) of the degree of listening is taken to be larger as the value is larger: degree of listening = (utterance time of others) ÷ (utterance time of the wearer). This evaluation criterion differs for each piece of utterance information acquired from the terminal device 10 of each conversation participant, even when the conversation information relates to the same conversation.
[0053]
The degree of conversation activity is an index representing the so-called liveliness of the conversation, and is specified by the ratio of silent time (time during which none of the conversation participants is speaking) to the whole of the conversation information. The shorter the total silent time, the more often one of the conversation participants is speaking in the conversation, and the larger the value (level) of the conversation activity is taken to be. This evaluation criterion is common to all the conversation information (the utterance information of each terminal device 10) related to the same conversation.
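The three evaluation criteria reduce to simple statistics over utterance records. A minimal sketch follows, assuming each utterance is a (speaker, start, end) tuple; note that how the degree of interaction combines its two quantities into one value is not specified here, so the combination below is purely illustrative.

```python
# Sketch: the three evaluation criteria over a list of utterances.
# An utterance is (speaker, start_s, end_s); this representation is our assumption.
import numpy as np

turns = [("A", 0.0, 4.0), ("B", 4.5, 6.0), ("A", 6.2, 9.0), ("B", 9.5, 12.0)]

# Degree of interaction: many speaker changes with low variation of continuous
# utterance time -> higher. Combining the two into one score is illustrative.
changes = sum(1 for prev, cur in zip(turns, turns[1:]) if prev[0] != cur[0])
lengths = [end - start for _, start, end in turns]
interaction = changes / (1.0 + np.var(lengths))

# Degree of listening for wearer "A": others' utterance time / own utterance time.
own = sum(end - start for spk, start, end in turns if spk == "A")
others = sum(end - start for spk, start, end in turns if spk != "A")
listening_A = others / own

# Degree of conversation activity: less silence relative to the whole -> higher.
total = turns[-1][2] - turns[0][1]
silence = total - sum(lengths)
activity = 1.0 - silence / total

print(interaction, listening_A, activity)
```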
[0054]
As described above, the analysis of the conversation information by the conversation information analysis unit 202 extracts features of the conversation to which the conversation information relates. The analysis also identifies how each participant participates in the conversation. Note that the above evaluation criteria are only examples of information representing the features of a conversation; evaluation criteria suited to the purpose and mode of use of the system of the present embodiment may be set by adopting other evaluation items or by weighting each item.
[0055]
By performing the above analysis on the various pieces of conversation information detected by the conversation information detection unit 201 from among the utterance information stored in the data storage unit 22, communication tendencies in the whole group of wearers of the terminal devices 10 can be analyzed. Specifically, for example, by examining correlations between the frequency of occurrence of conversation and the number of conversation participants, the duration of conversations, the degree of interaction, the value of the activity, and so on, it can be determined what kinds of conversation tend to take place in which groups of wearers.
[0056]
Moreover, by performing the above analysis on a plurality of pieces of conversation information for a specific wearer, the communication tendency of that wearer can be analyzed. The manner in which a particular wearer participates in conversations may show certain tendencies depending on conditions such as the conversation partners and the number of conversation participants. Therefore, by examining a plurality of pieces of conversation information for a specific wearer, it is expected that features can be detected such as a high degree of interaction in conversations with a specific partner, or a degree of listening that increases as the number of conversation participants increases.
[0057]
Note that the utterance information identification processing and the conversation information analysis processing described above merely show an application example of the system according to the present embodiment, and do not limit the purpose or mode of use of the system according to the present embodiment, the functions of the host device 20, or the like. Processing functions for executing various analyses and examinations of the utterance information acquired by the terminal devices 10 of the present embodiment may be realized as functions of the host device 20.
[0058]
DESCRIPTION OF SYMBOLS 10: terminal device, 11: first microphone, 12: second microphone, 13: first amplifier, 14: second amplifier, 15: voice analysis unit, 16: data transmission unit, 17: power supply unit, 20: host device, 21: data reception unit, 22: data storage unit, 23: data analysis unit, 24: output unit, 30: apparatus main body, 40: strap, 201: conversation information detection unit, 202: conversation information analysis unit