Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2011087074
To provide an output control device for a remote conversation system that enables high-quality communication by preventing the mixing of unnecessary sound without bothering the participants, an output control method therefor, and a computer-executable program. A communication terminal 100 constantly buffers the voice input from a microphone in a buffer memory; after detecting movement of participant A's mouth, it outputs the voice data buffered from a first predetermined time T1 earlier, fast-forwarded for a second predetermined time T2, to synchronize the voice and the image. [Selected figure] Figure 1
Output control device of remote conversation system, method thereof, and computer-executable program
[0001]
The present invention relates to an output control device of a remote conversation system, a
method thereof, and a computer-executable program.
[0002]
In recent years, with the increase in speed and capacity of communication lines, "remote conversation systems" such as video conference systems, which hold conferences by connecting two or more sites and exchanging image data and audio data, have come into use.
11-04-2019
1
When there are a plurality of participants in one of the conference rooms in which the remote conference is held, it is usually necessary to selectively transmit the image and sound of the speaking participant from that conference room to the other conference rooms. As a conventional remote conversation system, for example, Patent Document 1 proposes a technique for specifying the speaker based on the image information being captured and selectively photographing the image or selectively collecting the voice. Patent Document 2 likewise proposes a technique in which the speaker is specified and the image is selectively photographed based on the image information being captured.
[0003]
In the above remote conversation system, unnecessary sounds (for example, the keystroke sounds of a personal computer or noise around the participants) mix in from the microphones of participants who are not speaking, and there is a problem that the conference is disturbed. To avoid mixing in unnecessary sounds, remote conference terminals often have a mute switch, but a participant may forget to release the mute when speaking, which may disrupt the conference.
[0004]
JP 2004-118314 A; JP 2003-189273 A
[0005]
The present invention has been made in view of the above, and it is an object of the present invention to provide an output control device for a remote conversation system capable of high-quality calls by preventing the mixing of unnecessary sounds without bothering participants, a method thereof, and a computer-executable program.
[0006]
In order to solve the problems described above and achieve the object, the present invention relates to an output control device of a remote conversation system that transmits and receives at least voice between terminals, comprising: speech state detection means for detecting movement of the mouth in the image data of a participant captured by imaging means and thereby detecting whether the participant is in a speech state; and output control means for outputting voice data collected by voice input means when the speech state detection means detects the speech state.
[0007]
Further, according to a preferred aspect of the present invention, the apparatus further comprises storage means for storing the voice data collected by the voice input means, and when the speech state is detected by the speech state detection means, the output control means preferably outputs the voice data stored in the storage means starting from a first predetermined time earlier.
[0008]
Further, according to a preferred aspect of the present invention, it is preferable that the output control means outputs the image data of the participant captured by the imaging means.
[0009]
Further, according to a preferred aspect of the present invention, when the output control means outputs the voice stored in the storage means from the first predetermined time earlier, it preferably performs fast-forward or skip output for a second predetermined time so as to synchronize the output audio data and image data.
[0010]
Further, according to a preferred aspect of the present invention, the storage means stores the image of the participant captured by the imaging means, and when the speech state detection means detects that the mouth is in the speech state, the output control means preferably outputs the image stored in the storage means from the first predetermined time earlier so as to synchronize the audio data and image data.
[0011]
Further, according to a preferred aspect of the present invention, the speech state detection means further judges whether the volume of the audio data input from the voice input means is equal to or higher than a threshold, and preferably detects the speech state when movement of the mouth is detected in the participant's image data captured by the imaging means and the volume of the audio data is equal to or higher than the threshold.
[0012]
Further, according to a preferred aspect of the present invention, the output control device is preferably mounted on the transmitting-side terminal, and the output control means outputs the audio data and image data to the receiving-side terminal.
[0013]
Further, according to a preferred aspect of the present invention, the output control device is mounted on the receiving-side terminal, and the output control means preferably outputs the audio data and image data received from the transmitting-side terminal to the speaker and monitor of the receiving-side terminal.
[0014]
Further, according to a preferred aspect of the present invention, the output control device is a relay device that relays communication between the terminals, and the output control means preferably outputs the voice data and image data received from the transmitting-side terminal to the receiving-side terminal.
[0015]
Further, in order to solve the problems described above and achieve the object, the present invention relates to an output control method of a remote conversation system that transmits and receives at least voice between terminals, comprising: a speech state detection step of detecting movement of the mouth in the image data of a participant captured by the imaging means and thereby detecting whether the participant is in a speech state; and an output step of outputting the voice data collected by the voice input means when the speech state detection step detects the speech state.
[0016]
Furthermore, in order to solve the problems described above and achieve the object, the present invention is a program installed in an output control device of a remote conversation system that transmits and receives at least voice data between terminals, the program causing a computer to execute: a speech state detection step of detecting movement of the mouth in the image data of a participant and detecting whether the participant is in a speech state; and an output step of outputting the voice data collected by the voice input means when the speech state detection step detects the speech state.
[0017]
As described above, the present invention provides, in an output control device of a remote conversation system that transmits and receives at least voice between terminals, speech state detection means for detecting movement of the mouth in the image data of a participant captured by the imaging means and thereby detecting whether the participant is in a speech state, and output control means for outputting the voice data collected by the voice input means when the speech state is detected by the speech state detection means. It is therefore possible to provide an output control device of a remote conversation system capable of high-quality calls by preventing the mixing of unnecessary sounds without bothering the participants.
[0018]
FIG. 1 is a conceptual diagram for explaining a configuration example of a video conference
system to which a remote conversation system according to the present invention is applied.
FIG. 2 is a schematic block diagram for explaining a configuration example of the communication terminal of FIG. 1.
FIG. 3 is a diagram for explaining an example of output timing of image data and audio data.
FIG. 4 is a diagram for explaining an example of output timings of image data and audio data
according to the second embodiment.
FIG. 5 is a conceptual diagram for explaining a configuration example of the video conference system according to the third embodiment.
FIG. 6 is a schematic block diagram for explaining a configuration example of a communication
terminal according to the third embodiment.
FIG. 7 is a conceptual diagram for explaining the configuration of the video conference system
according to the fourth embodiment.
FIG. 8 is a schematic block diagram for explaining a configuration example of a relay apparatus
according to the fourth embodiment.
[0019]
Hereinafter, embodiments of an output control device of a remote conversation system according to the present invention, a method thereof, and a computer-executable program will be described in detail with reference to the drawings.
The present invention is not limited by these embodiments.
Furthermore, the constituent elements in the following embodiments include those that can be easily conceived by persons skilled in the art and those that are substantially the same.
[0020]
Embodiment 1. In the first embodiment, the case where the output control device of the remote conversation system according to the present invention is applied to the transmitting side will be described.
FIG. 1 is a conceptual diagram for explaining a configuration example of a video conference system to which a remote conversation system according to the present invention is applied.
In the video conference system shown in FIG. 1, data communication is possible between the communication terminal 100 disposed in conference room 1 and the communication terminal 200 disposed in conference room 2 via a network 300 such as a public network or the Internet.
For example, a personal computer can be used as each of the communication terminals 100 and 200.
Although the number of conference rooms holding the conference is not limited, in the following description, to simplify the explanation, it is assumed that a video conference is held by connecting two conference rooms, with participant A in conference room 1 and participant B in conference room 2.
The network 300 is not limited to a public network or the Internet; other wide-area or narrow-area networks may be used.
[0021]
The communication terminals 100 and 200 have the same configuration: each has a camera for photographing a participant, a microphone for inputting the participant's voice, a monitor for displaying the image of the other party, and a speaker for outputting the voice of the other party.
[0022]
In the present video conference system, for example in conference room 1, the communication terminal 100 constantly outputs (to the communication terminal 200) the image data captured by the camera.
The communication terminal 100 also detects the speech state of participant A: when participant A is in the speech state, the external output of the voice data (output to the communication terminal 200) is turned on, and when participant A is not speaking, the external output of the voice is turned off. In this way, external noise and the like are not transmitted to the communication terminal 200 while participant A is not in the speech state.
[0023]
Specifically, the communication terminal 100 photographs participant A with the camera and inputs voice with the microphone, and when movement of participant A's mouth is detected in the captured image data, participant A is determined to be in a speech state. The external output of the voice data (output to the communication terminal 200) is ON only while movement of participant A's mouth is detected. However, because the movement can only be detected after participant A actually moves the mouth, there is a delay, associated with image processing and so on, between the time participant A moves the mouth and the time the external output of the audio data is turned on, so the beginning of participant A's speech would be lost. Therefore, in the present embodiment, as described in detail later, the voice input from the microphone is constantly buffered in a buffer memory to prevent this loss, and after movement of participant A's mouth is detected, the audio data buffered from the first predetermined time T1 earlier is output fast-forwarded for the second predetermined time T2 to synchronize the audio and the image. That is, the transmitting side of the communication terminal adjusts the output timing of the voice and the image.
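The buffer-and-release behaviour described above can be sketched in Python. This is a minimal sketch under our own assumptions, not the patent's implementation: the class and parameter names are illustrative, and the fast-forward pacing over T2 is reduced to a comment.

```python
import collections


class VoiceGate:
    """Sketch of the transmitting-side gating: audio frames are always
    buffered, and nothing is sent until mouth movement is detected;
    detection then releases the backlog reaching back roughly T1 seconds,
    so the start of the utterance survives the detection delay."""

    def __init__(self, frame_sec, t1_sec):
        # keep slightly more than T1 seconds of history
        self.buffer = collections.deque(maxlen=round(t1_sec / frame_sec) + 1)

    def push(self, frame, mouth_moving):
        """Buffer one audio frame; return the frames to transmit (may be empty)."""
        self.buffer.append(frame)
        if mouth_moving:
            # release everything buffered (the backlog from up to T1 earlier
            # plus the current frame); a real system would fast-forward the
            # backlog over T2 rather than emit it all at once
            out = list(self.buffer)
            self.buffer.clear()
            return out
        return []
```

With 20 ms frames and T1 = 60 ms, the first frames pushed while the mouth is still are withheld, and the first frame pushed with mouth movement releases them together with itself.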
[0024]
FIG. 2 is a schematic block diagram for describing a configuration example of the communication terminal 100 in FIG. 1. As shown in FIG. 2, the communication terminal 100 includes a camera 101, a microphone 102, a data processing unit 103, a buffer memory 104, a mouth detection unit 105, an output control unit 107, a speaker 108, a monitor 109, a data communication unit 110, and the like.
[0025]
The camera 101 photographs participant A and outputs the captured image data to the data processing unit 103. The microphone 102 collects voice and outputs the voice data to the data processing unit 103. The data processing unit 103 performs data processing (A/D conversion, etc.) on the image data input from the camera 101 and the audio data input from the microphone 102; it transfers the processed image data to the data communication unit 110 and the mouth detection unit 105, while sequentially storing the processed audio data in the buffer memory 104.
[0026]
The mouth detection unit 105 detects the face portion of the image data input from the data processing unit 103, further locates the mouth within the detected face image to detect its movement, and outputs the detection result (for example, “1” when there is movement of the mouth and “0” when there is none) to the output control unit 107. Since a known method such as template matching may be used to detect the face and the movement of its mouth, a detailed description is omitted.
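The patent leaves the detector to known methods such as template matching. Purely as an illustration of the unit's 1/0 output convention, a crude frame-difference check over a mouth-region crop might look like this (the function name, region representation, and threshold are all our assumptions, not from the patent):

```python
def mouth_moving(prev_region, cur_region, threshold=10.0):
    """Return 1 if the mean absolute pixel difference between two successive
    mouth-region crops exceeds the threshold, else 0 (matching the mouth
    detection unit's 1/0 output convention).  A crude illustrative stand-in
    for template matching, not the patent's method."""
    diffs = [abs(a - b) for a, b in zip(prev_region, cur_region)]
    return 1 if sum(diffs) / len(diffs) > threshold else 0
```

Identical crops yield 0; a large brightness change around the mouth yields 1.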
[0027]
When movement of the mouth is detected by the mouth detection unit 105, the output control unit 107 outputs the audio data sequentially stored in the buffer memory 104 to the data communication unit 110.
[0028]
The data communication unit 110 transmits and receives data via the network 300: it transmits image data and audio data to the network 300, and receives the image data and voice data transmitted from the communication terminal 200 via the network 300.
Note that the data communication unit 110 may encode the image data and audio data before transmission, and may decode encoded image data and audio data when they are received.
[0029]
The monitor 109 is, for example, a liquid crystal display device, and displays an image according
to the image data received by the data communication unit 110 from the communication
terminal 200 via the network 300. The speaker 108 outputs voice corresponding to voice data
received by the data communication unit 110 from the communication terminal 200 via the
network 300.
[0030]
An outline of the transmission and reception operations of image data and audio data of the communication terminal 100 configured as described above will now be given. First, the transmission operation will be described. The image data of participant A photographed by the camera 101 and the voice data of participant A collected by the microphone 102 are input to the data processing unit 103. The image data input to the data processing unit 103 is transferred to the data communication unit 110 and the mouth detection unit 105. The image data input to the data communication unit 110 is transmitted to the communication terminal 200 via the network 300. The mouth detection unit 105 detects the movement of the mouth in the input image, and the detection result is output to the output control unit 107. Meanwhile, the audio data input to the data processing unit 103 is sequentially buffered in the buffer memory 104. When the mouth detection unit 105 detects movement of the mouth, the output control unit 107 transfers the voice data buffered in the buffer memory 104 to the data communication unit 110, and the voice data input to the data communication unit 110 is transmitted to the communication terminal 200 via the network 300.
[0031]
Next, the reception operation will be described. The data communication unit 110 receives the image data and audio data of participant B transmitted from the communication terminal 200 via the network 300, displays the corresponding image on the monitor 109, and outputs the corresponding voice from the speaker 108.
[0032]
As described above, a loss of sound may occur because of the delay, caused by image processing and the like, between when participant A moves his or her mouth and when the external output of the sound data is turned on. To prevent this loss, the output control unit 107 adjusts the output timing of the audio as follows.
[0033]
FIG. 3 is a diagram for explaining an example of the output timing of image data and audio data. In the figure, (a) shows the image input timing, (b) the image output timing, (c) the audio input timing (input to the buffer memory 104), and (d) the audio output timing (output from the buffer memory 104). Further, t0 is the time when the mouth opens, t4 is the time when the movement of the mouth stops, T1 is the first predetermined time from the opening of the mouth until its detection turns on the audio data output, T2 is the second predetermined time during which the audio data is fast-forwarded to recover the delay T1, and T3 is the period of normal audio output.
[0034]
In the same figure, when the opening of the mouth at time t0 is detected in the image data, outputting the audio data only after the detection incurs a delay of the first predetermined time T1, so the audio of the first predetermined time T1 would be missing. If, to avoid this loss, the output of the voice data is instead started from time t0 (when the mouth opened) after the first predetermined time T1 has elapsed, a gap between the voice and the image occurs. Therefore, when the opening of the mouth is detected, the output control unit 107 outputs the audio data buffered in the buffer memory 104 from the first predetermined time T1 earlier (time t0), fast-forwarded for the second predetermined time T2, so that the audio and the image become synchronized; after synchronization, normal output is performed during T3. This eliminates both the loss of audio and the deviation between the image and the audio. Note that skip output may be performed instead of fast-forward output.
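The catch-up arithmetic implied by T1 and T2 can be made explicit. To play the T1 seconds of backlog plus the T2 seconds of new audio that keeps arriving during catch-up, all within T2 seconds of real time, the fast-forward rate must be (T1 + T2) / T2. This derivation is ours; the patent only names the two intervals.

```python
def catchup_rate(t1, t2):
    """Playback speed that clears a t1-second backlog while keeping up with
    live input, finishing after t2 seconds of real time:
    audio played = t1 + t2 seconds, wall time = t2 seconds."""
    return (t1 + t2) / t2


# e.g. a 0.5 s detection delay recovered over 2 s of fast-forward
print(catchup_rate(0.5, 2.0))  # → 1.25
```

A longer T2 therefore makes the fast-forward gentler (rate closer to 1) at the cost of a longer period of sped-up speech.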
[0035]
As described above, according to the first embodiment, the communication terminal 100 photographs participant A with the camera 101 and collects participant A's voice with the microphone 102. The image data of the participant captured by the camera 101 is transmitted to the communication terminal 200 through the data communication unit 110, while the audio data input from the microphone 102 is buffered in the buffer memory 104. The mouth detection unit 105 detects the movement of the mouth in the captured image data of the participant to detect the speech state, and after the mouth detection unit 105 detects the movement of participant A's mouth, that is, the speech state, the output control unit 107 fast-forwards the audio data buffered in the buffer memory 104 from the first predetermined time T1 earlier for the second predetermined time T2, then outputs it normally and transmits it to the communication terminal 200 via the data communication unit 110. It is therefore possible to achieve high-quality communication by preventing the mixing of unnecessary sound without bothering the participants. In addition, it is possible to prevent both the dropout of the voice spoken by the participant and the deviation between the image and the voice. Furthermore, since the receiving-side communication terminal 200 merely reproduces the audio data and image data transmitted from the communication terminal 100 as they are, a call is possible without mixing of unnecessary sound, without dropout of the participant's voice, and without deviation between image and sound.
[0036]
Here, when movement of participant A's mouth is detected in the photographed image, participant A is judged to be in a speech state; however, to prevent a state in which participant A merely has an open mouth from being judged to be the speech state, a voice level determination unit may further be provided. More specifically, it may be determined that participant A is in the speech state when the mouth detection unit 105 detects movement of participant A's mouth in the captured image data and the voice level determination unit determines that the volume (sound level) of the sound data input from the microphone 102 exceeds a threshold value.
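This refinement reduces to a simple conjunction of the two cues: mouth movement AND volume at or above a threshold. A sketch, with an illustrative threshold value that is our assumption, not from the patent:

```python
def in_speech_state(mouth_movement, volume, volume_threshold=0.1):
    """Declare the speech state only when the mouth detector reports
    movement ("1") and the microphone level is at or above the threshold,
    so a silently open mouth is not treated as speech."""
    return mouth_movement == 1 and volume >= volume_threshold
```

A moving mouth with a quiet microphone (e.g. a yawn) is rejected, as is background noise while the mouth is still.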
[0037]
Second Embodiment. In the first embodiment, the voice input from the microphone 102 is constantly buffered in the buffer memory 104, and after movement of participant A's mouth is detected, the voice buffered from the first predetermined time T1 earlier is fast-forwarded for the second predetermined time T2. However, because participant B hears the fast-forwarded voice, participant A's speech becomes difficult to hear. Therefore, in the second embodiment, both the voice data input from the microphone 102 and the image data input from the camera 101 are constantly buffered in the buffer memory 104, and after movement of participant A's mouth is detected, the voice data and image data from the first predetermined time T1 earlier are output, resolving the problem that participant A's speech is difficult to hear.
[0038]
The configuration of the communication terminal 100 according to the second embodiment is the same as that of FIG. 2, so only the differing operations are described. In FIG. 2, the data processing unit 103 sequentially stores in the buffer memory 104 both the image data input from the camera 101 and the audio data input from the microphone 102. The mouth detection unit 105 detects the face portion of the image input from the data processing unit 103, further locates the mouth within the detected face image to detect its movement, and outputs the detection result (for example, “1” when there is mouth movement and “0” when there is none) to the output control unit 107. When the mouth detection unit 105 detects movement of the mouth, the output control unit 107 outputs the image data and audio data stored in the buffer memory 104 from the first predetermined time T1 earlier to the data communication unit 110. The image data and audio data input to the data communication unit 110 are transmitted to the communication terminal 200 via the network 300.
[0039]
FIG. 4 is a diagram for explaining an example of the output timings of image data and audio data according to the second embodiment. In the figure, (a) shows the image input timing (input to the buffer memory 104), (b) the image output timing (output from the buffer memory 104), (c) the audio input timing (input to the buffer memory 104), and (d) the audio output timing (output from the buffer memory 104). Also, t0 is the time when the mouth opens, t4 is the time when the movement of the mouth stops, and T1 is the first predetermined time from the opening of the mouth until its detection turns on the output of the audio data and image data.
[0040]
In the figure, after the first predetermined time T1 has elapsed from the opening of the mouth and the opening has been detected, the output control unit 107 outputs the voice data and image data stored in the buffer memory 104 from the first predetermined time T1 earlier (time t0). As a result, at the communication terminal 200 it becomes easy to listen to the speech of participant A, and a natural call is possible without fast-forwarded or skipped voice.
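The second embodiment thus amounts to passing both streams through a fixed delay of T1, hiding the detector's latency without fast-forward. A minimal sketch of such a constant-delay line (the function name is ours, and the None placeholders standing for "nothing output yet" are our convention):

```python
import collections


def delayed_stream(frames, delay_frames):
    """Pass a stream of frames through a fixed delay of delay_frames,
    as the second embodiment does for both audio and video with delay T1.
    Positions where nothing has emerged yet are reported as None."""
    buf = collections.deque()
    out = []
    for f in frames:
        buf.append(f)
        # emit the oldest frame once the line is full, else nothing yet
        out.append(buf.popleft() if len(buf) > delay_frames else None)
    return out
```

With a delay of 2 frames, the first two output slots are empty and every later frame emerges exactly 2 slots late, keeping audio and video mutually aligned.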
[0041]
Third Embodiment. In the third embodiment, the case where the output control device of the remote conversation system according to the present invention is applied to the receiving side will be described. In the first and second embodiments, the output timing of the audio data and image data is adjusted on the transmitting side; in the third embodiment, it is adjusted on the receiving side. FIG. 5 is a conceptual diagram for explaining a configuration example of the video conference system according to the third embodiment. In FIG. 5, parts having the same functions as those in FIG. 1 are given the same reference numerals. In the figure, the transmitting-side communication terminal 100 transmits the image data of the participant taken by the camera 101 and the audio data collected by the microphone 102 to the communication terminal 200 as they are, and the receiving-side communication terminal 200 adjusts the output timing of the received image data and audio data.
[0042]
FIG. 6 is a schematic block diagram for explaining a configuration example of the communication terminal 200 according to the third embodiment. The configuration of the communication terminal 100 is the same as that of the communication terminal 200. As shown in FIG. 6, the communication terminal 200 includes a camera 201, a microphone 202, a data processing unit 203, a buffer memory 204, a mouth detection unit 205, an output control unit 207, a speaker 208, a monitor 209, a data communication unit 120, and the like.
[0043]
The camera 201 photographs participant B and outputs the photographed image data to the data processing unit 203. The microphone 202 collects voice and outputs the voice data to the data processing unit 203. The data processing unit 203 processes the image data input from the camera 201 and the audio data input from the microphone 202 and transfers them to the data communication unit 120, which transmits the image data and audio data to the communication terminal 100 via the network 300.
[0044]
On the other hand, the data communication unit 120 receives the image data and audio data transmitted from the communication terminal 100 via the network 300. The data communication unit 120 outputs the received image data to the monitor 209 to display the image, and transfers the image data to the mouth detection unit 205. Further, the data communication unit 120 sequentially stores the received audio data in the buffer memory 204. The mouth detection unit 205 detects the face portion of the image data input from the data communication unit 120, further locates the mouth within the detected face image to detect its movement, and outputs the detection result (for example, “1” when there is movement of the mouth and “0” when there is none) to the output control unit 207. When the mouth detection unit 205 detects movement of the mouth, the output control unit 207 outputs the audio data sequentially stored in the buffer memory 204 to the speaker 208 for reproduction. Here, the output timing of the output control unit 207 is the same as that shown in FIG. 3 of the first embodiment; it may instead be the same as that shown in FIG. 4 of the second embodiment.
[0045]
According to the third embodiment, since the output timing of the voice and the image is adjusted on the receiving side, adjustment of the output timing of the voice and the image on the transmitting side becomes unnecessary, and with processing on the receiving side alone it is possible to make a call without loss of the participant's voice and without deviation between the image and the voice.
[0046]
Fourth Embodiment In the fourth embodiment, the case where the output control device of the
remote conversation system according to the present invention is applied to a relay device will be
described.
In the fourth embodiment, a video conference is performed via a relay device such as a server
supporting a video conference system, and the relay device adjusts output timings of audio data
and image data.
[0047]
FIG. 7 is a conceptual diagram for explaining the configuration of the video conference system according to the fourth embodiment. In FIG. 7, parts having the same functions as those in FIG. 1 are given the same reference numerals. In the figure, the communication terminal 100 arranged in conference room 1, the communication terminal 200 arranged in conference room 2, and the relay device 500 are connected via the network 300, and the communication terminal 100 and the communication terminal 200 transmit and receive image data and audio data via the relay device 500. The communication terminals 100 and 200 transmit the image data and audio data acquired by their cameras and microphones to the relay device 500 as they are, and the relay device 500 adjusts the timing of the received image data and audio data and transmits them to the communication terminals 200 and 100, respectively.
[0048]
FIG. 8 is a schematic block diagram for explaining a configuration example of the relay device 500 according to the fourth embodiment. In the figure, the data communication unit 510 receives the image data and audio data transmitted from the communication terminal 100 via the network 300. The data communication unit 510 transmits the received image to the communication terminal 200 and transfers it to the mouth detection unit 505. In addition, the data communication unit 510 sequentially stores the received audio data in the buffer memory 504.
[0049]
The mouth detection unit 505 detects the face image portion of the image data input from the
data communication unit 510, identifies the mouth in the detected face image, and detects its
movement; it outputs a mouth-movement detection signal (for example, "1" when the mouth is
moving and "0" when it is not) to the output control unit 507. When movement of the mouth is
detected by the mouth detection unit 505, the output control unit 507 reads out the audio data
sequentially stored in the buffer memory 504 and outputs it to the data communication unit 510,
which transmits the data to the communication terminal 200. Here, the output timing of the
output control unit 507 is the same as the output timing shown in FIG. 3 of the first embodiment;
it may instead be the same as that shown in FIG. 4 of the second embodiment.
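The output-timing rule described in this paragraph (and in the summary: audio is buffered continuously, and on mouth-movement detection the audio from a predetermined time T before detection is read out and played back fast-forwarded so that sound and image are synchronized) can be sketched as follows. This is a minimal sketch under stated assumptions: fixed-length audio chunks, a one-second lead time T, and the class and method names are all illustrative, not from the patent.

```python
class OutputControl:
    """Sketch of the output control unit 507's timing rule: audio chunks are
    buffered continuously; when mouth movement is detected, the chunks from
    up to T seconds before detection are released for (fast-forwarded)
    output, so no beginning-of-speech audio is dropped.
    """

    def __init__(self, chunk_sec=0.1, lead_sec=1.0):
        self.chunks = []  # buffered audio chunks (buffer memory 504)
        # number of chunks covering the lead time T before detection
        self.lead = round(lead_sec / chunk_sec)

    def push(self, chunk):
        """Buffer an incoming audio chunk, keeping only the last T seconds
        while no mouth movement has been detected."""
        self.chunks.append(chunk)
        if len(self.chunks) > self.lead:
            self.chunks.pop(0)

    def on_mouth(self, moving):
        """Called with the detection signal ("1"/"0" in the text). When the
        mouth moves, release the buffered audio for fast-forwarded output."""
        if moving:
            released, self.chunks = self.chunks, []
            return released  # the player would output these sped up over T
        return []
```

The design point this illustrates is that the buffer always holds the most recent T seconds of audio, so speech that began slightly before the mouth movement was recognized is still available when output starts.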
[0050]
According to the fourth embodiment, since the output timings of the voice and the image are
adjusted by the relay device 500, there is no need to adjust the output timings of the voice and
the image on the transmitting side or the receiving side; with the processing of the relay device
500 alone, it is possible to hold a call without dropouts in the participant's speech and without a
gap between the image and the voice.
[0051]
In the video conference systems according to the first to fourth embodiments, two conference
rooms are connected, but the present invention is not limited to this; three or more conference
rooms may be connected. Likewise, although there is one participant in each conference room,
the present invention is not limited to this, and there may be a plurality of participants in each
conference room. In that case, the processing of the above embodiments
may be performed when movement of the mouth of any one of the plurality of participants is
detected. In the first to fourth embodiments described above, the remote conversation system
according to the present invention is applied to a video conference system; however, the present
invention is not limited to video conference calls held in conference rooms, and it goes without
saying that it can also be used for casual private calls. In addition, the first to fourth
embodiments can be implemented alone or in any combination.
[0052]
Further, the object of the present invention can also be achieved by supplying a system or
apparatus with a recording medium on which software program code realizing the functions of
the output control apparatus of the remote conversation system described above is recorded, and
by having a computer (or CPU, MPU, DSP) of that system or apparatus execute the program code
stored in the recording medium. In this case, the program code itself read out from the recording
medium realizes the functions of the output control apparatus described above, and the program
code, or the recording medium storing the program, constitutes the present invention. As the
recording medium for supplying the program code, an FD, a hard disk, an optical disk, a
magneto-optical disk, a CD-ROM, a CD-R, magnetic tape, non-volatile memory, a ROM, or other
optical, magnetic, magneto-optical, and semiconductor recording media can be used.
[0053]
Further, it is needless to say that the present invention also includes the case where not only are
the functions of the output control device described above realized by the computer executing
the read program code, but an operating system (OS) or the like running on the computer
performs part or all of the actual processing based on the instructions of the program code, and
the processing of the output control device described above is realized by that processing.
[0054]
As described above, the output control device of the remote conversation system according to the
present invention, the method thereof, and the computer-executable program can be widely used
both for video conference calls held in companies and the like and for private calls.
[0055]
1, 2: conference room; 100, 200: communication terminal; 101, 201: camera; 102, 202:
microphone; 103, 203: data processing unit; 104, 204: buffer memory; 105, 205: mouth detection
unit; 107, 207: output control unit; 108, 208: speaker; 110, 210: data communication unit; 300:
network; 500: relay device