Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2006340376
An audio conference bridge system and method are provided. The present invention breaks with the
traditional notion of a single mixing function per conference. Instead, the novel,
flexible design of the present invention provides a separate mixing function to each participant
(20) in the conference. This new architecture is generally described herein as "edge point mixing"
(70). Edge point mixing overcomes the limitations of traditional conferencing systems by
giving each participant control over his or her own conference experience. Edge
point mixing allows each participant, if desired, to receive a distinctly mixed audio signal from the
conference depending on each speaker's "position" in the virtual conference world.
Simulation of "real life" meetings also becomes possible. [Selected figure] Figure 4
Teleconferencing bridge with edge point mixing
[0001]
BACKGROUND OF THE INVENTION (Related Art) The present application claims priority to US
Provisional Application No. 60/135,239, entitled "Teleconferencing Bridge with EdgePoint
Mixing," filed May 21, 1999, and to US Provisional Application No. 60/139,616, entitled
"Automatic Teleconferencing Control System," filed June 17, 1999. Both are incorporated
herein by reference. This application also claims priority to US Patent Application Serial No. ____,
entitled "Teleconferencing Bridge with Edgepoint Mixing," filed May 15, 2000, which is likewise
incorporated herein by reference.
[0002]
10-04-2019
1
2. FIELD OF THE INVENTION The present invention relates to communication systems, and
more particularly, to an audio conferencing system that can provide conference participants with
a realistic, immersive experience and a high level of control over conference parameters.
[0003]
3. DESCRIPTION OF THE RELATED ART It is desirable in a communication network to provide
conference arrangements so that many participants can be bridged together on a conference call. A
conference bridge is a device or system that can connect multiple connection endpoints together
to establish a teleconference. A modern conference bridge can accommodate both audio and data,
which allows, for example, co-authoring of documents by participants in the conference.
[0004]
Historically, however, the audio conference experience has remained unsatisfactory, especially in
conferences with many attendees. Problems include speaker recognition (knowing who is
talking), volume control, speaker clipping, speaker break-in (one speaker interrupting another),
line noise, music-on-hold, and the end user's inability to control the conference experience.
[0005]
In conventional systems, only one mixing function is applied to the entire audio conference.
There have been attempts to provide adequate sound levels for all participants using automatic gain
control, but participants have no control over the sound mixing levels within the conference
beyond adjusting their own telephones (for example, changing the volume of the mixed
conference signal cannot change the level of any individual voice within it). Thus, the audio of
individual conference participants cannot be selectively amplified or attenuated. In addition, with
conventional conference bridge technology, it is difficult to distinguish an individual's audio or to
identify who is speaking unless the speaker explicitly states his or her name. Furthermore, line
noise can be isolated and corrected only by human conference-operator intervention.
[0006]
The inflexibility of conventional conferencing systems causes serious problems. For example,
conventional conferencing systems cannot adequately address users having different qualities of
conferencing connections and/or endpoint devices. Certain conference participants can receive
high fidelity mixed audio signals from the conference bridge because of their connections to the
conference and/or the quality of their endpoint conferencing equipment. However, since only one
mixing algorithm is applied to the entire conference, the mixing algorithm must be tailored to the
lowest-level participants. Thus, even if one conferee could handle considerably higher fidelity
output from the conference bridge, the mixing algorithm may, for example, be limited to mixing
only two simultaneous speakers so that a third does not disturb the output.
[0007]
Additionally, conventional audio bridge systems attempt to equalize the gain applied to each
conference participant's audio. Even so, it is almost always harder to hear certain participants
than others due to variations in line quality, background noise, speaker volume, microphone
sensitivity, etc. For example, during a business teleconference, one participant is often too loud
and another too quiet. In addition, because conventional business conferencing systems do not
provide a visual interface, it is difficult to recognize who is speaking at any particular time.
Music-on-hold also presents a problem for conventional systems, where any participant who puts
the meeting on hold broadcasts music to everyone else in the meeting. Without individual mixing
control, participants in the conference cannot turn off the unwanted music.
[0008]
A particular audio conferencing environment that requires more end-user control is the
"virtual chat room". Chat rooms have become popular on the Internet in recent years. Participants
in a chat room access the same website via the Internet to communicate about the particular
topic (e.g., sports, movies, etc.) that is the focus of the chat room. A traditional "chat room" is
actually a text-based website, whereby participants type messages in real time to be viewed by
everyone in the room. Recently, audio-based chat has emerged as a common and more realistic
alternative to text chat. In an audio chat room, participants actually talk to each other in an audio
conference enabled via an Internet website. Because chat room participants generally do not
know each other before a particular chat room session, each participant is generally identified in
the audio chat room by a "screen name." The "screen name" may be listed on the web page during
the meeting.
[0009]
The need for greater end-user control of audio conferences is even more pronounced in chat
room settings than in business conferences. Internet users receive service of widely varying
quality. Among other things, quality of service depends on the user's Internet Service
Provider (ISP), connection speed, and multimedia computing capabilities. Because quality of
service varies among participants in an audio chat room, it is particularly desirable to provide
different participants with conference output of correspondingly varying fidelity. Furthermore,
the clarity and loudness of the audio signal input by each user varies with that user's quality of
service. Participants with broadband access to the Internet and high quality multimedia
computers send a much clearer audio signal to the audio chat room than participants using
dial-up access and low-end personal computers. As a result, the volume and clarity of the audio
heard in an Internet chat room can vary significantly.
[0010]
Furthermore, the content of participants' speech is rarely monitored in an audio chat room.
Some chat rooms include "moderators", that is, human observers who are responsible for
ensuring that the conversation is appropriate for the particular category. For example, if a
participant enters a chat room devoted to discussing children's books, a human moderator may
banish participants who begin talking about sex or using vulgar language. However, because
human moderators are a cost center, not all chat websites provide them. Furthermore, even chat
rooms that use human moderators generally do not protect participants from users who are
merely annoying or offensive.
[0011]
Without individual mixing controls, then, where human monitoring ends, chat room participants
are forced to listen to every other participant, no matter how poor the sound quality or how
vulgar or offensive the content. Furthermore, traditional chat rooms do not give the user a "real
life" experience. The participants' audio is usually mixed according to a single algorithm applied
throughout the conference, with the gains applied to each participant's audio being equalized.
Thus, everyone in the conference receives the same audio stream. This contrasts with the real
experience of a room full of people speaking. In a real-life "chat room", everyone in the room
hears the conversation slightly differently depending on his or her location in the room relative
to the other speakers.
[0012]
Conventional attempts to overcome the limitations of traditional conferencing techniques
(such as the use of "whisper circuits") are inadequate, as they still do not provide conference
participants with full mixing flexibility. There is a need for a robust, flexible audio conference
bridge system.
[0013]
SUMMARY OF THE INVENTION In accordance with the principles of the present invention, an
audio conference bridge system and method are provided. The present invention discards the
conventional notion of a single mixing function per conference. Instead, the novel and
flexible design of the present invention provides each participant in the conference with a
separate mixing function. This novel architecture is generally described herein as "edge point
mixing".
[0014]
Edge point mixing overcomes the limitations of conventional conferencing systems by providing
each participant with control over that participant's own conference experience. For example,
music-on-hold is no longer a problem in a business teleconference facilitated by the present
invention. The remaining participants may simply attenuate the signal of the holding conference
participant and may end the attenuation when that participant returns to the conference.
Similarly, quiet speakers, or speakers who cannot be heard clearly due to line noise, may be
individually amplified by any participant.
[0015]
Edge point mixing also makes it possible to simulate a "real life" meeting, where each participant
can, if desired, receive a distinctly mixed audio signal from the meeting depending on each
speaker's "position" in the virtual meeting world. Preferably, participants in the conference are
provided with a visual interface that indicates the positions of the other participants in the
virtual conference world. The mixing parameters for a participant then change as the participant
moves around the virtual conference world (closer to certain conferees and away from others).
[0016]
Preferred embodiments of the present invention allow dynamic modification of each
participant's mixing parameters according to a three-stage control system. First, default
mixing parameters are set according to an algorithm (e.g., distance-based attenuation in a virtual
chat room). The algorithmically determined mixing parameters may then be automatically
modified in accordance with policies set by the system or by a group of participants (e.g., muting
certain speakers). Finally, the algorithms and/or policies may be overridden by explicit
participant requests (e.g., a request to amplify the audio of a particular speaker).
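The three-stage resolution described above can be sketched as follows; the function and policy names are illustrative assumptions, not terminology from the patent:

```python
def resolve_mixing_gain(default_gain, policies=(), override=None):
    """Resolve one gain value via the three-stage control system.

    Stage 1: default_gain comes from an algorithm (e.g. distance-based
    attenuation). Stage 2: each policy may adjust it. Stage 3: an explicit
    participant request overrides algorithms and policies alike.
    """
    gain = default_gain
    for policy in policies:          # system- or group-set policies
        gain = policy(gain)
    if override is not None:         # explicit participant request wins
        gain = override
    return gain

# Example policy: mute speakers whose gain falls below a threshold.
mute_quiet = lambda g: 0.0 if g < 0.1 else g
```

A participant's request to amplify a particular speaker would simply arrive as an `override` value larger than 1.0.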
[0017]
The invention also preferably accommodates participants with differing qualities of service. In
this way, participants with high speed connections and/or high fidelity edge point conferencing
equipment receive better mixed signals than participants in the same conference having low
speed connections or low fidelity equipment. Each participant can thus enjoy the highest level of
conference experience that the participant's connection and equipment allow.
[0018]
DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS The features of the present
invention will be more readily apparent and better understood by reference to the following
detailed description of exemplary embodiments of the present invention, taken in conjunction
with the accompanying drawings.
[0019]
The system and method of the present invention overcome the limitations of conventional
bridges by providing a separate mixing function to each participant in a conference.
Thus, the present invention supports conferencing applications that attempt to transmit more
realistic simulations of real-world conferencing experiences. In a real face-to-face meeting, each
participant sounds slightly different due to location, room acoustics, and so on; that is, each
participant effectively has a separate mixing function built into his or her auditory system. By
providing each meeting participant with a separate mixing function, the present invention
enables recreation of a real-world meeting environment.
[0020]
The invention also preferably provides end users in a conference with advanced controls. The
controls can be used to amplify speakers who are difficult to hear, attenuate noise sources,
filter out unwanted content (such as vulgarity), and so on. Thus, each participant can adjust the
audio quality of the meeting to exactly meet his or her needs. Of course, this ability is not easy to
achieve in real meetings (especially when the meeting is large). Thus, edge point mixing can, if
desired, provide a participant with an experience that is "better than real life".
[0021]
The conceptual difference between edge point mixing and conventional mixing is briefly
illustrated in FIG. 1. As shown in FIG. 1, in a conventional mixing conference, each participant 20
transmits its media stream to the conference bridge 30. Conference bridge 30 applies one mixing
function to the conference and outputs a mixed signal to each participant 20. Since only one
mixing function is applied to the conference 10, each participant basically receives the same
mixed signal.
[0022]
Edge point mixing is far more flexible. Each participant 20 sends its media stream 60 to the
conference bridge 50. However, conference bridge 50 provides a separate edge point mixer 70
for each participant 20. Further, each participant sends a control stream 80 to the audio bridge
50. Based at least in part on the control stream 80, the audio bridge 50 returns a separately
mixed audio signal to each participant 20. Because each participant's control stream 80 will
likely differ, each participant 20 obtains an individually tailored conferencing experience.
[0023]
FIG. 2 is a block diagram illustrating the general organization of the audio conference bridge
system 100 according to the present invention. In the illustrated exemplary embodiment, a
plurality of conference participant stations (A, B and C) 110 interface with a system control unit
200 and an audio bridge unit 300. Although only three participant stations 110 are illustrated,
any number of stations 110 may be connected to the system 100. The system control unit 200 is
generally responsible for receiving mixing control data 140 from the participant stations 110 and
translating the mixing control data into mixing control parameters 150 for incorporation by the
audio bridge unit 300. Although both the system control unit 200 and the audio bridge unit 300
could conceivably be implemented purely in hardware, each of the units 200 and 300, or both,
preferably comprises a computer program running on a suitable hardware platform.
[0024]
In the preferred embodiment of the present invention, the interface between the conference
participant stations 110 and the system control unit 200 utilizes a packet switched network, such
as an Internet Protocol (IP) network. The media interface between the conference participant
stations 110 and the audio bridge unit 300 may be over another communication network. Such
other communication networks include, for example, the public switched telephone network
(PSTN), a packet switched network, or a combination of the two joined by gateways between the
PSTN and the packet switched network. However, participant stations 110 may be connected to
the system by any communication network, including local area networks (such as Ethernet),
private networks, circuit switched networks, and the like.
[0025]
The audio bridge unit 300 comprises a plurality of edge point mixers 310. In the preferred
embodiment, each edge point mixer 310 is a software process executing on, or implemented as
part of, the audio bridge unit 300. Preferably, one edge point mixer 310 is assigned to each
participant station 110 (e.g., A, B and C). The edge point mixer 310 performs audio mixing for
its participant station 110 by mixing incoming audio signals according to the mixing parameters
150 dynamically supplied by the system control unit 200. In a simple system, mixing parameters
150 may correspond to individual volume or gain controls for the incoming audio signal of each
of the other participant stations 110.
[0026]
FIG. 3 generally illustrates the flow of operation of the audio conference bridge system of FIG. 2.
Incoming audio signals 325 are received and transmitted by the audio conference bridge system
100 through the media interface unit (MIU) 400. The MIU 400 provides a media interface
between the audio bridge unit 300 and any network(s) used by the participant stations 110 to
transmit and receive audio signals. The MIU 400 performs functions such as media stream
packetization and depacketization, automatic gain control, acoustic echo cancellation (if
necessary) and lower layer protocol processing (such as RTP and TCP/IP). In one embodiment,
the incoming audio signal 325 from a participant station 110 to the audio bridge unit 300 is
passed via the MIU 400 to an audio stream duplicator 399, where the incoming audio signal is
duplicated and distributed to each of the edge point mixers 310 for the conference. As discussed
below, the audio stream duplicator 399 can be omitted by appropriate use of matrix
multiplication.
[0027]
In this embodiment, each edge point mixer 310 includes a group of multiplier functions 311,
312, 313 and an adder function 319. Multipliers 311, 312, 313 multiply each incoming audio
signal 325 by an associated mixing control parameter 150 provided by the system control unit
200. The adder function 319 then adds the multiplied incoming audio signals 325, thereby
performing the actual mixing and producing a mixed audio output signal 330. Again, mixing
control parameter 150 may be a simple gain control in a basic implementation of system 100.
In more complex implementations, the multiplier function 311 can be replaced by a more
complex linear or non-linear function, time-varying or non-time-varying, thereby creating a
variety of conferencing experiences. For example, the mixing control parameters 150 can be very
complex, instructing the edge point mixer 310 to apply effects such as delay, reverberation
(echo), frequency and phase shift, harmonics, distortion, or any other sound processing function
to a given incoming audio signal, thereby enhancing the conferencing experience.
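The multiply-and-add structure just described can be sketched minimally as follows; the function name is illustrative, and real mixers would operate on streams of PCM sample frames:

```python
def edgepoint_mix(incoming, gains):
    """One edge point mixer: scale each incoming audio stream by its
    participant-specific mixing parameter (multiplier functions 311-313),
    then sum the scaled streams (adder function 319) to produce the
    mixed audio output signal 330."""
    n_samples = len(incoming[0])
    mixed = [0.0] * n_samples
    for stream, gain in zip(incoming, gains):
        for i, sample in enumerate(stream):
            mixed[i] += gain * sample
    return mixed
```

For participant A, `incoming` would hold the streams of B and C, with `gains` drawn from A's own mixing parameters 150; B's mixer runs the same computation with B's parameters.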
[0028]
Figures 4 and 5 illustrate a preferred embodiment of a participant station 110 for use with the
audio conferencing bridge system of the present invention. The participant station 110 provides
the participants (eg, A, B and C) with both an audio interface to the audio conference bridge
system 100 and a visual interface.
[0029]
As shown in FIG. 4, a participant station 110 may comprise a combination of a personal computer
(PC) 450 and a standard telephone 460. In this configuration, the PC 450, which preferably has
either a low speed or high speed connection to a packet switched network 455 (such as the
Internet or a managed IP network), provides the visual part of the participant interface and
communicates with the system control unit 200. This visual interface (not shown) is preferably a
software application running on the PC 450, such as a Java® applet, an interactive game
program, or any other application adapted to communicate with the system 100 of the invention.
The telephone 460 then provides an audio interface to the audio bridge unit 300 through its
connection via the public switched telephone network (PSTN) 465. This participant station
embodiment employs an IP/PSTN gateway 470, implemented in the managed IP network 455
portion of the system, to enable the audio connection between the audio bridge unit 300 and the
participant station's telephone 460. The PSTN/IP gateway 470 is commercially available from,
inter alia, Cisco Systems, and may either be co-located with the audio bridge unit 300 or remote
from it, preferably connected via the managed IP network 455.
[0030]
The participant station 110 shown in FIG. 4 provides a particularly useful means of access to the
audio conference bridge system 100 for business participants who lack (1) multimedia
capabilities on the participant's PC 450, (2) high quality of service on the packet switched
network 455, or (3) a special configuration allowing User Datagram Protocol (UDP) packets to
bypass the corporate network firewall.
[0031]
FIG. 5 shows a different preferred participant station 110 comprising a multimedia PC 451 with
a speaker 452 and a microphone 453.
In this embodiment, the PC 451 preferably has a high speed connection to the managed IP
network 455. The audio conference bridge system 100 is connected to the managed IP network
455, and audio and visual/control signals are transmitted via the same communication network
455. Preferably, both audio and visual/control signals are transmitted as IP packets, with
appropriate addressing in the IP packet headers directing the audio signal information to the
audio bridge unit 300 and the control information to the system control unit 200.
[0032]
As used herein, a "signal" includes the transmission of information by analog, digital, packet
switched, or any other technology sufficient to convey the audio and/or control information
required by the present invention. Furthermore, as used herein, a "connection" does not
necessarily mean a dedicated physical connection such as a hardwired switched network.
Rather, a connection involves the establishment of any communication session, whether or not
all the information transmitted over that connection travels the same physical path.
[0033]
It should be understood that FIGS. 4 and 5 are merely exemplary. Many other participant station
110 configurations are possible, including "Internet phones", PDAs, wireless devices, set-top
boxes, gaming stations, etc. Any single device, or multiple devices, that can communicate
effectively with both the system control unit 200 and the audio bridge unit 300 may function as
a participant station 110. In addition, those skilled in the art will understand that business
participants with sufficient bandwidth, firewall clearance, and multimedia PC 451 resources may
also (optionally) use the "simple IP" embodiment of FIG. 5. Similarly, the PC 450/telephone 460
combination shown in FIG. 4 may be used by non-business participants, and is particularly useful
for participants with only narrowband access to an IP network 455 such as the Internet.
[0034]
FIG. 6 shows an embodiment of the invention in which the audio conference bridge system 100
is implemented on one server 600. It should be understood that some or all of the components
shown may be distributed across multiple servers or other hardware. This embodiment of the
conferencing server 600 comprises three main components: a system control unit 200, an audio
bridge unit 300 and an MIU 400. Conference server 600 may be implemented on any number of
different hardware configurations, including a personal computer or a dedicated DSP platform.
[0035]
The system control unit 200 provides overall coordination of the functionality of the conferences
taking place on the conference server 600. The system control unit 200 communicates with the
participant stations 110 to obtain mixing control data 140 and translates the mixing control data
140 into mixing parameters 150 for the audio bridge unit 300. The system control unit 200 may
either be located entirely within the conferencing server 600, or be distributed among several
conferencing servers 600 and/or to the participant stations 110.
[0036]
For example, in a visual chat room application, the system control unit 200 may calculate the
distance between "avatars" (the visual representations of each participant), calculate the
corresponding amount of audio attenuation, and apply it to the incoming audio signals 325.
However, because each position vector, direction vector, and speech activity indicator in the chat
room must in any case be sent to each of the participant stations 110 (so that the participant
stations can accurately update their screens), it is also possible to have the participant stations
110, rather than the conferencing server 600, perform the distance calculations.
[0037]
In fact, the participant stations 110 may calculate the actual mixing parameters 150 and send
the calculated parameters to the audio bridge unit 300 (rather than transmitting position or
distance information). A significant advantage of this approach is increased scalability of the
server 600 and simplified development of application functionality (because almost everything is
done at the participant station 110). The disadvantages of such a distributed approach are that
the processing requirements of the participant station are slightly increased and that the time lag
between the movement of an avatar on the participant station screen and the corresponding
change in audio mixing is larger. This increase in time delay is approximately proportional to the
time taken to transmit the position and volume information of all the other participants to the
participant station 110, and can be reduced by so-called prediction methods. Hybrid applications
are also possible, in which some of the participant stations 110 include portions of the system
control unit 200 while other participant stations do not.
[0038]
The audio bridge unit 300 includes the edge point mixers 310 and is generally responsible for
receiving the incoming audio signals 325 from the participant stations 110 and for outputting a
separate mixed signal 330 to each participant station 110. Each edge point mixer 310 performs
audio mixing for its participant station 110 by mixing the multiple incoming audio signals 325 in
the conference according to the mixing parameters 150 dynamically supplied by the system
control unit 200. The mixing control parameters 150 provided to a given edge point mixer 310
may be different from the parameters 150 provided to any other edge point mixer 310 for a
particular conference. Thus, the conference experience is unique to each participant in the
conference.
[0039]
In a simple system, mixing parameters 150 may correspond to simple volume or gain controls
for all of the other participants' incoming audio signals 325. Preferably, however, the audio
bridge unit 300 performs a large number of matrix multiplications and should be optimized
accordingly. The audio bridge unit 300 also preferably outputs an active speaker indicator (not
shown) to each participant station 110. This active speaker indicator indicates, for each mixed
output signal 330, which incoming audio signals 325 are being mixed into it. The active speaker
indicator may be used by the participant station 110 to visually indicate at all times (e.g., by
highlighting the participant's avatar) which participants' audio is being heard.
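Viewing the whole conference as one gain matrix (rows for listeners, columns for speakers, with a zero diagonal so no one hears himself) makes both the matrix formulation and the active speaker indicator concrete. The following is a sketch under those assumptions, not the patent's specified implementation:

```python
def mix_conference(gain_matrix, frames):
    """Mix one audio frame for every participant at once.

    gain_matrix[i][j] is listener i's gain for speaker j (diagonal 0.0).
    frames[j] is speaker j's list of samples for this frame.
    Returns (mixed output frames 330, active speaker indicators)."""
    outputs, active = [], []
    n_samples = len(frames[0])
    for row in gain_matrix:
        mixed = [sum(g * frame[k] for g, frame in zip(row, frames))
                 for k in range(n_samples)]
        outputs.append(mixed)
        # Active speaker indicator: which inputs are mixed into this output.
        active.append([j for j, g in enumerate(row) if g != 0.0])
    return outputs, active
```

Because the per-listener gains live in the matrix itself, no separate stream duplicator is needed, matching the earlier remark that matrix multiplication can replace duplication.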
[0040]
The audio bridge unit 300 comprises one or more software processes that may run on either a
general purpose computing platform or a DSP platform, for example under the Linux operating
system on an Intel-based PC. The audio bridge unit 300 preferably allocates sufficient resources
of the conferencing server 600 to implement one edge point mixer 310 for each participant
station 110 in the conference. For example, if the conferencing server 600 is a DSP platform,
each edge point mixer 310 may be assigned to a separate DSP. Alternatively, a DSP with
sufficient processing power to perform the matrix mathematical operations may accommodate
multiple edge point mixers 310.
[0041]
In another embodiment, some or all of the edge point mixers 310 may be distributed to the
participant stations 110. However, this requires that every participant station 110 broadcast its
audio signal input 325 to the distributed edge point mixers 310, which may be inefficient unless
there are very fast connections between all the participant stations 110. The advantage of having
centrally located edge point mixers 310 is that each participant station 110 only needs to
transmit and receive one audio signal.
[0042]
In the one-server embodiment shown in FIG. 6, it is presently preferred that each edge point
mixer 310 be adapted to receive the following information as input: uncompressed 16-bit pulse
code modulated (PCM) incoming audio signal (325) samples, at 8000 samples/sec per participant
(although 8-bit PCM is the standard for telephony, the 16-bit requirement allows the addition of
wideband codecs in the future); attenuation/amplification mixing parameters 150 for all
conference participants, updated at a default rate of 10 times per second (the update rate is
preferably a dynamically adjustable parameter); and other mixing parameters 150, obtained
from the system control unit 200, that modify the mixing algorithm. The other mixing
parameters 150 include:
[0043]
・Maximum number of speakers mixed simultaneously (N). The system or system operator
preferably adjusts this parameter to optimize performance or to accommodate the capabilities of
each participant station 110.
[0044]
・Attenuation/amplification level update rate. The system or system operator preferably adjusts
this parameter (e.g., 10 times/sec) to optimize performance.
[0045]
・Update rate of the active speaker indicator. The system or system operator adjusts this
parameter to optimize performance (e.g., 10 times/sec).
[0046]
・Speech activity detection (SAD) enable/disable. Each participant station 110 can either enable
or disable SAD for that participant's conference experience. When SAD is disabled, the top N
unattenuated incoming audio signals 325 are mixed regardless of whether any threshold has
been reached.
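The SAD-gated selection of the N loudest speakers might look like the following sketch; the level metric, threshold value, and function names are assumptions for illustration:

```python
def select_speakers(levels, gains, n, sad_enabled, threshold=0.05):
    """Choose which incoming audio signals 325 to mix.

    With SAD enabled, only signals whose speech level reaches the
    threshold compete for the N mixing slots; with SAD disabled, the
    top N non-attenuated signals are mixed regardless of any threshold."""
    candidates = [i for i, (level, gain) in enumerate(zip(levels, gains))
                  if gain > 0.0 and (not sad_enabled or level >= threshold)]
    candidates.sort(key=lambda i: levels[i], reverse=True)
    return sorted(candidates[:n])
```

The returned indices double as the active speaker indicator for that participant's mixed output 330.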
[0047]
The edge point mixer 310 preferably outputs at least the following data: uncompressed 16-bit
pulse code modulated (PCM) mixed audio signal (330) samples, at 8000 samples/sec, for each
participant station 110; and an active speaker indicator identifying the current speakers who can
be heard (i.e., the speakers currently being mixed).
[0048]
Both the system control unit 200 and the audio bridge unit 300 employ the Media Interface Unit
(MIU) 400 to communicate with external resources such as the participant stations 110. The MIU
400 is preferably a software module that includes all of the protocols and conversion
mechanisms necessary to enable proper communication between the conferencing server 600
and the participant stations 110. For example, the MIU 400 performs the conventional audio
processing functions of coding/decoding 610, automatic gain control 615 and packing/unpacking
620 of RTP packets. The MIU 400 also performs protocol processing for the Voice over IP (VOIP)
protocol 630 used for a particular conference. Like the system control unit 200 and the audio
bridge unit 300, MIUs 400 may be distributed among different servers 600 in the network.
[0049]
Preferably, routing is achieved by the system described in US Pat. No. 5,513,328, "Apparatus for inter-process/device communication for multiple systems of asynchronous devices," which is incorporated herein by reference. The system described in that patent uses processing resources efficiently by following an event-driven software architecture, and extends efficiently to new plug-in applications (such as the audio conference bridge system of the present invention).
[0050]
The basis of communication for the audio conference bridge system is preferably the Internet Protocol (IP). Within this protocol family, subprotocols (e.g., TCP, UDP) and superprotocols (e.g., RTP, RTCP) are employed as necessary. The MIU 400 may also use a standard VOIP protocol 630, preferably supporting SIP and H.323; however, any VOIP protocol 630 can be used. VOIP protocol stacks 630 are commercially available from Radvision and many other companies.
[0051]
Real Time Protocol (RTP) and Real Time Control Protocol (RTCP) 620 are the standards for media transmission in VOIP networks. The MIU 400 packs and unpacks the RTP input stream and RTP output stream of each of the conference participant stations 110. RTP processing 620 is preferably a function included with the VOIP protocol stack 630. Furthermore, it is preferable to transmit the VOIP media using compressed RTP to limit the header-to-data ratio and increase throughput.
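As a hedged illustration of the packing described above (not the patent's own code), the fixed 12-byte RTP header defined by RFC 3550 can be packed with Python's struct module; the function name and default field values below are assumptions for the example:

```python
import struct

def pack_rtp_header(seq: int, timestamp: int, ssrc: int,
                    payload_type: int = 0, marker: int = 0) -> bytes:
    """Pack the fixed 12-byte RTP header (RFC 3550): version 2, no padding,
    no extension, no CSRC entries. Payload type 0 is G.711 mu-law (PCMU)."""
    byte0 = 2 << 6                        # V=2, P=0, X=0, CC=0
    byte1 = (marker << 7) | payload_type  # M bit plus 7-bit payload type
    return struct.pack("!BBHII", byte0, byte1, seq, timestamp, ssrc)
```

The audio payload would follow these 12 bytes; for 8000-sample/sec G.711, the timestamp advances by the number of samples per packet.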
[0052]
To communicate with the participant stations, system control unit 200 preferably uses a custom protocol (denoted the "true chat protocol" 640) translated by the media interface unit 400. As those skilled in the art will appreciate, the true chat protocol 640 is application dependent and consists of simple identifiers such as attribute-value pairs. Thereby, the true chat protocol 640 instructs the system control unit 200 how to process information arriving from (or destined for) the participant station 110. The true chat protocol 640 may be encapsulated in RTP with a defined RTP payload header type. The true chat protocol 640, although not bandwidth intensive, is inherently time sensitive. Encapsulating the protocol in RTP takes advantage of the QoS control mechanisms specific to some VOIP architectures, such as the CableLabs PacketCable architecture, by simply establishing a second RTP session.
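An attribute-value encoding of the kind described for the true chat protocol 640 might look like the following sketch. The wire format shown ("name=value" lines) is an assumption for illustration; the patent does not specify one:

```python
def encode_attrs(attrs: dict) -> bytes:
    """Serialize control attributes as simple 'name=value' pairs, one per
    line -- the kind of identifier scheme the true chat protocol suggests."""
    return "\n".join(f"{k}={v}" for k, v in attrs.items()).encode()

def decode_attrs(data: bytes) -> dict:
    """Recover the attribute-value pairs from the wire format."""
    return dict(line.split("=", 1) for line in data.decode().splitlines())
```

A control message carrying a session identifier and a volume request could then round-trip through `encode_attrs` and `decode_attrs` unchanged.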
[0053]
The MIU also includes a media conversion unit 650. Audio bridge unit 300 preferably receives 16-bit linear incoming audio signals 325. However, the standard telephone codec (G.711) and many compressed codecs are somewhat non-linear. For G.711, non-linear companding functions are applied to improve the signal-to-noise ratio and extend the dynamic range. In the case of a telephone-type codec, media conversion unit 650 first converts the incoming audio signal 325 to G.711 and then applies the inverse companding function. This is preferably accomplished using a table lookup function. To output the mixed audio signal 330, the media conversion unit 650 performs the opposite operation. Thus, media conversion unit 650 preferably includes a transcoder capable of translating a variety of different codecs to 16-bit linear (such as PCM) and back.
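As an illustration of the companding the media conversion unit 650 performs, here is a sketch using the continuous mu-law formula; the real G.711 codec uses a segmented 8-bit encoding, so this is an approximation, and the function names are hypothetical:

```python
import math

MU = 255  # mu-law parameter used by G.711 (North America / Japan)

def compress(sample: int) -> int:
    """Compand a 16-bit linear PCM sample down to an 8-bit signed code."""
    x = max(-1.0, min(1.0, sample / 32768.0))
    y = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    return int(round(y * 127))

def expand(code: int) -> int:
    """Inverse companding: recover an approximate 16-bit linear sample."""
    y = code / 127.0
    x = math.copysign((math.pow(1 + MU, abs(y)) - 1) / MU, y)
    return int(round(x * 32768.0))

# A small lookup table covers all 256 codes, as the text suggests.
EXPAND_TABLE = {c: expand(c) for c in range(-127, 128)}
```

The round trip is lossy (that is the point of companding: small samples keep fine resolution, large ones are quantized coarsely), which is why the table lookup approach mentioned in the text is cheap and adequate.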
[0054]
As described, the present invention is preferably implemented via a managed IP network 455 (FIG. 5). However, even a highly managed IP network 455 with quality of service (QoS) capabilities may lose packets or deliver them erratically. Since audio communication is very sensitive to latency, retransmitting lost packets, the usual remedy for data transmission errors, is infeasible. Forward error correction (FEC) is a possible solution to this problem, but FEC requires continuous transmission of redundant information; that is, FEC is expensive in terms of both bandwidth and processing power. As a compromise, many VOIP applications are moving to receiver-based methods that estimate speech samples lost to packet transmission problems. If a single sample is lost, a simple algorithm either repeats the last sample or interpolates linearly. If multiple samples are lost, a more aggressive interpolation method should be employed, such as the interpolation method recommended by ETSI TIPHON. For example, the method described in ANSI T1.521-1999 is suitable for processing G.711 codec streams.
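The two receiver-based concealment strategies mentioned above (repeat the last sample, or interpolate linearly across the gap) can be sketched as follows; `conceal` is a hypothetical helper for illustration, not part of any cited standard:

```python
from typing import List, Optional

def conceal(prev: int, nxt: Optional[int], gap: int) -> List[int]:
    """Estimate `gap` lost PCM samples between the last good sample `prev`
    and the next good sample `nxt`. Falls back to repetition when the
    next sample has not arrived yet."""
    if nxt is None:                     # next packet unknown: repeat last
        return [prev] * gap
    step = (nxt - prev) / (gap + 1)     # linear interpolation across gap
    return [round(prev + step * (i + 1)) for i in range(gap)]
```

Production concealment algorithms (e.g., the ANSI T1.521-1999 method) are more elaborate, repeating pitch periods rather than single samples, but the shape of the problem is the same.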
[0055]
The MIU 400 also preferably includes an automatic gain control (AGC) 615 with echo cancellation. The AGC 615 is applied to the mixed audio signal 330 output from the audio bridge unit 300, before that signal is converted to G.711 or another codec. In the case of a standard telephone codec, the AGC 615 also preferably normalizes the output of the audio bridge unit 300 from 16 bits to 8 bits.
[0056]
The MIU also preferably includes an audio recognition module 660. As described below, audio recognition 660 may be used in conjunction with the present invention to implement a specific mixing policy (such as filtering out offensive content uttered by other participants). Existing speech recognition software, such as ViaVoice available from IBM, can be adopted.
[0057]
FIG. 7 illustrates the basic method of the invention, described in connection with the systems of FIGS. 2 and 3. First, the audio conference bridge system 100 dynamically generates an audio conference bridge (700). The audio conference bridge system 100 is preferably
software process operating on a server and comprises a system control unit 200 and an audio
bridge unit 300. In the preferred embodiment shown in FIGS. 2 and 3, this is achieved as follows.
The participant stations 110 individually establish control sessions with the system control unit
200. The system control unit 200 provides each of the participant stations 110 with a unique
session identifier or SID for that participant station 110. The system control unit 200 also
provides the SID to the audio bridge unit 300 and informs that unit 300 which SIDs will be grouped into the same conference. In implementing this function, it is useful to express the SID in terms of a conference ID and a participant station ID, both to ensure uniqueness and to simplify the process of associating a particular SID with a particular conference. Alternatively, the SID may include the IP address and port address of the participant station 110.
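A SID combining a conference ID and a participant station ID, as suggested above, might be modeled as follows. This is a hypothetical sketch; the patent does not specify an encoding:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class SessionID:
    """Session identifier expressed as conference ID + participant station
    ID, which keeps SIDs unique and makes the owning conference easy to
    recover from any SID."""
    conference_id: int
    station_id: int

    def encode(self) -> str:
        return f"{self.conference_id}:{self.station_id}"

    @staticmethod
    def decode(text: str) -> "SessionID":
        conf, station = text.split(":")
        return SessionID(int(conf), int(station))
```

With this shape, the audio bridge unit can group incoming connections by `conference_id` alone, while the full pair identifies the edge point mixer.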
[0058]
After establishing the control session, each of the participant stations 110 establishes an audio
connection with the audio bridge unit 300 and exchanges the appropriate SIDs. The SID may
either be automatically exchanged by the participant station 110 after being prompted by the
audio bridge unit 300, or may be manually exchanged by the participants (A, B, C). For example, a person using the participant station 110 shown in FIG. 4 connects to audio bridge unit 300 using his phone 460 and may need to provide his SID to audio bridge unit 300 manually via DTMF tones. From this point, the SID is used as a reference by the system control unit 200 until the end of the conference. The system control unit 200 transmits the SID along with the mixing control parameters 150 to the audio bridge unit 300. This allows the audio bridge unit 300 to correlate incoming audio signals 325 from different participant stations 110 with the appropriate edge point mixers and to apply the appropriate mixing parameters 150.
[0059]
Next, system control unit 200 receives mixing control data 140 for the participant stations 110 (710). The mixing control data 140 for each participant station 110 contains data used by system control unit 200 to derive the individual mixing parameters 150 applied to at least two (preferably all) of the incoming audio signals 325 from the other participant stations 110. The configuration of mixing control data 140 may take many forms, depending on the conferencing application and the level of control distributed to participant station 110. In the virtual chat room example, the mixing control data 140 received from each participant station 110 may be the coordinates of that participant's avatar in the virtual conference world. In another example, the mixing control data 140 may comprise a mere notification that the participant station 110 has enabled the "parent control" function (vulgar speech filtering). In yet another example, the mixing control data 140 may include explicit mixing instructions from the participant (e.g., raise the volume of the incoming audio signal 325 from participant C).
[0060]
In general, however, the term "mixing control data" 140 includes any information used to calculate the mixing control parameters 150. As described, in one example, participant station 110 may be able to calculate its own mixing parameters 150; in that case, the mixing control data 140 is the parameters 150 themselves. Furthermore, it should be understood that the final mixing control parameters 150 calculated by the system control unit 200 may also depend on data from other system resources (e.g., a warning from the speech recognition module 660).
[0061]
As system control unit 200 receives mixing control data 140, audio bridge unit 300 receives the incoming audio signals 325 from the participant stations 110 (720). The system control unit 200 then sets (730) the mixing control parameters 150 for each of the edge point mixers 310 based on at least the mixing control data 140 received for each participant station 110. Preferably, the mixing control parameters 150 are set (and periodically revised) according to a three-stage control system. First, default mixing parameters are set according to an algorithm, such as distance-based attenuation in a virtual chat room. The algorithmically determined mixing parameters may then be automatically changed in accordance with a system or participant policy, such as muting a vulgar speaker. Finally, the algorithms and/or policies may be overridden by explicit participant requests, such as a request to amplify a particular speaker's audio.
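The three-stage control system (algorithmic default, then policy, then explicit request) can be sketched for a single gain value as follows. The helper and its parameters are hypothetical, for illustration only:

```python
from typing import Optional

def resolve_gain(default_gain: float,
                 policy_mute: bool,
                 explicit_gain: Optional[float]) -> float:
    """Three-stage control: start from the algorithmic default, let policy
    override it, and let an explicit participant request win over both."""
    gain = default_gain          # stage 1: e.g. distance-based attenuation
    if policy_mute:
        gain = 0.0               # stage 2: e.g. parent-control muting
    if explicit_gain is not None:
        gain = explicit_gain     # stage 3: explicit participant request
    return gain
```

The ordering matters: an explicit request is applied last, so it overrides both the default algorithm and any policy-driven change, matching the precedence described in the text.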
[0062]
For example, in 3D conferencing applications, a suitable default algorithm may be designed to reproduce the realistic propagation of sound in a simulated 3D environment. In this case, the mixing control data 140 received from each participant station 110 may include the participant's position in the virtual environment and the direction the participant is facing (relevant both as listener and as speaker). In operation, each participant station 110 may periodically update the system control unit 200 with the participant's current position and orientation so that the mixing control parameters 150 may be updated. The system control unit 200 takes this information, applies it to the mixing algorithm to calculate the appropriate mixing control parameters 150 for each participant station's designated edge point mixer 310, and then transmits those parameters 150 to the audio bridge unit 300, whereby the mixing takes place properly. Proper correlation of participant position information, mixing control parameters 150, and the appropriate edge point mixer 310 is achieved by the above-mentioned SID.
[0063]
The distance-based attenuation algorithm of this example can then be automatically changed by the enforcement of a system or participant policy. For example, if the policy of a particular participant station is to filter certain vulgar words out of the conference, that participant station's "parent control" flag is sent to the system control unit 200 as part of the participant station's mixing control data 140. The MIU 400 is loaded with the offensive language settings and monitors the conversation using the speech recognition module 660. Whenever offensive language is detected, the MIU 400 informs the system control unit 200. The system control unit 200 then sets the attenuation parameter for the offending talker to 100%, temporarily (or permanently, by policy), thereby effectively suppressing the unwanted speech.
[0064]
This attenuation occurs regardless of whether the underlying algorithm (in this case, the distance-based algorithm) would otherwise have included the offending speaker's audio in the participant's mixed audio signal output 330. Preferably, this attenuation affects only the participant stations 110 that enable such a policy; participants who do not enable the policy hear everything that is said. In certain applications (e.g., virtual chat rooms for children), a system administrator may want to automatically filter vulgar words for all participant stations 110. Many other types of system and participant policy implementations are possible with the present invention and will be readily apparent to those skilled in the art.
[0065]
The default mixing algorithm may also be overridden directly by mixing control data 140 that includes explicit mixing instructions from the participant station 110. Explicit mixing instructions may temporarily or permanently override, for that station, certain aspects of the algorithmic calculations performed by system control unit 200. For example, a participant may request that another participant in the conference be amplified more than the mixing algorithm indicates. This is useful, for example, if a person wants to hear a distant conversation in a three-dimensional chat room. A similar request may place participant station 110 in a whisper mode or a privacy mode so that other participants cannot listen to the participant's conversation. Many other types of participant control requests are possible with the present invention and will be readily apparent to those skilled in the art. In addition, mixing control parameters 150 may be more complex than simple linear coefficients, and may include certain non-linear functions to produce effects such as distortion, echo, and the like.
[0066]
The mixing control data 140 may also include information used to optimize the maximum number of incoming audio signals 325 to be mixed for any particular participant station 110. As described, the participant stations 110 in operation have different qualities, both in terms of equipment and of connections to the present audio conference bridge system 100. For example, the participant station 110 illustrated in FIG. 4 includes the audio interface of a phone 460 connected to audio bridge unit 300 via the PSTN 465. Where the fidelity of the phone 460 and/or the PSTN 465 is limited, the present invention preferably reduces the maximum number of incoming audio signals 325 that are mixed for that participant station 110 (e.g., fewer incoming audio signals 325 are mixed for the limited station, while the top eight incoming signals are mixed for other participants).
[0067]
A full-IP participant station 110 (e.g., FIG. 5), with a high-power multimedia PC 451, full stereo speakers 452, and high-speed access to a managed IP network 455, can efficiently consume a great deal of mixed audio; a low-fidelity participant station 110 (e.g., FIG. 4) may not be able to do so. However, the system 100 allows sufficient flexibility even within the same conference. High-power users receive high fidelity and low-end users receive less, but both take full advantage of their equipment and network connections and, given those factors, receive the service to be expected. This is a substantial advantage, in that participant stations 110 of different quality can all participate in the same conference and have different but equally satisfying experiences.
[0068]
Preferably, the fidelity adjustment for each participant station 110 is an algorithm implemented by the system control unit 200. The system control unit 200 preferably determines (automatically or with input from the user) the optimum maximum number of incoming audio signals 325 to mix for each participant station 110. In one embodiment, the associated mixing control data 140 includes explicit instructions from participant station 110. For example, an application operating at participant station 110 may provide the participant with suggestions on how to set this parameter based on connection speed, audio equipment, and the like. This parameter may also be dynamically modified during the meeting, thereby altering the maximum number of incoming signals 325 mixed if the original setting does not suit the participant. In another embodiment, system control unit 200 automatically collects mixing control data 140 by monitoring network conditions (including network jitter, packet loss, quality of service, connection speed, latency, etc.), and can thereby optimize the maximum number of incoming signals 325 mixed for each participant station 110.
[0069]
Once the mixing control parameters 150 are calculated, they are sent by the system control unit 200 to the audio bridge unit 300. The audio bridge unit 300 then mixes the incoming audio signals 325 according to each participant station's mixing control parameters 150 using the edge point mixers 310 (740). Each participant station 110 is assigned to a different edge point mixer 310, and the system control unit 200 transmits each participant station's SID along with its mixing control parameters 150, making the appropriate correlation by the audio bridge unit 300 possible.
[0070]
The preferred method of mixing will be described with reference back to the arrangement of FIG. For simplicity, assume a very simple mixing algorithm that mixes all the audio according to dynamically updated attenuation values explicitly supplied by the participant stations 110. Further, define the various input and output signals as follows:
[0071]
SI(1) = incoming audio signal from participant station A
SI(2) = incoming audio signal from participant station B
SI(3) = incoming audio signal from participant station C
SO(1) = mixed audio signal output to participant station A
SO(2) = mixed audio signal output to participant station B
SO(3) = mixed audio signal output to participant station C
A(1,1) = amplification selected by participant A for participant A's input signal (usually zero unless the virtual environment includes some echo)
A(1,2) = amplification selected by participant A for participant B's input signal
A(1,3) = amplification selected by participant A for participant C's input signal
A(2,1) = amplification selected by participant B for participant A's input signal
A(2,2) = amplification selected by participant B for participant B's input signal (usually zero unless the virtual environment includes some echo)
A(2,3) = amplification selected by participant B for participant C's input signal
A(3,1) = amplification selected by participant C for participant A's input signal
A(3,2) = amplification selected by participant C for participant B's input signal
A(3,3) = amplification selected by participant C for participant C's input signal (usually zero unless the virtual environment includes some echo)
The formula for the output signals can then be simply described as a function of the input signals.
[0072]
SO(1) = A(1,1)*SI(1) + A(1,2)*SI(2) + A(1,3)*SI(3)
SO(2) = A(2,1)*SI(1) + A(2,2)*SI(2) + A(2,3)*SI(3)
SO(3) = A(3,1)*SI(1) + A(3,2)*SI(2) + A(3,3)*SI(3)
This calculation can be achieved as a simple matrix operation, where SI represents the column vector of the participants' input signals 325, A represents the amplification matrix, and SO represents the vector of mixed audio signal outputs 330.
[0073]
SO = A × SI, where "×" denotes matrix multiplication.
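The per-sample computation SO = A × SI can be sketched in plain Python. The gain values below are invented for the three-party example (diagonal zero, since a participant normally hears no echo of his own voice):

```python
def mix(amp, samples):
    """One sample of edge point mixing, SO = A x SI, computed row by row.
    Row i of `amp` holds participant i's gains for each incoming signal."""
    return [sum(a * s for a, s in zip(row, samples)) for row in amp]

# Illustrative amplification matrix for participants A, B, C.
A = [[0.0, 0.7, 0.3],
     [0.7, 0.0, 0.3],
     [0.5, 0.5, 0.0]]
SI = [1000.0, -2000.0, 500.0]   # one PCM sample per participant
SO = mix(A, SI)                 # per-participant mixed output samples
```

Each row of A is one participant's personal mix, which is the essence of edge point mixing: the matrix has one row per listener, not one shared mix for the whole conference.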
[0074]
The incoming audio signals 325 are constantly changing, the amplification matrix is periodically updated, and this calculation represents only one sample of the output mixed audio signal 330. For a common PCM-based codec such as G.711, this operation is performed 8000 times per second. It should be noted that implementing the edge point mixing calculations as matrix operations eliminates the need for an explicit stream duplicator 399.
[0075]
The above example assumes that the number of participant stations 110 is small and the mixing algorithm is simple. However, in more complex embodiments, there are generally more than three participant stations 110 in a conference, and the mixing algorithm can be quite complex. Thus, the edge point mixing calculations are preferably optimized to limit the computational overhead. For example, assume that a relatively large chat room has 50 participant stations 110, all of which are strongly interactive, and that the default mixing algorithm mixes up to 8 speakers. First, audio conferencing system 100 must determine which incoming audio signals 325 are to be mixed for each participant station 110. The mixing calculations must then be optimized to reduce the complexity of the associated matrix operations.
[0076]
The preferred real-time inputs to the audio bridge unit 300 are the amplification matrix (A) from the system control unit 200 and a PCM speech sample vector (SI) derived from the incoming audio signals 325 received via the media interface unit 400. Two simple steps are used in combination to determine which speakers should be mixed. The first step uses speech activity detection (SAD) as a means of reducing the number of candidates for the currently active speakers. The second step evaluates signal strength and amplification values to select the top N sources for mixing.
[0077]
The first step in this preferred process, then, is to periodically calculate a SAD value for each incoming audio signal 325. Speech activity detection algorithms are relatively standard building blocks and will not be described here. The SAD, however, is preferably implemented as part of the MIU 400, associated with the media conversion unit 650. Relative to the frequency of incoming speech samples (e.g., 8000/s), speech activity detection updates slowly (e.g., 10 updates/s). The output of the SAD function is generally a Boolean value (0 or 1). Because many incoming audio signals 325 are inactive (producing only silence or low-level noise), the number of columns of the amplification matrix (A) and the number of rows of the speech input vector (SI) drop rapidly, achieving a considerable reduction in the required matrix calculations. The reduced matrix and vector are referred to as (a) and (si), respectively.
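The reduction of A and SI to (a) and (si) using the Boolean SAD flags can be sketched as follows (a hypothetical helper, for illustration):

```python
def reduce_by_sad(amp, samples, active):
    """Drop inactive columns of A and the matching rows of SI before
    mixing. `active` is the Boolean SAD output, one flag per incoming
    signal; only active signals survive into (a) and (si)."""
    keep = [i for i, on in enumerate(active) if on]
    a = [[row[i] for i in keep] for row in amp]   # reduced matrix (a)
    si = [samples[i] for i in keep]               # reduced vector (si)
    return a, si
```

In a 50-station room where only a handful of people are talking, this shrinks the per-sample work from 50 columns to the few active ones.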
[0078]
The second step in the preferred process ranks the incoming signals 325 by amplified intensity (per participant station 110); only the top N signals are summed for the mixed signal output 330 ultimately delivered to that participant station 110. The amplified signals selected for the final sum may differ for each participant station 110. This means that the matrix multiplication of the reduced amplification matrix (a) and the input signal vector (si) is further reduced to a series of modified vector dot products, in which each row is calculated separately instead of as one matrix multiplication. The vector dot product is "modified" in that a sorting process occurs before the final addition. Preferably, the audio bridge unit 300 performs the multiplications associated with the dot product and then performs a descending sort until the top N (e.g., 8) values are obtained. Those N values are then summed to obtain the desired output mixed signal 330.
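The modified vector dot product (multiply per source, keep only the N strongest products, then sum) can be sketched per participant as follows. The helper is hypothetical; `heapq.nlargest` performs the partial descending sort:

```python
import heapq

def mix_top_n(gains, samples, n):
    """Modified vector dot product for one listener: multiply gain by
    sample per source, keep the N strongest contributions (by magnitude),
    and sum only those for the mixed output sample."""
    products = [g * s for g, s in zip(gains, samples)]
    top = heapq.nlargest(n, products, key=abs)
    return sum(top)
```

Because each listener's gain row differs, which sources make the top N can differ per participant station, exactly as the text notes.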
[0079]
Once the incoming audio signals 325 are properly mixed according to the mixing control parameters 150 (740), a separate mixed audio signal 330 is output from the audio bridge unit 300 to each participant station 110 (750). The output (750) of the mixed audio signal 330 generally involves the audio bridge unit 300 conveying the mixed audio signal 330 to each of the participant stations 110 via a communication network. However, in embodiments where some of the audio bridge unit 300 functionality is distributed to the participant stations 110 (such as where some participant stations 110 include their own edge point mixers 310), the output step 750 may simply involve transmitting the mixed audio signal 330 to the associated speaker.
[0080]
FIG. 8 shows an example of a possible visual interface to a virtual chat room 800 utilizing the audio conference bridging system 100 of the present invention. The exemplary application shown in FIG. 8 is a two-dimensional virtual chat room 800 in which avatars 810 representing participants A-F are located. This particular chat room 800 shows a mountain view and may be ideal for discussions about outdoor sports and the like. In addition to the participants, FIG. 8 includes a jukebox icon 820 and a hypertext link 830 to a separate virtual chat room (in this case, with a Hawaiian theme). This chat room 800 may be an Internet website hosted on the same server 600 as the system control unit 200 and the audio bridging unit 300. In this embodiment, the visual interface of chat room 800 may be provided to participant station 110 by a Java applet running on participant station 110. It should be recognized that a nearly infinite variety of other visual interfaces is possible. However, the chat room 800 shown here is used in conjunction with FIG. 9 to describe an exemplary virtual chat session using the audio conference bridging system 100 of the present invention.
[0081]
FIG. 9 is an event chart showing an exemplary chat session in the virtual chat room shown in FIG. 8. As described, many mixing algorithms are possible. In the virtual visual chat room application 800, for example, an appropriate mixing algorithm may attempt to reproduce realistic, distance-based sound propagation in the simulated environment, which may be two-dimensional or three-dimensional. In the three-dimensional case, the mixing control data 140 transmitted by each participant station 110 may include the position of the person in the room, the direction the person is facing, and the tilt of the person's head (matching the avatar-based visual paradigm of games, virtual environment applications, and the like). With this information, system control unit 200 calculates the mixing control parameters 150, which determine the mixed signals 330 output from the audio bridging unit 300. Each mixed output signal 330 is attenuated based on the distance and direction of the speakers (e.g., the voice of a speaker to the left of the participant's avatar is mixed so that it is output mainly from the left stereo speaker of the participant station). However, for simplicity, the example illustrated in FIG. 9 simply assumes a distance-based algorithm, regardless of direction, head tilt, etc.
[0082]
The first "event" 900 is that participants A, B, and C are in the room 800 (already having
established a conference session). FIG. 8 is not drawn to scale but initially assumes that A, B and
C are equidistant from one another. In addition, the following initial assumptions are made: (1)
Participants D, E, & F are initially not at least one in Room 800; (2) All participants are
consecutive at the same audio level (3) Only participant C allows parent control (ie, filtering of
inappropriate speech); (4) default maximum number of incoming audio signals mixed at any one
time Is 4 (more susceptible to loss of participant stations with lower fidelity).
[0083]
While participants A, B, and C are in the room 800, their participant stations 110 periodically update the system control unit 200 with mixing control data 140, including their positions in the room 800. (For purposes of this discussion, the position of a participant's avatar 810 is referred to as the participant's own position.) The system control unit 200 applies the specific mixing algorithm to the mixing control data 140, thereby calculating the mixing parameters 150 for each participant station 110. The audio bridge unit 300 then mixes an individual output signal 330 for each participant station 110 based on those individual mixing parameters 150. In this case, participants A, B, and C are equidistant from one another and a simple distance-based mixing algorithm is applied, so each participant station 110 receives an equal mix of the other two participants' inputs (e.g., A's mixed signal = 50% (B) + 50% (C)).
[0084]
It should be understood that the percentages shown in FIG. 9 describe the relative composition of the mixed incoming audio signals 325; they do not necessarily indicate the strength of the signal. Rather, in the present embodiment, the gain is a function of the distance between the avatars 810 and the speaker's volume input. In one embodiment, the gain decreases as the square of the distance between the avatars 810 increases (approximately true in real space). However, in some applications it may be advantageous to use a slower rate of distance-based "attenuation," for example calculating gain as a linear function of the proximity between the avatars 810. In other embodiments, it may be desirable to amplify at least one conversation in the virtual chat room 800 to an audible level regardless of the distance between the avatars 810. In this example, a simple distance-based algorithm is used, it is assumed that all participants speak constantly at the same incoming level, and the "top" incoming signals 325 for any particular participant are those of the three other participants in closest proximity.
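The inverse-square attenuation described here can be sketched as follows. The clamping distance and the normalization into mix percentages are assumptions added for illustration; the patent specifies only that gain falls with the square of avatar distance:

```python
def distance_gain(distance, min_distance=1.0):
    """Inverse-square attenuation used as the default mixing algorithm.
    Gain is clamped to 1.0 inside `min_distance` to avoid a singularity
    when two avatars occupy nearly the same spot."""
    d = max(distance, min_distance)
    return 1.0 / (d * d)

def normalized_mix(distances):
    """Relative mix percentages for a set of speakers, as in FIG. 9."""
    gains = [distance_gain(d) for d in distances]
    total = sum(gains)
    return [g / total for g in gains]
```

With two speakers at equal distance the mix is 50%-50%; moving one speaker twice as far away shifts the split sharply toward the nearer one, which is the behavior the FIG. 9 walkthrough relies on.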
[0085]
Next, participant A moves closer to participant B (910) while participants A and B remain equidistant from participant C (note that FIG. 8 shows only the starting position of each participant). System control unit 200 receives the updated positions of participants A, B, and C and recalculates the mixing control parameters 150 for each participant station 110. The audio bridging unit 300 then remixes the incoming audio signals 325 for each participant station 110 based on the modified mixing control parameters 150 received from the system control unit 200. In the present example, the distances between participants change, whereby participant A receives a 70%-30% split between B's incoming audio signal 325 and C's incoming audio signal 325, respectively. B receives a similar split between A's incoming audio signal 325 and C's incoming audio signal 325. C, however, still receives a 50%-50% split between A's incoming audio signal 325 and B's incoming audio signal 325, since participants A and B remain equidistant from C.
[0086]
The next event 920 is that participant B utters inappropriate speech. The inappropriate speech is detected by the speech recognition module 660 in the MIU 400, which informs the system control unit 200 of the inappropriate speech contained in B's incoming audio signal 325. Recall that participant C is the only participant who has enabled parent control. The system control unit 200 recalculates the mixing control parameters 150 for participant station C and sends the updated parameters 150 to the audio bridging unit 300. The audio bridging unit 300 then mutes B's incoming signal 325 from C's mixed signal 330 temporarily (or permanently, depending on the applicable policy). Assume that B's incoming signal 325 is permanently muted from C's mixed signal 330. Thus, C receives only the audio input from participant A. Assuming that the mixing control data 140 from A and B has not changed, the mixed signals 330 output to A and B remain the same (A hears the inappropriate speech spoken by B).
[0087]
Next, participants D and E enter the room 800 (event 930) and move to the positions shown in
FIG. As previously discussed, in order to enter the room 800, participants D and E have already
established a control session with the system control unit 200 and a media connection to the audio
bridging unit 300. Assuming that D and E utilize the "pure IP" participant station 110 shown
in FIG. 5, participants D and E can enter the room 800 seamlessly, without manually entering the
SID provided by the system control unit 200.
[0088]
Once participants D and E enter the room 800 (event 930), the system control unit 200 receives
periodic updates of the mixing control data 140 including the positions of all the participants.
With the addition of the two new participants, the system control unit 200 recalculates the
mixing parameters 150 for the existing participants A, B and C, and calculates mixing parameters 150
for the new participants D and E. The audio bridging unit 300 then remixes the mixed signal 330
output to each participant station 110 based on the new mixing parameters 150. As shown in
FIG. 9, in the present embodiment, participants A, B and C are far from participants D and E
(participant E being slightly farther away than participant D), so A, B and C receive highly
attenuated levels of D's and E's incoming audio signals 325. Similarly, participants D and E receive most
of each other's incoming audio signal 325, along with highly attenuated portions of the
incoming audio signals 325 from participants A, B and C.
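The "highly attenuated levels" heard from far-away participants can be modeled with any monotonically decreasing gain curve. The exponential rolloff below is an assumption for illustration, not the patent's formula; the text only requires that gain fall with distance.

```python
import math

def attenuation_gain(listener_pos, talker_pos, rolloff=0.15):
    """Per-talker gain that decays exponentially with distance, so a
    far-away cluster of participants arrives highly attenuated.
    The rolloff constant 0.15 is an arbitrary illustrative value."""
    d = math.dist(listener_pos, talker_pos)
    return math.exp(-rolloff * d)
```

For example, a talker standing next to the listener is heard at full gain, while one twenty units away is strongly attenuated.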
[0089]
Participant A then explicitly requests (event 940) to listen in on the conversation between participants D and
E. This request may be made in a variety of ways, including participant A clicking the mouse
pointer directly on the space between participants D and E. The system control unit 200 receives
this request from participant A as part of the mixing control data 140. The system control
unit 200 then preferably recalculates A's mixing control parameters 150 as if participant A were at the
point clicked with participant A's mouse pointer. For the purpose of the other participants' mixes,
however, A is still treated as being at A's original position, so the remaining participants mix
participant A's incoming audio signal 325 as before. The audio bridging unit 300 then remixes the
mixed signal 330 output to participant A according to the new mixing control parameters 150, with the
result that more weight is given to the conversation between D and E in A's mixed output signal 330.
The mixed audio signals 330 to the other participants are not changed by this event.
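One way to sketch this listening-position override: the clicked point replaces A's position only when computing A's own mix, while everyone else's mix still uses A's original position. The data structures and function name are illustrative assumptions.

```python
def effective_positions(participant, positions, listen_overrides):
    """Positions used when computing *this* participant's own mix.

    If the participant has clicked elsewhere to eavesdrop, their
    listening point is overridden, but other participants still mix
    this participant's voice from the original location."""
    pos = dict(positions)  # never mutate the shared table
    if participant in listen_overrides:
        pos[participant] = listen_overrides[participant]
    return pos

positions = {"A": (0, 0), "D": (10, 0), "E": (12, 0)}
listen_overrides = {"A": (11, 0)}  # A eavesdrops between D and E
```

Computing A's mix with `effective_positions("A", ...)` places A between D and E, while D's or E's mix still sees A at the original (0, 0).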
[0090]
The next event 950 is a request from participant F to join the conference using a participant
station 110 similar to that shown in FIG. (e.g., a visual PC interface and a PSTN telephone
audio interface). Preferably, the request from participant F is made through his PC 450 or
other visual interface. The system control unit 200 receives the request, assigns the SID for the
conference to participant F, and informs participant F of the telephone number for
obtaining the audio interface. The system control unit 200 also sends the SID to the audio bridging unit
300. The audio bridging unit 300 correlates the SID to the currently open conference and waits
for participant F to establish an audio connection. Until participant F actually joins the
conference, the mixed audio signals 330 for the existing participant stations 110 do not
change.
[0091]
In one embodiment, participant F establishes an audio connection by calling a toll-free number,
which connects participant station F to the audio bridging unit 300 through the PSTN-IP gateway
470. The audio bridging unit 300 then prompts participant F to enter the SID provided by the
system control unit 200 (possibly through DTMF tones). Once the SID is entered, the audio bridging unit
300 dedicates an edge point mixer 310 to participant station F and connects it to the
current conference.
[0092]
Once participant F establishes an audio connection and enters the conference at the position
shown in FIG. 8 (event 960), the system control unit 200 receives periodic updates of all
participants' positions, including the initial position of participant F in room 800, and
calculates updated mixing control parameters 150 for each participant station 110. Recall that
the default maximum number of mixed audio signals for this conference was assumed to be four.
Now, with six participants, each participant receives a mixed signal 330 that excludes at
least one of the other participants' incoming audio signals 325. For example, because participant
C is far from participant A's eavesdropping position (between participants D and E), A's mixed
signal 330 does not include any input from C. Similarly, participant B's mixed signal
330 does not include any input from participant E. (For mixing by the other participants,
participant A is still considered to occupy participant A's original position, despite A's
eavesdropping.) Participant C does not lose any further signal input with the addition of
participant F, since participant B's input has already been muted due to inappropriate speech.
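Capping each mix at the conference's default maximum of four signals can be sketched as a nearest-talkers selection per listener (an actual system might also weigh speech activity, as the text notes elsewhere). The function name and mute bookkeeping are illustrative.

```python
import math

def select_streams(listener, positions, mute_lists, max_streams=4):
    """Choose up to max_streams incoming signals 325 for one listener,
    keeping the nearest unmuted talkers. Distance stands in for the
    full mixing-parameter calculation described in the text."""
    here = positions[listener]
    muted = mute_lists.get(listener, set())
    candidates = [(math.dist(here, positions[t]), t)
                  for t in positions
                  if t != listener and t not in muted]
    candidates.sort()  # nearest first; name breaks ties
    return [t for _, t in candidates[:max_streams]]

# Six participants roughly matching the scenario: E is farthest from B,
# and B is muted for C due to the earlier inappropriate-speech event.
pos = {"A": (0, 0), "B": (1, 0), "C": (2, 0),
       "D": (10, 0), "E": (13, 0), "F": (3, 0)}
mute_lists = {"C": {"B"}}
```

Here B's four-signal mix drops the most distant participant E, and C's mix excludes B entirely because of the mute.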
[0093]
However, because participant F's PSTN connection 465 to the system 100 of the
present invention is limited in fidelity, the system control unit 200 preferably limits the number of
incoming audio signals 325 mixed for participant F to three. Due to fidelity and
speed limitations, participant F's audio connection and equipment cannot receive a
mixed output signal 330 containing four mixed voices in real time. Therefore, the control system
adjusts participant F's mix (here assumed to be three mixed incoming audio signals 325) to the
level of fidelity that participant station F can best handle. As discussed, this fidelity limitation
is preferably based on mixing control data 140, which is received explicitly
from the participant station 110 and/or derived automatically by the system control unit 200, and is
reflected in the mixing control parameters 150 from the system control unit 200.
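The per-station fidelity cap can be sketched as a simple capability lookup. The capability table and station labels below are hypothetical; the text only requires that a narrowband leg (such as F's PSTN connection) receive fewer mixed streams than the conference default.

```python
def max_streams_for_station(station_type, conference_default=4):
    """Lower the per-listener stream count for low-fidelity connections,
    e.g. a PSTN leg that cannot carry four mixed voices in real time.
    The mapping below is an illustrative assumption."""
    capability = {"pure_ip": conference_default, "pstn": 3, "low_bitrate": 2}
    return min(conference_default, capability.get(station_type, conference_default))
```

A "pure IP" station keeps the default of four streams, while participant F's PSTN station is held to three, as in the scenario above.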
[0094]
Participant A then moves to the jukebox 820 in the corner of the virtual chat room 800 (event 970). It is
recognized that this virtual jukebox 820 can take many forms, including a link to a streaming
audio service hosted on another server. However the music flows into the virtual chat room 800,
the jukebox 820 is preferably simply treated as another participant for mixing purposes. In other
words, participants closer to the jukebox 820 hear the music more loudly than participants farther
away. Thus, the system control unit 200 treats the jukebox 820 as the source of another potential
incoming audio signal 325 and calculates distance-based mixing control parameters 150 accordingly.
The audio bridging unit 300 then remixes the individual mixed audio signals 330 for any
participants affected by the activation of the jukebox 820. In this case, only participants A, D, E
and F are (from their respective listening positions) close enough to the jukebox for the music
from jukebox 820 to replace one of the four incoming audio signals 325 previously mixed.
[0095]
Finally, participant A decides to collide with the "To Hawaii" symbol 830 in the corner of the
virtual chat room 800 (event 980). This symbol is an example of a convenient portal to a different
chat room (perhaps one with a Hawaiian theme). It may be implemented as a hypertext link within the
current chat room 800 or by various other mechanisms. A preferred method of handling such a
link-traversal event is described in US Provisional Application No. 60/139,616, filed June
17, 1999, entitled "Automatic Teleconferencing Control System," which is incorporated herein by
reference.
[0096]
Once participant A collides with the hypertext link (event 980), the system control unit 200 assigns a
different SID to participant A and sends it to the audio bridging unit 300. The audio bridging unit
300 correlates the SID to the Hawaii conference and connects participant A to that conference
with another dedicated edge point mixer 310. The system control unit 200 calculates initial
mixing parameters 150 for participant A in the Hawaii conference and transmits the calculated
values to the audio bridging unit 300. The audio bridging unit 300 then connects A's incoming
audio signal 325 to the edge point mixers 310 of the other participants in the Hawaii
conference, and mixes the other Hawaii participants' incoming audio signals 325 into A's mixed
signal 330 according to A's mixing control parameters 150.
[0097]
It will be appreciated that the embodiment shown in FIG. 9 is not exclusive or limiting. In
particular, it should not be assumed that all participants are talking at all times. Thus,
selection of the incoming audio signals 325 to be mixed (including speech activity
detection) is preferably performed as described in connection
with FIG. Furthermore, as discussed, the mixing formula can easily be made considerably more complex
than the distance-based attenuation, selective participant attenuation, and selective
participant amplification algorithms for non-directional monaural applications. Logical
extensions to this basic mixing formula may similarly add speech directionality and/or stereo or 3-D
environmental, directional listening capability.
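As one example of such a directional extension, constant-power stereo panning can derive left/right gains from a talker's bearing relative to the listener. This is a minimal sketch assuming the listener faces the +y axis; a full 3-D implementation would use HRTFs or richer room models.

```python
import math

def stereo_pan_gains(listener_pos, talker_pos):
    """Constant-power left/right gains from the talker's bearing.

    Listener is assumed to face the +y axis; bearing 0 means straight
    ahead, +pi/2 means directly to the listener's right."""
    dx = talker_pos[0] - listener_pos[0]
    dy = talker_pos[1] - listener_pos[1]
    bearing = math.atan2(dx, dy)
    pan = max(-1.0, min(1.0, bearing / (math.pi / 2)))  # clamp to [-1, 1]
    theta = (pan + 1.0) * math.pi / 4                   # map to [0, pi/2]
    return math.cos(theta), math.sin(theta)             # (left, right)
```

A talker straight ahead yields equal left/right gains; a talker to the listener's right is weighted toward the right channel, with left² + right² = 1 preserving total power.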
[0098]
Furthermore, the audio conference bridging system 100 of the present invention may be used in
conjunction with an interactive gaming application. In that case, it may be desirable to add "room
effects" such as echo, dead spots, noise, and distortion to the audio mixing capability. In addition
to the third-person view of chat room 800 shown in FIG. 8, a given gaming application may also
add a first-person view in three dimensions. As used herein, an "implementation" 810 should be
understood to mean a visual representation of any participant or participant station 110,
regardless of whether the display is rendered from a first-person or third-person point of
view. Furthermore, for business meetings or certain entertainment applications, broadband audio
mixing may add significant value to the conference experience.
[0099]
Furthermore, those skilled in the art will recognize that the present invention is not limited to
audio conferencing applications alone. Other types of data streams may also be coordinated. For
example, an embodiment may include video displays of the participants. Furthermore, the present
invention can be used for real-time collaborative work on documents.
[0100]
While the present invention has been described in terms of preferred embodiments, it will be
apparent to those skilled in the art that changes and modifications can be made without
departing from the spirit or scope of the invention as defined in the following claims.
[0101]
FIG. 6 is a simplified flow diagram illustrating the difference between a prior-art mixing
algorithm and edge point mixing according to the present invention.
FIG. 2 is a simplified block diagram of the audio conference bridge system and three participant
stations of the present invention.
FIG. 3 is a simplified block diagram of the system illustrated in FIG. 2.
FIG. 6 is a simplified block diagram of an exemplary embodiment of the audio conference bridge
system and participant station of the present invention.
FIG. 6 is a simplified block diagram of another exemplary embodiment of the audio conference
bridge system and participant stations of the present invention.
FIG. 1 is a block diagram of an exemplary embodiment of the audio conference bridge system of
the present invention when implemented on a single server.
It is a flowchart showing the basic process of the method of the present invention.
FIG. 6 is an exemplary diagram of a potential visual interface to a virtual chat room enabled by
the present invention.
FIG. 9 is an event diagram illustrating certain events occurring in the virtual chat room of
FIG. 8 and exemplary responses of the present system.
FIG. 9 is an event diagram illustrating certain events occurring in the virtual chat room of
FIG. 8 and exemplary responses of the present system.
FIG. 9 is an event diagram illustrating certain events occurring in the virtual chat room of
FIG. 8 and exemplary responses of the present system.