DESCRIPTION JP2016523001

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
Abstract
An automatic mixer and method for creating a surround audio mix are disclosed. A set of rules can
be stored in a rule base. A rules engine can select a subset of the set of rules based at least in
part on metadata associated with a plurality of stems. A mixing matrix can mix the stems
according to the selected subset of rules to provide three or more output channels. [Selected
figure] Figure 3
Automatic multi-channel music mix from multiple audio stems
[0001]
The present disclosure relates to audio signal processing, and more particularly to a method of
automatically mixing multi-channel audio signals.
[0002]
In general, the process of making an audio recording begins by capturing and storing one or
more different audio objects to be combined into the final recording.
"Capture" in this context means converting the sound heard by a listener into storable
information. An "audio object" is a body of audio information that can be carried as one or
more analog signals or digital data streams and saved as an analog recording, a digital data file or
another data object. Raw or unprocessed audio objects are commonly referred to as "tracks",
a term that dates from the time when each audio object was actually recorded on a physically
11-04-2019
1
separate track on a magnetic recording tape. Currently, "tracks" can be recorded on analog
recording tape or digitally recorded on digital audio tape or computer readable storage media.
[0003]
Audio and music professionals generally use digital audio workstations (DAWs) to organize the
individual tracks into the desired final audio product that is ultimately delivered to the end user.
These final audio products are generally called "artistic mixes". Producing an artistic mix requires
a considerable amount of effort and expertise. In addition, artistic mixes are usually approved by the
artist who owns the rights to the particular content.
[0004]
The term "stem" is widely used to describe audio objects. The term is also widely misunderstood,
because "stem" is generally given different meanings in different contexts. In film
production, the term "stem" usually refers to a surround audio presentation. For example, the
final audio used for movie audio reproduction is commonly referred to as the "print master
stem". For a 5.1 presentation, the print master stem consists of six audio channels: front left, front
right, center, LFE (low frequency effects, commonly known as the subwoofer channel), left back
surround and right back surround. Typically, each channel in the stem contains a mixture of
multiple components such as music, dialogue and sound effects. Furthermore, each of these
original components can be formed from hundreds of sound sources or "tracks". To complicate
matters further, when mixing movies, each component of the audio presentation is "printed" or
recorded separately. At the same time as the print master is created, each major component (e.g.,
dialogue, music, sound effects) can also be recorded or "printed" to a stem. These are called
"D, M & E" stems, i.e., dialogue, music and effects stems. Each of these components can be a 5.1
presentation that includes six audio channels. Played back together in sync, the D, M & E stems
sound exactly the same as the print master stem. D, M & E stems are created for various reasons;
dubbing foreign-language dialogue is a common example.
[0005]
In recorded music production, the reason for creating stems and the nature of the stems are
quite different from those of the film "stems" described above. The primary motivation for
creating stems is to allow recorded music to be "remixed". For example, popular songs that were not
suitable for playback at a dance club can be remixed to be more compatible with dance club
music. Artists and their record labels may also make stems publicly available for promotional
reasons. Members of the public (usually fairly advanced users with access to digital audio
workstations) can prepare remixes and release them for promotional purposes. Music can also
be remixed for use in video games such as the very popular Guitar Hero and Rock Band games.
Such games rely on the presence of stems that represent the individual instruments. Typically,
the stems created during recorded music production contain music from different sources. For
example, a set of stems for a rock song can include drums, guitar(s), bass, vocal(s), keyboards
and percussion.
[0006]
A "stem" in this patent is a component or submix of an artistic mix produced by processing one
or more tracks. In general, this processing can include, but does not necessarily include, mixing
multiple tracks. The processing can also include level correction by amplification or attenuation,
spectral correction such as low pass filtering, high pass filtering or graphic equalization, dynamic
range correction such as limiting or compression, time domain correction such as phase shift or
delay, noise, hum and feedback suppression, reverberation, and other processing. Stems are
usually generated during the creation of an artistic mix. A stereo artistic mix usually consists of
four to eight stems, although depending on the mix as few as two stems or more than eight
stems may be used. Each stem may contain only a single component or it may contain left and
right components.
[0007]
Because the most common techniques for delivering audio content to listeners have been compact
discs and radio broadcasts, the majority of artistic mixes are stereo, i.e., they have
only two channels. A "channel" in this patent is a fully processed audio object ready to be
played to a listener through an audio playback system. However, due to the popularity of home
theater systems, many homes and other venues have multi-channel surround sound audio
systems. The term "surround" means either source material intended for reproduction with more
than two speakers distributed in two or three dimensions, or a playback configuration that
includes more than two speakers distributed in two or three dimensions. Common surround sound
formats include 5.1, which adds a low frequency effects (LFE) or subwoofer channel to five
separate audio channels; 5.0, which has five audio channels and no LFE channel; and 7.1, which
adds an LFE channel to seven audio channels. A surround mix of audio content has great potential
to deliver a more engaging listener experience. A surround mix can also provide higher quality
reproduction,
because the audio is played back by a larger number of speakers and thus may require less dynamic
range compression and equalization of the individual channels. However, creating another artistic
mix designed for multi-channel playback requires an additional mixing session involving the artists
and mixing engineers. The cost of a surround artistic mix may not be approved by the
content owner or record company.
[0008]
In this patent, any audio content to be recorded and played back is called "music". The music may
be, for example, a three-minute pop tune, a non-music theater event, or a symphony.
[0009]
FIG. 1 is a block diagram of a conventional artistic mix creation system.
FIG. 2A is a block diagram of a surround mix distribution system.
FIG. 2B is a block diagram of another surround mix distribution system.
FIG. 2C is a block diagram of another surround mix distribution system.
FIG. 3 is a functional block diagram of an automatic mixer.
FIG. 4 is a graphic representation of a rule base.
FIG. 5 is a functional block diagram of another automatic mixer.
FIG. 6 is a graphic representation of another rule base.
FIG. 7 is a graphic representation of a listening environment.
FIG. 8 is a flowchart of a process for automatically creating a surround mix.
FIG. 9 is a flowchart of another process for automatically creating a surround mix.
[0010]
Throughout the description, elements shown in the figures are assigned three-digit reference
numbers, in which the most significant digit is the number of the figure in which the element is
introduced and the two least significant digits are specific to the element. An element not
described in connection with a figure can be assumed to have the same features and functions as a
previously described element with the same reference number.
[0011]
Description of Apparatus
Referring first to FIG. 1, an artistic mix creation system 100 can include a
plurality of musicians and instruments 110A-110F, a recorder 120, and a mixer 130. Music
made by the musicians and instruments 110A-110F can be converted to electrical signals by
transducers such as microphones, magnetic pickups and piezoelectric pickups. Some
instruments, such as electronic keyboards, can generate electrical signals directly without the
intervention of a transducer. The term "electrical signal" in the present context includes both
analog signals and digital data.
[0012]
These electrical signals can be recorded by recorder 120 as a plurality of tracks. Each track can
record the sound produced by a single musician or instrument, or the sound produced by
multiple instruments. In some cases, the sound produced by one musician, such as a drummer
playing a drum set, can be captured by multiple transducers. The electrical signals from the
multiple transducers can be recorded as a corresponding number of tracks or combined into a
smaller number of tracks prior to recording. The various tracks combined into the artistic mix do
not have to be recorded at the same time or in the same place.
[0013]
Once all the tracks to be mixed have been recorded, the mixer 130 can be used to combine these
tracks into an artistic mix. The functional elements of mixer 130 may include track processors
132A-132F and adders 134L and 134R. Historically, track processors and adders were
implemented with analog circuits operating on analog audio signals. Currently, track
processors and adders are usually implemented using one or more digital processors, such as
digital signal processors. If there are two or more processors, the functional division of the
mixer 130 shown in FIG. 1 need not match the physical division of the mixer 130
among the processors. Multiple functional elements can be implemented in the same
processor, and any functional element can be split between two or more processors.
[0014]
Each track processor 132A-132F can process one or more recorded tracks. The processing
performed by each track processor may include some or all of: addition or mixing of multiple
tracks, level correction by amplification or attenuation, spectral correction such as low pass
filtering, high pass filtering or graphic equalization, dynamic range correction such as limiting or
compression, time domain correction such as phase shift or delay, noise, hum and
feedback suppression, reverberation, and other processing. Vocal tracks can also be subject to
special processing such as de-essing and co-acting. Some processes, such as level correction, can
be performed on individual tracks prior to mixing or adding, while others can be performed after
multiple tracks have been mixed. The output of each track processor 132A-132F may be a
respective stem 140A-140F, of which only stems 140A and 140F are identified in FIG. 1.
[0015]
In the example of FIG. 1, each stem 140A-140F can include a left component and a right
component. The right adder 134R may add the right components of the stems 140A-140F to
output the right channel 160R of the stereo artistic mix 160. Similarly, the left adder 134L can
add the left components of the stems 140A-140F and output the left channel 160L of the stereo
artistic mix 160. Although not shown in FIG. 1, the signals output from the left adder 134L and
the right adder 134R may be subjected to additional processing such as limiting or dynamic
range compression.
[0016]
Each stem 140A-140F may include sounds produced by a particular instrument or group of
instruments and musicians. In this specification, the instruments or instrument groups and
musicians included in a stem are referred to as the stem's "voice". Voices can be named to
reflect the musicians or instruments that contributed the tracks processed to generate the stem.
For example, in FIG. 1, the output of track processor 132A may be the "strings" stem, the output
of track processor 132D may be the "vocal" stem, and the output of track processor 132E may be
the "drum" stem. A stem need not be limited to one type of instrument, and one type of
instrument can produce multiple stems. For example, strings 110A, saxophone 110B, piano 110C
and guitar 110F can be recorded as separate tracks and combined into a single "musical
instrument" stem. As a further example, in drum-heavy music such as heavy metal, the sound
produced by drummer 110E can be incorporated into multiple stems such as a "kick drum" stem,
a "snare and cymbal" stem, and an "other drum" stem. These stems can have
widely different frequency spectra, and thus can be processed differently during mixing.
[0017]
The stems 140A-140F generated during creation of the stereo artistic mix 160 can be stored.
Each stem audio object can also be associated with metadata identifying the voice,
instrument or musician within the stem. The associated metadata may be attached to each stem
audio object or stored separately. Other information, such as the song title, group or musician
name, song genre, and recording and/or mixing date, may also be attached to some or all of the
stem audio objects or stored as separate data objects.
[0018]
FIG. 2A is a block diagram of a conventional surround audio mix distribution system 200A. The
artistic mixing system 230, which may be, for example, a digital audio workstation, can be used
to create both a stereo artistic mix and a surround artistic mix 235. The stereo artistic mix can be
used for compact disc production, conventional stereo radio broadcasts, and other applications.
Surround artistic mix 235 can be used for Blu-ray production (e.g., a Blu-ray HDTV concert
recording) and other applications. Surround artistic mix 235 may also be encoded by
multi-channel encoder 240 and distributed, for example, via the Internet or other networks.
[0019]
Multi-channel encoder 240 can encode surround artistic mix 235 according to the MPEG-2
(Moving Picture Experts Group) standard, which allows encoding of audio mixes containing up to
six channels for 5.1 surround audio systems. Multi-channel encoder 240 may also encode
surround artistic mix 235 according to the Free Lossless Audio Codec (FLAC) standard, which
allows encoding of audio mixes that include up to eight channels. Multi-channel encoder 240
may also encode surround artistic mix 235 according to the Advanced Audio Coding (AAC)
enhancements of the MPEG-2 and MPEG-4 standards. AAC allows encoding of audio mixes
containing up to 48 channels. Multi-channel encoder 240 may also encode surround artistic mix
235 according to some other standard.
[0020]
The encoded audio generated by multi-channel encoder 240 may be transmitted to a compatible
multi-channel decoder 250 via distribution channel 242. Distribution channel 242 may be a
wireless broadcast, a network such as the Internet or a cable TV network, or some other
distribution channel. The multi-channel decoder 250 can reproduce, or substantially
reproduce, the channels of the surround artistic mix 235 so that they can be presented to the
listener by the surround audio system 260.
[0021]
As mentioned above, not every stereo artistic mix has an associated surround artistic
mix. FIG. 2B is a block diagram of another surround audio mix distribution system 200B for use
when no surround artistic mix of an audio program exists. System 200B can synthesize a surround
mix from the stems and metadata 232 generated during creation of the stereo artistic mix. The
stems and metadata 232 from the artistic mixing system 230 can be input to an automatic surround
mixer 270, which can generate a surround mix 275. The term "automatic" generally means
without operator involvement. Once an operator has started the automatic
surround mixer 270, the surround mix 275 can be generated without further operator involvement.
[0022]
Surround mix 275 may be encoded by multi-channel encoder 240 and transmitted to a compatible
multi-channel decoder 250 via distribution channel 242. Multi-channel decoder 250 can
reproduce, or substantially reproduce, the channels of surround mix 275 so that they can be
presented to the listener by surround audio system 260. In system 200B, a single surround mix
generated by automatic surround mixer 270 is delivered to all listeners.
[0023]
FIG. 2C is a block diagram of another surround audio mix distribution system 200C. System
200C allows each listener to create a customized surround mix suited to the listener's
personal preferences and audio system. The stems and metadata 232 from the artistic mixing
system 230 can be input to a multi-channel encoder 245, which is similar
to the multi-channel encoder 240 but can encode stems instead of (or in addition to) channels.
[0024]
The encoded stems may then be sent to a compatible multi-channel decoder 255 via distribution
channel 242. The multi-channel decoder 255 can reproduce, or substantially reproduce, the stems
and metadata 232. Based on the reproduced stems and metadata, an automatic surround mixer
270 can generate a surround mix 275. The surround mix 275 may be adapted to the listener's
preferences and/or the characteristics of the listener's surround audio system 260.
[0025]
Referring now to FIG. 3, an automatic surround mixer 300, such as the
automatic surround mixer 270 of FIGS. 2B and 2C, can generate a multi-channel surround mix
from stems created as part of the stereo artistic mix creation process. The automatic
surround mixer 300 can create multi-channel surround mixes without the involvement of a
recording engineer or artist. In this example, the automatic surround mixer 300 receives six
stems identified as Stem 1 to Stem 6. An automatic mixer can also accept more or fewer than six
stems. Each stem can be mono, or stereo with left and right components. In this example,
automatic surround mixer 300 outputs six channels identified as Out1 to Out6. Out1 to Out6 can
correspond to the left rear channel, the left front channel, the center channel, the right front
channel, the right rear channel, and the low frequency effects channel of a 5.1 surround audio
system. An automatic surround mixer can instead output eight channels for a 7.1 surround audio
system, or some other number of channels.
[0026]
The automatic surround mixer 300 comprises a respective stem processor 310-1 to 310-6 for
each input stem, a mixing matrix 320 for combining the processed stems in various proportions
to provide the output channels, and a rules engine 340 for determining how the stems are
processed and mixed.
[0027]
Each stem processor 310-1 to 310-6 can perform processing such as level correction by
amplification or attenuation, spectral correction by low pass filtering, high pass filtering and/or
graphic equalization, dynamic range correction by limiting, compression or expansion, noise, hum
and feedback suppression, reverberation, and other processing.
One or more of the stem processors 310-1 through 310-6 may also perform special processing
of vocal tracks, such as de-essing and co-ordination. One or more of the stem processors 310-1
through 310-6 may also provide multiple outputs that receive different processing. For example,
one or more of stem processors 310-1 through 310-6 may provide the low frequency portion of
its respective stem for incorporation into the LFE channel, and the high frequency portion of
its respective stem for incorporation into one or more of the other output channels.
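As an illustration of a stem processor that provides separate low and high frequency outputs, the following minimal sketch splits a stem with a one-pole low pass filter and takes the remainder as the high band. The filter type, the 120 Hz cutoff in the usage lines, and the name `split_bands` are assumptions made for this example; the description does not specify how the split is performed.

```python
import math


def split_bands(samples, cutoff_hz, sample_rate_hz):
    """Split a mono signal into a low band and a complementary high band.

    Uses a one-pole low pass filter (an assumption for illustration only).
    The high band is the input minus the low band, so low + high
    reconstructs the input sample by sample.
    """
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    low, high, state = [], [], 0.0
    for x in samples:
        state += alpha * (x - state)   # low pass filter state update
        low.append(state)              # candidate LFE feed
        high.append(x - state)         # feed for the other output channels
    return low, high


# Usage: a constant (0 Hz) input ends up almost entirely in the low band.
low, high = split_bands([1.0] * 2000, cutoff_hz=120.0, sample_rate_hz=48000.0)
```

A one-pole filter has a gentle slope; a practical crossover would likely use a steeper filter, which the effect parameters could describe through a breakpoint frequency and slope.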
[0028]
Each stem input to the automatic surround mixer 300 may have already received part or all of
these processes as part of stereo artistic mix creation. Thus, the processing performed by the
stem processors 310-1 to 310-6 can be minimized to maintain the general sound and feel of the
stereo artistic mix. For example, adding reverb to some or all of the stems and low pass filtering
to provide an LFE channel may be the only processing performed by the stem processor.
[0029]
Each of the stem processors 310-1 to 310-6 can process its respective stem in accordance with
the effect parameters 342 provided by the rules engine 340. The effect parameters 342 may
include, for example, data specifying the amount of attenuation or gain, the breakpoint frequency
and slope of any filtering to be applied, equalization coefficients, compression or expansion
coefficients, reverberation delays and relative amplitudes, and other parameters
that define the processing to be applied to each stem.
[0030]
The mixing matrix 320 may combine the outputs from the stem processors 310-1 to 310-6 to
provide the output channels according to the mixing parameters 344 provided by the rules engine
340. For example, mixing matrix 320 can generate each output channel according to the following
equation:
$$C_j(t) = \sum_{i=1}^{n} a_{ij}\, S_i(t - d_{ij}) \qquad (1)$$
where $C_j(t)$ = output channel $j$ at time $t$, $S_i(t)$ = output of stem processor $i$ at
time $t$, $a_{ij}$ = amplitude coefficient, $d_{ij}$ = time delay, and $n$ = number of stems used
in the mix. The mixing parameters 344 may include the amplitude coefficients
$a_{ij}$ and the time delays $d_{ij}$.
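The mixing operation defined by the amplitude coefficients and time delays above can be sketched as follows, assuming each output channel is the sum of gain-scaled, delayed stem processor outputs. The function name `mix_channels`, the use of whole-sample non-negative delays, and the sample values are assumptions made for this example.

```python
def mix_channels(stems, gains, delays):
    """Compute C_j[t] = sum over i of a_ij * S_i[t - d_ij] for each channel j.

    stems  : list of n equal-length sample lists (stem processor outputs S_i)
    gains  : gains[i][j] is the amplitude coefficient a_ij
    delays : delays[i][j] is the time delay d_ij in whole samples, >= 0
             (the unit is an assumption; the description does not state it)
    """
    n_stems = len(stems)
    n_samples = len(stems[0])
    n_channels = len(gains[0])
    channels = [[0.0] * n_samples for _ in range(n_channels)]
    for j in range(n_channels):
        for i in range(n_stems):
            a, d = gains[i][j], delays[i][j]
            # Samples before the delay has elapsed receive no contribution.
            for t in range(d, n_samples):
                channels[j][t] += a * stems[i][t - d]
    return channels


# Usage: two stems mixed into two channels; stem 0 reaches channel 1
# at half level and one sample late.
stems = [[1.0, 0.0, 0.0, 0.0], [0.0, 2.0, 0.0, 0.0]]
gains = [[1.0, 0.5], [0.0, 0.5]]
delays = [[0, 1], [0, 0]]
out = mix_channels(stems, gains, delays)
```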
[0031]
The rules engine 340 can determine the effect parameters 342 and the mixing parameters 344
based at least in part on the metadata associated with the input stems. Metadata can be
generated during creation of the stereo artistic mix, and can be attached to each stem object
and/or included in a separate data object. Metadata may include, for example, the voice or
instrument contained in each stem, the genre of the music or another qualitative description, data
indicating the processing performed on each stem during creation of the stereo artistic mix, and
other information. The metadata may also include descriptive material, such as a
program title or artist, that is of interest to the listener but is not used during creation of the
surround mix.
[0032]
When appropriate metadata cannot be provided with the stems, metadata including the
voice of each stem and the music genre can be created through content analysis of the
stems. For example, the spectral content of each stem can be analyzed to estimate which
voice is contained in the stem, and the music genre can be estimated from the
rhythmic content of the stems in combination with the voices present in the stems.
[0033]
The automatic surround mixer 300 can be incorporated into the listener's surround audio
system. In this case, the rules engine 340 can access configuration data indicating which
surround audio configuration (5.0, 5.1, 7.1, etc.) to use for the surround mix. If the
automatic surround mixer 300 is not integrated into the surround audio system, the rules engine
340 may receive information indicating the surround audio system configuration, for example
as a manual input by the listener. This information can also be obtained automatically from the
audio system, for example by communication over an HDMI (High-Definition Multimedia
Interface) connection.
[0034]
The rules engine 340 can determine the effect parameters 342 and the mixing parameters 344
using a set of rules stored in the rule base 346. The term "rules" in this patent includes logical
statements, tabular data, and other information used to generate effect parameters 342 and
mixing parameters 344. The rules can be built empirically, i.e., based on the collective experience
of one or more sound engineers who have created one or more artistic surround mixes. Rules
can also be constructed by collecting and averaging the mixing and effect parameters of multiple
artistic surround mixes. The rule base 346 may include different rules for different music genres,
and may also include different rules for different surround audio system configurations.
[0035]
In general, each rule can include a condition and an action to be taken if the condition is met. The
rules engine 340 can evaluate the available data (i.e., metadata and speaker configuration data) to
determine which rule conditions are met. The rules engine 340 can then determine what actions the
satisfied rules dictate, resolve any conflicts between the actions, and cause the indicated actions
to take place (i.e., set the effect parameters 342 and mixing parameters 344).
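The condition, action and conflict resolution steps described above can be sketched as a small rule evaluator. This is a hypothetical illustration, not the disclosed implementation: the rule dictionaries, the `priority` field used to resolve conflicts, the `karaoke` preference flag, and the gain values are all assumptions.

```python
def run_rules(rules, context):
    """Return the parameters set by every rule whose condition is satisfied.

    Conflicts are resolved with a hypothetical 'priority' field: satisfied
    rules are applied in ascending priority, so a higher-priority rule
    overwrites a parameter set by a lower-priority one.
    """
    satisfied = [rule for rule in rules if rule["condition"](context)]
    params = {}
    for rule in sorted(satisfied, key=lambda rule: rule["priority"]):
        params.update(rule["action"])
    return params


# Parameters are keyed by (stem component, output channel); values are gains.
RULES = [
    {"priority": 0,  # applies only if a lead vocal stem is present
     "condition": lambda ctx: "lead vocal" in ctx["voices"],
     "action": {("lead vocal", "C"): 1.0}},
    {"priority": 0,  # explicit condition on the speaker configuration
     "condition": lambda ctx: "drums" in ctx["voices"] and ctx["has_subwoofer"],
     "action": {("drums low band", "LFE"): 1.0}},
    {"priority": 0,
     "condition": lambda ctx: "drums" in ctx["voices"] and not ctx["has_subwoofer"],
     "action": {("drums low band", "L"): 0.5, ("drums low band", "R"): 0.5}},
    {"priority": 1,  # hypothetical listener preference that reduces lead vocals
     "condition": lambda ctx: ctx["karaoke"] and "lead vocal" in ctx["voices"],
     "action": {("lead vocal", "C"): 0.1}},
]

context = {"voices": {"lead vocal", "drums"}, "has_subwoofer": True, "karaoke": False}
params = run_rules(RULES, context)
```

Priority ordering is only one way to resolve conflicting actions; the description leaves the resolution method open.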
[0036]
The rules stored in the rule base 346 may be in declarative form. For example, the rules stored in
the rule base 346 can include “send lead vocals to the center channel”. As stated,
this rule applies to all music genres and all surround audio system configurations. The condition
in the rule is implicit, i.e., the rule applies only if a lead vocal stem is present.
[0037]
More typical rules can have explicit conditions. For example, the rules stored in the rule base
346 can include “if the audio system has a subwoofer, send the low frequency components of the
drum, percussion and bass stems to the LFE channel; otherwise, divide the low frequency
components of the drum, percussion and bass stems between the left front channel and the
right front channel”. The explicit condition of a rule can include logical expressions ("and", "or",
"not", etc.).
[0038]
A general rule may have a condition such as "if the genre of the music is X and the voice is Y, ...".
Such rules and other types of rules may be stored in the rule base 346 in tabular form. For
example, as shown in FIG. 4, the rules can be organized as a three-dimensional table 400 whose
three axes represent stem voice, genre and channel. Each entry 410 may include
mixing parameters (level and delay coefficients) and effect parameters for a particular
combination of stem voice and genre. Table 400 is specific to the 5.1 surround audio
configuration. Tables for other surround audio configurations may also be stored in the rule base.
[0039]
For example, row 420 of table 400 implements the rule “in a 5.1 surround audio system and
this particular genre, send lead vocals to the center channel”, assuming that lead vocal stems are
not subjected to effects processing. As a further example, row 430 of table 400 implements the
rule “in a 5.1 surround audio system and this particular genre, send the low frequency
components of the drum stem to the LFE channel, and divide the high frequency components of
the drum stem between the front left channel and the front right channel”.
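A tabular rule base of this kind can be sketched as a nested lookup keyed by surround configuration, stem voice and genre, with one (level, delay) pair per channel. The table contents, the genre name "rock", the effect parameter name `lowpass_hz`, and the function `lookup` are illustrative assumptions rather than values from the description.

```python
CHANNELS = {"5.1": ("L", "R", "C", "LFE", "LR", "RR")}

# RULE_TABLES[configuration][(voice, genre)] -> mixing and effect parameters.
# Each "mix" entry maps a channel to a (level, delay) pair; all values are
# made up for illustration.
RULE_TABLES = {
    "5.1": {
        ("lead vocal", "rock"): {
            "mix": {"C": (1.0, 0)},
            "effects": {},
        },
        ("drums low band", "rock"): {
            "mix": {"LFE": (1.0, 0)},
            "effects": {"lowpass_hz": 120},
        },
        ("drums high band", "rock"): {
            "mix": {"L": (0.5, 0), "R": (0.5, 0)},
            "effects": {},
        },
    },
}


def lookup(configuration, voice, genre):
    """Read the per-channel mixing parameters and the effect parameters.

    Channels not listed in the table entry receive level 0.0 and delay 0.
    """
    entry = RULE_TABLES[configuration][(voice, genre)]
    mix = {ch: entry["mix"].get(ch, (0.0, 0)) for ch in CHANNELS[configuration]}
    return mix, entry["effects"]


mix, effects = lookup("5.1", "lead vocal", "rock")
```

Supporting another configuration such as 7.1 would mean adding a second table and channel list under its own key.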
[0040]
Referring again to FIG. 3, if the rule base 346 includes tabular rules, the rules engine 340 can use
the metadata and surround audio configuration to read the effect parameters 342 and mixing
parameters 344 from the appropriate table. The rules engine 340 may rely solely on tabular
rules or may have additional rules to handle situations that the tabular rules do not adequately
address. For example, a small number of successful rock bands employ two drummers, and
many recorded songs feature two lead vocalists. These situations can be dealt with by additional
table entries or by additional rules such as “if two stems have the same voice, place one to
the left and the other to the right”.
[0041]
The rules engine 340 can also receive data indicating listener preferences. For example, the
listener can be given the option of selecting a standard mix or a non-standard mix such as an
a cappella mix (vocals only) or a "karaoke" mix (with reduced lead vocals). When a non-standard
mix is selected, some of the mixing parameters selected by the rules engine 340 can be
overridden.
[0042]
The functional elements of automatic surround mixer 300 may be implemented by
analog circuitry, digital circuitry, and/or one or more processors executing an automatic mixer
software program. For example, stem processors 310-1 through 310-6 and mixing matrix 320 may be
implemented using one or more digital processors, such as digital signal processors. The rules
engine 340 can be implemented using a general purpose processor. If there are two or more
processors, the functional division of the automatic surround mixer
300 shown in FIG. 3 need not match the physical division of the automatic surround mixer 300
among the processors. Multiple functional elements can be implemented in the same processor,
and any functional element can be split between two or more processors.
[0043]
As can now be seen with reference to FIG. 5, the automatic surround mixer 500 can include stem
processors 310-1 to 310-6 that process the respective stems according to the effect parameters
342 as described above. The automatic surround mixer 500 can include a mixing matrix 320 for
combining the outputs from the stem processors 310-1 through 310-6 according to the mixing
parameters 344 as described above.
[0044]
The automatic surround mixer 500 may also include a rules engine 540 and a rule base 546.
The rules engine 540 can determine the effect parameters 342 based on the metadata and
surround audio system configuration data as described above.
[0045]
The rules engine 540 does not determine the mixing parameters 344 directly, but instead determines
relative voice position data 548 based on the rules stored in the rule base 546. Each relative
voice position can indicate the position of the hypothetical sound source of the respective stem
on a virtual stage. For example, the rule base 546 may not include the rule “send lead
vocals to the center channel”, but may instead include the rule “position the lead vocalist at the
front center of the stage”. Similar rules can define the positions of other voices and musicians on
the virtual stage for different genres.
[0046]
A general rule may have a condition such as "if the genre of the music is X and the voice is Y, ...".
Such rules may be stored in tabular form in rule base 546. For example, as shown in FIG. 6, the
rules can be organized as a two-dimensional table 600 whose axes represent stem
voice and genre. Each entry 610 may include position and effect parameters for a particular
combination of stem voice and genre. The table 600 need not be specific to any particular
surround audio configuration.
[0047]
The rules described in the previous paragraphs were simple examples. A more complete,
though still exemplary, set of rules will be described with reference to FIG. 7. FIG. 7 shows an
environment including a listener 710 and a set of speakers labeled C (center), L (left front),
R (right front), LR (left rear) and RR (right rear). The center speaker C is, by definition, located
at an angle of 0 degrees with respect to the listener 710. The left front and right front speakers L
and R are located at angles of -30 degrees and +30 degrees, respectively. The left rear and right
rear speakers LR and RR are located at angles of -110 degrees and +110 degrees, respectively. The
subwoofer or LFE speaker is not shown in FIG. 7. Listeners can hardly detect the direction of
very low frequency sound, so the position of the LFE speaker relative to the listener is not
important.
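Given the speaker angles described above, a sound source placed at an angle where no speaker exists can be approximated by dividing its stem between the two adjacent speakers. The sketch below assumes a constant-power pan law, which the description does not specify; the names `SPEAKER_ANGLES` and `pan_gains` are illustrative, and angles behind the listener beyond the rear speakers are not handled.

```python
import math

# Speaker angles in degrees relative to the listener, as described for FIG. 7.
SPEAKER_ANGLES = {"LR": -110.0, "L": -30.0, "C": 0.0, "R": 30.0, "RR": 110.0}


def pan_gains(angle_deg):
    """Split a source between the two speakers adjacent to angle_deg.

    Uses a constant-power (sine/cosine) pan law, an assumption made for this
    example, so the squared gains of the two speakers sum to 1.
    """
    ordered = sorted(SPEAKER_ANGLES.items(), key=lambda item: item[1])
    for (lo_name, lo), (hi_name, hi) in zip(ordered, ordered[1:]):
        if lo <= angle_deg <= hi:
            frac = (angle_deg - lo) / (hi - lo)  # 0 at lo speaker, 1 at hi
            return {lo_name: math.cos(frac * math.pi / 2),
                    hi_name: math.sin(frac * math.pi / 2)}
    raise ValueError("angle outside the -110..+110 degree range covered here")


# Usage: a source at -60 degrees is shared by the left rear and left front
# speakers, with more level in the nearer left front speaker.
guitar = pan_gains(-60.0)
keyboard = pan_gains(60.0)
```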
[0048]
A set of rules for mixing stems can be expressed in terms of the apparent angle from the listener
to the sound source of each stem. The following exemplary set of rules can provide a pleasing
surround mix of songs of various genres. Each rule is stated first, followed by its explanation.
・ Place the drums at ±30° and the reverberated drum component at ±110°. Drums are considered
the "skeleton" of most types of popular music. Usually, in a stereo mix, the drums are evenly
distributed between the left and right speakers. A 5.1 surround presentation offers the option of
giving the illusion that the drums are in a room surrounding the listener. Therefore, by dividing
the drum stem between the front left and front right channels, and sending a reverberated and
attenuated drum stem to the left rear and right rear speakers (±110°), the listener can be given
the impression that the drums are in front of the listener and that the echo of a "virtual room"
is behind the listener.
・ Place the bass at 0° at -3 dB and make the L / R contribution +1.5 dB. Usually, in a stereo
mix, the bass guitar is at the "pseudo center" like the drums (divided equally between the left
and right channels). For a 5.1 mix, the bass stem can be spread across the left, right and center
speakers in the following way: place the bass stem in the center channel with its level lowered
by 3 dB, then add -1.5 dB equally to the front left and front right speakers.
・ Place the rhythm guitar at -60°. As FIG. 7 shows, no speaker is present at -60°. The rhythm
guitar stem can be split between the left front speaker L and the left rear speaker LR to
simulate a sound source at -60°.
・ Place the keyboard at +60°. The keyboard stem can be split between the right front speaker R
and the right rear speaker RR to simulate a sound source at +60°.
・ Place the chorus at ±90°. The chorus stem can be divided between the left front and right
front speakers L, R and the left rear and right rear speakers LR, RR to simulate sound sources
at ±90°.
・ Place the percussion at ±110°. The percussion stem can be divided between the left rear and
right rear speakers LR, RR.
・ Place the lead vocals at 0° at -3 dB and make the L / R contribution +1.5 dB. Typically, lead
vocals are placed at the "pseudo center" of a stereo mix. Spreading the lead vocal over the
center, left and right channels preserves the lead vocalist's apparent position while adding
richness and complexity to the presentation.
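The description says that a stem placed at an angle where no speaker exists can be split between the two adjacent speakers, but it does not say how the split is computed. The following is a minimal sketch that assumes a constant-power (sine/cosine) pan law between adjacent speakers, using the nominal angles given for FIG. 7. The function names and the choice of pan law are assumptions made for illustration.

```python
import math

# Nominal speaker angles in degrees, as described for FIG. 7.
SPEAKERS = {"LR": -110.0, "L": -30.0, "C": 0.0, "R": 30.0, "RR": 110.0}

def db_to_gain(db: float) -> float:
    """Convert a level in decibels to a linear amplitude factor."""
    return 10.0 ** (db / 20.0)

def pan_between_adjacent(angle_deg: float) -> dict:
    """Split a source between the two speakers that bracket angle_deg.

    Assumes a constant-power (sine/cosine) pan law; the description does not
    say which law is used. Only angles from -110 to +110 degrees are handled.
    """
    ordered = sorted(SPEAKERS.items(), key=lambda item: item[1])
    for (name_a, ang_a), (name_b, ang_b) in zip(ordered, ordered[1:]):
        if ang_a <= angle_deg <= ang_b:
            fraction = (angle_deg - ang_a) / (ang_b - ang_a)  # 0 at a, 1 at b
            theta = fraction * math.pi / 2.0
            return {name_a: math.cos(theta), name_b: math.sin(theta)}
    raise ValueError("angle outside the range covered by the speakers")
```

Under these assumptions, `pan_between_adjacent(-60.0)` returns gains for LR and L only, and `pan_between_adjacent(60.0)` returns gains for R and RR only, with the squared gains summing to one in each case.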
[0049]
Referring again to FIG. 5, if the rule base 546 includes tabular rules, the rules engine 540 can
use the metadata and the surround audio configuration to read the effect parameters 342 and
the audio position data 548 from the appropriate table. The rules engine 540 may either
rely entirely on tabular rules, or have additional rules to handle situations that the tabular
rules do not adequately address, as described above.
[0050]
The rules engine 540 can also receive data indicative of listener preferences. For example, the
listener can be given the option to select either a standard mix or a non-standard mix such as an
a cappella mix (vocals only) or a "karaoke" mix (with reduced lead vocals). The listener may also
have the option to select an "education" mix in which each stem is sent to a single speaker channel
so that the listener can focus on a particular instrument. When a non-standard mix is selected,
some of the mixing parameters selected by the rules engine 540 can be overridden.
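One way such a preference could replace rule-selected parameters is sketched below. The mix names follow the text, but the per-stem gain layout, the stem labels, the 20 dB karaoke reduction and the use of negative infinity for a muted stem are all illustrative assumptions.

```python
def apply_mix_preference(stem_gains_db: dict, preference: str) -> dict:
    """Return a copy of per-stem gains (dB) adjusted for the listener's chosen mix.

    stem_gains_db maps a stem's audio label to the gain chosen by the rules.
    The labels, the 20 dB karaoke reduction, and the use of -inf dB for a
    muted stem are illustrative assumptions.
    """
    vocals = {"lead vocal", "chorus"}
    gains = dict(stem_gains_db)
    if preference == "standard":
        return gains
    if preference == "a cappella":
        for label in gains:
            if label not in vocals:
                gains[label] = float("-inf")  # mute everything except vocals
    elif preference == "karaoke":
        if "lead vocal" in gains:
            gains["lead vocal"] -= 20.0  # reduce, rather than remove, the lead vocal
    else:
        raise ValueError(f"unknown mix preference: {preference}")
    return gains
```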
[0051]
The rules engine 540 may provide the audio position data 548 to the coordinate processor 550.
The coordinate processor 550 may receive a listener selection of a virtual listener position
relative to the virtual stage on which the audio positions are defined. This selection can be made,
for example, by prompting the listener to select one of two or more predetermined alternative
positions. Possible virtual listener positions may include "in the band" (e.g., at the center of
the virtual stage, surrounded by the audio), "front row center", and / or "middle of the audience".
The coordinate processor 550 may then generate mixing parameters 344 that cause the mixing matrix
320 to mix the processed stems into channels that provide the desired listener experience.
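A minimal sketch of the geometric step such a coordinate processor might perform is given below: converting a source position on the virtual stage and a chosen virtual listener position into an apparent angle. The coordinate system (x to the listener's right, y straight ahead) and the function name are assumptions made for this sketch; the description does not define them.

```python
import math

def apparent_angle_deg(source_xy, listener_xy):
    """Angle of a source as seen by a listener facing the +y direction.

    0 degrees is straight ahead, negative angles are to the left and positive
    angles to the right, matching the sign convention used for FIG. 7.
    The coordinate system itself is an assumption made for this sketch.
    """
    dx = source_xy[0] - listener_xy[0]
    dy = source_xy[1] - listener_xy[1]
    return math.degrees(math.atan2(dx, dy))
```

With this convention, moving the virtual listener from the audience onto the stage changes the angle of every stem, so the same audio position data can yield different mixing parameters for each listener position.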
[0052]
Coordinate processor 550 may also receive data indicating the relative position of the speakers
in the surround audio system. Coordinate processor 550 may use this data to refine the mixing
parameters to correct, at least to some extent, for misalignment of the loudspeaker arrangement
relative to the nominal loudspeaker arrangement (such as the loudspeaker arrangement shown in
FIG. 7). For example, the coordinate processor 550 may correct, to some extent, for an asymmetric
loudspeaker arrangement in which the left front and right front speakers are not in symmetrical
positions with respect to the center speaker.
[0053]
The functional elements of automatic surround mixer 500 may be implemented by analog circuitry,
digital circuitry, and / or one or more processors executing an automatic mixer software
program. For example, stem processors 310-1 through 310-6 and mixing matrix 320 may be
implemented using one or more digital processors, such as digital signal processors. The rules
engine 540 and the coordinate processor 550 can be implemented using one or more general
purpose processors. When two or more processors are present, the functional division of the
automatic surround mixer 500 shown in FIG. 5 may not be identical to the physical division of
the automatic surround mixer 500 among multiple processors. Multiple functional elements can
be implemented in the same processor, or any functional element can be split between two or
more processors.
Process Description
[0054]
Referring now to FIG. 8, a process 800 for providing a
surround mix of music may begin at 805 and end at 895. Process 800 is based on the premise
that a stereo artistic mix of songs is first created and then a multi-channel surround mix is
automatically generated from the stems saved during creation of the stereo artistic mix.
[0055]
At 810, a rule base such as rule base 346 or 546 can be constructed. The rule base can
include rules for combining stems into surround mixes. These rules can be constructed by
analyzing past artistic surround mixes, by collecting the consensus opinions and practices of
recording engineers with experience creating artistic surround mixes, or in some other way. The rule base
may include different rules for different music genres and different rules for different surround
audio configurations. The rules in the rule base can be represented in tabular form. The rule base
is not necessarily permanent, but can be extended over time, for example to incorporate new
mixing techniques and new music genres.
[0056]
The initial rule base can be prepared before, during or after the first song is recorded and the
first artistic stereo mix is created. The initial rule base must be built before the surround mix can
be generated automatically. The rule base built at 810 can be sent to one or more automatic
mixing systems. For example, the rule base can be built into the hardware of each automatic
surround mixing system or can be transmitted to each automatic surround mixing system via a
network.
[0057]
At 815, the tracks of a song can be recorded. At 820, the tracks obtained at 815 may be
processed and combined using known techniques to create an artistic stereo mix. This artistic
stereo mix can be used for conventional purposes such as recording CDs and radio broadcasts.
During the creation of the artistic stereo mix at 820, two or more stems can be generated. Each
stem can be generated by processing one or more tracks. Each stem can be a component or
submix of a stereo artistic mix. In general, a stereo artistic mix can consist of four to eight stems.
Depending on the mix, only two stems may be used, or more than eight stems may be used. Each
stem may include only a single channel or may include left and right channels.
[0058]
At 825, metadata can be associated with the stems created at 820. Metadata can be generated
during creation of the stereo artistic mix at 820 and can be attached to each stem object and / or
included in a separate data object. The metadata may include, for example, the audio of each stem
(i.e., the type of instrument), the genre of the music or other qualitative description, data
indicating the processing performed on each stem during creation of the stereo artistic mix, and
other information. Metadata can also include descriptive material, such as song titles
or artist names, that is of interest to the listener but is not used during creation of the
surround mix.
[0059]
When appropriate metadata is not available from 820, metadata including the audio of each stem
and the music genre can be extracted at 825 from the content of each stem. For example,
the spectral content of each stem can be analyzed to estimate what audio is contained in
the stem, and the music genre can be estimated by combining the rhythmic content of the stems with
the audio present in the stems.
[0060]
At 845, the stems and metadata from 825 can be obtained by the automatic surround mixing process
840. The automatic surround mixing process 840 can be performed at the same location, using
the same system, as the stereo mixing at 820. In this case, at 845, the automatic mixing process can
simply read the metadata and stems from memory. The automatic surround mixing process 840
can also be performed at one or more locations remote from the stereo mixing. In this case, at 845,
the automatic mixing process 840 can receive the stems and associated metadata via a distribution
channel (not shown). The distribution channel may be a wireless broadcast, a network such as
the Internet or a cable TV network, or some other distribution channel.
[0061]
At 850, the metadata associated with the stems and the surround audio configuration data can be
used to extract applicable rules from the rule base. The automatic surround mixing process 840 may
also select rules using data indicating the target surround audio configuration (e.g., 5.0, 5.1,
7.1). In general, each rule may define explicit or implicit conditions and one or more actions to
be performed if the conditions are met. Rules can be expressed as logical statements. Some or all
of the rules can also be represented in tabular form. The extraction of applicable rules at 850 may
include selecting only those rules whose conditions are met by the metadata and the
surround audio configuration data. The actions defined by each rule can include, for example,
setting mixing parameters, effect parameters, and / or the relative position of a specific stem.
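The selection step described above can be sketched as a filter over rules that each carry conditions and actions. The `Rule` class, its fields and the convention that an unset condition matches anything are hypothetical choices made for this sketch, not a structure given in the description.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class Rule:
    """A rule with conditions and actions; a condition left as None matches anything."""
    audio: Optional[str] = None    # condition on the stem audio (instrument type)
    genre: Optional[str] = None    # condition on the music genre
    config: Optional[str] = None   # condition on the surround configuration
    actions: dict = field(default_factory=dict)  # parameters to set when matched

    def matches(self, audio: str, genre: str, config: str) -> bool:
        return (self.audio in (None, audio)
                and self.genre in (None, genre)
                and self.config in (None, config))

def extract_rules(rule_base, audio, genre, config):
    """Select only the rules whose conditions are met by the metadata and configuration."""
    return [rule for rule in rule_base if rule.matches(audio, genre, config)]
```

In this sketch a rule with no conditions at all would apply to every stem, which is one possible reading of a rule with an implicit condition.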
[0062]
At 855 and 860, the extracted rules can be used to set mixing parameters and effect parameters,
respectively. The operations at 855 and 860 can be performed in any order or in parallel.
[0063]
At 865, the stems can be processed into channels of a surround audio system. Processing the stems
into channels can include performing processing on some or all of the stems according to the
effect parameters set at 860. Processing that can be performed includes level correction by
amplification or attenuation, spectral correction by low-pass filtering, high-pass filtering and / or
graphic equalization, dynamic range correction by limiting, compression or expansion, noise,
hum and feedback suppression, reverberation, and other processing. In addition,
special processing such as de-essing and chorusing can be performed on vocal stems. One or
more of the stems can also be split into multiple components that undergo different processing,
so that they can be included in multiple channels. For example, one or more of the stems can
be processed to provide a low frequency portion for incorporation into the LFE channel, and a
high frequency portion for incorporation into one or more of the other output channels.
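As a rough illustration of splitting a stem into a low frequency portion and a high frequency portion, the sketch below uses a one-pole low-pass filter and takes the remainder as the high portion. The filter type, the 120 Hz cutoff and the 48 kHz sample rate are assumptions for illustration; the description does not specify how the split is performed.

```python
import math

def split_low_high(samples, sample_rate_hz=48000.0, cutoff_hz=120.0):
    """Split a mono signal into low- and high-frequency portions.

    Uses a one-pole low-pass filter; the high portion is the remainder, so the
    two portions sum back to the input. The filter type and the 120 Hz cutoff
    are assumptions for illustration, not values from the description.
    """
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate_hz)
    low, high = [], []
    state = 0.0
    for x in samples:
        state += alpha * (x - state)   # one-pole low-pass update
        low.append(state)
        high.append(x - state)         # complementary high-frequency part
    return low, high
```

The low portion could then be sent to the LFE channel and the high portion to the other output channels.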
[0064]
At 870, the processed stems from 865 can be mixed into channels. These channels can be
input to a surround audio system. Optionally, the channels can be recorded for future playback.
Process 800 may end at 895 after the completion of the song.
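The mixing step at 870 can be viewed as a weighted sum of stems per output channel. The sketch below assumes that the mixing parameters are given as one linear gain per (channel, stem) pair; this data layout and the function name are assumptions, not something the description prescribes.

```python
def mix_stems(stems, gains):
    """Mix processed stems into output channels.

    stems: dict mapping stem name -> list of samples (all the same length).
    gains: dict mapping channel name -> {stem name: linear gain}.
    Returns a dict mapping channel name -> list of mixed samples.
    The data layout is an assumption made for this sketch.
    """
    length = len(next(iter(stems.values())))
    channels = {}
    for channel, stem_gains in gains.items():
        mixed = [0.0] * length
        for stem_name, gain in stem_gains.items():
            for i, sample in enumerate(stems[stem_name]):
                mixed[i] += gain * sample
        channels[channel] = mixed
    return channels
```

A stem that is absent from a channel's gain table simply contributes nothing to that channel.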
[0065]
Referring now to FIG. 9, another process 900 for providing a surround mix of music may begin at
905 and end at 995. Process 900 is similar to process 800 except for the actions 975 and 980.
Descriptions of essentially duplicate elements are not repeated; any element not
described in connection with FIG. 9 has the same function as the corresponding element in FIG. 8.
[0066]
At 975, the rules extracted at 950 can be used to determine the relative audio position of each
stem. Each relative audio position can indicate the position of the hypothetical sound source of
the respective stem on the virtual stage. For example, a rule extracted at 950 may be "position
lead vocalist in front of center of stage". Similar rules can define the positions of other audio /
musicians on the virtual stage for different genres.
[0067]
The automatic surround mixing process 940 can receive an operator selection of a virtual listener
position relative to the virtual stage whose audio positions have been determined at 975. The
operator selection can be made, for example, by prompting the listener to select one of two or
more predetermined alternative positions. Exemplary options for the virtual listener position
include "in the band" (e.g., at the center of the virtual stage, surrounded by the audio),
"front row center", and / or "middle of the audience".
[0068]
The automatic surround mixing process 940 may also receive data indicating the relative
position of the speakers in the surround audio system. This data can be used to refine the
mixing parameters so as to at least partially correct for asymmetry in the speaker arrangement,
such as the center speaker not being centered between the left front speaker and the right
front speaker.
[0069]
At 980, the audio positions defined at 975 can be converted to mixing parameters, taking into
account the selected virtual listener position and the speaker position data, if available.
At 970, the mixing parameters from 980 can be used to mix the processed stems from 965 into
channels that provide the desired listener experience.
[0070]
Although not shown in FIG. 8 or 9, the automatic surround mixing process 840 or 940 may also
receive data indicative of listener preferences. For example, the listener can be given the option
to select either a standard mix or a non-standard mix such as an a cappella mix (vocals only) or a
"karaoke" mix (with reduced lead vocals). When a non-standard mix is selected, some of the rules
extracted at 850 or 950 can be overridden.
Conclusion
[0071]
The embodiments and examples presented throughout this description should be
considered as illustrative rather than limiting on the disclosed or claimed devices and
procedures. Although many of the examples presented herein include specific combinations of
method acts or system elements, it should be understood that these acts and elements may be
combined in other ways to achieve the same purpose. With respect to the flowcharts, additional
or fewer steps may be employed, and the illustrated steps may be combined or further refined
to implement the methods described herein. Acts, elements and features described in
connection with only one embodiment are not intended to be excluded from a similar role in other
embodiments.
[0072]
As used herein, "plural" means two or more. As used herein, a "series" of items can include one or
more of such items. As used herein, whether in the specification or in the claims, terms such as
"comprising", "including", "carrying", "having", "containing" and the like should be understood
to be open-ended, that is, to mean including but not limited to. With respect to the claims, only
the transitional phrases "consisting of" and "consisting essentially of" are closed or semi-closed
transitional phrases, respectively. The use of ordinal terms such as "first", "second", "third"
and the like to modify a claim element in the claims does not by itself imply any priority,
precedence, or order of one claim element over another, or the temporal order in which the acts
of a method are performed. Such terms are used merely as labels to distinguish one claim element
having a certain name from another element having the same name (apart from the ordinal term).
As used herein, "and / or" means that the listed items are alternatives, but that the
alternatives also include any combination of the listed items.
[0073]
300 automatic surround mixer
310-1 stem processor
310-2 stem processor
310-3 stem processor
310-4 stem processor
310-5 stem processor
310-6 stem processor
320 mixing matrix
340 rule engine
342 effect parameter
344 mixing parameter
346 rule base