Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2015050530
PROBLEM TO BE SOLVED: To separate the voice from a main subject and the voice from the photographer, both captured by a single stereo microphone. An FFT (207) performs a fast Fourier transform on the audio signals from the stereo microphones (205A, 205B). A voice band extraction filter (208) extracts the band components of the human voice. A subtractor (213) calculates the difference between the L channel and the R channel, and an adder (214) calculates their sum. A level detection unit (215) detects the output levels of the subtractor (213) and the adder (214), and a determination unit (216) determines, from the output of the level detection unit (215), whether the correction process of the correction unit (209) is necessary. The correction unit (209) subtracts an amount corresponding to the difference between the L channel and the R channel from the output of each channel of the FFT (207). An IFFT (210) applies an inverse Fourier transform to the output of the correction unit (209), and an IFFT (219) applies an inverse Fourier transform to the output of the subtractor (213). [Selected figure] Figure 2
Imaging device and audio signal processing device
[0001]
The present invention relates to an imaging device and an audio signal processing device, and
more particularly to an imaging device and an audio signal processing device provided with two
audio input means.
[0002]
03-05-2019
1
Conventionally, a video camera is known in which images in a plurality of shooting directions can be recorded simultaneously by extracting a plurality of arbitrary portions of a captured image and recording each of them (Patent Document 1). For this video camera, an audio processing circuit is described that directs the directivity of a stereo microphone toward the shooting direction by applying addition/subtraction processing to the audio signal outputs of the stereo microphone according to the shooting direction.
[0003]
Some portable information terminals, typified by smartphones, include two imaging units: one for imaging the front (subject side) and one for imaging the rear (photographer side). Even in such devices, it is usual that only a single stereo microphone is provided for audio, and the audio from the stereo microphone is recorded as-is regardless of whether imaging is performed by the front imaging unit or the rear imaging unit.
[0004]
Japanese Patent Application Publication No. 2007-300220
[0005]
According to the technology described in Patent Document 1, although images in a plurality of
shooting directions can be recorded, only one shooting direction can be handled for audio.
[0006]
When an apparatus having a front imaging unit and a rear imaging unit simultaneously records an image with each, it is desirable to be able to separate the sound belonging to each, but this is not achieved in the conventional example.
Naturally, when sound from the rear mixes in while shooting with the front imaging unit, it is preferable to be able to separate and reproduce only the sound from the front.
Conversely, when sound from the front mixes in while shooting with the rear imaging unit, it is preferable to be able to separate and reproduce only the sound from the rear.
[0007]
The present invention relates to an imaging device and an audio signal processing device that
eliminate such a disadvantage.
[0008]
In order to achieve the above object, an imaging apparatus according to the present invention has first imaging means for photographing a front subject, second imaging means for photographing a rear subject, and a plurality of microphones for converting sound, including sound from the front subject and sound from the rear subject, into audio signals, and is characterized by comprising: frequency conversion means for frequency-converting the audio signals of the plurality of channels taken in by the plurality of microphones; separation means for separating the component of the sound from the front subject and the component of the sound from the rear subject from the output of each channel of the frequency conversion means; first inverse frequency conversion means for inverse-frequency-converting the component of the sound from the front subject separated by the separation means; and second inverse frequency conversion means for inverse-frequency-converting the component of the sound from the rear subject separated by the separation means.
[0009]
According to the present invention, each sound can be separated from an audio signal containing both the sound from the front subject and the sound from the rear subject, and as a result each can be recorded separately in association with the corresponding image.
[0010]
FIG. 1 is a schematic block diagram of an embodiment of the present invention.
FIG. 2 is a schematic block diagram of the principal part of the present embodiment.
FIG. 3 is a schematic diagram explaining how the sound from a front subject and the sound from a rear subject (the photographer) enter the microphones.
FIG. 4 is a flowchart of the correction process of the present embodiment. FIG. 5 is a schematic diagram of a frequency spectrum. FIG. 6 is a diagram showing an example of an ALC diagram.
[0011]
Hereinafter, embodiments of the present invention will be described in detail with reference to
the drawings.
[0012]
FIG. 1 is a schematic block diagram showing a basic configuration of an embodiment of an
imaging apparatus according to the present invention.
[0013]
The imaging unit 101A is an imaging unit for shooting a main subject (forward subject) in the
forward direction.
The imaging unit 101A converts an optical image of a subject captured by a photographing lens
into an image signal by an imaging element, performs analog-to-digital conversion, image
adjustment processing, and the like, and generates image data.
The photographing lens may be a built-in lens or a detachable lens. The imaging device may be a
photoelectric conversion device represented by CCD or CMOS.
[0014]
The imaging unit 101B is an imaging unit for capturing, for example, a photographer (rear
subject) in the rear direction. Similar to the imaging unit 101A, the imaging unit 101B converts
an optical image of a subject captured by a photographing lens into an image signal by an
imaging device, performs analog-to-digital conversion, image adjustment processing, and the like,
and generates image data. The photographing lens may be a built-in lens or a detachable lens.
The imaging device may be a photoelectric conversion device represented by CCD or CMOS. The
number of pixels of the imaging device may be smaller than that of the imaging unit 101A.
[0015]
The audio input unit 102 captures audio around the imaging device 100 with a built-in
microphone (or a microphone connected via an audio terminal), performs analog-to-digital
conversion, audio processing, and the like, and generates audio data. The microphone may be
directional or omnidirectional, but in the present embodiment, an omnidirectional microphone is
used.
[0016]
The memory 103 temporarily stores the image data output from the imaging unit 101A and the
imaging unit 101B and the audio data output from the audio input unit 102.
[0017]
The display control unit 104 causes the display unit 105, or an external display connected via a video terminal (not shown), to display images related to the image data obtained by the imaging units 101A and 101B, an operation screen of the imaging device 100, a menu screen, and the like.
[0018]
The encoding processing unit 106 reads out the image data and audio data temporarily stored in
the memory 103, performs predetermined encoding, and generates compressed image data and
compressed audio data.
Audio data may be uncompressed.
Image data is compression-encoded with, for example, MPEG2 or H.264/MPEG4-AVC, although other compression methods may be used. Audio data is compressed with a compression method such as AC3, AAC, ATRAC, or ADPCM.
[0019]
The recording/reproducing unit 107 records, on the recording medium 108, the compressed image data and the compressed audio data (or uncompressed audio data) generated by the encoding processing unit 106, and reads various data from the recording medium 108.
The recording medium 108 may be any data recording medium capable of recording the compressed image data and the like, such as a magnetic disk, an optical disk, or a semiconductor memory.
[0020]
The control unit 109 can control each block of the imaging device 100 by transmitting a control
signal to each block of the imaging device 100, and includes a CPU, a memory, and the like for
executing various controls. A memory used by the control unit 109 includes a ROM for storing
various control programs, a RAM for arithmetic processing, and the like.
[0021]
The operation unit 110 includes buttons, dials, and the like, and transmits instruction signals according to the user's operations to the control unit 109. The operation unit 110 includes a shooting button for instructing the start and end of moving image recording, a zoom lever for enlarging and reducing the image optically and electronically, and a cross key and an enter key used for settings and for various adjustments such as the sound recording level and reproduction level. For example, the mode change switch of the operation unit 110 is used to specify power on/off and the operation mode.
[0022]
The operation mode typically includes a moving image recording mode and a reproduction mode.
The moving image recording mode is a mode in which moving image data obtained by the
imaging unit 101A and the imaging unit 101B and audio data obtained by the audio input unit
102 are recorded in the recording medium 108. Depending on the setting, there are a mode for
recording moving image data obtained by one of the imaging unit 101A and the imaging unit
101B, and a mode for recording both. When both are recorded, audio is processed as will be
described later, and two types of audio data corresponding to moving image data on each side
are generated and recorded on the recording medium 108 as a pair with the moving image. The
reproduction mode is a mode in which the compressed image data and the compressed audio
data recorded on the recording medium 108 are reproduced by the recording and reproducing
unit 107, the reproduced image is displayed on the display unit 105, and the reproduced sound
is output from the speaker 112.
[0023]
The external output unit 113 outputs the compressed video data and the compressed audio data
reproduced by the recording and reproducing unit 107 to an external device.
[0024]
The data bus 114 supplies various data such as audio data and image data and various control
signals to each block of the imaging device 100.
[0025]
The basic operation of the imaging device 100 will be described.
When the user turns on the power switch of the operation unit 110, the imaging apparatus 100
supplies power to each block of the imaging apparatus from a power supply unit (not shown).
[0026]
When the power is supplied, the control unit 109 confirms the operation mode set by the mode
change switch of the operation unit 110.
[0027]
In the moving image recording mode, the control unit 109 first transmits a control signal to shift
to the shooting standby state to each block of the imaging apparatus 100.
Each block operates as follows according to this control signal.
[0028]
The imaging unit 101A converts an optical image of a subject captured by a photographing lens
into an image signal by an imaging element, performs analog-to-digital conversion, image
adjustment processing, and the like, and generates image data.
The imaging unit 101A transmits the generated image data to the display control unit 104, and
the display control unit 104 causes the display unit 105 to display a corresponding image. The
imaging unit 101B also operates in the same manner as the imaging unit 101A. The user can
select which image of the imaging units 101A and 101B or both images are displayed on the
display unit 105. The user prepares for shooting (recording) while viewing the image displayed
in this manner.
[0029]
The voice input unit 102 converts analog voice signals obtained by the plurality of microphones
into digital, processes the obtained plurality of digital voice signals, and generates multi-channel
voice data. The audio input unit 102 transmits the obtained audio data to the audio output unit 111, and the audio output unit 111 drives the connected speaker 112 to output the sound.
Instead of the speaker 112, an earphone (not shown) may be used. The user adjusts the
recording volume while listening to the audio output in this manner.
[0030]
When the user operates the recording button of the operation unit 110, the control unit 109
transmits an imaging start instruction signal to each block of the imaging device 100, and causes
the following operation. Here, it is assumed that the front imaging by the imaging unit 101A and
the rear imaging by the imaging unit 101B are simultaneously performed.
[0031]
The imaging unit 101A converts an optical image of a subject captured by a shooting lens into an
image signal by an imaging element, performs analog-to-digital conversion, image adjustment
processing, and the like, generates image data, and stores the image data in the memory 103.
The display control unit 104 reads the image data in the memory 103 and causes the display
unit 105 to display a corresponding image. The imaging unit 101B operates in the same manner,
image data generated by the imaging unit 101B is also stored in the memory 103, and a
corresponding image is displayed on the display unit 105.
[0032]
The voice input unit 102 takes in surrounding voice, separates voice data from the front and
voice data from the back by a method described later, and stores the voice data in the memory
103.
[0033]
The encoding processing unit 106 reads out the image data and the audio data temporarily
stored in the memory 103 and compresses and encodes the data to generate compressed image
data and compressed audio data.
The control unit 109 combines the compressed image data and compressed audio data generated by the encoding processing unit 106, pairing the front data and the rear data respectively, to form data streams, and outputs them to the recording/reproducing unit 107. When the audio data is not compressed, the control unit 109 combines the audio data stored in the memory 103 with the compressed image data to form the data streams and outputs them to the recording/reproducing unit 107. The recording/reproducing unit 107 writes each data stream as a moving image file on the recording medium 108 under file system management such as UDF or FAT.
[0034]
The above operation is continued during shooting.
[0035]
When the user operates the recording button of the operation unit 110 again, the control unit
109 transmits an instruction signal of imaging completion to each block of the imaging device
100, and causes the following operation.
[0036]
The imaging units 101A and 101B stop imaging, and the voice input unit 102 stops capturing
voice.
When the encoding processing unit 106 finishes reading out the image data and audio data
stored in the memory 103 and generating the compressed image data and the compressed audio
data, the encoding processing unit 106 stops the operation.
When the audio data is not compressed, the encoding processing unit 106 naturally stops the
operation when the generation of the compressed image data is completed.
[0037]
The control unit 109 combines the last of the compressed image data and the compressed audio data (or uncompressed audio data), the front set and the rear set respectively, into data streams and outputs them to the recording/reproducing unit 107. When the supply of each data stream stops, the recording and reproducing unit 107 completes the moving image files and stops the recording operation.
[0038]
After the recording operation is stopped, the control unit 109 transmits a control signal
instructing to shift to the imaging standby state to each block of the imaging apparatus 100, and
returns to the imaging standby state.
[0039]
In the reproduction mode, the control unit 109 transmits a control signal instructing to shift to
the reproduction state to each block of the imaging apparatus 100, and causes the following
operation.
[0040]
The recording and reproducing unit 107 reads a moving image file composed of compressed
image data and compressed audio data recorded on the recording medium 108 and sends the file
to the encoding processing unit 106.
The encoding processing unit 106 decodes the compressed image data and the compressed
audio data, and transmits the reproduced image data to the display control unit 104 and the
reproduced audio data to the audio output unit 111.
[0041]
The display control unit 104 causes the display unit 105 to display an image corresponding to
the reproduction image data.
The sound output unit 111 drives the speaker 112 with the reproduction sound data, and
outputs sound from the speaker 112.
[0042]
FIG. 2 is a schematic block diagram of portions of the imaging units 101A and 101B and the
audio input unit 102 of the imaging device 100.
[0043]
The imaging unit 101A includes a lens 201A, an imaging element 202A, an image processing
unit 203A, and a lens control unit 204A.
The lens 201A takes in an optical image of a main subject in front, and enters the imaging
element 202A. The imaging element 202A converts an optical image of a subject captured by the
lens 201A into an electrical signal (image signal). The image processing unit 203A converts an
analog image signal obtained by the imaging element 202A into a digital image signal, performs
image quality adjustment processing to form image data, and writes the image data in the
memory 103. The lens control unit 204A controls the focal length and the zoom of the lens 201A
according to the output of a position sensor (not shown) that detects the position of the lens
201A and the control signal from the control unit 109. The lens 201A and the lens control unit
204A are described as being built in the imaging unit 101A. However, these may be removable
interchangeable lens types.
[0044]
For example, when the user operates the operation unit 110 to input an instruction such as zoom
operation or focus adjustment, the control unit 109 transmits a control signal instructing lens
movement in the instruction direction to the lens control unit 204A. The lens control unit 204A
confirms the position of the lens 201A with a position sensor according to the control signal, and
moves the lens 201A with a drive unit such as a motor. The control unit 109 may also
automatically control the lens 201A in accordance with the contrast of the image obtained by the
image processing unit 203A, or in accordance with a separately measured distance to the subject. Further, when the apparatus has a so-called anti-vibration (image stabilization) function to prevent image blur, the control unit 109 transmits, to the lens control unit 204A, a control signal for moving the lens 201A based on vibration detected by a vibration sensor (not shown).
[0045]
The lens 201A is, for example, an optical zoom lens whose magnification ranges from a minimum of 1x to a maximum of 6x. The lens control unit 204A changes the optical magnification of the lens 201A
by moving the zooming lens of the lens 201A according to an instruction from the control unit
109. The image processing unit 203A has an electronic zoom function of enlarging or reducing a
part of the image signal obtained by the imaging element 202A to the screen size.
[0046]
The imaging unit 101B includes a lens 201B, an imaging element 202B, an image processing
unit 203B, and a lens control unit 204B. The lens 201B captures an optical image of the photographer behind and guides it to the imaging element 202B. The imaging element 202B converts an optical image of a subject captured by the lens 201B into an electrical signal (image signal). The image processing unit 203B converts the analog image signal obtained by the imaging element 202B into a digital image signal, performs image quality adjustment processing to form image data, and writes the image data in the memory 103. The lens control unit 204B controls
the focal length and the zoom of the lens 201B according to the output of a position sensor (not
shown) that detects the position of the lens 201B and the control signal from the control unit
109. Since the imaging unit 101B exclusively targets the photographer, the lens 201B may be a
single focus lens.
[0047]
The voice input unit 102 includes microphones 205A and 205B, which convert sound vibrations into electrical signals and output audio signals, and an AD conversion unit 206, which converts the analog audio signals output from the microphones 205A and 205B into digital audio signals. The microphones 205A and 205B constitute a stereo microphone. Hereinafter, the output of the microphone 205A is the L channel, and the output of the microphone 205B is the R channel.
[0048]
The FFT unit 207 of the audio input unit 102 is a frequency conversion unit that performs a fast Fourier transform on the two-channel audio data from the AD conversion unit 206 and converts it into audio data in the frequency domain. The voice band extraction filter 208 extracts the human voice band, for example the range of 800 Hz to 4 kHz, from the output data of the FFT unit 207. In this embodiment, the audio signal is sampled at 48 kHz with 16 bits, and the FFT unit 207 calculates the band from 0 to 48 kHz, at a two-sample period, as a 512-point frequency spectrum; up to the Nyquist frequency of 24 kHz, this yields a 256-point frequency spectrum. In terms of time, one block of the FFT is calculated in about 21 μs. Although the number of data points of the FFT unit 207 is 512, it is not limited to this. As the voice band extraction filter 208, a band-pass filter, a high-pass filter, and a low-pass filter are combined as appropriate to extract the desired frequency component or band.
[0049]
The subtractor 213 takes the difference between the L channel audio data and the R channel audio data extracted by the voice band extraction filter 208, while the adder 214 adds the L channel audio data and the R channel audio data. The level detection unit 215 detects the output level of the subtractor 213 and the output level of the adder 214, and the determination unit 216 determines whether correction by the correction unit 209 is necessary according to the detection result of the level detection unit 215, and controls the correction unit 209 accordingly. As will be described later in detail, the determination unit 216 determines whether the ratio of the subtraction result (difference) of the subtractor 213 to the addition result (sum) of the adder 214 is equal to or greater than a threshold.
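The subtractor 213 / adder 214 / level detection 215 / determination 216 chain can be sketched as follows. This is a minimal NumPy illustration; the function names and the threshold value are hypothetical assumptions, not values from the patent.

```python
import numpy as np

def sum_and_diff_levels(spec_l, spec_r):
    """Subtractor 213 and adder 214 on the band-limited spectra, followed by
    level detection 215 (magnitude per frequency bin)."""
    diff_level = np.abs(spec_l - spec_r)   # subtractor 213 -> level detection
    sum_level = np.abs(spec_l + spec_r)    # adder 214 -> level detection
    return diff_level, sum_level

def correction_needed(diff_level, sum_level, threshold=0.1, eps=1e-12):
    """Determination 216: true when the difference-to-sum ratio |L-R|/(L+R)
    reaches the threshold X in some bin (the threshold value is hypothetical)."""
    ratio = diff_level / (sum_level + eps)   # eps avoids division by zero
    return bool(np.any(ratio >= threshold))
```

With identical channel spectra the ratio is zero everywhere and no correction is triggered; a left/right imbalance in some bin pushes the ratio above the threshold.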
[0050]
The correction process of the correction unit 209 subtracts, from each of the L channel and R channel audio data from the FFT unit 207, the result of multiplying the difference between the L channel and the R channel by a predetermined coefficient. Through this correction, audio components with a small left/right difference remain, and the correction unit 209 outputs the corrected two-channel audio data to the inverse FFT unit 210. When the determination unit 216 determines that correction is unnecessary, the correction unit 209 outputs the two-channel audio data from the FFT unit 207 to the inverse FFT unit 210 as-is. Whether or not correction is applied, the two-channel audio data supplied from the correction unit 209 to the inverse FFT unit 210 consists predominantly of the audio of the main subject. The inverse FFT unit 210 is an inverse frequency conversion unit that performs an inverse Fourier transform on the L channel and R channel audio data from the correction unit 209 to convert it back into time-domain audio data.
[0051]
On the other hand, the output audio data of the subtractor 213 is input to the inverse FFT unit 219. The output of the subtractor 213 consists of the difference between the L channel and R channel audio data and predominantly reflects the voice of the photographer. The inverse FFT unit 219 is an inverse frequency conversion unit that converts the output data of the subtractor 213 back into time-domain difference audio data by an inverse Fourier transform.
[0052]
The two-channel output audio data of the inverse FFT unit 210 is input to an auto level controller (ALC) 211 and a level detection unit 217. The level detection unit 217 detects the level of the audio data of each channel from the inverse FFT unit 210 and outputs the detection result to the arithmetic unit 218. The arithmetic unit 218 holds a relational expression or correspondence table giving the gain of the ALC 211 for each level detected by the level detection unit 217, and sets in the ALC 211 the gain according to the output level value of the level detection unit 217. The ALC 211 amplifies the audio data from the inverse FFT unit 210 with the gain set by the arithmetic unit 218, and can thereby control the amplitude of the audio signal to a predetermined level.
[0053]
The output audio data of the inverse FFT unit 219 is input to the ALC 222 and a level detection unit 220. The level detection unit 220 detects the level of the audio data from the inverse FFT unit 219 and outputs the detection result to the arithmetic unit 221. Like the arithmetic unit 218, the arithmetic unit 221 holds a relational expression or correspondence table giving the gain of the ALC 222 for each level detected by the level detection unit 220, and sets in the ALC 222 the gain corresponding to the output level value of the level detection unit 220. The ALC 222 amplifies the audio data from the inverse FFT unit 219 with the gain set by the arithmetic unit 221, and can thereby control the amplitude of the audio signal to a predetermined level. Since the ALC 222 amplifies the difference signal, its gain is generally set to a value larger than that of the ALC 211.
[0054]
Since the output signal of the ALC 222 corresponds to the difference between the photographer's L channel audio signal and R channel audio signal, it must be converted into two channels. Therefore, the duplicating unit 223 duplicates the output signal of the ALC 222 to make it stereo and outputs two-channel audio data.
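The duplicating unit 223 amounts to copying the single difference-derived channel into identical left and right channels. A minimal sketch (the function name and array layout are hypothetical):

```python
import numpy as np

def duplicate_to_stereo(mono):
    """Duplicating unit 223: copy the single difference-derived channel
    into identical L and R channels, giving 2-channel audio data."""
    mono = np.asarray(mono)
    return np.stack([mono, mono])   # shape (2, n): L on row 0, R on row 1
```

Both output channels are bit-identical, which is all that is required to pair the photographer's audio with the rear moving image as stereo data.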
[0055]
The audio processing unit 212 performs predetermined processing on audio data of two
channels from the ALC 211 and audio data of two channels from the ALC 222 and writes the
processed data to the memory 103.
[0056]
FIG. 3 shows an example of the path of the voice from the main subject in front of the microphones 205A and 205B and the path of the voice from the photographer behind them.
As shown in FIG. 3, the midpoint between the L channel microphone 205A and the R channel microphone 205B is located laterally away from the center of the display unit 105. At the time of shooting, the photographer usually shoots while visually confirming the displayed image on the display unit 105, and usually holds the imaging apparatus 100 so that the display unit 105 is in front of the face. In addition, in a normal scene, the photographer is positioned closer to the imaging device 100 than the main subject is. In this state, the difference between the distances from the main subject to the microphones 205A and 205B is small, while the difference between the distances from the photographer to the microphones 205A and 205B is relatively large. That is, the difference between the L channel and R channel sound levels differs greatly between the main subject and the photographer.
[0057]
Therefore, the determination unit 216 determines whether the voice from the photographer behind is significantly included according to the following determination formula: |L − R| / (L + R), where L represents the level of the L channel audio data output from the voice band extraction filter 208, and R represents the level of the R channel audio data output from the voice band extraction filter 208.
[0058]
The difference between the voices input from the main subject to the microphones 205A and 205B increases as the volume increases, but, as described above, the distances are almost equal, so the ratio of the difference to the sum becomes extremely small. On the other hand, for the voice input from the photographer to the microphones 205A and 205B, the distances differ significantly, so the ratio of the difference to the sum is large. Therefore, the presence or absence of the voice from the photographer behind can be determined by the above determination formula. When the voice from the photographer is present to an extent that cannot be ignored, the correction unit 209 subtracts the component corresponding to the photographer's voice from the L channel and R channel audio data in the correction process described above.
[0059]
FIG. 4 shows a flowchart of the correction process of this embodiment. The level detection unit 215 continuously detects the levels of the output of the subtractor 213 (the difference between the L channel and the R channel) and the output of the adder 214 (the sum of the L channel and the R channel).
[0060]
The determination unit 216 takes in the detection result of the level detection unit 215 (S401). Based on the detection result of the level detection unit 215, the determination unit 216 determines whether |L − R| / (L + R) ≥ X (1) holds for the 256-point frequency spectrum of the FFT (S402), where X is a predetermined threshold. When expression (1) holds, the voice from the photographer is included, and the determination unit 216 enables the correction by the correction unit 209. If expression (1) does not hold, the process returns to step S401, and the determination unit 216 takes in the level detection result again.
[0061]
When the determination result of the determination unit 216 indicates that correction should be executed (S402), the correction unit 209 first determines the correction level (S403). Specifically, the correction unit 209 calculates NC = α(L − R) (2) for the 256-point frequency spectrum of the FFT unit 207. In equation (2), L indicates the strength of the L channel audio data output from the FFT unit 207, and R indicates the strength of the R channel audio data output from the FFT unit 207. α is an adjustment coefficient, determined experimentally. FIG. 5 shows the relationship between the audio signals L and R of the L and R channels and the frequency spectrum of the difference signal (L − R).
[0062]
The NC given by equation (2) represents the audio signal from the photographer, and the correction unit 209 performs the correction process by subtracting the NC obtained by equation (2) from the L channel audio data output and the R channel audio data output of the FFT unit 207 (S404). By this subtraction, the photographer's audio signal component contained in the L channel and R channel audio data outputs of the FFT unit 207 can be reduced, and the correction result mainly represents the audio signal of the main subject.
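The correction of steps S403–S404 can be sketched as a simple spectral subtraction. This is an illustrative reading of equation (2) only; the function name and the flooring at zero (to keep magnitudes non-negative) are assumptions not stated in the patent.

```python
import numpy as np

def subtract_photographer(L, R, alpha=1.0):
    """Sketch of S403-S404: estimate the photographer component as
    NC = alpha * (L - R) per bin (equation (2)) and subtract it from
    both channel spectra, flooring the result at zero.
    """
    L = np.asarray(L, dtype=float)
    R = np.asarray(R, dtype=float)
    nc = alpha * (L - R)                 # S403: determine correction level
    L_corr = np.maximum(L - nc, 0.0)     # S404: subtract NC from L channel
    R_corr = np.maximum(R - nc, 0.0)     # S404: subtract NC from R channel
    return L_corr, R_corr

# With alpha = 0.5, L = [1.0, 0.8], R = [0.6, 0.7]: NC = [0.2, 0.05],
# so the corrected spectra are L' = [0.8, 0.75] and R' = [0.4, 0.65].
L_corr, R_corr = subtract_photographer([1.0, 0.8], [0.6, 0.7], alpha=0.5)
```

The corrected spectra would then go to the IFFT 210 to produce the main-subject signal, matching the data flow described for the correction unit 209.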
[0063]
FIG. 6 is an ALC diagram showing the input/output characteristics of the ALCs 211 and 222. The arithmetic units 218 and 221 hold gain parameters that form the ALC diagrams shown in FIG. 6. For example, the ALC diagram corresponding to the gain parameters for the main subject audio signal held by the arithmetic unit 218 shows the input/output characteristic indicated by a solid line 601. The ALC diagram corresponding to the gain parameters for the photographer's audio signal held by the arithmetic unit 221 shows the input/output characteristic indicated by a solid line 602. To form the ALC diagram shown by the solid line 602, the arithmetic unit 221 holds, for example, a plurality of points indicating output levels corresponding to input levels as parameters. The parameters may instead be table data indicating the output level corresponding to each input level, a plurality of points indicating the gain corresponding to each input level, or table data indicating the gain corresponding to each input level.
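A piecewise-linear ALC curve defined by stored inflection points can be sketched as below. The specific point values are hypothetical placeholders (the patent says the real parameters are determined as ALC diagrams like 601/602); only the mechanism of interpolating between stored (input level, output level) points is illustrated.

```python
import numpy as np

# Hypothetical inflection points (input dB FS -> output dB FS), standing in
# for the gain parameters held by the arithmetic units 218 and 221.
MAIN_SUBJECT_CURVE = [(-80.0, -80.0), (-35.0, -15.0), (0.0, -10.0)]   # line 601
PHOTOGRAPHER_CURVE = [(-80.0, -70.0), (-55.0, -15.0), (0.0, -10.0)]   # line 602

def alc_output(level_in_db, curve):
    """Piecewise-linear ALC: interpolate the output level between the
    stored inflection points; inputs outside the range are clamped."""
    x, y = zip(*curve)
    return float(np.interp(level_in_db, x, y))

alc_output(-35.0, MAIN_SUBJECT_CURVE)   # -15.0 at the stored point
```

Storing gains instead of output levels, or a full lookup table, would work the same way; the text notes all of these as equivalent parameterizations.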
[0064]
In the ALC diagram shown by the solid line 601, for example, when the sound of the main subject at 70 dB SPL is input to the microphones 205A and 205B (P1 in FIG. 6), the ALC 211 outputs an audio signal of −15 dB FS. On the other hand, in the ALC diagram of the solid line 602, when the voice of the photographer at 70 dB SPL is input to the microphones 205A and 205B, the ALC 222 likewise needs to output an audio signal of −15 dB FS so that the photographer's voice balances with the voice of the main subject. However, since the input signal of the ALC 222 is the difference signal of the photographer's voice as described above, it reaches the ALC 222 at a level about 20 dB lower than the subject's voice signal. That is, audio data corresponding to 50 dB SPL is input to the ALC 222 (P2 in FIG. 6).
[0065]
The level difference between the input signals of the ALC 211 and the ALC 222 can be estimated, for example, as L2 − L1 = 20 log10(r2 / r1) [dB], and a level L1 [dB SPL] corresponds to a sound pressure of 20 × 10^−6 × 10^(L1/20) [Pa]. Here, the distance from the photographer to the L microphone 205A is r1, the distance to the R microphone 205B is r2, the sound pressure level of the photographer's voice at the L microphone 205A is L1, and the sound pressure level of the photographer's voice at the R microphone 205B is L2.
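The two expressions in paragraph [0065] can be evaluated directly. The function names and the example distances are illustrative assumptions; the 20 µPa reference pressure is the standard one for dB SPL.

```python
import math

def level_difference_db(r1, r2):
    """Level difference L2 - L1 = 20 * log10(r2 / r1) [dB] between the
    two microphone positions, as in paragraph [0065]."""
    return 20.0 * math.log10(r2 / r1)

def spl_to_pascal(level_db_spl):
    """Convert a sound pressure level to pressure in Pa:
    p = 20e-6 * 10**(L / 20), with 20 uPa as the reference pressure."""
    return 20e-6 * 10 ** (level_db_spl / 20.0)

# A 10:1 ratio of microphone distances gives the ~20 dB difference
# quoted in the text (hypothetical distances of 5 cm and 50 cm):
print(level_difference_db(0.05, 0.5))   # ~ 20.0 dB
print(spl_to_pascal(94.0))              # ~ 1.0 Pa
```

This is consistent with the text: a photographer's 70 dB SPL voice arriving as a difference signal about 20 dB down corresponds to roughly 50 dB SPL at the input of the ALC 222.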
[0066]
In the ALC diagram shown by the solid line 602, for example, when the sound pressure level input to the microphones 205A and 205B is 50 dB SPL, the input to the ALC 222 corresponds to 30 dB SPL, and the diagram has an inflection point at this point (P3 in FIG. 6). When the sound pressure level input to the microphones 205A and 205B is 40 dB SPL, the input to the ALC 222 corresponds to 20 dB SPL, and the diagram also has an inflection point at this point (P4 in FIG. 6).
[0067]
As described above, since the left-right difference signal of the photographer's voice is input to the ALC 222, the gain of the ALC 222 needs to be increased. In the ALC diagram shown by the solid line 602, the gain is therefore set large at low input levels.
[0068]
As indicated by solid lines 601 and 602 in FIG. 6, the gain of ALC 211 is set to be lower than the
gain of ALC 222.
[0069]
In FIG. 6, the ALC diagrams are represented as polylines connecting multiple points, but it goes without saying that they may be configured as smooth non-linear curves instead.
[0070]
In this embodiment, for two imaging units imaging different directions, the audio signal components corresponding to the respective imaging directions are separated from the audio signals input by a single stereo microphone and can be recorded integrally with the corresponding images.
[0071]
Although FIG. 2 describes the case where two voices are input, the number of channels may be larger.
[0072]
Further, although an imaging apparatus has been described in the present embodiment, the audio signal processing apparatus included in the audio input unit 102 is applicable to any apparatus that records or inputs external audio; besides imaging apparatuses, the present invention is also applicable to IC recorders and mobile phones.