Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2010206451
In an AV system, control of sound reproduction is realized while dynamically reflecting the
orientation of a speaker with respect to a listener. The AV system includes a camera-equipped
speaker (100) having a camera (112). The camera 112 is installed integrally with the speaker
body 111 and shoots the direction in which the speaker body 111 outputs sound. The
recognition unit 103 recognizes the position of the listener P1 from the image of the camera
112 and detects the orientation of the speaker body 111 with respect to the listener P1.
The audio control unit 102 performs signal processing on the given audio signal and outputs it to
the speaker body 111 as an acoustic signal. [Selected figure] Figure 1
Speaker with camera, signal processing device, and AV system
[0001]
The present invention relates to a technology for performing sound reproduction preferable to a
listener in an AV (audio visual) system.
[0002]
The propagation of sound differs depending on the positional relationship between the sound
source and the listener, and the environment between the sound source and the listener.
Therefore, the listener can detect the difference in sound propagation and perceive the position
of the sound source and the impression of the environment. For example, when the sound source
09-05-2019
1
position is fixed in front of the listener and the listener turns the face to the right, the sound
reaches the left ear relatively louder and earlier in time than the right ear; turning the face to
the left has the opposite effect (interaural level difference, interaural time difference).
Also, the shape of the pinnae affects the frequency characteristics of the incoming sound
differently depending on the direction of arrival. Therefore, the listener can more clearly
perceive the presence of the sound source from the frequency characteristics of the sound heard
by each ear and from changes in the sound heard by both ears.
[0003]
The transfer characteristic between the sound source and the entrance of the ear canal is called
the head-related transfer function (HRTF), and it is known to greatly affect human sound
localization (the perception of where a sound is emitted). AV systems such as home theater
equipment, which exploit this localization ability to reproduce more realistic sound using
multi-channel speaker layouts such as 5.1ch and 7.1ch, have become commonplace in recent
years and are spreading into the home.
[0004]
In such an AV system, it is generally recommended that the speaker be disposed at a
predetermined position on a circle centered on the listener, toward the listener. However, due to
the relationship of the installation space, etc., each speaker can not always be disposed at the
recommended position. In this case, the following problems occur.
[0005]
First, there is a problem that it becomes difficult to reproduce the sound intended by the content
producer. For example, when the arrangement position of the speaker is different from the
recommended position, the arrival direction of the sound perceived by the listener does not
necessarily coincide with the originally assumed direction. For this reason, not only the sound
heard from this speaker but also its balance with the sound emitted from the other speakers is
affected, and the impression of the sound felt by the listener may differ greatly from that
intended by the content creator.
[0006]
In addition, even when the speaker is disposed at the recommended position, the same problem
as described above occurs when the listener does not listen at the recommended position or
moves from the recommended position.
[0007]
In order to solve such a problem, Patent Document 1 discloses an audio reproduction apparatus
including a plurality of speakers, a position detection unit that detects the position of the viewer
in real time, and a control unit that outputs audio signals to the plurality of speakers.
The control unit calculates the positional relationship of each speaker with respect to the viewer
based on the detection result from the position detection unit, and controls the reproduced sound
by setting the audio signal output timing for each speaker from the calculation result.
[0008]
Further, Patent Document 2 discloses a method in which the face direction and the number of
listeners are detected by a camera, and the reproduced sound is controlled by switching the
filter coefficients for sound image control according to the listener position obtained from the
camera.
JP-A-6-311211
JP-A-2003-32776
[0009]
However, the above-mentioned prior art has the following problems.
[0010]
First, Patent Document 1 detects the relative positional relationship between a listener and a
speaker and, based on it, controls the output timing of the audio signal.
That is, only the position of the speaker with respect to the listener is considered in the control
of sound reproduction. Likewise, Patent Document 2 controls the reproduced sound only
according to the position of the listener obtained from the camera.
[0011]
On the other hand, the positional relationship between the listener and the speaker is not the
only factor that affects sound reproduction. For example, the orientation of the speaker with
respect to the listener also greatly affects how the sound is heard. This is because the directivity
characteristics of the speaker differ for each frequency. A speaker is originally designed so that
the frequency balance of the sound heard in the front direction is optimal. However, since its
directivity differs for each frequency, when the sound is heard from the side or the back of the
speaker, for example, the frequency balance deteriorates and the acoustic performance of the
speaker itself cannot be obtained.
[0012]
Therefore, in order to realize optimal sound reproduction, it is necessary to reflect the control of
sound reproduction also on the orientation of the speaker with respect to the listener.
Furthermore, in consideration of movement of the listener during listening, it is preferable to be
able to obtain information on the orientation of the speaker with respect to the listener in real
time so as to be dynamically controllable.
[0013]
An object of the present invention is to realize control of sound reproduction in an AV system
while dynamically reflecting the direction of a speaker with respect to a listener.
[0014]
A first invention is a camera-equipped speaker including: a speaker body; and a camera installed
integrally with the speaker body and capturing an image in a direction in which the speaker body
outputs a sound.
[0015]
According to the present invention, an image in a direction in which the speaker body outputs a
sound can be acquired by the camera installed integrally with the speaker body.
The image processing technology can recognize the position of the listener from this image and
detect the orientation of the speaker body with respect to the listener.
Therefore, by using this camera-equipped speaker, it is possible to realize control of sound
reproduction while dynamically reflecting the direction of the speaker with respect to the
listener.
[0016]
A second invention is a signal processing apparatus for the camera-equipped speaker according
to the first invention, including: a recognition unit that receives the image signal output from
the camera, recognizes the position of a listener from the image represented by the image
signal, and detects the orientation of the speaker body with respect to the listener based on the
recognized listener position; and an audio control unit that performs signal processing on a
given audio signal and outputs it as an acoustic signal to the speaker body.
[0017]
According to the present invention, the position of the listener can be recognized by the
recognition unit from the image captured by the camera of the camera-equipped speaker, and the
orientation of the speaker body with respect to the listener can be detected.
Therefore, it is possible to realize control of sound reproduction while dynamically reflecting the
direction of the speaker with respect to the listener.
[0018]
A third aspect of the present invention is an AV system including: a speaker body; a camera
installed integrally with the speaker body and capturing an image in the direction in which the
speaker body outputs sound; a recognition unit that receives the image signal output from the
camera, recognizes the position of the listener from the image represented by the image signal,
and detects the orientation of the speaker body with respect to the listener based on the
recognized listener position; and an audio control unit that performs signal processing on a
given audio signal and outputs it as an acoustic signal to the speaker body.
[0019]
According to the present invention, an image in a direction in which the speaker body outputs a
sound can be acquired by the camera installed integrally with the speaker body.
From this image, the recognition unit can recognize the position of the listener and detect the
orientation of the speaker body with respect to the listener. Therefore, it is possible to realize
control of sound reproduction while dynamically reflecting the direction of the speaker with
respect to the listener.
[0020]
According to the present invention, by utilizing the camera-equipped speaker, control of sound
reproduction can be realized while dynamically reflecting the orientation of the speaker with
respect to the listener, so sound reproduction more appropriate for the listener is realized.
[0021]
Hereinafter, embodiments of the present invention will be described in detail with reference to
the drawings.
[0022]
First Embodiment FIG. 1 shows an example of a configuration of an AV system according to a
first embodiment.
The AV system of FIG. 1 uses a camera-equipped speaker 100 including a speaker body 111 and
a camera 112 installed integrally with the speaker body 111.
The camera 112 shoots the direction in which the speaker body 111 outputs a sound. Further,
the signal processing device 104 for the camera-equipped speaker 100 includes an audio control
unit 102 and a recognition unit 103. The image signal output from the camera 112 is given to
the recognition unit 103 of the signal processing device 104. The AV reproducing apparatus 101
reproduces AV content and outputs an audio signal and a video signal. The audio signal is
provided to the audio control unit 102 of the signal processing device 104. The video signal is
sent to the display 106.
[0023]
In the signal processing device 104, the recognition unit 103 recognizes the position of the
listener P1 from the image indicated by the image signal output from the camera 112 and, based
on the recognized listener position, detects the orientation of the speaker body 111 with respect
to the listener P1. For example, it determines the angle θh formed by the front direction of the
speaker body 111 (the dash-dotted line in FIG. 1) and the straight line connecting the speaker
body 111 and the listener P1 (the broken line in FIG. 1). The audio control unit 102 performs
signal processing on the given audio signal and outputs it to the speaker body 111 as an
acoustic signal. In this signal processing, the output signal is corrected based on the directivity
characteristic of the speaker body 111, measured in advance, according to the orientation of the
speaker body 111 detected by the recognition unit 103. For example, the gain for each
frequency is adjusted.
[0024]
Although only one camera-equipped speaker 100 is shown in FIG. 1, a plurality of speakers are
usually arranged in an AV system. Some or all of the plurality of speakers may be
camera-equipped speakers. In addition, each signal may be transmitted by wire or wirelessly.
[0025]
FIG. 2 shows an example of the appearance of the camera-equipped speaker 100. In the example
of FIG. 2, the camera 112 is installed on the speaker body 111 so as to face in the same direction
as the speaker body 111. Since a speaker is usually installed facing the listener, the
configuration shown in FIG. 2 enables the camera 112 to capture an image of the listener.
[0026]
The installation form of the camera in a camera-equipped speaker is not restricted to the
example of FIG. 2; any other installation form is acceptable as long as the figure of the listener
can be photographed. For example, the camera may be built into the front part of the speaker
with only the lens exposed to the outside. If a lens with a wide angle of view, such as a fisheye
lens, is used, the imaging range is expanded, the likelihood of the listener entering the camera's
view increases, and the options for the installation position of the camera are widened. For
example, the lens may be exposed at a corner of the top of the speaker.
[0027]
Also, multiple cameras may be installed. As a result, the imaging range is expanded, and the
possibility of the listener entering the camera view increases. In addition, by using information
captured by a plurality of cameras, improvement in detection accuracy of the position of the
listener can also be expected.
[0028]
The process in the recognition unit 103 will be described with reference to FIG. In FIG. 3, the face
image IP1 of the listener P1 is included in the camera image. The horizontal angle of view of the
camera 112 is 2γ. The recognition unit 103 detects the face image IP1 from the camera image
using image recognition technology. For example, the face image IP1 can be detected by
performing signal processing on a camera image signal, detecting an outline by edge detection,
or detecting a face part such as an eye or hair by color detection. Such face recognition
technology has already been applied in recent years to digital cameras and the like, and the
detailed description thereof is omitted here.
[0029]
Then, the horizontal position of the detected face image IP1 in the camera image is obtained.
Here, suppose the center of the face image IP1 lies at a distance a to the left of the center of the
camera image, where 0 < a < 1 and the horizontal width of the camera image is normalized
to 2. If the angle between the front direction of the camera 112 (the dash-dotted line in FIG. 3)
and the straight line connecting the camera 112 and the listener P1 (the dotted line in FIG. 3)
is θh, then θh = γ × a. Viewed differently, the angle θh represents the horizontal orientation
of the speaker body 111 with respect to the listener P1 (the relationship between the
orientation of the speaker body 111 and the orientation of the camera 112 is known).
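As an illustrative sketch (not part of the original disclosure), the linear mapping θh = γ × a above can be written as a small function. The function name and the pixel-based parameters are hypothetical; the sign convention (negative = left of center) is an assumption.

```python
def horizontal_angle(face_center_x, image_width, half_view_deg):
    """Estimate the horizontal angle theta_h of the listener using the
    linear mapping theta_h = gamma * a described in the text.

    face_center_x: horizontal position of the detected face center (pixels).
    image_width:   horizontal width of the camera image (pixels).
    half_view_deg: half the horizontal angle of view (gamma), in degrees.

    Returns a signed angle: negative when the face is left of center.
    """
    # Normalize the offset from the image center to [-1, 1]; the text
    # normalizes the half-width of the image to 1.
    a = (face_center_x - image_width / 2.0) / (image_width / 2.0)
    return half_view_deg * a
```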
[0030]
Also in the case where the face image IP1 is included in the right half of the camera image, the
angle θh can be similarly detected. Further, the angle θv in the vertical direction can also be
detected by the same method. By performing such processing, the recognition unit 103 can
detect the orientation of the speaker main body with respect to the listener P1.
[0031]
Next, an example of a method of estimating the distance L between the speaker and the listener
P1 will be described with reference to FIG. 4. FIG. 4(a) schematically shows how the size of a
human face in the camera image changes with distance. When the distance is l0, l1, l2, the width
of the face is m0, m1, m2, respectively. FIG. 4(b) is a graph showing the relationship between the
detected face width and the distance L. A graph such as FIG. 4(b) can be created by measuring
the face width on the image at several distances L in advance and drawing a straight line or
curve that interpolates between, and extrapolates beyond, the measurement points. The
recognition unit 103 stores the relationship of FIG. 4(b), for example as a mathematical
approximation, and estimates the distance L from the face width detected in the image.
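A minimal sketch of the interpolation step above, assuming piecewise-linear interpolation between pre-measured calibration points; the calibration values in the test are illustrative, not measured data from the patent.

```python
def estimate_distance(face_width, calibration):
    """Estimate the listener distance L from the detected face width by
    linear interpolation between pre-measured (face_width, distance)
    points, in the spirit of Fig. 4(b). Outside the measured range the
    nearest measured distance is returned (a simple clamping policy)."""
    pts = sorted(calibration)           # ascending face width
    if face_width <= pts[0][0]:
        return pts[0][1]
    if face_width >= pts[-1][0]:
        return pts[-1][1]
    for (w0, d0), (w1, d1) in zip(pts, pts[1:]):
        if w0 <= face_width <= w1:
            t = (face_width - w0) / (w1 - w0)
            return d0 + t * (d1 - d0)
```

A wider face means a closer listener, so the tabulated distances decrease as the face width grows.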
[0032]
Actual users are not limited to people with a standard head size; some people have larger or
smaller heads. Therefore, in FIG. 4(b), graphs for three patterns of head size (standard, large,
and small) are prepared in advance. The size of the listener's head may then be input by
measurement or self-report, and the standard, large, or small graph selected accordingly. Of
course, the division of head sizes is not limited to three. For example, head sizes may be divided
into groups at 1 cm intervals, with a graph created for each group.
[0033]
As methods of estimating the distance L between the speaker and the listener P1 other than the
one described here, there are, for example, a method of calculating it from the image
information of two cameras whose installation positions are known, and a method of estimating
it from the focus position at which the camera's autofocus detected the listener.
[0034]
As described above, the recognition unit 103 can detect positional information (the angles θh
and θv and the distance L) of the listener P1 using the image signal output from the camera
112.
In particular, since the camera 112 is installed integrally with the speaker body 111, the position
of the listener P1 with respect to the speaker body 111 can be easily detected. For this reason,
more appropriate sound reproduction becomes possible as compared with the prior art.
[0035]
Next, processing in the audio control unit 102 will be described. As shown in FIG. 1, the audio
control unit 102 performs signal processing on the audio signal from the AV reproducing
apparatus 101 and outputs the signal to the speaker body 111 as an acoustic signal. It receives
the position information of the listener P1 (the angles θh and θv and the distance L) detected
by the recognition unit 103 and performs signal processing accordingly.
[0036]
First, a method of using the direction information θh and θv will be described. Here, the
direction information θh and θv is used in the signal processing of the audio signal to correct
the output signal based on the directivity characteristic of the speaker body 111. That is, in the
present embodiment, the output signal is corrected based on the directivity characteristic of the
speaker body 111 according to the orientation of the speaker body 111 with respect to the
listener P1.
[0037]
FIG. 5 shows graphs of the directivity characteristics of a certain speaker. In each of FIGS. 5(a)
and 5(b), the axis extending radially from the center of the circle indicates the sound intensity,
and the sound intensity in each direction, that is, the directivity characteristic, is drawn as a
solid line. The upper side of the graph is the front direction of the speaker. The directivity
characteristics differ depending on the frequency of the reproduced sound: (a) plots the
characteristics at 200 Hz, 500 Hz, and 1000 Hz, and (b) those at 2 kHz, 5 kHz, and 10 kHz.
[0038]
As can be seen from FIG. 5, the sound in the front direction of the speaker is the strongest, and,
roughly speaking, the sound becomes weaker toward the backward direction (180 degrees from
the front). The change also varies with the frequency of the reproduced sound: it is small at low
frequencies and large at high frequencies. Speakers are generally adjusted so that the sound
balance is best when listening from the front direction. It can be understood from directivity
characteristics such as those in FIG. 5 that when the position of the listener deviates from the
front direction of the speaker, the frequency characteristics of the heard sound may change
greatly from the ideal state and the sound balance may deteriorate. A similar problem arises
with the phase characteristics of the sound.
[0039]
Therefore, the directivity characteristic of the speaker is measured, an equalizer that corrects
for its influence is calculated in advance, and equalizer processing is performed according to the
detected direction information θh and θv, that is, the orientation of the speaker body with
respect to the listener. This makes it possible to realize well-balanced reproduction regardless
of the orientation of the speaker with respect to the listener.
[0040]
Specific equalizer processing will be described with reference to FIG. 6. FIG. 6 is an example of
the sound pressure level (the left number in each cell) and the correction gain of the equalizer
(the right number in each cell) for each angle from the front of the speaker and each frequency.
All units are dB. In the example of FIG. 6, by setting a correction gain against the sound
pressure level for each angle and frequency, the listener can hear the same sound as in the front
direction of the speaker regardless of where the listener is. In other words, using the correction
gains shown in FIG. 6 makes the directivity graph at each frequency almost a circle. FIG. 6 is
only an example; the angles and frequencies may be set more finely. If the detected angle is not
in the data, the correction gain may be calculated by interpolation or the like.
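The table lookup with angle interpolation described above can be sketched as follows. The gain values here are illustrative placeholders, not the measured values of FIG. 6.

```python
# Hypothetical correction-gain table (dB), indexed by [angle][frequency],
# in the spirit of Fig. 6; the numbers are illustrative, not measured.
GAIN_TABLE = {
    0:  {200: 0.0, 1000: 0.0, 5000: 0.0},
    30: {200: 0.5, 1000: 1.5, 5000: 3.0},
    60: {200: 1.0, 1000: 3.0, 5000: 6.0},
}

def correction_gain(angle_deg, freq_hz):
    """Look up the equalizer correction gain (dB) for a detected angle,
    linearly interpolating between tabulated angles when the detected
    angle is not in the data."""
    angles = sorted(GAIN_TABLE)
    angle_deg = max(angles[0], min(angles[-1], angle_deg))  # clamp to table
    for a0, a1 in zip(angles, angles[1:]):
        if a0 <= angle_deg <= a1:
            t = (angle_deg - a0) / (a1 - a0)
            g0, g1 = GAIN_TABLE[a0][freq_hz], GAIN_TABLE[a1][freq_hz]
            return g0 + t * (g1 - g0)
    return GAIN_TABLE[angles[0]][freq_hz]
```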
[0041]
Although the directivity characteristic in the horizontal plane has been described here, the
directivity characteristic of a speaker is defined on the sphere surrounding the speaker.
Therefore, by extending FIG. 6, the correction gain may be set for each combination of the
horizontal angle θh and the vertical angle θv. This makes it possible to correct the directivity
characteristic three-dimensionally in accordance with the orientation of the speaker with
respect to the listener.
[0042]
In order to perform the equalizer processing, the audio control unit 102 may be provided with
an analog filter or a digital filter such as an IIR filter or an FIR filter. For example, when a
parametric equalizer is used for the correction, a Q value (a value expressing the sharpness of a
peak in the frequency characteristic) may be set together with the correction gain.
[0043]
Next, how to use the distance information L will be described. When a sound is emitted from a
point, it propagates in all directions and attenuates as it spreads, and this attenuation is
inversely proportional to the square of the distance. For example, as shown in FIG. 7, when the
distance from the sound source doubles from r1 to r2 (= r1 × 2), the sound pressure becomes
1/4 (= (1/2)²), and when it quadruples to r3 (= r1 × 4), the sound pressure becomes 1/16
(= (1/4)²). That is, as the listener moves away from the speaker, the sound pressure perceived
by the listener decreases accordingly. In this case, the volume balance with the sound pressure
from the other speakers deteriorates, the localization of the sound suffers, and a sound different
from that intended by the content producer is heard.
[0044]
Therefore, according to the detected distance information L, gain correction of the sound emitted
from the speaker is performed. Thereby, even when the distance between the listener and the
speaker is not optimum, it is possible to realize well-balanced reproduction.
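Under the inverse-square model above, the compensation gain can be sketched as follows. This reads the "1/4 at double distance" relation as a power ratio, so the level changes by 20·log10(L/L0) dB; the reference distance is an assumed parameter, not taken from the patent.

```python
import math

def distance_gain_db(distance, reference_distance=1.0):
    """Gain (dB) that compensates the ideal free-field attenuation
    described above: the power falls with the square of the distance,
    i.e. the level drops by 10*log10((L/L0)**2) = 20*log10(L/L0) dB,
    so the same amount is added back as correction gain."""
    return 20.0 * math.log10(distance / reference_distance)
```

So a listener at twice the reference distance would receive roughly a +6 dB correction.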
[0045]
The relationship between distance and attenuation described here assumes an ideal point sound
source (a theoretical sound source that has no size and no directivity) and an ideal free sound
field. In practice, a sound source is not a point source but has size, and it also has directivity.
Moreover, the sound field is not a free field because various reflections exist. Therefore, the
correction gain for each distance, as shown in FIG. 8, may be measured and stored in advance
for the actual speaker and reproduction environment. If the detected distance L is not in the
data, an approximate correction gain may be calculated by interpolation or the like.
[0046]
The correction gain may also be set for each frequency. It is known that high-frequency
components attenuate more with distance than low-frequency components. Therefore, sound
pressure correction with higher accuracy can be realized by keeping a data table like FIG. 8 for
each frequency. Such per-frequency sound pressure correction can be realized by band division
and gain setting with a QMF filter bank or the like; IIR or FIR digital filters are generally used.
[0047]
It is also possible to correct the sound pressure levels from a plurality of speakers so that they
match. For example, when speakers are arranged at the distances r1, r2, and r3 shown in FIG. 7
from the listener, the volume of the speaker at distance r1 is lowered and the volume of the
speaker at distance r3 is raised so that both match the volume of the speaker at distance r2.
This correction matches the volume reaching the listener from each speaker. Of course, the
correction may be made relative to the volume of a different speaker, or to an entirely different
reference volume. Moreover, when the efficiencies of the speakers differ, the volume control can
take that into account as well.
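The r1/r2/r3 example above can be sketched as a per-speaker gain computation, assuming the free-field model used earlier (the function name and the choice of a reference speaker index are illustrative):

```python
import math

def level_matching_gains_db(distances, reference_index=0):
    """Per-speaker gains (dB) that equalize the level arriving at the
    listener from speakers at different distances, relative to one
    reference speaker. A speaker closer than the reference gets a
    negative gain (turned down); a farther one gets a positive gain."""
    ref = distances[reference_index]
    return [20.0 * math.log10(d / ref) for d in distances]
```

With distances [r1, r2, r3] = [1 m, 2 m, 4 m] and r2 as reference, the nearer speaker is cut by about 6 dB and the farther one boosted by about 6 dB, as in the text.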
[0048]
As described above, the audio control unit 102 performs correction according to the angle
information θh and θv and the distance information L, so that better sound reproduction can
be realized even when the speaker is not facing the listener or the distance from the speaker to
the listener is not optimal.
[0049]
FIG. 9 shows an example of the processing blocks in the audio control unit 102.
In FIG. 9, the audio control unit 102 includes three processing blocks 121, 122, and 123. The
processing block 121 performs the correction according to the angle information described
above, and the processing block 122 performs the gain correction according to the distance
described above. The processing block 123 corrects the output timing of the sound in
accordance with the detected distance so that the sounds from the plurality of speakers arrive
at the listener position at the same time.
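The timing correction of processing block 123 can be sketched as follows; the speed of sound and the convention that the farthest speaker gets zero extra delay are assumptions for illustration.

```python
def alignment_delays(distances, speed_of_sound=343.0):
    """Delays (seconds) per speaker so that sounds from all speakers
    arrive simultaneously at the listener position: each speaker is
    delayed by the difference between the longest travel time and its
    own travel time (the farthest speaker gets zero delay)."""
    arrival = [d / speed_of_sound for d in distances]  # travel times (s)
    latest = max(arrival)
    return [latest - t for t in arrival]
```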
[0050]
Here, the correction values for each angle and distance are realized as gains for all bands or for
individual frequencies, but each may also be held as a correction FIR filter and used for the
correction. Using FIR filters makes it possible to control the phase as well, enabling correction
with higher accuracy.
[0051]
Next, an example of the operation timing of image capture by the camera 112, detection
processing by the recognition unit 103, and correction by the audio control unit 102 will be
described.
[0052]
For example, the camera 112 continuously captures images and keeps outputting an image
signal to the recognition unit 103.
The recognition unit 103 constantly detects the position of the listener from the image signal
and keeps outputting the listener's position information to the audio control unit 102 in real
time. The audio control unit 102 receives the position information in real time, switches the
correction processing in real time, and keeps correcting the acoustic signal. Thus, even when
the position of the listener changes dynamically, audio control that follows it can be realized.
[0053]
In such control, however, the correction processing is switched even by a minute movement of
the listener. When the change is too small to be detected by the ear, such switching of the
correction processing is meaningless in auditory terms. Therefore, for example, the recognition
unit 103 may output the listener's position information to the audio control unit 102 only when
it detects a movement of the listener (a change in angle or distance) that is greater than or
equal to a predetermined threshold.
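The threshold gating just described can be sketched as a small stateful filter; the class name and the threshold values are illustrative, not specified by the patent.

```python
class PositionGate:
    """Forward a new listener position to the audio control only when it
    has moved by at least a threshold since the last forwarded position,
    suppressing inaudible micro-updates as suggested above."""

    def __init__(self, angle_threshold_deg=2.0, distance_threshold_m=0.1):
        self.angle_threshold = angle_threshold_deg
        self.distance_threshold = distance_threshold_m
        self.last = None  # last forwarded (angle_deg, distance_m)

    def update(self, angle_deg, distance_m):
        """Return the position to use, or None if the change is too small."""
        if self.last is None:
            self.last = (angle_deg, distance_m)
            return self.last
        da = abs(angle_deg - self.last[0])
        dd = abs(distance_m - self.last[1])
        if da >= self.angle_threshold or dd >= self.distance_threshold:
            self.last = (angle_deg, distance_m)
            return self.last
        return None
```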
[0054]
Alternatively, image capture by the camera 112 and detection processing by the recognition
unit 103 may be performed at predetermined time intervals. This reduces the processing load
on the system. Alternatively, the recognition unit 103 and the audio control unit 102 may
execute their processing only when the user turns on a trigger switch using a remote control or
the like. This further reduces the processing load on the system.
[0055]
Alternatively, an initial value of the listener's position information may be set in advance, for
example by executing a measurement mode provided in the system, and the subsequent
dynamic correction accompanying the movement of the listener may be performed using the
image signal obtained by the camera.
[0056]
The correction data tables shown in the present embodiment are recorded in, for example, a non-volatile memory in the audio control unit 102.
[0057]
In addition, since an actual AV system includes a plurality of speakers, applying the technology
described here to each speaker allows the sound reproduced from each speaker to be controlled
according to the user's position.
[0058]
Second Embodiment FIG. 10 shows an example of the configuration of an AV system according to
a second embodiment.
In FIG. 10, the same components as in FIG. 1 are assigned the same reference numerals as in FIG.
1, and the description thereof is omitted here.
[0059]
In the configuration of FIG. 10, the speaker body of the camera-equipped speaker 200 is an array
speaker 113 formed of a plurality of speaker units.
An array speaker can realize sharp directivity characteristics by increasing the number of
speaker units and lengthening the array (for example, Nishikawa et al., "Directive array speaker
using a two-dimensional digital filter", IEICE Transactions A, Vol. J78-A, No. 11, pp. 1419-1428,
November 1995). Using this technology for sound reproduction is expected to prevent the
diffusion of sound in unnecessary directions, but for that purpose the peak of the directivity of
the array speaker 113 must be directed toward the listener.
[0060]
In the present embodiment, the camera 112 is installed in the array speaker 113, and in the
signal processing device 204, the recognition unit 103 detects the direction of the array speaker
113 with respect to the listener. This detection can be realized as in the first embodiment. Then,
the audio control unit 202 performs signal processing on the audio signal so that the peak of the
directivity of the array speaker 113 is directed to the listener, and outputs an audio signal to
each speaker unit.
[0061]
The direction of the directivity peak of the array speaker 113 can be easily controlled, for example, by setting the delay and gain added to the acoustic signal supplied to each speaker unit. For example, when it is desired to shift the directivity peak slightly to the right, the delay added to the acoustic signal for the left speaker units may be reduced, so that their sound is output earlier, and their gain may be increased.
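The delay-and-sum steering described above can be sketched as follows. This is a minimal illustration, not part of the embodiment itself: it assumes a uniform linear array with spacing `spacing_m` and a plane-wave far-field model, and the function name and parameters are hypothetical.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value at roughly 20 degrees C

def steering_delays(num_units, spacing_m, angle_deg):
    """Per-unit delays (seconds) that steer the main lobe of a uniform
    linear array toward angle_deg (0 = broadside, positive = to the right
    as seen from the array). Delays are shifted so the smallest is zero,
    keeping every delay non-negative."""
    delays = [i * spacing_m * math.sin(math.radians(angle_deg)) / SPEED_OF_SOUND
              for i in range(num_units)]
    offset = min(delays)
    return [d - offset for d in delays]
```

For a positive (rightward) angle, the leftmost unit (index 0) receives the smallest delay, which matches the qualitative description in the paragraph above.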
[0062]
In addition, in order to direct the directivity peak of the array speaker 113 toward the listener P1 more accurately, a data table as shown in FIG. 11, which holds the FIR filter coefficients used for audio control of each speaker unit for each angle, may be used. FIG. 11A shows, for each angle θh, the FIR filter coefficients Hx_y (where x is the angle θh and y is the speaker unit number) of each speaker unit. FIG. 11B is an example of the FIR filter coefficients of each speaker unit when the angle θh = 30°. For example, a data table as shown in FIG. 11 is stored in non-volatile memory in the audio control unit 202, and the audio control unit 202 reads out the FIR filter coefficients from the data table according to the angle information θh detected by the recognition unit 103 to realize the audio control.
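The table lookup just described can be sketched as follows. The concrete angles, coefficient values, and the nearest-angle selection rule are illustrative assumptions, not values taken from FIG. 11.

```python
# Hypothetical coefficient table: angle (degrees) -> per-unit FIR
# coefficients, mirroring the Hx_y layout of FIG. 11 (x = angle,
# y = speaker unit number). A real table would be read from
# non-volatile memory.
COEFF_TABLE = {
    0:  [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0]],
    30: [[0.9, 0.1, 0.0], [0.7, 0.2, 0.1]],
}

def fir(signal, coeffs):
    """Direct-form FIR filtering of a list of samples."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, c in enumerate(coeffs):
            if n - k >= 0:
                acc += c * signal[n - k]
        out.append(acc)
    return out

def drive_units(signal, angle_deg):
    """Pick the table entry nearest the detected angle and filter the
    input signal once per speaker unit, returning one output per unit."""
    nearest = min(COEFF_TABLE, key=lambda a: abs(a - angle_deg))
    return [fir(signal, unit_coeffs) for unit_coeffs in COEFF_TABLE[nearest]]
```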
[0063]
Here, directivity control in the horizontal plane has been described, but directivity control according to the vertical angle information θv can be realized similarly by using a speaker array in which the speaker units are arranged in the vertical direction.
[0064]
Also, by arranging the speaker units in a planar manner, directivity control according to both the horizontal and vertical angle information can be realized.
[0065]
Further, as for the control according to the distance information L, as in the first embodiment,
gain correction according to the distance may be performed on the sound signal to each speaker
unit.
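As one illustration of such a distance-dependent gain correction, the following sketch assumes free-field point-source attenuation (the 1/r law, about 6 dB per doubling of distance) relative to a reference distance; the actual correction values of the first embodiment are held in a data table and are not reproduced here.

```python
import math

def distance_gain_db(distance_m, reference_m=1.0):
    """Correction gain (dB) that compensates point-source attenuation
    at distance_m relative to reference_m, assuming the free-field
    1/r law (about 6 dB per doubling of distance)."""
    return 20.0 * math.log10(distance_m / reference_m)
```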
[0066]
When an array speaker is used, so-called local reproduction can be performed, and the present embodiment may be applied to control of the local reproduction. Local reproduction is reproduction in which sound is reproduced only within a certain predetermined range, with the volume dropping sharply outside that range. For example, when the position of the listener P1 is detected by the camera 112 and the listener P1 is found to be outside the assumed range, the audio control unit 202 switches the control parameters so that the position of the listener P1 is included in the range of local reproduction.
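The parameter switching for local reproduction might look like the following sketch; the zone boundaries and parameter-set names are purely hypothetical placeholders for whatever sets the audio control unit holds.

```python
# Hypothetical parameter sets, each covering a local-reproduction zone
# given as (min_angle, max_angle) in degrees as seen from the array.
PARAM_SETS = [
    {"zone": (-40, -10), "params": "set_left"},
    {"zone": (-10, 10),  "params": "set_center"},
    {"zone": (10, 40),   "params": "set_right"},
]

def select_params(listener_angle_deg, current):
    """Keep the current parameter set while the listener stays inside its
    zone; otherwise switch to the set whose zone contains the listener.
    If no zone covers the listener, leave the parameters unchanged."""
    lo, hi = current["zone"]
    if lo <= listener_angle_deg <= hi:
        return current
    for ps in PARAM_SETS:
        lo, hi = ps["zone"]
        if lo <= listener_angle_deg <= hi:
            return ps
    return current
```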
[0067]
Third Embodiment FIG. 12 shows an example of the configuration of an AV system according to
a third embodiment.
In FIG. 12, the same components as in FIG. 1 are assigned the same reference numerals as in FIG.
1, and the description thereof is omitted here.
[0068]
In the configuration of FIG. 12, the camera-equipped speaker 300 includes a movable mechanism 114 for changing the orientation of the speaker main body 111. The movable mechanism 114 is realized, for example, by a motorized rotary table. The signal processing device 304 is provided with a movable mechanism control unit 301 for controlling the movable mechanism 114. The recognition unit 103 outputs the positional information of the listener P1 detected from the image signal not only to the audio control unit 102 but also to the movable mechanism control unit 301. The movable mechanism control unit 301 receives the positional information of the listener P1 and sends a control signal to the movable mechanism 114 so that the speaker main body 111 faces the listener P1. This operation makes it possible to dynamically adjust the orientation of the speaker main body 111 to the position of the listener P1.
[0069]
The control for actually changing the orientation of the speaker as described above may be performed in combination with the correction processing of the directivity characteristic of the speaker described in the first embodiment. Specifically, for example, when the angle information θh and θv representing the orientation of the speaker main body 111 with respect to the listener P1 is equal to or less than a predetermined threshold, the deviation is handled by the correction processing of the directivity characteristic, and when it exceeds the threshold, control such as changing the orientation of the speaker by the movable mechanism 114 may be performed. If the orientation of the speaker deviates significantly from the listener, a large correction gain must be applied to correct the directivity. However, when the correction gain is increased, overflow may occur in the digital signal, and distortion may occur in the sound owing to the upper limit of the reproduction gain of the speaker itself. Such problems can be avoided by combining the control in the present embodiment with the directivity characteristic correction.
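The threshold-based combination of signal-side correction and mechanical rotation can be sketched as follows; the threshold value and function name are assumed placeholders.

```python
ANGLE_THRESHOLD_DEG = 15.0  # hypothetical switching threshold

def orientation_control(theta_h, theta_v):
    """Decide how to compensate the speaker orientation: small deviations
    are absorbed by signal-side directivity correction, while large ones
    are handled by physically rotating the speaker with the movable
    mechanism, avoiding the large correction gains that cause overflow
    and distortion."""
    if max(abs(theta_h), abs(theta_v)) <= ANGLE_THRESHOLD_DEG:
        return ("equalize", theta_h, theta_v)  # directivity correction only
    return ("rotate", theta_h, theta_v)        # drive the rotary table
```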
[0070]
In addition, the present embodiment may be applied to the array speaker described in the second embodiment. That is, by installing the array speaker on the movable mechanism and controlling the movable mechanism to change the orientation of the array speaker, directivity control and control for local reproduction can be realized.
[0071]
Fourth Embodiment FIG. 13 shows an example of the configuration of an AV system according to
a fourth embodiment. In FIG. 13, the same components as in FIG. 1 are assigned the same
reference numerals as in FIG. 1, and the description thereof is omitted here.
[0072]
In the configuration of FIG. 13, in the signal processing device 404, the recognition unit 403 recognizes the positions of the listeners P1, P2 and P3 from the image represented by the image signal output from the camera 112, and detects the number of listeners. Then, for each of the listeners P1, P2 and P3, positional information is detected as in the first embodiment. When a plurality of listeners P1, P2 and P3 are detected by the recognition unit 403, the audio control unit 402 performs the signal processing using the positional relationship between the listeners P1, P2 and P3 in addition to the orientation of the speaker main body 111. For example, when a plurality of listeners exist within the range of a predetermined angle as viewed from the speaker main body 111, the directivity characteristic control is performed toward the center of the plurality of listeners. When only one listener is far away, the directivity characteristic control is performed for the other listeners, or the correction itself is not performed. Thus, when there are a plurality of listeners, more appropriate reproduction is realized by performing signal processing according to the positional relationship between the listeners.
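One possible way to choose the aiming direction for several listeners is sketched below; the grouping range and the drop-one-outlier rule are assumptions for illustration, not details specified by the embodiment.

```python
PREDETERMINED_ANGLE_DEG = 30.0  # hypothetical grouping range

def target_angle(listener_angles):
    """Choose the aiming angle for directivity control from listener
    angles (degrees, as seen from the speaker). If all listeners fit
    within a predetermined range, aim at their center; otherwise drop
    the single listener farthest from the center and aim at the center
    of the rest. Return None when no compact group remains, in which
    case the correction is skipped."""
    if not listener_angles:
        return None
    if max(listener_angles) - min(listener_angles) <= PREDETERMINED_ANGLE_DEG:
        return sum(listener_angles) / len(listener_angles)
    center = sum(listener_angles) / len(listener_angles)
    rest = sorted(listener_angles, key=lambda a: abs(a - center))[:-1]
    if rest and max(rest) - min(rest) <= PREDETERMINED_ANGLE_DEG:
        return sum(rest) / len(rest)
    return None
```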
[0073]
When the number of listeners is detected from the camera image, a plurality of listeners who overlap with each other as viewed from the speaker may be recognized as a single person. Even in such a case, however, if the directivity characteristic control is performed toward the listener recognized as one person, there is no particular problem in sound quality. That is, when a plurality of listeners appear to overlap, the number of people need not be detected strictly, and the processing is simplified accordingly.
[0074]
In each of the embodiments described above, the correction of the directivity characteristic has been mainly described. In addition, a configuration is also possible in which, for example, the face direction of the listener as viewed from the speaker, or the distance between the speaker and the listener, is detected, and the audio control unit performs control by estimating the transfer function. The audio control unit holds in advance control parameters according to the face direction and the distance, and switches the control parameters according to the detection result during reproduction. A simple example of such correction is correction of the distance from the speaker to the listener: when the distance from a certain speaker to the listener is shorter than that of the other speakers, the timing of emitting sound from that speaker is delayed. Thereby, the same effect as widening the distance to that speaker can be expected.
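The distance correction just described amounts to delaying the closer speaker by the travel-time difference. A minimal sketch, assuming a sound speed of roughly 343 m/s; the function name is hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed

def alignment_delay_s(distance_m, farthest_m):
    """Delay (seconds) for a speaker closer to the listener than the
    farthest speaker, so that its sound arrives at the same time -
    equivalent to virtually moving the close speaker back to the
    farthest distance. Speakers at or beyond that distance get zero."""
    return max(0.0, (farthest_m - distance_m) / SPEED_OF_SOUND)
```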
[0075]
According to the present invention, sound reproduction more appropriate for the listener is realized in an AV system. The present invention is therefore useful, for example, for improving the sound quality of home theater equipment and the like.
[0076]
FIG. 1 is an example of the configuration of an AV system according to the first embodiment.
FIG. 2 is an example of the appearance of a camera-equipped speaker.
FIG. 3 is a diagram for explaining the process of detecting angle information among the processes in the recognition unit.
FIG. 4 is a diagram for explaining the process of detecting distance information among the processes in the recognition unit.
FIG. 5 is a graph showing an example of the directivity characteristic of a speaker.
FIG. 6 is an example of a data table of correction gains for equalizer processing.
FIG. 7 is a diagram for explaining the relationship between the distance from a sound source and the attenuation of sound.
FIG. 8 is an example of a data table of correction gains for attenuation correction.
FIG. 9 is an example of the processing blocks in the audio control unit.
FIG. 10 is an example of the configuration of an AV system according to the second embodiment.
FIG. 11 is an example of a data table of filter correction coefficients.
FIG. 12 is an example of the configuration of an AV system according to the third embodiment.
FIG. 13 is an example of the configuration of an AV system according to the fourth embodiment.
[0077]
DESCRIPTION OF SYMBOLS
100, 200, 300 Camera-equipped speaker
102, 202, 402 Audio control unit
103, 403 Recognition unit
104, 204, 304, 404 Signal processing device
111 Speaker main body
112 Camera
113 Array speaker (speaker main body)
114 Movable mechanism
301 Movable mechanism control unit
P1, P2, P3 Listeners