JP2007072273

Patent Translate
Powered by EPO and Google
Notice
This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate,
complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or
financial decisions, should not be based on machine-translation output.
DESCRIPTION JP2007072273
An audio signal discrimination device capable of accurately discriminating speech / non-speech
in an input audio signal is provided. SOLUTION: The device comprises a musicality detecting
means 11a for detecting the degree of musicality of an input audio signal, a speechiness
detecting means 11b for detecting the degree of speechiness of the input audio signal, and a
speech / non-speech determining means 12 for determining whether the input audio signal
corresponds to speech or non-speech. Based on the detection result of the musicality detecting
means 11a and the detection result of the speechiness detecting means 11b, the speech /
non-speech determining means 12 makes the speech / non-speech determination using different
calculation formulas according to the degree of speechiness and the degree of musicality.
[Selected figure] Figure 1
Audio signal discrimination device, sound quality adjustment device, content display device,
program, and recording medium
[0001]
The present invention relates to an audio signal discrimination device, a sound quality
adjustment device, a content display device, a program, and a recording medium, and more
specifically to an audio signal discrimination device for performing speech / non-speech
determination on an audio signal, a sound quality adjustment device provided with the audio
signal discrimination device, a content display device provided with the sound quality
adjustment device, a program therefor, and a computer readable recording medium recording the
program.
[0002]
08-05-2019
1
Conventionally, general audio devices are provided with various sound quality adjustment
functions, such as bass adjustment for adjusting the output frequency characteristic in the
bass region, treble adjustment for adjusting the output frequency characteristic in the treble
region, and loudness adjustment for emphasizing both the bass region and the treble region.
[0003]
As one such sound quality adjustment device, a device has also been proposed that determines
whether the input signal is music information or other information by detecting the presence
or absence of periodicity in the audio information of the input audio signal itself, and
controls acoustic parameters based on the result (see, for example, Patent Document 1).
Patent Document 1: Japanese Patent Application Laid-Open No. 61-93712
[0004]
However, particularly in a device that receives television and radio broadcasts, an erroneous
judgment may occur if whether the signal is music information is judged from the audio
information alone.
[0005]
For example, when a cappella singing is broadcast in a music program, no sense of rhythm can
be detected because of its style, so the signal is erroneously determined not to be music
information, and the equalizer or the like does not select the acoustic parameters that are
optimal for this music information.
As a result, the equalizer may select, for example, acoustic parameters that are optimal for
speech. A sound characteristic that emphasizes the clarity of words (with the midrange
relatively emphasized) is then applied to a cappella music information, for which the live
sound should be emphasized, and the sound setting is not what the user originally wants to
hear.
[0006]
In addition, while a news program is being watched, it is preferable to select parameters that
are optimal for speech, emphasizing the clarity of the words. Depending on the content of the
news, however, sound collected at the scene of the news coverage may be output as it is, in
parallel with the announcer's speech.
If music is mixed into such collected sound, the music information in the collected sound may
dominate the speech of the news program, depending on the balance between the volumes of the
two. Such a case is therefore a problem that can well occur, as the opposite of the a cappella
example described above.
[0007]
Furthermore, even if a device solves the above problems and can execute accurate speech /
non-speech determination on the input audio signal, the determination and the sound quality
adjustment based on it are executed inside the device, so the user cannot understand why the
sound quality has been changed. In particular, when the sound output as a result of the sound
quality adjustment based on the speech / non-speech determination does not suit the user's
preference, the user cannot understand the cause of the sound quality adjustment and cannot
change the setting, and may be left with a feeling of discomfort.
[0008]
The present invention has been made in view of the above circumstances, and an object thereof
is to provide an audio signal discrimination device capable of accurately discriminating
between speech and non-speech in an input audio signal, a sound quality adjustment device
provided with the audio signal discrimination device, a content display device provided with
the sound quality adjustment device, a program therefor, and a computer readable recording
medium storing the program.
[0009]
Another object of the present invention is to provide a sound quality adjustment device that
determines speech / non-speech for the input audio signal, adjusts the sound quality based on
the determination result, and allows the user to visually recognize the determination result,
as well as a content display device provided with the sound quality adjustment device, a
program therefor, and a computer readable recording medium recording the program.
[0010]
The present invention comprises the following technical means in order to solve the problems
described above.
[0011]
The first technical means is an audio signal discrimination device comprising: musicality
detection means for detecting the degree of musicality of an input audio signal; speechiness
detection means for detecting the degree of speechiness of the input audio signal; and speech /
non-speech determination means for determining whether the input audio signal corresponds to
speech or non-speech, wherein the speech / non-speech determination means makes the speech /
non-speech determination, based on the detection result of the musicality detection means and
the detection result of the speechiness detection means, using different calculation formulas
according to the degree of speechiness and the degree of musicality.
[0012]
The second technical means is the first technical means, wherein the speech / non-speech
determination means classifies the detection result of the musicality detection means into a
predetermined number of stages, classifies the detection result of the speechiness detection
means into a predetermined number of stages equal to or different from that number, and
performs the speech / non-speech determination using a different calculation formula for each
combination of classifications according to the degree of musicality and the degree of
speechiness.
[0013]
A third technical means is the first or second technical means, further comprising monaural /
stereo determination means for determining whether the input audio signal is a monaural signal
or a stereo signal, wherein the speech / non-speech determination means adjusts a correction
component of the calculation formula based on the determination result of the monaural /
stereo determination means.
[0014]
A fourth technical means is a sound quality adjustment device including the audio signal
discrimination device according to any one of the first to third technical means, and
comprising sound quality adjustment means for adjusting the audio signal discriminated as
speech / non-speech by the audio signal discrimination device to a sound quality that differs
between speech and non-speech.
[0015]
A fifth technical means is the fourth technical means, further comprising determination result
display means for displaying the determination result of the speech / non-speech determination
means, wherein the determination result display means displays the determination result to the
user stepwise according to the degree of speech or non-speech.
[0016]
A sixth technical means is the fifth technical means, wherein the sound quality adjustment
means has adjustment setting means for setting whether or not to execute the sound quality
adjustment based on the determination result of the speech / non-speech determination means,
and the determination result display means displays the determination result only when the
adjustment setting means is set to execute the sound quality adjustment.
[0017]
A seventh technical means is the fifth or sixth technical means, wherein the determination
result display means has display setting means for setting whether or not to display the
determination result, and displays the determination result only when the display setting
means is set to display it.
[0018]
An eighth technical means is a content display device comprising the sound quality adjustment
device according to any one of the fifth to seventh technical means and a content input device,
wherein an audio signal included in the content input by the content input device is input to
the sound quality adjustment device, adjusted in sound quality, and output as sound, a video
signal included in the content is displayed, and the determination result is displayed by the
determination result display means as necessary.
[0019]
A ninth technical means is a program for causing a computer to execute: a musicality detection
step of detecting the degree of musicality of an input audio signal; a speechiness detection
step of detecting the degree of speechiness of the input audio signal; and a speech /
non-speech determination step of determining whether the input audio signal corresponds to
speech or non-speech, wherein, in the speech / non-speech determination step, the speech /
non-speech determination is made, based on the detection result of the musicality detection
step and the detection result of the speechiness detection step, using different calculation
formulas according to the degree of speechiness and the degree of musicality.
[0020]
The tenth technical means is the ninth technical means, wherein the speech / non-speech
determination step classifies the detection result of the musicality detection step into a
predetermined number of stages, classifies the detection result of the speechiness detection
step into a predetermined number of stages equal to or different from that number, and
performs the speech / non-speech determination using a different calculation formula for each
combination of classifications according to the degree of musicality and the degree of
speechiness.
[0021]
An eleventh technical means is the ninth technical means, further including a sound quality
adjustment step of adjusting the audio signal determined to be speech / non-speech in the
speech / non-speech determination step to a sound quality that differs between speech and
non-speech.
[0022]
A twelfth technical means is any one of the ninth to eleventh technical means, further
including a determination result display step of displaying the determination result of the
speech / non-speech determination step on a display unit, wherein the determination result
display step displays the determination result to the user stepwise according to the degree
of speech or non-speech.
[0023]
A thirteenth technical means is a computer readable recording medium recording the program
according to any of the ninth to twelfth technical means.
[0024]
According to the present invention, it is possible to accurately discriminate speech /
non-speech in an input audio signal.
Further, according to the present invention, when speech / non-speech is determined for the
input audio signal and the sound quality is adjusted based on the determination result, it is
possible to allow the user to visually recognize the determination result.
[0025]
The audio signal discrimination device according to the present invention comprises musicality
detection means, speechiness detection means, and speech / non-speech determination means.
The following describes a sound quality adjustment device that includes such an audio signal
discrimination device together with sound quality adjustment means for adjusting the sound
quality based on the discrimination. The audio signal discrimination device according to the
present invention is, however, also applicable to other uses, for example, sorting content
(content including the audio signal) for recording based on the discrimination.
[0026]
In addition to such an audio signal discrimination device, the sound quality adjustment device
according to the present invention includes sound quality adjustment means and, preferably,
determination result display means.
In the following description, the speech / non-speech determination is explained using a
preferred example in which a monaural / stereo determination is made and the determination
criterion of the speech / non-speech determination is optimized based on its result. It is of
course possible to adopt a form that does not execute such monaural / stereo determination and
optimization.
As such another embodiment, a mode in which a sound / non-speech determination is performed
instead of the monaural / stereo determination and optimization will be described; it goes
without saying that a form may also be adopted in which the monaural / stereo determination
and optimization and that determination are used in combination.
[0027]
FIG. 1 is a block diagram showing a configuration example of a sound quality adjustment device
according to an embodiment of the present invention. In the figure, 1 is the sound quality
adjustment device, 10 is an audio signal input means, 11a is a musicality detection means, 11b
is a speechiness detection means, 12 is a speech / non-speech determination means, 13 is a
monaural / stereo determination means, 14 is a reference optimization means, 14a is a switch,
14b is a setting means for the threshold VSL1, 14c is a setting means for the threshold VSL2,
15 is a sound quality adjustment means, 16 is an audio signal output means, and 17 is a
determination result display means.
[0028]
The musicality detection means 11a detects the degree of musicality of the input audio signal,
and can also be called non-speechiness determination means.
The speechiness detection means 11b detects the degree of speechiness of the input audio
signal, and can also be called speechiness determination means.
Musicality indicates the likelihood that the audio signal is a music signal, and speechiness
indicates the likelihood that the audio signal is a signal containing speech or the like.
The musicality detection means 11a and the speechiness detection means 11b may each be
configured entirely or partially in hardware or in software.
[0029]
The speech / non-speech determination means 12 determines whether the audio signal input by
the audio signal input means 10 corresponds to speech or non-speech.
The input source and input method of the audio signal input means 10 are not limited.
The speech / non-speech determination means 12 may also be configured entirely or partially in
hardware or in software.
[0030]
The speech / non-speech determination means 12 of the present invention makes the speech /
non-speech determination, based on the detection result of the musicality detection means 11a
and the detection result of the speechiness detection means 11b, using different calculation
formulas according to the degree of speechiness and the degree of musicality.
Therefore, for example, when the degree of speechiness is detected on a scale of 0 to 100 and
the degree of musicality is also detected on a scale of 0 to 100, the speech / non-speech
determination is executed by threshold processing over the 101 × 101 possible combinations of
detection results.
[0031]
Since such a determination is complicated, more preferably the speech / non-speech
determination means 12 first determines which of a predetermined number of stages, classified
in advance, the detection result of the musicality detection means 11a corresponds to, and
likewise determines which of a predetermined number of stages (equal to or different from the
former number) the detection result of the speechiness detection means 11b corresponds to.
The speech / non-speech determination means 12 then performs the speech / non-speech
determination using a different calculation formula for each combination of classifications
according to the degree of musicality and the degree of speechiness.
For example, when both musicality and speechiness are classified into three levels, 3 × 3 = 9
calculation formulas are used; one of these formulas is selected based on the detected
musicality and speechiness, and the calculation is performed with it.
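The staged selection described above can be sketched as follows. This is only an illustration
under stated assumptions: the stage boundaries, the form of the formulas (a weighted difference
of the two degrees), the weights, and all names such as to_stage and FORMULAS are hypothetical
and are not taken from the specification.

```python
from typing import Callable, Dict, Tuple

# Hypothetical stage boundaries on a 0-100 scale (not given in the source):
# below 34 -> stage 0, 34 to 66 -> stage 1, 67 and above -> stage 2.
STAGE_BOUNDS: Tuple[int, ...] = (34, 67)

Formula = Callable[[float, float], float]


def to_stage(degree: float, bounds: Tuple[int, ...] = STAGE_BOUNDS) -> int:
    """Classify a 0-100 detection result into one of len(bounds) + 1 stages."""
    return sum(degree >= b for b in bounds)


def make_formula(w_speech: float, w_music: float) -> Formula:
    """Build one placeholder formula: a weighted difference of the two degrees."""
    return lambda speech, music: w_speech * speech - w_music * music


# One formula per (musicality stage, speechiness stage): 3 x 3 = 9 entries.
# The weights are illustrative assumptions only.
FORMULAS: Dict[Tuple[int, int], Formula] = {
    (m, s): make_formula(1.0 + 0.25 * s, 1.0 + 0.25 * m)
    for m in range(3)
    for s in range(3)
}


def speech_score(speech: float, music: float) -> float:
    """Select the formula for the stage combination and evaluate it."""
    formula = FORMULAS[(to_stage(music), to_stage(speech))]
    return formula(speech, music)


def is_speech(speech: float, music: float, threshold: float = 0.0) -> bool:
    """Threshold the selected formula's result; True means speech."""
    return speech_score(speech, music) >= threshold
```

With these assumed weights, a signal with high speechiness and low musicality is scored by the
formula for the (musicality stage 0, speechiness stage 2) cell, so the table lookup replaces a
threshold decision over every individual pair of detection values.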
[0032]
In addition, the speech / non-speech determination means 12 preferably uses the rule of thumb
that "news programs and the like are generally mostly monaural broadcasts, while commercials
and music programs in which music is played are often stereo broadcasts", and determines
whether the program currently being broadcast is suited to speech or non-speech (music) by
detecting the monaural / stereo signal superimposed on the audio signal.
For this reason, the sound quality adjustment device described here comprises monaural / stereo
determination means 13 and reference optimization means 14. These optimize the speech /
non-speech determination, including the calculation by the above-mentioned formula, and the
acoustic parameters are controlled based on that determination.
[0033]
The monaural / stereo determination means 13 determines whether the input audio signal is a
monaural signal or a stereo signal.
The monaural / stereo determination means 13 may be configured entirely or partially in
hardware or in software, or may make the determination based on information such as a
monaural / stereo switching setting at the time the audio signal is input.
Furthermore, when the content from which the audio signal originates is listed in an electronic
program guide (EPG) and can, for example, be reserved for recording, monaural / stereo
information is also listed in the EPG, so the monaural / stereo determination can also be made
by acquiring that information.
[0034]
The reference optimization means 14 optimizes the determination criterion of the speech /
non-speech determination means 12 based on the determination result of the monaural / stereo
determination means 13.
This optimization may be performed by changing a parameter of the correction term (correction
component) of the above-mentioned calculation formula, by changing a threshold parameter (for
example, VSL1 or VSL2) used in threshold processing after the calculation by the formula, or by
changing both.
By optimizing the determination criterion of the automatic speech detection function according
to the monaural / stereo determination in this way, the accuracy of the detection function can
be improved. It is therefore possible to discriminate speech / non-speech accurately for the
input audio signal, that is, to perform speech / non-speech detection suited to whether the
audio signal is monaural or stereo.
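As a minimal sketch of this optimization, assuming that both a threshold and an additive
correction term are switched by the monaural / stereo result: the numeric values of VSL1 and
VSL2, the correction amounts, and the names Criterion, optimize_criterion and decide_speech are
hypothetical and serve only to show the mechanism.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Criterion:
    """Determination criterion handed to the speech / non-speech decision."""
    threshold: float   # VSL1 or VSL2 in the text
    correction: float  # correction term added to the formula result


# Hypothetical values: the lower monaural threshold makes "speech" easier to
# reach, the higher stereo threshold makes "non-speech" easier to reach.
VSL1 = 40.0
VSL2 = 60.0


def optimize_criterion(is_monaural: bool) -> Criterion:
    """Pick the criterion from the monaural / stereo determination result."""
    if is_monaural:
        return Criterion(threshold=VSL1, correction=5.0)
    return Criterion(threshold=VSL2, correction=-5.0)


def decide_speech(score: float, criterion: Criterion) -> bool:
    """Apply the correction term, then threshold the corrected score."""
    return score + criterion.correction >= criterion.threshold
```

Under these assumed values the same formula result of 50 is judged as speech for a monaural
signal and as non-speech for a stereo signal, which is the bias the text describes.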
[0035]
For example, the optimization can be controlled so that a signal is more readily determined to
be speech when it is monaural, as with news, and more readily determined to be non-speech when
it contains much music, including BGM. In this example, in order to determine speech /
non-speech accurately, the monaural / stereo determination and reference optimization are
assumed to be performed on the audio signal in advance. Alternatively, the speech / non-speech
determination may be performed while the monaural / stereo determination and criterion
optimization are carried out sequentially each time an audio signal is input.
[0036]
Further, it is preferable that the detection in the musicality detection means 11a and the
speechiness detection means 11b be performed by carrying out a plurality of signal analyses on
the input audio signal. The signal analyses may be, for example, analysis of the change in
signal energy over time, analysis of the evenness of syllables, analysis of speech intensity
versus frequency, and the like. Such signal analysis yields, for example, (I) the change in
signal energy over time, (II) speech intensity versus frequency, (III) the order of vowels and
consonants, (IV) syllable length, and (V) the amount of energy of consonants and vowels. The
musicality detection means 11a and the speechiness detection means 11b may differ in that some
or all of the parameters of the signal analysis are made different.
[0037]
Then, based on the detection results, speech / non-speech may be determined in consideration
of, for example, the following points. (I) In speech, there are gaps of low energy between
syllables (which have high energy), whereas non-speech often has no such gaps. (II) Speech has
strong intensity in the midrange of 100 Hz to 3 kHz, whereas non-speech has strong intensity in
the low and high ranges. (III) In speech, the order within a syllable often runs from consonant
to vowel. (IV) Speech often has uniform syllable length. (V) In speech, the amount of energy of
vowels is often larger than the amount of energy of consonants.
Furthermore, (I) to (V) are weighted and summed, and statistical processing is performed to
obtain a final signal analysis result. The resulting value may then be judged against the
threshold VSL1 for monaural signals or the threshold VSL2 for stereo signals to determine
speech / non-speech (for example, to determine the degree of likelihood of speech).
Alternatively, the reference optimization means 14 may change the set of thresholds used for
each signal analysis as the speech / non-speech criterion, based on the monaural / stereo
determination.
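The weighting and summation of points (I) to (V) might look like the sketch below. It assumes
that each point has already been reduced to a cue value between 0.0 (non-speech-like) and 1.0
(speech-like); that reduction, the weights, the threshold values, and the cue names are all
hypothetical, and the statistical processing mentioned in the text is not modeled.

```python
from typing import Mapping

# Hypothetical weights for cues (I) to (V); each cue is assumed to be
# normalized to the range 0.0 (non-speech-like) .. 1.0 (speech-like).
WEIGHTS = {
    "energy_gaps": 0.3,           # (I) low-energy gaps between syllables
    "midrange_ratio": 0.3,        # (II) strength of the 100 Hz to 3 kHz band
    "consonant_vowel": 0.15,      # (III) consonant-then-vowel ordering
    "syllable_uniformity": 0.15,  # (IV) uniformity of syllable length
    "vowel_energy": 0.1,          # (V) vowel energy exceeding consonant energy
}


def combined_score(cues: Mapping[str, float]) -> float:
    """Weighted sum of the five cues, scaled to 0..100."""
    weighted = sum(WEIGHTS[name] * cues[name] for name in WEIGHTS)
    return 100.0 * weighted / sum(WEIGHTS.values())


def is_speech(cues: Mapping[str, float], is_monaural: bool,
              vsl1: float = 40.0, vsl2: float = 60.0) -> bool:
    """Judge the combined score against VSL1 (monaural) or VSL2 (stereo)."""
    threshold = vsl1 if is_monaural else vsl2
    return combined_score(cues) >= threshold
```

Dividing by the sum of the weights keeps the score on the same 0 to 100 scale regardless of how
the weights are tuned, so the thresholds do not need to be rescaled when a weight changes.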
[0038]
The sound quality adjustment means 15 adjusts the sound quality of the audio signal determined
to be speech / non-speech by the above-described configuration so that it at least differs
between speech and non-speech. The sound quality setting method here is arbitrary; the setting
value, the amount of increase or decrease, or the setting value in each frequency band may be
varied according to, for example, the degree of likelihood of speech / non-speech. For example,
the sound quality setting may be one in which the center frequency and the filter Q value (the
sharpness of the peaks and valleys in the equalizer curve) are fixed, as in a graphic
equalizer, or one in which they can be changed, as in a parametric equalizer. The audio signal
output means 16 then outputs the audio signal adjusted by the sound quality adjustment
means 15.
[0039]
The determination result display means 17, which is a feature of the present invention,
displays the determination result of the speech / non-speech determination means 12 to the
user in stages according to the degree (level) of speech or non-speech. In practice, the
speech / non-speech determination means 12 detects speechiness and musicality (non-speech) as
described above, selects a calculation formula according to the detection results, and subjects
the result calculated with that formula to threshold processing with a predetermined threshold
to determine whether the signal is speech or non-speech. The determination result display means
17 may display this speech / non-speech determination result in stages according to its level
(for example, the degree of speech). When such a stepwise display is performed, a plurality of
threshold processes are performed as well (it is better to prepare at least two sets of
threshold groups according to monaural / stereo). Adjusting the sound quality to match each
stage makes the stepwise display more effective.
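One way to picture the plurality of threshold processes is a lookup of the score in an
ascending list of thresholds, with one list per signal type. The sketch below assumes a score
on a 0 to 100 scale; the threshold values and the names THRESHOLDS_MONO, THRESHOLDS_STEREO and
speech_level are hypothetical.

```python
from bisect import bisect_right
from typing import Tuple

# Hypothetical ascending threshold groups, one per signal type.
THRESHOLDS_MONO: Tuple[float, ...] = (20.0, 40.0, 60.0, 80.0)
THRESHOLDS_STEREO: Tuple[float, ...] = (30.0, 50.0, 70.0, 90.0)


def speech_level(score: float, is_monaural: bool) -> int:
    """Return the display stage: 0 (least speech-like) up to the number of thresholds."""
    thresholds = THRESHOLDS_MONO if is_monaural else THRESHOLDS_STEREO
    # bisect_right counts how many thresholds the score has reached or passed.
    return bisect_right(thresholds, score)
```

With four thresholds per group the result has five stages; a single-threshold group would
reduce this to the two stages of speech and non-speech.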
[0040]
The determination result display means 17 may also display the speechiness detection result or
the musicality (music signal) detection result, on which the speech / non-speech determination
is based, in stages according to the detection level (for example, the degree of speechiness).
In such a case, the combination of the speechiness detection result and the musicality
detection result may be used only for displaying the determination result, while the
speechiness detection result is adopted as it is as the speech / non-speech determination
result for sound quality adjustment. In this case, however, the data underlying the sound
quality adjustment and the data of the displayed determination result will differ, for example
in a music program, so some measure is needed to keep the difference small enough that the
viewer does not notice it.
[0041]
Further, the sound quality adjustment means 15 may have adjustment setting means for setting
whether or not the sound quality adjustment means 15 executes the sound quality adjustment
based on the determination result of the speech / non-speech determination means 12. Sound
quality adjustment based on factors other than the speech / non-speech determination may be
set separately. The adjustment setting means is set by user operation. The settings referred to
here are, for example, (a) performing the sound quality adjustment automatically based on the
speech / non-speech determination, (b) fixing the sound quality adjustment (performing a
predetermined sound quality adjustment, such as that for speech), and (c) not performing the
sound quality adjustment (that is, the sound quality adjustment based on the speech /
non-speech determination). Based on the user setting in the adjustment setting means, the sound
quality adjustment means 15 performs the sound quality adjustment conforming to (a), (b), or
(c), and the determination result display means 17 displays the determination result (detection
result) in the case of (a) and does not display it in the cases of (b) and (c). In this way,
the determination result display means 17 may display the determination result only when the
adjustment setting means is set to perform the sound quality adjustment based on the
determination. For example, the determination result is not displayed when the sound quality
adjustment for speech is simply applied as in (b) above.
[0042]
Further, the determination result display means 17 may have display setting means for setting
whether or not to display the determination result. The determination result display means 17
may display the determination result only when the display setting means is set to display it.
The display setting means may be provided independently of the above-mentioned adjustment
setting means. In a form that also has the adjustment setting means, however, the determination
result display means 17 displays the determination result only when the adjustment setting
means is set to perform the sound quality adjustment based on the determination result and the
display setting means is set to display the determination result.
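The interaction of the adjustment setting (a) to (c) and the display setting can be sketched as
a small piece of gating logic. The enumeration, the setting names returned by select_setting,
and the function names are hypothetical; only the conditions themselves follow the text.

```python
from enum import Enum


class AdjustMode(Enum):
    AUTO = "a"   # (a) adjust automatically from the speech / non-speech determination
    FIXED = "b"  # (b) fixed adjustment, here assumed to be the speech setting
    OFF = "c"    # (c) no determination-based adjustment


def select_setting(mode: AdjustMode, is_speech: bool) -> str:
    """Return the sound quality setting to apply (setting names are hypothetical)."""
    if mode is AdjustMode.AUTO:
        return "speech" if is_speech else "non_speech"
    if mode is AdjustMode.FIXED:
        return "speech"
    return "none"


def should_display(mode: AdjustMode, display_enabled: bool) -> bool:
    """Show the determination result only in mode (a) and only if display is enabled."""
    return mode is AdjustMode.AUTO and display_enabled
```

Keeping the display condition in one function makes it explicit that a fixed speech setting in
mode (b) never triggers the display, even though the applied setting is the same as for speech
in mode (a).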
[0043]
FIG. 2 is a flowchart for explaining an example of the sound quality adjustment processing and
determination result display processing in the sound quality adjustment device of FIG. 1,
FIG. 3 is a diagram showing an example of the equalizer sound quality settings used in the
sound quality adjustment processing in the sound quality adjustment device of FIG. 1, and
FIG. 4 is a diagram showing a screen display example in the determination result display
processing of FIG. 2.
[0044]
For simplicity, the speech / non-speech determination is described here as being performed by
a single threshold process; when multi-stage threshold processing is performed, "threshold" in
the following description should be read as "set of thresholds".
First, when an audio signal is input, the monaural / stereo determination means 13 performs the
monaural / stereo determination (step S1). In this determination, for example, where L is the
left input signal and R is the right input signal, it is preferable to compute
(L − R) / (L + R) on the input signal and make the determination from the phase difference.
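As an illustration of the (L − R) / (L + R) operation, the sketch below compares the energy of
the difference signal with the energy of the sum signal over a block of samples. Evaluating the
ratio on block energies, the decision limit of 0.01, and the names stereo_ratio and is_monaural
are assumptions made for this example; the specification states only the expression itself.

```python
from typing import Sequence


def stereo_ratio(left: Sequence[float], right: Sequence[float]) -> float:
    """Energy of (L - R) divided by energy of (L + R) over one block of samples."""
    diff_energy = sum((l - r) ** 2 for l, r in zip(left, right))
    sum_energy = sum((l + r) ** 2 for l, r in zip(left, right))
    if sum_energy == 0.0:
        # Silence gives 0.0; fully anti-phase channels give infinity.
        return 0.0 if diff_energy == 0.0 else float("inf")
    return diff_energy / sum_energy


def is_monaural(left: Sequence[float], right: Sequence[float],
                limit: float = 0.01) -> bool:
    """Treat the block as monaural when the channels are nearly identical."""
    return stereo_ratio(left, right) <= limit
```

Identical left and right channels give a ratio of zero, while channels that differ in content
or phase raise it, so a small limit separates monaural from stereo material in this model.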
[0045]
If this determination finds that the signal is monaural, the reference optimization means 14
connects the switch 14a to the setting means 14b for the threshold VSL1, and the threshold used
for the determination in the speech / non-speech determination means 12 is set to VSL1
(step S2). If, on the other hand, step S1 finds that the signal is stereo, the reference
optimization means 14 connects the switch 14a to the setting means 14c for the threshold VSL2,
and the threshold used for the determination in the speech / non-speech determination means 12
is set to VSL2 (step S3). By optimizing the threshold setting in this way, a monaural signal
such as news can be made easier to determine as speech, and a signal containing much music,
including BGM, can be made easier to determine as non-speech. The configuration of the
reference optimization means 14 is not limited to that shown in the drawing.
[0046]
Next, the musicality detection means 11a and the speechiness detection means 11b detect
musicality and speechiness, respectively (steps S4 and S5). Steps S4 and S5 may be performed in
either order. The speech / non-speech determination means 12 then selects a calculation formula
based on the detection results of steps S4 and S5, executes the calculation, and performs the
speech / non-speech determination on the result using the threshold VSL1 or VSL2 set in step S2
or S3 (step S6). If the signal is determined to be speech, the sound quality setting A is
selected and the sound quality is adjusted (step S7). If, on the other hand, step S6 determines
that the signal is not speech, the sound quality setting B is selected and the sound quality is
adjusted (step S8).
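Steps S1 to S8 can be condensed into the following sketch. It assumes that the monaural /
stereo result and the two detection results are already available as inputs, and it replaces
the stage-dependent formula selection with a single placeholder formula (speechiness minus
musicality); the threshold values and the name adjust_flow are likewise hypothetical.

```python
def adjust_flow(is_monaural: bool, speechiness: float, musicality: float,
                vsl1: float = 0.0, vsl2: float = 20.0) -> str:
    """Return "A" (speech setting) or "B" (non-speech setting) for one input block."""
    # S1 to S3: the monaural / stereo result selects the threshold VSL1 or VSL2.
    threshold = vsl1 if is_monaural else vsl2
    # S4, S5: the detection results are taken as inputs here; their order does not matter.
    # S6: placeholder formula (an assumption), followed by the threshold decision.
    score = speechiness - musicality
    # S7 / S8: choose the sound quality setting from the decision.
    return "A" if score >= threshold else "B"
```

For example, under these assumed thresholds adjust_flow(True, 60.0, 50.0) selects setting A,
while the same detection results on a stereo signal select setting B.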
[0047]
Here, an example of the difference between the sound quality setting A and the sound quality
setting B will be described with reference to FIG. In the case of the sound quality setting A
(speech), the frequency characteristic of the equalizer is set as indicated by a graph 21. In the
sound quality setting B (non-speech), the frequency characteristic of the equalizer is set as
indicated by a graph 22. The difference between the graph 21 and the graph 22 is that the
vicinity of the predetermined low frequency 22a and the vicinity of the predetermined high
frequency 22b are emphasized in the non-speech as compared to the speech.
[0048]
Before or after the processing of step S7 / S8 (at least after the speech / non-speech
determination in step S6), the determination result is displayed (step S9). As a display
method, an LED on the sound quality adjustment device may be lit, or, when the audio signal is
input together with a video signal, an OSD (On Screen Display) may be shown on the screen 31
that displays the video signal, as illustrated in FIG. 4.
[0049]
Further, when displaying the determination result in step S9, the degree of speech (or the degree
of non-speech) obtained by the speech / non-speech determination is displayed stepwise so that
it is visible. The degree of speech or the degree of non-speech in this case usually differs from
the degree of musicality and the degree of speechiness detected by the musicality detection means
11a and the speechiness detection means 11b. The minimum number of display stages in this case
is two, speech and non-speech, which corresponds to making the speech / non-speech
determination with a single threshold and adjusting the sound quality accordingly.
[0050]
The following describes an example in which the degree of speech is made visible to the user.
As illustrated in FIG. 4, it is preferable to display on the screen 31 characters 32 or the like
representing the “speech degree” together with a number of marks 33 corresponding to that
degree. The number of marks 33 corresponds to the degree of speech; the marks are also called
speech sensor marks and, as a result, indicate how close to speech the sound quality has been
adjusted. The marks 33 are, for example, green. A mark may also be displayed as a colored image
of the face of a person with an open mouth. In addition, depending on the user setting, it may
be possible to select the color (for example, green for Japanese, orange for English, etc.) and
the shape (speaker mark, sine / cosine mark, flashing mark, etc.). In the example of FIG. 4, the
name of the sound quality adjustment based on the speech / non-speech determination (here,
“lively voice”) is shown as the characters 32 representing the “speech degree”. The certainty
of the speech / non-speech determination result may also be displayed as a percentage near the
speech sensor marks. This certainty may be set low if the detection result of the musicality and
the detection result of the speechiness strongly contradict each other.
[0051]
Further, when displaying the determination result, face images may be arranged horizontally at
the lower part of the screen 31 like the marks 33, and the display position may be movable to
any position, automatically or manually. It is also preferable to be able to switch between
vertical and horizontal display. As a method of moving the display position, for example, when
characters are displayed at the lower or upper part of the screen, it is preferable to be able
to move the display to a position that does not overlap the characters. More specifically, the
display may be moved to a position that does not overlap, for example, the character display of
a news broadcast, a data broadcast, or the like at the bottom of the screen, or the Japanese
dubbing display of an audio multiplex broadcast. In addition, applications are possible in which
program type information (for example, a song program or another program) is acquired from the
EPG and, in the case of a song program, the display is reduced or enlarged and placed at a
position that does not overlap the lyrics displayed on the screen.
[0052]
Furthermore, the present invention is also applicable to a content display apparatus provided
with the above-described sound quality adjustment apparatus and a content input apparatus (for
example, a broadcast reception apparatus that receives broadcast signals of television or radio
broadcasts, whether digital or analog). In this content display device, an audio signal included
in the content input by the content input device is input to the sound quality adjustment
device, the sound quality is adjusted and the audio is output, and the determination result is
displayed by the determination result display means 17 in accordance with the display of the
video signal included in the content. The content display apparatus according to the present
invention is also applicable to, for example, a television receiver, or a general-purpose
personal computer (hereinafter abbreviated as PC) including a content reproduction program and a
module such as a video card (also referred to as a video adapter), as will be described later.
In the present invention, the delivery and broadcast form of the content is basically not
limited. Next, a television receiver will be described more specifically as an example of a
content display device incorporating the sound quality adjustment device.
[0053]
FIG. 5 is a block diagram showing a configuration example of a television receiver, which is one
application example of the sound quality adjustment apparatus of FIG. 1. FIG. 6 is a view
showing an example of a formula table stored in the microcomputer in FIG. 5, and FIG. 7 is a
view showing an example of a mark display target table stored in the microcomputer in FIG. 5.
In FIG. 5, 4 is a television receiver body, 40 is a tuner unit, 41 is an external input unit,
42 is a body operation unit, 43 is a video processing IC (Integrated Circuit), 44 is a
microcomputer of the body (hereinafter, microcomputer), 45 is an audio processing IC, 46 is a
display, 47L is a left speaker, 47R is a right speaker, 48 is a light receiving unit, and 49 is
a remote control unit (hereinafter referred to as remote control). In FIGS. 6 and 7, 51 is a
formula table stored in the ROM (Read Only Memory) or the like in the microcomputer 44, and 52
is a speech sensor mark display target table stored in the ROM or the like in the
microcomputer 44.
[0054]
FIG. 8 is a flowchart for explaining the speech / non-speech determination and determination
result display processing in the television receiver of FIG. 5, and FIG. 9 is a flowchart for
explaining the determination result display processing within that processing. FIGS. 10 to 12
are diagrams showing examples of the setting screen for the determination result display in the
sound quality adjustment apparatus of FIG. 1. FIG. 10 shows an example of the setting items of
voice adjustment, FIG. 11 shows, among the setting items shown in FIG. 10, the operation setting
items for the sound quality adjustment according to the present invention, and FIG. 12 shows the
corresponding display setting items. In FIGS. 10 to 12, 6 is a setting screen example of
voice adjustment, 61 is a setting menu list, 62 is a voice adjustment item list, 63 is an
operation setting item, and 64 is a display setting item.
[0055]
The television receiver main body 4 illustrated here mainly comprises: the main body
microcomputer 44 as an example of the control means; video / audio input units such as an
antenna, the tuner unit 40, and the external input unit 41; the video processing IC 43 for
applying various video processing to the input video signal; the audio processing IC 45 for
applying various audio processing to the input audio signal; the main body operation unit 42
for receiving user operations; the display 46 such as an LCD, PDP, or organic EL display; the
left and right speakers 47L and 47R for outputting the audio signal subjected to audio
processing; and the light receiving unit 48 for receiving light from the remote control 49. A
calculation formula table 51 and a speech sensor mark display target table 52 are stored in the
ROM or the like in the microcomputer 44. The microcomputer 44 and the audio processing IC 45
(and the video processing IC 43) can also be incorporated as a system LSI (Large Scale
Integrated Circuit).
[0056]
Further, the periodic processing time is set in the adjustment process of the television
receiver 4. This setting determines the period at which the microcomputer 44 reads the speech /
non-speech determination result produced by the audio processing IC 45 when performing the
determination result display processing according to the present invention. The period may, for
example, be variable between 100 ms and 2000 ms, and may be changed not only in the adjustment
process but also by user setting. If the reading period is not fixed to some extent, the
smoothness of the determination result display is affected. As the data actually read in the
period set here, that is, the data of the speech / non-speech determination result, a register
with a movable range of, for example, -100 to 0 to +100 (FFFF9C to 000000 to 000064) is
prepared, and the initial value of this register is set to "000000". The sound quality
adjustment itself is controlled so that a register value in the positive direction gives the
sound quality setting for speech and a value in the negative direction gives the sound quality
setting for non-speech. In the mode in which the sound quality adjustment is not performed, the
microcomputer 44 may forcibly apply the sound quality setting for speech, but the speech /
non-speech determination result is not written to the register.
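The hexadecimal range FFFF9C to 000064 is consistent with a 24-bit two's complement register. Assuming that encoding (the paragraph does not state it explicitly), a register value could be decoded as in the following Python sketch; the function name is hypothetical.

```python
def decode_register(raw: int) -> int:
    """Interpret a 24-bit register value as a signed integer.

    Assumes two's complement encoding, which matches the example
    range FFFF9C (-100) to 000064 (+100).
    """
    if not 0 <= raw <= 0xFFFFFF:
        raise ValueError("register value must fit in 24 bits")
    # Values with the top bit set are negative in two's complement.
    return raw - (1 << 24) if raw & 0x800000 else raw
```

For example, `decode_register(0xFFFF9C)` gives -100 and `decode_register(0x000064)` gives 100.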
[0057]
Further, the calculation formula of the sound quality setting is set in advance as in the
formula table illustrated in FIG. 6. First, the speechiness detection result is classified into
three stages: (I) 0 ≦ SP result ≦ SPEECH LP, (II) SPEECH LP < SP result < SPEECH HP, and (III)
SPEECH HP ≦ SP result. The musicality detection result is likewise classified into three
stages: (i) 0 ≦ MU result ≦ MUSIC LP, (ii) MUSIC LP < MU result < MUSIC HP, and (iii) MUSIC HP
≦ MU result. For example, the SP result may be the integer part of the speechiness detection
result ÷ 83886, and the MU result may be the integer part of the musicality detection result ÷
83886. The SP result and the MU result then take values in the range of 0 to 100 for detection
results of 000000h to 7FFFFFh.
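The scaling described above can be checked with a short Python sketch. The function and constant names are hypothetical; only the divisor 83886 and the ranges 000000h to 7FFFFFh and 0 to 100 come from the text.

```python
SCALE_DIVISOR = 83886  # divisor given in the text


def to_result(raw_detection: int) -> int:
    """Scale a raw detection result (000000h to 7FFFFFh) to 0..100."""
    if not 0 <= raw_detection <= 0x7FFFFF:
        raise ValueError("detection result out of range")
    # "Integer part of detection result / 83886" is integer division.
    return raw_detection // SCALE_DIVISOR
```

The maximum raw value 7FFFFFh (8388607) maps to 100 because 83886 × 100 = 8388600.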
[0058]
The following calculation formulas are then used for each combination of stages:
(I) and (i): | SP result - MU result | + α
(I) and (ii): | SP result - MU result |
(I) and (iii): - MU result
(II) and (i): SP result - MU result
(II) and (ii): | SP result - MU result | + α
(II) and (iii): SP result - MU result
(III) and (i): SP result
(III) and (ii): SP result - MU result + α
(III) and (iii): | SP result - MU result | + α
[0059]
Here, SPEECH LP, SPEECH HP, MUSIC LP, and MUSIC HP are state boundaries in the range of 0 to
100. MONO and STE are also values in the range of 0 to 100: based on the monaural / stereo
judgment, the value of “STE” is added to the calculation result as +α when the signal is judged
to be stereo, and the value of “MONO” is added as +α when it is judged to be monaural.
These values "SPEECH LP", "SPEECH HP", "MUSIC LP", "MUSIC HP", "MONO" and "STE" may be prepared
in the adjustment process. For example, α = +5 may be set for “STE” and α = +10 for “MONO”;
α may also be a negative value.
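As an illustration, the stage classification and the nine formulas can be written as a lookup table in Python. This is a sketch under assumptions: the boundary values 30 and 70 are hypothetical, α uses the example values +5 (STE) and +10 (MONO), the formulas are transcribed exactly as listed in the text, and clamping to the register range of -100 to +100 is an added assumption.

```python
def classify(value: int, low: int, high: int) -> int:
    """Return stage 0 (value <= low), 1 (low < value < high) or 2 (value >= high)."""
    if value <= low:
        return 0
    if value < high:
        return 1
    return 2


# Keys are (speech stage, music stage); formulas follow the text as given.
FORMULAS = {
    (0, 0): lambda sp, mu, a: abs(sp - mu) + a,
    (0, 1): lambda sp, mu, a: abs(sp - mu),
    (0, 2): lambda sp, mu, a: -mu,
    (1, 0): lambda sp, mu, a: sp - mu,
    (1, 1): lambda sp, mu, a: abs(sp - mu) + a,
    (1, 2): lambda sp, mu, a: sp - mu,
    (2, 0): lambda sp, mu, a: sp,
    (2, 1): lambda sp, mu, a: sp - mu + a,
    (2, 2): lambda sp, mu, a: abs(sp - mu) + a,
}


def sound_quality_score(sp: int, mu: int, is_stereo: bool, *,
                        speech_lp: int = 30, speech_hp: int = 70,
                        music_lp: int = 30, music_hp: int = 70,
                        ste: int = 5, mono: int = 10) -> int:
    """Compute the value written to the determination register."""
    alpha = ste if is_stereo else mono
    key = (classify(sp, speech_lp, speech_hp), classify(mu, music_lp, music_hp))
    raw = FORMULAS[key](sp, mu, alpha)
    # Assumption: keep the result inside the register range -100..+100.
    return max(-100, min(100, raw))
```

Keeping the formulas in a table keyed by the two stages mirrors the idea of a formula table held in ROM: changing a formula or a boundary does not require changing the selection logic.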
[0060]
In addition to the calculation formula of the sound quality setting, the display target number
is set in advance by the following formula together with the values of MIN and MAX used in it.
Here, each display number applies to values "less than or equal to" its set value. The
following equation may be stored as the speech sensor mark display target table 52 or the like.
[0061]
MIN + (MAX - MIN) × variable [1 to 9] ÷ 9
[0062]
In the above equation, MAX and MIN are the maximum value and the minimum value preset between
-100 and +100 in the above-mentioned example; for instance, MIN may be preset to -80 and MAX
to 90.
Furthermore, the above equation illustrates the case where the determination result is
displayed in 10 steps, that is, where, as an example of the number of displays, 0 to 10 of the
marks 33 in FIG. 4 are displayed, but the display is not limited to this.
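The table values produced by this equation can be sketched in Python as follows. The mapping from a register value to a number of marks is one possible reading of the "less than or equal to" rule and is an assumption: a value at or below the k-th set value gives k marks, a value above all nine set values gives 10 marks, and 0 marks is left for the case where the display is forced to zero. Function names are hypothetical.

```python
def display_thresholds(min_value: float = -80, max_value: float = 90,
                       steps: int = 9) -> list:
    """Set values MIN + (MAX - MIN) * k / 9 for k = 1..9."""
    return [min_value + (max_value - min_value) * k / steps
            for k in range(1, steps + 1)]


def display_target(register_value: float, thresholds: list) -> int:
    """Number of marks for a register value (assumed mapping)."""
    for count, limit in enumerate(thresholds, start=1):
        if register_value <= limit:
            return count
    return len(thresholds) + 1
```

With MIN = -80 and MAX = 90 the set values run from about -61.1 up to 90 in steps of about 18.9.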
[0063]
Referring to FIG. 8, the microcomputer 44 in the television receiver 4 described above first
performs periodic processing (for example, in units of 100 ms) with the period set as described
above (step S11). Each time the processing cycle arrives in step S11, the following steps S12
to S16 are executed. First, in step S12, it is determined whether the operation setting is
automatic. If it is automatic, the processing of steps S13 to S16 is executed to perform the
sound quality adjustment based on the speech / non-speech determination result. If it is not
automatic (fixed), the subsequent processing is not performed; instead, for example, the sound
quality setting for speech may be applied forcibly.
[0064]
In step S13, the microcomputer 44 instructs the audio processing IC 45 to detect the speechiness
and the musicality, and reads the detection results. Next, or before step S13, the
microcomputer 44 instructs the audio processing IC 45 to make the monaural / stereo
determination, and reads the result (step S14). The microcomputer 44 then selects a calculation
formula by comparing the detection results read from the audio processing IC 45 with the table
51 (step S15). In step S15, the calculation formula is determined by comparing the speechiness
detection result and the musicality detection result with "SPEECH LP", "SPEECH HP", "MUSIC LP",
and "MUSIC HP". The microcomputer 44 then applies the corresponding calculation formula in the
table 51, substituting the monaural / stereo determination result as well, to calculate the
speech / non-speech determination result (the calculation result of the sound quality setting),
and writes it to the register (step S16). The value of this register is used to set the
display target value in step S22 of FIG. 9.
[0065]
In the display processing of the microcomputer 44, periodic processing (for example, in units
of 100 ms) is first performed with the cycle set as described above (step S21). Each time the
processing cycle arrives in step S21, the following steps S22 to S32 are executed. First, in
step S22, the register value of the determination result obtained by the process described in
FIG. 8, that is, the calculation result of the sound quality setting (sound quality
adjustment), is applied to the above equation (table 52). The display target value, that is,
the number of marks to display, is thereby set.
[0066]
Here, when the input signal is not synchronized or when there is no sound, the display is
immediately set to "0" (steps S23 and S24). In step S23, it is determined whether the input
signal is synchronized and whether it is in the silent state. If the input signal is not
synchronized or there is no sound, the calculation for forcibly setting "0" is performed in
step S24, and the process proceeds to step S30. The determination of the silent state will be
described later in another embodiment. The determination in step S23 and the calculation in
step S24 are effective, for example, when the user watches a news program and then changes the
channel so that a sandstorm (noise) screen is displayed. In such a case, the judgment result
that the signal is speech (for example, a register value of +100) falls only gradually toward
0 and remains in the register for a while. Because the periodic display processing reads and
displays this remaining register value, the user would mistakenly believe that the speech /
non-speech determination is being executed even for the sandstorm, for which no determination
can be made. Therefore, in order to prevent such a misunderstanding, it is necessary to
forcibly set the register value to zero.
[0067]
On the other hand, in the case of NO at step S23, it is determined whether the display number
of the previous cycle equals the display target value set at step S22 (step S25). In the case
of YES at step S25, the display number is maintained (step S26), and the process proceeds to
step S30. In the case of NO at step S25, it is determined whether the display number of the
previous cycle is smaller than the display target value set in step S22 (step S27). In the
case of YES at step S27, the calculation "display number of the previous cycle + 1" is
performed (step S28), and the process proceeds to step S30. In the case of NO at step S27, the
calculation "display number of the previous cycle - 1" is performed (step S29), and the process
proceeds to step S30.
[0068]
Then, after steps S24, S26, S28, and S29, the display number is stored as the display number of
the previous cycle (step S30), and it is determined whether or not to display it (step S31). If
it is to be displayed, that number of marks is displayed on the screen (step S32); if not, the
processing in this cycle is ended and the next cycle is awaited. In this way, the microcomputer
44 performs the periodic processing and calculation described above based on the table 52
stored in the ROM.
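The per-cycle update of steps S23 to S29 moves the displayed number toward the target by at most one mark per cycle, which is what keeps the display smooth. A minimal Python sketch follows; the function and parameter names are hypothetical.

```python
def next_display_count(previous: int, target: int,
                       synced: bool = True, has_sound: bool = True) -> int:
    """Return the display number for the current cycle."""
    if not synced or not has_sound:
        return 0                # step S24: force the display to zero
    if previous == target:
        return previous         # step S26: keep the current number
    if previous < target:
        return previous + 1     # step S28: one mark more
    return previous - 1         # step S29: one mark fewer
```

Called once per cycle with the result of the previous cycle (step S30), the count rises from 0 to a target of 3 as 1, 2, 3 and then stays at 3.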
[0069]
Next, the determination in step S31 will be described. This determination is made by reading a
default value or the user settings. Here, the user settings correspond to the settings in the
adjustment setting means and the display setting means described above, and are made by the
following procedure. First, as shown in FIG. 10, the user menu list 61 (image adjustment,
sound adjustment, main body setting, function switching) is displayed. When the user selects
the sound adjustment, the item list 62 related to sound adjustment (balance, surround, lively
voice, reset) is displayed. When the user selects from this list the sound quality adjustment
according to the present invention ("lively voice" 62a), the operation setting items 63
(setting items in the adjustment setting means) and the display setting items 64 (setting items
in the display setting means) are displayed, as shown in FIG. 11 or 12.
[0070]
As the operation setting items 63, for example, the following are prepared: "OFF" 63a,
corresponding to the setting in which the sound quality adjustment according to the present
invention is not performed; "fixed" 63b, corresponding to the setting in which the sound
quality is adjusted toward speech (or non-speech) regardless of the speech / non-speech
determination; and "automatic" 63c, corresponding to the setting in which the speech /
non-speech determination and the sound quality adjustment based on the determination result
are performed automatically. The speech sensor mark is displayed when the operation setting is
"automatic" 63c, and is not displayed when it is "fixed" 63b or "OFF" 63a. As in the flow
described above, it is preferable to read the data even when "OFF" 64a is set. As the display
setting items 64, "no display" 64a and "displayed" 64b are prepared, and the speech sensor mark
is displayed only when the display setting is "displayed" 64b. Of course, reading the data
every set period (for example, in units of 100 ms) and displaying the speech sensor mark at
the lower part of the screen may both be executed only when "displayed" 64b is set.
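The condition checked in step S31 can be summarized in a few lines of Python. The string values stand in for the menu items and are hypothetical identifiers; the rule itself (mark shown only for "automatic" combined with "displayed") follows the description above.

```python
def should_display_mark(operation_setting: str, display_setting: str) -> bool:
    """Return True when the speech sensor mark should be shown."""
    # Operation setting: "off", "fixed" or "automatic" (items 63a to 63c).
    # Display setting: "no display" or "displayed" (items 64a and 64b).
    return operation_setting == "automatic" and display_setting == "displayed"
```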
[0071]
With the above-described configuration and processing, in the present embodiment, when speech
/ non-speech is determined for the input audio signal, the user can visually recognize the
determination result. This allows the user to correctly understand why the sound quality
adjustment based on the determination result is being applied. The visual confirmation also
enables further user setting. In addition, by making the monaural / stereo judgment when
judging speech / non-speech, the judgment is made not only from the audio information of the
audio signal but also according to the purpose of the program (the program including the audio
signal). This reduces, as far as possible, erroneous acoustic parameter control, such as that
of an equalizer, based on the characteristics of the input audio signal, so that the acoustic
parameters are controlled properly and the sound quality is adjusted properly. Also, for
example, the purpose of the program is determined from the monaural / stereo signal
superimposed on the audio signal together with the audio information, and the criteria for
judging whether the input audio signal is speech or non-speech (music) are optimized according
to the result. This makes possible free control of speech / non-speech detection according to
the content of the broadcast program, as well as control of equipment based on that detection
(for example, sound quality adjustment, separate recording, etc.).
[0072]
In addition, the content display device according to the present embodiment uses, for example,
an automatic speech detection function and is provided with a display function that makes it
possible to see whether the audio of a TV program or video / DVD is speech or non-speech. The
user can thus visually recognize whether the audio of the content currently displayed is
speech or non-speech. That is, the type of audio (speech / non-speech) of a TV program, a
video / DVD, or the like can be understood visually in real time. The above-mentioned speech /
non-speech determination may also be applied to the recording of content (including
re-recording). In that case, it is preferable to add to the content display device a function
for recording, or reserving the recording of, content acquired via broadcast, via a network,
via a recording medium, or the like. For example, various recorders can use the speech /
non-speech determination for CM determination and other separate recording, and at that time
whether the content corresponds to speech or non-speech may also be displayed so that the user
can see it.
[0073]
In addition, the sound quality adjustment device 1, the content display devices such as the
television receiver 4 described above with reference to FIGS. 1 to 12, and the respective
means serving as their constituent elements may be configured by hardware as described above,
or may be partially configured by software. For example, the program may be incorporated in a
computer such as the microcomputer shown in FIG. 5 or in a general-purpose computer such as a
PC. The various processes in that case will be described taking as an example the
configuration of a general information processing apparatus shown in FIG. 13. FIG. 13 is a
block diagram showing a typical configuration of the information processing apparatus. In the
figure, 7 is an information processing apparatus, 71 is a CPU (central processing unit), 72 is
a RAM (random access memory), 73 is a rewritable ROM, 74 is an input device, 75 is a display
device, 76 is an output device, and 77 is a bus.
[0074]
Further, a program for causing a computer to function as the apparatus and each means
according to the present invention, or a program for causing a computer to execute each
processing step, is stored in the ROM 73 and is executed by being read by the CPU 71. When
installed in a computer or the like, this program controls the CPU 71 or the like of the
computer so that it serves as the above-described means (that is, it is a program that causes
the computer to function as those means). Information handled by the device and means
according to the present invention is temporarily stored in the RAM 72 at the time of
processing, is thereafter stored in the various ROMs 73, and is read out, corrected, and
written by the CPU 71 as needed. Here, the information related to the present invention
includes information on the items selected by the user, the threshold values, and the audio
signal when it is input and analyzed by the audio signal input means serving as one of the
input devices 74. Further, for example, a setting may be maintained by reading the set value
from among the setting options stored in the ROM 73 into the RAM 72.
[0075]
In addition, the progress and results of processing are presented to the device user through
the display device 75 such as an LCD, PDP, organic EL display, or CRT. When user setting is
required, the device user may input designations or select / input the parameters necessary
for processing (for example, designation of the input audio signal or the content including
it, selection of various user setting items, etc.) from the input device 74 such as a keyboard
or mouse (pointing device). The program may also be provided with a graphical user interface
(GUI) for the display device 75 to facilitate use by the device user. Examples of GUIs are
also illustrated in FIGS. 10-12. The output device 76 includes a speaker, which is an output
device for the audio signal, a communication device such as a network board for connecting to
and communicating over a network, and an output device such as a printing device. The CPU 71,
the RAM 72, the ROM 73, the input device 74, the display device 75, and the output device 76
may be connected by the bus 77 or the like.
[0076]
Further, specifically, the recording medium on which the program as described above is
recorded may be a CD-ROM, a magneto-optical disk, a DVD-ROM, an FD, a flash memory, or any of
various other ROMs (including rewritable ROMs) and RAMs. Recording the program that causes a
computer to execute the functions of the above-described embodiments of the present invention
on such a recording medium and distributing it makes these functions easy to realize. The
functions according to the present invention can then be performed by attaching such a
recording medium to an information processing apparatus such as a computer and having the
apparatus read out the program, or by storing the program in a recording medium provided in
the information processing apparatus and reading it out as needed.
[0077]
FIG. 14 is a block diagram showing an example of the configuration of the sound quality
adjustment apparatus according to another embodiment of the present invention, in which 8 is a
sound quality adjustment apparatus, 80 is an audio signal input means, 81a is a musicality
detection means, 81b is a speechiness detection means, 82 is a speech / non-speech
determination means, 83 is a sound / silence determination means, 85 is a sound quality
adjustment means, 86 is an audio signal output means, and 87 is a determination result display
means.
[0078]
The sound quality adjustment apparatus 8 according to the present embodiment includes the
musicality detection means 81a, the speechiness detection means 81b, the speech / non-speech
determination means 82, the sound / silence determination means 83, the sound quality
adjustment means 85, the audio signal output means 86, and the determination result display
means 87.
The sound / silence determination means 83 determines whether the audio signal input by the
audio signal input means 80 is in the sound state or the silent state. The input source and
the input method of the audio signal input means 80 are not limited. The sound / silence
determination means 83 may determine whether the signal is in the sound state or the silent
state by, for example, detecting the signal level of the input audio signal (such as treating
a level at or above a predetermined level as the sound state). The sound / silence
determination means 83 may be configured entirely or partially of hardware or software.
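A level-based sound / silence determination of this kind could look like the following Python sketch. The use of the RMS level, the dB scale relative to a full-scale amplitude of 1.0, and the -60 dB default are all assumptions made for illustration; the text only specifies comparison of the signal level with a predetermined level.

```python
import math


def is_sound_present(samples, threshold_db: float = -60.0) -> bool:
    """Return True if the block of samples is judged to contain sound.

    samples: sequence of floats normalized to the range -1.0..1.0.
    """
    if not samples:
        return False
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    if rms == 0.0:
        return False  # digital silence; avoids log10(0)
    level_db = 20.0 * math.log10(rms)
    # At or above the predetermined level counts as the sound state.
    return level_db >= threshold_db
```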
[0079]
The sound quality adjustment means 85 sets different sound qualities for the sound state and
the silent state of the audio signal, based on the determination result of the speech /
non-speech determination means 82 (similar to that described with reference to FIG. 1 and
elsewhere) and the determination result of the sound / silence determination means 83, and
adjusts the sound quality based on the setting. The sound quality adjustment means 85 may be
configured entirely or partially of hardware or software. The sound quality setting for
silence is made by changing only a part of the sound quality setting that was in effect for
sound immediately before the sound / silence determination means 83 determined silence. For
example, in the case of silence, the output levels of a predetermined low band and a
predetermined high band may be lowered by 1 to 2 dB compared with the case of sound. Because
only a part is changed, the adjustment uses set values close to those of the preceding sound
state. When the signal returns from the silent state to the sound state, it is expected to
have a signal level close to that of the previous sound state, so the set values change little
and the setting can be restored quickly. This effect becomes more pronounced when the sound
quality setting of the sound quality adjustment means 85 is implemented in hardware. The audio
signal output means 86 then outputs the audio signal adjusted by the sound quality adjustment
means 85.
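The partial change for silence can be sketched in Python as a small transformation of the previous setting. The representation of a sound quality setting as a dictionary of band gains in dB, the band names, and the 1.5 dB default are hypothetical; the text only gives the range of 1 to 2 dB for a predetermined low band and high band.

```python
def silence_setting(sound_setting: dict, cut_db: float = 1.5,
                    bands=("low", "high")) -> dict:
    """Derive the setting for silence from the preceding sound setting.

    Only the listed bands are lowered; all other values are kept, so
    returning to the sound state needs only a small change.
    """
    adjusted = dict(sound_setting)  # copy, leave the original untouched
    for band in bands:
        adjusted[band] = sound_setting[band] - cut_db
    return adjusted
```

For a previous setting of `{"low": 3.0, "mid": 0.0, "high": 2.0}`, the silence setting becomes `{"low": 1.5, "mid": 0.0, "high": 0.5}`.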
[0080]
The musicality detection means 81a, the speechiness detection means 81b, and the speech /
non-speech determination means 82 are as described with reference to FIG. 1, but the example
shown here does not perform the optimization of the threshold based on the monaural / stereo
determination. The accuracy of the detection function can nevertheless be improved by
optimizing the determination criteria of the automatic speech detection function by the
monaural / stereo determination. The parameter corresponding to α in the calculation formula
table 51 may also be made different depending on sound / silence. Further, instead of using
the speech / non-speech determination means 82, detailed time-series information on the
content may be acquired from EPG information, in which case the determination result is
displayed based on the acquired information. The arrangement of the speech / non-speech
determination means 82 is not limited to that shown in the drawing. Based on the determination
result of the speech / non-speech determination means 82, the sound quality adjustment means
85 in this embodiment may make the values of the partial change different between speech and
non-speech.
[0081]
The sound quality setting method here is arbitrary, and the set values, the amounts of
increase / decrease, or the set values in each frequency band may differ depending on speech /
non-speech. For example, a sound quality setting in which the center frequency of the
equalizer and the Q value of the filter are fixed, as in a graphic equalizer, or a sound
quality setting in which these can be changed, as in a parametric equalizer, may be used.
The sound quality setting at the transition from sound to silence is a partial change of the
setting for the preceding sound. Furthermore, as in the example of lowering the output levels
of the predetermined low band and the predetermined high band by 1 to 2 dB in the case of
silence compared with the case of sound, the partial change preferably lowers the output level
locally in the frequency band.
[0082]
Further, the determination result display means 87 is means for making the user visually
recognize the result of the speech / non-speech determination, but it may similarly make the
user visually recognize the result of the sound / silence determination.
[0083]
FIG. 15 is a flow chart for explaining an example of sound quality adjustment processing in the
sound quality adjustment device of FIG. 14, and FIG. 16 is a view showing an example of sound
quality setting equalization used in sound quality adjustment processing in the sound quality
adjustment device of FIG. is there.
Here, FIG. 16A shows an example of speech and FIG. 16B shows an example of non-speech.
[0084]
It is assumed that the sound quality is initially set to the basic sound quality. An example will
be described in which speech / non-speech is determined from the audio signal, sound quality A is
set when the signal is determined to be speech, and sound quality B is set when it is determined
to be non-speech.
[0085]
First, the input level is confirmed by the sound / silence determination means 83 (step S41).
Here, if there is sound, the process proceeds to step S45, and if it is silent, the basic sound
quality is corrected (step S42), and the input level is confirmed again in step S41. In step S42,
if the silent state determined in step S41 is the second or a later occurrence, the basic sound
quality need not be corrected again; in that case, the setting already corrected is simply
retained. The processes in steps S41 and S42 are carried out before an audio signal is input and
the sound quality is first set to either sound quality A or B; thereafter, setting changes and
retention are carried out in the processes from step S43 onward.
[0086]
Next, the musicality detection means 11a and the speechiness detection means 11b detect the degree
of musicality and the degree of speechiness (steps S43 and S44). The order of steps S43 and S44
does not matter. Next, speech / non-speech is determined (step S45). The speech / non-speech
determination may be made by threshold processing on a single parameter or by threshold processing
on multiple parameters. The sound quality is then set and adjusted based on the determination in
step S45 (steps S46 and S47). In this sound quality setting, when the signal is determined to be
speech, sound quality A is selected to adjust the sound quality (step S46), and when it is
determined not to be speech, sound quality B is selected to adjust the sound quality (step S47).
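Steps S43 to S47 can be sketched as follows. This is only an illustrative sketch: the score scale, the threshold values, the combined score (speechiness minus musicality), and the names `Detection`, `is_speech`, `is_speech_multi`, and `select_sound_quality` are assumptions made here, since the passage states only that the determination may use one threshold or thresholds on multiple parameters.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Detection:
    musicality: float   # result of step S43 (the 0..1 scale is an assumption)
    speechiness: float  # result of step S44 (the 0..1 scale is an assumption)


def is_speech(d: Detection, threshold: float = 0.0) -> bool:
    """Step S45 using a single threshold on a hypothetical combined score."""
    return (d.speechiness - d.musicality) > threshold


def is_speech_multi(d: Detection, min_speechiness: float = 0.6,
                    max_musicality: float = 0.4) -> bool:
    """Step S45 using thresholds on multiple parameters (values are illustrative)."""
    return d.speechiness >= min_speechiness and d.musicality <= max_musicality


def select_sound_quality(d: Detection,
                         judge: Callable[[Detection], bool] = is_speech) -> str:
    """Steps S46 / S47: sound quality A for speech, sound quality B otherwise."""
    return "A" if judge(d) else "B"
```

The two judging functions can disagree on borderline input: `Detection(musicality=0.5, speechiness=0.7)` is speech under the single-threshold rule but not under the multi-parameter rule, which is why the choice of criterion matters.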
[0087]
Here, an example of the difference between the sound quality setting A and the sound quality
setting B will be described with reference to FIG. In the case of the sound quality setting A
(speech), the frequency characteristic of the equalizer is set as indicated by the graph 91, and in
the sound quality setting B (non-speech), the frequency characteristic of the equalizer is set as
indicated by the graph 93. The difference between the graph 91 and the graph 93 is that, in the
case of non-speech, the output level in the vicinity of the predetermined low frequency 93a and in
the vicinity of the predetermined high frequency 93b is emphasized compared with the output level
in the vicinity of the predetermined low frequency 91a and in the vicinity of the predetermined
high frequency 91b.
[0088]
In the processes of steps S46 and S47, the selected sound quality is held, and then in step S48,
the speech / non-speech determination result on which the selection was based is displayed. Then,
the sound / silence determination means 83 confirms the input level (step S49). Here, if there is
sound, the processing is ended, and if it is silent, the sound quality is adjusted. The adjustment
performed here corrects the sound quality according to the previous state (step S50). When the
sound quality set and held (the sound quality before silence) is sound quality A, it is corrected
to sound quality A' as shown in the graph 92 of FIG.; when it is sound quality B, it is corrected
to sound quality B' as shown in the graph 94 of FIG. The difference between the graph 92 and the
speech graph 91 is that the graph 91 emphasizes the vicinity of the predetermined low frequency
91a and the vicinity of the predetermined high frequency 91b. Similarly, the difference between
the graph 94 and the non-speech graph 93 is that the graph 93 emphasizes the vicinity of the
predetermined low frequency 93a and the vicinity of the predetermined high frequency 93b. In this
embodiment, when the automatic speech detection function is used, sound quality settings for
silence such as A' and B', that is, settings for when there is no audio input signal or when the
input signal is small (background noise), are provided in addition to the sound quality settings A
and B for when sound is present.
[0089]
Next, it is determined whether the silent state has returned to the sounded state (step S51). If
it has not returned and remains silent, the setting at that time (such as the sound quality
parameters) is kept unchanged while waiting for the return to the sounded state. On the other
hand, if sound has returned, the sound quality A' or the sound quality B' is returned to the sound
quality setting A or B used when sound is present (step S52), and the process is ended.
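The handling of silence in steps S49 to S52 amounts to a small state machine, sketched below under stated assumptions. The level threshold, the function names `has_sound`, `update_quality`, and `run`, and the representation of settings as the strings "A", "A'", "B", "B'" are hypothetical. The sketch also keeps looping over input levels, whereas the flow chart in the text ends after step S52.

```python
from typing import Dict, Iterable, List

# Mapping between the sounded settings and their silent counterparts.
SILENT_VARIANT: Dict[str, str] = {"A": "A'", "B": "B'"}
SOUNDED_VARIANT: Dict[str, str] = {v: k for k, v in SILENT_VARIANT.items()}


def has_sound(level: float, threshold: float = 0.01) -> bool:
    """Steps S49 / S51: sound / silence check (the threshold is an assumed value)."""
    return level > threshold


def update_quality(current: str, level: float) -> str:
    """Return the sound quality setting to use for the given input level."""
    sounded = has_sound(level)
    if not sounded and current in SILENT_VARIANT:
        return SILENT_VARIANT[current]   # step S50: A -> A', B -> B'
    if sounded and current in SOUNDED_VARIANT:
        return SOUNDED_VARIANT[current]  # step S52: A' -> A, B' -> B
    return current                       # otherwise hold the current setting


def run(levels: Iterable[float], start: str = "A") -> List[str]:
    """Apply update_quality over a sequence of input levels and record each state."""
    quality = start
    trace = []
    for level in levels:
        quality = update_quality(quality, level)
        trace.append(quality)
    return trace
```

For example, `run([0.5, 0.0, 0.0, 0.3], start="A")` yields `["A", "A'", "A'", "A"]`: the setting drops to A' when the input falls silent, is held while silence continues, and returns to A as soon as sound is detected again.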
[0090]
As described above, performing the speech / non-speech determination as in the present embodiment
solves the following problems of the prior art. That is, in the prior art, not only is it
difficult to adjust the sound quality accurately because of erroneous determinations caused by
judging whether a signal is music from the audio information alone, but low-band to high-band
noise is also output from the speaker when the audio signal is silent or its input level is small.
Even if the equipment is configured to perform sound quality adjustment such as shutting out the
input signal when the signal level is 0 or small in order to avoid this, the sound quality cannot
be set accurately and quickly when the signal level rises and sound returns. This is particularly
problematic for audio signals whose level changes rapidly, such as when loading a recording
medium, when switching to external input, when switching from speech to non-speech, when switching
the received channel, or when moving from a commercial (CM) to the main program.
[0091]
That is, according to the sound quality adjustment device according to the present embodiment, it
is possible to reduce the output of low-band to high-band noise from the speaker at the time of
silence, and, because the sound quality is set close to the previous state, a quick response
(sound quality setting) becomes possible when sound returns. In other words, even for an audio
signal whose input level changes rapidly, this sound quality adjustment device can perform sound
quality settings that properly reduce noise output during silence and quickly return to the
sounded state.
[0092]
According to the present embodiment, in addition to these effects, the speech / non-speech
determination is made not only from the audio information of the audio signal but also according
to the purport of the program containing the audio signal. Therefore, erroneous control of
acoustic parameters such as an equalizer caused by the characteristics of the input audio signal
can be reduced as much as possible, and the acoustic parameters can be controlled accurately and
the sound quality adjusted properly. In addition, when speech / non-speech is determined for the
audio signal, the user can visually recognize the determination result. For example, the purport
of the program is judged from the monaural / stereo signal superimposed on the audio signal
together with the audio information, and the criterion for determining whether the input audio
signal is speech or non-speech (music) is optimized according to the result. This enables speech /
non-speech detection to be controlled freely according to the content and characteristics of the
broadcast program, sound quality adjustment based on that control, and presentation of the
detection results to the user.
[0093]
Further, the sound quality adjustment device 8 described above with reference to FIGS. 14 to 16
can also be incorporated in the content display device, similarly to the sound quality adjustment
device shown in FIG. Further, each means serving as a component of the sound quality adjustment
device 8 or the content display device may be configured by hardware, but part of them may be
configured by software. An example configured by incorporating a program into a general-purpose
computer such as a PC (personal computer), and an example of a computer-readable recording medium
storing the program, are as described with reference to FIG., except that the stored program is
different. This program causes a computer to execute processing steps corresponding to the
respective means described above, that is, a sound / silence determination step, a speech /
non-speech determination step, a sound quality adjustment step, and a determination result display
step based on the speech / non-speech determination. The sound quality setting at the time of
silence in the sound quality adjustment step is obtained by changing only part of the sound
quality setting in effect while sound was present, immediately before silence was determined in
the sound / silence determination step. Further, when the sound quality adjustment is performed by
a sound quality adjuster (hardware), the sound quality adjustment step is a step of performing
control to cause the sound quality adjuster to adjust the sound quality of the audio signal based
on the sound quality setting.
[0094]
It is a block diagram showing a configuration example of a sound quality adjustment device according to one embodiment of the present invention.
It is a flowchart for explaining an example of the sound quality adjustment processing and the determination result display processing in the sound quality adjustment device of FIG. 1.
It is a diagram showing an example of the sound quality setting equalization used in the sound quality adjustment processing in the sound quality adjustment device of FIG.
It is a diagram showing an example of the screen display in the determination result display processing of FIG.
It is a block diagram showing a configuration example of a television receiver, which is one application example of the sound quality adjustment device of FIG.
It is a diagram showing an example of the calculation formula table stored in the microcomputer in FIG.
It is a diagram showing an example of the mark display target table stored in the microcomputer in FIG.
It is a flowchart for explaining the speech / non-speech determination and the determination result display processing in the television receiver of FIG. 5.
It is a flowchart for explaining the determination result display processing in the television receiver of FIG.
It is a diagram showing an example of the setting screen for the determination result display in the sound quality adjustment device of FIG.
It is a diagram showing an example of the setting screen for the determination result display in the sound quality adjustment device of FIG.
It is a diagram showing an example of the setting screen for the determination result display in the sound quality adjustment device of FIG.
It is a block diagram showing a configuration example of a general information processing device.
It is a block diagram showing a configuration example of a sound quality adjustment device according to another embodiment of the present invention.
It is a flowchart for explaining an example of the sound quality adjustment processing in the sound quality adjustment device of FIG.
It is a diagram showing an example of the sound quality setting equalization used in the sound quality adjustment processing in the sound quality adjustment device of FIG.
Explanation of reference signs
[0095]
1, 8: sound quality adjustment device, 4: television receiver, 7: information processing device,
10, 80: audio signal input means, 11a, 81a: musicality detection means, 11b, 81b: speechiness
detection means, 12, 82: speech / non-speech determination means, 13: monaural / stereo
determination means, 14: reference optimization means, 14a: switch, 14b: setting means for
threshold VSL1, 14c: setting means for threshold VSL2, 15, 85: sound quality adjustment means,
16, 86: audio signal output means, 17, 87: determination result display means, 40: tuner part,
41: external input part, 42: main body operation part, 43: video processing IC, 44: microcomputer,
45: audio processing IC, 46: display, 47L, 47R: speaker, 48: light receiver, 49: remote
controller, 71: CPU, 72: RAM, 73: rewritable ROM, 74: input device, 75: display, 76: output
device, 77: bus, 83: sound / silence determination means.