Patent Translate
Powered by EPO and Google

Notice: This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate, complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or financial decisions, should not be based on machine-translation output.

DESCRIPTION JP2013179388

Abstract: Provided is an acoustic signal enhancement device that accurately obtains an estimate of the direct-to-reverberant ratio (DRR) of an acoustic signal and uses it to enhance the acoustic signal of a sound source. A direct sound direction power estimation unit obtains a power estimate of a direct sound direction signal, which is obtained by applying a predetermined beamformer, realized by a microphone array, that passes only signal components arriving from the direct sound source direction. A reverberation direction power estimation unit obtains a power estimate of a reverberation direction signal, which is obtained by applying a beamformer that has the same directivity shape as the above beamformer but whose main beam avoids the direct sound source direction, so that signal components arriving from directions other than the direct sound source direction are passed. A DRR estimation unit uses the power estimate of the frequency domain signal and the power estimate of the reverberation direction signal to obtain a DRR estimate representing the ratio of the power estimate of the direct sound to the power estimate of the reverberation direction signal. [Selected figure] Figure 6

Acoustic signal enhancement device, distance determination device, method thereof and program

[0001] TECHNICAL FIELD The present invention relates to a technique for estimating the direct-to-reverberant ratio of an acoustic signal.
[0002] 11-04-2019 1 In the prior art disclosed in Patent Document 1, the direct-to-reverberant ratio (DRR) is obtained by converting the signals received by a microphone array into the frequency domain and calculating the powers of the direct sound and the indirect sound from the spatial correlation matrix of the converted signals (see, for example, paragraphs [0034] to [0061] of the first embodiment).
[0003] Patent Document 1: JP 2009-201724 A
[0004] The method disclosed in Patent Document 1 cannot distinguish between direct sound and indirect sound arriving from the same direction, so every sound arriving from the direction of the direct sound is judged to be direct sound. As a result, the direct sound power is overestimated (or the indirect sound power is underestimated), and the DRR finally obtained is larger than the true value.
[0005] The present invention has been made in view of this problem. By distinguishing the reverberant sound that arrives from the direction of the direct sound and estimating the direct sound power and the reverberant sound power separately, it obtains a direct-to-reverberation energy ratio (DRR) closer to the true value than the conventional method. Its object is to provide an acoustic signal enhancement device that enhances the acoustic signal of a sound source with high accuracy on the basis of this accurate DRR, a distance determination device, and methods and programs for them.
[0006] An acoustic signal enhancement device according to the present invention includes a received sound power estimation unit, a direct sound direction power estimation unit, a reverberation direction power estimation unit, a subtraction unit, a DRR calculation unit, and a target signal adjustment unit.
The received sound power estimation unit obtains a power estimate of a frequency domain signal, using the frequency domain signal obtained by converting the signals received by the plurality of microphones of a microphone array into the frequency domain. The direct sound direction power estimation unit obtains either a power estimate of a direct sound direction signal obtained by applying, to the frequency domain signal, processing that mainly passes signal components arriving from the direct sound source direction, or a power estimate of a direct sound direction signal obtained by applying that processing to the received signal and converting the result into the frequency domain. The reverberation direction power estimation unit obtains either a power estimate of a reverberation direction signal obtained by applying, to the frequency domain signal, processing that has the same directivity shape as the processing of the direct sound direction power estimation unit but mainly passes signal components arriving from directions other than the direct sound source direction, or a power estimate of a reverberation direction signal obtained by applying such processing to the received signal and converting the result into the frequency domain. The subtraction unit outputs a direct sound power estimate obtained by subtracting the power estimate of the reverberation direction signal from the power estimate of the direct sound direction signal.
The DRR calculation unit uses the power estimate of the frequency domain signal and the power estimate of the reverberation direction signal to obtain a direct-to-reverberant ratio (DRR) estimate, which represents the ratio of the power estimate of the direct sound to the power estimate of the reverberation direction signal. The target signal adjustment unit obtains a processed signal by multiplying a processing target signal, obtained from the received signal, by a gain that depends on the DRR estimate. The gain applied to a processing target signal whose DRR estimate is larger than a predetermined threshold is larger than the gain applied to a processing target signal whose DRR estimate is smaller than the threshold.
[0007] The distance determination device of the present invention includes, like the acoustic signal enhancement device described above, a received sound power estimation unit, a direct sound direction power estimation unit, a reverberation direction power estimation unit, a subtraction unit, and a DRR calculation unit, and further includes a distance determination unit. The distance determination unit determines the distance of the sound source in a judgment section by comparing a judgment value with a plurality of reference values. The judgment value corresponds to the DRR estimate obtained from the signal received in the judgment section, which contains one or more frames. The reference values correspond to a plurality of DRR estimates obtained from the signal received in a reference section, which contains more frames than the judgment section.
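The gain rule stated for the target signal adjustment unit in [0006] can be illustrated with a minimal sketch. It assumes a simple two-level gain; the function names, the threshold of 0 dB, and the gain values 1.0 and 0.1 are hypothetical and are not taken from the specification.

```python
def drr_gain(drr_db, threshold_db=0.0, high_gain=1.0, low_gain=0.1):
    """Gain for one time-frequency bin, chosen from its DRR estimate in dB.

    All default values are illustrative assumptions, not values from the patent.
    """
    # Bins whose DRR exceeds the threshold receive the larger gain.
    return high_gain if drr_db > threshold_db else low_gain


def adjust_target_signal(y_bins, drr_db_bins, **gain_kwargs):
    """Multiply each bin of the processing target signal by its DRR-dependent gain."""
    return [y * drr_gain(d, **gain_kwargs) for y, d in zip(y_bins, drr_db_bins)]


if __name__ == "__main__":
    y = [1 + 1j, 2 + 0j]   # two bins of a processing target signal
    drr = [6.0, -6.0]      # DRR estimates in dB for those bins
    print(adjust_target_signal(y, drr))
```

A smooth gain curve would satisfy the same condition, as long as the gain above the threshold stays larger than the gain below it.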
[0008] The acoustic signal enhancement device of the present invention enhances the acoustic signal of a sound source using the direct-to-reverberant ratio (DRR) estimate obtained by the DRR estimation method of the present invention. This estimation method is new in that it focuses on the isotropy of the arrival direction of reverberation, which results from its strongly diffuse nature. Using two or more beamformers with the same directivity shape, realized by a microphone array, it distinguishes the direct sound from the reverberant sound among the signals arriving from the direct sound direction and estimates their respective powers correctly. As a result, the estimation accuracy of the DRR improves, which makes it possible to enhance the acoustic signal of the sound source accurately.
[0009] Because the distance determination device of the present invention determines the distance of sound sources whose sounds are emitted at different times on the basis of the DRR estimate obtained by the DRR estimation method of the present invention, it can also make accurate determinations.
[0010] Brief description of the drawings:
FIG. 1 shows an example of a scene in which the acoustic signal enhancement device 400 is used.
FIG. 2 shows the propagation paths of sound in a room.
FIG. 3 shows the relationship between the DRR and the distance from the microphone.
FIG. 4 conceptually shows the principle corresponding to each embodiment.
FIG. 5 shows two beamformers that have the same directivity shape with their main beams turned in different directions: (a) is a beamformer whose beam is directed toward the sound source, and (b) is a beamformer that points a null toward the sound source.
FIG. 6 shows an example of the functional configuration of the acoustic signal enhancement device 400 of the first embodiment.
FIG. 7 shows the operation flow of the acoustic signal enhancement device 400.
FIG. 8 shows an example of the functional configuration of the processing target signal generation unit 43.
FIG. 9 shows an example of the functional configuration of the DRR calculation unit 44.
FIG. 10 shows an example of the functional configuration of the DRR calculation unit 44′.
FIG. 11 schematically shows an example of the directivity shape of each of the reverberation directivity forming units 44311 to 4431N.
FIG. 12 shows an example of the functional configuration of the DRR calculation unit 44′′.
FIG. 13 shows an example of the functional configuration of the distance determination device 130 of the second embodiment.
FIG. 14 shows the experimental conditions of the effect confirmation experiment.
FIG. 15 shows the simulation results of DRR estimation.
FIG. 16 shows an example of the functional configuration of the DRR estimation device 160.
[0011] Hereinafter, embodiments of the present invention will be described with reference to the drawings. The same reference numerals are given to the same components in the drawings, and their description will not be repeated. In the following description, symbols such as “¯” and “^” used in the text should properly be written directly above the character that follows them, but because of the limitations of text notation they are written immediately before that character. In formulas, these symbols are written in their original positions.
[0012] Before describing the embodiments, the principle corresponding to each embodiment will be described.
[Principle] The acoustic signal enhancement device of the first embodiment uses a single microphone array to enhance or suppress only the sound within a specific distance range from the microphone array, and thereby collects the sound of sound sources within a predetermined range. The distance determination device of the second embodiment determines the distance of the sound source of a received signal.
[0013] FIG. 1 illustrates a scene in which the acoustic signal enhancement device 400 of the first embodiment is used. Assume, for example, that a small microphone array 11 is surrounded by four talkers 12 to 14, and that a television 16, a telephone 17, and a loudspeaker 18 for in-house broadcasting are placed in the conference room. In such a scene, one wants to pick up only the utterances of the talkers 12 to 14, who are within a predetermined distance range centered on the small microphone array 11 (inside the circle indicated by a broken line), without picking up the in-house broadcast, the sound of the telephone, and so on.
[0014] Therefore, in order to identify the distance from the microphone array to the sound source, attention is paid to the ratio of direct sound to indirect sound (also referred to as reverberation) contained in the received sound (hereinafter referred to as the direct-to-reverberant ratio, DRR). FIG. 2 shows the propagation paths of sound from the sound source 21 to the microphone 22 when the microphone is placed indoors and sound is recorded. The direct sound is the sound wave, indicated by a thick solid line, that travels directly from the sound source 21 to the microphone 22. The reverberant sound is the sound wave, indicated by a broken line, that reaches the microphone 22 after the sound emitted from the sound source 21 has been reflected by a wall, the floor, the ceiling, or the like.
[0015] FIG.
3 shows the relationship between the direct-to-reverberant ratio (DRR) and the distance from the microphone. The horizontal axis in FIG. 3 is the distance from the microphone to the sound source, and the vertical axis is the DRR. In general, the indirect sound has a constant magnitude that does not depend on the distance from the microphone. In contrast, the direct sound decreases monotonically as the distance from the microphone increases. The DRR, obtained by dividing the direct sound by the indirect sound, therefore decreases monotonically with increasing distance, as the direct sound does.
[0016] From this DRR, it is possible to estimate whether a source lies within a predetermined distance range around the microphone array 11. It therefore becomes possible to enhance only the acoustic signal from a desired sound source by using the DRR.
[0017] FIG. 4 conceptually illustrates the principle of the DRR estimation of the present invention. It is generally known that, when there is sufficient reverberation, the reverberation can be assumed to be diffuse and can be modeled as sound arriving with the same magnitude from all directions as seen from the microphone. When an arbitrary beamformer BF1 is applied to the output signals of the small microphone array 11, the reverberation direction power 23 is received with a predetermined directivity shape D1. The three arrows of the reverberation direction power 23 schematically represent the magnitude of the reverberation received through the directivity shape D1.
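The distance dependence described for FIG. 3 can be illustrated numerically. The sketch below assumes a point source whose direct sound power follows the inverse-square law and a reverberant power that is constant over the room; the specification only states that the direct sound decreases monotonically, so the 1/r² law, the function name, and the critical-distance parameter (the distance at which direct and reverberant power are equal) are assumptions made for illustration.

```python
import math


def drr_vs_distance_db(distance_m, critical_distance_m=1.0):
    """Illustrative DRR in dB at a given source distance.

    Assumptions (not from the patent): direct power falls off as 1/r^2 and
    reverberant power is constant, equal to the direct power at the
    critical distance.
    """
    direct_power = 1.0 / distance_m ** 2
    reverberant_power = 1.0 / critical_distance_m ** 2
    return 10.0 * math.log10(direct_power / reverberant_power)


if __name__ == "__main__":
    for r in (0.5, 1.0, 2.0, 4.0):
        print(f"{r:4.1f} m -> {drr_vs_distance_db(r):6.2f} dB")
```

Under these assumptions the DRR drops by about 6 dB for every doubling of distance, which is why a threshold on the DRR corresponds to a distance range around the array.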
[0018] Assume that the position of the sound source 21 is known. If the directivity shape D0 of a beamformer BF0 is made the same as D1 and its directivity is pointed toward the sound source 21, the beamformer receives the direct sound direction power 26. This power consists of the direct sound power 25, which arrives directly from the sound source 21 at the small microphone array 11, plus reverberant power of the same magnitude as the reverberation direction power 23.
[0019] The direct sound power 25 can therefore be obtained by subtracting the reverberation direction power 23 from the direct sound direction power 26, which contains the same reverberant component as the reverberation direction power 23. Next, this principle is described theoretically.
[0020] <Isotropic Arrival Model of Reverberation> The proposed method introduces a model that takes the isotropy of reverberation into account. An example that uses a power spectral density, or its estimate, as the power estimate is described here, but this does not limit the present invention.
[0021] When the signal received by the m-th microphone of a microphone array consisting of M (M ≥ 2) microphones is converted into the frequency domain by a short-time Fourier transform or the like, the following frequency domain signal X^(m)(ω, t) is obtained.
[0022]
[0023] Here, ω is the frequency, H_D^(m)(ω) is the transfer function of the direct sound from the sound source to the m-th microphone, H_R^(m)(ω) is the transfer function of the indirect sound from the sound source to the m-th microphone, and S(ω, t) is the sound source signal converted into the frequency domain. t is the index of the time frame.
[0024] Here, it is assumed that the direct sound is coherent, while the indirect sound is diffuse because its main component is reverberation.
That is, in terms of direction of arrival, the direct sound arrives only from the direction of the sound source, while the indirect sound arrives with uniform power from all directions (hereinafter referred to as isotropy). The proposed method estimates the direct sound power and the indirect sound power by exploiting this difference in spatial arrival characteristics, and obtains the direct-to-reverberant ratio (DRR) from them.
[0025] As preconditions, the arrival direction of the direct sound (hereinafter referred to as the “direct sound source direction”) is known, the direct sound and the indirect sound arriving from any direction can be regarded as plane waves, and the direct sound and the indirect sound are uncorrelated with each other. Under these conditions, the transfer functions H_D^(m)(ω) and H_R^(m)(ω) of the direct sound and the indirect sound from the sound source to the m-th microphone can be expressed as follows.
[0026]
[0027] Here, H_Dref(ω) is the direct sound component of the transfer function from the sound source to a reference point of the microphone array (referred to as the “reference point”), and H_Rref,θ(ω) is the indirect sound component arriving from direction θ as seen from the reference point. The reference point may be inside the microphone array or at the position of any microphone of the array.
[0028] Each of the transfer functions H_D^(m)(ω) and H_R^(m)(ω) of the direct and indirect sounds is expressed by a transfer function component from the sound source to the reference point and a phase difference component due to the propagation delay from the reference point to the m-th microphone. Therefore, the microphone array input vector →x(ω, t) = [X^(1)(ω, t), ..., X^(M)(ω, t)]^T, whose elements are the frequency domain signals X^(m)(ω, t) (m ∈ {1, ..., M}), is represented by the following formula. T represents transposition.
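The formula referred to here (equation (4)) and the array manifold vector of equation (5) are not legible in this text. A plausible reconstruction, based only on the definitions in the surrounding paragraphs, is given below. The sign of the exponent, the integration range, and the use of θ_D for the direct sound source direction are assumptions.

```latex
\begin{align}
\vec{x}(\omega,t) &= \vec{a}_{\theta_D}(\omega)\,S_D(\omega,t)
  + \int_{\theta} \vec{a}_{\theta}(\omega)\,S_{R,\theta}(\omega,t)\,d\theta \tag{4}\\
\vec{a}_{\theta}(\omega) &= \bigl[\,e^{-j\omega\tau_{\theta}^{(1)}},\ \dots,\ e^{-j\omega\tau_{\theta}^{(M)}}\,\bigr]^{T} \tag{5}
\end{align}
```

In this form, the first term of (4) is the coherent direct sound seen through the array manifold vector of the source direction, and the second term collects the indirect sound arriving from every direction θ.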
[0029]
[0030] Here, S_D(ω, t) = H_Dref(ω) S(ω, t) and S_R,θ(ω, t) = H_Rref,θ(ω) S(ω, t). →a_θ(ω) is the array manifold vector in the direction θ, expressed by equation (5). Each element of the array manifold vector depends on the propagation delay τ_θ^(m). When the direct sound and the indirect sound can be regarded as plane waves, the propagation delay τ_θ^(m) depends on the position of each microphone relative to the reference point of the microphone array and on the direction θ. For details of the array manifold vector, see, for example, Reference 1: Futoshi Asano, "Array signal processing of sound: localization, tracking and separation of sound sources" (The Acoustical Society of Japan, Acoustic Technology Series), Corona Publishing Co., Ltd., February 25, 2011, ISBN 978-4-339-01116-6, Chapter 1 (pp. 1-26).
[0031] When an arbitrary beamformer (BF) is applied to this microphone array input, its output power spectral density (PSD) is, as shown in equation (6), the sum of the PSDs of the direct sound and the indirect sound, each multiplied by the power gain |D_θ(ω)|^2 of the beamformer.
[0032]
[0033] Here, P_D(ω) = E[|S_D(ω, t)|^2]_t, P_R,θ(ω) = E[|S_R,θ(ω, t)|^2]_t, →w(ω) is the filter coefficient vector of the beamformer, and R(ω) is the spatial correlation matrix of the microphone array input signal, whose (i, j) element is R_ij(ω) = E[X_i(ω, t) X_j^*(ω, t)]_t. E[·] represents the expectation operation.
[0034] In a sound field where the indirect sound is assumed to arrive isotropically, the reverberant sound power P_R,θ(ω) in equation (6) is a constant that does not depend on the direction θ. It can therefore be replaced by P_R(ω), and the output power spectral density can be expressed by equation (7).
[0035]
[0036] Here, assuming that there are two beamformers BF0 and BF1 having the same directivity shape as shown in FIG.
5 and with their main beams directed in different directions, the factor ∫_θ |D_θ(ω)|^2 dθ in the second term on the right side of equation (7) is equal for both, and the outputs of the two beamformers differ only in the first term on the right side, that is, in the power gain of each beamformer for the direct sound.
[0037] Therefore, the direct sound power 25 can be determined by subtracting the output power spectral density P1(ω) of the beamformer BF1, which points a null (a direction of low sensitivity) toward the sound source, from the output power spectral density P0(ω) of the beamformer BF0, which points its beam toward the sound source.
[0038]
[0039] According to the above principle, reverberant sound arriving from the direct sound source direction can be distinguished from the direct sound, and as a result the estimation accuracy of the direct-to-reverberant ratio (DRR) can be improved.
[0040] FIG. 6 shows an example of the functional configuration of the acoustic signal enhancement device 400 of the first embodiment. The operation flow is shown in FIG. The acoustic signal enhancement device 400 includes a microphone array 41, a plurality of frequency domain conversion units 421 to 42M, a processing target signal generation unit 43, a DRR calculation unit 44, a target signal adjustment unit 45, and an inverse frequency domain conversion unit 46. Each functional component except the microphone array 41 is realized, for example, by loading a predetermined program into a computer that includes a ROM, a RAM, a CPU, and the like, and having the CPU execute the program.
[0041] The microphone array 41 comprises a plurality of microphones m1, ..., mM. The frequency domain conversion units 421, ..., 42M receive the received signals xm(n) picked up by the microphones m1, ..., mM, respectively, and convert each received signal into a signal in the frequency domain (step S42).
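The frame-wise conversion of step S42 can be sketched as follows. The sketch assumes non-overlapping frames and no window function, neither of which is specified in the text, and uses a naive discrete Fourier transform for clarity rather than speed. The function names are hypothetical, and the default frame length of 256 samples is only an example value.

```python
import cmath


def dft(frame):
    """Naive discrete Fourier transform of one frame (O(N^2), for illustration)."""
    n = len(frame)
    return [
        sum(frame[k] * cmath.exp(-2j * cmath.pi * b * k / n) for k in range(n))
        for b in range(n)
    ]


def to_frequency_domain(x, frame_len=256):
    """Split x into non-overlapping frames and transform each one.

    Returns a list indexed as result[t][b]: frame number t, frequency bin b.
    Trailing samples that do not fill a whole frame are dropped.
    """
    n_frames = len(x) // frame_len
    return [dft(x[t * frame_len:(t + 1) * frame_len]) for t in range(n_frames)]


if __name__ == "__main__":
    X = to_frequency_domain([1.0] * 8, frame_len=4)
    print(len(X), [round(abs(v), 6) for v in X[0]])
```

A practical implementation would typically use an FFT routine with overlapping, windowed frames; that choice does not change the downstream power estimation.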
The frequency domain conversion units 421, ..., 42M sample the received signal xm(n) at a sampling frequency of, for example, 16 kHz and convert it into a digital signal. They then group, for example, 256 samples into one frame, perform a discrete Fourier transform on each frame, and output the frequency components Xm(ω, t) (step S42). ω is the frequency and t is the frame number. The A/D converter that converts the received signal xm(n) into a digital signal is omitted.
[0042] The processing target signal generation unit 43 combines the frequency domain signals Xm(ω, t) output by the frequency domain conversion units 421, ..., 42M to generate the processing target signal Y(ω, t) (step S43).
[0043] The DRR calculation unit 44 receives as input the frequency domain signals Xm(ω, t) output by the frequency domain conversion units 421, ..., 42M and calculates the direct-to-reverberant ratio estimate DRR(ω, t) of the received signal (step S44). The operation of the DRR calculation unit 44 is described in detail later.
[0044] The target signal adjustment unit 45 receives the processing target signal Y(ω, t) and the DRR estimate DRR(ω, t) as inputs, adjusts the amplitude of the processing target signal Y(ω, t) according to the value of the estimate, and generates the processed signal Z(ω, t) (step S45).
[0045] The inverse frequency domain conversion unit 46 converts the processed signal Z(ω, t) into a time domain signal z(n) (step S46). The operations from step S41 to step S46 are repeated until all of the received signals xm(n) have been processed.
[0046] Here, adjustment according to the value of the DRR estimate DRR(ω, t) means processing such as thresholding DRR(ω, t), or setting the amplitude of the processed signal Z(ω, t) according to how large the value is.
It also covers processing such as reducing the amplitude of the processed signal Z(ω, t) as the value becomes larger. Details will be described later.
[0047] Through the above operation, for example, sound within a specific distance range from the microphone array is emphasized and sound outside that range is suppressed, so that sound is collected with noise removed. Hereinafter, the present invention is described in more detail by showing a more specific example of the functional configuration of each part.
[0048] [Processing Target Signal Generation Unit] FIG. 8 shows a more specific example of the functional configuration of the processing target signal generation unit 43. The processing target signal generation unit 43 includes a plurality of weight multiplication units 4311 to 431M and an addition unit 432. The weight multiplication units 4311 to 431M multiply the frequency components X1(ω, t), ..., XM(ω, t) of the signals xm(n) received by the M microphones by the weight coefficients wm(ω).
[0049] As for the weights used by the weight multiplication units 4311 to 431M, if, for example, the M microphones are omnidirectional, averaging all of the frequency components X1(ω, t), ..., XM(ω, t) stabilizes the processing target signal Y(ω, t). When the M microphones are directional, it is also possible to use only the signal of a specific microphone by setting w1 = 1 and wm = 0 (m ∈ {2, ..., M}). Alternatively, if beamforming filter coefficients are used as the weights, following a method such as that described in Reference 2 (Oga, Yamazaki, Kanada, "Acoustic system and digital signal processing", published by the Institute of Electronics, Information and Communication Engineers), the microphone array can form an arbitrary directivity.
[0050] The addition unit 432 adds all of the weighted frequency components X1(ω, t), ..., XM(ω, t) and outputs the processing target signal Y(ω, t).
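The weight-and-sum operation of the weight multiplication units and the addition unit can be sketched for a single frame as follows. The function names and the list-of-lists data layout are hypothetical; the two weight sets correspond to the averaging and single-microphone cases mentioned in [0049].

```python
def processing_target_signal(X, w):
    """Y(w, t) = sum over m of w_m(w) * X_m(w, t), for one frame.

    X[m][k] is frequency bin k of microphone m; w[m][k] is the weight for
    that microphone and bin. Returns the list of bins of Y.
    """
    n_mics, n_bins = len(X), len(X[0])
    return [sum(w[m][k] * X[m][k] for m in range(n_mics)) for k in range(n_bins)]


def averaging_weights(n_mics, n_bins):
    """Equal weights 1/M: averages all microphones (omnidirectional case)."""
    return [[1.0 / n_mics] * n_bins for _ in range(n_mics)]


def single_mic_weights(n_mics, n_bins, selected=0):
    """Weight 1 for one microphone and 0 for the others (directional case)."""
    return [[1.0 if m == selected else 0.0] * n_bins for m in range(n_mics)]


if __name__ == "__main__":
    X = [[1 + 0j, 2j], [3 + 0j, 4j]]   # 2 microphones, 2 bins
    print(processing_target_signal(X, averaging_weights(2, 2)))
    print(processing_target_signal(X, single_mic_weights(2, 2)))
```

Replacing the weights with frequency-dependent complex beamformer coefficients turns the same function into a filter-and-sum beamformer.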
[0051] Instead of using the addition unit, a microphone may be installed separately from the microphone array at a position close to the sound source, and the signal picked up by that microphone may be used as the processing target signal Y(ω, t).
[0052] [DRR Calculation Unit] FIG. 9 shows an example of the functional configuration of the direct-to-reverberant ratio (DRR) calculation unit 44. The DRR calculation unit 44 includes a received sound power estimation unit 441, a direct sound direction power estimation unit 442, a reverberation direction power estimation unit 443, a subtraction unit 444, and a DRR calculation unit 445.
[0053] The received sound power estimation unit 441 uses the frequency domain signals X1(ω, t), ..., XM(ω, t), obtained by converting the signals received by the microphones of the microphone array 41 into the frequency domain, to generate and output a power estimate of the frequency domain signal corresponding to the received signal. This power estimate may be the power estimate of the frequency domain signal Xm(ω, t) of any one microphone m (m ∈ {1, ..., M}), as in equation (9), or a weighted average of the power estimates of the frequency domain signals X1(ω, t), ..., XM(ω, t), as in equation (10). In the first embodiment, the power spectral density P_X,L(ω) is obtained as the power estimate of the frequency domain signal corresponding to the received signal.
[0054]
[0055] Here, L is the number of frames, and α_m is a non-negative weight for microphone m, set so as to satisfy equation (11). E[·] represents the expectation operation.
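The two power estimates described for the received sound power estimation unit can be sketched as follows. Because equations (9) to (11) are not legible in this text, the sketch assumes that the expectation is approximated by an average of |X|² over L frames and that equation (11) requires the weights α_m to sum to one; both are assumptions, and the function names are hypothetical.

```python
def psd_single(X_m):
    """Power estimate per bin for one microphone: mean of |X_m(w, t)|^2 over L frames.

    X_m[t][k] is frequency bin k of frame t.
    """
    n_frames, n_bins = len(X_m), len(X_m[0])
    return [sum(abs(X_m[t][k]) ** 2 for t in range(n_frames)) / n_frames
            for k in range(n_bins)]


def psd_weighted(X, alpha):
    """Weighted average of the per-microphone power estimates.

    X[m][t][k] holds the bins of microphone m; alpha[m] is assumed to be
    non-negative and to sum to one (assumed form of equation (11)).
    """
    if any(a < 0 for a in alpha) or abs(sum(alpha) - 1.0) > 1e-9:
        raise ValueError("alpha must be non-negative and sum to 1")
    per_mic = [psd_single(X_m) for X_m in X]
    n_bins = len(per_mic[0])
    return [sum(a * p[k] for a, p in zip(alpha, per_mic)) for k in range(n_bins)]


if __name__ == "__main__":
    X = [[[1 + 0j], [1j]], [[2 + 0j], [2j]]]   # 2 mics, 2 frames, 1 bin
    print(psd_single(X[0]), psd_weighted(X, [0.5, 0.5]))
```

Averaging over microphones reduces the variance of the estimate when the microphones are close together and see similar power.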
[0056]
[0057] The direct sound direction power estimation unit 442 obtains either the power estimate P_DD(ω) of the direct sound direction signal obtained by applying, to the frequency domain signals X1(ω, t), ..., XM(ω, t), processing that passes only signal components arriving from the direct sound source direction, or the power estimate P_DD(ω) of the direct sound direction signal obtained by applying such processing to the received signal and converting the result into the frequency domain. The power P_DD(ω) of the direct sound direction signal is the same as P0(ω) in equation (8) above.
[0058] The direct sound direction power estimation unit 442 includes a directivity forming unit 4421 and a power estimation unit 4422. The directivity forming unit 4421 forms a directivity whose beam points in a predetermined direction and outputs the signal that has passed through that directivity. The directivity of the directivity forming unit 4421 is set so that its main beam points in the direct sound direction. As a method of forming the directivity, a method such as the delay-and-sum beamforming described in Reference 1 (Futoshi Asano, "Array signal processing of sound: localization, tracking and separation of sound sources", Corona Publishing, pp. 70-79) can be used.
[0059] When the output of the directivity forming unit 4421 is denoted Y_BF(ω, t), the power estimate P_DD(ω) of the direct sound direction signal output from the power estimation unit 4422 is obtained by equation (12).
[0060]
[0061] The output power spectral density of the power estimate P_DD(ω) of the direct sound direction signal is expressed by equation (13).
[0062]
[0063] Here, |D0,θ(ω)|^2 corresponds to the power gain of the beamformer BF0 described in FIG.
[0064] The reverberation direction power estimation unit 443 obtains either a power estimate of the reverberation direction signal obtained by applying processing that has the same directivity shape as the processing of the direct sound direction power estimation unit 442, which mainly passes signal components arriving from the direct sound source direction, but that mainly passes signal components arriving from directions other than the direct sound source direction, or a power estimate of the reverberation direction signal obtained by applying such processing to the received signal and converting the result into the frequency domain.
[0065] Ideally, the reverberation direction power estimation unit 443 includes a reverberation directivity forming unit 4431 and a reverberation power estimation unit 4432. The directivity of the reverberation directivity forming unit 4431 is set so that its main beam avoids the direct sound direction, and its directivity shape is set to be the same as that of the directivity forming unit 4421. It is desirable that the directivity shapes of the reverberation directivity forming unit 4431 and the directivity forming unit 4421 be as close to identical as possible. Setting the directivity shape can be easily realized with the prior art. Estimation of the direction of the sound source is described, for example, in chapter 7.2 of Reference 2 (Oga, Yamazaki, Kanada, "Acoustic system and digital signal processing", published by the Institute of Electronics, Information and Communication Engineers).
[0066] The reverberation power estimation unit 4432 receives the reverberant sound picked up so as to avoid the direct sound direction, and outputs the power estimate P_RD(ω) of the reverberation direction signal (equation (14)).
Because the reverberation direction signal is received so as to avoid the direct sound direction, setting |D1,θD(ω)|^2 << 1 makes the direct sound component |D1,θD(ω)|^2 P_D(ω) contained in the power estimate P_RD(ω) sufficiently small.
[0067]
[0068] Here, |D1(ω)|^2 corresponds to the power gain of the beamformer BF1 described in FIG.
[0069] The subtraction unit 444 subtracts the power estimate P_RD(ω) of the reverberation direction signal, output by the reverberation power estimation unit 4432, from the power estimate P_DD(ω) of the direct sound direction signal, output by the direct sound direction power estimation unit 442, and outputs the resulting direct sound power estimate ^P_D(ω) (equation (15)).
[0070]
[0071] The denominator of equation (15) is a term that normalizes the direct sound power estimate ^P_D(ω) by the difference between the power gains of the beamformers of the directivity forming unit 4421 and the reverberation directivity forming unit 4431.
[0072] The DRR calculation unit 445 uses the power spectral density P_X,L(ω) output from the received sound power estimation unit 441 and the direct sound power estimate ^P_D(ω) to obtain the direct-to-reverberant ratio estimate DRR(ω), which is the ratio of the direct sound power estimate ^P_D(ω) to the power estimate of the reverberant sound (equation (16)).
[0073]
[0074] Further, when the received sound power output from the received sound power estimation unit 441 is expressed by equation (9) for any one microphone m (m ∈ {1, ..., M}), the DRR can also be estimated by equation (17).
[0075]
[0076] Further, the DRR can also be estimated by equations (18) and (19) as a DRR that does not depend on the frequency.
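The subtraction and ratio steps can be checked with a small numeric sketch. Equations (15) and (16) are not legible in this text, so the sketch assumes that (15) divides the difference P_DD − P_RD by the difference of the two beamformers' power gains in the source direction, as [0071] describes, and that (16) takes the reverberant power to be P_X − ^P_D and expresses the ratio in decibels. The function names and all numeric values are hypothetical.

```python
import math


def direct_power(p_dd, p_rd, gain0_src, gain1_src):
    """Assumed form of eq. (15): (P_DD - P_RD) / (|D0,thD|^2 - |D1,thD|^2)."""
    return (p_dd - p_rd) / (gain0_src - gain1_src)


def drr_db(p_x, p_d_hat):
    """Assumed form of eq. (16): 10 log10( ^P_D / (P_X - ^P_D) )."""
    if p_d_hat <= 0 or p_x <= p_d_hat:
        raise ValueError("power estimates do not give a positive ratio")
    return 10.0 * math.log10(p_d_hat / (p_x - p_d_hat))


if __name__ == "__main__":
    # Synthetic scene: true direct power 2.0; both beamformers share the same
    # diffuse-field term 0.5 because their directivity shapes are identical.
    p_d_true, diffuse_term = 2.0, 0.5
    gain0, gain1 = 1.0, 0.01                      # gains toward the source
    p_dd = gain0 * p_d_true + diffuse_term        # output of BF0
    p_rd = gain1 * p_d_true + diffuse_term        # output of BF1
    p_d_hat = direct_power(p_dd, p_rd, gain0, gain1)
    p_x = p_d_true + 3.0                          # microphone power: direct + reverberant
    print(p_d_hat, drr_db(p_x, p_d_hat))
```

In the synthetic scene the shared diffuse term cancels in the subtraction, so the direct power is recovered regardless of how much reverberation arrives from the source direction.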
Note that a value obtained once every L frames is written DRR(ω), whereas a value obtained for each frequency in every single frame is written DRR(ω, t).
[0077]
[0078] The DRR estimation method described above is a new method that exploits the fact that reverberant sound, being a highly diffuse signal, arrives at the microphone array isotropically. Two beamformers with the same directivity shape, realized by the microphone array, yield a signal containing both direct sound and reverberation and a signal containing only reverberation. The direct sound component and the indirect sound component are thereby correctly separated, and as a result the estimation accuracy of the DRR can be improved.
[0079] Equations (16), (17), (18), and (19) may be replaced by DRR estimates that are not expressed in decibels, as follows.
[0080]
[0081] [Modification 1] FIG. 10 shows an example of the functional configuration of a DRR calculation unit 44', in which the functional configuration of the reverberation direction power estimation unit 443 of the DRR calculation unit 44 is modified. The DRR calculation unit 44' calculates the reverberation direction power PRD(ω) by averaging the reverberation direction powers PRD1(ω) to PRDN(ω) of a plurality of (two or more) directivity directions.
[0082] The reverberation direction power estimation unit 443' of the DRR calculation unit 44' differs from that of the DRR calculation unit 44 in that it includes two or more reverberation directivity forming units 44311 to 4431N, two or more reverberation power estimation units 44321 to 4432N, and a reverberation direction power calculation unit 4433.
The main beam of the beamformer of the reverberation directivity forming unit 44311 points, for example, in the direction θ1 from the reference point. The main beam of the beamformer of the reverberation directivity forming unit 44312 points in the direction θ2, and the main beam of the beamformer of the reverberation directivity forming unit 4431N points in the direction θN.
[0083] FIG. 11 schematically shows the directivity shape of each of the reverberation directivity forming units 44311 to 4431N. Their directivity shapes are identical and differ only in the direction θ of the main beam. The reverberation power estimation units 44321 to 4432N, each connected to the output of the corresponding reverberation directivity forming unit 44311 to 4431N, obtain the reverberation power estimates PRD1(ω) to PRDN(ω) for the respective directivity directions.
[0084] The reverberation direction power calculation unit 4433 calculates the reverberation direction power PRD(ω) as a weighted average of the power estimates PRD1(ω) to PRDN(ω) (Equation (20)).
[0085]
[0086] Here, βn is a non-negative weighting coefficient that is set in advance so as to satisfy Equation (21). Because the reverberation direction power PRD(ω) obtained in this manner averages the reverberation direction powers of a plurality of directions, its accuracy can be improved. As a result, the accuracy of the DRR estimate DRR(ω) can also be improved.
[0087] [Modification 2] FIG. 12 shows an example of the functional configuration of a DRR calculation unit 44'', in which the functional configuration of the reverberation direction power estimation unit 443 of the DRR calculation unit 44 is changed.
The DRR calculation unit 44'' is configured so that the main beam directions of the beamformers of the directivity forming unit 4421 and the reverberation directivity forming unit 4431 can be set automatically.
[0088] The DRR calculation unit 44'' differs from the DRR calculation unit 44 in that it includes a sound source direction estimation unit 446 and a beamformer generation unit 447. The sound source direction estimation unit 446 estimates the direction of the sound source from the frequency domain signals X1(ω, t), ..., XM(ω, t), which are obtained by converting the sound signals received by the microphones of the microphone array 41 into the frequency domain, and outputs a sound source direction signal. The direction of the sound source can be determined, for example, from the phase differences among the frequency domain signals X1(ω, t), ..., XM(ω, t).
[0089] The beamformer generation unit 447 receives the sound source direction signal and generates a beamformer BF0 whose main beam points in the sound source direction and a beamformer BF1 whose main beam is set so as to avoid the sound source direction. The beamformer BF0 is output to the direct sound direction power estimation unit 442, and the beamformer BF1 is output to the reverberation direction power estimation unit 443. The directivity forming unit 4421 of the direct sound direction power estimation unit 442 applies the beamformer BF0 and outputs the above-mentioned output signal YBF(ω, t). The reverberation direction power estimation unit 443 applies the beamformer BF1 and outputs the reverberation direction power PRD(ω).
[0090] Thus, the DRR calculation unit 44'' can automatically set the directivity of the direct sound direction power estimation unit 442 and the reverberation direction power estimation unit 443.
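As one illustration of determining the source direction from a phase difference, the following sketch handles the simplest case of a single frequency bin and two microphones. It is not the method of the sound source direction estimation unit 446, whose details are not given here. It assumes a far-field plane wave, a known microphone spacing, and a frequency low enough to avoid spatial aliasing. The function and parameter names are hypothetical.

```python
import numpy as np

def doa_from_phase(x1, x2, freq_hz, mic_spacing_m, c=343.0):
    """Hypothetical two-microphone direction estimate, in radians from broadside.

    x1, x2 : complex frequency-domain values of the two microphones at freq_hz.
    Assumes microphone 2 receives the wave tau = d * sin(theta) / c seconds
    after microphone 1, so that x2 = x1 * exp(-1j * omega * tau).
    """
    omega = 2.0 * np.pi * freq_hz
    # The phase of the cross-spectrum x1 * conj(x2) equals omega * tau.
    phase = np.angle(x1 * np.conj(x2))
    # Invert tau = d * sin(theta) / c; clip to guard against noise pushing
    # the argument of arcsin outside [-1, 1].
    sin_theta = np.clip(c * phase / (omega * mic_spacing_m), -1.0, 1.0)
    return np.arcsin(sin_theta)

# Synthetic check: a 1 kHz plane wave arriving 30 degrees off broadside.
true_theta = np.radians(30.0)
tau = 0.05 * np.sin(true_theta) / 343.0
theta_hat = doa_from_phase(1.0 + 0.0j, np.exp(-1j * 2.0 * np.pi * 1000.0 * tau),
                           freq_hz=1000.0, mic_spacing_m=0.05)
```

With more than two microphones, pairwise estimates of this kind would have to be combined in some way; that step is outside this sketch.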
The DRR calculation units 44, 44', and 44'' have been described above as operating in the frequency domain, but the technical concept of the present invention, including the modifications, can be applied as-is to time domain operation. The concept of the DRR calculation unit 44'' can also be applied to the DRR calculation unit 44'.
[0091] [Target Signal Adjustment Unit] The target signal adjustment unit 45 receives the processing target signal Y(ω, t) and the DRR estimate DRR(ω, t) as inputs, adjusts the amplitude of the processing target signal Y(ω, t) according to the DRR estimate DRR(ω, t), and generates and outputs the processed signal Z(ω, t). In other words, the target signal adjustment unit 45 multiplies the processing target signal Y(ω, t) by a gain (filter coefficient) that depends on the DRR estimate DRR(ω, t), and thereby generates and outputs the processed signal Z(ω, t) (step S45).
[0092] The magnitude of the gain determined from the DRR estimate depends on the range of distances from the microphone array 41 over which sound emitted from the direct sound source is to be enhanced. For example, when emphasizing sound emitted from a direct sound source close to the microphone array 41, the gain by which the processing target signal is multiplied when the ratio of the direct sound power estimate to the indirect sound power estimate, as represented by the DRR estimate, is a first value is larger than the gain applied when that ratio is a second value smaller than the first value.
Conversely, when emphasizing sound emitted from a direct sound source far from the microphone array 41, the gain G(ω, t) by which the processing target signal is multiplied when the ratio of the direct sound power estimate to the indirect sound power estimate, as represented by the DRR estimate, is a first value is smaller than the gain applied when that ratio is a second value smaller than the first value.
[0093] The target signal adjustment unit 45 can be configured by, for example, a filter coefficient calculation unit 451 and a multiplication unit 452 (FIG. 6). The filter coefficient calculation unit 451 receives the DRR estimate DRR(ω, t) as input and calculates and outputs a filter coefficient G(ω, t). The filter coefficient G(ω, t) is calculated using, for example, a binary filter based on a threshold, as shown in Equations (22) and (23).
[0094]
[0095] The threshold Th1 can be set to any value between the minimum value and the maximum value of the DRR estimate DRR(ω, t). When the threshold Th1 approaches the minimum value (0), the sound quality is improved. Conversely, when the threshold Th1 approaches the maximum value, the noise suppression effect is enhanced, but the distortion of the received sound signal increases and the sound quality is degraded.
[0096] As described above, the choice of the threshold Th1 involves a trade-off between sound quality and noise suppression. The threshold Th1 is therefore determined empirically according to the purpose of use, taking this trade-off into account.
[0097] In addition, if the filter coefficient G(ω, t) is calculated as shown in Equations (24) and (25) so that the time-frequency bands in which the DRR estimate falls below a threshold Th2 are emphasized, sound sources farther away than a specific distance range can be emphasized.
[0098]
[0099] Although a binary filter taking the values 0 and 1 has been given as an example of the filter coefficient G(ω, t), the two values need not be 0 and 1. They may be, for example, 0.1 and 0.9, as long as they differ sufficiently.
[0100] Further, a real number of 1 or more may be set as the filter coefficient G(ω, t). That is, a gain G(ω, t) that amplifies the processing target signal Y(ω, t) may be determined. A gain G(ω, t) that strongly suppresses the processing target signal Y(ω, t) (for example, a value of 0.1 or less) may also be determined. Further, instead of determining the gain G(ω, t) by threshold comparison, the DRR estimate or a function value of it may be used as the gain G(ω, t). For example, the gain G(ω, t) may be determined as in the following Equations (26) to (29).
[0101]
[0102] Here, F is a function such as a monotonically increasing function or a monotonically decreasing function.
[0103] The multiplication unit 452 multiplies the processing target signal Y(ω, t) by the filter coefficient G(ω, t) thus obtained to generate the processed signal Z(ω, t) = G(ω, t) · Y(ω, t). The processed signal Z(ω, t) can therefore be composed only of the components of the processing target signal Y(ω, t) that have a large DRR estimate DRR(ω, t). That is, only the direct sound can be extracted.
[0104] As a second embodiment, a perspective determination device 120 that determines whether a sound source is far or near using the DRR estimate DRR(ω, t) described in the first embodiment will be described. FIG. 13 shows a functional configuration example of the perspective determination device 120.
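Returning briefly to the target signal adjustment unit 45 of the first embodiment, the threshold-based binary filter and the multiplication Z(ω, t) = G(ω, t) · Y(ω, t) can be sketched as follows. Equations (22) to (25) are not reproduced in this text, so the comparison direction at exactly the threshold, the function names, and the `emphasize` switch are assumptions made for illustration only.

```python
import numpy as np

def binary_gain(drr, threshold, emphasize="near", g_pass=1.0, g_stop=0.0):
    """Hypothetical binary filter coefficient G(w, t) from the DRR estimate.

    emphasize="near": pass time-frequency bins whose DRR is at or above the
    threshold (the role of Th1). emphasize="far": pass bins whose DRR is
    below the threshold (the role of Th2).
    g_pass and g_stop need not be 1 and 0; for example 0.9 and 0.1 also work.
    """
    drr = np.asarray(drr, dtype=float)
    if emphasize == "near":
        passed = drr >= threshold
    elif emphasize == "far":
        passed = drr < threshold
    else:
        raise ValueError("emphasize must be 'near' or 'far'")
    return np.where(passed, g_pass, g_stop)

def apply_gain(y, gain):
    """Processed signal Z(w, t) = G(w, t) * Y(w, t), element by element."""
    return gain * np.asarray(y)

# Two frequency bins by two frames of DRR estimates and target signal values.
drr_est = np.array([[-5.0, 3.0], [10.0, -1.0]])
y_sig = np.array([[1 + 1j, 2 + 0j], [0 + 3j, 4 - 1j]])
z_near = apply_gain(y_sig, binary_gain(drr_est, threshold=0.0, emphasize="near"))
z_far = apply_gain(y_sig, binary_gain(drr_est, threshold=0.0, emphasize="far"))
```

With g_pass = 1 and g_stop = 0, the two outputs split the input: every time-frequency bin ends up in exactly one of `z_near` and `z_far`.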
The perspective determination device 120 includes a microphone array 41, a plurality of frequency domain conversion units 411 to 41m, a DRR calculation unit 44, and a perspective determination unit 121. The microphone array 41, the frequency domain conversion units 411 to 41m, and the DRR calculation unit 44 are the same as those of the noise removal device 400. The perspective determination device 120 is also realized by, for example, loading a predetermined program into a computer including a ROM, a RAM, a CPU, and the like, and having the CPU execute the program.
[0105] When sound sources at a plurality of different distances emit sound at different times, the perspective determination device 120 determines whether the source of the sound received at a given time is far or near. The perspective determination unit 121 of the perspective determination device 120 includes a frequency averaging means 1210, an accumulation means 1211, and a judging means 1212.
[0106] The perspective determination unit 121 determines whether the direct sound source in a judgment section is far or near by comparing a judgment value with a reference value. The judgment value corresponds to the DRR estimate obtained from the sound signal received in the judgment section, which consists of one or more frames. The reference value corresponds to a plurality of DRR estimates obtained from the sound signal received in a reference section, which contains more frames than the judgment section.
[0107] The frequency averaging means 1210 receives the DRR estimate DRR(ω, t) as input, averages it in the frequency direction, and outputs the frequency-averaged DRR estimate ¯Et (Equation (30)).
[0108]
[0109] Here, K is the total number of frequency bins of the Fourier transform performed by the frequency domain conversion units 421 to 42M.
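The comparison of a judgment value against a reference value can be sketched as follows. This is an illustration under stated assumptions, not the patent's exact procedure. Equation (30) is taken to be a plain arithmetic mean over the K frequency bins, the judgment section is a single frame, the reference section is the trailing window of frames ending at that frame, and the reference value is the mean over that window. The function name, the `ref_frames` parameter, and the choice to output 1 only when the judgment value is strictly greater are hypothetical.

```python
import numpy as np

def judge_near_far(drr, ref_frames):
    """Hypothetical near/far decision per frame: 1 = near, 0 = far.

    drr        : array of shape (K, T), DRR estimate per frequency bin and frame
    ref_frames : number of trailing frames forming the reference section
    """
    drr = np.asarray(drr, dtype=float)
    # Frequency averaging: one value per frame (assumed arithmetic mean over K).
    e_bar = drr.mean(axis=0)
    result = np.zeros(e_bar.shape, dtype=int)
    for t in range(len(e_bar)):
        # Reference section: up to ref_frames frames ending at frame t.
        start = max(0, t - ref_frames + 1)
        e_ref = e_bar[start:t + 1].mean()
        # A frame whose averaged DRR exceeds the reference is judged near.
        result[t] = 1 if e_bar[t] > e_ref else 0
    return result

# Two frequency bins, four frames; the frame averages are 0, 0, 10, -10.
decisions = judge_near_far([[1.0, -1.0, 12.0, -8.0],
                            [-1.0, 1.0, 8.0, -12.0]], ref_frames=3)
```

In this toy input only the third frame stands out above its trailing reference, so it is the only frame judged near.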
[0110] The accumulation means 1211 accumulates the frequency-averaged DRR estimates ¯Et of the past L time frames and outputs a comparison reference DRR estimate ^E. As the comparison reference ^E, for example, the average of the accumulated values, ^E = (1/L) Σ ¯Et (summed over the L frames), or the midpoint of their minimum and maximum, ^E = (1/2)(max ¯Et + min ¯Et), is used.
[0111] The judging means 1212 compares the frequency-averaged DRR estimate ¯Et with the comparison reference ^E. When ¯Et > ^E, it outputs, for example, 1 as the perspective determination result Yt, representing that the source is near. When ¯Et < ^E, it outputs, for example, 0 as the perspective determination result Yt, representing that the source is far. The perspective determination result Yt indicates whether, relative to the most recent L time frames, the received sound signal is sound from a relatively near source or sound from a relatively far source.
[0112] Using this perspective determination result Yt, the sequentially input received sound signal can be sorted according to the distance between the microphone and the sound source. That is, the sounds of a plurality of sound sources can be selected according to their distance from the microphone.
[0113] [Experimental Results] To confirm the effects of the present invention, simulation experiments using the mirror image method were performed.
[0114] The simulation conditions are shown in FIG. 14, which is a plan view. A room with a width of 4 m, a depth of 6 m, and a height of 2.7 m was assumed. The sound absorption coefficient of the walls was set to α = 0.05 (reverberation time T60 = 1.8 seconds). A microphone array with eight microphones arranged in a circle was used, and the height of the reference point was 1.5 m.
The height of the sound source was also 1.5 m.
[0115] FIG. 15 compares, under these conditions, the measured value DRRactual (□) of the DRR obtained from the impulse response, the estimate of the present invention (▽), and the estimate of the conventional method (○). The DRR estimated by the method of the present invention (▽) is closer to the measured value DRRactual (□) than that of the conventional method, and the improvement is about 3 dB, particularly when the sound source is far away.
[0116] In general, the power of the indirect component is constant regardless of the distance of the sound source, while the power of the direct component is inversely proportional to the square of the distance. Therefore, for a distant sound source, the power of the direct component becomes smaller than that of the indirect component, and even a small error in the estimated direct component greatly affects the DRR estimate. In the method of the present invention, the directivity control of the microphone array minimizes the influence of the signal arriving from the sound source direction when the power of the indirect sound is obtained. This enables more accurate estimation, so that the DRR can be estimated correctly even for a far sound source.
[0117] As described above, the DRR estimation method of the present invention is a new method that assumes that reverberant sound, being a highly diffuse signal, arrives at the microphone array isotropically.
Two beamformers with the same directivity shape are realized by the microphone array: one whose main beam points in the direct sound source direction, and one whose main beam is set so as to avoid the direct sound source direction. With these, the direct component and the indirect component arriving from the sound source direction can be correctly separated, and as a result the accuracy of the DRR estimate can be increased.
[0118] In the above description, the DRR estimation method of the present invention has been described as incorporated in the acoustic signal enhancement device 400 or the perspective determination device 130. However, as shown in the drawings, it may also be configured as a DRR estimation device 160 that realizes only the DRR estimation. In that case, the DRR estimation device 160 can be configured by the microphone array 41, the plurality of frequency domain conversion units 421 to 42M, and the DRR calculation unit 44.
[0119] Although Equations (16) to (19) show examples in which the DRR estimate is expressed in decibels, the DRR estimate may of course be obtained simply as a ratio of power spectral densities. The DRR given by the above equations multiplied by an arbitrary constant may be used as the DRR estimate, or the reciprocal of the DRR given by the above equations multiplied by a constant may be used as the DRR estimate. The constant may also be a value of a monotonically increasing function. That is, the DRR estimate of the present invention is not limited to those represented by Equations (16) to (19).
[0120] The processes described for the above methods and apparatuses need not be performed in chronological order in the order described. They may also be performed in parallel or individually, depending on the processing capability of the apparatus that executes them or as needed.
[0121] Further, when the processing means in the above devices are realized by a computer, the processing content of the functions that each device should have is described by a program. By executing this program on a computer, the processing means of each device are realized on the computer.
[0122] The program describing the processing content can be recorded on a computer readable recording medium. Any computer readable recording medium may be used, such as a magnetic recording device, an optical disc, a magneto-optical recording medium, or a semiconductor memory. Specifically, for example, a hard disk device, a flexible disk, or a magnetic tape can be used as the magnetic recording device; a DVD (Digital Versatile Disc), a DVD-RAM (Random Access Memory), a CD-ROM (Compact Disc Read Only Memory), or a CD-R (Recordable) / RW (Rewritable) as the optical disc; an MO (Magneto Optical disc) as the magneto-optical recording medium; and an EEP-ROM (Electronically Erasable and Programmable-Read Only Memory) as the semiconductor memory.
[0123] The program is distributed, for example, by selling, transferring, or lending a portable recording medium such as a DVD or a CD-ROM on which the program is recorded. The program may also be stored in a storage device of a server computer and distributed by transferring it from the server computer to another computer via a network.
[0124] Further, each means may be configured by executing a predetermined program on a computer, or at least a part of the processing content may be realized in hardware.