Patent Translate Powered by EPO and Google

Notice: This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate, complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or financial decisions, should not be based on machine-translation output.

DESCRIPTION JP2015164267

Abstract: The present invention provides a sound collection device that extracts sound waves from a desired sound source more reliably and outputs them to a speech recognition device. A sound collection device (11) comprises a first microphone array (121), in which a plurality of microphones for observing sound waves coming from a desired sound source are arranged linearly, and a second microphone array (122), arranged parallel to the first microphone array and in front of it as viewed from the desired sound source, in which a plurality of microphones for observing sound waves coming from the desired sound source are likewise arranged linearly. The sound wave coming from the desired sound source is extracted based on the relationship between the desired-source components contained in the plurality of first sound wave signals and those contained in the plurality of second sound wave signals, and on the relationship between the components from sources other than the desired source contained in the plurality of first sound wave signals and those contained in the plurality of second sound wave signals. [Selected figure] Figure 1

Sound collecting device, sound collecting method, and program

[0001] The present disclosure relates to a sound collection device, a sound collection method, and a program, and more particularly to a sound collection device, a sound collection method, and a program capable of extracting a sound wave from a desired sound source more reliably.
11-04-2019 1

[0002] Conventionally, in sound pickup apparatuses that capture sound with a microphone as an electric signal, techniques for extracting (or emphasizing) the sound coming from a desired sound source have been used in order to eliminate the adverse effects of ambient noise. Ambient noise causes, for example, a decline in the recognition rate of speech recognition technology, and a drop in intelligibility and a loss of the sense of presence in video conference systems.

[0003] To address these problems, research using microphone array processing technology, in which a plurality of microphones are arranged, has been conducted for some time. Microphone array processing exploits the fact that the sound from a desired source, such as a human voice, arrives at the array at an incident angle different from that of noise such as the sound of an air conditioner, surrounding speech, or the sound of a television or radio.

[0004] For example, Patent Document 1 discloses a technique for extracting a sound source signal with a high signal-to-noise ratio by processing, in the spatial Fourier transform domain, a signal collected by a two-dimensional planar microphone array.

[0005] Under normal circumstances, however, noise often comes from behind the speaker. For example, in the case of a vending machine operated by voice recognition, cars may pass behind the purchaser; the sound of traffic, the voices of passers-by, street announcements, and so on create an environment in which sounds arrive from all directions. In such an environment, where the sound from the desired source and the noise come from the same direction, it was difficult for conventional microphone array technology, which relies on those sounds arriving at different incident angles, to extract the sound from the desired source.

[0006] As another application of conventional microphone array processing technology, acoustic holography has been studied, for example, to determine which part of an engine or a vehicle is generating a large amount of sound.

[0007] For example, Patent Documents 2 and 3 disclose techniques for locating a sound source by improving the acoustic holography method.

[0008] These techniques assume how the sound physically propagates and trace the propagation back to identify the sound source. They either prepare a large number of microphones on a two-dimensional plane, or cover an area by sweeping approximately two microphones.

[0009] Patent Document 1: JP 2012-165273 A. Patent Document 2: JP 6-109528 A. Patent Document 3: JP 8-233931 A.

[0010] As described above, conventional microphone array processing extracts sound based on the angle at which the sound wave arrives at the array. When the desired sound and an unnecessary sound such as noise come from the same direction, it is therefore difficult to distinguish between the two, and the desired sound could not be extracted reliably even when microphone array processing was applied.

[0011] The present disclosure has been made in view of such a situation, and enables sound waves from a desired sound source to be extracted more reliably.

[0012] A sound collection device according to one aspect of the present disclosure comprises: a first microphone array in which a predetermined number of microphones are arranged linearly and which outputs a predetermined number of first sound wave signals obtained by observing, with those microphones, a sound wave coming from a desired sound source and sound waves coming from sound sources other than the desired sound source; a second microphone array in which a predetermined number of microphones are arranged linearly, disposed substantially parallel to the first microphone array and in front of the first microphone array as viewed from the desired sound source, and which outputs a predetermined number of second sound wave signals obtained by observing the same sound waves; and an extraction processing unit that extracts the sound wave coming from the desired sound source based on the relationship between the desired-source components contained in the predetermined number of first sound wave signals and those contained in the predetermined number of second sound wave signals, and on the relationship between the components from sources other than the desired source contained in the first sound wave signals and those contained in the second sound wave signals.

[0013] A sound collection method or program according to one aspect of the present disclosure is a sound collection method of, or a program executed by the computer of, a sound collection device comprising such first and second microphone arrays and such an extraction processing unit. In each microphone of the first and second microphone arrays, the sound wave from the desired sound source is observed as a spherical wave that propagates while spreading spherically, and sound waves from sources other than the desired sound source are observed as plane waves. A predetermined number of first frequency domain signals are calculated by performing a fast Fourier transform on each of the predetermined number of first sound wave signals, and a predetermined number of second frequency domain signals are calculated by performing a fast Fourier transform on each of the predetermined number of second sound wave signals. A first wave number domain signal, represented by a function whose argument is the wave number of the sound wave arriving at the first microphone array, is obtained by performing a spatial Fourier transform on the predetermined number of first frequency domain signals according to the position of each microphone of the first microphone array; a second wave number domain signal is obtained likewise from the predetermined number of second frequency domain signals according to the position of each microphone of the second microphone array. From the first wave number domain signal and the second wave number domain signal, a spherical wave component wave number domain signal, which is the spatial Fourier transform of the spherical wave component contained in the first sound wave signals or the second sound wave signals, is calculated.

[0014] In one aspect of the present disclosure, a first microphone array having a predetermined number of linearly arranged microphones outputs a predetermined number of first sound wave signals obtained by observing a sound wave coming from a desired sound source and sound waves coming from other sound sources. A second microphone array having a predetermined number of linearly arranged microphones, disposed substantially parallel to the first microphone array and on the front side of the first microphone array as viewed from the desired sound source, outputs a predetermined number of second sound wave signals obtained by observing the same sound waves. Then, the sound wave coming from the desired sound source is extracted based on the relationship between the desired-source components contained in the first sound wave signals and those contained in the second sound wave signals, and on the relationship between the components from other sound sources contained in the first sound wave signals and those contained in the second sound wave signals.

[0015] According to one aspect of the present disclosure, sound waves from a desired sound source can be extracted more reliably.

[0016] BRIEF DESCRIPTION OF DRAWINGS. FIG. 1 is a block diagram illustrating a configuration example of an embodiment of a sound collection device to which the present technology is applied. A spatial sensitivity distribution diagram of a single omnidirectional microphone. A spatial sensitivity distribution diagram when the frequency of the sound extracted by the sound collection device is 500 Hz. A spatial sensitivity distribution diagram when the frequency of the sound extracted by the sound collection device is 1 kHz. A spatial sensitivity distribution diagram when the frequency of the sound extracted by the sound collection device is 2 kHz.
A flowchart explaining the process by which the sound collection device extracts the sound from a desired sound source. Fig. 21 is a block diagram illustrating a configuration example of an embodiment of a computer to which the present technology is applied.

[0017] Hereinafter, specific embodiments to which the present technology is applied will be described in detail with reference to the drawings.

[0018] FIG. 1 is a block diagram showing a configuration example of an embodiment of a sound collection device to which the present technology is applied.

[0019] As shown in FIG. 1, the sound collection device 11 comprises two microphone arrays 121 and 122 and a sound wave extraction processing unit 13. The sound collection device 11 extracts (or emphasizes) the sound wave from the desired sound source out of sound that contains both the sound wave coming from the desired sound source and sound waves coming from sound sources other than the desired sound source.

[0020] The microphone arrays 121 and 122 are identically configured, each consisting of a predetermined number of linearly arranged microphone elements: the microphone array 121 consists of M microphone elements 211-1 to 211-M arranged in a line, and the microphone array 122 consists of M microphone elements 212-1 to 212-M arranged in a line.

[0021] The microphone arrays 121 and 122 are arranged parallel to each other. The microphone array 122 is disposed on the front side of the microphone array 121 with respect to the desired sound source; for example, when the desired sound source is on the left as shown in FIG. 1, the microphone array 122 is on the left side of the microphone array 121.

[0022] The microphone elements 211-1 to 211-M and 212-1 to 212-M convert the mechanical vibration of a diaphragm or the like caused by sound waves into electrical signals (sound wave signals).
That is, the microphone elements 211-1 to 211-M and 212-1 to 212-M output sound wave signals obtained by observing sound that contains both the sound wave coming from the desired sound source and sound waves coming from sound sources other than the desired sound source.

[0023] The sound wave extraction processing unit 13 comprises M fast Fourier transform units 311-1 to 311-M, M fast Fourier transform units 312-1 to 312-M, two spatial Fourier transform units 321 and 322, a spherical wave extraction processing unit 33, an inverse spatial Fourier transform unit 34, a signal determination unit 35, and an inverse fast Fourier transform unit 36.

[0024] The fast Fourier transform units 311-1 to 311-M perform a fast Fourier transform on the sound wave signals supplied from the microphone elements 211-1 to 211-M, respectively, and supply the resulting frequency domain signals to the spatial Fourier transform unit 321. Similarly, the fast Fourier transform units 312-1 to 312-M perform a fast Fourier transform on the sound wave signals supplied from the microphone elements 212-1 to 212-M, respectively, and supply the resulting frequency domain signals to the spatial Fourier transform unit 322.

[0025] The spatial Fourier transform unit 321 performs a spatial Fourier transform, according to the positions of the microphone elements 211-1 to 211-M, on the frequency domain signals supplied from the fast Fourier transform units 311-1 to 311-M, and supplies the resulting wave number domain signal to the spherical wave extraction processing unit 33. Similarly, the spatial Fourier transform unit 322 performs a spatial Fourier transform, according to the positions of the microphone elements 212-1 to 212-M, on the frequency domain signals supplied from the fast Fourier transform units 312-1 to 312-M, and supplies the resulting wave number domain signal to the spherical wave extraction processing unit 33.
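As a rough sketch of the role of the fast Fourier transform units, each microphone's time-domain signal can be transformed to the frequency domain and a single frequency bin selected per microphone. The function name, sampling rate, and test tone below are illustrative assumptions, not values from the patent.

```python
import numpy as np

def mic_signals_to_frequency_bin(signals, fs, f_target):
    """Transform M time-domain microphone signals to the frequency domain
    (one fast Fourier transform per microphone, as in units 311-1..311-M)
    and return the complex amplitude at the bin nearest f_target."""
    signals = np.asarray(signals)            # shape (M, N): M mics, N samples
    spectra = np.fft.rfft(signals, axis=1)   # per-microphone FFT
    freqs = np.fft.rfftfreq(signals.shape[1], d=1.0 / fs)
    bin_idx = int(np.argmin(np.abs(freqs - f_target)))
    return spectra[:, bin_idx], freqs[bin_idx]

# Example: 8 microphones observing a 500 Hz tone with per-mic phase shifts
fs, n, m = 16000, 1024, 8
t = np.arange(n) / fs
phases = np.linspace(0.0, 1.0, m)            # illustrative inter-mic delays [rad]
sig = np.stack([np.cos(2 * np.pi * 500 * t - p) for p in phases])
d, f = mic_signals_to_frequency_bin(sig, fs, 500.0)
```

The complex values d carry the per-microphone amplitude and phase at one frequency, which is the form the subsequent spatial Fourier transform operates on.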
[0026] The spherical wave extraction processing unit 33 uses the wave number domain signal supplied from the spatial Fourier transform unit 321 and the wave number domain signal supplied from the spatial Fourier transform unit 322 to extract the wave number domain signal based on the sound wave arriving from the desired sound source.

[0027] Here, the sound waves observed by the sound collection device 11 will be described. For example, the desired sound source emitting the sound wave to be extracted is at a position near the sound collection device 11, and sound sources other than the desired sound source (referred to as noise sources) are at positions farther away than the desired sound source. In this case, the sound wave coming from the desired sound source is observed while still propagating and spreading spherically, whereas the sound wave coming from a noise source has already spread sufficiently and is observed propagating as a plane. That is, in the sound collection device 11, the sound wave from the desired sound source is observed as a spherical wave, and the sound wave from a noise source is observed as a plane wave. The wave number domain signals supplied to the spherical wave extraction processing unit 33 therefore contain both a component corresponding to the plane wave and a component corresponding to the spherical wave.
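A small numerical sketch of why a near source looks like a spherical wave and a far source like a plane wave across a linear array: a plane wave produces a constant element-to-element phase step, while a near point source produces a varying step and a 1/r amplitude decay. The array geometry and source positions below are illustrative assumptions.

```python
import numpy as np

c, f = 343.0, 1000.0             # speed of sound [m/s], frequency [Hz]
k = 2 * np.pi * f / c            # wave number
dx, m = 0.05, 8                  # element spacing [m], element count
x = (np.arange(m) - (m - 1) / 2) * dx   # element positions on the X axis

# Plane wave arriving at angle theta to the X axis (distant noise source)
theta = np.deg2rad(60.0)
plane = np.exp(-1j * k * np.cos(theta) * x)

# Spherical wave from a near point source at (xr, yr) (desired source)
xr, yr = 0.1, 0.3
r = np.sqrt((x - xr) ** 2 + yr ** 2)
sphere = np.exp(-1j * k * r) / r

# Element-to-element phase steps along the array
plane_steps = np.angle(plane[1:] / plane[:-1])
sphere_steps = np.angle(sphere[1:] / sphere[:-1])
```

The constant phase step of the plane wave is what conventional angle-of-arrival array processing relies on; the curvature and amplitude taper of the spherical wave are the extra cues this device exploits.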
[0028] The spherical wave extraction processing unit 33 therefore extracts the spherical wave component based on the relationship between the plane wave component contained in the wave number domain signal supplied from the spatial Fourier transform unit 321 and the plane wave component contained in the wave number domain signal supplied from the spatial Fourier transform unit 322, and on the relationship between the spherical wave component contained in the wave number domain signal from the spatial Fourier transform unit 321 and the spherical wave component contained in the wave number domain signal from the spatial Fourier transform unit 322. The spherical wave extraction processing unit 33 then supplies the spherical wave component of the wave number domain signal, that is, the spherical wave component wave number domain signal obtained by spatially Fourier transforming the spherical wave, to the inverse spatial Fourier transform unit 34.

[0029] The inverse spatial Fourier transform unit 34 performs an inverse spatial Fourier transform on the spherical wave component wave number domain signal supplied from the spherical wave extraction processing unit 33, and supplies the resulting spherical wave component frequency domain signals to the signal determination unit 35. For example, when extracting the spherical wave observed by the microphone array 121, the inverse spatial Fourier transform unit 34 performs the inverse spatial Fourier transform according to the positions of the microphone elements 211-1 to 211-M, and calculates a plurality of spherical wave component frequency domain signals according to the number of the microphone elements 211-1 to 211-M (i.e., M).
[0030] From among the plurality of spherical wave component frequency domain signals determined by the inverse spatial Fourier transform unit 34, the signal determination unit 35 determines the spherical wave component frequency domain signal on which the inverse fast Fourier transform unit 36 performs an inverse fast Fourier transform for output as a sound wave signal. For example, the signal determination unit 35 selects, as the target of the inverse fast Fourier transform, the spherical wave component frequency domain signal corresponding to the microphone element 211 at an arbitrary position (for example, the center position).

[0031] The inverse fast Fourier transform unit 36 performs an inverse fast Fourier transform on the spherical wave component frequency domain signal determined by the signal determination unit 35, and supplies the resulting sound wave signal of the spherical wave to a downstream device, not shown (for example, a voice recognition device or a recording device).

[0032] The sound collection device 11 configured in this way can more reliably extract the sound wave from the desired sound source near the microphone arrays 121 and 122, as distinguished from the sound waves from distant noise sources. By thus extracting the sound generated at an arbitrary place from among the plural sounds generated at plural positions, the adverse effects of ambient noise can be eliminated in telephony, videophones, television relays, speech recording and the like; for example, a decrease in speech recognition rate can be suppressed.

[0033] Next, the process by which the spherical wave extraction processing unit 33 extracts the spherical wave will be described in detail using mathematical formulas.
[0034] First, as shown in FIG. 1, the direction in which the microphone elements 21 are linearly arranged in the microphone arrays 121 and 122 is taken as the X-axis direction, and the direction orthogonal to it, that is, the direction separating the parallel microphone arrays 121 and 122, is taken as the Y-axis direction. In FIG. 1, the desired sound source and the microphone arrays 121 and 122 are arranged at the same position (height) in the Z direction perpendicular to the paper surface, so the Z-direction parameter is omitted in the following description.

[0035] The microphone elements 21 within each of the microphone arrays 121 and 122 are arranged at an interval dx in the X-axis direction, and the microphone arrays 121 and 122 are separated by an interval dy in the Y-axis direction. The desired sound source is located at position (xr, yr), with the center position of the microphone array 121 taken as the reference (0, 0). In addition, a plane wave, as noise, arrives from far beyond the desired sound source at an angle θ with respect to the X-axis direction.

[0036] In the following description, sound waves are treated as separated into single frequencies by a Fourier transform or the like, and the function exp(jωt) representing a single-frequency sound is omitted. Here ω is the angular frequency, related to the frequency f by ω = 2πf. When recombining into a signal spanning the entire band, an inverse Fourier transform can be performed after processing each frequency.

[0037] The frequency domain signal D1, obtained by subjecting the sound wave signals observed by the microphone array 121 to the fast Fourier transform in the fast Fourier transform units 311-1 to 311-M, is represented by the following equation (1).
Similarly, the frequency domain signal D2, obtained by subjecting the sound wave signals observed by the microphone array 122 to the fast Fourier transform in the fast Fourier transform units 312-1 to 312-M, is represented by the following equation (2).

[0038]

[0039] That is, as shown in equation (1), the frequency domain signal D1 is expressed as the superposition of a plane wave component frequency domain signal P1, the fast Fourier transform of the plane wave component contained in the sound wave signal, and a spherical wave component frequency domain signal Q1, the fast Fourier transform of the spherical wave component contained in the sound wave signal. Similarly, as shown in equation (2), the frequency domain signal D2 is expressed as the superposition of a plane wave component frequency domain signal P2 and a spherical wave component frequency domain signal Q2.

[0040] Here, in equation (1), the first argument represents the position x in the X-axis direction of each of the microphone elements 211-1 to 211-M constituting the microphone array 121; likewise, in equation (2), the first argument represents the position x in the X-axis direction of each of the microphone elements 212-1 to 212-M constituting the microphone array 122. In equations (1) and (2), the second argument represents the position of the respective microphone array in the Y-axis direction. With the position of the microphone array 121 taken as the reference in the Y-axis direction, the Y position of the microphone array 121 is 0 in equation (1), and the Y position of the microphone array 122 is the spacing dy in equation (2).
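Equations (1) and (2) themselves are not reproduced in this translation. From the surrounding description (superposition of a plane wave component and a spherical wave component, with arguments x and the array's Y position), they plausibly have the following form; this is a reconstruction, not the patent's original typography.

```latex
\begin{align}
D_1(x, 0)   &= P_1(x, 0) + Q_1(x, 0) \tag{1}\\
D_2(x, d_y) &= P_2(x, d_y) + Q_2(x, d_y) \tag{2}
\end{align}
```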
[0041] The plane wave component frequency domain signals P1 and P2 can be expressed by the following equations (3) and (4), using a parameter A representing the magnitude of the sound from the noise source, the trace wave number kx in the X-axis direction, and the trace wave number ky in the Y-axis direction of the plane wave arriving from the angle θ shown in FIG. 1.

[0042]

[0043] Here, the trace wave number kx in the X-axis direction has the relationship shown in the following equation (5) with the wave number k = ω/c (ω: angular frequency, c: speed of sound) determined by the frequency of the sound. Further, the trace wave number ky in the Y-axis direction has the relationship shown in the following equation (6) with the trace wave number kx and the wave number k.

[0044]

[0045] On the other hand, the spherical wave component frequency domain signal Q1 can be expressed by the following equation (7), using a parameter B representing the magnitude of the sound from the desired sound source and the distance r1(x) from the position (xr, yr) of the desired sound source, from which the spherical wave arrives as shown in FIG. 1, to the microphone element 211 at position x. Similarly, using the parameter B and the distance r2(x) from the desired sound source to the microphone element 212 at position x, the spherical wave component frequency domain signal Q2 can be expressed by equation (8).

[0046]

[0047] Further, the distance r1(x) and the distance r2(x) are expressed by the following equations (9) and (10) using the position (xr, yr) of the desired sound source.

[0048]

[0049] Now, as shown in equations (1) and (2), the frequency domain signals D1 and D2 obtained by fast Fourier transforming the sound wave signals observed by the microphone arrays 121 and 122 are waveforms in which the plane wave and the spherical wave are mixed.
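Equations (3) through (10) are likewise missing from the translation. Under the exp(jωt) convention stated in paragraph [0036], with the microphone array 122 at y = dy on the source side, they plausibly read as follows; the sign of the k_y d_y term in (4) and of the exponents depends on the assumed propagation direction, so this is a hedged reconstruction.

```latex
\begin{align}
P_1(x, 0)   &= A\, e^{-j k_x x} \tag{3}\\
P_2(x, d_y) &= A\, e^{-j \left( k_x x - k_y d_y \right)} \tag{4}\\
k_x &= k \cos\theta, \qquad k = \omega / c \tag{5}\\
k_y &= \sqrt{k^2 - k_x^2} \tag{6}\\
Q_1(x, 0)   &= \frac{B}{r_1(x)}\, e^{-j k\, r_1(x)} \tag{7}\\
Q_2(x, d_y) &= \frac{B}{r_2(x)}\, e^{-j k\, r_2(x)} \tag{8}\\
r_1(x) &= \sqrt{(x - x_r)^2 + y_r^2} \tag{9}\\
r_2(x) &= \sqrt{(x - x_r)^2 + (y_r - d_y)^2} \tag{10}
\end{align}
```

Equation (9) is consistent with the form of r1(x) restated later for the delay-and-sum array (equation (22)).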
Therefore, in order to extract the spherical wave component frequency domain signal Q1 or Q2 from the frequency domain signals D1 and D2, the sound wave extraction processing unit 13 first applies a spatial Fourier transform to the frequency domain signals D1 and D2 along the X-axis direction. For this spatial Fourier transform, the spacing dx of the microphone elements 21 in the X-axis direction needs to be constant. Note that, as in ordinary Fourier transform processing, the frequency domain signals D1 and D2 can be windowed with a Hanning window or the like, reducing the level of the signals observed by the microphone elements 21 at both ends so that both ends lead smoothly to the outside of the section.

[0050] Here, the spatial Fourier transform along the X-axis direction is defined as the following equation (11).

[0051]

[0052] Equation (11) represents, for example, the wave number domain signal S′(k′x, y) obtained by performing the spatial Fourier transform on the frequency domain signal S(x, y) along the X-axis direction. In equation (11), M is the number of microphone elements 21 arranged in the X-axis direction, and k′x is the variable corresponding to the trace wave number kx in the X-axis direction under the inverse Fourier transform, which can take any value.

[0053] With the spatial Fourier transform along the X-axis direction so defined, the wave number domain signal D1′ obtained by applying it to the frequency domain signal D1 is expressed by the following equation (12), and the wave number domain signal D2′ obtained by applying it to the frequency domain signal D2 is expressed by the following equation (13).
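The spatial Fourier transform step can be sketched as follows: window the M frequency-domain samples with a Hanning window so the array ends taper smoothly, then transform position x into wave number k′x. Since equation (11) is not reproduced, the +j sign convention and the plain discrete sum below are assumptions consistent with the exp(-j kx x) plane-wave form; function names and values are illustrative.

```python
import numpy as np

def spatial_fourier_transform(d, x, kxs):
    """Assumed eq.-(11)-style spatial Fourier transform:
    S'(k'x) = sum over the M elements of S(x_m) * w_m * exp(+j k'x x_m),
    with a Hanning window w so both array ends lead smoothly to zero."""
    w = np.hanning(len(d))
    return np.array([np.sum(d * w * np.exp(1j * kx * x)) for kx in kxs])

# A plane wave exp(-j kx0 x) sampled by M = 64 elements at dx = 0.05 m
dx, m = 0.05, 64
x = np.arange(m) * dx
kx0 = 20.0                                   # rad/m, illustrative trace wave number
d1 = np.exp(-1j * kx0 * x)
kxs = np.linspace(-60.0, 60.0, 241)          # k'x can take any value
d1k = spatial_fourier_transform(d1, x, kxs)
peak_kx = kxs[np.argmax(np.abs(d1k))]
```

For a pure plane wave the wave number spectrum peaks at its trace wave number, which is the delta-function behavior described for equations (14) and (15).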
[0054]

[0055] Similarly, the plane wave component wave number domain signal P1′, obtained by applying the spatial Fourier transform to the plane wave component frequency domain signal P1, is expressed by the following equation (14), and the plane wave component wave number domain signal P2′, obtained by applying it to the plane wave component frequency domain signal P2, is expressed by the following equation (15).

[0056]

[0057] Here, in equations (14) and (15), δ is a delta function.

[0058] Furthermore, the spherical wave component wave number domain signal Q1′, obtained by applying the spatial Fourier transform to the spherical wave component frequency domain signal Q1, is expressed by the following equation (16), and the spherical wave component wave number domain signal Q2′, obtained by applying it to the spherical wave component frequency domain signal Q2, is expressed by the following equation (17) using the position yr of the desired sound source in the Y-axis direction.

[0059]

[0060] Here, in equations (16) and (17), H0<(2)> is the zeroth-order Hankel function of the second kind, and K0 is the zeroth-order modified Bessel function. Further, as shown in equations (16) and (17), the spherical wave component wave number domain signals Q1′ and Q2′ take different forms according to the magnitude relationship between the absolute value of the variable k′x and the wave number k.

[0061] Next, consider the relationship between the plane wave component of the sound wave observed by the microphone array 121 and that observed by the microphone array 122, and the relationship between the spherical wave component of the sound wave observed by the microphone array 121 and that observed by the microphone array 122.
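Equations (14) through (17) are not reproduced. Using the known spatial Fourier pair of a point-source field (Hankel function of the second kind in the propagating region, modified Bessel function in the evanescent region), they plausibly take the following form; the constant factors and signs depend on the transform convention, so treat this as a reconstruction rather than the patent's exact formulas.

```latex
\begin{align}
P_1'(k'_x, 0)   &= 2\pi A\, \delta(k'_x - k_x) \tag{14}\\
P_2'(k'_x, d_y) &= 2\pi A\, e^{j k_y d_y}\, \delta(k'_x - k_x) \tag{15}\\
Q_1'(k'_x, 0)   &= B\, G_0(k'_x,\, y_r), \qquad
G_0(k'_x, y) =
\begin{cases}
-j\pi\, H_0^{(2)}\!\left( y \sqrt{k^2 - k'^2_x} \right), & |k'_x| < k\\[4pt]
2\, K_0\!\left( y \sqrt{k'^2_x - k^2} \right), & |k'_x| > k
\end{cases} \tag{16}\\
Q_2'(k'_x, d_y) &= B\, G_0(k'_x,\, y_r - d_y) \tag{17}
\end{align}
```

This matches the text's statement that Q1′ and Q2′ switch between the Hankel and modified Bessel forms according to whether |k′x| is smaller or larger than k, and that Q2′ involves the source position yr.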
That is, when the parameter A representing the magnitude of the sound from the noise source is eliminated from equations (14) and (15), the following equation (18) holds. Similarly, when the parameter B representing the magnitude of the sound from the desired sound source is eliminated from equations (16) and (17), the following equation (19) holds.

[0062]

[0063] Based on the relationships shown in equations (18) and (19), equation (13) above can be rewritten as the following equation (20).

[0064]

[0065] Here, in equation (20), G0 is a function that stands for either the zeroth-order Hankel function of the second kind H0<(2)> or the zeroth-order modified Bessel function K0, according to the magnitude relationship between the absolute value of the variable k′x and the wave number k, as described above.

[0066] Then, using equation (20) together with equation (12) above, the spherical wave extraction processing unit 33 can extract the spherical wave component wave number domain signal Q1′ by computing the following equation (21).

[0067]

[0068] Thereafter, the inverse spatial Fourier transform unit 34 performs an inverse spatial Fourier transform on the spherical wave component wave number domain signal Q1′ obtained by the spherical wave extraction processing unit 33 computing equation (21), to obtain the spherical wave component frequency domain signal Q1. The inverse fast Fourier transform unit 36 then performs an inverse fast Fourier transform on the spherical wave component frequency domain signal Q1 obtained via the inverse spatial Fourier transform unit 34, so that the spherical wave component contained in the sound wave signal can be output.
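The elimination step can be sketched numerically. Writing the array-to-array ratios as E (plane wave) and F (spherical wave), equation (12) gives D1′ = P1′ + Q1′, equation (20) gives D2′ = E·P1′ + F·Q1′, and solving yields Q1′ = (D2′ − E·D1′)/(F − E), the shape of equation (21). The signals below are synthesized directly in the wave number domain from assumed forms of E and F, so the algebra is exact by construction; with real array data, finite aperture and discretization would make it approximate. All names and values are illustrative.

```python
import numpy as np
from scipy.special import hankel2, k0 as bessel_k0

c, f = 343.0, 1000.0
k = 2 * np.pi * f / c                 # wave number
dy, yr = 0.05, 0.3                    # array spacing, source distance [m]

def g0(kpx, y):
    """Assumed piecewise kernel of eqs. (16)-(17): Hankel H0^(2) in the
    propagating region |k'x| < k, modified Bessel K0 otherwise."""
    kpx = np.asarray(kpx, dtype=float)
    out = np.empty(kpx.shape, dtype=complex)
    prop = np.abs(kpx) < k
    out[prop] = -1j * np.pi * hankel2(0, y * np.sqrt(k**2 - kpx[prop] ** 2))
    out[~prop] = 2 * bessel_k0(y * np.sqrt(kpx[~prop] ** 2 - k**2))
    return out

kpx = np.linspace(-2 * k, 2 * k, 400)          # wave number axis, avoids |k'x| == k
ky = np.sqrt(np.maximum(k**2 - kpx**2, 0.0))   # propagating k_y, eq. (6)
kappa = np.sqrt(np.maximum(kpx**2 - k**2, 0.0))

# Assumed ratio functions between the two arrays
E = np.where(np.abs(kpx) < k, np.exp(1j * ky * dy), np.exp(-kappa * dy))  # eq. (18)
F = g0(kpx, yr - dy) / g0(kpx, yr)                                        # eq. (19)

rng = np.random.default_rng(0)
P1 = rng.standard_normal(kpx.size) + 1j * rng.standard_normal(kpx.size)
Q1 = rng.standard_normal(kpx.size) + 1j * rng.standard_normal(kpx.size)

D1 = P1 + Q1                                   # eq. (12)
D2 = E * P1 + F * Q1                           # eq. (20)

Q1_est = (D2 - E * D1) / (F - E)               # eq. (21)
```

Note that as the desired source moves far away, F approaches a scaled copy of E and the denominator F − E degenerates, which reflects the premise of the method: the desired source must be near enough to be observed as a spherical wave.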
[0069] As described above, the sound wave extraction processing unit 13 extracts the spherical wave component, that is, the sound wave from the desired sound source, based on the relationship between the plane waves contained in the sound waves observed by the microphone arrays 121 and 122 and the relationship between the spherical waves. [0070] Incidentally, as described above, the signal determination unit 35 determines which of the plurality of spherical wave component frequency domain signals obtained by the inverse spatial Fourier transform unit 34 is to be subjected to the inverse fast Fourier transform by the inverse fast Fourier transform unit 36. At this time, the signal determination unit 35 may determine the spherical wave component frequency domain signal corresponding to the microphone element 21 at an arbitrary position as the target of the inverse fast Fourier transform, or may, for example, determine a signal obtained by summing the plurality of spherical wave component frequency domain signals after aligning their timings as the target of the inverse fast Fourier transform. That is, among the linearly arranged microphone elements 21, a spherical wave spreading spherically from the desired sound source is observed with a greater delay at a microphone element 21 farther from the desired sound source than at one closer to it. [0071] Therefore, for example, the signal determination unit 35 assumes a position of the desired sound source and configures a delay-and-sum array that takes into account the delay from that position until the sound wave reaches each of the microphone elements 211-1 to 211-M. The signal determination unit 35 can then determine the signal obtained by taking the sum after compensating for each delay as the target of the inverse fast Fourier transform. [0072] Here, the delay-and-sum array will be described.
For example, assuming that the desired sound source is located at the position (xr, yr), the distance r1(x) from the desired sound source to each of the microphone elements 211-1 to 211-M is expressed by the following equation (22). Further, the time difference τ(x) with which the sound wave from the desired sound source reaches each of the microphone elements 211-1 to 211-M is expressed by the following equation (23), using the speed of sound c. [0073] [0074] Therefore, the signal determination unit 35 can multiply the wave number domain signal Q1' at each corresponding position x by the inverse time transfer function exp(jωτ(x)) that compensates for the time difference τ(x) according to the position of each of the microphone elements 211-1 to 211-M, and determine the resulting sum signal as the target of the inverse fast Fourier transform. [0075] As described above, by using the delay-and-sum array, the signal determination unit 35 can amplify the spherical wave component wave number domain signal Q1' to be subjected to the inverse fast Fourier transform for output as a sound wave signal, thereby, for example, improving the signal-to-noise ratio. [0076] Now, if the function representing the relationship between the microphone arrays 121 and 122 for the plane wave in equation (20) is denoted as a relation function E(k'x, dy), the relation function E(k'x, dy) is expressed by the following equation (24). [0077] [0078] Similarly, if the function representing the relationship between the microphone arrays 121 and 122 for the spherical wave in equation (20) is denoted as a relation function F(k'x, dy), the relation function F(k'x, dy) is expressed by the following equation (25).
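The delay-and-sum compensation described with equations (22) and (23) in paragraphs [0072] to [0075] above can be sketched as follows. This is an illustrative implementation in the frequency domain, not the patent's exact processing; the function name and parameters are chosen for the example only.

```python
import numpy as np

def delay_and_sum(signals, mic_x, xr, yr, fs, c=343.0):
    """Delay-and-sum toward an assumed source at (xr, yr); array on Y = 0.

    r(x)   = sqrt((x - xr)^2 + yr^2)   ... equation (22)
    tau(x) = r(x) / c                  ... equation (23)
    Each channel is advanced by its relative delay via the inverse time
    transfer function exp(j*omega*tau), then the channels are averaged.
    """
    signals = np.asarray(signals, dtype=float)
    m, n = signals.shape
    r = np.sqrt((np.asarray(mic_x) - xr) ** 2 + yr ** 2)  # eq. (22)
    tau = (r - r.min()) / c                               # eq. (23), relative
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    spectra = np.fft.rfft(signals, axis=1)
    spectra *= np.exp(2j * np.pi * freqs[None, :] * tau[:, None])
    return np.fft.irfft(spectra.sum(axis=0), n=n) / m

# Example: four elements, a 500 Hz tone from a source 0.5 m in front.
fs, n, c = 16000, 1024, 343.0
mic_x = np.linspace(-0.15, 0.15, 4)
t = np.arange(n) / fs
tau = (np.sqrt(mic_x ** 2 + 0.5 ** 2) - 0.5) / c
sigs = np.cos(2 * np.pi * 500 * (t[None, :] - tau[:, None]))
out = delay_and_sum(sigs, mic_x, 0.0, 0.5, fs)
```

After compensation the four channels add coherently, so the averaged output recovers the unit-amplitude tone.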
Then, according to the magnitude relationship between the absolute value of the variable k'x and the absolute value of the wave number k, using the function G0 that selects between the Hankel function of the second kind and the modified Bessel function, the relation function F(k'x, dy) is expressed by the following equation (26). [0079] [0080] Therefore, from equations (24) and (26), equation (20) described above can be expressed as the following equation (27). [0081] [0082] Then, based on the simultaneous equations formed by equation (27) and equation (12) described above, the spherical wave component wave number domain signal Q1' can be expressed as the following equation (28). [0083] [0084] Here, as described above, the relation function E(k'x, dy) represents the relationship between the microphone arrays 121 and 122 for plane waves, and the relation function F(k'x, dy) represents the relationship between the microphone arrays 121 and 122 for spherical waves. Therefore, as long as the positional relationship between the microphone arrays 121 and 122 is fixed, these relationships do not change, and values obtained by prior observation or the like can be used as the output values of the relation function E(k'x, dy) and the relation function F(k'x, dy). [0085] Specifically, with the sound collection device 11 placed in advance in a state in which only a plane wave is observed, a plane wave arriving from each angle θ is observed, and the output value of the relation function E(k'x, dy) can be obtained by substituting the variable k'x for each angle θ into the plane wave component wave number domain signals P1' and P2' and computing the following equation (29).
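This prior measurement (equation (29) for E, and the analogous measurement for F described next) can be sketched as estimating a per-bin ratio between the two arrays' wave number domain signals while only one wave type is present. The averaging over repeated observations is an illustrative choice for this sketch; the exact formulas appear only in the original figures.

```python
import numpy as np

def calibrate_relation_function(front_bins, back_bins):
    """Illustrative calibration of a relation function between the arrays.

    front_bins, back_bins: complex arrays of shape (observations, bins)
    holding wave number domain signals of microphone arrays 121 and 122
    over repeated observations (e.g. plane waves from several angles
    theta when calibrating E, spherical waves from several positions yr
    when calibrating F).  With only one wave type present, the per-bin
    ratio back / front gives the relation function output; ratios are
    averaged over the observations to reduce measurement noise.
    """
    front_bins = np.atleast_2d(front_bins)
    back_bins = np.atleast_2d(back_bins)
    return (back_bins / front_bins).mean(axis=0)

# Example: recover a hypothetical plane wave relation function E exactly
# from three noiseless observations.
e_true = np.exp(-1j * np.linspace(0.0, 1.0, 8))
p1 = np.ones((3, 8), dtype=complex)   # front array signals P1'
p2 = e_true * p1                      # back array signals P2'
e_est = calibrate_relation_function(p1, p2)
```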
Similarly, with the sound collection device 11 placed in advance in a state in which only a spherical wave is observed, a spherical wave arriving from each angle θ and position yr is observed, and the output value of the relation function F(k'x, dy) can be obtained by substituting the variable k'x and the position yr for each angle θ into the spherical wave component wave number domain signals Q1' and Q2' and computing the following equation (30). [0086] [0087] Then, the spherical wave extraction processing unit 33 holds the output values of the relation function E(k'x, dy) and the relation function F(k'x, dy) obtained in this way in advance, and can determine the spherical wave component wave number domain signal Q1' by computing equation (28). [0088] Thus, when observing the sound wave to be extracted, the sound collection device 11 may extract the sound wave from the desired sound source not only by using the relation function E(k'x, dy) and the relation function F(k'x, dy) obtained by computing equations (24) and (26) described above, but also by using the output values of the relation function E(k'x, dy) and the relation function F(k'x, dy) obtained by prior measurement or the like. [0089] Next, with reference to FIGS. 2 to 5, the effect of extracting a desired sound source by the sound collection device 11 will be described using spatial sensitivity distribution charts. A spatial sensitivity distribution chart indicates at what level a sound at the position (x, y) is collected when picked up by the microphone. [0090] FIG. 2 shows the spatial sensitivity distribution of a single omnidirectional microphone. In FIG. 2, the microphone is arranged at the position (0, 0), and the spatial sensitivity distribution is represented as a contour map. In the spatial sensitivity distribution chart of FIG.
2, for example, when the sound source is at a distance of 1.5 m from the microphone, the sound is collected as a small sound, attenuated to -30 dB. [0091] FIGS. 3 to 5 show spatial sensitivity distribution charts of the sound collection device 11. In these spatial sensitivity distributions, the microphone arrays 121 and 122 are arranged at an interval of 10 cm (interval dy = 10 cm), and in each of the microphone arrays 121 and 122, 64 microphone elements 21 are arranged at intervals of 5 cm in the X-axis direction (interval dx = 5 cm); the distributions are obtained by computer simulation under these conditions. The center of the microphone array 121 is assumed to be at the position (0, 0). [0092] Also, since the spatial sensitivity distribution has different characteristics at each frequency, it is shown for frequencies important for the human voice (500 Hz, 1 kHz, 2 kHz) as representative frequencies. That is, FIG. 3 shows the spatial sensitivity distribution chart when the frequency of the collected sound is 500 Hz, FIG. 4 shows the chart when the frequency is 1 kHz, and FIG. 5 shows the chart when the frequency is 2 kHz. [0093] As shown in FIGS. 3 to 5, the sound collection device 11 picks up sound from a sound source located beyond a predetermined distance at a reduced level. For example, as shown in FIG. 3, when the frequency of the sound to be collected is 500 Hz, the sound collection device 11 picks up sound from a sound source located at a distance of 0.75 m or more attenuated by -30 dB or more. Thus, even when nearby sounds and distant sounds are mixed, only the sounds close to the microphone arrays 121 and 122 can be extracted. As shown in FIGS.
4 and 5, when the frequency of the collected sound is 1 kHz or 2 kHz, only the sounds close to the microphone arrays 121 and 122 can be extracted with an even larger difference. [0094] As described above, the sound collection device 11 can suppress noise (plane waves) coming from a distance and extract only the sound (spherical waves) coming from a desired sound source near the microphone arrays 121 and 122. [0095] Next, FIG. 6 is a flowchart for explaining the process by which the sound collection device 11 of FIG. 1 extracts a sound from a desired sound source. [0096] For example, the process is started when sound waves are observed by the microphone arrays 121 and 122, sound wave signals are supplied from the microphone elements 211-1 to 211-M to the fast Fourier transform units 311-1 to 311-M, and sound wave signals are supplied from the microphone elements 212-1 to 212-M to the fast Fourier transform units 312-1 to 312-M. [0097] In step S11, the fast Fourier transform units 311-1 to 311-M supply, to the spatial Fourier transform unit 321, frequency domain signals obtained by performing a fast Fourier transform on the sound wave signals supplied from the microphone elements 211-1 to 211-M. Similarly, the fast Fourier transform units 312-1 to 312-M supply, to the spatial Fourier transform unit 322, frequency domain signals obtained by performing a fast Fourier transform on the sound wave signals supplied from the microphone elements 212-1 to 212-M. [0098] In step S12, the spatial Fourier transform unit 321 performs a spatial Fourier transform, according to the positions of the microphone elements 211-1 to 211-M, on the frequency domain signals supplied from the fast Fourier transform units 311-1 to 311-M, and supplies the resulting wave number domain signal to the spherical wave extraction processing unit 33.
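The spatial Fourier transform of step S12 can be sketched as follows for one temporal frequency bin. The discrete sum below (a plain projection onto wavenumbers kx, scaled by the element spacing) is an illustrative formulation, not necessarily the patent's exact transform.

```python
import numpy as np

def spatial_fourier_transform(freq_bin_values, mic_x, kx):
    """Sketch of step S12 for one temporal frequency bin.

    freq_bin_values: complex frequency domain value of each of the M
    microphone elements located at X positions mic_x (uniform spacing).
    Returns the wave number domain signal sampled at wavenumbers kx:
        S'(kx) = sum_m S(x_m) * exp(-j * kx * x_m) * dx
    """
    freq_bin_values = np.asarray(freq_bin_values)
    mic_x = np.asarray(mic_x)
    dx = mic_x[1] - mic_x[0]
    kernel = np.exp(-1j * np.outer(kx, mic_x))  # shape (K, M)
    return kernel @ freq_bin_values * dx

# Example: a single spatial frequency k0 across 8 elements concentrates
# its energy in the matching wavenumber bin.
mic_x = np.arange(8) * 0.05        # dx = 5 cm, as in the simulations above
k0 = 2 * np.pi / (8 * 0.05)
values = np.exp(1j * k0 * mic_x)
s = spatial_fourier_transform(values, mic_x, np.array([0.0, k0]))
```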
Similarly, the spatial Fourier transform unit 322 performs a spatial Fourier transform, according to the positions of the microphone elements 212-1 to 212-M, on the frequency domain signals supplied from the fast Fourier transform units 312-1 to 312-M, and supplies the resulting wave number domain signal to the spherical wave extraction processing unit 33. [0099] In step S13, the spherical wave extraction processing unit 33 extracts, based on the wave number domain signal supplied from the spatial Fourier transform unit 321 and the wave number domain signal supplied from the spatial Fourier transform unit 322, a wave number domain signal of the sound wave that has arrived from the desired sound source. For example, the spherical wave extraction processing unit 33 can extract the spherical wave component wave number domain signal Q1' contained in the sound wave signal output from the microphone array 121 by computing equation (21) described above. [0100] In step S14, the inverse spatial Fourier transform unit 34 performs an inverse spatial Fourier transform, according to the positions of the microphone elements 211-1 to 211-M, on the spherical wave component wave number domain signal supplied from the spherical wave extraction processing unit 33, to calculate a plurality of spherical wave component frequency domain signals. [0101] In step S15, the signal determination unit 35 determines, from among the plurality of spherical wave component frequency domain signals obtained by the inverse spatial Fourier transform unit 34, the spherical wave component frequency domain signal obtained using the delay-and-sum array as described above as the target to be subjected to the inverse fast Fourier transform by the inverse fast Fourier transform unit 36, in order to output it as a sound wave signal.
[0102] In step S16, the inverse fast Fourier transform unit 36 performs an inverse fast Fourier transform on the spherical wave component frequency domain signal determined by the signal determination unit 35, and outputs the resulting sound wave signal of the spherical wave. [0103] As described above, the sound wave extraction processing unit 13 can extract the spherical wave component contained in the sound wave signals observed by the microphone arrays 121 and 122, that is, the sound wave from the desired sound source. In other words, in the sound collection device 11, by introducing the concept of acoustic holography on the basis of the linear microphone arrays that have conventionally been used to extract the human voice, sound waves from the desired sound source can be extracted using not a two-dimensional planar array but the two microphone arrays 121 and 122 arranged one in front of the other. [0104] In particular, in the sound collection device 11, even when the desired sound source and a noise source are in the same direction, the sound from the desired sound source can be extracted based on the relationship between the spherical wave and the plane wave. Specifically, by adopting the sound collection device 11 in a vending machine and, for example, picking up only utterances made close to the vending machine, the voice recognition rate of the vending machine can be improved. In addition, the sound collection device 11 can also be used, for example, in voice ordering such as at a drive-through, so that a store clerk can hear the customer's voice. Furthermore, by adopting the sound collection device 11 in a video conference or the like, it is possible to capture the voices of only the participants without collecting sounds beyond a certain distance.
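Taken together, steps S11 to S16 can be sketched as a single frequency-domain pipeline. Everything below is illustrative: the relation function values E and F are assumed to be known per (wavenumber, frequency) bin, step S15 is simplified to picking the center element rather than the delay-and-sum array, and the transform conventions are a plain DFT pair.

```python
import numpy as np

def extract_desired_sound(sig1, sig2, mic_x, kx, E, F):
    """Illustrative sketch of steps S11 to S16 for two linear arrays.

    sig1, sig2: (M, N) time-domain signals from microphone arrays 121
    and 122.  E, F: relation function outputs per (wavenumber, frequency)
    bin, assumed obtained in advance.  Names are for this sketch only.
    """
    m, n = sig1.shape
    # S11: fast Fourier transform per microphone element
    x1 = np.fft.rfft(sig1, axis=1)
    x2 = np.fft.rfft(sig2, axis=1)
    # S12: spatial Fourier transform across element positions
    fwd = np.exp(-1j * np.outer(kx, mic_x))           # (K, M)
    s1 = fwd @ x1                                     # (K, F) bins
    s2 = fwd @ x2
    # S13: extract the spherical wave component (equation (28) form)
    q1 = (s2 - E * s1) / (F - E)
    # S14: inverse spatial Fourier transform back to element positions
    inv = np.exp(1j * np.outer(mic_x, kx)) / len(kx)  # (M, K)
    q1_elems = inv @ q1                               # (M, F) bins
    # S15: simplified here to choosing the centre element's signal
    centre = q1_elems[m // 2]
    # S16: inverse fast Fourier transform to a time-domain sound wave
    return np.fft.irfft(centre, n=n)

# Degenerate check: with identical observations, E = 0 and F = 1, the
# pipeline returns the centre element's input unchanged (kx chosen as the
# DFT grid so the spatial transforms invert each other exactly).
fs, M, N = 8000, 8, 64
mic_x = np.arange(M) * 0.05
kx = 2 * np.pi * np.arange(M) / (M * 0.05)
sig = np.tile(np.cos(2 * np.pi * 500 * np.arange(N) / fs), (M, 1))
K, Fb = M, N // 2 + 1
out = extract_desired_sound(sig, sig, mic_x, kx,
                            np.zeros((K, Fb)), np.ones((K, Fb)))
```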
[0105] As described above, the sound collection device 11 can extract the sound at an arbitrary place from among plural sounds generated at plural positions, and can therefore be applied to sound collection technologies such as telephones, videophones, television relays, and conversation recording. [0106] In the present embodiment, the description has been made assuming that the desired sound source is at a position within a predetermined distance from the sound collection device 11. However, it may also be assumed, for example, that the desired sound source is at a distant position, and the sound from the distant desired sound source may be extracted. That is, instead of computing equation (21) described above to obtain the spherical wave component wave number domain signal Q1', the plane wave component wave number domain signal P1' can be obtained from the simultaneous equations using equations (12) and (20) described above, whereby a plane wave coming from a distant desired sound source can be extracted. [0107] Note that the processes described with reference to the above flowchart do not necessarily have to be performed in chronological order following the order described in the flowchart; processes executed in parallel or individually (for example, parallel processing or processing by objects) are also included. The program may be processed by one CPU, or may be processed in a distributed manner by a plurality of CPUs. Furthermore, the program may be transferred to a remote computer and executed there. [0108] Further, the series of processes (information processing method) described above can be performed by hardware or by software. When the series of processes is executed by software, the program constituting the software is installed on a computer incorporated in dedicated hardware, or on a computer capable of executing various functions by installing various programs.
The program can be installed, for example, on a general-purpose personal computer from a program recording medium on which the program is recorded. [0109] FIG. 7 is a block diagram showing an example of a hardware configuration of a computer that executes the series of processes described above according to a program. [0110] In the computer, a central processing unit (CPU) 101, a read only memory (ROM) 102, and a random access memory (RAM) 103 are connected to one another by a bus 104. [0111] Further, an input/output interface 105 is connected to the bus 104. Connected to the input/output interface 105 are an input unit 106 including a keyboard, a mouse, and a microphone, an output unit 107 including a display and a speaker, a storage unit 108 including a hard disk and a non-volatile memory, a communication unit 109 including a network interface, and a drive 110 for driving a removable medium 111 such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory. [0112] In the computer configured as described above, for example, the CPU 101 loads the program stored in the storage unit 108 into the RAM 103 via the input/output interface 105 and the bus 104 and executes it, whereby the series of processes described above is performed. [0113] The program executed by the computer (CPU 101) is recorded on the removable medium 111, which is a package medium including, for example, a magnetic disk (including a flexible disk), an optical disk (such as a CD-ROM (Compact Disc-Read Only Memory) or a DVD (Digital Versatile Disc)), a magneto-optical disk, or a semiconductor memory, or is provided via a wired or wireless transmission medium such as a local area network, the Internet, or digital satellite broadcasting. [0114] The program can be installed in the storage unit 108 via the input/output interface 105 by mounting the removable medium 111 in the drive 110.
The program can also be received by the communication unit 109 via a wired or wireless transmission medium and installed in the storage unit 108. In addition, the program can be installed in advance in the ROM 102 or the storage unit 108. [0115] The present technology is not limited to the above-described embodiment, and various modifications can be made without departing from the scope of the present disclosure. [0116] 11 sound collection device, 121 and 122 microphone arrays, 13 sound wave extraction processing unit, 211-1 to 211-M and 212-1 to 212-M microphone elements, 311-1 to 311-M and 312-1 to 312-M fast Fourier transform units, 321 and 322 spatial Fourier transform units, 33 spherical wave extraction processing unit, 34 inverse spatial Fourier transform unit, 35 signal determination unit, 36 inverse fast Fourier transform unit
