Patent Translate Powered by EPO and Google

Notice: This translation is machine-generated. It cannot be guaranteed that it is intelligible, accurate, complete, reliable or fit for specific purposes. Critical decisions, such as commercially relevant or financial decisions, should not be based on machine-translation output.

DESCRIPTION JP2015154207
11-04-2019 1

Abstract: The present invention provides a sound processing device and a sound processing method capable of calculating a transfer function for a desired sound source direction according to the indoor environment, without using a measurement sound source. A sound collection position calculation unit calculates the sound collection positions of the acoustic signals of a plurality of channels based on those signals, and a sound source direction calculation unit calculates a sound source direction based on the acoustic signals of the plurality of channels. A first transfer function calculation unit calculates a first transfer function corresponding to the sound source direction based on the acoustic signals of the plurality of channels, and a second transfer function calculation unit calculates a second transfer function by interpolating the first transfer functions corresponding to each of a plurality of sound source directions. [Selected figure] Figure 1

Sound processing apparatus and sound processing method

[0001] The present invention relates to a sound processing device and a sound processing method.

[0002] A transfer function representing the transfer characteristic with which sound generated by a sound source propagates to a sound collection unit may be used to process the collected acoustic signal. The transfer function is used, for example, for sound quality correction (equalization and the like), dereverberation, noise suppression, and estimation of the sound source direction and sound source position.
Accordingly, various transfer function calculation methods have been proposed. For example, the acoustic system described in Patent Document 1 sequentially outputs, from a speaker, a plurality of predetermined band noise signals having different frequency bands, filters the detection noise signals detected by a microphone placed in the sound field of the speaker with a plurality of preset band-pass filters, and analyzes them for each frequency band. The acoustic system further inputs the band noise signals and the detection noise signals to a transfer function calculation unit, calculates the transfer function from the speaker to the microphone, and corrects the calculated transfer function according to the pass characteristics of the band filters.

[0003] In the sound system described in Patent Document 1, the positional relationship between the microphone and the sound source must be known, and a measurement sound source such as the above-mentioned detection noise signal must be used separately from the sound source to be listened to. Methods have therefore been proposed for estimating the positional relationship between a microphone and a sound source from the collected acoustic signal, without using a measurement sound source. For example, the sound source position estimation method described in Patent Document 2 calculates the time differences of the audio signals between channels, predicts the current sound source state information (information consisting of the sound source position and the microphone positions) from the past sound source state information, and updates the sound source state information so as to reduce the error between the calculated inter-channel time differences and the time differences derived from the sound source state information.

[0004] Japanese Patent No. 4482247; Japanese Unexamined Patent Application Publication No.
2012-161071

[0005] Although the transfer function can be estimated with a geometric model from the positional relationship between the microphone and the sound source estimated by the method of Patent Document 2, the transfer function cannot be estimated individually for different indoor environments. For example, the reverberation in a room varies with the size of the room, the reflection coefficient of the walls, and the presence, absence, and type of installed objects. In addition, since the transfer function depends on the positional relationship between the microphone and the sound source, the transfer function for a desired sound source direction cannot be obtained.

[0006] The present invention has been made in view of the above points, and provides a sound processing apparatus and a sound processing method capable of calculating a transfer function for a desired sound source direction according to the indoor environment, without using a measurement sound source.

[0007] (1) The present invention has been made to solve the above problems. One aspect of the present invention is a sound processing apparatus including: a sound collection position calculation unit that calculates the sound collection positions of acoustic signals of a plurality of channels based on those signals;
A sound source direction calculation unit that calculates a sound source direction based on the acoustic signals of the plurality of channels; a first transfer function calculation unit that calculates a first transfer function corresponding to the sound source direction based on the acoustic signals of the plurality of channels; and a second transfer function calculation unit that calculates a second transfer function by interpolating the first transfer functions corresponding to each of a plurality of sound source directions.

[0008] (2) Another aspect of the present invention is the sound processing apparatus described above, further including a time difference calculation unit that calculates the time differences of the audio signals between channels, wherein the sound collection position calculation unit includes: a first state prediction unit that predicts current sound source state information, which includes the sound collection positions, from past sound source state information; and a first state update unit that updates the current sound source state information so that the difference between the time differences calculated by the time difference calculation unit and the time differences based on the current sound source state information decreases.

[0009] (3) Another aspect of the present invention is the sound processing apparatus described above, wherein the time difference calculation unit calculates the time differences of the acoustic signals between channels whose sound collection positions lie within a predetermined range of each other.

[0010] (4) Another aspect of the present invention is the sound processing apparatus described above, wherein time difference information delayed by at least a predetermined delay time relative to the time difference information input to the sound collection position calculation unit is input to the sound source direction calculation unit, and the acoustic signal corresponding to time difference information delayed by at least that delay time is input to the first transfer function calculation unit.

[0011] (5) Another aspect of the present invention is the sound processing apparatus described above, wherein the sound source direction calculation unit includes: a second state prediction unit that predicts current sound source state information, which includes the sound source position, from past sound source state information; and a second state update unit that updates the current sound source state information so that the difference between the time differences calculated by the time difference calculation unit and the time differences based on the current sound source state information decreases.

[0012] (6) Another aspect of the present invention is the sound processing apparatus described above, wherein the second transfer function calculation unit performs the interpolation of the first transfer functions calculated by the first transfer function calculation unit with weights based on the update amounts of the sound source state information updated by the second state update unit.

[0013] (7) Another aspect of the present invention is the sound processing apparatus described above, further including a second sound source direction calculation unit that calculates a sound source direction based on the second transfer function calculated by the second transfer function calculation unit and the acoustic signals of the plurality of channels, wherein the second state update unit updates the current sound source state information so that the difference between the sound source direction calculated by the second sound source direction calculation unit and the sound source direction based on the current sound source state information decreases.
[0014] (8) Another aspect of the present invention is the sound processing apparatus described above, wherein the sound source direction calculation unit includes: a third transfer function calculation unit that calculates, for each sound source direction, a third transfer function indicating the phase change due to propagation to the sound collection positions calculated by the sound collection position calculation unit; and a first sound source direction determination unit that determines the sound source direction based on the third transfer functions calculated by the third transfer function calculation unit and the acoustic signals of the plurality of channels.

[0015] (9) Another aspect of the present invention is a sound processing method in a sound processing apparatus, including: a sound collection position calculation step of calculating the sound collection positions of acoustic signals of a plurality of channels based on those signals; a sound source direction calculation step of calculating a sound source direction based on the acoustic signals of the plurality of channels; a first transfer function calculation step of calculating a first transfer function corresponding to the sound source direction based on the acoustic signals of the plurality of channels; and a second transfer function calculation step of calculating a second transfer function by interpolating the first transfer functions corresponding to each of a plurality of sound source directions.

[0016] According to the configuration of (1) or (9) above, pairs of a sound source direction and a first transfer function are obtained from the collected acoustic signals, and a second transfer function for a desired sound source direction is calculated with reference to the first transfer functions of the obtained sound source directions.
Therefore, a transfer function for a desired direction can be calculated according to the indoor environment without using a measurement sound source.

[0017] According to the configuration of (2) above, since the sound collection positions can be calculated sequentially from the collected acoustic signals, the sound collection positions at each time can be obtained without using other measurement means.

[0018] According to the configuration of (3) above, since time differences are calculated only between close sound collection positions, fluctuation of the calculated time differences is suppressed. The sound collection position calculation unit can therefore estimate the sound source state information stably from the calculated time differences, and the sound collection positions can be calculated with high accuracy.

[0019] According to the configuration of (4) above, the processing performed by the sound collection position calculation unit and the processing performed by the sound source direction calculation unit and the first transfer function calculation unit can be carried out in parallel. The delay until the estimation error of the sound source state information converges in the sound collection position calculation unit therefore does not propagate to the sound source direction calculation unit and the first transfer function calculation unit, so the sound source direction and the first transfer function can be obtained more quickly.

[0020] According to the configuration of (5) above, since the sound source direction can be calculated sequentially from the collected acoustic signals, the sound source direction at each time can be obtained without using other measurement means.
[0021] According to the configuration of (6) above, the second transfer function is calculated by interpolating the first transfer functions for the sound source directions with weights based on the update amounts of the sound source state information used to calculate those sound source directions. Since the reliability of a sound source direction calculated by the sound source direction calculation unit depends on the update amount of the sound source state information, the reliability of the calculated second transfer function is improved.

[0022] According to the configuration of (7) above, since the sound source state information is updated based on the sound source direction, which is information different from the time differences, the risk of falling into a local solution is reduced compared with using either the time differences or the sound source direction alone. The sound collection positions can therefore be calculated with higher accuracy from the sound source state information.

[0023] According to the configuration of (8) above, the third transfer function can be calculated by a simple process, and the sound source direction can be determined from the phase change for each sound source direction at each sound collection position indicated by the calculated third transfer function. The processing amount can therefore be reduced without losing estimation accuracy of the sound source direction.

[0024] BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic block diagram showing the configuration of a sound processing system according to a first embodiment of the present invention.
FIG. 2 is a plan view showing an arrangement example of a sound source and sound collection units.
FIG. 3 is a diagram showing an example of setting adjacent channel pairs.
It is a figure showing the observation time of the sound respectively observed by each channel. It is a flowchart which shows the sound source state estimation process which concerns on the 1st Embodiment of this invention. It is a flowchart showing the 1st transfer function calculation process which concerns on the 1st Embodiment of this invention. It is a figure showing an example of the 1st transfer function data concerning a 1st embodiment of the present invention. It is a flow chart which shows interpolation processing concerning a 1st embodiment of the present invention. It is a figure which shows an example of a target sound source direction and a reference sound source direction. It is a flow chart which shows sound processing concerning a 1st embodiment of the present invention. It is a schematic block diagram which shows the structure of the sound processing system which concerns on the 2nd Embodiment of this invention. It is a flowchart which shows the sound processing which concerns on the 2nd Embodiment of this invention. It is a schematic block diagram which shows the structure of the sound processing system which concerns on the 3rd Embodiment of this invention. It is a figure showing an example of the 1st transfer function data concerning a 3rd embodiment of the present invention. It is a flow chart which shows sound processing concerning a 3rd embodiment of the present invention. It is a schematic block diagram which shows the structure of the sound processing system which concerns on the 4th Embodiment of this invention. It is a flowchart which shows the sound processing which concerns on the 4th Embodiment of this invention. It is a schematic block diagram which shows the structure of the sound processing system which concerns on the 5th Embodiment of this invention. It is a flowchart which shows the sound processing which concerns on the 5th Embodiment of this invention. 
FIG. 20 is a schematic block diagram showing the configuration of a sound processing system according to a sixth embodiment of the present invention.
FIG. 21 is a flowchart showing sound processing according to the sixth embodiment.
FIG. 22 is a plan view showing another arrangement example of a sound source and sound collection units.

[0025] First Embodiment: Hereinafter, a first embodiment of the present invention will be described with reference to the drawings. FIG. 1 is a schematic block diagram showing the configuration of the sound processing system 1 according to the present embodiment. The sound processing system 1 includes a sound processing apparatus 10 and N + 1 (N is an integer larger than 1) sound collection units 11-0 to 11-N. In the following description, each of the sound collection units 11-0 to 11-N, or all of them collectively, may be referred to simply as the sound collection unit 11. Each sound collection unit 11-n (n is an integer from 0 to N) is a microphone, and outputs its recorded acoustic signal to the sound processing apparatus 10. The sound collection unit 11 as a whole thus outputs an (N + 1)-channel acoustic signal, based on the sound that reaches it, to the sound processing apparatus 10.

[0026] The sound processing apparatus 10 includes a signal input unit 102, a peak detection unit 103, a time difference calculation unit 104, a sound collection position calculation unit 105, a sound source direction calculation unit 106, a first transfer function calculation unit 107, a transfer function storage unit 108, and a second transfer function calculation unit 109.

[0027] The signal input unit 102 receives an acoustic signal from each sound collection unit 11-n. In the following description, the acoustic signal input from the sound collection unit 11-n may be referred to as the acoustic signal of channel n.
The acoustic signal of each channel is a digital signal consisting of the signal values of a plurality of samples. The signal input unit 102 outputs the input (N + 1)-channel acoustic signal to the peak detection unit 103. The signal input unit 102 is, for example, a data input interface.

[0028] The peak detection unit 103 receives the (N + 1)-channel acoustic signal from the signal input unit 102. The peak detection unit 103 detects a peak (local maximum) of the signal value of any one channel (for example, channel 0), and extracts the acoustic signal of every channel in a section of predetermined length (for example, 30 ms) around the sample time at which the peak was detected. The extracted section is the same for all channels, and its length should be at least long enough for frequency analysis to be performed. By extracting the acoustic signal around the peak, the portion containing the target sound, such as a voice uttered by a person or a musical tone, is extracted, and portions consisting mainly of noise are excluded.

[0029] When detecting a peak, the peak detection unit 103 first smooths the signal by taking, for each sample of the acoustic signal, a moving average of the signal values over an averaging section of predetermined length. The smoothing removes the influence of noise mixed into the acoustic signal, impulses whose signal value changes suddenly, and the like. The peak detection unit 103 then differentiates the smoothed signal values between samples, and judges as a peak any point where the obtained differential value changes from a positive value to a negative value. The peak detection unit 103 outputs the extracted (N + 1)-channel acoustic signal to the time difference calculation unit 104 and the first transfer function calculation unit 107.
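The smoothing-and-differentiation peak test described in [0029] can be sketched as follows. This is a minimal illustration, assuming a simple moving average and a sign change of the discrete derivative; the window length and the toy signal are illustrative assumptions, not values from the text.

```python
import numpy as np

def detect_peaks(x, win=5):
    """Smooth a signal with a moving average, then report sample indices
    where the discrete derivative changes from positive to negative,
    i.e. local maxima of the smoothed signal."""
    kernel = np.ones(win) / win
    smoothed = np.convolve(x, kernel, mode="same")
    d = np.diff(smoothed)
    # peak at index i: slope into i is positive, slope out of i is negative
    return [i for i in range(1, len(d)) if d[i - 1] > 0 and d[i] < 0]

# toy signal with two bumps (hypothetical data)
signal = np.array([0.0, 1.0, 3.0, 7.0, 3.0, 1.0, 0.0, 2.0, 5.0, 2.0, 0.0])
print(detect_peaks(signal, win=3))
```

In an actual device the acoustic signal around each detected peak would then be cut out over the fixed extraction section for all channels, as [0028] describes.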
[0030] The time difference calculation unit 104 calculates, for the (N + 1)-channel acoustic signal input from the peak detection unit 103, a time difference for each set of two channels (channel pair). The calculated time difference indicates the difference between the times at which the sound wave from the sound source reaches the sound collection units 11 corresponding to the two channels. The time difference calculation unit 104 outputs time difference information indicating the time difference of each channel pair to the sound collection position calculation unit 105 and the sound source direction calculation unit 106. The process of calculating the time difference will be described later.

[0031] The sound collection position calculation unit 105 sequentially calculates the sound collection positions based on the time difference information input from the time difference calculation unit 104, using a SLAM (Simultaneous Localization And Mapping) method. The sound collection position calculation unit 105 predicts the sound source state information ξk|k−1 at the current time k from the past sound source state information ξk−1 (for example, at the previous time k−1), and updates the current sound source state information ξk based on the time difference information input from the time difference calculation unit 104.
The sound source state information ξk at each time k contains, for example, information indicating the sound source position (xk, yk), the position (sound collection position) (mn,x, mn,y) of each sound collection unit 11-n, and the observation time error mn,τ. When estimating the sound source state information, the sound collection position calculation unit 105 updates it so as to reduce the error between the time differences represented by the time difference information and the time differences derived from the predicted sound source state information. The sound collection positions are given by the updated sound source state information.

[0032] The sound collection position calculation unit 105 uses, for example, an Extended Kalman Filter (EKF) in predicting and updating the sound source state information; the prediction and update will be described later. The sound collection position calculation unit 105 outputs the updated sound source state information, which includes information indicating the estimated sound collection positions, to the sound source direction calculation unit 106. The configuration of the sound collection position calculation unit 105 will be described later.

[0033] The sound source direction calculation unit 106 calculates the sound source direction d based on the time difference information input from the time difference calculation unit 104 and the sound source state information input from the sound collection position calculation unit 105. When calculating the sound source direction d, the sound source direction calculation unit 106 predicts and updates sound source state information using a method similar to that of the sound collection position calculation unit 105.
The sound source direction calculation unit 106 uses the sound source state information input from the sound collection position calculation unit 105 as the initial value of its own sound source state information, and treats the position (mn,x, mn,y) and observation time error mn,τ of each channel n as fixed values. In other words, the sound source direction calculation unit 106 predicts and updates only the sound source position (xk, yk) as a variable.

[0034] In the SLAM method, the sound collection positions may be calculated more accurately than the sound source position. By fixing the sound collection positions calculated by the sound collection position calculation unit 105 to constant values, the sound source direction calculation unit 106 reduces the degrees of freedom and, by repeating the prediction and update of the sound source state information, improves the estimation accuracy of the sound source position. The sound source direction calculation unit 106 then calculates the sound source direction from the calculated sound source position and the sound collection positions calculated by the sound collection position calculation unit 105. The sound source direction may be, for example, the direction of the sound source relative to the center of gravity of the N + 1 sound collection units 11-0 to 11-N, or relative to the center of the circle on which the sound collection units are arranged. The sound source direction calculation unit 106 outputs sound source direction information indicating the calculated sound source direction to the first transfer function calculation unit 107. The configuration of the sound source direction calculation unit 106 will be described later.

[0035] The (N + 1)-channel acoustic signal is input from the peak detection unit 103 to the first transfer function calculation unit 107.
Of the input (N + 1)-channel acoustic signals, one predetermined channel is called the representative channel, and each of the other channels is called a target channel. In the following description, the representative channel is channel 0 and the target channels are channels 1 to N. The first transfer function calculation unit 107 calculates the transfer function A[d][n] of each target channel n based on the acoustic signal of that target channel and the acoustic signal of the representative channel 0. The calculated transfer function A[d][n] is referred to as the first transfer function A[d][n]. The process of calculating the first transfer function will be described later.

[0036] The first transfer function calculation unit 107 stores the sound source direction information input from the sound source direction calculation unit 106 and the first transfer function information indicating the calculated first transfer functions A[d][n] in the transfer function storage unit 108 in association with each other. By sequentially accumulating these associated pairs of sound source direction information and first transfer function information, first transfer function data are formed in the transfer function storage unit 108. An example of the first transfer function data will be described later.

[0037] The second transfer function calculation unit 109 refers to the first transfer function data stored in the transfer function storage unit 108 and, based on the sound source directions indicated by the stored sound source direction information and a target sound source direction, interpolates the first transfer functions corresponding to those sound source directions.
The interpolation yields a second transfer function corresponding to the target sound source direction. When interpolating the first transfer functions, the second transfer function calculation unit 109 uses, for example, the frequency-time-domain linear interpolation (FTDLI) method. A plurality of target sound source directions may be set in the second transfer function calculation unit 109 in advance, and a second transfer function may be calculated for each of them. The calculation of the second transfer function by the second transfer function calculation unit 109 will be described later.

[0038] (Example of Arrangement of Sound Source and Sound Collection Units) Next, an arrangement example of the sound source and the sound collection units 11-n will be described. FIG. 2 is a plan view showing an arrangement example of the sound source S and the sound collection units 11-n; the X direction points to the right of the drawing. In the example shown in FIG. 2, the sound source S and eight (N + 1 = 8) sound collection units 11-0 to 11-7 are placed in a room Rm. The sound collection units 11-0 to 11-7 are fixed at equal intervals on a circle of predetermined radius centered at the center C on the head of a robot (mobile body) Ro. The position of each sound collection unit 11-n therefore changes as the robot Ro moves and as its posture changes. Since the sound collection units 11-0 to 11-7 are placed at different positions with a fixed positional relationship to one another, they form a microphone array.

[0039] The sound source S is an entity that generates sound (for example, a person, an instrument, an audio device, or the like). The sound source direction d is the direction of the sound source S, measured from the X-axis direction, as seen from the center C of the positions of the eight sound collection units 11-0 to 11-7.
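The FTDLI interpolation named in [0037] is not detailed at this point in the text. As a rough stand-in, the sketch below blends two first transfer functions between neighboring directions by interpolating magnitudes linearly per frequency bin and phases on the unwrapped phase; this is an illustrative assumption, not the FTDLI algorithm itself, and the pure-delay example data are hypothetical.

```python
import numpy as np

def interpolate_transfer(A1, A2, w):
    """Interpolate two complex transfer functions A1, A2 (one coefficient
    per frequency bin) with weight w in [0, 1]. Magnitudes are blended
    linearly; phases are blended on the unwrapped phase so that a pure
    delay interpolates to an intermediate delay."""
    mag = (1 - w) * np.abs(A1) + w * np.abs(A2)
    ph = (1 - w) * np.unwrap(np.angle(A1)) + w * np.unwrap(np.angle(A2))
    return mag * np.exp(1j * ph)

# toy example: transfer functions that are pure delays of 2 and 4 samples,
# sampled on the rfft bins of a length-16 frame
freqs = np.arange(9) * 2 * np.pi / 16
A1 = np.exp(-1j * freqs * 2.0)
A2 = np.exp(-1j * freqs * 4.0)
Amid = interpolate_transfer(A1, A2, 0.5)
# the midpoint should behave like a 3-sample delay
print(np.allclose(Amid, np.exp(-1j * freqs * 3.0)))
```

A naive complex-valued linear interpolation would instead shrink the magnitude wherever the two phases disagree, which is why the magnitude and phase are treated separately here.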
A[d][0] to A[d][7] are the transfer functions of the sound collection units 11-0 to 11-7 for the sound source direction d, that is, the transfer functions from a sound source S placed in direction d to each of the sound collection units 11-0 to 11-7. The following description mainly takes as an example the transfer functions A[d][0] to A[d][7] of the sound collection units 11-0 to 11-7 for a sound source direction d in a two-dimensional plane.

[0040] In the following description, the position of each sound collection unit 11-n (n is an integer from 0 to N) may be referred to as a sound collection position, or the sound collection position of channel n. The position of a representative point (for example, the center of gravity) of the microphone array formed by the N + 1 sound collection units 11-0 to 11-N may be referred to as the position of the sound collection unit 11. Further, as described later, the transfer functions A[d][0] to A[d][7] are obtained for each frequency ω, but ω may be omitted in the drawings and in the following description.

[0041] (Time Difference Calculation Processing) Next, the time difference calculation processing by the time difference calculation unit 104 (FIG. 1) will be described. The time difference calculation unit 104 calculates a time difference for each channel pair of the (N + 1)-channel acoustic signal input from the peak detection unit 103. Out of all channel pairs, the time difference calculation unit 104 selects those whose sound collection positions are geometrically close to each other, and calculates a time difference for each selected pair of channels n and m (m is an integer from 0 to N, m ≠ n).
N · (N−1) / 2 is the total number of channel pairs when N + 1 sound collecting units 11-0 to 11 -N are cyclically arranged. In the time difference calculation unit 104, as a channel pair for which the time difference is to be calculated, a channel pair corresponding to each of the other sound collection units 11-m within a predetermined range from one sound collection unit 11-n is preset. Keep it. The channel pair consisting of the channels n and m is called a close channel pair. [0042] FIG. 3 is a diagram showing an example of setting of adjacent channel pairs. In the example shown in FIG. 3, a close channel pair in which one channel is the channel 0 corresponding to the sound collection unit 11-0 is a close channel consisting of a combination of the channel 0 and the channel 1 corresponding to the sound collection unit 11-1. A close channel pair chp07 is a pair of the pair chp01, the channel 0, and the channel 7 corresponding to the sound collection unit 11-7. A close channel pair in which one channel is the channel 5 corresponding to the sound collection unit 11-5 is a close channel pair chp 45 consisting of a combination of the channel 5 and the channel 4 corresponding to the sound collection unit 11-4, the channel 5 The adjacent channel pair chp 56 is a pair of the channel 6 corresponding to the sound unit 11-6. In the case where one channel is the other channel, adjacent channel pairs can be similarly defined. [0043] In all adjacent channel pairs set in the time difference calculation unit 104, channels 0 to N may be included in one channel forming one adjacent channel pair. For example, if adjacent channel pairs chp01, chp12, chp23, ch34, chp45, chp56, chp67, chp07, which are pairs of channel 0 and channel 1, forming adjacent sound pickup units are included. Good. [0044] Next, the time difference Δtmn, k between the channels m and n calculated for the adjacent channel pair chpmn will be described. FIG. 
4 is a diagram showing the observation times t_{m,k} and t_{n,k} of a sound observed in channels m and n, respectively. The horizontal axis indicates time. The time difference Δt_{mn,k} is the difference t_{n,k} − t_{m,k} between the observation time t_{n,k} and the observation time t_{m,k}. The observation time of each channel is the propagation time of the sound wave from the sound source S plus an observation time error. The observation time t_{m,k} is the time at which the sound wave is observed at the sound collection unit 11-m when the sound source S emits a sound at time T_k. That is, the observation time t_{m,k} is obtained by adding, to time T_k, the propagation time D_{m,k}/c of the sound wave from the sound source S to the sound collection unit 11-m and the observation time error m_{m,τ} of channel m. Here, D_{m,k} indicates the distance from the sound source S to the sound collection unit 11-m, and c indicates the speed of sound. Likewise, the observation time t_{n,k} is obtained by adding, to time T_k, the propagation time D_{n,k}/c of the sound wave from the sound source S to the sound collection unit 11-n and the observation time error m_{n,τ} of channel n. D_{n,k} indicates the distance from the sound source S to the sound collection unit 11-n. Therefore, the time difference Δt_{mn,k} is expressed by equation (1).

[0045]
Δt_{mn,k} = t_{n,k} − t_{m,k} = (D_{n,k} − D_{m,k}) / c + (m_{n,τ} − m_{m,τ}) … (1)

[0046] In equation (1), the distance D_{n,k} is a function of the sound source position (x_k, y_k) and the sound collection position (m_{n,x}, m_{n,y}) of channel n, as shown in equation (2).

[0047]
D_{n,k} = √((x_k − m_{n,x})² + (y_k − m_{n,y})²) … (2)

[0048] The distance D_{m,k} is given by substituting the sound collection position (m_{m,x}, m_{m,y}) of channel m for the sound collection position (m_{n,x}, m_{n,y}) of channel n in equation (2).
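The observation model of equations (1) and (2) can be sketched as follows. The function names and the speed-of-sound value are illustrative assumptions; the computation itself follows the equations directly.

```python
import math

# Sketch of equations (1) and (2): the inter-channel time difference is the
# difference in propagation times plus the difference in per-channel
# observation time errors. Names and the value of c are assumptions.

C = 343.0  # assumed speed of sound in m/s

def propagation_time(source_xy, mic_xy):
    """D_{n,k} / c of equation (2): Euclidean distance over the speed of sound."""
    dx = source_xy[0] - mic_xy[0]
    dy = source_xy[1] - mic_xy[1]
    return math.hypot(dx, dy) / C

def time_difference(source_xy, mic_m, mic_n, err_m, err_n):
    """Equation (1): Δt = (D_n - D_m)/c + (m_{n,τ} - m_{m,τ})."""
    return (propagation_time(source_xy, mic_n)
            - propagation_time(source_xy, mic_m)) + (err_n - err_m)

# A source at the origin, mic m at 1 m and mic n at 2 m, with zero clock errors:
dt = time_difference((0.0, 0.0), (1.0, 0.0), (2.0, 0.0), 0.0, 0.0)
```

Since mic n is one metre farther away, dt equals one metre of travel time, 1/c; with identical distances and a clock offset, only the error term remains.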
Therefore, the time difference Δt_{mn,k} is a function of the observation time errors m_{m,τ} and m_{n,τ} of channels m and n, the sound source position (x_k, y_k), and the sound collection positions (m_{m,x}, m_{m,y}) and (m_{n,x}, m_{n,y}) of channels m and n, that is, a function of the aforementioned sound source state information. The time difference calculation unit 104 generates an observation value vector ζ_k at time k whose elements are the time differences calculated for the respective channel pairs, and outputs the generated observation value vector ζ_k to the sound collection position calculation unit 105 as time difference information.

[0049] (Configuration of Sound Collection Position Calculation Unit) Referring back to FIG. 1, the configuration of the sound collection position calculation unit 105 will be described. The sound collection position calculation unit 105 calculates the sound collection positions from the time difference information input from the time difference calculation unit 104, using the SLAM method based on the extended Kalman filter (EKF). When calculating the sound collection positions, the sound collection position calculation unit 105 updates the sound source state information ξ_k at the current time k so as to reduce the error between the observation value vector ζ_k at the current time k and the observation value vector ζ_{k|k−1} calculated from the sound source state information ξ_{k|k−1} at the current time k predicted from the previous time k−1. The updated sound source state information ξ_k and the predicted sound source state information ξ_{k|k−1} both include the sound collection position (m_{n,x}, m_{n,y}) of each channel n at time k. The process of calculating the sound source state information ξ_{k|k−1} will be described later. The sound collection position calculation unit 105 includes a state update unit 1051, a state prediction unit 1052, a Kalman gain calculation unit 1054, and a convergence determination unit 1055.
[0050] The state update unit 1051 adds the observation error vector δ_k to the observation value vector ζ_k at the current time k indicated by the time difference information input from the time difference calculation unit 104, and replaces the observation value vector ζ_k with the resulting sum. The observation error vector δ_k is a random vector following a Gaussian distribution with mean 0 and a predetermined covariance. The matrix having these covariances as its elements is denoted the covariance matrix Q.

[0051] Based on the observation value vector ζ_k at the current time k indicated by the input time difference information, the state update unit 1051 updates the sound source state information ξ_k at the current time k using, for example, equation (3).

[0052]
ξ_k = ξ_{k|k−1} + K_k (ζ_k − ζ_{k|k−1}) … (3)

[0053] In equation (3), ξ_{k|k−1} indicates the sound source state information at the current time k predicted from the sound source state information at the previous time k−1, and K_k indicates the Kalman gain at the current time k. The observation value vector ζ_{k|k−1} indicates the observation value vector at the current time k predicted from the previous time k−1. That is, equation (3) shows that the sound source state information ξ_k at the current time k is calculated by adding, to the predicted sound source state information ξ_{k|k−1}, the product K_k (ζ_k − ζ_{k|k−1}) of the Kalman gain K_k and the prediction residual (ζ_k − ζ_{k|k−1}) of the observation value vector at the current time k. The product K_k (ζ_k − ζ_{k|k−1}) corresponds to the update amount of the sound source state information ξ_{k|k−1}.
The sound source state information ξ_{k|k−1} and the observation value vector ζ_{k|k−1} are input from the state prediction unit 1052, and the Kalman gain K_k is input from the Kalman gain calculation unit 1054. Based on the Kalman gain K_k, the matrix H_k, and the covariance matrix P_{k|k−1} at the current time k predicted from the covariance matrix P_{k−1} at the previous time k−1, the state update unit 1051 calculates the covariance matrix P_k at the current time k using, for example, equation (4).

[0054]
P_k = (I − K_k H_k) P_{k|k−1} … (4)

[0055] In equation (4), I represents the identity matrix. That is, equation (4) shows that the covariance matrix P_k at the current time k is calculated by multiplying the covariance matrix P_{k|k−1} by the matrix obtained by subtracting the product of the Kalman gain K_k and the matrix H_k from the identity matrix I. Since the covariance matrix P_k indicates the magnitude of the error of the sound source state information ξ_k, equation (4) updates P_{k|k−1} to P_k so as to reduce the magnitude of that error. The matrix H_k is input from the Kalman gain calculation unit 1054. The state update unit 1051 outputs the calculated covariance matrix P_k and sound source state information ξ_k at the current time k to the state prediction unit 1052, and also outputs the sound source state information ξ_k to the convergence determination unit 1055.

[0056] The sound source state information ξ_{k−1} and the covariance matrix P_{k−1} at the previous time k−1 are input to the state prediction unit 1052 from the state update unit 1051. The state prediction unit 1052 predicts the sound source state information ξ_{k|k−1} at the current time k from the sound source state information ξ_{k−1} at the previous time k−1, and predicts the covariance matrix P_{k|k−1} at the current time k from the covariance matrix P_{k−1} at the previous time k−1.
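The update of equations (3) and (4) can be sketched in generic EKF notation as follows; the dimensions in the example call are illustrative, not taken from the patent.

```python
import numpy as np

# Minimal sketch of the state update of equations (3) and (4):
#   ξ_k = ξ_{k|k-1} + K_k (ζ_k - ζ_{k|k-1})
#   P_k = (I - K_k H_k) P_{k|k-1}
# State/observation sizes below are illustrative.

def ekf_update(xi_pred, P_pred, K, H, zeta_obs, zeta_pred):
    xi = xi_pred + K @ (zeta_obs - zeta_pred)        # equation (3)
    P = (np.eye(len(xi_pred)) - K @ H) @ P_pred      # equation (4)
    return xi, P

xi_pred = np.zeros(3)
P_pred = np.eye(3)
K = np.full((3, 2), 0.5)      # toy Kalman gain
H = np.zeros((2, 3))          # toy observation Jacobian
xi_k, P_k = ekf_update(xi_pred, P_pred, K, H, np.ones(2), np.zeros(2))
```

With a unit prediction residual in both observation elements, each state element moves by the summed gain weights; a zero H leaves the covariance unchanged.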
[0057] Here, the state prediction unit 1052 calculates the sound source state information ξ_{k|k−1} at the current time k by adding, to the sound source position (x_{k−1}, y_{k−1}) indicated by the sound source state information ξ_{k−1} at the previous time k−1, the movement amount (Δx, Δy)^T obtained by adding the error vector ε_k, which represents the error of the movement amount, to the predetermined movement amount (Δx′, Δy′)^T at the current time k. Here, (…)^T indicates the transpose of a vector or matrix. The error vector ε_k is a random vector whose distribution follows a Gaussian distribution with mean 0. The matrix having the covariances characterizing this Gaussian distribution as its elements is denoted the covariance matrix R. Specifically, the state prediction unit 1052 calculates the sound source state information ξ_{k|k−1} at the current time k using equation (5).

[0058]
ξ_{k|k−1} = ξ_{k−1} + F_η^T ((Δx′, Δy′)^T + ε_k) … (5)

[0059] In equation (5), the matrix F_η is the 2-row, (3N+5)-column matrix represented by equation (6); when the sound source position occupies the first two elements of the state vector, it takes the form shown.

[0060]
F_η = [ I_2  0_{2×(3N+3)} ] … (6)

[0061] The movement amount (Δx′, Δy′)^T is given in accordance with a movement model of the sound source assumed in advance, for example a random walk model. Specifically, in the random walk model, a random vector whose distribution follows a Gaussian distribution with mean 0 and a predetermined variance is used as the movement amount (Δx, Δy)^T. In turn, the state prediction unit 1052 calculates the covariance matrix P_{k|k−1} at the current time k from the covariance matrix P_{k−1} at the previous time k−1 using, for example, equation (7).

[0062]
P_{k|k−1} = P_{k−1} + F_η^T R F_η … (7)

[0063] Equation (7) predicts the covariance matrix P_{k|k−1} at the current time k by adding the covariance matrix R, which represents the error distribution of the movement amount, to the covariance matrix P_{k−1} at the previous time k−1.
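The prediction step of equations (5) to (7) can be sketched as below, under the assumption (made only for illustration) that the sound source position occupies the first two state elements, so that F_η selects those components; all names and dimensions are illustrative.

```python
import numpy as np

# Sketch of equations (5)-(7) under a random-walk movement model.
# Assumption for illustration: the state stacks (x, y) first, followed by
# the per-channel parameters, so F_eta = [I_2 | 0].

def predict(xi_prev, P_prev, move_cov, rng):
    dim = len(xi_prev)
    F_eta = np.zeros((2, dim))
    F_eta[0, 0] = 1.0  # selects x_k
    F_eta[1, 1] = 1.0  # selects y_k
    eps = rng.multivariate_normal(np.zeros(2), move_cov)  # error vector ε_k
    move = np.zeros(2) + eps          # (Δx,Δy)^T = (Δx',Δy')^T + ε_k with (Δx',Δy') = (0,0)
    xi_pred = xi_prev + F_eta.T @ move                    # equation (5)
    P_pred = P_prev + F_eta.T @ move_cov @ F_eta          # equation (7)
    return xi_pred, P_pred

rng = np.random.default_rng(0)
xi_prev = np.zeros(5)   # e.g. N = 0: (x, y, m_{0,x}, m_{0,y}, m_{0,τ}), dim 3N+5 = 5
P_prev = np.eye(5)
R = 0.01 * np.eye(2)    # movement error covariance
xi_pred, P_pred = predict(xi_prev, P_prev, R, rng)
```

Only the position components of the state and covariance are perturbed; the microphone-related elements pass through unchanged.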
[0064] Further, based on the calculated sound source state information ξ_{k|k−1} at the current time k, the state prediction unit 1052 calculates the time difference for each channel pair given by equations (1) and (2), and generates the observation value vector ζ_{k|k−1} at time k whose elements are the calculated time differences. The state prediction unit 1052 outputs the calculated sound source state information ξ_{k|k−1}, covariance matrix P_{k|k−1}, and observation value vector ζ_{k|k−1} at time k to the state update unit 1051 and the Kalman gain calculation unit 1054.

[0065] The Kalman gain calculation unit 1054 calculates the Kalman gain K_k based on the above-described covariance matrix Q and the sound source state information ξ_{k|k−1} and covariance matrix P_{k|k−1} at time k input from the state prediction unit 1052, using, for example, equation (8).

[0066]
K_k = P_{k|k−1} H_k^T (H_k P_{k|k−1} H_k^T + Q)^{−1} … (8)

[0067] In equation (8), (…)^{−1} indicates the inverse of a matrix. The matrix H_k is the Jacobian obtained by partially differentiating each element of the observation function vector h(ξ_{k|k−1}) with respect to each element of the sound source state information ξ_{k|k−1}, as represented by equation (9).

[0068]
H_k = ∂h(ξ) / ∂ξ |_{ξ = ξ_{k|k−1}} … (9)

[0069] In equation (9), the observation function vector h(ξ_k) indicates the observation value vector ζ_k calculated based on the sound source state information ξ_k. That is, the matrix H_k is calculated by partially differentiating each element (see equation (1)) of the observation value vector ζ_{k|k−1} input from the state update unit 1051. The Kalman gain calculation unit 1054 outputs the calculated Kalman gain K_k and matrix H_k to the state update unit 1051.

[0070] The convergence determination unit 1055 determines whether or not the estimation error of the sound source state information ξ_k input from the state update unit 1051 has converged.
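Equations (8) and (9) can be sketched as follows, with the Jacobian of equation (9) formed by numerical differentiation for brevity; the quadratic observation function here is a stand-in, not the patent's time-difference model.

```python
import numpy as np

# Sketch of equations (8) and (9): H_k = ∂h/∂ξ (here by forward differences),
# K_k = P_{k|k-1} H_k^T (H_k P_{k|k-1} H_k^T + Q)^{-1}.
# The observation function h below is an illustrative stand-in.

def numerical_jacobian(h, xi, eps=1e-6):
    z0 = np.asarray(h(xi), dtype=float)
    H = np.zeros((len(z0), len(xi)))
    for j in range(len(xi)):
        d = np.zeros(len(xi))
        d[j] = eps
        H[:, j] = (np.asarray(h(xi + d), dtype=float) - z0) / eps
    return H

def kalman_gain(P_pred, H, Q):
    S = H @ P_pred @ H.T + Q          # innovation covariance
    return P_pred @ H.T @ np.linalg.inv(S)

h = lambda xi: np.array([xi[0] ** 2 + xi[1]])   # stand-in observation function
xi = np.array([1.0, 2.0])
H = numerical_jacobian(h, xi)   # analytic Jacobian at (1, 2) is [2, 1]
K = kalman_gain(np.eye(2), H, np.array([[1.0]]))
```

With unit prior covariance and unit observation noise, the gain reduces to H^T / (H H^T + 1), so larger-derivative state elements receive larger corrections.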
When it is determined that convergence has occurred, the convergence determination unit 1055 outputs the sound source state information ξ_k to the sound source direction calculation unit 106. The convergence determination unit 1055 calculates, for example, the average distance Δξm between the sound collection positions (m_{n,x}, m_{n,y}) indicated by the sound source state information ξ_{k−1} at the previous time k−1 and the sound collection positions (m_{n,x}, m_{n,y}) indicated by the sound source state information ξ_k at the current time k. The convergence determination unit 1055 determines that convergence has occurred when the calculated average distance Δξm becomes smaller than a preset threshold, and determines that convergence has not occurred otherwise.

[0071] (Sound Source State Estimation Process) Next, the sound source state estimation process performed when the sound collection position calculation unit 105 calculates the sound collection positions will be described. FIG. 5 is a flowchart showing the sound source state estimation process according to the present embodiment. (Step S101) The state prediction unit 1052 sets initial values of the sound source state information ξ_{k−1} and the covariance matrix P_{k−1}. Thereafter, the process proceeds to step S102. (Step S102) The state prediction unit 1052 predicts the sound source state information ξ_{k|k−1} at the current time k by adding the movement amount (Δx, Δy)^T, to which the error vector ε_k has been added, to the sound source position (x_{k−1}, y_{k−1}) indicated by the sound source state information ξ_{k−1} at the previous time k−1 (equation (5)). The state prediction unit 1052 predicts the covariance matrix P_{k|k−1} at the current time k by adding the covariance matrix R, which represents the error distribution of the movement amount, to the covariance matrix P_{k−1} at the previous time k−1 (equation (7)). Thereafter, the process proceeds to step S103.
[0072] (Step S103) The Kalman gain calculation unit 1054 calculates the Kalman gain K_k based on the covariance matrix Q indicating the distribution of the observation error and the predicted sound source state information ξ_{k|k−1} and covariance matrix P_{k|k−1} at the current time k (equation (8)). Thereafter, the process proceeds to step S104. (Step S104) The state update unit 1051 calculates the sound source state information ξ_k at the current time k by adding, to the predicted sound source state information ξ_{k|k−1} at the current time k, the product of the Kalman gain K_k and the prediction residual (ζ_k − ζ_{k|k−1}) of the observation value vector at the current time k (equation (3)). The state update unit 1051 calculates the covariance matrix P_k at the current time k by multiplying the covariance matrix P_{k|k−1} by the matrix obtained by subtracting the product of the Kalman gain K_k and the matrix H_k from the identity matrix I (equation (4)). Thereafter, the process proceeds to step S105.

[0073] (Step S105) The convergence determination unit 1055 determines whether or not the estimation error of the sound source state information ξ_k has converged. When the convergence determination unit 1055 determines that convergence has occurred (YES in step S105), it outputs the sound source state information ξ_k to the sound source direction calculation unit 106 and ends the process illustrated in FIG. 5. When the convergence determination unit 1055 determines that convergence has not occurred (NO in step S105), the current time k is updated to the previous time k−1, and the process proceeds to step S102.

[0074] (Configuration of Sound Source Direction Calculation Unit) Referring back to FIG. 1, the configuration of the sound source direction calculation unit 106 will be described, mainly in comparison with the sound collection position calculation unit 105.
Similarly to the sound collection position calculation unit 105, the sound source direction calculation unit 106 performs estimation using the SLAM method based on the EKF from the time difference information input from the time difference calculation unit 104; that is, it performs the same process as the sound source state estimation process of FIG. 5. By this process, the sound source direction calculation unit 106 updates the sound source state information ξ_k at the current time k so as to reduce the error between the observation value vector ζ_k at the current time k and the observation value vector ζ_{k|k−1} calculated from the sound source state information ξ_{k|k−1} at the current time k predicted from the previous time k−1. The updated sound source state information ξ_k and the predicted sound source state information ξ_{k|k−1} include the sound source position (x_k, y_k) at time k.

[0075] The sound source direction calculation unit 106 includes a state update unit 1061, a state prediction unit 1062, a Kalman gain calculation unit 1064, and a convergence determination unit 1065. The state update unit 1061, the state prediction unit 1062, the Kalman gain calculation unit 1064, and the convergence determination unit 1065 perform the same processes as the state update unit 1051, the state prediction unit 1052, the Kalman gain calculation unit 1054, and the convergence determination unit 1055 of the sound collection position calculation unit 105, respectively.

[0076] However, the state prediction unit 1062 starts the process of calculating the sound source state information ξ_{k|k−1} with the sound source state information ξ_k input from the sound collection position calculation unit 105 as the initial value.
Further, the state update unit 1061 and the state prediction unit 1062 calculate the sound source state information ξ_{k|k−1} and ξ_k while treating the sound collection position (m_{n,x}, m_{n,y}) and the observation time error m_{n,τ} of each channel n included therein as constant values, and treating the remaining elements, the sound source position (x_k, y_k), as variable values. Accordingly, in calculating the covariance matrices P_{k|k−1} and P_k, the Kalman gain K_k, the matrix H_k, and the other matrices, the state update unit 1061, the state prediction unit 1062, and the Kalman gain calculation unit 1064 process only the matrix elements related to the sound source position (x_k, y_k).

[0077] The convergence determination unit 1065 determines whether or not the estimation error of the sound source state information ξ_k input from the state update unit 1061 has converged. When it is determined that convergence has occurred, the convergence determination unit 1065 calculates the sound source direction d based on the sound source position (x_k, y_k) indicated by the sound source state information ξ_k, and outputs sound source direction information indicating the calculated sound source direction d to the first transfer function calculation unit 107. The convergence determination unit 1065 calculates, for example, the average distance Δξs between the sound source position (x_{k−1}, y_{k−1}) indicated by the sound source state information ξ_{k−1} at the previous time k−1 and the sound source position (x_k, y_k) indicated by the sound source state information ξ_k at the current time k. The convergence determination unit 1065 determines that convergence has occurred when the calculated average distance Δξs becomes smaller than a preset threshold, and determines that convergence has not occurred otherwise.
[0078] (Calculation Process of First Transfer Function) As described above, the first transfer function calculation unit 107 calculates the transfer function A[d][n] of each target channel n from the acoustic signal of the target channel n and the acoustic signal of the representative channel 0 based on a regression model. In the regression model, it is assumed that the observation values formed from the acoustic signals of the representative channel 0 and the target channel n are given by the convolution of a regressor, formed from the acoustic signal of the representative channel 0, with the transfer function as the regression parameter, and that the transfer function is constant within a predetermined observation time. The transfer function is then calculated by removing the contribution of the regressor from the observation values. Thereby, the first transfer function calculation unit 107 can calculate the transfer function of each target channel n based on the acoustic signals recorded by the sound collection units 11-0 to 11-N, without using a sound source signal for measurement.

[0079] Next, the process in which the first transfer function calculation unit 107 calculates the first transfer function (first transfer function calculation process) will be described. FIG. 6 is a flowchart showing the first transfer function calculation process according to the present embodiment. (Step S201) The first transfer function calculation unit 107 delays the acoustic signal of each target channel n by a predetermined delay time T. The purpose of the delay time T is to ensure that the acoustic signal of each target channel n is delayed relative to the acoustic signal of the representative channel 0, regardless of the positional relationship between the sound source and each sound collection unit 11. For example, when the N+1 sound collection units 11-0 to 11-N are disposed on a common circumference (FIG.
2), the delay time T should be at least longer than the time taken for a sound wave to travel a distance corresponding to the diameter of that circumference. Thereafter, the process proceeds to step S202.

[0080] (Step S202) The first transfer function calculation unit 107 converts the acoustic signal x0 of the representative channel and the acoustic signal xn of each target channel into the frequency domain for each frame consisting of a predetermined number L of samples, and calculates the transform coefficients X0(ω) and Xn(ω). Here, ω indicates frequency. Then, the first transfer function calculation unit 107 aggregates the transform coefficients X0(ω) and Xn(ω) calculated for each frame over F frames, where F is a predetermined number of frames, for example, eight. In the following description, the transform coefficients calculated in frame f are represented as X0,f(ω) and Xn,f(ω), respectively. Thereafter, the process proceeds to step S203.

[0081] (Step S203) The first transfer function calculation unit 107 generates a regressor (vector) Φ whose elements are the transform coefficients X0,f(ω) of the representative channel for the F frames. The regressor Φ is [X0,1(ω), X0,2(ω), ..., X0,F(ω)]^T. The first transfer function calculation unit 107 also generates an observation value (matrix) X whose elements are the transform coefficients of the N+1 channels for the F frames. The observation value X is a matrix having the transform coefficient vectors X0, X1, ..., XN for the N+1 channels as elements; specifically, X is [X0(ω), X1(ω), ..., XN(ω)]^T. The transform coefficient vector Xn(ω) of each channel n is [Xn,1(ω), Xn,2(ω), ..., Xn,F(ω)]^T. Thereafter, the process proceeds to step S204.

[0082] (Step S204) Using equation (10), the first transfer function calculation unit 107 calculates, from the constructed observation value X and the regressor Φ, the transfer functions A[d][0], A[d][1], ...,
A[d][N] of the respective channels.

[0083]
A(ω) = X Φ (Φ^T Φ)^{−1} … (10)

[0084] In equation (10), A(ω) is the transfer function vector whose elements are the transfer functions A[d][n] of the respective channels; that is, A^T(ω) is [A[d][0], A[d][1], ..., A[d][N]]. Further, Φ (Φ^T Φ)^{−1} corresponds to the pseudo-inverse of the regressor Φ, a one-column matrix. That is, equation (10) indicates that the observation value X is approximately divided by the regressor Φ to calculate the transfer function vector A(ω). Thereafter, the process proceeds to step S205.

[0085] (Step S205) The first transfer function calculation unit 107 extracts the transfer functions A[d][1](ω), ..., A[d][N](ω) of the target channels from the calculated transfer function vector as the first transfer functions. The first transfer function calculation unit 107 may, however, ignore the transfer function A[d][0](ω) of the representative channel: since the acoustic signal of the representative channel is used as the regressor Φ, the transfer function A[d][0](ω) does not take a significant value. The first transfer function calculation unit 107 stores, in the transfer function storage unit 108, the sound source direction information indicating the sound source direction d input from the sound source direction calculation unit 106 and the first transfer function information indicating the calculated first transfer functions A[d][n](ω) in association with each other. Thereafter, the process shown in FIG. 6 is ended.

[0086] (Example of First Transfer Function Data) Next, an example of the first transfer function data stored in the transfer function storage unit 108 will be described. FIG. 7 is a diagram showing an example of first transfer function data according to the present embodiment. In the example shown in FIG.
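The least-squares "division" of equation (10) can be sketched on synthetic data as follows. All values are illustrative, and the complex conjugation of Φ in the normal equation is an assumption needed for complex-valued least squares, not stated in the text.

```python
import numpy as np

# Sketch of equation (10): with the representative-channel coefficients as
# the regressor Φ and the stacked per-channel coefficients as the
# observation X, the per-channel transfer functions follow from least
# squares. Synthetic data; conjugation of Φ is an assumption.

rng = np.random.default_rng(1)
F = 8                                                     # frames aggregated
true_A = np.array([1.0 + 0.0j, 0.24 + 0.35j, 0.44 - 0.08j])  # channels 0..2

Phi = rng.standard_normal(F) + 1j * rng.standard_normal(F)   # regressor Φ
X = np.outer(true_A, Phi)            # observation X: row n is A[d][n] · Φ^T

# A = X Φ* (Φ^T Φ*)^{-1}: approximate division of X by Φ
A_hat = X @ Phi.conj() / (Phi @ Phi.conj())
```

Because each row of X is an exact scalar multiple of Φ here, the recovered coefficients match the true transfer functions; with noisy observations the same formula gives the least-squares fit.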
7, the sound source direction d indicated by the sound source direction information is associated with the transfer functions A[d][1](ω), A[d][2](ω), ..., A[d][7](ω) of channels 1 to 7 indicated by the first transfer function information. For example, as shown in the second row of FIG. 7, the sound source direction 13° is associated with the transfer function 0.24+0.35j of channel 1 (j is the imaginary unit), the transfer function 0.44−0.08j of channel 2, the transfer function 0.40+0.29j of channel 3, the transfer function 0.18+0.51j of channel 4, the transfer function −0.37+0.32j of channel 5, the transfer function −0.14+0.48j of channel 6, and the transfer function 0.15+0.29j of channel 7. Since the conversion of the acoustic signal of each channel into the frequency domain is performed for each frame of a predetermined number L of samples, the transfer function of each channel is actually given for each of L/2 frequencies ω for each sound source direction d. For simplicity, however, only one of the L/2 values is illustrated in FIG. 7.

[0087] The sound source directions corresponding to the first transfer functions may be irregularly spaced between rows. For example, the sound source directions shown in the first column of the first, second, and third rows in FIG. 7 are 13°, 29°, and 35°, respectively. This irregular arrangement arises because the first transfer function calculation unit 107 stores the sound source direction information indicating each newly calculated sound source direction in the transfer function storage unit 108 as it is obtained. The first transfer function calculation unit 107 may therefore rearrange the sets of sound source direction information and first transfer function information so that the sound source directions indicated by the sound source direction information are in ascending or descending order.
As a result, the second transfer function calculation unit 109 can efficiently search for the sound source direction information to be referred to. Also, when the sound source directions indicated by the sound source direction information stored in the transfer function storage unit 108 include a sound source direction equal to, or within a predetermined range of, the sound source direction d newly calculated by the sound source direction calculation unit 106, the first transfer function calculation unit 107 may replace the first transfer function information stored in association with that sound source direction information with the newly generated first transfer function information.

[0088] (Calculation Process of Second Transfer Function) The second transfer function calculation unit 109 specifies the sound source direction information to be referred to from the first transfer function data stored in the transfer function storage unit 108, based on the target sound source direction. In the following description, a sound source direction to be referred to is called a reference sound source direction, and information indicating a reference sound source direction is called reference sound source direction information. The second transfer function calculation unit 109 calculates the second transfer function corresponding to the target sound source direction by interpolating the first transfer functions corresponding to the specified reference sound source directions using the FTDLI method. The FTDLI method interpolates the phase and the amplitude of the first transfer functions in the respective reference directions based on the target sound source direction, and constructs the second transfer function from the phase and amplitude obtained by the interpolation.
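One way the reference sound source directions of step S301 could be specified from the sorted stored directions is a binary search with the 360° wraparound described in paragraph [0105]; the function below is an illustrative sketch under that assumption, not the patented implementation.

```python
import bisect

# Sketch: locate the two adjacent stored directions d1, d2 that sandwich the
# target direction d, wrapping around 360° when d lies outside the stored
# range (cf. paragraph [0105]). Illustrative function, sorted input assumed.

def reference_directions(stored_sorted, d):
    i = bisect.bisect_left(stored_sorted, d)
    if i == 0 or i == len(stored_sorted):
        # wrap: the pair straddling 0°/360°, with d2 shifted by one rotation
        return stored_sorted[-1], stored_sorted[0] + 360.0
    return stored_sorted[i - 1], stored_sorted[i]

stored = [13.0, 29.0, 95.0, 170.0, 250.0, 300.0]
d1, d2 = reference_directions(stored, 50.0)    # interior target
w1, w2 = reference_directions(stored, 340.0)   # target beyond the last entry
```

An interior target selects its immediate neighbours, while a target past the largest stored direction pairs the last entry with the first entry plus 360°.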
[0089] Specifically, the second transfer function calculation unit 109 executes the interpolation process described below. FIG. 8 is a flowchart showing the interpolation process according to the present embodiment. (Step S301) The second transfer function calculation unit 109 specifies, as the reference sound source direction information, the sound source direction information indicating the two mutually adjacent sound source directions d1 and d2 that sandwich the target sound source direction d (see FIG. 9). In FIG. 9, the sound source directions d1 and d2 indicate the directions of the sound sources S1 and S2, respectively. Thereafter, the process proceeds to step S302. (Step S302) The second transfer function calculation unit 109 reads, from the transfer function storage unit 108, the first transfer function information corresponding to each piece of the specified reference sound source direction information. Thereafter, the process proceeds to step S303.

[0090] (Step S303) The second transfer function calculation unit 109 calculates (interpolates) a transfer function Am[F](ω) from the first transfer functions A[d1][n](ω) and A[d2][n](ω) represented by the read first transfer function information, using the frequency domain linear interpolation (FDLI) method. When calculating the transfer function Am[F](ω), the second transfer function calculation unit 109 uses equation (11).

[0091]
Am[F](ω) = (δ1 A[d1][n](ω) + δ2 A[d2][n](ω)) / (δ1 + δ2) … (11)

[0092] In equation (11), δ1 and δ2 indicate interpolation coefficients. The interpolation coefficients δ1 and δ2 indicate the degrees of contribution of the first transfer functions A[d1][n](ω) and A[d2][n](ω) corresponding to the reference sound source directions d1 and d2, respectively.
The interpolation coefficient δ1 is the ratio |(d2−d1)/(d−d1)| of the angle (d2−d1) between the reference sound source directions to the angle (d−d1) between the reference sound source direction d1 and the target sound source direction d. The interpolation coefficient δ2 is the ratio |(d2−d1)/(d2−d)| of the angle (d2−d1) between the reference sound source directions to the angle (d2−d) between the reference sound source direction d2 and the target sound source direction d. That is, the transfer function Am[F] is the arithmetic mean of the first transfer functions A[d1][n](ω) and A[d2][n](ω) corresponding to the two reference sound source directions d1 and d2, with the reciprocals of the internal division ratios determined by the target sound source direction d as the weighting coefficients. The interpolation coefficients are chosen such that the degree of contribution decreases as a reference sound source direction moves away from the target sound source direction d. Thereafter, the process proceeds to step S304.

[0093] (Step S304) The second transfer function calculation unit 109 calculates (interpolates) a transfer function Am[T](ω) from the first transfer functions A[d1][n](ω) and A[d2][n](ω) represented by the read first transfer function information, using the time domain linear interpolation (TDLI) method. When calculating the transfer function Am[T](ω), the second transfer function calculation unit 109 uses equation (12).

[0094]
Am[T](ω) = A[d1][n](ω)^((d2−d)/(d2−d1)) · A[d2][n](ω)^((d−d1)/(d2−d1)) … (12)

[0095] That is, the transfer function Am[T] is the geometric mean of the first transfer functions A[d1][n](ω) and A[d2][n](ω) corresponding to the two reference sound source directions d1 and d2, with the internal division ratios determined by the target sound source direction d as the weighting factors. Thereafter, the process proceeds to step S305.
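Equations (11) and (12) can be sketched as follows. The normalisation of δ1 and δ2 in the FDLI mean, and the weighting of the TDLI geometric mean, follow the description above but are reconstructions; treat them as assumptions.

```python
# Sketch of steps S303 and S304: FDLI as a distance-weighted arithmetic
# mean (equation (11)) and TDLI as a correspondingly weighted geometric
# mean (equation (12)). Normalisation by (δ1 + δ2) is an assumption.

def fdli(A1, A2, d1, d2, d):
    delta1 = abs((d2 - d1) / (d - d1))   # contribution of A[d1][n]
    delta2 = abs((d2 - d1) / (d2 - d))   # contribution of A[d2][n]
    return (delta1 * A1 + delta2 * A2) / (delta1 + delta2)

def tdli(A1, A2, d1, d2, d):
    w2 = (d - d1) / (d2 - d1)            # internal division ratio toward d2
    w1 = 1.0 - w2
    return (A1 ** w1) * (A2 ** w2)       # weighted geometric mean

v_f = fdli(2 + 0j, 8 + 0j, 10.0, 30.0, 20.0)   # midway between 2 and 8
v_t = tdli(2 + 0j, 8 + 0j, 10.0, 30.0, 20.0)
v_q = fdli(2 + 0j, 8 + 0j, 10.0, 30.0, 15.0)   # a quarter of the way
```

At the midpoint the FDLI result is the arithmetic mean (5) and the TDLI result the geometric mean (4); off-centre, the inverse-distance weights reduce to ordinary linear interpolation.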
[0096] (Step S305) The second transfer function calculation unit 109 decomposes the calculated transfer function Am[F](ω) into an amplitude (absolute value) λm[F] and a phase tm[F], and decomposes the transfer function Am[T](ω) into an amplitude (absolute value) λm[T] and a phase tm[T]. The transfer function Am[F](ω), the amplitude λm[F] and the phase tm[F] have the relationship shown in equation (13). [0097] Am[F](ω) = λm[F]·exp(j·tm[F]) … (13) [0098] The transfer function Am[T](ω), the amplitude λm[T] and the phase tm[T] have the relationship shown in equation (14). [0099] Am[T](ω) = λm[T]·exp(j·tm[T]) … (14) [0100] Thereafter, the process proceeds to step S306. [0101] (Step S306) As shown in equation (15), the second transfer function calculation unit 109 multiplies the amplitude λm[T] obtained by the TDLI method by the phase term exp(j·tm[F]) obtained by the FDLI method to calculate the second transfer function A[d][n](ω) corresponding to the target sound source direction d. [0102] A[d][n](ω) = λm[T]·exp(j·tm[F]) … (15) [0103] Thereafter, the process shown in FIG. 8 is ended. The magnitude of the amplitude for the target sound source direction according to the TDLI method falls between the magnitudes of the amplitudes for the two reference sound source directions, whereas the value of the phase for the target sound source direction according to the TDLI method does not necessarily fall between the values of the phases for the two reference sound source directions. Conversely, the magnitude of the amplitude for the target sound source direction according to the FDLI method does not necessarily fall between the magnitudes of the amplitudes for the two reference sound source directions, whereas the value of the phase for the target sound source direction according to the FDLI method does fall between the values of the phases for the two reference sound source directions. In the FTDLI method, a second transfer function is therefore constructed from the amplitude obtained by the TDLI method and the phase obtained by the FDLI method.
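As a concrete illustration, the FTDLI procedure of steps S303 to S306 can be sketched in Python as follows. The function name, the normalized weighted-mean form assumed for equation (11), and the weighted geometric-mean form assumed for equation (12) are choices made for this sketch (the equation images are not reproduced in this translation); only the combination rule of equation (15), amplitude from TDLI and phase from FDLI, is taken directly from the text.

```python
import numpy as np

def ftdli_interpolate(A_d1, A_d2, d, d1, d2):
    """Sketch of FTDLI interpolation between two measured transfer
    functions A_d1, A_d2 (complex spectra) at reference directions
    d1 < d < d2 for target direction d."""
    # Interpolation coefficients: reciprocals of the internal division ratio
    delta1 = abs((d2 - d1) / (d - d1))
    delta2 = abs((d2 - d1) / (d2 - d))
    s = delta1 + delta2
    # FDLI: weighted arithmetic mean in the frequency domain (assumed form)
    A_F = (delta1 * A_d1 + delta2 * A_d2) / s
    # TDLI: weighted geometric mean (assumed form)
    A_T = A_d1 ** (delta1 / s) * A_d2 ** (delta2 / s)
    # FTDLI: amplitude from TDLI, phase from FDLI, as in equation (15)
    return np.abs(A_T) * np.exp(1j * np.angle(A_F))
```

For a target direction midway between the references, both coefficients are equal and the amplitude reduces to the plain geometric mean of the two reference amplitudes.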
The amplitude and the phase value of the constructed second transfer function thus both fall between those for the two reference sound source directions. Therefore, the interpolation characteristics can be improved by using the FTDLI method. [0104] As described above, since the sound source directions stored in the transfer function storage unit 108 are obtained at irregular intervals, the distribution of the sound source directions may be biased toward a narrow range. Therefore, the second transfer function calculation unit 109 may perform the interpolation processing of the second transfer function only when sound source direction information and first transfer function information relating to at least one sound source direction are stored for each of the divided regions of 360/Md° obtained by equally dividing the angle of one rotation (360°) by a predetermined division number Md. The division number Md is at least 3, and preferably 6 or more. Thus, the second transfer function calculation unit 109 can determine that the sound source directions stored in the transfer function storage unit 108 as candidates for the reference sound source direction are distributed over all directions. Since the second transfer function calculation unit performs the interpolation processing of the second transfer function after this determination, the accuracy of the calculated second transfer function can be secured. [0105] If a reference sound source direction d2 having a larger value than the target sound source direction d cannot be found in step S301 described above, the second transfer function calculation unit 109 may execute a process of specifying the reference sound source direction d2 from a sound source direction obtained by adding the angle of one rotation (360°) to a sound source direction obtained by referring to the transfer function storage unit 108.
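The coverage check described in paragraph [0104] can be sketched as below; the function name and the representation of the stored directions as a plain list of angles in degrees are illustrative assumptions, not part of the patent.

```python
def directions_cover_all_sectors(directions_deg, Md=6):
    """Return True if every 360/Md-degree divided region contains at
    least one stored sound source direction, i.e. candidate reference
    directions are available all around (Md is at least 3,
    preferably 6 or more)."""
    sector_width = 360.0 / Md
    occupied = {int((d % 360.0) // sector_width) for d in directions_deg}
    return len(occupied) == Md
```

Interpolation of the second transfer function would then be attempted only after this check passes.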
Similarly, if a reference sound source direction d1 having a smaller value than the target sound source direction d cannot be found, the second transfer function calculation unit 109 may execute a process of specifying the reference sound source direction d1 from a sound source direction obtained by subtracting the angle of one rotation (360°) from a sound source direction obtained by referring to the transfer function storage unit 108. Then, in step S303, the second transfer function calculation unit 109 calculates the interpolation coefficients δ2 and δ1 based on the specified reference sound source direction d2 or d1. Thus, even when the two reference sound source directions sandwiching the target sound source direction d cross 0° (that is, when there is a phase change of 360°), appropriate reference sound source directions can be determined. [0106] (Sound Processing) Next, sound processing according to the present embodiment will be described. FIG. 10 is a flowchart showing acoustic processing according to the present embodiment. (Step S401) The peak detection unit 103 detects a peak of the signal value indicated by the acoustic signal of any channel input from the signal input unit, and extracts, for each channel, the acoustic signal within a predetermined time from the sample time at which the peak is detected. Thereafter, the process proceeds to step S402. [0107] (Step S402) The time difference calculation unit 104 calculates a time difference for each channel pair for the extracted acoustic signals of the N+1 channels, and generates time difference information indicating the calculated time difference for each channel pair. Thereafter, the process proceeds to step S403. (Step S403) The sound collection position calculation unit 105 calculates the sound collection position based on the time difference information. Thereafter, the process proceeds to step S404.
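The selection of the two reference directions, including the 0°/360° wrap-around handling of paragraph [0105], can be sketched as follows; the function name and the list-based storage of directions are illustrative assumptions.

```python
def pick_reference_directions(stored_deg, d):
    """Pick the two stored directions d1 <= d <= d2 that sandwich the
    target direction d, wrapping across 0 deg / 360 deg as described
    in [0105].  Returned values may lie outside [0, 360) so that
    d1 <= d <= d2 always holds for the interpolation-coefficient
    formulas."""
    stored = sorted(x % 360.0 for x in stored_deg)
    d = d % 360.0
    lower = [x for x in stored if x <= d]
    upper = [x for x in stored if x >= d]
    # If no smaller direction exists, borrow the largest one minus 360 deg
    d1 = lower[-1] if lower else stored[-1] - 360.0
    # If no larger direction exists, borrow the smallest one plus 360 deg
    d2 = upper[0] if upper else stored[0] + 360.0
    return d1, d2
```

For stored directions 10° and 350° and a target of 0°, the pair (−10°, 10°) is returned, so the interpolation formulas see a continuous angle axis across the wrap-around.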
[0108] (Step S404) The sound source direction calculation unit 106 calculates the sound source direction based on the time difference information and the sound collection position indicated by the sound source state information obtained in the process in which the sound collection position calculation unit 105 calculates the sound collection position. Thereafter, the process proceeds to step S405. [0109] (Step S405) The first transfer function calculation unit 107 calculates the first transfer function A[d][n] for each target channel based on the acoustic signal of each target channel and the acoustic signal of the representative channel, and stores, in the transfer function storage unit 108, sound source direction information indicating the sound source direction and first transfer function information indicating the calculated first transfer function A[d][n] in association with each other. Thereafter, the process proceeds to step S406. [0110] (Step S406) The second transfer function calculation unit 109 specifies the two reference sound source directions sandwiching the target sound source direction, and reads out, from the transfer function storage unit 108, the first transfer function information corresponding to each of the specified two reference sound source directions. The second transfer function calculation unit 109 calculates the second transfer function by interpolating the first transfer functions indicated by the read first transfer function information, using the reciprocals of the internal division ratio that internally divides the interval between the respective reference sound source directions at the target sound source direction.
In other words, the second transfer function calculation unit 109 refers to the first transfer function data stored in the transfer function storage unit 108 and, based on the sound source directions indicated by the sound source direction information and the target sound source direction, interpolates the first transfer functions corresponding to the respective pieces of sound source direction information. A second transfer function corresponding to the target sound source direction is calculated by this interpolation. Thereafter, the process shown in FIG. 10 is ended. [0111] As described above, the sound processing device 10 according to the present embodiment includes the sound collection position calculation unit 105 that calculates the sound collection positions of the sound signals based on the sound signals of the plurality of channels, and the sound source direction calculation unit 106 that calculates a sound source direction based on the sound signals of the plurality of channels. In addition, the sound processing apparatus 10 includes the first transfer function calculation unit 107 that calculates a first transfer function corresponding to the sound source direction based on the sound signals of the plurality of channels, and the second transfer function calculation unit 109 that calculates a second transfer function by interpolating the first transfer functions corresponding to each of a plurality of sound source directions. With this configuration, sets of a sound source direction and a first transfer function are obtained based on the collected sound signals, and a second transfer function relating to a desired sound source direction is calculated with reference to the obtained first transfer functions. Therefore, it is possible to calculate the transfer function in a desired direction according to the indoor environment without using a sound source for measurement.
[0112] Further, the sound processing apparatus 10 according to the present embodiment includes the time difference calculation unit 104 that calculates the time differences of the sound signals between channels. In the sound processing apparatus 10, the sound collection position calculation unit 105 includes the state prediction unit 1052 that predicts current sound source state information, which is sound source state information including the sound collection position, from past sound source state information, and the state updating unit 1051 that updates the current sound source state information so that the difference between the time difference calculated by the time difference calculation unit 104 and the time difference based on the current sound source state information decreases. With this configuration, the sound collection position can be sequentially calculated based on the collected sound signals, so that the sound collection position at each time can be obtained without using other measurement means. [0113] Further, in the sound processing device 10 according to the present embodiment, the time difference calculation unit 104 calculates the time difference of the sound signals between channels whose sound collection positions are within a predetermined range of each other. With this configuration, since the time difference between adjacent sound collection positions is calculated, fluctuation of the calculated time difference is suppressed. Therefore, the sound collection position calculation unit can stably estimate the sound source state information based on the calculated time difference, and the sound collection position can be calculated with high accuracy.
[0114] Further, in the sound processing apparatus 10 according to the present embodiment, the sound source direction calculation unit 106 includes a state prediction unit that predicts current sound source state information, which is sound source state information including the sound source position, from past sound source state information, and the state updating unit 1061 that updates the current sound source state information so that the difference between the time difference calculated by the time difference calculation unit 104 and the time difference based on the current sound source state information decreases. With this configuration, the sound source direction can be sequentially calculated based on the collected sound signals, so that the sound source direction at each time can be obtained without using other measurement means. [0115] Second Embodiment The second embodiment of the present invention will be described below with reference to the drawings. The same components as those of the embodiment described above are denoted by the same reference numerals, and the above description is incorporated. FIG. 11 is a schematic block diagram showing the configuration of the sound processing system 1A according to the present embodiment. The sound processing system 1A is configured to include the sound collection unit 11 and the sound processing device 10A. The sound processing apparatus 10A is configured to include a signal input unit 102, a peak detection unit 103, a time difference calculation unit 104, a sound collection position calculation unit 105, a sound source direction calculation unit 106A, a first transfer function calculation unit 107A, a transfer function storage unit 108, and a second transfer function calculation unit 109. That is, compared with the sound processing apparatus 10 (FIG. 1), the sound processing apparatus 10A includes the sound source direction calculation unit 106A and the first transfer function calculation unit 107A in place of the sound source direction calculation unit 106 (FIG. 1) and the first transfer function calculation unit 107 (FIG. 1).
[0116] The sound source direction calculation unit 106A has the same configuration as the sound source direction calculation unit 106 and performs the same processing. However, time difference information relating to acoustic signals at a time delayed by at least the delay time TA from the time difference information input to the sound collection position calculation unit 105 is input to the sound source direction calculation unit 106A. The delay time TA is a predetermined time that is longer than the convergence time of the estimation error of the sound source state information ξk calculated by the sound collection position calculation unit 105. Here, "a time delayed by at least the delay time TA" means a time later than a certain time by the delay time TA or more. This is because the peak detection unit 103 does not necessarily detect the next peak exactly the delay time TA after the time at which one peak is detected. The sound source direction calculation unit 106A calculates the sound source direction d using this time difference information instead of the same time difference information as that input to the sound collection position calculation unit 105. The sound source direction calculation unit 106A outputs sound source direction information indicating the calculated sound source direction d to the first transfer function calculation unit 107A. [0117] The first transfer function calculation unit 107A has the same configuration as the first transfer function calculation unit 107 and performs the same processing.
However, the acoustic signals input to the first transfer function calculation unit 107A are the acoustic signals of the N+1 channels corresponding to the time difference information input to the sound source direction calculation unit 106A, that is, acoustic signals at a time delayed by at least the delay time TA from the acoustic signals corresponding to the time difference information input to the sound collection position calculation unit 105. The first transfer function calculation unit 107A calculates a first transfer function A[d][n] for each target channel based on the input acoustic signals. The first transfer function calculation unit 107A associates the sound source direction information input from the sound source direction calculation unit 106A with first transfer function information indicating the calculated first transfer function A[d][n], and stores them in the transfer function storage unit 108. [0118] (Sound Processing) Next, sound processing according to the present embodiment will be described. FIG. 12 is a flowchart showing acoustic processing according to the present embodiment. The sound processing illustrated in FIG. 12 includes steps S401 to S403, S404A, S405A, and S406. Accordingly, the sound processing apparatus 10A proceeds to step S404A after performing steps S401 to S403. [0119] (Step S404A) The sound source direction calculation unit 106A receives time difference information relating to the acoustic signals at a time delayed by at least the delay time TA from the acoustic signals relating to the time difference information input to the sound collection position calculation unit 105. The sound source direction calculation unit 106A calculates the sound source direction based on this time difference information and the sound collection position indicated by the sound source state information obtained in the process in which the sound collection position calculation unit 105 calculates the sound collection position.
Thereafter, the process proceeds to step S405A. (Step S405A) The first transfer function calculation unit 107A receives acoustic signals at a time delayed by at least the delay time TA from the acoustic signals relating to the time difference information input to the sound collection position calculation unit 105. The first transfer function calculation unit 107A calculates a first transfer function A[d][n] for each target channel, and stores, in the transfer function storage unit 108, sound source direction information indicating the sound source direction and first transfer function information indicating the calculated first transfer function A[d][n] in association with each other. Thereafter, the process proceeds to step S406. [0120] As described above, in the sound processing device 10A according to the present embodiment, time difference information at a time delayed by at least a predetermined delay time (for example, TA) from the time difference information input to the sound collection position calculation unit 105 is input to the sound source direction calculation unit 106A, and acoustic signals at a time delayed by at least the delay time from the acoustic signals corresponding to the time difference information input to the sound collection position calculation unit 105 are input to the first transfer function calculation unit 107A. With this configuration, the processing performed by the sound collection position calculation unit 105 and the processing performed by the sound source direction calculation unit 106A and the first transfer function calculation unit 107A can be parallelized. Therefore, the delay until the estimation error of the sound source state information converges in the sound collection position calculation unit 105 does not propagate to the sound source direction calculation unit 106A and the first transfer function calculation unit 107A, so the sound source direction and the first transfer function can be obtained more quickly.
[0121] Third Embodiment Hereinafter, a third embodiment of the present invention will be described with reference to the drawings. The same components as those of the embodiment described above are denoted by the same reference numerals, and the above description is incorporated. FIG. 13 is a schematic block diagram showing a configuration of the sound processing system 1B according to the present embodiment. The sound processing system 1B is configured to include the sound collection unit 11 and the sound processing device 10B. The sound processing apparatus 10B is configured to include a signal input unit 102, a peak detection unit 103, a time difference calculation unit 104, a sound collection position calculation unit 105, a sound source direction calculation unit 106B, a first transfer function calculation unit 107, a transfer function storage unit 108, and a second transfer function calculation unit 109B. That is, compared with the sound processing apparatus 10 (FIG. 1), the sound processing apparatus 10B includes the sound source direction calculation unit 106B and the second transfer function calculation unit 109B in place of the sound source direction calculation unit 106 (FIG. 1) and the second transfer function calculation unit 109 (FIG. 1). [0122] The sound source direction calculation unit 106B is the sound source direction calculation unit 106 further including a reliability determination unit 1066B. When sound source direction information is input from the convergence determination unit 1065, the reliability determination unit 1066B receives, from the state updating unit 1061, the prediction residual (ζk−ζk|k−1) of the observed value vector at the current time k, and determines the absolute value |ζk−ζk|k−1| of the input prediction residual as the reliability w.
The larger the value of the reliability w, the lower the reliability of the sound source direction d calculated by the sound source direction calculation unit 106B; the smaller the value, the higher the reliability of the sound source direction d. [0123] When the reliability w is smaller than a predetermined threshold wth of the reliability, the reliability determination unit 1066B associates the input sound source direction information with reliability information indicating the reliability w, and outputs them to the first transfer function calculation unit 107. When the reliability w is equal to or larger than the predetermined threshold wth of the reliability, the reliability determination unit 1066B rejects the input sound source direction information and the reliability information indicating the reliability w without outputting them. Thereby, the sound source direction information, the reliability information, and the first transfer function information are stored in association with each other in the transfer function storage unit 108 to form first transfer function data. The reliability determination unit 1066B may instead receive the update amount of the sound source state information ξk at the current time k, that is, Kk(ζk−ζk|k−1), and determine the absolute value of the input update amount as the reliability w. [0124] (Example of First Transfer Function Data) Next, an example of the first transfer function data stored in the transfer function storage unit 108 will be described. FIG. 14 is a diagram showing an example of first transfer function data according to the present embodiment. In the example shown in FIG. 14, the sound source direction d indicated by the sound source direction information, the reliability w indicated by the reliability information, and the transfer functions A[d][1](ω), A[d][2](ω), ..., A[d][7](ω) of the channels 1 to 7 indicated by the first transfer function information are associated with each other.
For example, with the sound source direction 13° shown in the second row of FIG. 14, the reliability 0.186, the transfer function 0.24+0.35j of channel 1, the transfer function 0.44−0.08j of channel 2, the transfer function 0.40+0.29j of channel 3, the transfer function 0.18+0.51j of channel 4, the transfer function −0.37+0.32j of channel 5, the transfer function −0.14+0.48j of channel 6, and the transfer function 0.15+0.29j of channel 7 are associated. In fact, the transfer function of each channel is given for each of L/2 frequencies ω for each sound source direction d, but only one of the L/2 frequencies is shown in FIG. 14. [0125] Returning to FIG. 13, the second transfer function calculation unit 109B determines weighting factors based on the reliabilities corresponding to the two pieces of reference sound source direction information, and determines interpolation coefficients by multiplying the determined weighting factors by the reciprocals of the internal division ratio that internally divides the interval between the two reference sound source directions at the target sound source direction. The second transfer function calculation unit 109B calculates a second transfer function by interpolating the first transfer functions corresponding to the two pieces of reference sound source direction information based on the determined interpolation coefficients. [0126] Specifically, after the second transfer function calculation unit 109B specifies the reference sound source directions based on the target sound source direction d (FIG. 8, step S301), it reads out, from the transfer function storage unit 108, the first transfer function information and the reliability information corresponding to each piece of the specified reference sound source direction information. The second transfer function calculation unit 109B determines the weighting factors v1 and v2 based on the reliabilities w1 and w2 indicated by the two pieces of read reliability information.
The reliabilities w1 and w2 are the reliabilities corresponding to the reference sound source directions d1 and d2, respectively. The weighting factors v1 and v2 may be positive real numbers that decrease as the absolute values of the reliabilities w1 and w2 increase, and increase as the absolute values of the reliabilities w1 and w2 decrease. The weighting factors v1 and v2 can be determined, for example, as shown in equation (16). [0127] v1 = 1/(|w1| + ε), v2 = 1/(|w2| + ε) … (16) [0128] In equation (16), ε is a predetermined positive real number for preventing division by zero. As shown in equation (17), the second transfer function calculation unit 109B multiplies the determined weighting factors v1 and v2 by the reciprocals of the internal division ratio determined by the target sound source direction d between the two reference sound source directions, |(d2−d1)/(d−d1)| and |(d2−d1)/(d2−d)|, respectively, to calculate multiplication values D1 and D2. [0129] D1 = v1·|(d2−d1)/(d−d1)|, D2 = v2·|(d2−d1)/(d2−d)| … (17) [0130] As shown in equation (18), the second transfer function calculation unit 109B normalizes the respective multiplication values D1 and D2 by the sum D1+D2 to determine the interpolation coefficients δ1 and δ2. [0131] δ1 = D1/(D1+D2), δ2 = D2/(D1+D2) … (18) [0132] That is, the interpolation coefficients δ1 and δ2 take larger values as the reliabilities w of the reference sound source directions d1 and d2 decrease (that is, as the corresponding sound source directions become more reliable). The interpolation coefficients δ1 and δ2 also take larger values as the reference sound source directions d1 and d2, respectively, approach the target sound source direction d. The second transfer function calculation unit 109B calculates the second transfer function A[d][n](ω) by interpolating, using the determined interpolation coefficients δ1 and δ2, the first transfer functions A[d1][n](ω) and A[d2][n](ω) corresponding to the reference sound source directions d1 and d2, respectively.
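Equations (16) to (18) can be sketched together as follows; since the equation images are not reproduced in this translation, the 1/(|w|+ε) form assumed for equation (16) is only one choice consistent with the stated requirements (weights that decrease as |w| increases, with ε preventing division by zero).

```python
def reliability_interpolation_coeffs(w1, w2, d, d1, d2, eps=1e-6):
    """Sketch of equations (16)-(18): weighting factors from the
    reliabilities w1, w2 (smaller w = more reliable), multiplied by the
    reciprocals of the internal division ratio and normalized.  The
    value of eps is illustrative."""
    # Equation (16): weights shrink as |w| grows (assumed form)
    v1 = 1.0 / (abs(w1) + eps)
    v2 = 1.0 / (abs(w2) + eps)
    # Equation (17): multiply by reciprocals of the internal division ratio
    D1 = v1 * abs((d2 - d1) / (d - d1))
    D2 = v2 * abs((d2 - d1) / (d2 - d))
    # Equation (18): normalize so that delta1 + delta2 = 1
    total = D1 + D2
    return D1 / total, D2 / total
```

With equal reliabilities and a target direction midway between the references, both coefficients reduce to 0.5; a less reliable reference (larger w) receives a proportionally smaller coefficient.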
When calculating the second transfer function A[d][n](ω), the second transfer function calculation unit 109B performs the processes shown in steps S303 to S306 (FIG. 8). [0133] (Sound Processing) Next, sound processing according to the present embodiment will be described. FIG. 15 is a flowchart showing acoustic processing according to the present embodiment. The sound processing illustrated in FIG. 15 includes steps S401 to S404, S407B, S405, and S406B. After executing steps S401 to S404, the sound processing apparatus 10B proceeds to step S407B. [0134] (Step S407B) The reliability determination unit 1066B determines the reliability w based on the prediction residual (ζk−ζk|k−1), and determines whether the determined reliability w is smaller than the predetermined threshold wth of the reliability. When the determined reliability w is smaller than the threshold wth (that is, when the reliability with respect to the sound source direction d is high), the reliability determination unit 1066B associates the input sound source direction information with the reliability information indicating the reliability w, and outputs them to the first transfer function calculation unit 107. Thereafter, the process proceeds to step S405. After step S405 ends, the sound processing apparatus 10B proceeds to step S406B. [0135] (Step S406B) The second transfer function calculation unit 109B specifies the two reference sound source directions sandwiching the target sound source direction, and reads out, from the transfer function storage unit 108, the first transfer function information and the reliability information corresponding to each of the specified two reference sound source directions.
The second transfer function calculation unit 109B determines the weighting factors v1 and v2 based on the reliabilities corresponding to the two pieces of reference sound source direction information, normalizes the multiplication values D1 and D2 obtained by multiplying the determined weighting factors by the reciprocals of the internal division ratio that internally divides the interval between the two reference sound source directions at the target sound source direction, and thereby determines the interpolation coefficients δ1 and δ2. The second transfer function calculation unit 109B calculates the second transfer function by interpolating the first transfer functions indicated by the read first transfer function information with the respective interpolation coefficients. Thereafter, the process shown in FIG. 15 is ended. [0136] As described above, in the sound processing apparatus 10B according to the present embodiment, the second transfer function calculation unit 109B interpolates the first transfer functions calculated by the first transfer function calculation unit 107 by weighting based on the update amount of the sound source state information updated by the state updating unit 1061. With this configuration, the second transfer function is calculated by interpolating the first transfer functions relating to the sound source directions with weights based on the update amount of the sound source state information used for calculating the sound source directions. Since the reliability of the sound source direction calculated by the sound source direction calculation unit 106B depends on the update amount of the sound source state information, the reliability of the calculated second transfer function is improved. [0137] Fourth Embodiment Hereinafter, a fourth embodiment of the present invention will be described with reference to the drawings.
The same components as those of the embodiment described above are denoted by the same reference numerals, and the above description is incorporated. FIG. 16 is a schematic block diagram showing the configuration of the sound processing system 1C according to the present embodiment. The sound processing system 1C is configured to include the sound collection unit 11 and the sound processing device 10C. The sound processing apparatus 10C is configured to include a signal input unit 102, a peak detection unit 103, a time difference calculation unit 104, a sound collection position calculation unit 105C, a sound source direction calculation unit 106B, a first transfer function calculation unit 107, a transfer function storage unit 108, a second transfer function calculation unit 109B, and a second sound source direction calculation unit 110C. That is, compared with the sound processing device 10B (FIG. 13), the sound processing device 10C includes the sound collection position calculation unit 105C in place of the sound collection position calculation unit 105, and further includes the second sound source direction calculation unit 110C. In the following description, the sound source direction calculation unit 106B may be referred to as a first sound source direction calculation unit to distinguish it from the second sound source direction calculation unit 110C. [0138] The second sound source direction calculation unit 110C calculates a second sound source direction d′ based on the second transfer function information input from the second transfer function calculation unit 109B and the acoustic signals of the N+1 channels input from the peak detection unit 103. The second sound source direction calculation unit 110C calculates the second sound source direction d′ using, for example, the MUSIC (Multiple Signal Classification) method.
Specifically, the second sound source direction calculation unit 110C receives, for each channel, the second transfer function for each sound source direction d distributed at predetermined intervals (for example, 1°), and generates, for each sound source direction d, a transfer function vector D(d) having the second transfer function A[d][n](ω) of each channel n as an element. The second sound source direction calculation unit 110C transforms the acoustic signal xn of each channel n into the frequency domain for each frame made up of a predetermined number of samples to calculate a transform coefficient Xn(ω), and calculates an input correlation matrix Rxx from the calculated transform coefficients as shown in equation (19). [0139] Rxx = E[X·X*] … (19) [0140] In equation (19), E[...] indicates the expected value of ..., [X] is an N+1 dimensional vector whose elements are the transform coefficients of the respective channels, and [...]* indicates the conjugate transpose of a matrix or a vector. Next, the second sound source direction calculation unit 110C calculates the eigenvalues δi and the eigenvectors ei of the input correlation matrix Rxx. The input correlation matrix Rxx, the eigenvalues δi, and the eigenvectors ei have the relationship shown in equation (20). [0141] Rxx·ei = δi·ei … (20) [0142] In equation (20), i is an integer of 1 or more and N+1 or less. The order of the index i is the descending order of the eigenvalues δi. The second sound source direction calculation unit 110C calculates a spatial spectrum Psp(d) shown in equation (21) based on the transfer function vector D(d) and the calculated eigenvectors ei. [0143] Psp(d) = |D*(d)·D(d)| / Σi=K+1…N+1 |D*(d)·ei| … (21) [0144] In equation (21), K is the number of detectable sound sources (for example, 1) and is a predetermined natural number smaller than N.
The second sound source direction calculation unit 110C calculates, as the extended spatial spectrum Pext(d), the sum of the spatial spectrum Psp(d) over the frequency bands in which the S/N ratio is larger than a predetermined threshold (for example, 20 dB). The second sound source direction calculation unit 110C determines the direction d at which the calculated extended spatial spectrum Pext(d) takes its maximum value as the second sound source direction d′. The second sound source direction d′ is thus a sound source direction calculated based on the N+1-channel acoustic signals. The second sound source direction calculation unit 110C outputs second sound source direction information indicating the determined second sound source direction d′ to the sound collection position calculation unit 105C. [0145] Like the sound collection position calculation unit 105, the sound collection position calculation unit 105C includes a state updating unit 1051, a state prediction unit 1052, a Kalman gain calculation unit 1054, and a convergence determination unit 1055. At certain times, the sound collection position calculation unit 105C predicts the sound source state information ξk|k−1 and updates the sound source state information ξk based on the time difference information input from the time difference calculation unit 104, as the sound collection position calculation unit 105 does. At other times, however, the sound collection position calculation unit 105C performs the prediction of the sound source state information ξk|k−1 and the update of the sound source state information ξk based on the second sound source direction information input from the second sound source direction calculation unit 110C. In other words, the sound collection position calculation unit 105C takes the second sound source direction d′ as the observation value ζ′k and calculates the sound source state information ξk|k−1, ξk so that the estimation error of the observation value ζ′k|k−1 decreases.
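As a concrete illustration, the MUSIC computation of equations (19) to (21) — correlation matrix, eigendecomposition, and spatial spectrum — can be sketched in Python as follows. This is a minimal sketch: the single-frequency-bin treatment, array shapes, and function name are illustrative assumptions, not the embodiment's implementation.

```python
import numpy as np

def music_spectrum(X, D, K=1):
    """Spatial spectrum Psp(d) in the spirit of equations (19)-(21) (sketch).

    X: (N+1, frames) complex transform coefficients Xn(w) for one frequency bin.
    D: (directions, N+1) transfer function vectors D(d), one row per direction d.
    K: assumed number of detectable sound sources.
    """
    Rxx = X @ X.conj().T / X.shape[1]        # input correlation matrix, cf. eq (19)
    vals, E = np.linalg.eigh(Rxx)            # eigenvalues ascending, cf. eq (20)
    En = E[:, : X.shape[0] - K]              # noise subspace: the N+1-K smallest eigenvectors
    num = np.abs(np.sum(D.conj() * D, axis=1))
    den = np.sum(np.abs(D.conj() @ En) ** 2, axis=1)
    return num / den                         # Psp(d), cf. eq (21)
```

The second sound source direction d′ would then be the argmax of this spectrum after summing over high-S/N frequency bands, as in the extended spatial spectrum Pext(d) described above.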
[0146] The second sound source direction d′ has the relationship shown in equation (22) with the sound source position (xk, yk) and the center of gravity (mc,x, mc,y) of the sound collection positions. [0147] tan d′ = (yk − mc,y) / (xk − mc,x) … (22) [0148] In equation (22), the center of gravity (mc,x, mc,y) is the average over the channels of the sound collection positions (mn,x, mn,y). Accordingly, the predicted observation value ζ′k|k−1 at the current time k, that is, the predicted value of the second sound source direction d′, can be calculated from the predicted sound source state information ξk|k−1. Therefore, the state updating unit 1051, the state prediction unit 1052, and the Kalman gain calculation unit 1054 of the sound collection position calculation unit 105C can, at those other times, calculate the sound source state information ξk|k−1, ξk so that the estimation error of the observation value ζ′k|k−1 decreases, by performing their processing with the observation values ζ′k|k−1, ζ′k in place of the observation value vectors ζk|k−1, ζk described above. [0149] Specifically, at those other times, the state prediction unit 1052 calculates the second sound source direction d′ given by equation (22) from the sound source state information ξk|k−1 at the current time k as the predicted observation value ζ′k|k−1. The Kalman gain calculation unit 1054 calculates each element of the matrix Hk by partially differentiating the predicted observation value ζ′k|k−1 with respect to each element of the sound source state information ξk|k−1. The state updating unit 1051 adds the observation error δ′k to the observation value ζ′k and replaces the observation value ζ′k with the resulting sum. In addition, the state updating unit 1051 adds, to the predicted sound source state information ξk|k−1 at the current time k, the vector obtained by multiplying the prediction residual (ζ′k − ζ′k|k−1) of the observation value at the current time k by the Kalman gain Kk, to calculate the sound source state information ξk at the current time k (equation (3)). [0150] Like the sound collection position calculation unit 105, the sound collection position calculation unit 105C may alternately repeat the process of calculating the sound source state information ξk|k−1, ξk based on the observation value vector ζk (hereinafter, the process relating to the observation value vector ζk) and the process of calculating the sound source state information ξk|k−1, ξk based on the observation value ζ′k as described above (hereinafter, the process relating to the observation value ζ′k). However, the sound collection position calculation unit 105C is not limited to this, as long as the process relating to the observation value vector ζk and the process relating to the observation value ζ′k are performed at different times. The sound collection position calculation unit 105C may repeat a cycle of performing the process relating to the observation value vector ζk N′ times and then performing the process relating to the observation value ζ′k N′′ times. Here, N′ and N′′ are each predetermined integers of one or more, and N′ and N′′ may be equal or different. [0151] Next, sound processing according to the present embodiment will be described. FIG. 17 is a flowchart showing the sound processing according to the present embodiment. The sound processing illustrated in FIG. 17 includes steps S401, S402, S403C, S404, S407B, S405, S406B, and S408C. After executing steps S401 and S402, the sound processing device 10C proceeds to step S403C.
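The direction-based update of paragraph [0149] — predicted observation ζ′k|k−1 from equation (22), Jacobian Hk by partial differentiation, and the Kalman-gain correction — can be sketched as follows. The two-dimensional position-only state, the noise value, and the function name are hypothetical simplifications; the embodiment's full state vector also includes the sound collection positions.

```python
import numpy as np

def direction_update(xi_pred, P_pred, d_obs, centroid, r_obs=1e-2):
    """One EKF update taking the second sound source direction d' as the observation (sketch).

    xi_pred: predicted source position (xk, yk) from the state prediction step.
    P_pred:  2x2 covariance of the prediction.
    d_obs:   observed direction d' [rad]; centroid: (mc,x, mc,y) of the mic positions.
    """
    dx, dy = xi_pred[0] - centroid[0], xi_pred[1] - centroid[1]
    zeta_pred = np.arctan2(dy, dx)                      # predicted observation, cf. eq (22)
    r2 = dx * dx + dy * dy
    H = np.array([[-dy / r2, dx / r2]])                 # Jacobian Hk of the observation
    S = (H @ P_pred @ H.T)[0, 0] + r_obs                # innovation variance
    K = (P_pred @ H.T / S).ravel()                      # Kalman gain Kk
    resid = np.angle(np.exp(1j * (d_obs - zeta_pred)))  # residual wrapped to (-pi, pi]
    xi = xi_pred + K * resid                            # state update, cf. equation (3)
    P = (np.eye(2) - np.outer(K, H)) @ P_pred
    return xi, P
```

Wrapping the angular residual keeps the update well behaved when the observed and predicted directions straddle ±180°.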
[0152] (Step S403C) At certain times (for example, odd sample times), the sound collection position calculation unit 105C performs the prediction of the sound source state information ξk|k−1 and the update of the sound source state information ξk using the time difference information as the observation value information. At other times (for example, even sample times), the sound collection position calculation unit 105C performs the prediction of the sound source state information ξk|k−1 and the update of the sound source state information ξk using the second sound source direction information as the observation value information. By repeating these processes, the sound collection position calculation unit 105C calculates the sound collection positions. Thereafter, the sound processing device 10C executes steps S404, S407B, S405, and S406B, and then proceeds to step S408C. (Step S408C) The second sound source direction calculation unit 110C calculates the second sound source direction d′ based on the second transfer function information input from the second transfer function calculation unit 109B and the N+1-channel acoustic signals input from the peak detection unit 103, and generates second sound source direction information. Thereafter, the process shown in FIG. 17 ends. [0153] As described above, the sound processing device 10C according to the present embodiment includes the second sound source direction calculation unit 110C, which calculates the sound source direction based on the second transfer function calculated by the second transfer function calculation unit 109B and the acoustic signals of the plurality of channels. The state updating unit 1061 updates the current sound source state information so that the difference between the sound source direction calculated by the second sound source direction calculation unit 110C and the sound source direction based on the current sound source state information decreases.
With this configuration, the sound source state information is updated based on the sound source direction, which is information different from the time difference, so that the possibility of falling into a local solution can be reduced compared to the case where only one of the time difference and the sound source direction is used. Index values (for example, squared errors) for evaluating the magnitude of the difference between the sound source direction calculated by the second sound source direction calculation unit 110C and the sound source direction based on the current sound source state information generally have a plurality of local minima. By updating the sound source state information based on different kinds of information, however, convergence of the sound source state information to one particular local minimum is avoided. Therefore, the sound collection positions indicated by the sound source state information can be calculated with higher accuracy. [0154] Fifth Embodiment Hereinafter, a fifth embodiment of the present invention will be described with reference to the drawings. The same components as those of the embodiments described above are denoted by the same reference numerals, and the above description applies. FIG. 18 is a schematic block diagram showing the configuration of the sound processing system 1D according to the present embodiment. The sound processing system 1D includes the sound collection unit 11 and the sound processing device 10D. The sound processing device 10D includes a signal input unit 102, a peak detection unit 103, a time difference calculation unit 104, a sound collection position calculation unit 105, a sound source direction calculation unit 106D, a first transfer function calculation unit 107, a transfer function storage unit 108, and a second transfer function calculation unit 109.
That is, relative to the sound processing device 10 (FIG. 1), the sound processing system 1D includes a sound source direction calculation unit 106D instead of the sound source direction calculation unit 106. [0155] The sound source direction calculation unit 106D includes a third transfer function calculation unit 1068D and a first sound source direction determination unit 1069D. The sound source state information is input from the sound collection position calculation unit 105 to the third transfer function calculation unit 1068D. The third transfer function calculation unit 1068D calculates, for each of the sound source directions d distributed at predetermined intervals, the third transfer function A[d][n](ω) of each channel n based on a propagation model that gives the propagation characteristic from the sound source to the sound collection position. The sound collection position (mn,x, mn,y) of each channel is given by the input sound source state information. The distance from the sound source to the center of gravity of the sound collection positions may be determined in advance. [0156] The propagation model may be any model that provides a transfer function representing propagation characteristics that depend on the sound source direction and the sound collection position. The propagation model is, for example, a plane wave model. The transfer function A(r, d) given by the plane wave model represents the change in phase according to the delay due to propagation from the sound source to a sound collection position separated by the distance r, under the assumption that the amplitude is constant. The transfer function A(r, d) given by the plane wave model is expressed by equation (23). [0157] A(r, d) = exp(−j k r) … (23) [0158] In equation (23), k is the wave number and is equal to ω/c. The third transfer function calculation unit 1068D uses a predetermined value as the distance r.
The third transfer function calculation unit 1068D outputs, to the first sound source direction determination unit 1069D, third transfer function information indicating the third transfer function A[d][n](ω) of each channel n calculated for each sound source direction d. [0159] The third transfer function calculation unit 1068D may calculate the third transfer function A[d][n](ω) using a spherical wave model. The transfer function A(r, d) given by the spherical wave model represents the change in phase according to the delay due to propagation between the sound source and a sound collection position separated by the distance r, together with the attenuation of the amplitude with distance. The transfer function A(r, d) given by the spherical wave model is expressed by equation (24). [0160] A(r, d) = (r0 / r) exp(−j k (r − r0)) … (24) [0161] In equation (24), r0 is a predetermined positive real number, for example, the radius of the object (that is, the sound source) that generates the sound. [0162] The first sound source direction determination unit 1069D calculates the sound source direction d based on the third transfer function information input from the third transfer function calculation unit 1068D and the N+1-channel acoustic signals input from the peak detection unit 103. The first sound source direction determination unit 1069D calculates the sound source direction d using the above-described MUSIC method, and outputs sound source direction information indicating the calculated sound source direction d to the first transfer function calculation unit 107. [0163] Next, sound processing according to the present embodiment will be described. FIG. 19 is a flowchart showing the sound processing according to the present embodiment. The sound processing illustrated in FIG. 19 includes steps S401 to S403, S409D, S404D, S405, and S406. After executing steps S401 to S403, the sound processing device 10D proceeds to step S409D.
(Step S409D) The third transfer function calculation unit 1068D calculates, for each sound source direction, the third transfer function A[d][n](ω) indicating the phase change due to propagation to the sound collection position of each channel indicated by the sound source state information input from the sound collection position calculation unit 105. Thereafter, the process proceeds to step S404D. (Step S404D) The first sound source direction determination unit 1069D calculates the sound source direction d based on the third transfer function information input from the third transfer function calculation unit 1068D and the N+1-channel acoustic signals input from the peak detection unit 103. Thereafter, the sound processing device 10D executes steps S405 and S406. [0164] As described above, in the sound processing device 10D according to the present embodiment, the sound source direction calculation unit 106D includes the third transfer function calculation unit 1068D, which calculates, for each sound source direction, the third transfer function indicating the phase change due to propagation to the sound collection positions calculated by the sound collection position calculation unit 105, and the first sound source direction determination unit 1069D, which determines the sound source direction based on the third transfer function calculated by the third transfer function calculation unit 1068D and the acoustic signals of the plurality of channels. With this configuration, the third transfer function can be calculated by a simple process, and the sound source direction can be determined based on the phase change for each sound source direction at each sound collection position indicated by the calculated third transfer function. Therefore, the amount of processing can be reduced without losing the estimation accuracy of the sound source direction.
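The two propagation models around equations (23) and (24) can be sketched as follows. The equation images are not reproduced in this translation, so these expressions are standard reconstructions consistent with the surrounding text (constant amplitude for the plane wave; amplitude decaying with distance for the spherical wave), not the patent's literal formulas; the speed of sound and the function names are assumptions.

```python
import numpy as np

C = 343.0  # assumed speed of sound [m/s]

def plane_wave_tf(omega, r):
    """Plane-wave transfer function (cf. eq (23)): unit amplitude, phase delay k*r."""
    k = omega / C  # wave number k = w/c
    return np.exp(-1j * k * r)

def spherical_wave_tf(omega, r, r0=0.05):
    """Spherical-wave transfer function (cf. eq (24)): amplitude decays as r0/r."""
    k = omega / C
    return (r0 / r) * np.exp(-1j * k * (r - r0))
```

Evaluating either model at the per-channel distances implied by the sound source direction d and the estimated sound collection positions (mn,x, mn,y) yields the elements A[d][n](ω) of the third transfer function.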
[0165] Sixth Embodiment Hereinafter, a sixth embodiment of the present invention will be described with reference to the drawings. The same components as those of the embodiments described above are denoted by the same reference numerals, and the above description applies. FIG. 20 is a schematic block diagram showing the configuration of the sound processing system 1E according to the present embodiment. The sound processing system 1E includes the sound collection unit 11 and the sound processing device 10E. The sound processing device 10E includes a signal input unit 102, a peak detection unit 103, a time difference calculation unit 104, a sound collection position calculation unit 105, and a third transfer function calculation unit 1068D. [0166] With this configuration, the sound processing device 10E executes the sound processing shown in FIG. 21. FIG. 21 is a flowchart showing the sound processing according to the present embodiment. The sound processing shown in FIG. 21 includes steps S401 to S403 and step S409D. After executing steps S401 to S403, the sound processing device 10E executes step S409D, and thereafter ends the process shown in FIG. 21. [0167] As described above, the sound processing device 10E according to the present embodiment calculates the sound collection position corresponding to each of the plurality of channels based on the acoustic signals of the plurality of channels. Further, in the sound processing device 10E, the third transfer function calculation unit 1068D calculates, for each sound source direction, the third transfer function indicating the phase change due to propagation to each sound collection position indicated by the sound source state information calculated by the sound collection position calculation unit 105.
With this configuration, the sound collection positions are sequentially estimated, and the third transfer function for each sound source direction due to propagation to each estimated sound collection position can be calculated by simple processing. [0168] (Modifications) Although embodiments of the present invention have been described above, the specific configuration is not limited to the above, and various design changes can be made without departing from the scope of the present invention. For example, the sound source direction calculation unit 106B (FIG. 13) or the sound source direction calculation unit 106D (FIG. 18) and the first transfer function calculation unit 107 (FIGS. 13 and 18) may, as in the sound processing device 10A (FIG. 11), receive time difference information relating to an acoustic signal at a time delayed by at least a predetermined delay time TA relative to the acoustic signal relating to the time difference information input to the sound collection position calculation unit 105. Similarly, the sound source direction calculation unit 106B and the first transfer function calculation unit 107 of the sound processing device 10C (FIG. 16) may receive time difference information relating to an acoustic signal at a time delayed by at least the predetermined delay time TA relative to the acoustic signal relating to the time difference information input to the sound collection position calculation unit 105C (FIG. 16). [0169] The sound processing devices 10B (FIG. 13) and 10C (FIG. 16) may include a sound source direction calculation unit 106D (FIG. 18) instead of the sound source direction calculation unit 106B. Further, in the sound processing devices 10B (FIG. 13) and 10C (FIG. 16), the reliability determination unit 1066B may be provided in the sound source direction calculation unit 106B (FIGS. 13 and 16) instead of in the sound collection position calculation units 105 and 105C, in which case the reliability determination unit 1066B of the sound collection position calculation units may be omitted. The reliability determination unit 1066B provided in the sound collection position calculation unit 105 or 105C determines the reliability w based on the absolute value |ζk − ζk|k−1| of the prediction residual input from the state updating unit 1051 or on the absolute value |Kk(ζk − ζk|k−1)| of the update amount of the sound source state information ξk. When the reliability w is smaller than a predetermined reliability threshold wth, the reliability determination unit 1066B outputs reliability information indicating the reliability w, in association, to the first transfer function calculation unit 107. Further, when the reliability determination unit 1066B outputs the reliability information, the sound source direction calculation unit 106B (or the sound source direction calculation unit 106D where it is provided instead) may output the sound source direction information to the first transfer function calculation unit 107 in association with the reliability information. Also, like the sound collection position calculation unit 105C (FIG. 16), the sound source direction calculation unit 106B of the sound processing devices 10B (FIG. 13) and 10C (FIG. 16) may calculate the sound source state information ξk|k−1 and the sound source state information ξk such that the second sound source direction d′ calculated by the second sound source direction calculation unit 110C is taken as the observation value ζ′k and the estimation error of the observation value ζ′k|k−1 decreases. Further, the sound processing device 10D (FIG. 18) may include a sound collection position calculation unit 105C (FIG. 16) instead of the sound collection position calculation unit 105, and may further include a second sound source direction calculation unit 110C (FIG. 16).
In that case, the second sound source direction calculation unit 110C calculates the second sound source direction d′ using the second transfer function calculated by the second transfer function calculation unit 109, and outputs it to the sound collection position calculation unit 105C. [0170] The sound collection position calculation units 105 (FIGS. 1, 11, 13, 18, and 20) and 105C (FIG. 16) and the sound source direction calculation units 106 (FIG. 1), 106A (FIG. 11), and 106B (FIGS. 13 and 16) may, when calculating the sound source state information ξk|k−1, ξk so that the estimation error of the observation value vector ζk|k−1 or the observation value ζ′k|k−1 decreases, use a minimum mean square error (MMSE) method, another coefficient calculation method, or a system identification method in place of the extended Kalman filter method. [0171] The second sound source direction calculation unit 110C (FIG. 16) and the sound source direction calculation unit 106D (FIG. 18) may each use, in place of the MUSIC method, the generalized eigenvalue decomposition (GEVD-) MUSIC method, the generalized singular value decomposition (GSVD-) MUSIC method, the weighted delay-and-sum beamforming (WDS-BF) method, or another sound source direction calculation method. [0172] The second transfer function calculation units 109 (FIGS. 1, 11, 18, and 20) and 109B (FIGS. 13 and 16) may, in response to the input of transfer function request information requesting the calculation of a transfer function from another device (for example, a robot) or another component (for example, an input/output interface), calculate the second transfer function corresponding to the target sound source direction indicated by the transfer function request information.
In that case, the second transfer function calculated by the second transfer function calculation unit 109 or 109B may be output to the device or component that issued the transfer function request information. The second transfer function calculation units 109 and 109B may calculate the second transfer function corresponding to the target sound source direction by interpolating the first transfer functions respectively corresponding to three or more reference sound source directions. [0173] In the above, the case where the second transfer function calculation units 109 and 109B, when interpolating the first transfer functions, compose the second transfer function A[d][n](ω) from the amplitude λm[T] obtained by the TDLI method and the phase tm[T] obtained by the FDLI method was taken as an example, but the interpolation is not limited to this. The second transfer function calculation units 109 and 109B may compose the second transfer function A[d][n](ω) from the amplitude λm[M] obtained by the multiplication-based eigenvalue scaling interpolation (M-EVSI) method and the phase tm[T] obtained by the FDLI method. The second transfer function calculation units 109 and 109B may also use another interpolation method when interpolating the first transfer functions. [0174] In the examples described above, the N+1 sound collection units 11-0 to 11-N are fixed to the robot Ro, but they may instead be installed on a moving object other than a robot, for example, a vehicle or a cart. Further, the N+1 sound collection units 11-0 to 11-N may be attachable to a human body. Each of the N+1 sound collection units 11-0 to 11-N may be removable from other objects or may be individually movable. Further, the arrangement of all or some of the N+1 sound collection units 11-0 to 11-N is arbitrary as long as they can collect the sound arriving from the common sound source.
All or some of the N+1 sound collection units 11-0 to 11-N may be disposed on one straight line, or may be disposed on a plane or a curved surface. In addition, the N+1 sound collection units 11-0 to 11-N need not all be disposed within a predetermined range, and at least some of them may be disposed outside that range. [0175] For example, as shown in FIG. 22, the sound collection unit 11-0 of the representative channel may be disposed close to the sound source, within a predetermined distance (for example, 5 cm). FIG. 22 is a plan view showing another arrangement example of the sound source S and the sound collection units 11-n. As shown in FIG. 22, the sound collection unit 11-0 may be disposed in the vicinity of the sound source S, and the remaining seven sound collection units 11-1 to 11-7 may be equally spaced on a circumference of radius ρ centered on the head center C of the robot Ro. By arranging the sound collection unit 11-0 closer to the sound source S than the other sound collection units 11-1 to 11-7 in this way, the first transfer function calculation unit 107 can calculate, as the first transfer functions, the transfer functions A[d][1] to A[d][7] of the target channels 1 to 7, that is, the transfer functions from the sound source S to the sound collection units 11-1 to 11-7. As a result, based on the calculated first transfer functions, the second transfer function calculation units 109 and 109B can calculate, as the second transfer functions, the transfer functions from a sound source S located in the target sound source direction to the sound collection units 11-1 to 11-7.
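The interpolation described in paragraph [0173] combines an amplitude obtained by one method (TDLI or M-EVSI) with a phase obtained by the FDLI method. A minimal sketch of that idea — linearly interpolating magnitudes and unwrapped phases between two reference-direction first transfer functions — might look like the following; the weighting scheme and function name are illustrative assumptions, not the patent's exact TDLI/FDLI formulation.

```python
import numpy as np

def interpolate_tf(A1, A2, t):
    """Second transfer function between two reference directions (sketch).

    A1, A2: first transfer functions A[d1][n](w), A[d2][n](w) sampled over frequency.
    t: interpolation weight in [0, 1] toward the target direction between d1 and d2.
    """
    amp = (1.0 - t) * np.abs(A1) + t * np.abs(A2)        # amplitude interpolation
    ph = (1.0 - t) * np.unwrap(np.angle(A1)) \
         + t * np.unwrap(np.angle(A2))                    # phase interpolation (FDLI-like)
    return amp * np.exp(1j * ph)
```

Unwrapping the phases before interpolating avoids artifacts where the wrapped phase jumps by 2π between adjacent frequency samples.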
[0176] The sound processing devices 10 to 10E according to the embodiments and modifications described above have been described for the case where the sound source direction is calculated as spatial information, such as the position and direction of the sound source, while the distance from the sound source to the sound collection unit is ignored; however, they are not limited to this. The sound processing devices 10 to 10E may calculate the first transfer function and the second transfer function relating to the sound source position in a two-dimensional plane, further considering the distance from the sound collection unit to the sound source. The sound processing devices 10 to 10E may also calculate the first transfer function and the second transfer function relating to the sound source position in three-dimensional space, further considering the height of the sound source from a predetermined plane or its elevation angle. [0177] In addition, part of the sound processing devices 10 to 10E in the embodiments and modifications described above, for example, the peak detection unit 103, the time difference calculation unit 104, the sound collection position calculation units 105 and 105C, the sound source direction calculation units 106, 106A, 106B, and 106D, the third transfer function calculation unit 1068D, the first transfer function calculation units 107 and 107A, the second transfer function calculation units 109 and 109B, and the second sound source direction calculation unit 110C, may be realized by a computer. In that case, a program for realizing these control functions may be recorded on a computer-readable recording medium, and the program recorded on the recording medium may be read and executed by a computer system. Here, the "computer system" is a computer system built into the sound processing devices 10 to 10E and includes an OS and hardware such as peripheral devices.
The term "computer-readable recording medium" refers to a portable medium such as a flexible disk, a magneto-optical disk, a ROM, or a CD-ROM, or to a storage device such as a hard disk built into a computer system. Furthermore, the "computer-readable recording medium" may include one that holds a program dynamically for a short time, like a communication line in the case of transmitting a program via a network such as the Internet or a communication line such as a telephone line, and one that holds a program for a certain period of time, such as a volatile memory in a computer system serving as a server or a client in that case. The program may realize some of the functions described above, or may realize them in combination with a program already recorded in the computer system. Moreover, some or all of the sound processing devices 10 to 10E in the embodiments and modifications described above may be realized as an integrated circuit such as an LSI (Large Scale Integration). Each functional block of the sound processing devices 10 to 10E may be individually implemented as a processor, or some or all of them may be integrated into a processor. Further, the method of circuit integration is not limited to LSI, and implementation using a dedicated circuit or a general-purpose processor is also possible. If an integrated circuit technology that replaces LSI emerges as a result of advances in semiconductor technology, integrated circuits based on that technology may also be used. [0178] As described above, although embodiments of this invention have been described in detail with reference to the drawings, the specific configuration is not limited to the above, and various design changes can be made without departing from the gist of this invention. [0179] 1 to 1E: sound processing system; 11 (11-0 to 11-N): sound collection unit; 10 to 10E: sound processing device; 102: signal input unit; 103: peak detection unit; 104: time difference calculation unit; 105, 105C: sound collection position calculation unit; 1051: state updating unit; 1052: state prediction unit; 1054: Kalman gain calculation unit; 1055: convergence determination unit; 106, 106A, 106B, 106D: sound source direction calculation unit; 1061: state updating unit; 1062: state prediction unit; 1064: Kalman gain calculation unit; 1065: convergence determination unit; 1066B: reliability determination unit; 1068D: third transfer function calculation unit; 1069D: first sound source direction determination unit; 107, 107A: first transfer function calculation unit; 108: transfer function storage unit; 109, 109B: second transfer function calculation unit; 110C: second sound source direction calculation unit
