J. Yang and W.C. Miller

A method for in-the-loop training of a programmable neural network is proposed whereby each neuron activation function of the actual physical realisation is measured and then modelled by a small neural network. The method is valid for programmable neural networks with two or fewer hidden layers of neurons.

Introduction: This Letter deals with the in-the-loop training of an intelligent sensor [1] based on an artificial neural network with analogue neurons, programmable digital weights and an integrated photosensitive array. When the neurons of a programmable neural network are realised in analogue CMOS, there can be a significant deviation from the desired nonlinear activation function. The variations may be related to the robustness of the neuron circuit design and/or process variations over the area of the neural network implementation. Thus, when a neural network is trained using an idealised representation of the neuron activation function, the resultant set of weights is incorrect for use with the physical implementation of the network [2]. The physical network therefore does not function properly, primarily owing to inaccuracies in the realised nonlinear activation functions and to quantisation effects associated with the digital weights. In this Letter, we focus on modelling the physical neuron activation functions. An identification procedure which is valid for neural networks with two or fewer hidden layers has been developed; it varies the synaptic weights so as to generate measured input-output sample pairs for the activation function of a specific neuron in the network. The resultant experimental data characterising the activation function is used to train a small neural network to model the actual physical implementation of each of the neurons in the network.
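The modelling step can be sketched as follows. This is a minimal illustration, not the authors' code: the 'measured' samples are synthesised from a distorted sigmoid spanning the 0-5 output range mentioned later in the text, the 1-25-1 subnet topology follows the Letter, and all constants (learning rate, iteration count) are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for measured activation samples: a distorted sigmoid
# spanning roughly 0-5 (purely illustrative data).
x = np.linspace(-2.0, 2.0, 200)
y = 5.0 / (1.0 + np.exp(-2.3 * x + 0.1))

# A 1-25-1 subnet, as in the Letter, trained by plain gradient descent.
W1 = rng.normal(0.0, 0.5, (25, 1)); b1 = np.zeros(25)
W2 = rng.normal(0.0, 0.5, (1, 25)); b2 = np.zeros(1)

def sigmoid(h):
    return 1.0 / (1.0 + np.exp(-h))

eta = 0.05
X = x[:, None]
for _ in range(3000):
    H = sigmoid(X @ W1.T + b1)                    # hidden activations (200, 25)
    Y = H @ W2.T + b2                             # linear output unit (200, 1)
    dY = 2.0 * (Y[:, 0] - y)[:, None] / len(x)    # d(MSE)/dY
    gW2 = dY.T @ H;  gb2 = dY.sum(0)
    dH = (dY @ W2) * H * (1 - H)                  # back-propagate to hidden layer
    gW1 = dH.T @ X;  gb1 = dH.sum(0)
    W2 -= eta * gW2; b2 -= eta * gb2
    W1 -= eta * gW1; b1 -= eta * gb1

mse = float(np.mean((Y[:, 0] - y) ** 2))
```

The subnet is then a smooth, differentiable surrogate for the physical neuron, which is what the overall training stage requires.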
These small neural networks are embedded in a larger neural network that models the complete intelligent programmable sensor, as shown in Fig. 1.

Fig. 1 Modelling a physical neuron using a small neural network (subnet)

Activation function modelling: The procedure to model the neuron activation function comprises two distinct phases: (i) experimentally determining the input-output values associated with the activation function over the range of interest; and (ii) modelling the activation function input-output data in terms of a neural network [3].

First, consider measuring the neuron activation functions for a neural network with one hidden layer. To obtain the input-output characteristic of one neuron, we need to isolate that neuron in the network. With reference to Figs. 1 and 2, this can be done by setting one of the weights in w^(1) (the weight connecting the input node to the neuron being modelled) and one in w^(2) (the weight connecting the neuron being modelled to the output node) to unity, while setting all others to zero. A ramp voltage is then applied to the said input node and the resultant output from the neuron is measured. The activation function is thus characterised by a number of input-output pairs of sampled data over the range of interest. A neural network with one hidden layer is then trained on this data to model, or reproduce, the neuron activation function.

This method can be extended to measuring the neuron activation functions of a neural network with two hidden layers. Once again, we need to be able to isolate a specific neuron in one of the two hidden layers. With reference to Fig. 2, this can be done by setting the input node value and the incident weight connected to a neuron in the first layer, with activation function F1, such that the neuron's output is driven to its maximum saturated value.
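The isolation procedure can be sketched in software. Everything here is hypothetical: `set_weights` and `apply_input` stand for an interface to the programmable hardware, which is simulated below by a model with per-neuron gain distortion and a directly observable output node.

```python
import numpy as np

def measure_activation(apply_input, set_weights, n_hidden, j, ramp):
    """Isolate hidden neuron j and record its activation over a ramp input."""
    W1 = np.zeros((n_hidden, 1)); W1[j, 0] = 1.0   # input node -> neuron j only
    W2 = np.zeros((1, n_hidden)); W2[0, j] = 1.0   # neuron j -> output node only
    set_weights(W1, W2)                            # all other weights zero
    return np.array([apply_input(v) for v in ramp])

# Software stand-in for the physical network: each neuron gets its own
# gain, mimicking per-neuron distortion of the activation function.
def make_device(gains):
    state = {}
    def set_weights(W1, W2):
        state['W1'], state['W2'] = W1, W2
    def apply_input(v):
        h = state['W1'][:, 0] * v                  # net input per hidden neuron
        act = 1.0 / (1.0 + np.exp(-gains * h))     # distorted activations
        return float(state['W2'][0] @ act)         # observed output
    return set_weights, apply_input

set_w, apply_v = make_device(gains=np.array([2.0, 1.5, 3.0, 1.8]))
samples = measure_activation(apply_v, set_w, n_hidden=4, j=2,
                             ramp=np.linspace(-2.0, 2.0, 5))
```

The returned samples trace the selected neuron's activation alone, since every other path to the output carries zero weight.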
This saturated output can then be used to form a test input for the neuron in the second hidden layer, with activation function F2, whose activation function is to be measured. The weight between the two selected hidden neurons is varied so as to produce a ramp input for the neuron in the second hidden layer. In this manner, the resultant output from the neuron can be measured and used to determine a number of input-output pairs of sampled data over the range of interest. Once the nonlinear activation function F2 of a neuron in the second hidden layer is known, its inverse F2^-1 can be found. The fact that composing the activation function with its inverse yields the identity, i.e. F2^-1(F2(x)) = x, is now exploited to allow the output of the cascaded neuron in the first hidden layer to be observed. The identification of the neuron function F1 then proceeds in the manner already described for the single-hidden-layer case.

Experimental results: The input-output pairs of sampled data that characterise the activation function are used to train a neural network with one input node, one hidden layer with 25 neurons, and an output layer with one node. The required synaptic weights are determined using the standard back-propagation algorithm. Training takes from 300 to 1000 iterations, depending on the initial values of the weights, and converges with a mean-squared-error value of 0.002 for a function that ranges from 0 to 5. Once the subneural networks are known for each of the physical neurons in the sensor, the overall training of the complete network can begin. The complete neural network for the intelligent sensor consists of an input layer with 64 input nodes, one hidden layer with eight neurons, and an output layer with four neurons. All of the neurons are modelled by the subnetworks previously described.
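The inverse F2^-1 need not exist in closed form; since the measured characteristic is monotone over the range of interest, it can be tabulated and interpolated. A minimal sketch, with a synthetic stand-in for the measured F2 data:

```python
import numpy as np

# Hypothetical measured samples of the second-layer activation F2
# (monotone over the range of interest, so it is invertible).
h = np.linspace(-2.0, 2.0, 401)
F2 = 5.0 / (1.0 + np.exp(-2.0 * h))        # stand-in for measured F2 data

def F2_inv(y):
    # Tabulated inverse: interpolate with the roles of x and y swapped.
    return np.interp(y, F2, h)

# Composing with the inverse recovers the identity, F2_inv(F2(x)) = x,
# which lets the first-layer output be observed through the second layer.
x = np.array([-1.5, -0.3, 0.0, 0.7, 1.2])
recovered = F2_inv(np.interp(x, h, F2))
```

With a dense enough measurement grid the interpolation error is negligible compared with the measurement noise of the physical device.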
The training of the overall network was investigated using two methods, and it was found that a modified back-propagation (BP) algorithm outperforms the standard weight perturbation method [4]. The modification is a compromise between the BP algorithm and a weight perturbation method, and exploits the fact that the derivative of a sigmoidal function can be a good approximation to the value of ΔE/Δw used in a weight perturbation algorithm [4]. The effect of the errors between the exact and the approximate derivatives can be reduced by using an adaptive learning rate [5].

Fig. 2 Identifying activation functions connected in series

ELECTRONICS LETTERS 14th May 1998 Vol. 34 No. 10

The algorithm is formulated as follows; we use w^(1) and w^(2) to represent the weights connecting the input-to-hidden and the hidden-to-output layers, respectively, and adopt the notation of [5]. For the hidden-to-output layer connections, the gradient descent rule gives the following incremental weight change:

Δw_kj^(2) = η δ_k^μ V_j^μ

where δ_k^μ = g'(h_k^μ)[t_k^μ − O_k^μ] = O_k^μ(1 − O_k^μ)(t_k^μ − O_k^μ), since we have chosen g(h) = 1/(1 + exp(−2βh)). The incremental weight changes for the input-to-hidden layer connections are given by:

Δw_jn^(1) = η δ_j^μ ξ_n^μ

where δ_j^μ = g'(h_j^μ) Σ_k w_kj^(2) δ_k^μ, with V_j^μ = F1(h_j^μ) and O_k^μ = F2(h_k^μ), where F1 and F2 represent the input-output relationships of the neurons modelled by the neural subnets.
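The update rules above, together with the adaptive learning-rate rule described next, can be sketched as follows. This is a hedged illustration, not the authors' implementation: the modelled activations F1 and F2 stand in for the trained subnets, their derivatives are taken by central difference, and all names and constants are illustrative.

```python
import numpy as np

F1 = lambda h: 1.0 / (1.0 + np.exp(-1.7 * h))   # modelled hidden activation
F2 = lambda h: 1.0 / (1.0 + np.exp(-2.2 * h))   # modelled output activation

def dnum(f, h, eps=1e-4):
    """Central-difference derivative of a modelled activation."""
    return (f(h + eps) - f(h - eps)) / (2.0 * eps)

def adapt_eta(eta, E, E_prev, eps1=0.03, eps2=0.04, rho=1.04):
    """Grow eta on improvement; shrink it when the error rises by more
    than the preset ratio rho; otherwise leave it unchanged (cf. [5])."""
    if E < E_prev:
        return (1.0 + eps1) * eta
    if E > rho * E_prev:
        return (1.0 - eps2) * eta
    return eta

rng = np.random.default_rng(1)
W1 = rng.normal(0.0, 0.5, (8, 4))   # input -> hidden weights w(1)
W2 = rng.normal(0.0, 0.5, (2, 8))   # hidden -> output weights w(2)
xi = rng.normal(0.0, 1.0, 4)        # one input pattern
t = np.array([0.2, 0.8])            # its target

eta = 0.5
for _ in range(1000):
    h1 = W1 @ xi; V = F1(h1)                 # V_j = F1(h_j)
    h2 = W2 @ V;  O = F2(h2)                 # O_k = F2(h_k)
    delta2 = dnum(F2, h2) * (t - O)          # delta_k
    delta1 = dnum(F1, h1) * (W2.T @ delta2)  # delta_j
    W2 += eta * np.outer(delta2, V)          # eta * delta_k * V_j
    W1 += eta * np.outer(delta1, xi)         # eta * delta_j * xi_n

E = float(np.sum((t - O) ** 2))
```

In the Letter, η is additionally updated after each epoch, e.g. `eta = adapt_eta(eta, E, E_prev)`; a fixed η is used in the loop above to keep the sketch short.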
The learning rate parameter η is varied according to the following rule:

η(n+1) = (1 + ε1)η(n)   if E(n) < E(n−1)
η(n+1) = (1 − ε2)η(n)   if E(n) > ρE(n−1)
η(n+1) = η(n)           otherwise

where ε1 and ε2 are small positive constants and ρ is a preset error ratio. The neural network for the intelligent sensor has been trained using the patterns given in [1]. The cost function chosen is the sum of squared errors over all the training patterns and all the outputs. The sum-squared-error goal is set at 0.02. The other parameters are chosen as follows: β = 1/2, η(0) = 0.1, ε1 = 0.03, ε2 = 0.04 and ρ = 1.04. The output from the resulting neuron model (subnet) is compared with the measured values of the activation function in Fig. 3.

Fig. 3 Modelling neuron activation function: modelled output against measured data

Speech signal prediction using feedforward neural network

W.C. Chu and N.K. Bose

The conflicting factors of performance and efficiency in a nonlinear predictor of speech signals, based on the feedforward multilayer perceptron with one hidden layer, are evaluated for various combinations of the number of input and hidden units.

Introduction: Several network architectures, such as the multilayer perceptron with two hidden layers [1] and the radial basis function network [2], have been used to predict speech signals. Here, the multilayer perceptron with one hidden layer is used to predict the speech sample s[n], which is represented by the nonlinear autoregressive (NLAR) model [3]

s[n] = φ(s[n−1], s[n−2], ..., s[n−M]) + v[n]    (1)

where v[n] is a white noise process, φ(·): R^M → R is a nonlinear mapping, and the number of past samples M will be called the prediction order. The neural network can then be trained to estimate the nonlinear transformation φ(·) by minimising the sum of squares of prediction errors.

Conclusions: A method for training a VLSI implementation of a neural network using an in-the-loop strategy has been proposed.
The method is based on obtaining an accurate neural network representation, using in-the-loop data, for each of the neurons in the physical realisation. A neural network model of the complete physical realisation is formed with embedded subnets representing the individual neurons. The resulting network was trained using a modified BP algorithm. The percentage mean-squared error between the measured and the modelled nonlinear activation functions was found to be of the order of 0.04% of the dynamic range. This high degree of neuron modelling accuracy allows the actual intelligent sensor to be satisfactorily trained in-the-loop.

Acknowledgments: This project has been made possible by financial support from the Natural Sciences and Engineering Research Council of Canada (NSERC). The authors would like to thank H. Djahanshahi for providing the measured activation function data.

Prediction with multilayer perceptron: Throughout, a multilayer perceptron with an input layer of N_I nodes, a single hidden layer with N_h neurons, and an output layer with one neuron is considered. The neuron activation function at the hidden layer is the unipolar sigmoid function ([4], p. 24). The neuron at the output layer produces the weighted sum of the hidden neuron outputs. Zero thresholds are assigned to reduce storage. Back-propagation ([4], p. 162) is applied to train the network. The network input at time index n is formed from the past N_I samples s[n−1], s[n−2], ..., s[n−N_I] to produce the desired response s[n]; the number of input nodes N_I is thus equal to the prediction order M. Each signal frame of length M+N has samples s[n], n = 0, 1, ..., M+N−1. Training exemplars are formed from the signal frame: the first exemplar consists of network inputs s[0] to s[M−1] and the desired response s[M]; the second exemplar has inputs s[1] to s[M] and the desired response s[M+1]. A total of N training exemplars are constructed in this way.
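The exemplar construction just described can be sketched as below; `make_exemplars` is an illustrative name, not from the Letter.

```python
import numpy as np

def make_exemplars(frame, M):
    """Build the N training exemplars from one frame of length M + N:
    each exemplar pairs M consecutive past samples with the next sample
    as the desired response."""
    N = len(frame) - M
    X = np.array([frame[i:i + M] for i in range(N)])   # network inputs
    d = np.array(frame[M:])                            # desired responses
    return X, d

frame = list(range(10))          # toy 'speech' frame with M + N = 10 samples
X, d = make_exemplars(frame, M=3)
# first exemplar: inputs s[0]..s[2], desired response s[3]
```

Note the ordering of samples within each input vector is a convention; the Letter presents the inputs as s[n−1], ..., s[n−M], so a reversed slice could equally be used.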
© IEE 1998    16 March 1998
Electronics Letters Online No: 19980731
J. Yang and W.C. Miller (Electrical Engineering, University of Windsor, 401 Sunset Ave., Windsor, N9B 3P4, Canada)

References
1 DJAHANSHAHI, H., AHMADI, M., JULLIEN, G.A., and MILLER, W.C.: 'A modular architecture for hybrid VLSI neural networks and its application in a smart photosensor'. Proc. 1996 IEEE Int. Conf. Neural Networks, Washington, DC, 1996, pp. 868-873
2 FRYE, R.C., RIETMAN, E.A., and WONG, C.C.: 'Back-propagation learning and nonidealities in analog neural network hardware', IEEE Trans. Neural Netw., 1991, 2, (1), pp. 110-117
3 JONES, L.K.: 'Constructive approximations for neural networks by sigmoidal functions', Proc. IEEE, 1990, 78, (10), pp. 1586-1589
4 JABRI, M., and FLOWER, B.: 'Weight perturbation: an optimal architecture and learning technique for analog VLSI feedforward and recurrent multilayer networks', IEEE Trans. Neural Netw., 1992, 3, (1), pp. 154-157
5 HERTZ, J., KROGH, A., and PALMER, R.G.: 'Introduction to the theory of neural computation' (Addison-Wesley, 1991)

One complete presentation of the set of training exemplars constitutes an epoch. After convergence to a local, but not necessarily global, minimum, a prediction error sequence e[n] can be defined by

e[n] = s[n] − φ̂(s[n−1], s[n−2], ..., s[n−M]),   n = M, ..., M+N−1    (2)

where φ̂ is the mapping realised by the trained network. The performance of a given predictor is evaluated by the prediction gain

PG = 10 log10( Σ s²[n] / Σ e²[n] ) dB

with both sums taken over n = M, ..., M+N−1. The segmental prediction gain, denoted by SPG, is the ensemble average of PG and is approximated here by the average PG over all available frames in the simulations described below.

Simulations: Two sentences (one male, one female) from the TIMIT [5] database are used. For consistency with common practice [6], the sentences are downsampled from 16 to 8 kHz by lowpass FIR filtering in MATLAB followed by 2:1 decimation.
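The prediction gain and its segmental average can be computed as follows; `prediction_gain` and `segmental_pg` are illustrative names, and the PG definition used is the standard signal-to-prediction-error power ratio in dB.

```python
import numpy as np

def prediction_gain(s, e):
    """PG in dB: signal power over prediction-error power."""
    return 10.0 * np.log10(np.sum(np.square(s)) / np.sum(np.square(e)))

def segmental_pg(frames):
    """SPG approximated as the average PG over (signal, error) frame pairs."""
    return float(np.mean([prediction_gain(s, e) for s, e in frames]))

s = np.array([1.0, -1.0, 1.0, -1.0])
e = 0.1 * s               # error power 100x smaller, so PG is about 20 dB
```

Averaging PG per frame, rather than pooling all samples, keeps quiet frames from being swamped by loud ones, which is the point of the segmental measure.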
