J. Yang and W.C. Miller
A method for in-the-loop training of a programmable neural network is proposed, whereby each of the neuron activation functions of the actual physical realisation is measured and then modelled in terms of a small neural network. The method is valid for programmable neural networks with two or fewer hidden layers of neurons.
Introduction: This Letter deals with the in-the-loop training of an intelligent sensor [1] which is based on an artificial neural network with analogue neurons, programmable digital weights and an integrated photosensitive array. When the neurons of a programmable neural network are realised in an analogue CMOS implementation, there can be a significant deviation from the desired nonlinear activation function. The variations may be related to the robustness of the neuron circuit design and/or process variations over the area of the neural network implementation. Thus, when a neural network is trained using an idealised representation of the neuron activation function, the resultant set of weights is incorrect for use with the physical implementation of the network [2]. The physical network then fails to function properly, primarily because of anomalies in the accuracy of the nonlinear activation function realisations and quantisation effects associated with the digital weights. In this Letter, we focus on modelling the physical neuron activation functions.
An identification procedure which is valid for neural networks with two or fewer hidden layers of neurons has been developed; it exploits variation of the synaptic weights so as to generate measured input-output sample pairs for the activation function of a specific neuron in the network. The resultant experimental data characterising the activation function is used to train a small neural network to model the actual physical implementation of all the neurons in the network. These small neural networks are embedded in a larger neural network that models the complete intelligent programmable sensor, as shown in Fig. 1.
Fig. 1 Modelling a physical neuron using a small neural network (subnet)
Activation function modelling: The procedure to model the neuron activation function comprises two distinct phases: (i) experimentally determining the input-output values associated with the activation function over the range of interest; and (ii) modelling the activation function input-output data in terms of a neural network [3].
First, consider measuring the neuron activation functions for a neural network with one hidden layer. To obtain the input-output characteristics of one neuron, we need to isolate a specific neuron in the network. This can be done, with reference to Fig. 1 and Fig. 2, by setting one of the weights in w^(1) (the weight connecting the input node to the neuron being modelled) and one in w^(2) (the weight connecting the neuron being modelled to the output node) to unity, while setting all others to zero. A ramp voltage is then applied to the said input node and the resultant output from the neuron is measured. The activation function is thus characterised by a number of input-output pairs of sampled data over the range of interest. A neural network with one hidden layer is then trained using this data to model, or reproduce, the neuron activation function.
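The isolation step can be sketched in simulation. Everything numerical below is an assumption of the sketch, not a value from the Letter: the distorted sigmoid standing in for an analogue CMOS neuron, the network sizes, the ramp range, and the linear read-out at the output node.

```python
import numpy as np

# Hypothetical distorted activation standing in for an analogue CMOS neuron;
# the 0.9 gain and 0.02*h leakage term are illustrative process deviations.
def hardware_activation(h):
    return 0.9 / (1.0 + np.exp(-2.0 * 0.5 * h)) + 0.02 * h

def measure_neuron(n_inputs=4, n_hidden=3, ramp=np.linspace(-5.0, 5.0, 101)):
    """Isolate hidden neuron 0 by routing input 0 -> neuron 0 -> output with
    unity weights, zeroing every other synapse, then sweeping a ramp."""
    W1 = np.zeros((n_hidden, n_inputs))
    W2 = np.zeros((1, n_hidden))
    W1[0, 0] = 1.0                        # input node 0 feeds the neuron under test
    W2[0, 0] = 1.0                        # neuron under test feeds the output node
    samples = []
    for v in ramp:
        x = np.zeros(n_inputs)
        x[0] = v                          # ramp voltage on the chosen input
        hidden = hardware_activation(W1 @ x)
        y = float((W2 @ hidden)[0])       # linear read-out of the isolated neuron
        samples.append((v, y))
    return np.array(samples)              # (input, output) pairs over the range

data = measure_neuron()
```

Because every other synapse is zero, each recorded pair (v, y) directly samples the activation function of the isolated neuron.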
This method can be extended to measuring the neuron activation functions for a neural network with two hidden layers. Once again, we need to be able to isolate a specific neuron in one of the two hidden layers in the network. This can be done, with reference to Fig. 2, by setting the input node value and the incident weight connected to a neuron in the first layer, with activation function F1, such that the neuron's output is driven to its maximum saturated value. This saturated output can then be used to form a test input for the neuron in the second hidden layer, with activation function F2, whose activation function is to be measured. The weight between the two selected hidden neurons is varied so as to produce a ramp input for the neuron in the second hidden layer. In this manner, the resultant output from the neuron can be measured and used to determine a number of input-output pairs of sampled data over the range of interest. Once the nonlinear activation function for a neuron in the second hidden layer, F2, is known, an inverse function, F2^-1, can be found. The fact that the composition of the activation function with its inverse is the identity, i.e. F2^-1(F2(x)) = x, is now exploited, so as to allow us to observe the output of the cascaded neuron in the first hidden layer. The identification of the neuron function, F1, then proceeds in the manner already described for the single hidden layer case.
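The inverse-function step can be sketched numerically. The particular F1 and F2 below are hypothetical monotonic activations; the inverse is built by interpolating the sampled characterisation of F2, which is all the measurement procedure provides.

```python
import numpy as np

# Hypothetical monotonic activations for the two cascaded hidden neurons.
F1 = lambda h: 1.0 / (1.0 + np.exp(-h))          # neuron in the first hidden layer
F2 = lambda h: 5.0 / (1.0 + np.exp(-0.8 * h))    # neuron in the second hidden layer

# Step 1: F2 has already been characterised by sampled (input, output) pairs.
h2 = np.linspace(-6.0, 6.0, 241)
F2_samples = F2(h2)

# Numeric inverse: because F2 is monotonic, swap the roles of x and y in interp.
def F2_inv(y):
    return np.interp(y, F2_samples, h2)

# Step 2: only the cascade F2(F1(h)) is observable; undo F2 to recover F1.
h1 = np.linspace(-5.0, 5.0, 101)
observed = F2(F1(h1))
F1_recovered = F2_inv(observed)

err = float(np.max(np.abs(F1_recovered - F1(h1))))
```

With a sufficiently fine ramp on F2, the interpolated inverse recovers F1 to well within measurement accuracy.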
Experimental results: The input-output pairs of sampled data that characterise the activation function are used to train a neural network with one input node, one hidden layer with 25 neurons, and an output layer with one node. The required synaptic weights are determined using the standard back-propagation algorithm. The training takes from 300 to 1000 iterations, depending on the initial values of the weights, and converges with a mean-squared-error value of 0.002 for a function that ranges from 0 to 5.
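A minimal sketch of this subnet fit, assuming a 1-25-1 network trained by plain batch back-propagation on synthetic samples of a 0-5 range sigmoid that stands in for the measured data; the learning rate and iteration count here are illustrative, not the Letter's settings.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the measured activation samples over a 0-5 output range.
x = np.linspace(-5.0, 5.0, 101).reshape(-1, 1)
t = 5.0 / (1.0 + np.exp(-x))

# 1-25-1 network: one input, 25 sigmoidal hidden units, one linear output.
H = 25
W1 = rng.normal(0.0, 1.0, (1, H)); b1 = np.zeros(H)
W2 = rng.normal(0.0, 0.1, (H, 1)); b2 = np.zeros(1)

sig = lambda z: 1.0 / (1.0 + np.exp(-z))
eta = 0.05
for _ in range(5000):
    v = sig(x @ W1 + b1)                  # hidden activations
    y = v @ W2 + b2                       # network output
    e = y - t
    # Standard back-propagation, batch gradient descent on mean squared error.
    dW2 = v.T @ e / len(x); db2 = e.mean(0)
    dh = (e @ W2.T) * v * (1.0 - v)
    dW1 = x.T @ dh / len(x); db1 = dh.mean(0)
    W2 -= eta * dW2; b2 -= eta * db2
    W1 -= eta * dW1; b1 -= eta * db1

mse = float(np.mean((sig(x @ W1 + b1) @ W2 + b2 - t) ** 2))
```

The small subnet easily has enough capacity for a single smooth activation curve, which is why the Letter's reported residual error can be so low.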
Once the subneural networks are known for each of the physical neurons in the sensor, the overall training of the complete network can begin. The complete neural network for the intelligent sensor consists of an input layer with 64 input nodes, one hidden layer with eight neurons, and an output layer with four neurons. All of the neurons are modelled by the subnetworks previously described. The training of the overall network was investigated using two methods, and it was found that a modified back-propagation (BP) algorithm outperforms the standard weight perturbation method [4]. The modification is a compromise between the BP algorithm and a weight perturbation method; it exploits the fact that the derivative of a sigmoidal function can be a good approximation to the value of ΔE/Δw used in a weight perturbation algorithm [4]. The effect of the errors between the exact and the approximate derivatives can be reduced by using an adaptive learning rate [5]. The algorithm is formulated as follows: we use w^(1) and w^(2) to represent the weights connecting the input-to-hidden and the hidden-to-output layers, respectively; the notation otherwise follows [5]. For the hidden-to-output layer connections, the gradient descent rule gives the following incremental weight changes:
Fig. 2 Identifying activation functions connected in series

\Delta w_{ki}^{(2)} = \eta\, \delta_k^{\mu} V_i^{\mu}

where \delta_k^{\mu} = g'(h_k^{\mu})\,[t_k^{\mu} - O_k^{\mu}] = 2\beta\, O_k^{\mu}(1 - O_k^{\mu})(t_k^{\mu} - O_k^{\mu}), since we have chosen g(h) = 1/(1 + \exp(-2\beta h)). The incremental weight changes
for the input-to-hidden layer connections are given by:

\Delta w_{ij}^{(1)} = \eta\, \delta_i^{\mu} \xi_j^{\mu}

where \delta_i^{\mu} = g'(h_i^{\mu}) \sum_k w_{ki}^{(2)} \delta_k^{\mu}, with V_i^{\mu} = F_1(h_i^{\mu}) and O_k^{\mu} = F_2(h_k^{\mu}), where F_1 and F_2 represent the input-output relationships of the neurons modelled by the neural subnets. The learning rate parameter
\eta is varied according to the following rule:

\eta(n+1) = (1 + \varepsilon_u)\,\eta(n), if E(n) < E(n-1)
\eta(n+1) = (1 - \varepsilon_d)\,\eta(n), if E(n) > \rho\, E(n-1)
\eta(n+1) = \eta(n), otherwise

where \varepsilon_u and \varepsilon_d are small positive constants, and \rho is a preset error ratio.
The neural network for the intelligent sensor has been trained using the patterns given in [1]. The cost function chosen is the summation of squared errors over all the training patterns and all the outputs. The sum-squared error goal is set at 0.02. The other parameters are chosen as follows: β = 1/2, η(0) = 0.1, ε_u = 0.03, ε_d = 0.04 and ρ = 1.04. The output from the resulting neuron model (subnet) is compared with the measured values of the activation function in Fig. 3.
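The BP update with g(h) = 1/(1 + exp(-2βh)) and an adaptive learning rate can be sketched as follows. The tiny 4-3-2 network, the teacher-generated targets, the exact adaptation rule, and the cap on η are all assumptions of this sketch, not details confirmed by the Letter.

```python
import numpy as np

rng = np.random.default_rng(1)
beta = 0.5
g = lambda h: 1.0 / (1.0 + np.exp(-2.0 * beta * h))   # activation as chosen in the text

# Tiny 4-3-2 stand-in for the 64-8-4 sensor network; targets come from a
# fixed "teacher" network so that the task is learnable.
X = rng.uniform(-1.0, 1.0, (20, 4))
T = g(g(X @ rng.normal(0.0, 1.0, (4, 3))) @ rng.normal(0.0, 1.0, (3, 2)))
W1 = rng.normal(0.0, 0.5, (4, 3))
W2 = rng.normal(0.0, 0.5, (3, 2))

eta, eps_u, eps_d, rho = 0.1, 0.03, 0.04, 1.04        # constants quoted in the text
prev_E = np.inf
errors = []
for epoch in range(300):
    V = g(X @ W1)                                     # hidden outputs, F1
    O = g(V @ W2)                                     # network outputs, F2
    E = float(np.sum((T - O) ** 2))                   # sum-squared error
    errors.append(E)
    # Adaptive learning rate: grow eta while the error falls, shrink it when
    # the error grows by more than the preset ratio rho (rule reconstructed).
    if E < prev_E:
        eta = min(eta * (1.0 + eps_u), 1.0)           # cap added only for safety
    elif E > rho * prev_E:
        eta *= (1.0 - eps_d)
    prev_E = E
    # Back-propagation deltas with g'(h) = 2*beta*g(h)*(1 - g(h)).
    d2 = (O - T) * 2.0 * beta * O * (1.0 - O)
    d1 = (d2 @ W2.T) * 2.0 * beta * V * (1.0 - V)
    W2 -= eta * V.T @ d2 / len(X)
    W1 -= eta * X.T @ d1 / len(X)
```

In the full method, g at each layer would be replaced by the trained subnet for the corresponding physical neuron and its derivative.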
Speech signal prediction using feedforward
neural network
W.C. Chu and N.K. Bose
The conflicting factors of performance and efficiency in a nonlinear predictor of speech signals, based on a feedforward multilayer perceptron with one hidden layer, are evaluated for various combinations of the number of input and hidden units.
Introduction: Several network architectures, such as the multilayer perceptron with two hidden layers [1] and the radial basis function network [2], have been used to predict speech signals. Here, the multilayer perceptron with one hidden layer is used to predict the speech sample s[n], which is represented by the nonlinear autoregressive (NLAR) model [3]

s[n] = \varphi(s[n-1], s[n-2], \ldots, s[n-M]) + v[n]   (1)

where v[n] is a white noise process, \varphi(\cdot): R^M \to R is a nonlinear mapping, and the number of past samples M will be called the prediction order. The neural network can then be trained to estimate the nonlinear transformation \varphi(\cdot) by minimising the sum of squares of prediction errors.
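As a concrete (hypothetical) instance of the NLAR model, a signal can be synthesised from a known nonlinear map φ plus white noise; the particular φ, the noise level, and the prediction order below are illustrative choices only.

```python
import numpy as np

rng = np.random.default_rng(2)

M = 2                                       # prediction order: two past samples

# Hypothetical nonlinear map standing in for the unknown speech dynamics.
def phi(past):
    return 0.7 * np.tanh(1.5 * past[0]) - 0.3 * past[1]

# s[n] = phi(s[n-1], ..., s[n-M]) + v[n], with v[n] a white noise process.
N_total = 500
v = 0.05 * rng.standard_normal(N_total)
s = np.zeros(N_total)
for n in range(M, N_total):
    past = s[n - M:n][::-1]                 # (s[n-1], s[n-2])
    s[n] = phi(past) + v[n]
```

A predictor trained on such data would try to recover φ from the (past samples, next sample) pairs alone.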
Fig. 3 Modelling the neuron activation function (modelled output vs. measured data)
Conclusions: A method for training a VLSI implementation of a
neural network using an in-the-loop strategy has been proposed.
The method is based on obtaining an accurate neural network representation, using in-the-loop data, for each of the neurons in the
physical realisation. A neural network model of the complete
physical realisation is formed with embedded subnets representing
individual neurons. The resulting network was trained using a
modified BP algorithm. The percentage mean-squared error between the measured and the modelled nonlinear activation functions was found to be of the order of 0.04% of the dynamic range.
This high degree of neuron modelling accuracy allows the actual
intelligent sensor to be satisfactorily trained in-the-loop.
Acknowledgments: This project has been made possible by financial support from the Natural Sciences and Engineering Research Council of Canada (NSERC). The authors would like to thank H. Djahanshahi for providing the measured activation function data.
Prediction with multilayer perceptron: Throughout, a multilayer perceptron with an input layer of N_I nodes, a single hidden layer with N_h neurons, and an output layer with one neuron is considered. The neuron activation function at the hidden layer is the unipolar sigmoid function ([4], p. 24). The neuron at the output layer produces the weighted sum of the hidden neuron outputs. Zero threshold is assigned to reduce storage. Backpropagation ([4], p. 162) is applied to train the network. The network input at time index n is formed from the past N_I samples s[n-1], s[n-2], ..., s[n-N_I] to produce the desired response s[n]. The number of input nodes N_I is thus equal to the prediction order M. Each signal frame of length M+N has samples s[n], n = 0, 1, ..., M+N-1. Training exemplars are formed from the signal frame. The first exemplar consists of network inputs s[0] to s[M-1] and the desired response s[M]; the second exemplar has inputs s[1] to s[M] and the desired response s[M+1]. A total of N training exemplars are constructed in this way. One complete presentation of the set
of training exemplars constitutes an epoch. After convergence to a local, but not necessarily global, minimum, a prediction error sequence e[n] can be defined by:

e[n] = s[n] - p(s[n-1], s[n-2], \ldots, s[n-M]),  n = M, \ldots, M+N-1   (2)
Performance of a given predictor is evaluated by the prediction gain, defined for a frame as PG = 10 log_{10}( \sum_n s^2[n] / \sum_n e^2[n] ) dB.
© IEE 1998
16 March 1998
Electronics Letters Online No: 19980731
J. Yang and W.C. Miller (Electrical Engineering, University of Windsor, 401 Sunset Ave., Windsor, N9B 3P4, Canada)
References
1 'Modular architecture for hybrid VLSI neural networks and its application in a smart photosensor'. Proc. 1996 IEEE Int. Conf. Neural Networks, Washington DC, 1996, pp. 868-873
2 FRYE, R.C., RIETMAN, E.A., and WONG, C.C.: 'Back-propagation learning and nonidealities in analog neural network hardware', IEEE Trans. Neural Netw., 1991, 2, (1), pp. 110-117
3 'Constructive approximations for neural networks by sigmoidal functions', Proc. IEEE, 1990, 78, (10), pp. 1586-1589
4 JABRI, M., and FLOWER, B.: 'Weight perturbation: an optimal architecture and learning technique for analog VLSI feedforward and recurrent multilayer networks', IEEE Trans. Neural Netw., 1992, 3, (1), pp. 154-157
5 HERTZ, J., KROGH, A., and PALMER, R.G.: 'Introduction to the theory of neural computation' (Addison-Wesley Publishing Company, 1991)
ELECTRONICS LETTERS 14th May 1998 Vol. 34 No. 10
The segmental prediction gain, denoted by SPG, is the ensemble average of PG and is approximated here by the average PG over all available frames in the simulations described below.
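A sketch of the PG/SPG computation, assuming the per-frame prediction gain is the signal-to-error energy ratio in decibels; toy frames with a known error scale are used here so the expected gain is exact.

```python
import numpy as np

rng = np.random.default_rng(4)

def prediction_gain_db(s, e):
    """PG = 10*log10(signal energy / prediction-error energy) for one frame."""
    return 10.0 * np.log10(np.sum(s ** 2) / np.sum(e ** 2))

# Toy frames: each error sequence is the signal scaled by 0.1, so every
# frame's PG is 10*log10(1/0.01) = 20 dB by construction.
frames = [rng.standard_normal(64) for _ in range(5)]
pgs = [prediction_gain_db(s, 0.1 * s) for s in frames]
spg = float(np.mean(pgs))                  # segmental PG: average over frames
```

Averaging per-frame gains, rather than pooling all samples, keeps quiet frames from being dominated by loud ones.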
Simulations: Two sentences (one male, one female) from the TIMIT [5] database are used. For consistency with common practice [6], the sentences are downsampled from 16 to 8 kHz by lowpass FIR filtering in MATLAB followed by 2:1 decimation.
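The downsampling step can be sketched without MATLAB using a windowed-sinc lowpass FIR followed by 2:1 decimation; the tap count, the Hamming window, and the white-noise stand-in for speech are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(5)
x16 = rng.standard_normal(1024)            # stand-in for 16 kHz speech samples

# Windowed-sinc lowpass FIR with cutoff 0.25 of the 16 kHz sample rate
# (i.e. 4 kHz), the new Nyquist frequency after 2:1 decimation.
ntaps = 63
n = np.arange(ntaps) - (ntaps - 1) / 2
h = 0.5 * np.sinc(0.5 * n)                 # ideal lowpass, cutoff 0.25*fs
h *= np.hamming(ntaps)                     # window to control ripple
h /= h.sum()                               # unity gain at DC

filtered = np.convolve(x16, h, mode="same")
x8 = filtered[::2]                         # keep every other sample -> 8 kHz
```

Filtering before decimation removes energy above the new Nyquist frequency, preventing aliasing in the 8 kHz signal.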