Neural Networks - lecture 5 1
Multi-layer neural networks
• Motivation
• Choosing the architecture
• Functioning. The FORWARD algorithm
• Neural networks as universal approximators
• The Backpropagation algorithm
Neural Networks - lecture 5 2
Multi-layer neural networks: Motivation

One-layer neural networks have a limited approximation capacity. Example: the XOR (parity) function cannot be represented using one layer, but it can be represented using two layers.

(Figure: two different architectures that solve the same problem.)
Neural Networks - lecture 5 3
Choosing the architecture
Let us consider an association problem (classification or approximation) characterized by:
• N input data
• M output data

The neural network will have:
• N input units
• M output units
• How many hidden units? A difficult problem.

Heuristic hint: use as few hidden units as possible!
Example: one hidden layer with √(N·M) units.
Neural Networks - lecture 5 4
Architecture and notations

Feedforward network with K layers: layer 0 is the input layer (it only transmits the input signal, Y_0 = X_0), layers 1, ..., K-1 are the hidden layers and layer K is the output layer. Each layer k = 1, ..., K has a weight matrix W_k, an aggregated input vector X_k, an activation function F_k and an output vector Y_k.

(Figure: the chain of layers 0, 1, ..., k, ..., K connected by the weight matrices W_1, W_2, ..., W_k, W_{k+1}, ..., W_K.)
Neural Networks - lecture 5 5
Functioning: computation of the output vector

Y_k = F_k(X_k) = F_k(W_k · Y_{k-1}),  k = 1, ..., K

Y_K = F_K(W_K · F_{K-1}( ... F_1(W_1 · X_0) ... ))
FORWARD Algorithm (propagation of the input signal toward the output layer)
Y[0] := X (X is the input signal)
FOR k := 1, K DO
  X[k] := W[k] · Y[k-1]
  Y[k] := F_k(X[k])
ENDFOR
Rmk: Y[K] is the output of the network
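A minimal Python sketch of the FORWARD algorithm (an illustration, not part of the slides; the use of numpy, the function names and the choice of a sigmoid activation for every layer are assumptions):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(weights, x):
    # weights = [W1, ..., WK]; W[k] maps the output of layer k-1 to layer k
    # returns [Y0, Y1, ..., YK]; the last element is the output of the network
    outputs = [x]                     # Y[0] = X (the input signal)
    for W in weights:                 # k = 1, ..., K
        xk = W @ outputs[-1]          # X[k] = W[k] Y[k-1]
        outputs.append(sigmoid(xk))   # Y[k] = F_k(X[k]), here F_k = sigmoid
    return outputs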
Neural Networks - lecture 5 6
A particular case
One hidden layer
Adaptive parameters: W1, W2
y_i = f2( sum_{k=0}^{N1} w2_{ik} · f1( sum_{j=0}^{N0} w1_{kj} · x_j ) )

(N0 = number of input units, N1 = number of hidden units; index 0 corresponds to the bias terms.)
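A short Python sketch of this particular case (illustrative only; the sigmoid hidden activation, the linear output activation and the explicit bias units are assumptions):

import numpy as np

def one_hidden_layer_output(W1, W2, x):
    # W1: (N1 x (N0+1)) hidden-layer weights, W2: (N2 x (N1+1)) output-layer weights;
    # column 0 of each matrix holds the bias weights, paired with a constant unit input
    x = np.concatenate(([1.0], x))               # x_0 = 1 (bias input)
    hidden = 1.0 / (1.0 + np.exp(-(W1 @ x)))     # f1 = sigmoid
    hidden = np.concatenate(([1.0], hidden))     # bias unit of the hidden layer
    return W2 @ hidden                           # f2 = identity (linear output units)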
Neural Networks - lecture 5 7
Neural networks – universal approximators
Theoretical result:
Any continuous function T: D ⊆ R^N -> R^M can be approximated with arbitrary accuracy by a neural network with the following architecture:
• N input units
• M output units
• "enough" hidden units with monotonically increasing and bounded activation functions (e.g. sigmoidal functions)

The accuracy of the approximation depends on the number of hidden units.
Neural Networks - lecture 5 8
Neural networks – universal approximators
Typical problems when solving approximation tasks using neural networks:

• Representation problem: "can the network represent the desired function?"
  – See the previous result.
• Learning problem: "is it possible to find values of the adaptive parameters such that the desired function is approximated with the desired accuracy?"
  – A training set and a learning algorithm are needed.
• Generalization problem: "is the neural network able to extrapolate the knowledge extracted from the training set?"
  – The training process should be carefully controlled in order to avoid overtraining and to enhance the generalization ability.
Neural Networks - lecture 5 9
Neural networks – universal approximators
Applications which can be interpreted as association (approximation) problems:

• Classification problems (association between a pattern and a class label)
  – Architecture: input size = pattern size; output size = number of classes; hidden layer size = depends on the problem
• Prediction problems (estimate the next value of a time series from a set of previous values)
  – Architecture: input size = number of previous values (predictors); output size = 1 (one-dimensional prediction); hidden layer size = depends on the problem
  – Example: y(t) = T(y(t-1), y(t-2), ..., y(t-N)) (see the sketch below for how such a training set can be built)
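A small Python sketch of how the training set for such a prediction problem can be built (illustrative only; the function name and the use of numpy are assumptions):

import numpy as np

def make_prediction_training_set(series, N):
    # series: one-dimensional time series; N: number of previous values used as predictors
    # returns pairs (x_l, d_l) with x_l = (y(t-1), ..., y(t-N)) and d_l = y(t)
    inputs, targets = [], []
    for t in range(N, len(series)):
        inputs.append(list(series[t - N:t])[::-1])   # y(t-1), ..., y(t-N)
        targets.append(series[t])                    # the value to be predicted
    return np.array(inputs), np.array(targets)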
Neural Networks - lecture 5 10
Neural networks – universal approximators
• Compression problems (compress and decompress vectorial data)

(Figure: input data -> compressed data (hidden layer) -> output data, with weight matrices W1 and W2.)

• Input size = output size
• Hidden layer size = input size × compression ratio
  Example: for a compression ratio of 1:2, the hidden layer has half the size of the input layer.
• Training set: {(X1, X1), ..., (XL, XL)} (each example is associated with itself)
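A tiny sketch of how the layer sizes follow from the compression ratio (illustrative only; the function name is an assumption):

def compression_layer_sizes(input_size, ratio):
    # ratio = hidden size / input size, e.g. 0.5 for a 1:2 compression
    hidden_size = int(input_size * ratio)   # the hidden layer stores the compressed data
    output_size = input_size                # the network reconstructs its own input
    return input_size, hidden_size, output_size

# Example: compression_layer_sizes(8, 0.5) -> (8, 4, 8)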
Neural Networks - lecture 5 11
Learning process

Learning is based on minimizing an error function.
• Training set: {(x1, d1), ..., (xL, dL)}
• Error function (one hidden layer):

  E(W) = 1/(2L) · sum_{l=1}^{L} sum_{i=1}^{N2} ( d_i^l - f2( sum_{k=0}^{N1} w2_{ik} · f1( sum_{j=0}^{N0} w1_{kj} · x_j^l ) ) )^2

• Aim of the learning process: find W which minimizes the error function
• Minimization method: the gradient method
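A Python sketch of this error function (illustrative only; the sigmoid activations and the omission of bias units are assumptions):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def error(W1, W2, X, D):
    # X: (L x N0) array of inputs, D: (L x N2) array of desired outputs
    total = 0.0
    for x, d in zip(X, D):
        y = sigmoid(W2 @ sigmoid(W1 @ x))    # output of the one-hidden-layer network
        total += np.sum((d - y) ** 2)        # squared error for example l
    return total / (2 * len(X))              # E(W) = 1/(2L) * sum over the training set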
Neural Networks - lecture 5 12
Learning process
Gradient-based adjustment:

  w_ij(t+1) = w_ij(t) - η · ∂E(W)/∂w_ij   (η = learning rate)

  E(W) = 1/(2L) · sum_{l=1}^{L} sum_{i=1}^{N2} ( d_i^l - f2( sum_{k=0}^{N1} w2_{ik} · f1( sum_{j=0}^{N0} w1_{kj} · x_j^l ) ) )^2

(Figure: a network fragment showing a hidden unit with values x_k, y_k and an output unit with values x_i, y_i, as used in computing the per-example error E_l(W).)
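A generic sketch of this update rule (illustrative only; the learning rate value and the finite-difference gradient, used here just for checking, are assumptions; in practice the derivatives are computed analytically, as on the next slide):

import numpy as np

def gradient_step(w, grad, eta=0.1):
    # w(t+1) = w(t) - eta * dE/dw, applied componentwise to an array of weights
    return w - eta * grad

def numerical_gradient(E, w, h=1e-6):
    # finite-difference approximation of dE/dw_ij (useful for checking analytical gradients)
    grad = np.zeros_like(w)
    for idx in np.ndindex(w.shape):
        w_plus, w_minus = w.copy(), w.copy()
        w_plus[idx] += h
        w_minus[idx] -= h
        grad[idx] = (E(w_plus) - E(w_minus)) / (2 * h)
    return grad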
Neural Networks - lecture 5 13
Learning process: partial derivatives computation

  E(W) = 1/(2L) · sum_{l=1}^{L} sum_{i=1}^{N2} ( d_i^l - f2( sum_{k=0}^{N1} w2_{ik} · f1( sum_{j=0}^{N0} w1_{kj} · x_j^l ) ) )^2

For a single training example (x^l, d^l), denoting by E_l(W) the error on that example, by x_k, y_k the aggregated input and output of hidden unit k, and by x_i, y_i those of output unit i:

  ∂E_l(W)/∂w2_{ik} = (y_i - d_i^l) · f2'(x_i) · y_k

  ∂E_l(W)/∂w1_{kj} = ( sum_{i=1}^{N2} (y_i - d_i^l) · f2'(x_i) · w2_{ik} ) · f1'(x_k) · x_j^l
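A Python sketch of these derivatives for one training example (illustrative only; sigmoid activations for both layers and the absence of bias units are assumptions):

import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gradients_one_example(W1, W2, x, d):
    # forward pass (one hidden layer, f1 = f2 = sigmoid)
    yk = sigmoid(W1 @ x)                         # hidden layer outputs y_k
    yi = sigmoid(W2 @ yk)                        # output layer outputs y_i
    # backward pass: for the sigmoid, f'(x) = y * (1 - y)
    delta2 = (yi - d) * yi * (1.0 - yi)          # (y_i - d_i) * f2'(x_i)
    delta1 = (W2.T @ delta2) * yk * (1.0 - yk)   # (sum_i delta2_i * w2_ik) * f1'(x_k)
    # dE_l/dw1_kj = delta1_k * x_j,  dE_l/dw2_ik = delta2_i * y_k
    return np.outer(delta1, x), np.outer(delta2, yk)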
Neural Networks - lecture 5 14
The Backpropagation Algorithm
Main idea. For each example in the training set:

- compute the output signal
- compute the error corresponding to the output level
- propagate the error back into the network and store the corresponding delta values for each layer
- adjust each weight by using the error signal and the input signal of its layer

(Figure: computation of the output signal (FORWARD) and of the error signal (BACKWARD).)
Neural Networks - lecture 5 15
The Backpropagation Algorithm
General structure:
Random initialization of weights
REPEAT
FOR l=1,L DO
FORWARD stage
BACKWARD stage
weights adjustement
ENDFOR
Error (re)computation
UNTIL <stopping condition>
Rmk:
• The weights adjustment depends on the learning rate.
• The error computation requires recomputing the output signal for the new values of the weights.
• The stopping condition depends on the value of the error and on the number of epochs.
• This is the so-called serial (incremental) variant: the adjustment is applied separately for each example from the training set.
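A Python sketch of the serial (incremental) variant (illustrative only; it reuses the gradients_one_example and error sketches given earlier, and the learning rate, stopping threshold and maximum number of epochs are assumptions):

def train_incremental(W1, W2, X, D, eta=0.1, max_epochs=1000, target_error=1e-3):
    for epoch in range(max_epochs):                          # REPEAT
        for x, d in zip(X, D):                               #   FOR l = 1, L DO
            gW1, gW2 = gradients_one_example(W1, W2, x, d)   #     FORWARD + BACKWARD stages
            W1 -= eta * gW1                                  #     weights adjustment
            W2 -= eta * gW2
        if error(W1, W2, X, D) < target_error:               #   error (re)computation
            break                                            # UNTIL <stopping condition>
    return W1, W2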
Neural Networks - lecture 5 16
The Backpropagation Algorithm
Batch variant:
Random initialization of weights
REPEAT
initialize the variables which will contain the adjustments
FOR l=1,L DO
FORWARD stage
BACKWARD stage
accumulate the adjustments
ENDFOR
Apply the accumulated adjustments
Error (re)computation
UNTIL <stopping condition>
Rmk:
• The incremental variant can be sensitive to the presentation order of the training examples.
• The batch variant is not sensitive to this order and is more robust to errors in the training examples.
• It is the starting point for more elaborate variants, e.g. the momentum variant.
Neural Networks - lecture 5 17
Problems of Backpropagation
• Low convergence rate (the error decreases too slowly)
• Oscillations (the error value oscillates instead of decreasing continuously)
• Local minima (the learning process gets stuck in a local minimum of the error function)
• Stagnation (the learning process stagnates even if it is not in a local minimum)
• Overtraining and limited generalization
Neural Networks - lecture 5 18
Generalization capacity
The generalization capacity of a neural network depends on:
• The network architecture (e.g. the number of hidden units)
  – A large number of hidden units can lead to overtraining (the network extracts not only the useful knowledge but also the noise in the data)
• The size of the training set
  – Too few examples are not enough to train the network
• The number of epochs (accuracy on the training set)
  – Too many epochs can lead to overtraining (a simple validation-based stopping rule is sketched below)