Neural Networks NN 1 1
Neural networks (thanks to: www.cs.vu.nl/~elena/slides)
Basics of neural network theory and practice for supervised and unsupervised learning.
Most popular Neural Network models:
• architectures
• learning algorithms
• applications
Neural Networks
• A NN is a machine learning approach inspired by the way in which the brain performs a particular learning task:
– Knowledge about the learning task is given in the form of examples.
– Inter-neuron connection strengths (weights) are used to store the acquired information (the training examples).
– During the learning process the weights are modified in order to model the particular learning task correctly on the training examples.
Learning
• Supervised Learning
– Recognizing hand-written digits, pattern recognition, regression.
– Labeled examples (input, desired output)
– Neural Network models: perceptron, feed-forward, radial basis function, support vector machine.
• Unsupervised Learning
– Find similar groups of documents in the web, content-addressable memory, clustering.
– Unlabeled examples (different realizations of the input alone)
– Neural Network models: self-organizing maps, Hopfield networks.
Neurons
Network architectures
• Three different classes of network architectures:
– single-layer feed-forward (neurons organized in acyclic layers)
– multi-layer feed-forward (neurons organized in acyclic layers)
– recurrent
• The architecture of a neural network is linked with the learning algorithm used to train it.
Single Layer Feed-forward
[Figure: an input layer of source nodes projecting onto an output layer of neurons]
Multi layer feed-forward
[Figure: a 3-4-2 network, with an input layer, a hidden layer, and an output layer]
Recurrent network
• Recurrent network with hidden neuron(s): the unit-delay operator z^-1 implies a dynamic system.
[Figure: input, hidden, and output units connected through unit-delay (z^-1) feedback loops]
The Neuron
• The neuron is the basic information processing unit of a NN. It consists of:
1 A set of synapses or connecting links, each link characterized by a weight: w1, w2, …, wm
2 An adder function (linear combiner) which computes the weighted sum of the inputs:
u = ∑_{j=1}^{m} wj xj
3 Activation function φ (squashing function) for limiting the amplitude of the output of the neuron:
y = φ(u + b)
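The three components above can be sketched in a few lines of Python. A minimal illustration, with tanh as one possible squashing function (the function name and the choice of tanh are ours, not from the slides):

```python
import math

def neuron_output(x, w, b):
    """y = phi(u + b), where u = sum over j of w_j * x_j."""
    u = sum(wj * xj for wj, xj in zip(w, x))  # adder (linear combiner)
    return math.tanh(u + b)                   # squashing activation phi
```

For example, `neuron_output([1.0, 2.0], [0.5, -0.25], 0.0)` has weighted sum 0.5 − 0.5 = 0, so the output is tanh(0) = 0.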
The Neuron
[Figure: input signals x1, x2, …, xm are multiplied by synaptic weights w1, w2, …, wm and fed to a summing function ∑; together with the bias b this yields the local field v, which the activation function φ(·) maps to the output y]
Bias of a Neuron
• Bias b has the effect of applying an affine transformation to u:
v = u + b, where u = ∑_{j=1}^{m} wj xj
• v is the induced field of the neuron
Bias as extra input
• Bias is an external parameter of the neuron. It can be modeled by adding an extra input:
v = ∑_{j=0}^{m} wj xj, with x0 = +1 and w0 = b
[Figure: the same neuron diagram as before, extended with the fixed input x0 = +1 whose weight w0 carries the bias]
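The extra-input trick is easy to express in code; a small sketch (function names are illustrative):

```python
def augment(x):
    """Prepend the fixed extra input x0 = +1."""
    return [1.0] + list(x)

def induced_field(x, w):
    """v = sum_{j=0..m} w_j x_j, where w = [b, w1, ..., wm]."""
    return sum(wj * xj for wj, xj in zip(w, augment(x)))
```

With w = [0.5, 1.0, -1.0] (bias 0.5) and x = [2.0, 3.0], this gives v = 0.5 + 2.0 − 3.0 = −0.5.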
Dimensions of a Neural Network
• Various types of neurons
• Various network architectures
• Various learning algorithms
• Various applications
Face Recognition
90% accuracy in learning head pose and in recognizing 1-of-20 faces
Handwritten digit recognition
Learning in NN
• Hebb 1949: learning by modifying connections
• Widrow & Hoff 1960: learning by comparing with a target
Architecture
• We consider the following architecture: a feed-forward NN with one layer
• It is sufficient to study single-layer perceptrons with just one neuron:
Single layer perceptrons
• Generalization to single-layer perceptrons with more neurons is easy because:
• The output units are independent of each other
• Each weight only affects one of the outputs
Perceptron: Neuron Model
• Uses a non-linear (McCulloch-Pitts) model of the neuron:
[Figure: inputs x1, x2, …, xn with weights w1, w2, …, wn and bias b produce the local field v, which is passed through φ to give the output y]
• φ is the sign function:
φ(v) = +1 if v ≥ 0
       −1 if v < 0
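In code, the hard limiter and the resulting perceptron output look like this (a sketch; the names are ours):

```python
def sign(v):
    """Hard-limiter activation: +1 if v >= 0, else -1."""
    return 1 if v >= 0 else -1

def perceptron_output(x, w, b):
    v = sum(wj * xj for wj, xj in zip(w, x)) + b  # local field
    return sign(v)
```

For instance, with w = [1, 1] and b = −1.5 the perceptron computes the Boolean AND of two inputs in {0, 1}.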
Perceptron: Applications
• The perceptron is used for classification: classify a set of examples correctly into one of the two classes C1, C2:
If the output of the perceptron is +1, then the input is assigned to class C1
If the output is −1, then the input is assigned to C2
Perceptron: Classification
• The equation below describes a hyperplane in the input space. This hyperplane is used to separate the two classes C1 and C2:
∑_{i=1}^{m} wi xi + b = 0
[Figure: in the (x1, x2) plane, the decision boundary w1x1 + w2x2 + b = 0 separates C1 from C2; the decision region for C1 is w1x1 + w2x2 + b ≥ 0]
Perceptron: Limitations
• The perceptron can only model linearly separable functions.
• The perceptron can be used to model the following Boolean functions:
• AND• OR• COMPLEMENT
• But it cannot model the XOR. Why?
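A brute-force check makes the XOR limitation concrete: searching a grid of candidate weights finds a separating line for AND but none for XOR. (This is only an illustration over a finite grid, not a proof; in fact no real-valued weights work, because the XOR classes are not linearly separable.)

```python
from itertools import product

def fits(dataset, w1, w2, b):
    """True if sign(w1*x1 + w2*x2 + b) matches every desired output."""
    sign = lambda v: 1 if v >= 0 else -1
    return all(sign(w1 * x1 + w2 * x2 + b) == d for (x1, x2), d in dataset)

AND = [((0, 0), -1), ((0, 1), -1), ((1, 0), -1), ((1, 1), 1)]
XOR = [((0, 0), -1), ((0, 1), 1), ((1, 0), 1), ((1, 1), -1)]

grid = [i / 2 for i in range(-8, 9)]  # candidate values -4.0 ... 4.0
and_ok = any(fits(AND, *p) for p in product(grid, repeat=3))
xor_ok = any(fits(XOR, *p) for p in product(grid, repeat=3))
# and_ok is True (e.g. w1 = w2 = 1, b = -1.5); xor_ok is False
```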
Perceptron: Learning Algorithm
• Variables and parameters
x(n) = input vector = [+1, x1(n), x2(n), …, xm(n)]T
w(n) = weight vector = [b(n), w1(n), w2(n), …, wm(n)]T
b(n) = bias
y(n) = actual response
d(n) = desired response
η = learning rate parameter
The fixed-increment learning algorithm
• Initialization: set w(1) = 0
• Activation: activate perceptron by applying input example (vector x(n) and desired response d(n))
• Compute actual response of perceptron:
y(n) = sgn[wT(n) x(n)]
• Adapt weight vector: if d(n) and y(n) are different, then
w(n + 1) = w(n) + η d(n) x(n)
where d(n) = +1 if x(n) ∈ C1
             −1 if x(n) ∈ C2
• Continuation: increment time step n by 1 and go to the Activation step
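The steps above translate directly into a short training loop (a sketch assuming inputs augmented with x0 = +1, so the bias is w0 as on the previous slides; names are ours):

```python
def train_perceptron(examples, eta=1.0, max_epochs=100):
    """Fixed-increment rule: examples is a list of (x, d) with d in {+1, -1}."""
    sign = lambda v: 1 if v >= 0 else -1
    w = [0.0] * (len(examples[0][0]) + 1)      # w = [b, w1, ..., wm], w(1) = 0
    for _ in range(max_epochs):
        errors = 0
        for x, d in examples:
            xa = [1.0] + list(x)               # augmented input, x0 = +1
            y = sign(sum(wj * xj for wj, xj in zip(w, xa)))
            if y != d:                         # adapt only when y(n) != d(n)
                w = [wj + eta * d * xj for wj, xj in zip(w, xa)]
                errors += 1
        if errors == 0:                        # no mistakes in a full epoch
            return w
    return w
```

On a linearly separable training set the loop stops after an epoch with no errors, as the convergence theorem later in these slides guarantees.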
Example
Consider the 2D training set C1 ∪ C2, where:
C1 = {(1,1), (1,-1), (0,-1)}: elements of class +1
C2 = {(-1,-1), (-1,1), (0,1)}: elements of class -1
Use the perceptron learning algorithm to classify these examples.
• w(1) = [1, 0, 0]T, η = 1
Trick
Consider the augmented training set C'1 ∪ C'2, with first entry fixed to 1 (to deal with the bias as extra weight):
(1, 1, 1), (1, 1, -1), (1, 0, -1)
(1, -1, -1), (1, -1, 1), (1, 0, 1)
Replace x with −x for all x ∈ C'2 and use the following simpler update rule:
w(n+1) = w(n) + x(n) if wT(n) x(n) ≤ 0
         w(n) otherwise
Example
• Training set after application of trick:
(1, 1, 1), (1, 1, -1), (1, 0, -1), (-1, 1, 1), (-1, 1, -1), (-1, 0, -1)
• Application of perceptron learning algorithm:

Adjusted pattern   Weight applied w(n)   w(n)·x(n)   Update?   New weight
(1, 1, 1)          (1, 0, 0)             1           No        (1, 0, 0)
(1, 1, -1)         (1, 0, 0)             1           No        (1, 0, 0)
(1, 0, -1)         (1, 0, 0)             1           No        (1, 0, 0)
(-1, 1, 1)         (1, 0, 0)             -1          Yes       (0, 1, 1)
(-1, 1, -1)        (0, 1, 1)             0           Yes       (-1, 2, 0)
(-1, 0, -1)        (-1, 2, 0)            1           No        (-1, 2, 0)

End epoch 1
Example

Adjusted pattern   Weight applied w(n)   w(n)·x(n)   Update?   New weight
(1, 1, 1)          (-1, 2, 0)            1           No        (-1, 2, 0)
(1, 1, -1)         (-1, 2, 0)            1           No        (-1, 2, 0)
(1, 0, -1)         (-1, 2, 0)            -1          Yes       (0, 2, -1)
(-1, 1, 1)         (0, 2, -1)            1           No        (0, 2, -1)
(-1, 1, -1)        (0, 2, -1)            3           No        (0, 2, -1)
(-1, 0, -1)        (0, 2, -1)            1           No        (0, 2, -1)

End epoch 2
At epoch 3 no updates are performed (check!), so the algorithm stops. Final weight vector: (0, 2, -1); the decision hyperplane is 2x1 − x2 = 0.
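The epochs of this worked example can be replayed in a few lines, using the trick's update rule w ← w + x whenever wT x ≤ 0:

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

# Augmented patterns: C'1 as-is, C'2 negated (the trick)
patterns = [(1, 1, 1), (1, 1, -1), (1, 0, -1),
            (-1, 1, 1), (-1, 1, -1), (-1, 0, -1)]

w = (1, 0, 0)                                  # w(1) from the example
for epoch in range(3):                         # epoch 3 makes no updates
    for x in patterns:
        if dot(w, x) <= 0:
            w = tuple(wi + xi for wi, xi in zip(w, x))
# w ends at (0, 2, -1), the final weight vector of the example
```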
Example
[Figure: the training points of C1 (+) and C2 (−) in the (x1, x2) plane, separated by the decision boundary 2x1 − x2 = 0]
Convergence of the learning algorithm
Suppose the datasets C1, C2 are linearly separable. The perceptron convergence algorithm converges after n0 iterations, with n0 ≤ nmax, on training set C1 ∪ C2.
XOR is not linearly separable.
Adaline: Adaptive Linear Element
• Adaline: uses a linear neuron model and the Least-Mean-Square (LMS) learning algorithm.
• The idea: try to minimize the square error, which is a function of the weights:
E(w(n)) = ½ e²(n)
e(n) = d(n) − ∑_{j=0}^{m} xj(n) wj(n)
• We can find the minimum of the error function E by means of the steepest descent method.
Steepest Descent Method
w(n+1) = w(n) − η (gradient of E(n))
• start with an arbitrary point
• find a direction in which E is decreasing most rapidly:
gradient of E(w) = [∂E/∂w1, …, ∂E/∂wm]
• make a small step in that direction
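A tiny numerical sketch of these three steps, on an illustrative quadratic error E(w) = (w1 − 3)² + (w2 + 1)² (the function and the step size η = 0.1 are our choices, not from the slides):

```python
def grad_E(w):
    """Gradient of E(w) = (w1 - 3)^2 + (w2 + 1)^2."""
    return [2 * (w[0] - 3), 2 * (w[1] + 1)]

w = [0.0, 0.0]          # arbitrary starting point
eta = 0.1               # small step size
for _ in range(100):
    g = grad_E(w)       # direction of steepest increase
    w = [wi - eta * gi for wi, gi in zip(w, g)]  # step against the gradient
# w approaches the minimum at (3, -1)
```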
Least-Mean-Square algorithm (Widrow-Hoff algorithm)
• Approximation of gradient(E):
∂E(w(n))/∂w(n) = e(n) ∂e(n)/∂w(n) = −e(n) xT(n)
• Update rule for the weights becomes:
w(n+1) = w(n) + η e(n) x(n)
where
E(w(n)) = ½ e²(n)
e(n) = d(n) − ∑_{j=0}^{m} xj(n) wj(n)
Summary of LMS algorithm
Training sample: input signal vector x(n)
desired response d(n)
User-selected parameter η > 0
Initialization: set ŵ(1) = 0
Computation: for n = 1, 2, … compute
e(n) = d(n) − ŵT(n) x(n)
ŵ(n+1) = ŵ(n) + η x(n) e(n)
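The summary above maps to a short implementation (a sketch; inputs are augmented with x0 = +1 so the bias sits in ŵ0, and the example data is our illustrative choice):

```python
def lms_fit(examples, eta=0.05, epochs=500):
    """Widrow-Hoff rule: w(n+1) = w(n) + eta * e(n) * x(n)."""
    w = [0.0] * (len(examples[0][0]) + 1)      # w_hat(1) = 0
    for _ in range(epochs):
        for x, d in examples:
            xa = [1.0] + list(x)               # augmented input, x0 = +1
            e = d - sum(wj * xj for wj, xj in zip(w, xa))   # e(n)
            w = [wj + eta * e * xj for wj, xj in zip(w, xa)]
    return w

# Samples of the linear target d = 2*x + 1
data = [((0.0,), 1.0), ((1.0,), 3.0), ((2.0,), 5.0), ((-1.0,), -1.0)]
w = lms_fit(data)
# w converges toward [1.0, 2.0], i.e. bias 1 and slope 2
```

Because the neuron is linear, the squared error has a single minimum and the per-sample updates settle on it for a small enough η.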
Comparison LMS and Perceptron
• Perceptron and Adaline represent different implementations of a single-layer perceptron based on error-correction learning.
• Model of a neuron:
Perceptron: non-linear; hard-limiter activation function (McCulloch-Pitts model).
LMS: linear.
• Learning process:
Perceptron: converges after a finite number of iterations.
LMS: continuous learning.