
Neural networks

thanks to: www.cs.vu.nl/~elena/slides

Basics of neural network theory and practice for supervised and unsupervised learning.

Most popular Neural Network models:
• architectures
• learning algorithms
• applications


Neural Networks

• A NN is a machine learning approach inspired by the way in which the brain performs a particular learning task:
– Knowledge about the learning task is given in the form of examples.
– Interneuron connection strengths (weights) are used to store the acquired information (the training examples).
– During the learning process the weights are modified in order to model the particular learning task correctly on the training examples.


Learning

• Supervised Learning
– Recognizing hand-written digits, pattern recognition, regression.
– Labeled examples (input, desired output).
– Neural Network models: perceptron, feed-forward, radial basis function, support vector machine.

• Unsupervised Learning
– Finding groups of similar documents on the web, content-addressable memory, clustering.
– Unlabeled examples (different realizations of the input alone).
– Neural Network models: self-organizing maps, Hopfield networks.


Neurons


Network architectures

• Three different classes of network architectures:
– single-layer feed-forward
– multi-layer feed-forward
(in both feed-forward classes, neurons are organized in acyclic layers)
– recurrent

• The architecture of a neural network is closely linked with the learning algorithm used to train it.


Single-layer feed-forward

[Figure: an input layer of source nodes connected directly to an output layer of neurons]


Multi-layer feed-forward

[Figure: a 3-4-2 network, with an input layer (3 source nodes), one hidden layer (4 neurons), and an output layer (2 neurons)]


Recurrent network

A recurrent network with hidden neuron(s): the unit-delay operator z^-1 implies a dynamic system.

[Figure: recurrent network with input, hidden, and output units; feedback connections pass through unit-delay (z^-1) elements]


The Neuron

• The neuron is the basic information processing unit of a NN. It consists of:
1 A set of synapses or connecting links, each link characterized by a weight: w1, w2, …, wm
2 An adder function (linear combiner) which computes the weighted sum of the inputs:
u = Σ_{j=1}^{m} wj xj
3 An activation function φ (squashing function) for limiting the amplitude of the output of the neuron:
y = φ(u + b)
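Purely as an illustration (not from the slides), here is a minimal sketch of this neuron model in Python; the sigmoid squashing function and all names are my own assumptions:

```python
import numpy as np

def sigmoid(u):
    # One common choice of squashing function; the slides only require
    # some amplitude-limiting activation phi.
    return 1.0 / (1.0 + np.exp(-u))

def neuron_output(x, weights, bias, phi=sigmoid):
    # Adder (linear combiner): u = sum_j w_j * x_j
    u = np.dot(weights, x)
    # Activation applied to u + b, as on the slide: y = phi(u + b)
    return phi(u + bias)

# Example: a neuron with m = 3 inputs
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
print(neuron_output(x, w, bias=0.3))
```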


The Neuron

[Figure: neuron model. Input signals x1, x2, …, xm are multiplied by synaptic weights w1, w2, …, wm and passed to the summing function; the bias b is added to give the local field v, which goes through the activation function φ(·) to produce the output y]


Bias of a Neuron

• Bias b has the effect of applying an affine transformation to u:

v = u + b, where u = Σ_{j=1}^{m} wj xj

• v is the induced field of the neuron

[Figure: the induced field v plotted against u; the bias b shifts the line v = u vertically]


Bias as extra input

• Bias is an external parameter of the neuron. It can be modeled by adding an extra input x0 = +1 with weight w0 = b:

v = Σ_{j=0}^{m} wj xj, where w0 = b

[Figure: the same neuron model as before, with an extra input x0 = +1 whose weight w0 carries the bias]
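A small illustrative check of this equivalence in Python (the helper name `augment` is my own):

```python
import numpy as np

def augment(x):
    # Prepend the fixed input x0 = +1 so the bias becomes the weight w0.
    return np.concatenate(([1.0], x))

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
b = 0.3

v_explicit = np.dot(w, x) + b                               # v = u + b
v_augmented = np.dot(np.concatenate(([b], w)), augment(x))  # v = sum_{j=0}^{m} w_j x_j
assert np.isclose(v_explicit, v_augmented)
```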


Dimensions of a Neural Network

• Various types of neurons

• Various network architectures

• Various learning algorithms

• Various applications


Face Recognition

90% accuracy at learning head pose and at recognizing 1 of 20 faces


Handwritten digit recognition


Learning in NN

• Hebb (1949): learning by modifying connections

• Widrow & Hoff (1960): learning by comparing the output with a target


Architecture

• We consider the following architecture: a feed-forward NN with one layer

• It is sufficient to study single-layer perceptrons with just one neuron:


Single-layer perceptrons

• Generalization to single-layer perceptrons with more neurons is easy because:

• The output units are independent of each other
• Each weight only affects one of the outputs


Perceptron: Neuron Model

• Uses a non-linear (McCulloch-Pitts) model of the neuron:

[Figure: inputs x1, x2, …, xn with weights w1, w2, …, wn and bias b produce the local field v, which is passed through φ to give the output y = φ(v)]

φ is the sign function:

φ(v) = +1 if v >= 0
φ(v) = -1 if v < 0

i.e., φ(v) = sign(v)
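A minimal sketch of this perceptron unit in Python (illustrative only; all names are mine):

```python
import numpy as np

def sign(v):
    # Hard-limiter: +1 if v >= 0, -1 otherwise.
    return 1 if v >= 0 else -1

def perceptron_predict(x, w, b):
    # Local field v = w . x + b, output y = sign(v).
    return sign(np.dot(w, x) + b)

print(perceptron_predict(np.array([1.0, -1.0]), np.array([2.0, -1.0]), 0.0))  # -> 1
```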


Perceptron: Applications

• The perceptron is used for classification: classify a set of examples correctly into one of two classes C1, C2:

If the output of the perceptron is +1, the input is assigned to class C1
If the output is -1, the input is assigned to class C2


Perceptron: Classification

• The equation below describes a hyperplane in the input space. This hyperplane is used to separate the two classes C1 and C2:

Σ_{i=1}^{m} wi xi + b = 0

[Figure: in two dimensions, the decision boundary is the line w1x1 + w2x2 + b = 0; the decision region for C1 is the half-plane where w1x1 + w2x2 + b >= 0]


Perceptron: Limitations

• The perceptron can only model linearly separable functions.

• The perceptron can be used to model the following Boolean functions:

• AND
• OR
• COMPLEMENT

• But it cannot model XOR. Why?


Perceptron: Learning Algorithm

• Variables and parameters:
x(n) = input vector = [+1, x1(n), x2(n), …, xm(n)]^T
w(n) = weight vector = [b(n), w1(n), w2(n), …, wm(n)]^T
b(n) = bias
y(n) = actual response
d(n) = desired response
η = learning rate parameter


The fixed-increment learning algorithm

• Initialization: set w(1) = 0

• Activation: activate the perceptron by applying an input example (vector x(n) and desired response d(n))

• Compute the actual response of the perceptron:

y(n) = sgn[w^T(n) x(n)]

• Adapt the weight vector: if d(n) and y(n) are different, then

w(n + 1) = w(n) + η d(n) x(n)

where d(n) = +1 if x(n) ∈ C1
      d(n) = -1 if x(n) ∈ C2

• Continuation: increment time step n by 1 and go to the Activation step
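A runnable sketch of the fixed-increment algorithm in Python (illustrative; the names and example data are my own choices, with inputs pre-augmented by a leading +1 as defined on the previous slide):

```python
import numpy as np

def train_perceptron(X, d, eta=1.0, max_epochs=100):
    # X: examples as rows, already augmented with a leading +1 for the bias.
    # d: desired responses in {+1, -1}.
    w = np.zeros(X.shape[1])                      # Initialization: w(1) = 0
    for _ in range(max_epochs):
        updated = False
        for x_n, d_n in zip(X, d):
            y_n = 1 if np.dot(w, x_n) >= 0 else -1   # y(n) = sgn[w^T(n) x(n)]
            if y_n != d_n:                           # adapt only on error
                w = w + eta * d_n * x_n              # w <- w + eta d(n) x(n)
                updated = True
        if not updated:            # converged: a full pass with no errors
            break
    return w

# Example data (the C1/C2 set from the next slide, augmented with +1):
X = np.array([[1, 1, 1], [1, 1, -1], [1, 0, -1],
              [1, -1, -1], [1, -1, 1], [1, 0, 1]], dtype=float)
d = np.array([1, 1, 1, -1, -1, -1])
# Converges to a separating weight vector; the exact vector depends on
# the initialization and the ordering of the examples.
print(train_perceptron(X, d))
```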


Example

Consider the 2D training set C1 ∪ C2, where:
C1 = {(1, 1), (1, -1), (0, -1)}, elements of class +1
C2 = {(-1, -1), (-1, 1), (0, 1)}, elements of class -1
Use the perceptron learning algorithm to classify these examples.

• w(1) = [1, 0, 0]^T, η = 1


Trick

Consider the augmented training set C'1 ∪ C'2, with the first entry fixed to 1 (to deal with the bias as an extra weight):
(1, 1, 1), (1, 1, -1), (1, 0, -1)
(1, -1, -1), (1, -1, 1), (1, 0, 1)

Replace x with -x for all x ∈ C'2 and use the following simpler update rule:

w(n+1) = w(n) + x(n)  if  w(n)·x(n) <= 0
w(n+1) = w(n)         otherwise
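A sketch of one pass of this simplified rule in Python (illustrative; the function name is mine):

```python
import numpy as np

def trick_epoch(w, patterns):
    # One pass of the simplified rule over the adjusted (sign-flipped) patterns:
    # w <- w + x  whenever  w . x <= 0;  otherwise w is unchanged.
    updated = False
    for x in patterns:
        if np.dot(w, x) <= 0:
            w = w + x
            updated = True
    return w, updated
```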


Example

• Training set after application of the trick:
(1, 1, 1), (1, 1, -1), (1, 0, -1), (-1, 1, 1), (-1, 1, -1), (-1, 0, -1)

• Application of the perceptron learning algorithm:

Adjusted pattern | Weight applied | w(n)·x(n) | Update? | New weight
(1, 1, 1)        | (1, 0, 0)      |  1        | No      | (1, 0, 0)
(1, 1, -1)       | (1, 0, 0)      |  1        | No      | (1, 0, 0)
(1, 0, -1)       | (1, 0, 0)      |  1        | No      | (1, 0, 0)
(-1, 1, 1)       | (1, 0, 0)      | -1        | Yes     | (0, 1, 1)
(-1, 1, -1)      | (0, 1, 1)      |  0        | Yes     | (-1, 2, 0)
(-1, 0, -1)      | (-1, 2, 0)     |  1        | No      | (-1, 2, 0)

End of epoch 1


Example

Adjusted pattern | Weight applied | w(n)·x(n) | Update? | New weight
(1, 1, 1)        | (-1, 2, 0)     |  1        | No      | (-1, 2, 0)
(1, 1, -1)       | (-1, 2, 0)     |  1        | No      | (-1, 2, 0)
(1, 0, -1)       | (-1, 2, 0)     | -1        | Yes     | (0, 2, -1)
(-1, 1, 1)       | (0, 2, -1)     |  1        | No      | (0, 2, -1)
(-1, 1, -1)      | (0, 2, -1)     |  3        | No      | (0, 2, -1)
(-1, 0, -1)      | (0, 2, -1)     |  1        | No      | (0, 2, -1)

End of epoch 2

At epoch 3 no updates are performed (check!), so execution of the algorithm stops.
Final weight vector: (0, 2, -1), i.e. the decision hyperplane is 2x1 - x2 = 0.
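A self-contained Python check (my own illustrative code) that runs the simplified rule on the adjusted patterns and reproduces this final weight vector:

```python
import numpy as np

# Adjusted (augmented, sign-flipped) training patterns from the example.
patterns = np.array([
    [ 1, 1,  1], [ 1, 1, -1], [ 1, 0, -1],
    [-1, 1,  1], [-1, 1, -1], [-1, 0, -1],
])

w = np.array([1.0, 0.0, 0.0])        # w(1) = [1, 0, 0]^T
while True:
    updated = False
    for x in patterns:
        if np.dot(w, x) <= 0:        # update when w(n) . x(n) <= 0
            w = w + x
            updated = True
    if not updated:                  # an epoch with no updates: stop
        break

print(w)  # [ 0.  2. -1.]  ->  decision boundary 2*x1 - x2 = 0
```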


Example

[Figure: the six training points in the (x1, x2) plane, C1 points marked + and C2 points marked -, separated by the decision boundary 2x1 - x2 = 0]


Convergence of the learning algorithm

Suppose the datasets C1, C2 are linearly separable. Then the perceptron convergence algorithm converges after n0 iterations, with n0 <= n_max, on the training set C1 ∪ C2.

XOR is not linearly separable.


Adaline: Adaptive Linear Element

• Adaline: uses a linear neuron model and the Least-Mean-Square (LMS) learning algorithm.
The idea: try to minimize the square error, which is a function of the weights:

E(w(n)) = (1/2) e²(n)
e(n) = d(n) − Σ_{j=0}^{m} xj(n) wj(n)

• We can find the minimum of the error function E by means of the steepest descent method.


Steepest Descent Method

w(n+1) = w(n) − η (gradient of E(w(n)))

• start with an arbitrary point
• find a direction in which E is decreasing most rapidly:
gradient of E(w) = (∂E/∂w1, …, ∂E/∂wm)
• make a small step in that direction
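A minimal steepest-descent sketch in Python (illustrative; the quadratic example function is my own choice, not from the slides):

```python
import numpy as np

def steepest_descent(grad, w0, eta=0.1, steps=100):
    # Repeatedly step against the gradient: w <- w - eta * grad E(w).
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = w - eta * grad(w)
    return w

# Example: minimize E(w) = w1^2 + 2*w2^2, whose gradient is (2*w1, 4*w2).
grad_E = lambda w: np.array([2.0 * w[0], 4.0 * w[1]])
print(steepest_descent(grad_E, [3.0, -2.0]))  # approaches (0, 0)
```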


Least-Mean-Square algorithm (Widrow-Hoff algorithm)

• Approximation of the gradient of E:

∂E(w(n))/∂w(n) = e(n) ∂e(n)/∂w(n) = −e(n) x^T(n)

• The update rule for the weights becomes:

w(n+1) = w(n) + η e(n) x(n)

where, as before,
E(w(n)) = (1/2) e²(n)
e(n) = d(n) − Σ_{j=0}^{m} xj(n) wj(n)


Summary of the LMS algorithm

Training sample: input signal vector x(n)
                 desired response d(n)

User-selected parameter: η > 0

Initialization: set ŵ(1) = 0

Computation: for n = 1, 2, … compute

e(n) = d(n) − ŵ^T(n) x(n)
ŵ(n+1) = ŵ(n) + η x(n) e(n)
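A runnable sketch of this LMS loop in Python (illustrative; the toy regression data and all names are my own, and inputs are assumed augmented with a leading +1 so the bias is w0):

```python
import numpy as np

def lms(X, d, eta=0.05, epochs=50):
    # X: input vectors as rows (augmented with a leading +1), d: desired responses.
    w = np.zeros(X.shape[1])              # set w(1) = 0
    for _ in range(epochs):
        for x_n, d_n in zip(X, d):
            e_n = d_n - np.dot(w, x_n)    # e(n) = d(n) - w^T(n) x(n)
            w = w + eta * x_n * e_n       # w(n+1) = w(n) + eta x(n) e(n)
    return w

# Toy regression: d = 0.5 + 2*x, so we expect w close to (0.5, 2).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 50)
X = np.column_stack([np.ones_like(x), x])
d = 0.5 + 2.0 * x
print(lms(X, d))  # approximately [0.5, 2.0]
```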


Comparison of LMS and the Perceptron

• The perceptron and Adaline represent different implementations of a single-layer perceptron based on error-correction learning.

• Model of a neuron:
Perceptron: non-linear. Hard-limiter activation function; McCulloch-Pitts model.
LMS: linear.

• Learning process:
Perceptron: converges in a finite number of iterations.
LMS: continuous learning.