
Neural networks

thanks to: www.cs.vu.nl/~elena/slides

Basics of neural network theory and practice for supervised and unsupervised learning.

Most popular Neural Network models:
• architectures
• learning algorithms
• applications


Neural Networks

• A NN is a machine learning approach inspired by the way in which the brain performs a particular learning task:
– Knowledge about the learning task is given in the form of examples.
– Interneuron connection strengths (weights) are used to store the acquired information (the training examples).
– During the learning process the weights are modified in order to model the particular learning task correctly on the training examples.


Learning

• Supervised Learning
– Recognizing hand-written digits, pattern recognition, regression.
– Labeled examples (input, desired output).
– Neural Network models: perceptron, feed-forward, radial basis function, support vector machine.

• Unsupervised Learning
– Finding groups of similar documents on the web, content-addressable memory, clustering.
– Unlabeled examples (different realizations of the input alone).
– Neural Network models: self-organizing maps, Hopfield networks.


Neurons


Network architectures

• Three different classes of network architectures:
– single-layer feed-forward
– multi-layer feed-forward
(in both feed-forward classes, neurons are organized in acyclic layers)
– recurrent

• The architecture of a neural network is closely linked with the learning algorithm used to train it.


Single-layer feed-forward

[Figure: an input layer of source nodes connected directly to an output layer of neurons]


Multi-layer feed-forward

[Figure: a 3-4-2 network, with an input layer (3 source nodes), one hidden layer (4 neurons), and an output layer (2 neurons)]


Recurrent network

A recurrent network with hidden neuron(s): the unit-delay operator z^-1 implies a dynamic system.

[Figure: recurrent network with input, hidden, and output units; feedback connections pass through unit-delay (z^-1) elements]


The Neuron

• The neuron is the basic information processing unit of a NN. It consists of:
1 A set of synapses or connecting links, each link characterized by a weight: w1, w2, …, wm
2 An adder function (linear combiner) which computes the weighted sum of the inputs:
u = Σ_{j=1}^{m} wj xj
3 An activation function φ (squashing function) for limiting the amplitude of the output of the neuron:
y = φ(u + b)
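Purely as an illustration (not from the slides), here is a minimal sketch of this neuron model in Python; the sigmoid squashing function and all names are my own assumptions:

```python
import numpy as np

def sigmoid(u):
    # One common choice of squashing function; the slides only require
    # some amplitude-limiting activation phi.
    return 1.0 / (1.0 + np.exp(-u))

def neuron_output(x, weights, bias, phi=sigmoid):
    # Adder (linear combiner): u = sum_j w_j * x_j
    u = np.dot(weights, x)
    # Activation applied to u + b, as on the slide: y = phi(u + b)
    return phi(u + bias)

# Example: a neuron with m = 3 inputs
x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
print(neuron_output(x, w, bias=0.3))
```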


The Neuron

[Figure: neuron model. Input signals x1, x2, …, xm are multiplied by synaptic weights w1, w2, …, wm and passed to the summing function; the bias b is added to give the local field v, which goes through the activation function φ(·) to produce the output y]


Bias of a Neuron

• Bias b has the effect of applying an affine transformation to u:

v = u + b, where u = Σ_{j=1}^{m} wj xj

• v is the induced field of the neuron

[Figure: the induced field v plotted against u; the bias b shifts the line v = u vertically]


Bias as extra input

• Bias is an external parameter of the neuron. It can be modeled by adding an extra input x0 = +1 with weight w0 = b:

v = Σ_{j=0}^{m} wj xj, where w0 = b

[Figure: the same neuron model as before, with an extra input x0 = +1 whose weight w0 carries the bias]
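A small illustrative check of this equivalence in Python (the helper name `augment` is my own):

```python
import numpy as np

def augment(x):
    # Prepend the fixed input x0 = +1 so the bias becomes the weight w0.
    return np.concatenate(([1.0], x))

x = np.array([0.5, -1.0, 2.0])
w = np.array([0.1, 0.4, -0.2])
b = 0.3

v_explicit = np.dot(w, x) + b                               # v = u + b
v_augmented = np.dot(np.concatenate(([b], w)), augment(x))  # v = sum_{j=0}^{m} w_j x_j
assert np.isclose(v_explicit, v_augmented)
```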


Dimensions of a Neural Network

• Various types of neurons

• Various network architectures

• Various learning algorithms

• Various applications


Face Recognition

90% accuracy at learning head pose and at recognizing 1 of 20 faces


Handwritten digit recognition


Learning in NN

• Hebb (1949): learning by modifying connections

• Widrow & Hoff (1960): learning by comparing the output with a target


Architecture

• We consider the following architecture: a feed-forward NN with one layer

• It is sufficient to study single-layer perceptrons with just one neuron:


Single-layer perceptrons

• Generalization to single-layer perceptrons with more neurons is easy because:

• The output units are independent of each other
• Each weight only affects one of the outputs


Perceptron: Neuron Model

• Uses a non-linear (McCulloch-Pitts) model of the neuron:

[Figure: inputs x1, x2, …, xn with weights w1, w2, …, wn and bias b produce the local field v, which is passed through φ to give the output y = φ(v)]

φ is the sign function:

φ(v) = +1 if v >= 0
φ(v) = -1 if v < 0

i.e., φ(v) = sign(v)
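A minimal sketch of this perceptron unit in Python (illustrative only; all names are mine):

```python
import numpy as np

def sign(v):
    # Hard-limiter: +1 if v >= 0, -1 otherwise.
    return 1 if v >= 0 else -1

def perceptron_predict(x, w, b):
    # Local field v = w . x + b, output y = sign(v).
    return sign(np.dot(w, x) + b)

print(perceptron_predict(np.array([1.0, -1.0]), np.array([2.0, -1.0]), 0.0))  # -> 1
```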


Perceptron: Applications

• The perceptron is used for classification: classify a set of examples correctly into one of two classes C1, C2:

If the output of the perceptron is +1, the input is assigned to class C1
If the output is -1, the input is assigned to class C2


Perceptron: Classification

• The equation below describes a hyperplane in the input space. This hyperplane is used to separate the two classes C1 and C2:

Σ_{i=1}^{m} wi xi + b = 0

[Figure: in two dimensions, the decision boundary is the line w1x1 + w2x2 + b = 0; the decision region for C1 is the half-plane where w1x1 + w2x2 + b >= 0]


Perceptron: Limitations

• The perceptron can only model linearly separable functions.

• The perceptron can be used to model the following Boolean functions:

• AND
• OR
• COMPLEMENT

• But it cannot model XOR. Why?


Perceptron: Learning Algorithm

• Variables and parameters:
x(n) = input vector = [+1, x1(n), x2(n), …, xm(n)]^T
w(n) = weight vector = [b(n), w1(n), w2(n), …, wm(n)]^T
b(n) = bias
y(n) = actual response
d(n) = desired response
η = learning rate parameter


The fixed-increment learning algorithm

• Initialization: set w(1) = 0

• Activation: activate the perceptron by applying an input example (vector x(n) and desired response d(n))

• Compute the actual response of the perceptron:

y(n) = sgn[w^T(n) x(n)]

• Adapt the weight vector: if d(n) and y(n) are different, then

w(n + 1) = w(n) + η d(n) x(n)

where d(n) = +1 if x(n) ∈ C1
      d(n) = -1 if x(n) ∈ C2

• Continuation: increment time step n by 1 and go to the Activation step
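A runnable sketch of the fixed-increment algorithm in Python (illustrative; the names and example data are my own choices, with inputs pre-augmented by a leading +1 as defined on the previous slide):

```python
import numpy as np

def train_perceptron(X, d, eta=1.0, max_epochs=100):
    # X: examples as rows, already augmented with a leading +1 for the bias.
    # d: desired responses in {+1, -1}.
    w = np.zeros(X.shape[1])                      # Initialization: w(1) = 0
    for _ in range(max_epochs):
        updated = False
        for x_n, d_n in zip(X, d):
            y_n = 1 if np.dot(w, x_n) >= 0 else -1   # y(n) = sgn[w^T(n) x(n)]
            if y_n != d_n:                           # adapt only on error
                w = w + eta * d_n * x_n              # w <- w + eta d(n) x(n)
                updated = True
        if not updated:            # converged: a full pass with no errors
            break
    return w

# Example data (the C1/C2 set from the next slide, augmented with +1):
X = np.array([[1, 1, 1], [1, 1, -1], [1, 0, -1],
              [1, -1, -1], [1, -1, 1], [1, 0, 1]], dtype=float)
d = np.array([1, 1, 1, -1, -1, -1])
# Converges to a separating weight vector; the exact vector depends on
# the initialization and the ordering of the examples.
print(train_perceptron(X, d))
```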


Example

Consider the 2D training set C1 ∪ C2, where:
C1 = {(1, 1), (1, -1), (0, -1)}, elements of class +1
C2 = {(-1, -1), (-1, 1), (0, 1)}, elements of class -1
Use the perceptron learning algorithm to classify these examples.

• w(1) = [1, 0, 0]^T, η = 1


Trick

Consider the augmented training set C'1 ∪ C'2, with the first entry fixed to 1 (to deal with the bias as an extra weight):
(1, 1, 1), (1, 1, -1), (1, 0, -1)
(1, -1, -1), (1, -1, 1), (1, 0, 1)

Replace x with -x for all x ∈ C'2 and use the following simpler update rule:

w(n+1) = w(n) + x(n)  if  w(n)·x(n) <= 0
w(n+1) = w(n)         otherwise
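A sketch of one pass of this simplified rule in Python (illustrative; the function name is mine):

```python
import numpy as np

def trick_epoch(w, patterns):
    # One pass of the simplified rule over the adjusted (sign-flipped) patterns:
    # w <- w + x  whenever  w . x <= 0;  otherwise w is unchanged.
    updated = False
    for x in patterns:
        if np.dot(w, x) <= 0:
            w = w + x
            updated = True
    return w, updated
```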


Example

• Training set after application of the trick:
(1, 1, 1), (1, 1, -1), (1, 0, -1), (-1, 1, 1), (-1, 1, -1), (-1, 0, -1)

• Application of the perceptron learning algorithm:

Adjusted pattern | Weight applied | w(n)·x(n) | Update? | New weight
(1, 1, 1)        | (1, 0, 0)      |  1        | No      | (1, 0, 0)
(1, 1, -1)       | (1, 0, 0)      |  1        | No      | (1, 0, 0)
(1, 0, -1)       | (1, 0, 0)      |  1        | No      | (1, 0, 0)
(-1, 1, 1)       | (1, 0, 0)      | -1        | Yes     | (0, 1, 1)
(-1, 1, -1)      | (0, 1, 1)      |  0        | Yes     | (-1, 2, 0)
(-1, 0, -1)      | (-1, 2, 0)     |  1        | No      | (-1, 2, 0)

End of epoch 1


Example

Adjusted pattern | Weight applied | w(n)·x(n) | Update? | New weight
(1, 1, 1)        | (-1, 2, 0)     |  1        | No      | (-1, 2, 0)
(1, 1, -1)       | (-1, 2, 0)     |  1        | No      | (-1, 2, 0)
(1, 0, -1)       | (-1, 2, 0)     | -1        | Yes     | (0, 2, -1)
(-1, 1, 1)       | (0, 2, -1)     |  1        | No      | (0, 2, -1)
(-1, 1, -1)      | (0, 2, -1)     |  3        | No      | (0, 2, -1)
(-1, 0, -1)      | (0, 2, -1)     |  1        | No      | (0, 2, -1)

End of epoch 2

At epoch 3 no updates are performed (check!), so execution of the algorithm stops.
Final weight vector: (0, 2, -1), i.e. the decision hyperplane is 2x1 - x2 = 0.
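A self-contained Python check (my own illustrative code) that runs the simplified rule on the adjusted patterns and reproduces this final weight vector:

```python
import numpy as np

# Adjusted (augmented, sign-flipped) training patterns from the example.
patterns = np.array([
    [ 1, 1,  1], [ 1, 1, -1], [ 1, 0, -1],
    [-1, 1,  1], [-1, 1, -1], [-1, 0, -1],
])

w = np.array([1.0, 0.0, 0.0])        # w(1) = [1, 0, 0]^T
while True:
    updated = False
    for x in patterns:
        if np.dot(w, x) <= 0:        # update when w(n) . x(n) <= 0
            w = w + x
            updated = True
    if not updated:                  # an epoch with no updates: stop
        break

print(w)  # [ 0.  2. -1.]  ->  decision boundary 2*x1 - x2 = 0
```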


Example

[Figure: the six training points in the (x1, x2) plane, C1 points marked + and C2 points marked -, separated by the decision boundary 2x1 - x2 = 0]


Convergence of the learning algorithm

Suppose the datasets C1, C2 are linearly separable. Then the perceptron convergence algorithm converges after n0 iterations, with n0 <= n_max, on the training set C1 ∪ C2.

XOR is not linearly separable.


Adaline: Adaptive Linear Element

• Adaline: uses a linear neuron model and the Least-Mean-Square (LMS) learning algorithm.
The idea: try to minimize the square error, which is a function of the weights:

E(w(n)) = (1/2) e²(n)
e(n) = d(n) − Σ_{j=0}^{m} xj(n) wj(n)

• We can find the minimum of the error function E by means of the steepest descent method.


Steepest Descent Method

w(n+1) = w(n) − η (gradient of E(w(n)))

• start with an arbitrary point
• find a direction in which E is decreasing most rapidly:
gradient of E(w) = (∂E/∂w1, …, ∂E/∂wm)
• make a small step in that direction
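A minimal steepest-descent sketch in Python (illustrative; the quadratic example function is my own choice, not from the slides):

```python
import numpy as np

def steepest_descent(grad, w0, eta=0.1, steps=100):
    # Repeatedly step against the gradient: w <- w - eta * grad E(w).
    w = np.asarray(w0, dtype=float)
    for _ in range(steps):
        w = w - eta * grad(w)
    return w

# Example: minimize E(w) = w1^2 + 2*w2^2, whose gradient is (2*w1, 4*w2).
grad_E = lambda w: np.array([2.0 * w[0], 4.0 * w[1]])
print(steepest_descent(grad_E, [3.0, -2.0]))  # approaches (0, 0)
```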


Least-Mean-Square algorithm (Widrow-Hoff algorithm)

• Approximation of the gradient of E:

∂E(w(n))/∂w(n) = e(n) ∂e(n)/∂w(n) = −e(n) x^T(n)

• The update rule for the weights becomes:

w(n+1) = w(n) + η e(n) x(n)

where, as before,
E(w(n)) = (1/2) e²(n)
e(n) = d(n) − Σ_{j=0}^{m} xj(n) wj(n)


Summary of the LMS algorithm

Training sample: input signal vector x(n)
                 desired response d(n)

User-selected parameter: η > 0

Initialization: set ŵ(1) = 0

Computation: for n = 1, 2, … compute

e(n) = d(n) − ŵ^T(n) x(n)
ŵ(n+1) = ŵ(n) + η x(n) e(n)
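A runnable sketch of this LMS loop in Python (illustrative; the toy regression data and all names are my own, and inputs are assumed augmented with a leading +1 so the bias is w0):

```python
import numpy as np

def lms(X, d, eta=0.05, epochs=50):
    # X: input vectors as rows (augmented with a leading +1), d: desired responses.
    w = np.zeros(X.shape[1])              # set w(1) = 0
    for _ in range(epochs):
        for x_n, d_n in zip(X, d):
            e_n = d_n - np.dot(w, x_n)    # e(n) = d(n) - w^T(n) x(n)
            w = w + eta * x_n * e_n       # w(n+1) = w(n) + eta x(n) e(n)
    return w

# Toy regression: d = 0.5 + 2*x, so we expect w close to (0.5, 2).
rng = np.random.default_rng(0)
x = rng.uniform(-1, 1, 50)
X = np.column_stack([np.ones_like(x), x])
d = 0.5 + 2.0 * x
print(lms(X, d))  # approximately [0.5, 2.0]
```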


Comparison of LMS and the Perceptron

• The perceptron and Adaline represent different implementations of a single-layer perceptron based on error-correction learning.

• Model of a neuron:
Perceptron: non-linear. Hard-limiter activation function; McCulloch-Pitts model.
LMS: linear.

• Learning process:
Perceptron: converges in a finite number of iterations.
LMS: continuous learning.