Gp.Capt. Thanapant Raicharoen, PhD
Programming in MATLAB
Chapter 3: Multi Layer Perceptron
Outline
- Limitation of Single Layer Perceptron
- Multi Layer Perceptron (MLP)
- Backpropagation Algorithm
- MLP for non-linearly separable classification problems
- MLP for function approximation problems
Limitation of Perceptron (XOR Function)
No.  P1  P2  Output/Target
1.   0   0   0
2.   0   1   1
3.   1   0   1
4.   1   1   0

No single line can separate the two output classes, so a single-layer perceptron cannot realize the XOR function.
Multilayer Feedforward Network Structure

[Figure: a feedforward network with input nodes x1, x2, x3, hidden nodes y1^(1), y2^(1), y3^(1), and output node o1; weight w_{i,j}^{(h)} connects node j of layer h-1 to node i of layer h.]

Notation:
  h = layer number
  i = node i of layer h
  j = node j of layer h-1
  w_{i,j}^{(h)} = weight from node j of layer h-1 to node i of layer h
  y_i^{(h)} = output of node i of layer h

Output of each node:

$$
y_i^{(h)} = f\left( w_{i,1}^{(h)} y_1^{(h-1)} + w_{i,2}^{(h)} y_2^{(h-1)} + w_{i,3}^{(h)} y_3^{(h-1)} + \cdots + w_{i,m}^{(h)} y_m^{(h-1)} + \theta_i^{(h)} \right)
$$
$$
\phantom{y_i^{(h)}} = f\left( \sum_j w_{i,j}^{(h)} y_j^{(h-1)} + \theta_i^{(h)} \right)
$$

where $y_j^{(0)} = x_j$ (network inputs) and $y_i^{(N)} = o_i$ (network outputs).
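To make the node-output equation concrete, here is a minimal MATLAB sketch (not part of the original slides; the weights, biases, and sigmoid activation are example values) that evaluates y^(h) = f(W y^(h-1) + theta) for one layer in matrix form:

% Illustrative sketch: output of every node in one layer,
% y_i = f( sum_j w_ij * y_j_prev + theta_i ), written in matrix form.
f      = @(h) 1./(1 + exp(-h));      % example activation (logistic sigmoid)
W      = [ 0.2 -0.5  0.1;            % row i holds the weights w_{i,j} of node i
           0.7  0.3 -0.4;
          -0.6  0.8  0.5];
theta  = [0.1; -0.2; 0.3];           % bias of each node
y_prev = [1; 0; 1];                  % outputs of layer h-1 (here the inputs x)
h      = W*y_prev + theta;           % weighted sums h_i
y      = f(h)                        % node outputs y_i = f(h_i)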
Multilayer Perceptron : How it works
Function XOR

x1  x2  y
0   0   0
0   1   1
1   0   1
1   1   0
[Figure: a 2-2-1 network with inputs x1, x2, hidden nodes y1, y2 and output node o.]

Layer 1:
$$
y_1 = f\left( w_{1,1}^{(1)} x_1 + w_{1,2}^{(1)} x_2 + \theta_1^{(1)} \right)
$$
$$
y_2 = f\left( w_{2,1}^{(1)} x_1 + w_{2,2}^{(1)} x_2 + \theta_2^{(1)} \right)
$$

Layer 2:
$$
o = f\left( w_{1,1}^{(2)} y_1 + w_{1,2}^{(2)} y_2 + \theta_1^{(2)} \right)
$$

f( ) = activation (or transfer) function
Multilayer Perceptron : How it works (cont.)

[Figure: the four input points (0,0), (0,1), (1,0), (1,1) in x1-x2 space, with the two hidden-node decision lines L1 and L2.]

$$
\text{Line } L_1 \Rightarrow w_{1,1}^{(1)} x_1 + w_{1,2}^{(1)} x_2 + \theta_1^{(1)} = 0
$$
$$
\text{Line } L_2 \Rightarrow w_{2,1}^{(1)} x_1 + w_{2,2}^{(1)} x_2 + \theta_2^{(1)} = 0
$$

Outputs at layer 1:

x1  x2  y1  y2
0   0   0   0
0   1   1   0
1   0   1   0
1   1   1   1
Multilayer Perceptron : How it works (cont.)
Inside layer 1

[Figure: left, the x1-x2 space with the four input points labelled Class 0 and Class 1; right, the y1-y2 space with the mapped points (0,0), (1,0) and (1,1), which are now linearly separable.]

$$
\text{Line } L_1 \Rightarrow w_{1,1}^{(1)} x_1 + w_{1,2}^{(1)} x_2 + \theta_1^{(1)} = 0
$$
$$
\text{Line } L_2 \Rightarrow w_{2,1}^{(1)} x_1 + w_{2,2}^{(1)} x_2 + \theta_2^{(1)} = 0
$$
Multilayer Perceptron : How it works (cont.)
Inside output layer
[Figure: the y1-y2 space with (1,0) in Class 1 and (0,0), (1,1) in Class 0, separated by line L3; the output node o receives y1 and y2 through weights w_{1,1}^{(2)} and w_{1,2}^{(2)}.]

The space y1-y2 is linearly separable. Therefore, the line L3 can classify (separate) Class 0 and Class 1.

$$
\text{Line } L_3 \Rightarrow w_{1,1}^{(2)} y_1 + w_{1,2}^{(2)} y_2 + \theta_1^{(2)} = 0
$$
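For illustration, here is a MATLAB sketch of one set of hard-limit weights that realizes XOR with this 2-2-1 structure. The slides do not give numeric weights, so these values are only one possible choice (y1 behaves like OR, y2 like AND), consistent with the layer-1 output table above:

% Illustrative sketch: a hand-picked XOR solution for the 2-2-1 network.
f  = @(h) double(h >= 0);                    % hard-limit activation
X  = [0 0 1 1; 0 1 0 1];                     % the four input patterns (columns)
W1 = [1 1; 1 1];  th1 = [-0.5; -1.5];        % y1 acts as OR, y2 acts as AND
W2 = [1 -1];      th2 = -0.5;                % o = 1 when y1 = 1 and y2 = 0
Y  = f(W1*X + th1*ones(1,4));                % layer-1 outputs (match the table)
O  = f(W2*Y + th2*ones(1,4))                 % network output: 0 1 1 0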
Multilayer Perceptron : How it works (cont.)
How hidden layers work

- The hidden layer tries to map the data into a representation that is linearly separable before passing it on to the output layer.
- Ultimately, the data reaching the last hidden layer should be linearly separable.
- More than one hidden layer may be needed to map the data into a linearly separable form.
- In general, the activation function does not have to be a hard-limit (thresholding) function, and different layers may use different activation functions.
How can we adjust weights?

Assume we have the function y = x1 + 2*x2, and we want to use a single-layer perceptron to approximate this function.

[Figure: a single node with inputs x1, x2, weights w1, w2, and output ŷ.]

In this case the activation function is the identity (linear) function f(x) = x, so the output is:

$$
\hat{y} = w_1 x_1 + w_2 x_2
$$

We need to adjust w1 and w2 so that ŷ becomes close (or equal) to y.
Delta Learning Rule (Widrow-Hoff Rule)

Consider the mean square error, MSE:

$$
\overline{\varepsilon^2} = \overline{(y - \hat{y})^2} = \overline{(y - w_1 x_1 - w_2 x_2)^2}
$$

where the bar means average.

ε² is a function of w1 and w2, as shown in the graph below.

[Figure: the MSE plotted as a surface over the (w1, w2) plane. This graph is called the error surface (a paraboloid).]
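The error surface can be visualized directly in MATLAB; the following sketch (not from the slides; the random sample inputs are an assumption) plots the MSE over a grid of (w1, w2) for the y = x1 + 2*x2 example:

% Illustrative sketch: plot the MSE error surface for y = x1 + 2*x2.
x  = rand(2, 100);                       % assumed sample inputs x1, x2
y  = x(1,:) + 2*x(2,:);                  % target function y = x1 + 2*x2
[W1, W2] = meshgrid(0:0.1:2, 1:0.1:3);   % grid of candidate (w1, w2)
MSE = zeros(size(W1));
for k = 1:numel(W1)
    yhat   = W1(k)*x(1,:) + W2(k)*x(2,:);   % perceptron output for these weights
    MSE(k) = mean((y - yhat).^2);           % mean square error
end
surf(W1, W2, MSE); xlabel('w1'); ylabel('w2'); zlabel('MSE');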
Delta Learning Rule (Widrow-Hoff Rule)

[Figure: contour plot of the mean square error ε² as a function of w1 and w2.]

The minimum point is (w1, w2) = (1, 2), because the MSE there is 0.

Therefore, w1 and w2 must be adjusted so as to reach the minimum point of this error surface.
Delta Learning Rule (Widrow-Hoff Rule)

w1 and w2 are adjusted toward the minimum point like this:

[Figure: contour plot of the error surface showing the trajectory from the initial values (w1, w2) through adjustments No. 1, No. 2, No. 3, ..., No. k to the target (minimum) point.]
Gradient Descent Method

What is the direction of steepest descent? In other words, in which direction does the function decrease most rapidly?

1. Calculate the gradient of the error surface at the current position (w1, w2); the gradient points in the direction of steepest ascent (uphill).
2. Step in the opposite direction of the gradient (i.e., adjust w1, w2 downhill).
3. Go back to Step 1 until the minimum point is reached.

A minimal sketch of this loop follows.
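The sketch below applies gradient descent to the y = x1 + 2*x2 example (not from the slides; the training data, learning rate, and epoch count are assumptions):

% Illustrative sketch: gradient descent for the single linear unit yhat = w1*x1 + w2*x2.
x = rand(2, 100);                    % assumed training inputs
y = x(1,:) + 2*x(2,:);               % targets from y = x1 + 2*x2
w = [0; 0];                          % initial weights (w1, w2)
alpha = 0.5;                         % assumed learning rate
for epoch = 1:200
    yhat = w' * x;                   % current outputs
    err  = y - yhat;                 % errors (y - yhat)
    grad = -2 * (x * err') / size(x,2);  % gradient of the mean square error
    w    = w - alpha * grad;         % step opposite to the gradient
end
w                                    % should approach the minimum point [1; 2]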
Backpropagation Algorithm
2-Layer case

[Figure: input layer x_i, hidden layer nodes y_j reached through weights w_{j,i}^{(1)}, and output layer nodes o_k reached through weights w_{k,j}^{(2)}.]

Hidden layer:
$$
y_j = f\left( \sum_i w_{j,i}^{(1)} x_i + \theta_j^{(1)} \right) = f\left( h_j^{(1)} \right)
$$

Output layer:
$$
\hat{o}_k = f\left( \sum_j w_{k,j}^{(2)} y_j + \theta_k^{(2)} \right) = f\left( h_k^{(2)} \right)
$$

where $h_m^{(n)}$ = weighted sum of the inputs of node m in layer n.
Backpropagation Algorithm (cont.)

2-Layer case

$$
\varepsilon^2 = \sum_k \left( o_k - \hat{o}_k \right)^2 \qquad (2.1)
$$
$$
= \sum_k \left( o_k - f\left( \sum_j w_{k,j}^{(2)} y_j + \theta_k^{(2)} \right) \right)^2 \qquad (2.2)
$$
$$
= \sum_k \left( o_k - f\left( \sum_j w_{k,j}^{(2)} \, f\left( \sum_i w_{j,i}^{(1)} x_i + \theta_j^{(1)} \right) + \theta_k^{(2)} \right) \right)^2 \qquad (2.3)
$$

The derivative of ε² with respect to $w_{k,j}^{(2)}$:
$$
\frac{\partial \varepsilon^2}{\partial w_{k,j}^{(2)}} = -2\,(o_k - \hat{o}_k) \cdot f'\!\left(h_k^{(2)}\right) \cdot y_j
$$

The derivative of ε² with respect to $\theta_k^{(2)}$:
$$
\frac{\partial \varepsilon^2}{\partial \theta_k^{(2)}} = -2\,(o_k - \hat{o}_k) \cdot f'\!\left(h_k^{(2)}\right)
$$
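These derivatives follow from the chain rule; written out (this intermediate step is not shown on the slide):

$$
\frac{\partial \varepsilon^2}{\partial w_{k,j}^{(2)}}
  = \frac{\partial \varepsilon^2}{\partial \hat{o}_k}
    \cdot \frac{\partial \hat{o}_k}{\partial h_k^{(2)}}
    \cdot \frac{\partial h_k^{(2)}}{\partial w_{k,j}^{(2)}}
  = \left(-2\,(o_k - \hat{o}_k)\right) \cdot f'\!\left(h_k^{(2)}\right) \cdot y_j
$$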
Backpropagation Algorithm (cont.)

2-Layer case

The derivative of ε² with respect to $w_{j,i}^{(1)}$, starting from (2.2) and (2.3):

$$
\frac{\partial \varepsilon^2}{\partial w_{j,i}^{(1)}}
  = \frac{\partial}{\partial w_{j,i}^{(1)}} \sum_k \left( o_k - f\left( \sum_j w_{k,j}^{(2)} y_j + \theta_k^{(2)} \right) \right)^2
$$
$$
  = -\sum_k 2\,(o_k - \hat{o}_k) \cdot f'\!\left( \sum_j w_{k,j}^{(2)} y_j + \theta_k^{(2)} \right) \cdot w_{k,j}^{(2)} \cdot f'\!\left( \sum_i w_{j,i}^{(1)} x_i + \theta_j^{(1)} \right) \cdot x_i
$$
$$
  = -\sum_k 2\,(o_k - \hat{o}_k) \cdot f'\!\left(h_k^{(2)}\right) \cdot w_{k,j}^{(2)} \cdot f'\!\left(h_j^{(1)}\right) \cdot x_i
$$
Backpropagation Algorithm (cont.)

$$
\frac{\partial \varepsilon^2}{\partial w_{j,i}^{(1)}}
  = -\sum_k 2\,(o_k - \hat{o}_k) \cdot f'\!\left(h_k^{(2)}\right) \cdot w_{k,j}^{(2)} \cdot f'\!\left(h_j^{(1)}\right) \cdot x_i
$$

Taking the derivative of ε² with respect to $w_{j,i}^{(1)}$ lets us adjust the weight connecting node j of the current layer (Layer 1) with node i of the lower layer (Layer 0). The factors are:

- $(o_k - \hat{o}_k)$ : error from upper node k
- $f'(h_k^{(2)})$ : derivative of upper node k
- $w_{k,j}^{(2)}$ : weight between upper node k and node j of the current layer
- $f'(h_j^{(1)})$ : derivative of node j of the current layer
- $x_i$ : input from lower node i

The sum over k is the back-propagation of the error to node j at the current layer.
Backpropagation Algorithm (cont.)

Derivative of ε² with respect to $w_{k,j}^{(2)}$:
$$
\frac{\partial \varepsilon^2}{\partial w_{k,j}^{(2)}} = -2\,(o_k - \hat{o}_k) \cdot f'\!\left(h_k^{(2)}\right) \cdot y_j
$$

Derivative of ε² with respect to $w_{j,i}^{(1)}$:
$$
\frac{\partial \varepsilon^2}{\partial w_{j,i}^{(1)}} = -\sum_k 2\,(o_k - \hat{o}_k) \cdot f'\!\left(h_k^{(2)}\right) \cdot w_{k,j}^{(2)} \cdot f'\!\left(h_j^{(1)}\right) \cdot x_i
$$

In both cases the factors are: the error at the current node, the derivative of the current node, and the input from the lower node.
Updating Weights : Gradient Descent Method

$$
\Delta w_{j,i}^{(n)} = -\alpha \, \frac{\partial \varepsilon^2}{\partial w_{j,i}^{(n)}} = \alpha \cdot \Delta_j^{(n)} \cdot f'\!\left(h_j^{(n)}\right) \cdot x_i^{(n-1)}
$$
$$
\Delta \theta_j^{(n)} = -\alpha \, \frac{\partial \varepsilon^2}{\partial \theta_j^{(n)}} = \alpha \cdot \Delta_j^{(n)} \cdot f'\!\left(h_j^{(n)}\right)
$$

where $\Delta_j^{(n)}$ is the error back-propagated to node j of layer n and α is the learning rate.

Updating weights and biases:

$$
w_{j,i}^{(n)}(\text{new}) = w_{j,i}^{(n)}(\text{old}) + \Delta w_{j,i}^{(n)}
$$
$$
\theta_j^{(n)}(\text{new}) = \theta_j^{(n)}(\text{old}) + \Delta \theta_j^{(n)}
$$
Adjusting Weights for a Nonlinear Function (Unit)

How do we compute f' when the unit is nonlinear?

1. Sigmoid function:
$$
f(x) = \frac{1}{1 + e^{-2\beta x}}
\quad \Rightarrow \quad
f'(x) = 2\beta \cdot f(x) \cdot (1 - f(x))
$$

2. Hyperbolic tangent:
$$
f(x) = \tanh(\beta x)
\quad \Rightarrow \quad
f'(x) = \beta \cdot \left(1 - f(x)^2\right)
$$

These special forms of f make f' easy to calculate, because the derivative can be expressed in terms of f(x) itself.
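As a small sketch (not from the slides), both activation functions and their derivatives can be written as MATLAB anonymous functions; beta is a free slope parameter:

% Activation functions and their derivatives, expressed in terms of f(x) itself.
beta   = 1;                                         % assumed slope parameter
sig    = @(x) 1 ./ (1 + exp(-2*beta*x));            % sigmoid f(x)
dsig   = @(x) 2*beta .* sig(x) .* (1 - sig(x));     % f'(x) = 2*beta*f*(1-f)
tanhf  = @(x) tanh(beta*x);                         % tanh f(x)
dtanhf = @(x) beta .* (1 - tanhf(x).^2);            % f'(x) = beta*(1-f^2)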
Backpropagation Calculation Demonstration
Example : Application of MLP for classification
Example: Run_XOR_MLP_Newff.m

% Run_XOR_MLP_Newff.m
P = [0 0 1 1; 0 1 0 1];          % XOR function inputs
T = [0 1 1 0];                   % XOR targets
plotpv(P,T,[-1, 2, -1, 2]);      % plot data
PR = [min(P(1,:)) max(P(1,:));
      min(P(2,:)) max(P(2,:))];
S1 = 2; S2 = 1;
TF1 = 'logsig';
TF2 = 'logsig';
PF = 'mse';
%
net = newff(PR,[S1 S2],{TF1 TF2});
%
net.trainParam.epochs = 100;
net.trainParam.goal = 0.001;
net = train(net,P,T);
%
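After training, one might check the network on the four XOR patterns with a short test (a usage sketch; it is not shown on the slide):

Y = sim(net,P);          % continuous network outputs for the four patterns
round(Y)                 % if training reached the goal, reproduces 0 1 1 0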
Example : Application of MLP for classification

Example: Run_MLP_Random.m
Matlab command : Create training data

x = randn([2 200]);
o = (x(1,:).^2+x(2,:).^2)<1;

Input patterns x1 and x2 are generated from random numbers.

Desired output o:
if (x1,x2) lies in a circle of radius 1 centered at the origin then
    o = 1
else
    o = 0

[Figure: the 200 random points in the x1-x2 plane, labelled Class 1 (inside the unit circle) and Class 0 (outside).]
Example : Application of MLP for classification (cont.)

Matlab command : Create a 2-layer network

PR = [min(x(1,:)) max(x(1,:));
      min(x(2,:)) max(x(2,:))];                 % range of inputs
S1 = 10;                                        % no. of nodes in Layer 1
S2 = 1;                                         % no. of nodes in Layer 2
TF1 = 'logsig';                                 % activation function of Layer 1
TF2 = 'logsig';                                 % activation function of Layer 2
BTF = 'traingd';                                % training function
BLF = 'learngd';                                % learning function
PF = 'mse';                                     % cost function
net = newff(PR,[S1 S2],{TF1 TF2},BTF,BLF,PF);   % command for creating the network
Example : Application of MLP for classification (cont.)

Matlab command : Train the network

net.trainParam.epochs = 2000;   % no. of training epochs
net.trainParam.goal = 0.002;    % maximum desired error
net = train(net,x,o);           % training command

y = sim(net,x);                 % compute network outputs (continuous)
netout = y>0.5;                 % convert to binary outputs
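The following slides report the classification error as a count out of 200 patterns; a small sketch (not shown on the slide) of how it can be computed from netout and o:

err = sum(netout ~= o);                                  % number of misclassified patterns
fprintf('Classification Error : %d/%d\n', err, length(o));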
Example : Application of MLP for classification (cont.)

Network structure

[Figure: inputs x1 and x2 feed the 10 sigmoid hidden nodes, which feed one sigmoid output node, followed by a threshold unit that produces the binary output.]
Example : Application of MLP for classification (cont.)

[Figure: initial weights of the 10 hidden-layer nodes displayed as the lines w1*x1 + w2*x2 + θ = 0, overlaid on the Class 0 / Class 1 data.]
Example : Application of MLP for classification (cont.)

Training algorithm: Gradient descent method

[Figure: MSE vs. training epochs on a log scale; after 20000 epochs the performance is 0.151511 while the goal is 0.002 (training curve in blue, goal in black).]
Example : Application of MLP for classification (cont.)

Results obtained using the gradient descent method

Classification Error : 40/200

[Figure: decision boundaries of the trained network in the x1-x2 plane, with the Class 1 and Class 0 points.]
Example : Application of MLP for classification (cont.)

Training algorithm: Levenberg-Marquardt backpropagation

[Figure: MSE vs. training epochs (success within only 10 epochs!); after 10 epochs the performance is 0.00172594, meeting the goal of 0.002 (training curve in blue, goal in black).]
Example : Application of MLP for classification (cont.)

Results obtained using Levenberg-Marquardt backpropagation

Classification Error : 0/200

Only 6 hidden nodes are adequate! (The remaining hidden nodes end up unused.)

[Figure: decision boundaries of the trained network in the x1-x2 plane with the Class 1 and Class 0 points; the lines of the unused hidden nodes lie away from the data.]
Example : Application of MLP for classification (cont.)

Summary: MLP for Classification Problem

- Each lower-layer (hidden) node of the neural network creates a local decision boundary.
- The upper-layer nodes of the neural network combine all local decision boundaries into a global decision boundary.
Example: Application of MLP for function approximation

Example: Run_MLP_SinFunction.m
Matlab command : Create a 2-layer network

PR = [min(x) max(x)];                           % range of inputs
S1 = 6;                                         % no. of nodes in Layer 1
S2 = 1;                                         % no. of nodes in Layer 2
TF1 = 'logsig';                                 % activation function of Layer 1
TF2 = 'purelin';                                % activation function of Layer 2
BTF = 'trainlm';                                % training function
BLF = 'learngd';                                % learning function
PF = 'mse';                                     % cost function
net = newff(PR,[S1 S2],{TF1 TF2},BTF,BLF,PF);   % command for creating the network
Example: Application of MLP for function approximation

Network structure

[Figure: input node x feeds the sigmoid hidden nodes, which feed one linear output node producing y.]
Example: Application of MLP for function approximation

Example: Run_MLP_SinFunction.m

% Run_MLP_SinFunction.m
p = 0:0.25:5;
t = sin(p);
figure;
plot(p,t,'+b');
axis([-0.5 5.5 -1.5 1.5]);
%
net = newff([0 10],[6,1],{'logsig','purelin'},'trainlm');
%
net.trainParam.epochs = 50;
net.trainParam.goal = 0.01;
net = train(net,p,t);
%
a = sim(net,p);
hold on;
plot(p,a,'.r');
% ...
Example: Application of MLP for function approximation

Matlab command : Create a 2-layer network

PR = [min(x) max(x)];                           % range of inputs
S1 = 3;                                         % no. of nodes in Layer 1
S2 = 1;                                         % no. of nodes in Layer 2
TF1 = 'logsig';                                 % activation function of Layer 1
TF2 = 'purelin';                                % activation function of Layer 2
BTF = 'trainlm';                                % training function
BLF = 'learngd';                                % learning function
PF = 'mse';                                     % cost function
net = newff(PR,[S1 S2],{TF1 TF2},BTF,BLF,PF);   % command for creating the network
Example: Application of MLP for function approximation

Function to be approximated:

x = 0:0.01:4;
y = (sin(2*pi*x)+1).*exp(-x.^2);

[Figure: plot of the target function, output y versus input x, on the interval [0, 4].]
Example: Application of MLP for function approximation

Function approximated using the network (3 sigmoid hidden nodes)

The number of hidden nodes is too small!

[Figure: desired output vs. network output over the interval [0, 4], with an inset showing the network structure; the network does not match the target function.]
Example: Application of MLP for function approximation

Matlab command : Create a 2-layer network

PR = [min(x) max(x)];                           % range of inputs
S1 = 5;                                         % no. of nodes in Layer 1
S2 = 1;                                         % no. of nodes in Layer 2
TF1 = 'radbas';                                 % activation function of Layer 1
TF2 = 'purelin';                                % activation function of Layer 2
BTF = 'trainlm';                                % training function
BLF = 'learngd';                                % learning function
PF = 'mse';                                     % cost function
net = newff(PR,[S1 S2],{TF1 TF2},BTF,BLF,PF);   % command for creating the network
Example: Application of MLP for function approximation

Function approximated using the network (5 radial-basis hidden nodes)

[Figure: desired output vs. network output over the interval [0, 4].]
Example: Application of MLP for function approximation

Summary: MLP for Function Approximation Problem

- Each lower-layer (hidden) node of the neural network creates a local (short-range) approximating function.
- The upper-layer nodes of the neural network combine all local approximating functions into a global approximating function covering the whole input range.
Summary
§ Backpropagation can train multilayer feed-forward networks with differentiable transfer functions to perform function approximation, pattern association, and pattern classification.
§ The term backpropagation refers to the process by which derivatives of the network error, with respect to the network weights and biases, can be computed.
§ The number of inputs and outputs of the network is constrained by the problem. However, the number of layers between the network inputs and the output layer, and the sizes of those layers, are up to the designer.
§ A two-layer sigmoid/linear network can represent any functional relationship between inputs and outputs if the sigmoid layer has enough neurons.