Non-Bayes classifiers. Linear discriminants, neural networks.

Page 1: Non-Bayes classifiers. Linear discriminants, neural networks

Non-Bayes classifiers. Linear discriminants, neural networks.

Page 2: Non-Bayes classifiers. Linear discriminants, neural networks

Discriminant functions (1)

Bayes classification rule: decide w_1 if P(w_1|x) - P(w_2|x) > 0, else w_2.

Instead we might try to find a function f_{w_1,w_2}(x): decide w_1 if f_{w_1,w_2}(x) > 0, else w_2.

f_{w_1,w_2}(x) is called a discriminant function.

{x | f_{w_1,w_2}(x) = 0} is the decision surface.

Page 3: Non-Bayes classifiers. Linear discriminants, neural networks

Discriminant functions (2)

[Figure: two example data sets, each with Class 1 and Class 2 samples separated by a linear decision surface.]

Linear discriminant function:

f_{w_1,w_2}(x) = w^T x + w_0

The decision surface is a hyperplane: w^T x + w_0 = 0.
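As a minimal sketch, a linear discriminant can classify a point by the sign of w^T x + w_0; the weight values below are illustrative assumptions, not taken from the slides:

```python
import numpy as np

# Hypothetical example weights for a 2-D linear discriminant f(x) = w^T x + w_0.
w = np.array([1.0, -2.0])
w0 = 0.5

def classify(x):
    """Decide class 1 if f(x) = w^T x + w_0 > 0, else class 2."""
    return 1 if w @ x + w0 > 0 else 2
```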

Page 4: Non-Bayes classifiers. Linear discriminants, neural networks

Linear discriminant – perceptron cost function

Replace w by [w^T, w_0]^T and x by [x^T, 1]^T.

Thus now the decision function is f_{w_1,w_2}(x) = w^T x and the decision surface is w^T x = 0.

Perceptron cost function:

J(w) = \sum_x \delta_x w^T x

where

\delta_x = -1, if x \in w_1 and w^T x < 0
\delta_x = +1, if x \in w_2 and w^T x > 0
\delta_x = 0, if x is correctly classified

Page 5: Non-Bayes classifiers. Linear discriminants, neural networks

Linear discriminant – perceptron cost function

[Figure: Class 1 and Class 2 samples, some misclassified, with their distances to the decision surface.]

Perceptron cost function:

J(w) = \sum_x \delta_x w^T x

The value of J(w) is proportional to the sum of the distances of all misclassified samples to the decision surface.

If the discriminant function separates the classes perfectly, then J(w) = 0. Otherwise J(w) > 0, and we want to minimize it.

J(w) is continuous and piecewise linear, so we might try to use a gradient descent algorithm.

Page 6: Non-Bayes classifiers. Linear discriminants, neural networks

Linear discriminant – Perceptron algorithm

Gradient descent:

w(t+1) = w(t) - \rho_t \frac{\partial J(w)}{\partial w} \Big|_{w = w(t)}

At points where J(w) is differentiable,

\frac{\partial J(w)}{\partial w} = \sum_{x misclassified} \delta_x x

Thus

w(t+1) = w(t) - \rho_t \sum_{x misclassified} \delta_x x

The perceptron algorithm converges when the classes are linearly separable, under some conditions on the learning rate \rho_t.
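The update rule above can be sketched as follows; holding the learning rate rho constant and capping the number of epochs are illustrative assumptions:

```python
import numpy as np

def perceptron(X, labels, rho=1.0, max_epochs=100):
    """Perceptron algorithm sketch.

    X: (N, l) samples; labels: +1 for class w_1, -1 for class w_2.
    Uses extended vectors [x^T, 1]^T so the bias w_0 is learned too.
    """
    Xe = np.hstack([X, np.ones((len(X), 1))])   # x <- [x^T, 1]^T
    w = np.zeros(Xe.shape[1])
    for _ in range(max_epochs):
        # For misclassified x, delta_x = -label, so the gradient step
        # w <- w - rho * sum(delta_x * x) becomes w <- w + rho * sum(label * x).
        mis = labels * (Xe @ w) <= 0            # misclassified samples
        if not mis.any():
            break                               # perfect separation: J(w) = 0
        w = w + rho * (labels[mis, None] * Xe[mis]).sum(axis=0)
    return w
```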

Page 7: Non-Bayes classifiers. Linear discriminants, neural networks

Sum of error squares estimation

We want to find a discriminant function f_{w_1,w_2}(x) = w^T x whose output is similar to the desired output y(x).

Let y(x) denote the desired output function: y(x) = 1 for one class and y(x) = -1 for the other.

Use the sum of error squares as the similarity criterion:

J(w) = \sum_{i=1}^N (y_i - w^T x_i)^2

\hat{w} = \arg\min_w J(w)

Page 8: Non-Bayes classifiers. Linear discriminants, neural networks

Sum of error squares estimation

Minimize the mean square error:

\frac{\partial J(\hat{w})}{\partial w} = -2 \sum_{i=1}^N x_i (y_i - x_i^T \hat{w}) = 0

Thus

\hat{w} = \Big( \sum_{i=1}^N x_i x_i^T \Big)^{-1} \sum_{i=1}^N x_i y_i
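The closed-form solution can be sketched with NumPy; the helper name is illustrative:

```python
import numpy as np

def least_squares_weights(X, y):
    """Closed-form sum-of-error-squares estimate:
    w_hat = (sum_i x_i x_i^T)^{-1} sum_i x_i y_i.
    X: (N, l) samples; y: (N,) desired outputs in {+1, -1}.
    """
    A = X.T @ X              # sum of x_i x_i^T
    b = X.T @ y              # sum of x_i y_i
    return np.linalg.solve(A, b)
```

Solving the linear system directly avoids forming the explicit matrix inverse.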

Page 9: Non-Bayes classifiers. Linear discriminants, neural networks

Neurons

Page 10: Non-Bayes classifiers. Linear discriminants, neural networks

Artificial neuron.

[Figure: inputs x_1, x_2, ..., x_l with weights w_1, w_2, ..., w_l, a bias w_0, and a threshold function f.]

The figure above represents an artificial neuron calculating:

y = f\Big( \sum_{i=1}^l w_i x_i + w_0 \Big)

Page 11: Non-Bayes classifiers. Linear discriminants, neural networks

Artificial neuron. Threshold functions f:

Step function:

f(x) = 1 if x >= 0, and f(x) = 0 if x < 0

Logistic function:

f(x) = \frac{1}{1 + e^{-ax}}
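Both threshold functions, and a neuron using them, can be sketched as:

```python
import numpy as np

def step(x):
    """Step threshold: 1 for x >= 0, else 0."""
    return np.where(x >= 0, 1.0, 0.0)

def logistic(x, a=1.0):
    """Logistic threshold f(x) = 1 / (1 + exp(-a x)); a sets the slope."""
    return 1.0 / (1.0 + np.exp(-a * x))

def neuron(x, w, w0, f=logistic):
    """Artificial neuron: y = f(sum_i w_i x_i + w_0)."""
    return f(np.dot(w, x) + w0)
```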

Page 12: Non-Bayes classifiers. Linear discriminants, neural networks

Combining artificial neurons

[Figure: inputs x_1, x_2, ..., x_l feeding layers of interconnected neurons.]

Multilayer perceptron with 3 layers.

Page 13: Non-Bayes classifiers. Linear discriminants, neural networks
Page 14: Non-Bayes classifiers. Linear discriminants, neural networks

Discriminating ability of multilayer perceptron

Since a 3-layer perceptron can approximate any smooth function, it can approximate the optimal discriminant function of two classes:

F(x) = P(w_1|x) - P(w_2|x)

Page 15: Non-Bayes classifiers. Linear discriminants, neural networks

Training of multilayer perceptron

[Figure: neurons in layer r-1 produce outputs y_k^{r-1}, which are connected by weights w_{jk}^r to neuron j in layer r, computing activation v_j^r and output y_j^r.]

Page 16: Non-Bayes classifiers. Linear discriminants, neural networks

Training and cost function

Desired network output: x(i) -> y(i)

Trained network output: x(i) -> \hat{y}(i)

Cost function for one training sample:

E(i) = \frac{1}{2} \sum_{m=1}^{k_L} (y_m(i) - \hat{y}_m(i))^2

Total cost function:

J = \sum_{i=1}^N E(i)

Goal of the training: find values of w_{jk}^r which minimize the cost function J.
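The two cost functions can be sketched directly; the helper names are illustrative:

```python
import numpy as np

def sample_cost(y, y_hat):
    """E(i) = 1/2 * sum_m (y_m(i) - y_hat_m(i))^2 for one training sample."""
    return 0.5 * np.sum((np.asarray(y) - np.asarray(y_hat)) ** 2)

def total_cost(Y, Y_hat):
    """J = sum_i E(i) over all N training samples."""
    return sum(sample_cost(y, y_hat) for y, y_hat in zip(Y, Y_hat))
```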

Page 17: Non-Bayes classifiers. Linear discriminants, neural networks

Gradient descent

Denote:

w_j^r = [w_{j0}^r, w_{j1}^r, ..., w_{j k_{r-1}}^r]^T

Gradient descent:

w_j^r(new) = w_j^r(old) - \mu \frac{\partial J}{\partial w_j^r}

Since J = \sum_{i=1}^N E(i), we might want to update the weights after processing each training sample separately:

w_j^r(new) = w_j^r(old) - \mu \frac{\partial E(i)}{\partial w_j^r}

Page 18: Non-Bayes classifiers. Linear discriminants, neural networks

Gradient descent

Chain rule for differentiating composite functions:

\frac{\partial E(i)}{\partial w_j^r} = \frac{\partial E(i)}{\partial v_j^r(i)} \frac{\partial v_j^r(i)}{\partial w_j^r} = \frac{\partial E(i)}{\partial v_j^r(i)} y^{r-1}(i)

Denote:

\delta_j^r(i) = \frac{\partial E(i)}{\partial v_j^r(i)}

Page 19: Non-Bayes classifiers. Linear discriminants, neural networks

Backpropagation

If r = L, then

\delta_j^L(i) = \frac{\partial E(i)}{\partial v_j^L(i)}
             = \frac{\partial}{\partial v_j^L(i)} \frac{1}{2} \sum_m (f(v_m^L(i)) - y_m(i))^2
             = (f(v_j^L(i)) - y_j(i)) f'(v_j^L(i))
             = e_j(i) f'(v_j^L(i))

If r < L, then

\delta_j^{r-1}(i) = \frac{\partial E(i)}{\partial v_j^{r-1}(i)}
                  = \sum_k \frac{\partial E(i)}{\partial v_k^r(i)} \frac{\partial v_k^r(i)}{\partial v_j^{r-1}(i)}
                  = \Big( \sum_k \delta_k^r(i) w_{kj}^r \Big) f'(v_j^{r-1}(i))

Page 20: Non-Bayes classifiers. Linear discriminants, neural networks

Backpropagation algorithm

• Initialization: initialize all weights with random values.
• Forward computations: for each training vector x(i) compute all v_j^r(i), y_j^r(i).
• Backward computations: for each i, j and r = L, L-1, ..., 2 compute \delta_j^{r-1}(i).
• Update weights:

w_j^r(new) = w_j^r(old) - \mu \sum_{i=1}^N \delta_j^r(i) y^{r-1}(i)
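The steps above can be sketched for a network with one hidden layer and logistic activations, updating after each sample; the layer sizes, learning rate mu, and epoch count are illustrative assumptions, not values from the slides:

```python
import numpy as np

def f(v, a=1.0):
    """Logistic threshold function."""
    return 1.0 / (1.0 + np.exp(-a * v))

def f_prime(v):
    """Derivative of the logistic function: f'(v) = f(v)(1 - f(v)) for a = 1."""
    s = f(v)
    return s * (1.0 - s)

def train_mlp(X, Y, hidden=4, mu=0.5, epochs=2000, seed=0):
    """Backpropagation sketch for a network with one hidden layer.

    X: (N, l) inputs; Y: (N, m) desired outputs in [0, 1].
    The bias w_{j0} is handled by appending 1 to each layer's input.
    """
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0, 0.5, (hidden, X.shape[1] + 1))   # hidden-layer weights
    W2 = rng.normal(0, 0.5, (Y.shape[1], hidden + 1))   # output-layer weights
    for _ in range(epochs):
        for x, y in zip(X, Y):
            # Forward computations: activations v_j^r and outputs y_j^r
            x1 = np.append(x, 1.0)
            v1 = W1 @ x1
            y1 = np.append(f(v1), 1.0)
            v2 = W2 @ y1
            y2 = f(v2)
            # Backward computations: delta^L, then delta^{L-1}
            d2 = (y2 - y) * f_prime(v2)             # e_j(i) f'(v_j^L(i))
            d1 = (W2[:, :-1].T @ d2) * f_prime(v1)  # (sum_k delta_k w_kj) f'(v_j^{r-1})
            # Update weights after each sample
            W2 -= mu * np.outer(d2, y1)
            W1 -= mu * np.outer(d1, x1)
    return W1, W2

def predict(W1, W2, x):
    """Forward pass through the trained network."""
    y1 = np.append(f(W1 @ np.append(x, 1.0)), 1.0)
    return f(W2 @ y1)
```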

Page 21: Non-Bayes classifiers. Linear discriminants, neural networks

MLP issues

• What is the best network configuration?
• How to choose a proper learning parameter \mu?
• When should training be stopped?
• Should another threshold function f or cost function J be chosen?