
CS434a/541a: Pattern Recognition
Prof. Olga Veksler

Lecture 4


Outline

- Normal Random Variable
- Properties
- Discriminant functions

Why Normal Random Variables?

- Analytically tractable
- Works well when the observation comes from a corrupted single prototype (µ)
- Many classifiers used in practice are optimal when the data is normally distributed

The Univariate Normal Density

The univariate normal density is

    p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left[ -\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2 \right]

where:
- µ = mean (or expected value) of x
- σ² = variance
- x is a scalar (has dimension 1)
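A minimal numpy sketch of this density; the parameter values below are arbitrary illustrations.

```python
import numpy as np

def univariate_normal_pdf(x, mu, sigma):
    """p(x) = 1/(sqrt(2*pi)*sigma) * exp(-0.5*((x - mu)/sigma)**2)."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2 * np.pi) * sigma)

# Illustrative values: the density peaks at x = mu and decays away from it.
print(univariate_normal_pdf(0.0, mu=0.0, sigma=1.0))   # ~0.3989
print(univariate_normal_pdf(2.0, mu=0.0, sigma=1.0))   # ~0.0540
```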


Several Features

- What if we have several features x1, x2, …, xd?
  - each normally distributed
  - may have different means
  - may have different variances
  - may be dependent or independent of each other
- How do we model their joint distribution?

The Multivariate Normal Density

The multivariate normal density in d dimensions is

    p(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (x-\mu)^t \Sigma^{-1} (x-\mu) \right]

where

    x = [x_1, x_2, \dots, x_d]^t,  \quad  \mu = [\mu_1, \mu_2, \dots, \mu_d]^t

and Σ is the d-by-d covariance matrix

    \Sigma = \begin{bmatrix} \sigma_1^2 & \cdots & \sigma_{1d} \\ \vdots & \ddots & \vdots \\ \sigma_{d1} & \cdots & \sigma_d^2 \end{bmatrix}

Here σ_1d is the covariance of x1 and xd, Σ^{-1} is the inverse of Σ, and |Σ| is the determinant of Σ.

- Each xi is N(µi, σi²)
  - to prove this, integrate out all other features from the joint density
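A sketch of evaluating this density directly from the formula, cross-checked against scipy.stats.multivariate_normal; the values of µ, Σ, and x are arbitrary illustrations.

```python
import numpy as np
from scipy.stats import multivariate_normal

def mvn_pdf(x, mu, Sigma):
    """Evaluate the d-dimensional normal density at x."""
    d = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.inv(Sigma) @ diff          # (x-mu)^t Sigma^{-1} (x-mu)
    norm = (2 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

mu = np.array([0.0, 0.0])                  # illustrative parameters
Sigma = np.array([[1.0, 0.5],
                  [0.5, 1.0]])
x = np.array([1.0, -1.0])

print(mvn_pdf(x, mu, Sigma))                           # manual formula
print(multivariate_normal.pdf(x, mean=mu, cov=Sigma))  # scipy cross-check
```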


More on Σ

- Σ plays a role similar to the role that σ² plays in one dimension
- From Σ we can find out:
  1. The individual variances of the features x1, x2, …, xd (the diagonal entries σi²)
  2. Whether features xi and xj are
     - independent: σij = 0
     - positively correlated: σij > 0
     - negatively correlated: σij < 0
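A small numpy sketch of reading this information off an estimated covariance matrix; the synthetic data and the 0.8 coupling factor are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 2-feature data with a positive dependence between the features.
x1 = rng.normal(0.0, 1.0, size=1000)
x2 = 0.8 * x1 + rng.normal(0.0, 0.5, size=1000)

Sigma = np.cov(np.stack([x1, x2]))   # 2x2 covariance estimate; rows are features
print(Sigma[0, 0], Sigma[1, 1])      # individual variances sigma_1^2, sigma_2^2
print(Sigma[0, 1])                   # sigma_12 > 0 -> positive correlation
```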


The Multivariate Normal Density

- If Σ is diagonal, then the features x1, …, xd are independent, and

    p(x) = \prod_{i=1}^{d} \frac{1}{\sqrt{2\pi}\,\sigma_i} \exp\left[ -\frac{(x_i-\mu_i)^2}{2\sigma_i^2} \right]

- Example of a diagonal covariance matrix:

    \Sigma = \begin{bmatrix} \sigma_1^2 & 0 & 0 \\ 0 & \sigma_2^2 & 0 \\ 0 & 0 & \sigma_3^2 \end{bmatrix}
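A quick numerical check, with arbitrary illustrative numbers, that for a diagonal Σ the joint density equals the product of the univariate densities.

```python
import numpy as np

mu = np.array([1.0, -2.0, 0.5])
sigmas = np.array([1.0, 2.0, 0.5])       # standard deviations sigma_i
Sigma = np.diag(sigmas ** 2)             # diagonal covariance matrix
x = np.array([0.0, -1.0, 1.0])

# Joint density from the multivariate formula.
diff = x - mu
quad = diff @ np.linalg.inv(Sigma) @ diff
joint = np.exp(-0.5 * quad) / ((2 * np.pi) ** (3 / 2) * np.sqrt(np.linalg.det(Sigma)))

# Product of the univariate densities p(x_i).
prod = np.prod(np.exp(-0.5 * ((x - mu) / sigmas) ** 2) / (np.sqrt(2 * np.pi) * sigmas))

print(np.allclose(joint, prod))   # True: the two agree for diagonal Sigma
```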


The Multivariate Normal Density

    p(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left[ -\frac{1}{2} (x-\mu)^t \Sigma^{-1} (x-\mu) \right]

- The leading factor 1/((2π)^{d/2}|Σ|^{1/2}) is a normalizing constant
- The exponent (x−µ)^t Σ^{-1} (x−µ) is a scalar (a single number); the closer it is to 0, the larger p(x) is

Written out for d = 3:

    p(x) = c \cdot \exp\left[ -\frac{1}{2} \begin{bmatrix} x_1-\mu_1 & x_2-\mu_2 & x_3-\mu_3 \end{bmatrix} \begin{bmatrix} \sigma_1^2 & \sigma_{12} & \sigma_{13} \\ \sigma_{12} & \sigma_2^2 & \sigma_{23} \\ \sigma_{13} & \sigma_{23} & \sigma_3^2 \end{bmatrix}^{-1} \begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \\ x_3-\mu_3 \end{bmatrix} \right]

- Thus p(x) is larger for smaller (x−µ)^t Σ^{-1} (x−µ)

- Σ is positive semidefinite (x^t Σ x ≥ 0 for all x)
- If x^t Σ x = 0 for some nonzero x, then det(Σ) = 0. This case is not interesting: p(x) is not defined. It happens when
  1. one feature is a constant (has zero variance), or
  2. two features are multiples of each other
- So we will assume Σ is positive definite (x^t Σ x > 0 for all nonzero x)
- If Σ is positive definite, then so is Σ^{-1}

Consider the quadratic form (x−µ)^t Σ^{-1} (x−µ).

- A positive definite d-by-d matrix has d positive real eigenvalues, and its d eigenvectors can be chosen orthonormal
- Thus if Φ is the matrix whose columns are the normalized eigenvectors of Σ, then Φ^{-1} = Φ^t
- ΣΦ = ΦΛ, where Λ is the diagonal matrix with the corresponding eigenvalues on the diagonal
- Thus Σ = ΦΛΦ^{-1} and Σ^{-1} = ΦΛ^{-1}Φ^{-1} = ΦΛ^{-1}Φ^t
- Let Λ^{-1/2} denote the diagonal matrix such that Λ^{-1/2} Λ^{-1/2} = Λ^{-1}
- Thus

    \Sigma^{-1} = \Phi \Lambda^{-1/2} \Lambda^{-1/2} \Phi^t = \left( \Lambda^{-1/2}\Phi^t \right)^t \left( \Lambda^{-1/2}\Phi^t \right) = M^t M, \quad M = \Lambda^{-1/2}\Phi^t

  where Φ^t acts as a rotation matrix and Λ^{-1/2} as a scaling matrix
- Thus

    (x-\mu)^t \Sigma^{-1} (x-\mu) = (x-\mu)^t M^t M (x-\mu) = \left( M(x-\mu) \right)^t \left( M(x-\mu) \right) = \| M(x-\mu) \|^2

- Points x which satisfy ||M(x−µ)||² = const lie on an ellipse

Euclidean vs. Mahalanobis distance (figure):

- (x−µ)^t (x−µ) is the usual (Euclidean) squared distance between x and µ; points x at equal Euclidean distance from µ lie on a circle, and p(x) decreases at the same rate in every direction
- (x−µ)^t Σ^{-1} (x−µ) is the squared Mahalanobis distance between x and µ; points x at equal Mahalanobis distance from µ lie on an ellipse: Σ stretches circles into ellipses, with axes along the eigenvectors of Σ, so p(x) decreases slowly along one eigenvector direction and fast along the other
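A sketch of this decomposition with an arbitrary positive definite Σ: the eigendecomposition gives M = Λ^{-1/2}Φ^t, and ||M(x−µ)||² matches (x−µ)^t Σ^{-1} (x−µ).

```python
import numpy as np

Sigma = np.array([[2.0, 0.9],
                  [0.9, 1.0]])            # illustrative positive definite covariance
mu = np.array([1.0, -1.0])
x = np.array([3.0, 0.5])

eigvals, Phi = np.linalg.eigh(Sigma)      # Sigma = Phi Lambda Phi^t, Phi orthonormal
M = np.diag(eigvals ** -0.5) @ Phi.T      # M = Lambda^{-1/2} Phi^t

maha_direct = (x - mu) @ np.linalg.inv(Sigma) @ (x - mu)
maha_via_M = np.sum((M @ (x - mu)) ** 2)  # ||M(x - mu)||^2

print(np.allclose(maha_direct, maha_via_M))   # True
```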

2-d Multivariate Normal Density

- Can you see much in this graph?
- At most you can see that the mean is around [0,0], but can’t really tell if x1 and x2 are correlated

2-d Multivariate Normal Density

- How about this graph?

2-d Multivariate Normal Density

- Level curves graph
  - p(x) is constant along each contour
  - a topological map of the 3-d surface
- Now we can see much more
  - x1 and x2 are independent
  - σ1² and σ2² are equal

2-d Multivariate Normal Density

Contour plots for different parameters (figure):

    \Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \mu = [0,0]^t; \quad
    \Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}, \mu = [1,1]^t; \quad
    \Sigma = \begin{bmatrix} 1 & 0 \\ 0 & 4 \end{bmatrix}, \mu = [0,0]^t; \quad
    \Sigma = \begin{bmatrix} 4 & 0 \\ 0 & 1 \end{bmatrix}, \mu = [0,0]^t

2-d Multivariate Normal Density

Contour plots for correlated features (figure), with µ = [0,0]^t:

    \Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}, \quad
    \begin{bmatrix} 1 & 0.9 \\ 0.9 & 1 \end{bmatrix}, \quad
    \begin{bmatrix} 1 & 0.9 \\ 0.9 & 4 \end{bmatrix}

    \Sigma = \begin{bmatrix} 1 & -0.5 \\ -0.5 & 1 \end{bmatrix}, \quad
    \begin{bmatrix} 1 & -0.9 \\ -0.9 & 1 \end{bmatrix}, \quad
    \begin{bmatrix} 1 & -0.9 \\ -0.9 & 4 \end{bmatrix}

The Multivariate Normal Density

- If X has density N(µ, Σ), then A^t X has density N(A^t µ, A^t Σ A)
- Thus X can be transformed into a spherical normal variable (the covariance of a spherical density is the identity matrix I) with the whitening transform

    A_w = \Phi \Lambda^{-1/2}

- Example (figure): data with Σ = [[1, 0.9],[0.9, 1]] maps to data with Σ = I under X → A_w^t X
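A sketch of the whitening transform; the covariance mirrors the example above, while the sample size and random seed are arbitrary.

```python
import numpy as np

Sigma = np.array([[1.0, 0.9],
                  [0.9, 1.0]])
rng = np.random.default_rng(1)
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=5000)   # rows are samples

eigvals, Phi = np.linalg.eigh(Sigma)
A_w = Phi @ np.diag(eigvals ** -0.5)      # whitening transform A_w = Phi Lambda^{-1/2}

Y = X @ A_w                               # y = A_w^t x, applied row-wise
print(np.cov(Y, rowvar=False))            # approximately the identity matrix
```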

Discriminant Functions

- A classifier can be viewed as a network which computes m discriminant functions g1(x), g2(x), …, gm(x) from the features x1, x2, …, xd and selects the category corresponding to the largest discriminant (figure: feature inputs, discriminant functions, select class giving the maximum)
- gi(x) can be replaced with f(gi(x)) for any monotonically increasing function f; the results will be unchanged

Discriminant Functions

- The minimum error-rate classification is achieved by the discriminant function

    g_i(x) = P(c_i \mid x) = \frac{P(x \mid c_i)\,P(c_i)}{P(x)}

- Since P(x) is the same for every class, an equivalent discriminant function is

    g_i(x) = P(x \mid c_i)\,P(c_i)

- For the normal density it is convenient to take logarithms. Since the logarithm is a monotonically increasing function, an equivalent discriminant function is

    g_i(x) = \ln P(x \mid c_i) + \ln P(c_i)

Discriminant Functions for the Normal Density

- Suppose that for class ci the class-conditional density p(x|ci) is N(µi, Σi):

    p(x \mid c_i) = \frac{1}{(2\pi)^{d/2} |\Sigma_i|^{1/2}} \exp\left[ -\frac{1}{2} (x-\mu_i)^t \Sigma_i^{-1} (x-\mu_i) \right]

- Discriminant function: gi(x) = ln P(x|ci) + ln P(ci)
- Plug in p(x|ci) and P(ci) to get

    g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma_i^{-1} (x-\mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)

- The term (d/2) ln 2π is constant for all i, so it can be dropped:

    g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma_i^{-1} (x-\mu_i) - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)
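A direct sketch of this discriminant function; the two classes' parameters below are arbitrary illustrations.

```python
import numpy as np

def g(x, mu, Sigma, prior):
    """g_i(x) = -1/2 (x-mu)^t Sigma^{-1} (x-mu) - 1/2 ln|Sigma| + ln P(c_i)."""
    diff = x - mu
    return (-0.5 * diff @ np.linalg.inv(Sigma) @ diff
            - 0.5 * np.log(np.linalg.det(Sigma))
            + np.log(prior))

# Two illustrative classes; classify x by the larger discriminant.
x = np.array([0.5, 1.0])
g1 = g(x, np.array([0.0, 0.0]), np.array([[1.0, 0.0], [0.0, 1.0]]), prior=0.5)
g2 = g(x, np.array([2.0, 2.0]), np.array([[2.0, 0.5], [0.5, 1.0]]), prior=0.5)
print("class 1" if g1 > g2 else "class 2")
```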

Case Σi = σ²I

- That is,

    \Sigma_i = \begin{bmatrix} \sigma^2 & 0 & 0 \\ 0 & \sigma^2 & 0 \\ 0 & 0 & \sigma^2 \end{bmatrix} = \sigma^2 \cdot \begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}

- In this case, the features x1, x2, …, xd are independent with different means and equal variances σ² (figure: circular level curves centered at µ)

Case Σi = σ²I

- Discriminant function:

    g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma_i^{-1} (x-\mu_i) - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)

- Here det(Σi) = σ^{2d} and Σi^{-1} = (1/σ²) I:

    \Sigma_i^{-1} = \begin{bmatrix} 1/\sigma^2 & 0 & 0 \\ 0 & 1/\sigma^2 & 0 \\ 0 & 0 & 1/\sigma^2 \end{bmatrix}

- Can simplify the discriminant function:

    g_i(x) = -\frac{1}{2}(x-\mu_i)^t \frac{I}{\sigma^2} (x-\mu_i) - \frac{1}{2}\ln \sigma^{2d} + \ln P(c_i)

  The term -(1/2) ln σ^{2d} is constant for all i, so drop it:

    g_i(x) = -\frac{1}{2\sigma^2}(x-\mu_i)^t (x-\mu_i) + \ln P(c_i) = -\frac{1}{2\sigma^2}\|x-\mu_i\|^2 + \ln P(c_i)

Case Σi = σ²I: Geometric Interpretation

- If ln P(ci) = ln P(cj) for all i, j (equal priors), then effectively

    g_i(x) = -\|x-\mu_i\|^2

  so x is assigned to the class of the nearest mean. The decision regions form a Voronoi diagram: points in each cell are closer to the mean in that cell than to any other mean (figure: decision regions for c1, c2, c3 around the means µ1, µ2, µ3).

- If ln P(ci) ≠ ln P(cj), then

    g_i(x) = -\frac{1}{2\sigma^2}\|x-\mu_i\|^2 + \ln P(c_i)

  and the boundaries shift toward the means of the less probable classes (figure: a point x near µ1 now falls in the decision region for c3).

Case Σi = σ²I

    g_i(x) = -\frac{1}{2\sigma^2}(x-\mu_i)^t(x-\mu_i) + \ln P(c_i)
           = -\frac{1}{2\sigma^2}\left( x^t x - 2\mu_i^t x + \mu_i^t \mu_i \right) + \ln P(c_i)

- The term x^t x is constant for all classes, so drop it:

    g_i(x) = \frac{1}{\sigma^2}\mu_i^t x - \frac{1}{2\sigma^2}\mu_i^t \mu_i + \ln P(c_i)

- That is, gi(x) = wi^t x + wi0 with

    w_i = \frac{\mu_i}{\sigma^2}, \qquad w_{i0} = -\frac{\mu_i^t \mu_i}{2\sigma^2} + \ln P(c_i)

- So the discriminant function is linear

Case Σi = σ²I

    g_i(x) = w_i^t x + w_{i0}

- w_i^t x = \sum_{k=1}^{d} w_{ik} x_k is linear in x, and wi0 is constant in x
- Thus the discriminant function is linear, and therefore the decision boundaries gi(x) = gj(x) are linear:
  - lines if x has dimension 2
  - planes if x has dimension 3
  - hyperplanes if x has dimension larger than 3

Case Σi = σ²I: Example

- 3 classes, each a 2-dimensional Gaussian with

    \mu_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad \mu_2 = \begin{bmatrix} 4 \\ 6 \end{bmatrix}, \quad \mu_3 = \begin{bmatrix} -2 \\ 4 \end{bmatrix}, \quad \Sigma_1 = \Sigma_2 = \Sigma_3 = \begin{bmatrix} 3 & 0 \\ 0 & 3 \end{bmatrix}

- Priors: P(c1) = P(c2) = 1/4 and P(c3) = 1/2
- Discriminant function:

    g_i(x) = \frac{\mu_i^t}{\sigma^2} x - \frac{\mu_i^t \mu_i}{2\sigma^2} + \ln P(c_i)

- Plug in the parameters for each class:

    g_1(x) = \frac{1}{3}\begin{bmatrix} 1 & 2 \end{bmatrix} x - \left( \frac{5}{6} + 1.38 \right)

    g_2(x) = \frac{1}{3}\begin{bmatrix} 4 & 6 \end{bmatrix} x - \left( \frac{52}{6} + 1.38 \right)

    g_3(x) = \frac{1}{3}\begin{bmatrix} -2 & 4 \end{bmatrix} x - \left( \frac{20}{6} + 0.69 \right)

Case Σi = σ²I: Example

- Need to find out where gi(x) > gj(x) for i, j = 1, 2, 3
- Can be done by solving gi(x) = gj(x) for i, j = 1, 2, 3
- Let's take g1(x) = g2(x) first:

    \frac{1}{3}\begin{bmatrix} 1 & 2 \end{bmatrix} x - \left( \frac{5}{6} + 1.38 \right) = \frac{1}{3}\begin{bmatrix} 4 & 6 \end{bmatrix} x - \left( \frac{52}{6} + 1.38 \right)

- Simplifying:

    \frac{1}{3}\begin{bmatrix} -3 & -4 \end{bmatrix} x = -\frac{47}{6}, \quad \text{i.e.} \quad -x_1 - \frac{4}{3}x_2 = -\frac{47}{6} \quad \text{(a line equation)}

Case Σi = σ²I: Example

- Next solve g2(x) = g3(x):

    2x_1 + \frac{2}{3}x_2 = 6.02

- Then solve g1(x) = g3(x):

    x_1 - \frac{2}{3}x_2 = -1.81

- And finally solve g1(x) = g2(x) = g3(x): the three boundaries meet at

    x_1 = 1.4 \quad \text{and} \quad x_2 = 4.82

Case Σi = σ²I: Example

- Priors: P(c1) = P(c2) = 1/4 and P(c3) = 1/2
- Figure: the three linear boundaries g1(x) = g2(x), g2(x) = g3(x), g1(x) = g3(x) partition the plane into the decision regions for c1, c2, c3
- The lines connecting the means are perpendicular to the decision boundaries
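A sketch that plugs the example's parameters into the linear discriminant and checks that the three discriminants agree (up to rounding of the coordinates) at the reported intersection point x = (1.4, 4.82).

```python
import numpy as np

sigma2 = 3.0
mus = [np.array([1.0, 2.0]), np.array([4.0, 6.0]), np.array([-2.0, 4.0])]
priors = [0.25, 0.25, 0.5]

def g(i, x):
    # Linear discriminant for Sigma_i = sigma^2 I.
    return (mus[i] @ x) / sigma2 - (mus[i] @ mus[i]) / (2 * sigma2) + np.log(priors[i])

x_star = np.array([1.4, 4.82])     # claimed intersection of all three boundaries
print([round(g(i, x_star), 2) for i in range(3)])   # three nearly equal values
```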

Case Σi = Σ

- Covariance matrices are equal but arbitrary
- In this case, the features x1, x2, …, xd are not necessarily independent, for example

    \Sigma = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}

Case Σi = Σ

- Discriminant function:

    g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma^{-1} (x-\mu_i) - \frac{1}{2}\ln|\Sigma| + \ln P(c_i)

- The term (1/2) ln|Σ| is constant for all classes, so the discriminant function becomes

    g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma^{-1} (x-\mu_i) + \ln P(c_i)

  where (x−µi)^t Σ^{-1} (x−µi) is the squared Mahalanobis distance from x to µi

- Mahalanobis distance:

    \|x-y\|^2_{\Sigma^{-1}} = (x-y)^t \Sigma^{-1} (x-y)

- If Σ = I, the Mahalanobis distance becomes the usual Euclidean distance:

    \|x-y\|^2_{I} = (x-y)^t (x-y)

Euclidean vs. Mahalanobis Distances

- Euclidean: ||x−µ||² = (x−µ)^t (x−µ); points x at equal Euclidean distance from µ lie on a circle
- Mahalanobis: ||x−µ||²_{Σ^{-1}} = (x−µ)^t Σ^{-1} (x−µ); points x at equal Mahalanobis distance from µ lie on an ellipse: Σ stretches circles to ellipses, with axes along the eigenvectors of Σ

Case Σi = Σ: Geometric Interpretation

- If ln P(ci) = ln P(cj) for all i, j (equal priors), then effectively

    g_i(x) = -\|x-\mu_i\|^2_{\Sigma^{-1}}

  so x is assigned to the class of the closest mean under the Mahalanobis distance; points in each decision region are closer to the mean in that region than to any other mean under Mahalanobis distance (figure: decision regions for c1, c2, c3)

- If ln P(ci) ≠ ln P(cj), then

    g_i(x) = -\frac{1}{2}\|x-\mu_i\|^2_{\Sigma^{-1}} + \ln P(c_i)

  and the decision boundaries shift accordingly (figure: decision regions for c1, c2, c3 with unequal priors)

Case Σi = Σ

- Can simplify the discriminant function:

    g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma^{-1} (x-\mu_i) + \ln P(c_i)
           = -\frac{1}{2}\left( x^t \Sigma^{-1} x - 2\mu_i^t \Sigma^{-1} x + \mu_i^t \Sigma^{-1} \mu_i \right) + \ln P(c_i)

- The term x^t Σ^{-1} x is constant for all classes, so drop it:

    g_i(x) = \mu_i^t \Sigma^{-1} x - \frac{1}{2}\mu_i^t \Sigma^{-1} \mu_i + \ln P(c_i)

- Thus in this case the discriminant is also linear:

    g_i(x) = w_i^t x + w_{i0}, \qquad w_i^t = \mu_i^t \Sigma^{-1}, \qquad w_{i0} = -\frac{1}{2}\mu_i^t \Sigma^{-1} \mu_i + \ln P(c_i)

Case Σi = Σ: Example

- 3 classes, each a 2-dimensional Gaussian with

    \mu_1 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad \mu_2 = \begin{bmatrix} -1 \\ 5 \end{bmatrix}, \quad \mu_3 = \begin{bmatrix} -2 \\ 4 \end{bmatrix}, \quad \Sigma_1 = \Sigma_2 = \Sigma_3 = \begin{bmatrix} 1 & -1.5 \\ -1.5 & 4 \end{bmatrix}

- Priors: P(c1) = P(c2) = 1/4 and P(c3) = 1/2
- Again the boundaries can be found by solving gi(x) = gj(x) for i, j = 1, 2, 3, with gi(x) = wi^t x + wi0 (a row vector times x plus a scalar)

Case Σi = Σ: Example

- Let's solve gj(x) = gi(x) in general first:

    \mu_j^t \Sigma^{-1} x - \frac{1}{2}\mu_j^t \Sigma^{-1} \mu_j + \ln P(c_j) = \mu_i^t \Sigma^{-1} x - \frac{1}{2}\mu_i^t \Sigma^{-1} \mu_i + \ln P(c_i)

- Let's regroup the terms:

    \left( \mu_j - \mu_i \right)^t \Sigma^{-1} x = \ln\frac{P(c_i)}{P(c_j)} + \frac{1}{2}\left( \mu_j^t \Sigma^{-1} \mu_j - \mu_i^t \Sigma^{-1} \mu_i \right)

- This gives the line where gj(x) = gi(x)

Case Σi = Σ: Example

- Now substitute for i, j = 1, 2:

    \begin{bmatrix} 2 & 0 \end{bmatrix} x = 0, \quad \text{i.e.} \quad x_1 = 0

- Now substitute for i, j = 2, 3:

    3.14\,x_1 + 1.43\,x_2 = 2.41

- Now substitute for i, j = 1, 3:

    5.14\,x_1 + 1.43\,x_2 = 2.41

Case Σi = Σ: Example

- Priors: P(c1) = P(c2) = 1/4 and P(c3) = 1/2
- Figure: the three linear boundaries partition the plane into the decision regions for c1, c2, c3
- The lines connecting the means are not, in general, perpendicular to the decision boundaries

General Case: Σi are arbitrary

- Covariance matrices for each class are arbitrary
- In this case, the features x1, x2, …, xd are not necessarily independent, and the covariances may differ from class to class, for example

    \Sigma_i = \begin{bmatrix} 1 & 0.5 \\ 0.5 & 1 \end{bmatrix}, \qquad \Sigma_j = \begin{bmatrix} 1 & -0.9 \\ -0.9 & 4 \end{bmatrix}

General Case: Σi are arbitrary

- From the previous discussion,

    g_i(x) = -\frac{1}{2}(x-\mu_i)^t \Sigma_i^{-1} (x-\mu_i) - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)

- This can't be simplified further, but we can rearrange it:

    g_i(x) = -\frac{1}{2}\left( x^t \Sigma_i^{-1} x - 2\mu_i^t \Sigma_i^{-1} x + \mu_i^t \Sigma_i^{-1} \mu_i \right) - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)

           = x^t \left( -\frac{1}{2}\Sigma_i^{-1} \right) x + \left( \Sigma_i^{-1}\mu_i \right)^t x + \left( -\frac{1}{2}\mu_i^t \Sigma_i^{-1} \mu_i - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i) \right)

- That is,

    g_i(x) = x^t W_i x + w_i^t x + w_{i0}

General Case: Σi are arbitrary

    g_i(x) = x^t W_i x + w_i^t x + w_{i0}

- x^t W_i x = \sum_{k=1}^{d}\sum_{l=1}^{d} w_{kl}\, x_k x_l is quadratic in x, wi^t x is linear in x, and wi0 is constant in x
- Thus the discriminant function is quadratic, and therefore the decision boundaries gi(x) = gj(x) are quadratic (for example ellipses and paraboloids)
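A sketch that builds Wi, wi, wi0 for one class with arbitrary illustrative parameters and checks that the quadratic form matches the direct formula gi(x) = −(1/2)(x−µi)^tΣi^{-1}(x−µi) − (1/2)ln|Σi| + ln P(ci).

```python
import numpy as np

mu = np.array([0.0, 1.0])                      # illustrative class parameters
Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])
prior = 0.3
Sigma_inv = np.linalg.inv(Sigma)

W = -0.5 * Sigma_inv                           # W_i
w = Sigma_inv @ mu                             # w_i
w0 = (-0.5 * mu @ Sigma_inv @ mu
      - 0.5 * np.log(np.linalg.det(Sigma)) + np.log(prior))   # w_i0

x = np.array([1.5, -0.5])
quadratic_form = x @ W @ x + w @ x + w0
direct = (-0.5 * (x - mu) @ Sigma_inv @ (x - mu)
          - 0.5 * np.log(np.linalg.det(Sigma)) + np.log(prior))
print(np.allclose(quadratic_form, direct))     # True
```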

General Case: Σi are arbitrary: Example

- 3 classes, each a 2-dimensional Gaussian with

    \mu_1 = \begin{bmatrix} -1 \\ 3 \end{bmatrix}, \quad \mu_2 = \begin{bmatrix} 0 \\ 6 \end{bmatrix}, \quad \mu_3 = \begin{bmatrix} -2 \\ 4 \end{bmatrix}

    \Sigma_1 = \begin{bmatrix} 1 & -0.5 \\ -0.5 & 2 \end{bmatrix}, \quad \Sigma_2 = \begin{bmatrix} 2 & -2 \\ -2 & 7 \end{bmatrix}, \quad \Sigma_3 = \begin{bmatrix} 1 & 1.5 \\ 1.5 & 3 \end{bmatrix}

- Priors: P(c1) = P(c2) = 1/4 and P(c3) = 1/2
- Again the boundaries can be found by solving gi(x) = gj(x) for i, j = 1, 2, 3, with

    g_i(x) = x^t \left( -\frac{1}{2}\Sigma_i^{-1} \right) x + \left( \Sigma_i^{-1}\mu_i \right)^t x - \frac{1}{2}\mu_i^t \Sigma_i^{-1} \mu_i - \frac{1}{2}\ln|\Sigma_i| + \ln P(c_i)

- Need to solve a bunch of quadratic inequalities in 2 variables

General Case: Σi are arbitrary: Example

- Priors: P(c1) = P(c2) = 1/4 and P(c3) = 1/2
- Figure: the quadratic decision boundaries and the decision regions for c1, c2, c3; the region labeled c1 appears in two separate pieces

Important Points

- The Bayes classifier for normally distributed classes is in general quadratic
- If the covariance matrices are equal and proportional to the identity matrix, the Bayes classifier is linear
  - If, in addition, the priors on the classes are equal, the Bayes classifier is the minimum Euclidean distance classifier
- If the covariance matrices are equal, the Bayes classifier is linear
  - If, in addition, the priors on the classes are equal, the Bayes classifier is the minimum Mahalanobis distance classifier
- Popular classifiers (minimum Euclidean and Mahalanobis distance) are optimal only if the distribution of the data is appropriate (normal)