Pattern Classification
All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.
Chapter 2 (Part 3): Bayesian Decision Theory
(Sections 2.6–2.9)
• Discriminant Functions for the Normal Density
• Bayes Decision Theory – Discrete Features
Pattern Classification, Chapter 2 (Part 3)
Discriminant Functions for the Normal Density
• We saw that the minimum error-rate classification can be achieved by the discriminant function
gi(x) = ln p(x | ωi) + ln P(ωi)
• Case of the multivariate normal:

gi(x) = −(1/2)(x − μi)^t Σi^−1 (x − μi) − (d/2) ln 2π − (1/2) ln|Σi| + ln P(ωi)
Case Σi = σ2I (I stands for the identity matrix)
• What does “Σi = σ2I” say about the dimensions?
• What about the variance of each dimension?
Note: both |Σi| = σ^2d and the (d/2) ln 2π term are independent of i, so

gi(x) = −(1/2)(x − μi)^t Σi^−1 (x − μi) − (d/2) ln 2π − (1/2) ln|Σi| + ln P(ωi)

can be simplified to

gi(x) = −||x − μi||^2 / (2σ^2) + ln P(ωi)

where ||·|| denotes the Euclidean norm.
• We can further simplify by recognizing that the quadratic term xtx implicit in the Euclidean norm is the same for all i.
gi(x) = wi^t x + wi0   (a linear discriminant function)

where:

wi = μi / σ^2 ;  wi0 = −(μi^t μi) / (2σ^2) + ln P(ωi)

(wi0 is called the threshold for the i-th category!)
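As a sanity check on this linear form, here is a minimal NumPy sketch (the function name and the numbers are my own, not from the text). With equal priors and Σi = σ²I, the rule reduces to nearest-mean classification:

```python
import numpy as np

def linear_discriminant(x, mu_i, sigma2, prior_i):
    """g_i(x) = w_i^t x + w_i0 for the case Sigma_i = sigma^2 I."""
    w_i = mu_i / sigma2                                  # w_i = mu_i / sigma^2
    w_i0 = -mu_i @ mu_i / (2 * sigma2) + np.log(prior_i) # threshold term
    return w_i @ x + w_i0

# Two equal-prior classes: the rule picks the nearer mean.
mu1, mu2 = np.array([0.0, 0.0]), np.array([4.0, 0.0])
x = np.array([1.0, 1.0])   # closer to mu1
g1 = linear_discriminant(x, mu1, 1.0, 0.5)
g2 = linear_discriminant(x, mu2, 1.0, 0.5)
print(g1 > g2)  # True: decide omega_1
```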
• A classifier that uses linear discriminant functions is called “a linear machine”
• The decision surfaces for a linear machine are pieces of hyperplanes defined by:
gi(x) = gj(x)
This equation can be written as: w^t (x − x0) = 0
• The hyperplane separating Ri and Rj is always orthogonal to the line linking the means!
x0 = (1/2)(μi + μj) − [σ^2 / ||μi − μj||^2] ln[P(ωi)/P(ωj)] (μi − μj)

If P(ωi) = P(ωj), then x0 = (1/2)(μi + μj).
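The expression for x0 can be checked numerically. In this NumPy sketch (names and numbers are illustrative assumptions), equal priors give the midpoint, while a larger prior for ωi pushes the boundary away from μi:

```python
import numpy as np

def hyperplane_point(mu_i, mu_j, sigma2, prior_i, prior_j):
    """x0 on the decision boundary for the Sigma_i = sigma^2 I case."""
    diff = mu_i - mu_j
    shift = sigma2 / (diff @ diff) * np.log(prior_i / prior_j)
    return 0.5 * (mu_i + mu_j) - shift * diff

mu_i, mu_j = np.array([2.0, 0.0]), np.array([0.0, 0.0])
x0_equal = hyperplane_point(mu_i, mu_j, 1.0, 0.5, 0.5)
# equal priors -> x0 is the midpoint of the means
x0_skewed = hyperplane_point(mu_i, mu_j, 1.0, 0.9, 0.1)
# prior favors omega_i -> boundary shifts toward mu_j
```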
• Case Σi = Σ (the covariance matrices of all classes are identical but otherwise arbitrary!)
The hyperplane separating Ri and Rj is generally not orthogonal to the line between the means!
The boundary has the equation w^t (x − x0) = 0, where

w = Σ^−1 (μi − μj)

and

x0 = (1/2)(μi + μj) − [ln(P(ωi)/P(ωj)) / ((μi − μj)^t Σ^−1 (μi − μj))] (μi − μj)
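A short NumPy sketch of the shared-covariance boundary (the function name is mine). It also illustrates the remark above: w = Σ⁻¹(μi − μj) is generally not parallel to μi − μj, so the hyperplane is not orthogonal to the line between the means:

```python
import numpy as np

def lda_boundary(mu_i, mu_j, Sigma, prior_i, prior_j):
    """w and x0 for the shared-covariance case: w^t (x - x0) = 0."""
    Sigma_inv = np.linalg.inv(Sigma)
    diff = mu_i - mu_j
    w = Sigma_inv @ diff
    x0 = (0.5 * (mu_i + mu_j)
          - np.log(prior_i / prior_j) / (diff @ Sigma_inv @ diff) * diff)
    return w, x0

mu_i, mu_j = np.array([1.0, 1.0]), np.array([-1.0, -1.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
w, x0 = lda_boundary(mu_i, mu_j, Sigma, 0.5, 0.5)
# equal priors: x0 is the midpoint; w is NOT parallel to mu_i - mu_j
```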
• Case Σi = arbitrary
• The covariance matrices are different for each category
The decision surfaces are hyperquadrics (hyperplanes, pairs of hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, or hyperhyperboloids)
gi(x) = x^t Wi x + wi^t x + wi0

where:

Wi = −(1/2) Σi^−1 ;  wi = Σi^−1 μi ;  wi0 = −(1/2) μi^t Σi^−1 μi − (1/2) ln|Σi| + ln P(ωi)
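The quadratic discriminant follows directly from these definitions. A minimal NumPy sketch (names are my own); up to the common (d/2) ln 2π term, gi(x) equals ln p(x|ωi) + ln P(ωi):

```python
import numpy as np

def quadratic_discriminant(x, mu, Sigma, prior):
    """g_i(x) = x^t W_i x + w_i^t x + w_i0 for arbitrary Sigma_i."""
    Sigma_inv = np.linalg.inv(Sigma)
    W = -0.5 * Sigma_inv                       # W_i = -(1/2) Sigma_i^-1
    w = Sigma_inv @ mu                         # w_i = Sigma_i^-1 mu_i
    w0 = (-0.5 * mu @ Sigma_inv @ mu
          - 0.5 * np.log(np.linalg.det(Sigma))
          + np.log(prior))                     # w_i0
    return x @ W @ x + w @ x + w0

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
x = np.array([0.5, 0.5])
g = quadratic_discriminant(x, mu, Sigma, 0.5)
```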
Bayes Decision Theory – Discrete Features
• Components of x are binary or integer valued; x can take on only one of m discrete values v1, v2, …, vm
• We are now concerned with probabilities rather than probability densities in Bayes' formula:
P(ωj | x) = P(x | ωj) P(ωj) / P(x), where P(x) = Σ_{j=1}^{c} P(x | ωj) P(ωj)
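A tiny numeric illustration of this formula (the likelihood and prior values are made up):

```python
import numpy as np

def posteriors(likelihoods, priors):
    """P(w_j|x) = P(x|w_j) P(w_j) / P(x), with P(x) summed over all classes."""
    joint = likelihoods * priors   # P(x|w_j) P(w_j) for each j
    return joint / joint.sum()     # divide by P(x)

# hypothetical 2-class problem
post = posteriors(np.array([0.3, 0.1]), np.array([0.4, 0.6]))
print(post)  # posteriors sum to 1
```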
• Conditional risk is defined as before: R(α|x)
• Approach is still to minimize risk:
α* = arg min_i R(αi | x)
• Case of independent binary features in a 2-category problem:
Let x = [x1, x2, …, xd]^t, where each xi is either 0 or 1, with probabilities:
pi = P(xi = 1 | ω1) ;  qi = P(xi = 1 | ω2)
• Assuming conditional independence, P(x|ωi) can be written as a product of component probabilities:
P(x | ω1) = Π_{i=1}^{d} pi^{xi} (1 − pi)^{1−xi}

and

P(x | ω2) = Π_{i=1}^{d} qi^{xi} (1 − qi)^{1−xi}

yielding a likelihood ratio given by:

P(x | ω1) / P(x | ω2) = Π_{i=1}^{d} (pi/qi)^{xi} ((1 − pi)/(1 − qi))^{1−xi}
• Taking our likelihood ratio

P(x | ω1) / P(x | ω2) = Π_{i=1}^{d} (pi/qi)^{xi} ((1 − pi)/(1 − qi))^{1−xi}

and plugging it into Eq. 31:

g(x) = ln [P(x | ω1) / P(x | ω2)] + ln [P(ω1) / P(ω2)]

yields:

g(x) = Σ_{i=1}^{d} [ xi ln(pi/qi) + (1 − xi) ln((1 − pi)/(1 − qi)) ] + ln [P(ω1)/P(ω2)]
• The discriminant function in this case is:
g(x) = Σ_{i=1}^{d} wi xi + w0

where:

wi = ln [ pi (1 − qi) / (qi (1 − pi)) ], i = 1, …, d

and:

w0 = Σ_{i=1}^{d} ln [(1 − pi)/(1 − qi)] + ln [P(ω1)/P(ω2)]

Decide ω1 if g(x) > 0 and ω2 if g(x) ≤ 0
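The full decision rule can be sketched in NumPy as follows (the probability values are illustrative; the weight formulas are exactly the ones above):

```python
import numpy as np

def binary_bayes_weights(p, q, prior1, prior2):
    """Weights of g(x) = sum_i w_i x_i + w_0 for independent binary
    features, with p_i = P(x_i=1|w1) and q_i = P(x_i=1|w2)."""
    w = np.log(p * (1 - q) / (q * (1 - p)))
    w0 = np.log((1 - p) / (1 - q)).sum() + np.log(prior1 / prior2)
    return w, w0

p = np.array([0.8, 0.7, 0.6])   # P(x_i = 1 | omega_1)
q = np.array([0.2, 0.3, 0.4])   # P(x_i = 1 | omega_2)
w, w0 = binary_bayes_weights(p, q, 0.5, 0.5)

x = np.array([1, 1, 0])
g = w @ x + w0
print("decide omega_1" if g > 0 else "decide omega_2")
```

For equal priors, g(x) equals the log-likelihood ratio ln[P(x|ω1)/P(x|ω2)], which offers a direct consistency check on the weight formulas.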