
Page 1

Pattern Classification

All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.

Page 2

Chapter 2 (Part 3): Bayesian Decision Theory

(Sections 2.6, 2.9)

• Discriminant Functions for the Normal Density

• Bayes Decision Theory – Discrete Features

Page 3

Discriminant Functions for the Normal Density

• We saw that minimum error-rate classification can be achieved by the discriminant function

g_i(x) = ln p(x | ω_i) + ln P(ω_i)

• Case of the multivariate normal density:

g_i(x) = -\frac{1}{2}(x - \mu_i)^t \Sigma_i^{-1} (x - \mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)
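As a sketch of how this discriminant can be evaluated (the parameters mu, Sigma, and prior below are placeholders, not values from the slides):

```python
import numpy as np

def g(x, mu, Sigma, prior):
    """Discriminant g_i(x) for a multivariate normal class-conditional density."""
    d = mu.shape[0]
    diff = x - mu
    return (-0.5 * diff @ np.linalg.solve(Sigma, diff)  # -1/2 (x-mu)^t Sigma^-1 (x-mu)
            - 0.5 * d * np.log(2 * np.pi)               # -(d/2) ln 2*pi
            - 0.5 * np.linalg.slogdet(Sigma)[1]         # -1/2 ln |Sigma_i|
            + np.log(prior))                            # + ln P(omega_i)

# Classification picks the class with the largest discriminant, e.g.:
# i_star = np.argmax([g(x, mu_i, Sigma_i, P_i) for (mu_i, Sigma_i, P_i) in classes])
```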

Page 4


Case Σ_i = σ²I (I is the identity matrix)

• What does "Σ_i = σ²I" say about the dimensions? The features are statistically independent.

• What about the variance of each dimension? Each feature has the same variance, σ².

Note: both \frac{d}{2}\ln 2\pi and \frac{1}{2}\ln|\Sigma_i| in

g_i(x) = -\frac{1}{2}(x - \mu_i)^t \Sigma_i^{-1} (x - \mu_i) - \frac{d}{2}\ln 2\pi - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)

are independent of i. Thus we can simplify to:

g_i(x) = -\frac{\|x - \mu_i\|^2}{2\sigma^2} + \ln P(\omega_i)

where \|\cdot\| denotes the Euclidean norm.

Page 5


• We can further simplify by recognizing that the quadratic term x^t x implicit in the Euclidean norm is the same for all i.

g_i(x) = w_i^t x + w_{i0} \qquad \text{(linear discriminant function)}

where:

w_i = \frac{1}{\sigma^2}\mu_i \,; \qquad w_{i0} = -\frac{1}{2\sigma^2}\mu_i^t \mu_i + \ln P(\omega_i)

(w_{i0} is called the threshold for the i-th category!)
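A minimal sketch, assuming toy values for σ², the means, and the priors, showing that this linear form ranks classes exactly as the Euclidean-norm form does (the two differ only by a term that is the same for every class):

```python
import numpy as np

sigma2 = 2.0                                        # assumed common variance sigma^2
mus = [np.array([0.0, 0.0]), np.array([3.0, 1.0])]  # assumed class means
priors = [0.6, 0.4]                                 # assumed priors P(omega_i)

x = np.array([1.5, 0.2])

for mu, P in zip(mus, priors):
    w = mu / sigma2                                  # w_i = mu_i / sigma^2
    w0 = -mu @ mu / (2 * sigma2) + np.log(P)         # threshold w_i0
    g_lin = w @ x + w0
    g_euc = -np.sum((x - mu) ** 2) / (2 * sigma2) + np.log(P)
    # g_euc - g_lin = -x^t x / (2 sigma^2): identical for all i, so argmax agrees
    print(g_lin, g_euc, g_euc - g_lin)
```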

Page 6


• A classifier that uses linear discriminant functions is called a "linear machine".

• The decision surfaces for a linear machine are pieces of hyperplanes defined by:

g_i(x) = g_j(x)

The equation can be written as: w^t(x - x_0) = 0

Page 7


• The hyperplane separating R_i and R_j is always orthogonal to the line linking the means!

x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\sigma^2}{\|\mu_i - \mu_j\|^2} \ln\frac{P(\omega_i)}{P(\omega_j)} \,(\mu_i - \mu_j)

If P(\omega_i) = P(\omega_j), then x_0 = \frac{1}{2}(\mu_i + \mu_j).
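A short check, with assumed toy parameters, that this x_0 really lies on the decision boundary g_i(x) = g_j(x):

```python
import numpy as np

sigma2 = 1.0                                              # assumed sigma^2
mu_i, mu_j = np.array([0.0, 0.0]), np.array([4.0, 0.0])   # assumed means
P_i, P_j = 0.7, 0.3                                       # assumed priors

x0 = (0.5 * (mu_i + mu_j)
      - sigma2 / np.sum((mu_i - mu_j) ** 2)
        * np.log(P_i / P_j) * (mu_i - mu_j))

def g(x, mu, P):
    """Discriminant for the Sigma_i = sigma^2 I case."""
    return -np.sum((x - mu) ** 2) / (2 * sigma2) + np.log(P)

print(np.isclose(g(x0, mu_i, P_i), g(x0, mu_j, P_j)))     # True: x0 is on the boundary
```

With unequal priors, x_0 shifts away from the more probable class along the line joining the means, exactly as the formula's log-ratio term dictates.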

Page 8 (figure)

Page 9 (figure)

Page 10 (figure)

Page 11

• Case Σ_i = Σ (the covariance matrices of all classes are identical but otherwise arbitrary!)

The hyperplane separating R_i and R_j is generally not orthogonal to the line between the means!

It has the equation w^t(x - x_0) = 0, where

w = \Sigma^{-1}(\mu_i - \mu_j)

and

x_0 = \frac{1}{2}(\mu_i + \mu_j) - \frac{\ln\left[P(\omega_i)/P(\omega_j)\right]}{(\mu_i - \mu_j)^t \Sigma^{-1} (\mu_i - \mu_j)} \,(\mu_i - \mu_j)
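A sketch under assumed toy parameters. Note that w = Σ^{-1}(μ_i − μ_j) is generally not parallel to μ_i − μ_j, which is why the boundary need not be orthogonal to the line between the means:

```python
import numpy as np

Sigma = np.array([[2.0, 0.3],
                  [0.3, 1.0]])                     # assumed shared covariance
mu_i, mu_j = np.array([0.0, 0.0]), np.array([3.0, 1.0])  # assumed means
P_i, P_j = 0.5, 0.5                                # assumed priors

w = np.linalg.solve(Sigma, mu_i - mu_j)            # w = Sigma^{-1}(mu_i - mu_j)
x0 = (0.5 * (mu_i + mu_j)
      - np.log(P_i / P_j) / ((mu_i - mu_j) @ w) * (mu_i - mu_j))

# The boundary is w^t (x - x0) = 0; compare w's direction with mu_i - mu_j:
print(w, mu_i - mu_j, x0)
```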

Page 12 (figure)

Page 13 (figure)

Page 14

• Case Σ_i = arbitrary

• The covariance matrices are different for each category

The decision surfaces are hyperquadrics. (Hyperquadrics include: hyperplanes, pairs of hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, and hyperhyperboloids.)

g_i(x) = x^t W_i x + w_i^t x + w_{i0}

where:

W_i = -\frac{1}{2}\Sigma_i^{-1}

w_i = \Sigma_i^{-1}\mu_i

w_{i0} = -\frac{1}{2}\mu_i^t \Sigma_i^{-1} \mu_i - \frac{1}{2}\ln|\Sigma_i| + \ln P(\omega_i)
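A sketch of this quadratic discriminant with assumed parameters; expanding the quadratic shows it differs from the log-density form on Page 3 only by the (d/2) ln 2π term, which is constant across classes:

```python
import numpy as np

def quad_g(x, mu, Sigma, prior):
    """g_i(x) = x^t W_i x + w_i^t x + w_i0 for arbitrary Sigma_i (a sketch)."""
    Sinv = np.linalg.inv(Sigma)
    W = -0.5 * Sinv                                 # W_i = -1/2 Sigma_i^{-1}
    w = Sinv @ mu                                   # w_i = Sigma_i^{-1} mu_i
    w0 = (-0.5 * mu @ Sinv @ mu                     # -1/2 mu_i^t Sigma_i^{-1} mu_i
          - 0.5 * np.linalg.slogdet(Sigma)[1]       # -1/2 ln |Sigma_i|
          + np.log(prior))                          # + ln P(omega_i)
    # Equals the Page 3 discriminant plus (d/2) ln 2*pi, the same for every class.
    return x @ W @ x + w @ x + w0
```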

Page 15 (figure)

Page 16 (figure)

Page 17

Bayes Decision Theory – Discrete Features

• Components of x are binary or integer valued; x can take only one of m discrete values v_1, v_2, …, v_m

• We are then concerned with probabilities rather than probability densities in Bayes' formula:

P(\omega_j \mid x) = \frac{P(x \mid \omega_j)\, P(\omega_j)}{P(x)}

where

P(x) = \sum_{j=1}^{c} P(x \mid \omega_j)\, P(\omega_j)
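A minimal sketch with assumed toy likelihoods and priors:

```python
import numpy as np

# Assumed toy values: P(x | omega_j) for one observed x, and priors P(omega_j)
likelihoods = np.array([0.20, 0.05, 0.10])    # P(x | omega_1..omega_3)
priors = np.array([0.5, 0.3, 0.2])

evidence = np.sum(likelihoods * priors)       # P(x) = sum_j P(x|omega_j) P(omega_j)
posteriors = likelihoods * priors / evidence  # P(omega_j | x)
print(posteriors, posteriors.sum())           # posteriors sum to 1
```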

Page 18

Bayes Decision Theory – Discrete Features

• Conditional risk is defined as before: R(α|x)

• Approach is still to minimize risk:

\alpha^* = \arg\min_i R(\alpha_i \mid x)
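A minimal sketch, assuming a toy loss matrix λ(α_i | ω_j) and posteriors, using the earlier definition R(α_i | x) = Σ_j λ(α_i | ω_j) P(ω_j | x):

```python
import numpy as np

lam = np.array([[0.0, 1.0],     # assumed losses lambda(alpha_i | omega_j);
                [2.0, 0.0]])    # rows: actions, columns: true states
posteriors = np.array([0.3, 0.7])  # P(omega_j | x) from Bayes' formula

R = lam @ posteriors            # R(alpha_i | x) = sum_j lambda_ij P(omega_j | x)
alpha_star = np.argmin(R)       # alpha* = argmin_i R(alpha_i | x)
print(R, alpha_star)
```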

Page 19

Bayes Decision Theory – Discrete Features

• Case of independent binary features in the 2-category problem:

Let x = [x_1, x_2, …, x_d]^t where each x_i is either 0 or 1, with probabilities:

p_i = P(x_i = 1 | ω_1)
q_i = P(x_i = 1 | ω_2)

Page 20

Bayes Decision Theory – Discrete Features

• Assuming conditional independence, P(x|ωi) can be written as a product of component probabilities:

P(x \mid \omega_1) = \prod_{i=1}^{d} p_i^{x_i} (1 - p_i)^{1 - x_i}

and

P(x \mid \omega_2) = \prod_{i=1}^{d} q_i^{x_i} (1 - q_i)^{1 - x_i}

yielding a likelihood ratio given by:

\frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} = \prod_{i=1}^{d} \left(\frac{p_i}{q_i}\right)^{x_i} \left(\frac{1 - p_i}{1 - q_i}\right)^{1 - x_i}
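A short sketch with assumed values for p_i and q_i, confirming that the ratio of the two product likelihoods matches the likelihood-ratio formula:

```python
import numpy as np

p = np.array([0.8, 0.3, 0.6])           # assumed p_i = P(x_i = 1 | omega_1)
q = np.array([0.4, 0.5, 0.2])           # assumed q_i = P(x_i = 1 | omega_2)
x = np.array([1, 0, 1])                 # an observed binary feature vector

P1 = np.prod(p**x * (1 - p)**(1 - x))   # P(x | omega_1)
P2 = np.prod(q**x * (1 - q)**(1 - x))   # P(x | omega_2)
ratio = np.prod((p / q)**x * ((1 - p) / (1 - q))**(1 - x))
print(P1 / P2, ratio)                   # identical by construction
```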

Page 21

Bayes Decision Theory – Discrete Features

• Taking our likelihood ratio and plugging it into Eq. 31,

g(x) = \ln\frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} + \ln\frac{P(\omega_1)}{P(\omega_2)}

yields:

g(x) = \sum_{i=1}^{d}\left[\, x_i \ln\frac{p_i}{q_i} + (1 - x_i)\ln\frac{1 - p_i}{1 - q_i} \,\right] + \ln\frac{P(\omega_1)}{P(\omega_2)}

Page 22

• The discriminant function in this case is:

g(x) = \sum_{i=1}^{d} w_i x_i + w_0

where:

w_i = \ln\frac{p_i (1 - q_i)}{q_i (1 - p_i)}, \quad i = 1, \ldots, d

and:

w_0 = \sum_{i=1}^{d} \ln\frac{1 - p_i}{1 - q_i} + \ln\frac{P(\omega_1)}{P(\omega_2)}

Decide \omega_1 if g(x) > 0 and \omega_2 if g(x) \le 0.
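A minimal sketch of the resulting linear classifier, with assumed values for p_i, q_i, and the priors:

```python
import numpy as np

p = np.array([0.8, 0.3, 0.6])   # assumed P(x_i = 1 | omega_1)
q = np.array([0.4, 0.5, 0.2])   # assumed P(x_i = 1 | omega_2)
P1, P2 = 0.6, 0.4               # assumed priors P(omega_1), P(omega_2)

w = np.log(p * (1 - q) / (q * (1 - p)))                    # w_i
w0 = np.sum(np.log((1 - p) / (1 - q))) + np.log(P1 / P2)   # w_0

def decide(x):
    g = w @ x + w0                  # g(x) = sum_i w_i x_i + w_0
    return 1 if g > 0 else 2        # decide omega_1 if g(x) > 0, else omega_2

print(decide(np.array([1, 0, 1])))
```

Note that g(x) is linear in the x_i, so even with discrete features the two-category Bayes classifier here is a linear machine.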