
Chapter 2 (Part 2): Bayesian Decision Theory (Sections 2.3-2.5)


Page 1: Chapter 2 (Part 2):  Bayesian Decision Theory (Sections 2.3-2.5)


Pattern Classification

All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000 with the permission of the authors and the publisher

Page 2: Chapter 2 (Part 2):  Bayesian Decision Theory (Sections 2.3-2.5)

Chapter 2 (Part 2): Bayesian Decision Theory

(Sections 2.3-2.5)

• Minimum-Error-Rate Classification

• Classifiers, Discriminant Functions and Decision Surfaces

• The Normal Density

Page 3: Chapter 2 (Part 2):  Bayesian Decision Theory (Sections 2.3-2.5)


Minimum-Error-Rate Classification

• Actions are decisions on classes. If action αᵢ is taken and the true state of nature is ωⱼ, then the decision is correct if i = j and in error if i ≠ j.

• Seek a decision rule that minimizes the probability of error, which is the error rate.

Page 4: Chapter 2 (Part 2):  Bayesian Decision Theory (Sections 2.3-2.5)


• Introduction of the zero-one loss function:

$$\lambda(\alpha_i \mid \omega_j) = \begin{cases} 0 & i = j \\ 1 & i \neq j \end{cases} \qquad i, j = 1, \dots, c$$

Therefore, the conditional risk is:

$$R(\alpha_i \mid x) = \sum_{j=1}^{c} \lambda(\alpha_i \mid \omega_j)\,P(\omega_j \mid x) = \sum_{j \neq i} P(\omega_j \mid x) = 1 - P(\omega_i \mid x)$$

"The risk corresponding to this loss function is the average probability of error."
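
As a quick illustration, here is a minimal Python sketch of the conditional-risk computation; the posterior values are assumed for the example. It confirms that under zero-one loss the risk of each action is one minus the corresponding posterior:

```python
import numpy as np

# Assumed posteriors P(w_j | x) for a three-class problem (hypothetical values).
posteriors = np.array([0.6, 0.3, 0.1])
c = len(posteriors)

# Zero-one loss matrix: lambda(a_i | w_j) = 0 if i == j, 1 otherwise.
loss = 1.0 - np.eye(c)

# Conditional risk R(a_i | x) = sum_j lambda(a_i | w_j) * P(w_j | x).
risk = loss @ posteriors
print(risk)                          # [0.4 0.7 0.9], i.e. 1 - posteriors
```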

Page 5: Chapter 2 (Part 2):  Bayesian Decision Theory (Sections 2.3-2.5)


• Minimizing the risk requires maximizing P(ωᵢ | x)

(since R(αᵢ | x) = 1 − P(ωᵢ | x))

• For minimum error rate:

• Decide ωᵢ if P(ωᵢ | x) > P(ωⱼ | x) for all j ≠ i
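
A minimal sketch of this rule, again with assumed posteriors: deciding by the largest posterior is the same as deciding by the smallest conditional risk.

```python
import numpy as np

posteriors = np.array([0.6, 0.3, 0.1])   # P(w_i | x), assumed values

decision = int(np.argmax(posteriors))    # maximize the posterior ...
assert decision == int(np.argmin(1.0 - posteriors))   # ... = minimize the risk
print(f"decide w{decision + 1}")         # -> decide w1
```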

Page 6: Chapter 2 (Part 2):  Bayesian Decision Theory (Sections 2.3-2.5)


• Regions of decision and zero-one loss function, therefore:

Let

$$\frac{\lambda_{12} - \lambda_{22}}{\lambda_{21} - \lambda_{11}} \cdot \frac{P(\omega_2)}{P(\omega_1)} = \theta_\lambda$$

then decide ω₁ if:

$$\frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} > \theta_\lambda$$

• If λ is the zero-one loss function, which means:

$$\lambda = \begin{pmatrix} 0 & 1 \\ 1 & 0 \end{pmatrix}, \quad \text{then } \theta_a = \frac{P(\omega_2)}{P(\omega_1)}$$

$$\text{If } \lambda = \begin{pmatrix} 0 & 2 \\ 1 & 0 \end{pmatrix}, \quad \text{then } \theta_b = \frac{2\,P(\omega_2)}{P(\omega_1)}$$
Page 7: Chapter 2 (Part 2):  Bayesian Decision Theory (Sections 2.3-2.5)


Page 8: Chapter 2 (Part 2):  Bayesian Decision Theory (Sections 2.3-2.5)

Classifiers, Discriminant Functions and Decision Surfaces

• The multi-category case

• Set of discriminant functions gᵢ(x), i = 1, …, c

• The classifier assigns a feature vector x to class ωᵢ if: gᵢ(x) > gⱼ(x) for all j ≠ i
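
A minimal sketch of such a classifier; the two linear discriminants are hypothetical, and only the argmax structure matters:

```python
import numpy as np

def classify(x, discriminants):
    """Assign x to the class whose discriminant g_i(x) is largest."""
    return int(np.argmax([g(x) for g in discriminants])) + 1   # classes 1..c

# Hypothetical discriminant functions, purely for illustration.
gs = [lambda x: 2.0 * x - 1.0,
      lambda x: -x + 3.0]
print(classify(1.0, gs))   # g1(1) = 1.0, g2(1) = 2.0 -> class 2
```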

Page 9: Chapter 2 (Part 2):  Bayesian Decision Theory (Sections 2.3-2.5)


Page 10: Chapter 2 (Part 2):  Bayesian Decision Theory (Sections 2.3-2.5)


• Let gᵢ(x) = −R(αᵢ | x)

(max. discriminant corresponds to min. risk!)

• For the minimum error rate, we take gᵢ(x) = P(ωᵢ | x)

(max. discriminant corresponds to max. posterior!)

Equivalent discriminants:

gᵢ(x) = P(x | ωᵢ) P(ωᵢ)

gᵢ(x) = ln P(x | ωᵢ) + ln P(ωᵢ)

(ln: natural logarithm!)
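
Because ln is monotonically increasing, replacing the product form by the log form never changes the decision. A small numeric check with assumed likelihoods and priors:

```python
import numpy as np

likelihoods = np.array([0.05, 0.20])   # P(x | w_i), assumed values
priors = np.array([0.7, 0.3])          # P(w_i), assumed values

g_prod = likelihoods * priors                   # g_i(x) = P(x|w_i) P(w_i)
g_log = np.log(likelihoods) + np.log(priors)    # g_i(x) = ln P(x|w_i) + ln P(w_i)

assert np.argmax(g_prod) == np.argmax(g_log)    # same decision either way
```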

Page 11: Chapter 2 (Part 2):  Bayesian Decision Theory (Sections 2.3-2.5)


• Feature space divided into c decision regions:

if gᵢ(x) > gⱼ(x) for all j ≠ i, then x is in Rᵢ

(Rᵢ means: assign x to ωᵢ)

• The two-category case

• A classifier is a "dichotomizer" that has two discriminant functions g₁ and g₂

Let g(x) ≡ g₁(x) − g₂(x)

Decide ω₁ if g(x) > 0; otherwise decide ω₂

Page 12: Chapter 2 (Part 2):  Bayesian Decision Theory (Sections 2.3-2.5)


• The computation of g(x):

$$g(x) = P(\omega_1 \mid x) - P(\omega_2 \mid x)$$

$$g(x) = \ln \frac{P(x \mid \omega_1)}{P(x \mid \omega_2)} + \ln \frac{P(\omega_1)}{P(\omega_2)}$$
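
A sketch of a dichotomizer built on the log form of g(x), with assumed priors and density values:

```python
import numpy as np

def dichotomize(px_w1, px_w2, p_w1, p_w2):
    """Decide w1 iff g(x) = ln[P(x|w1)/P(x|w2)] + ln[P(w1)/P(w2)] > 0."""
    g = np.log(px_w1 / px_w2) + np.log(p_w1 / p_w2)
    return 1 if g > 0 else 2

# Assumed values: the density ratio 4 outweighs the prior ratio 3/7.
print(dichotomize(0.4, 0.1, 0.3, 0.7))   # g ≈ 1.386 - 0.847 > 0 -> w1
```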

Page 13: Chapter 2 (Part 2):  Bayesian Decision Theory (Sections 2.3-2.5)


Page 14: Chapter 2 (Part 2):  Bayesian Decision Theory (Sections 2.3-2.5)


The Normal Density

• Univariate density

• Density which is analytically tractable

• Continuous density

• Many processes are asymptotically Gaussian

• Handwritten characters and speech sounds can be viewed as ideal prototypes corrupted by a random process (central limit theorem)

$$P(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\,\exp\!\left[-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2\right]$$

where: μ = mean (or expected value) of x, σ² = expected squared deviation or variance
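
A direct transcription of the density into Python (a sketch; a library routine such as scipy.stats.norm.pdf computes the same thing):

```python
import numpy as np

def normal_pdf(x, mu, sigma):
    """Univariate normal density with mean mu and standard deviation sigma."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

print(normal_pdf(0.0, 0.0, 1.0))   # ≈ 0.3989, the peak of the standard normal
```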

Page 15: Chapter 2 (Part 2):  Bayesian Decision Theory (Sections 2.3-2.5)


Page 16: Chapter 2 (Part 2):  Bayesian Decision Theory (Sections 2.3-2.5)


• Multivariate density

• Multivariate normal density in d dimensions is:

$$P(x) = \frac{1}{(2\pi)^{d/2}\,|\Sigma|^{1/2}}\,\exp\!\left[-\frac{1}{2}(x-\mu)^t\,\Sigma^{-1}\,(x-\mu)\right]$$

where:

x = (x₁, x₂, …, x_d)ᵗ (t stands for the transpose vector form)

μ = (μ₁, μ₂, …, μ_d)ᵗ is the mean vector

Σ = d×d covariance matrix

|Σ| and Σ⁻¹ are its determinant and inverse, respectively
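
A transcription of the multivariate density as a sketch (for real use, scipy.stats.multivariate_normal is the usual choice, and inverting the covariance directly is only reasonable for small d):

```python
import numpy as np

def mvn_pdf(x, mu, cov):
    """Multivariate normal density in d dimensions."""
    d = len(mu)
    diff = x - mu
    norm = 1.0 / ((2.0 * np.pi) ** (d / 2) * np.sqrt(np.linalg.det(cov)))
    return norm * np.exp(-0.5 * diff @ np.linalg.inv(cov) @ diff)

x = np.zeros(2)
mu = np.zeros(2)
cov = np.eye(2)                # identity covariance, assumed for the check
print(mvn_pdf(x, mu, cov))     # ≈ 1 / (2*pi) ≈ 0.1592
```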

Page 17: Chapter 2 (Part 2):  Bayesian Decision Theory (Sections 2.3-2.5)


Appendix

• Variance = S², where

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{x})^2$$

• Standard deviation = S
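
A one-line check of the formula against NumPy's built-in (the sample values are arbitrary):

```python
import numpy as np

x = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])   # arbitrary sample
s2 = np.sum((x - x.mean()) ** 2) / (len(x) - 1)           # S^2 with 1/(n-1)
assert np.isclose(s2, np.var(x, ddof=1))                  # matches NumPy
print(s2, np.sqrt(s2))                                    # S^2 ≈ 4.571, S ≈ 2.138
```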

Page 18: Chapter 2 (Part 2):  Bayesian Decision Theory (Sections 2.3-2.5)


Bayes' theorem

        A              ¬A
B       A and B        ¬A and B
¬B      A and ¬B       ¬A and ¬B

$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(A)\,P(B \mid A) + P(\lnot A)\,P(B \mid \lnot A)}$$

$$P(A \mid B) = \frac{P(A)\,P(B \mid A)}{P(B)}$$
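
A numeric sketch of the theorem with assumed probabilities, expanding P(B) by total probability as in the first form above:

```python
# Assumed probabilities, purely for illustration.
p_a = 0.01              # P(A)
p_b_given_a = 0.90      # P(B | A)
p_b_given_not_a = 0.05  # P(B | not A)

# Total probability: P(B) = P(A) P(B|A) + P(not A) P(B|not A).
p_b = p_a * p_b_given_a + (1.0 - p_a) * p_b_given_not_a

# Bayes' theorem: P(A|B) = P(A) P(B|A) / P(B).
p_a_given_b = p_a * p_b_given_a / p_b
print(round(p_a_given_b, 3))   # 0.154
```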

Page 19: Chapter 2 (Part 2):  Bayesian Decision Theory (Sections 2.3-2.5)
