Pattern Classification
All materials in these slides were taken from Pattern Classification (2nd ed.) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000, with the permission of the authors and the publisher.
Chapter 2 (Part 3): Bayesian Decision Theory
(Sections 2.6–2.9)
• Discriminant Functions for the Normal Density
• Bayes Decision Theory – Discrete Features
Pattern Classification, Chapter 2 (Part 3)
Discriminant Functions for the Normal Density
• We saw that the minimum error-rate classification can be achieved by the discriminant function
gi(x) = ln p(x | ωi) + ln P(ωi)
• Case of the multivariate normal:

gi(x) = −(1/2)(x − μi)^t Σi^−1 (x − μi) − (d/2) ln 2π − (1/2) ln|Σi| + ln P(ωi)
Case Σi = σ2I (I stands for the identity matrix)
• What does “Σi = σ2I” say about the dimensions?
• What about the variance of each dimension?
Note: both |Σi| = σ^2d and the (d/2) ln 2π term are independent of i, so

gi(x) = −(1/2)(x − μi)^t Σi^−1 (x − μi) − (d/2) ln 2π − (1/2) ln|Σi| + ln P(ωi)

can be simplified to

gi(x) = −||x − μi||^2 / (2σ^2) + ln P(ωi)

where ||·|| denotes the Euclidean norm.
• We can further simplify by recognizing that the quadratic term xtx implicit in the Euclidean norm is the same for all i.
gi(x) = wi^t x + wi0   (a linear discriminant function)

where:

wi = μi / σ^2 ;  wi0 = −(μi^t μi) / (2σ^2) + ln P(ωi)

(wi0 is called the threshold for the i-th category!)
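As a sanity check on this linear form, here is a minimal NumPy sketch (the function name and the numbers are my own, not from the text). With equal priors and Σi = σ²I, the rule reduces to nearest-mean classification:

```python
import numpy as np

def linear_discriminant(x, mu_i, sigma2, prior_i):
    """g_i(x) = w_i^t x + w_i0 for the case Sigma_i = sigma^2 I."""
    w_i = mu_i / sigma2                                  # w_i = mu_i / sigma^2
    w_i0 = -mu_i @ mu_i / (2 * sigma2) + np.log(prior_i) # threshold term
    return w_i @ x + w_i0

# Two equal-prior classes: the rule picks the nearer mean.
mu1, mu2 = np.array([0.0, 0.0]), np.array([4.0, 0.0])
x = np.array([1.0, 1.0])   # closer to mu1
g1 = linear_discriminant(x, mu1, 1.0, 0.5)
g2 = linear_discriminant(x, mu2, 1.0, 0.5)
print(g1 > g2)  # True: decide omega_1
```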
• A classifier that uses linear discriminant functions is called “a linear machine”
• The decision surfaces for a linear machine are pieces of hyperplanes defined by:
gi(x) = gj(x)
This equation can be written as: w^t (x − x0) = 0
• The hyperplane separating Ri and Rj is always orthogonal to the line linking the means!
x0 = (1/2)(μi + μj) − [σ^2 / ||μi − μj||^2] ln[P(ωi)/P(ωj)] (μi − μj)

If P(ωi) = P(ωj), then x0 = (1/2)(μi + μj).
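The expression for x0 can be checked numerically. In this NumPy sketch (names and numbers are illustrative assumptions), equal priors give the midpoint, while a larger prior for ωi pushes the boundary away from μi:

```python
import numpy as np

def hyperplane_point(mu_i, mu_j, sigma2, prior_i, prior_j):
    """x0 on the decision boundary for the Sigma_i = sigma^2 I case."""
    diff = mu_i - mu_j
    shift = sigma2 / (diff @ diff) * np.log(prior_i / prior_j)
    return 0.5 * (mu_i + mu_j) - shift * diff

mu_i, mu_j = np.array([2.0, 0.0]), np.array([0.0, 0.0])
x0_equal = hyperplane_point(mu_i, mu_j, 1.0, 0.5, 0.5)
# equal priors -> x0 is the midpoint of the means
x0_skewed = hyperplane_point(mu_i, mu_j, 1.0, 0.9, 0.1)
# prior favors omega_i -> boundary shifts toward mu_j
```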
• Case Σi = Σ (the covariance matrices of all classes are identical but otherwise arbitrary!)
The hyperplane separating Ri and Rj is generally not orthogonal to the line between the means!
The boundary has the equation w^t (x − x0) = 0, where

w = Σ^−1 (μi − μj)

and

x0 = (1/2)(μi + μj) − [ln(P(ωi)/P(ωj)) / ((μi − μj)^t Σ^−1 (μi − μj))] (μi − μj)
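A short NumPy sketch of the shared-covariance boundary (the function name is mine). It also illustrates the remark above: w = Σ⁻¹(μi − μj) is generally not parallel to μi − μj, so the hyperplane is not orthogonal to the line between the means:

```python
import numpy as np

def lda_boundary(mu_i, mu_j, Sigma, prior_i, prior_j):
    """w and x0 for the shared-covariance case: w^t (x - x0) = 0."""
    Sigma_inv = np.linalg.inv(Sigma)
    diff = mu_i - mu_j
    w = Sigma_inv @ diff
    x0 = (0.5 * (mu_i + mu_j)
          - np.log(prior_i / prior_j) / (diff @ Sigma_inv @ diff) * diff)
    return w, x0

mu_i, mu_j = np.array([1.0, 1.0]), np.array([-1.0, -1.0])
Sigma = np.array([[2.0, 0.5], [0.5, 1.0]])
w, x0 = lda_boundary(mu_i, mu_j, Sigma, 0.5, 0.5)
# equal priors: x0 is the midpoint; w is NOT parallel to mu_i - mu_j
```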
• Case Σi = arbitrary
• The covariance matrices are different for each category
The decision surfaces are hyperquadrics (hyperplanes, pairs of hyperplanes, hyperspheres, hyperellipsoids, hyperparaboloids, or hyperhyperboloids)
gi(x) = x^t Wi x + wi^t x + wi0

where:

Wi = −(1/2) Σi^−1 ;  wi = Σi^−1 μi ;  wi0 = −(1/2) μi^t Σi^−1 μi − (1/2) ln|Σi| + ln P(ωi)
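The quadratic discriminant follows directly from these definitions. A minimal NumPy sketch (names are my own); up to the common (d/2) ln 2π term, gi(x) equals ln p(x|ωi) + ln P(ωi):

```python
import numpy as np

def quadratic_discriminant(x, mu, Sigma, prior):
    """g_i(x) = x^t W_i x + w_i^t x + w_i0 for arbitrary Sigma_i."""
    Sigma_inv = np.linalg.inv(Sigma)
    W = -0.5 * Sigma_inv                       # W_i = -(1/2) Sigma_i^-1
    w = Sigma_inv @ mu                         # w_i = Sigma_i^-1 mu_i
    w0 = (-0.5 * mu @ Sigma_inv @ mu
          - 0.5 * np.log(np.linalg.det(Sigma))
          + np.log(prior))                     # w_i0
    return x @ W @ x + w @ x + w0

mu = np.array([0.0, 1.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
x = np.array([0.5, 0.5])
g = quadratic_discriminant(x, mu, Sigma, 0.5)
```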
Bayes Decision Theory – Discrete Features
• Components of x are binary or integer valued; x can take on only one of m discrete values v1, v2, …, vm
• We are now concerned with probabilities rather than probability densities in Bayes' formula:
P(ωj | x) = P(x | ωj) P(ωj) / P(x), where P(x) = Σ_{j=1}^{c} P(x | ωj) P(ωj)
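A tiny numeric illustration of this formula (the likelihood and prior values are made up):

```python
import numpy as np

def posteriors(likelihoods, priors):
    """P(w_j|x) = P(x|w_j) P(w_j) / P(x), with P(x) summed over all classes."""
    joint = likelihoods * priors   # P(x|w_j) P(w_j) for each j
    return joint / joint.sum()     # divide by P(x)

# hypothetical 2-class problem
post = posteriors(np.array([0.3, 0.1]), np.array([0.4, 0.6]))
print(post)  # posteriors sum to 1
```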
• Conditional risk is defined as before: R(α|x)
• Approach is still to minimize risk:
α* = arg min_i R(αi | x)
• Case of independent binary features in a 2-category problem:
Let x = [x1, x2, …, xd]^t, where each xi is either 0 or 1, with probabilities:
pi = P(xi = 1 | ω1) ;  qi = P(xi = 1 | ω2)
• Assuming conditional independence, P(x|ωi) can be written as a product of component probabilities:
P(x | ω1) = Π_{i=1}^{d} pi^{xi} (1 − pi)^{1−xi}

and

P(x | ω2) = Π_{i=1}^{d} qi^{xi} (1 − qi)^{1−xi}

yielding a likelihood ratio given by:

P(x | ω1) / P(x | ω2) = Π_{i=1}^{d} (pi/qi)^{xi} ((1 − pi)/(1 − qi))^{1−xi}
• Taking our likelihood ratio

P(x | ω1) / P(x | ω2) = Π_{i=1}^{d} (pi/qi)^{xi} ((1 − pi)/(1 − qi))^{1−xi}

and plugging it into Eq. 31:

g(x) = ln [P(x | ω1) / P(x | ω2)] + ln [P(ω1) / P(ω2)]

yields:

g(x) = Σ_{i=1}^{d} [ xi ln(pi/qi) + (1 − xi) ln((1 − pi)/(1 − qi)) ] + ln [P(ω1)/P(ω2)]
• The discriminant function in this case is:
g(x) = Σ_{i=1}^{d} wi xi + w0

where:

wi = ln [ pi (1 − qi) / (qi (1 − pi)) ], i = 1, …, d

and:

w0 = Σ_{i=1}^{d} ln [(1 − pi)/(1 − qi)] + ln [P(ω1)/P(ω2)]

Decide ω1 if g(x) > 0 and ω2 if g(x) ≤ 0
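The full decision rule can be sketched in NumPy as follows (the probability values are illustrative; the weight formulas are exactly the ones above):

```python
import numpy as np

def binary_bayes_weights(p, q, prior1, prior2):
    """Weights of g(x) = sum_i w_i x_i + w_0 for independent binary
    features, with p_i = P(x_i=1|w1) and q_i = P(x_i=1|w2)."""
    w = np.log(p * (1 - q) / (q * (1 - p)))
    w0 = np.log((1 - p) / (1 - q)).sum() + np.log(prior1 / prior2)
    return w, w0

p = np.array([0.8, 0.7, 0.6])   # P(x_i = 1 | omega_1)
q = np.array([0.2, 0.3, 0.4])   # P(x_i = 1 | omega_2)
w, w0 = binary_bayes_weights(p, q, 0.5, 0.5)

x = np.array([1, 1, 0])
g = w @ x + w0
print("decide omega_1" if g > 0 else "decide omega_2")
```

For equal priors, g(x) equals the log-likelihood ratio ln[P(x|ω1)/P(x|ω2)], which offers a direct consistency check on the weight formulas.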