Geoff Gordon—10-701 Machine Learning—Fall 2013
Related reading
• Bishop 2.5: nearest neighbor and Parzen windows
• Bishop 3-3.1: least squares for regression
• Bishop 4-4.1: linear classifiers
• Bishop p46, p380: naive Bayes
Bayes rule
• recall definition of conditional probability:
‣ P(a | b) = P(a ∧ b) / P(b), provided P(b) ≠ 0
Bayes rule: sum version
• P(a | b) = P(b | a) P(a) / P(b)
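The "sum version" expands the denominator via the law of total probability, P(b) = Σₐ′ P(b | a′) P(a′). A minimal sketch of the computation; the toy numbers are mine, not from the slide:

```python
# Bayes rule with the denominator expanded by the law of total probability:
# P(a | b) = P(b | a) P(a) / sum_a' P(b | a') P(a')

def posterior(prior, likelihood):
    """prior[a] = P(a); likelihood[a] = P(b | a). Returns P(a | b) for each a."""
    evidence = sum(prior[a] * likelihood[a] for a in prior)  # P(b)
    return {a: prior[a] * likelihood[a] / evidence for a in prior}

# Illustrative numbers only:
p = posterior(prior={"a": 0.3, "not_a": 0.7},
              likelihood={"a": 0.9, "not_a": 0.2})
print(p)  # the posteriors sum to 1 by construction
```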
Bayes rule in ML
• P(model | data) = P(data | model) P(model) / P(data)
Bayes rule vs. MAP vs. MLE
• P(model | data) = P(data | model) P(model) / P(data)
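MLE maximizes only the likelihood P(data | model); MAP also weighs in the prior P(model); full Bayes keeps the entire posterior. A sketch contrasting MLE and MAP for a coin-flip model, assuming a Beta prior (the prior is my choice for illustration, not from the slide):

```python
# MLE vs. MAP for a coin with unknown heads-probability p,
# after observing k heads in n flips. The Beta(alpha, beta) prior
# is an assumption made for this example.

def mle(k, n):
    return k / n  # argmax_p P(data | p)

def map_estimate(k, n, alpha=2.0, beta=2.0):
    # argmax_p P(data | p) P(p): mode of the Beta(k+alpha, n-k+beta) posterior
    return (k + alpha - 1) / (n + alpha + beta - 2)

print(mle(9, 10))           # 0.9: trusts the data alone
print(map_estimate(9, 10))  # 10/12 ≈ 0.833: pulled toward the prior mean 0.5
```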
Frequentist vs. Bayes
• Nature as adversary vs. Nature as probability distribution
• Probability as long-run frequency of repeatable events vs. odds for bets I'm willing to take
Jerzy Neyman vs. rev. Thomas Bayes: FIGHT!!!
see also: http://www.xkcd.com/1132/
Test for a rare disease
• About 0.1% of all people are infected
• Test detects all infections
• Test is highly specific: 1% false-positive rate
• You test positive. What is the probability you have the disease?
Bonus: what is probability an average med student gets this question wrong?
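Plugging the slide's numbers into Bayes rule shows why intuition fails here: the disease is so rare that false positives swamp true positives.

```python
# Slide's numbers: prior P(disease) = 0.001, sensitivity P(+ | disease) = 1,
# false-positive rate P(+ | healthy) = 0.01.
prior = 0.001
sens = 1.0
fpr = 0.01

p_pos = sens * prior + fpr * (1 - prior)   # P(+), by the law of total probability
posterior = sens * prior / p_pos           # P(disease | +)
print(round(posterior, 3))                 # ≈ 0.091: under 10%, despite testing positive
```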
Follow-up test
• Test 2: detects 90% of infections, 5% false positives
‣ P(disease | +test1, +test2) =
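Assuming the two tests are conditionally independent given disease status (the usual assumption for this exercise), the posterior from test 1 becomes the prior for test 2:

```python
# Sequential Bayesian update: test 1 (sens 1.0, fpr 0.01), then
# test 2 (sens 0.9, fpr 0.05), assuming conditional independence given disease.
prior = 0.001
p1 = 1.0 * prior / (1.0 * prior + 0.01 * (1 - prior))  # P(disease | +test1) ≈ 0.091
p2 = 0.9 * p1 / (0.9 * p1 + 0.05 * (1 - p1))           # P(disease | +test1, +test2)
print(round(p2, 3))                                    # ≈ 0.64: now better than even odds
```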
Independence
Conditional independence
London taxi drivers: a survey pointed out a positive and significant correlation between the number of accidents and wearing coats. It concluded that coats could hinder drivers' movements and be the cause of accidents, and a new law was prepared to prohibit drivers from wearing coats while driving. Finally, another study pointed out that people wear coats when it rains…
Conditionally Independent
slide credit: Barnabas; humor credit: xkcd (xkcd.com)
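The taxi story can be checked by simulation: rain causes both coats and accidents, so the two are correlated marginally but independent once we condition on rain. All probabilities below are made up for illustration:

```python
import random

random.seed(0)
samples = []
for _ in range(100_000):
    rain = random.random() < 0.3                          # it rains on 30% of days
    coat = random.random() < (0.9 if rain else 0.2)       # rain makes coats likely
    accident = random.random() < (0.2 if rain else 0.05)  # rain makes accidents likely
    samples.append((rain, coat, accident))

def cond_p(event, given):
    sel = [s for s in samples if given(s)]
    return sum(event(s) for s in sel) / len(sel)

# Marginally, coats "predict" accidents:
print(cond_p(lambda s: s[2], lambda s: s[1]))               # ≈ 0.15
print(cond_p(lambda s: s[2], lambda s: not s[1]))           # ≈ 0.06
# But given rain, the coat adds nothing:
print(cond_p(lambda s: s[2], lambda s: s[0] and s[1]))      # ≈ 0.20
print(cond_p(lambda s: s[2], lambda s: s[0] and not s[1]))  # ≈ 0.20
```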
More on the importance of conditioning
Samples
…
Recall: spam filtering
Bag of words
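A bag-of-words representation keeps only how often each word occurs and discards word order. A minimal sketch (the toy document is mine):

```python
# Bag of words: a document becomes a vector of word counts; order is discarded.
from collections import Counter

doc = "award email award for internet users"
counts = Counter(doc.split())
print(counts)  # 'award' appears twice; "award email ..." and any reordering give the same bag
```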
A ridiculously naive assumption
• Assume:
• Clearly false:
• Given this assumption, use Bayes rule
Graphical model
[figure: the naive Bayes graphical model: a "spam" node with arrows into word nodes x1 x2 . . . xn; equivalently, plate notation with node xi repeated for i = 1..n]
Naive Bayes
• P(spam | email ∧ award ∧ program ∧ for ∧ internet ∧ users ∧ lump ∧ sum ∧ of ∧ Five ∧ Million)
In log space
zspam = ln(P(email | spam) P(award | spam) ... P(Million | spam) P(spam))
z~spam = ln(P(email | ~spam) ... P(Million | ~spam) P(~spam))
Collect terms
zspam = ln(P(email | spam) P(award | spam) ... P(Million | spam) P(spam))
z~spam = ln(P(email | ~spam) ... P(Million | ~spam) P(~spam))
z = zspam – z~spam
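Putting the last two slides together in code: compute both log-scores and classify as spam exactly when z = zspam − z~spam > 0. The word-probability table below is made up for illustration:

```python
# Naive Bayes spam decision in log space. Working with sums of logs
# avoids underflow from multiplying many small probabilities.
import math

p_word = {            # (P(word | spam), P(word | ~spam)) -- illustrative values
    "award":   (0.02,  0.001),
    "million": (0.01,  0.0005),
    "meeting": (0.001, 0.01),
}
p_spam = 0.4          # prior P(spam), also made up

def z(words):
    z_spam = math.log(p_spam) + sum(math.log(p_word[w][0]) for w in words)
    z_ham = math.log(1 - p_spam) + sum(math.log(p_word[w][1]) for w in words)
    return z_spam - z_ham  # positive means "classify as spam"

print(z(["award", "million"]) > 0)  # True: classified as spam
print(z(["meeting"]) > 0)           # False: classified as not spam
```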
Linear discriminant
Intuitions
How to get probabilities?
Discrete Distributions
• Bernoulli distribution: Ber(p)
Suppose a coin with heads probability p is tossed n times. What is the probability of getting k heads and n−k tails?
• Binomial distribution: Bin(n, p)
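The answer to the coin question above is the binomial pmf, C(n, k) pᵏ (1−p)ⁿ⁻ᵏ. A quick check in code:

```python
# Binomial pmf: probability of exactly k heads in n tosses of a coin
# with heads-probability p, i.e. Bin(n, p) evaluated at k.
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binom_pmf(2, 3, 0.5))  # 0.375: three of the eight equally likely sequences have 2 heads
```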
Improvements