Geoff Gordon—10-701 Machine Learning—Fall 2013
Related reading
• Bishop 2.5: nearest neighbor and Parzen windows
• Bishop 3-3.1: least squares for regression
• Bishop 4-4.1: linear classifiers
• Bishop p46, p380: naive Bayes
Bayes rule
• recall definition of conditional probability:
‣ P(a | b) = P(a ∧ b) / P(b), provided P(b) ≠ 0
Bayes rule: sum version
• P(a | b) = P(b | a) P(a) / P(b)
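The "sum version" expands the denominator via the law of total probability, P(b) = Σₐ′ P(b | a′) P(a′). A minimal sketch of the computation; the toy numbers are mine, not from the slide:

```python
# Bayes rule with the denominator expanded by the law of total probability:
# P(a | b) = P(b | a) P(a) / sum_a' P(b | a') P(a')

def posterior(prior, likelihood):
    """prior[a] = P(a); likelihood[a] = P(b | a). Returns P(a | b) for each a."""
    evidence = sum(prior[a] * likelihood[a] for a in prior)  # P(b)
    return {a: prior[a] * likelihood[a] / evidence for a in prior}

# Illustrative numbers only:
p = posterior(prior={"a": 0.3, "not_a": 0.7},
              likelihood={"a": 0.9, "not_a": 0.2})
print(p)  # the posteriors sum to 1 by construction
```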
Bayes rule in ML
• P(model | data) = P(data | model) P(model) / P(data)
Bayes rule vs. MAP vs. MLE
• P(model | data) = P(data | model) P(model) / P(data)
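MLE maximizes only the likelihood P(data | model); MAP also weighs in the prior P(model); full Bayes keeps the entire posterior. A sketch contrasting MLE and MAP for a coin-flip model, assuming a Beta prior (the prior is my choice for illustration, not from the slide):

```python
# MLE vs. MAP for a coin with unknown heads-probability p,
# after observing k heads in n flips. The Beta(alpha, beta) prior
# is an assumption made for this example.

def mle(k, n):
    return k / n  # argmax_p P(data | p)

def map_estimate(k, n, alpha=2.0, beta=2.0):
    # argmax_p P(data | p) P(p): mode of the Beta(k+alpha, n-k+beta) posterior
    return (k + alpha - 1) / (n + alpha + beta - 2)

print(mle(9, 10))           # 0.9: trusts the data alone
print(map_estimate(9, 10))  # 10/12 ≈ 0.833: pulled toward the prior mean 0.5
```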
Frequentist vs. Bayes
• Nature as adversary vs. Nature as probability distribution
• Probability as long-run frequency of repeatable events vs. odds for bets I'm willing to take
Jerzy Neyman vs. rev. Thomas Bayes: FIGHT!!!
see also: http://www.xkcd.com/1132/
Test for a rare disease
• About 0.1% of all people are infected
• Test detects all infections
• Test is highly specific: 1% false-positive rate
• You test positive. What is the probability you have the disease?
Bonus: what is probability an average med student gets this question wrong?
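Plugging the slide's numbers into Bayes rule shows why intuition fails here: the disease is so rare that false positives swamp true positives.

```python
# Slide's numbers: prior P(disease) = 0.001, sensitivity P(+ | disease) = 1,
# false-positive rate P(+ | healthy) = 0.01.
prior = 0.001
sens = 1.0
fpr = 0.01

p_pos = sens * prior + fpr * (1 - prior)   # P(+), by the law of total probability
posterior = sens * prior / p_pos           # P(disease | +)
print(round(posterior, 3))                 # ≈ 0.091: under 10%, despite testing positive
```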
Follow-up test
• Test 2: detects 90% of infections, 5% false positives
‣ P(disease | +test1, +test2) =
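Assuming the two tests are conditionally independent given disease status (the usual assumption for this exercise), the posterior from test 1 becomes the prior for test 2:

```python
# Sequential Bayesian update: test 1 (sens 1.0, fpr 0.01), then
# test 2 (sens 0.9, fpr 0.05), assuming conditional independence given disease.
prior = 0.001
p1 = 1.0 * prior / (1.0 * prior + 0.01 * (1 - prior))  # P(disease | +test1) ≈ 0.091
p2 = 0.9 * p1 / (0.9 * p1 + 0.05 * (1 - p1))           # P(disease | +test1, +test2)
print(round(p2, 3))                                    # ≈ 0.64: now better than even odds
```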
Independence
Conditional independence
London taxi drivers: a survey pointed out a positive and significant correlation between the number of accidents and wearing coats. It concluded that coats could hinder drivers' movements and be the cause of accidents, and a new law was prepared to prohibit drivers from wearing coats while driving. Finally, another study pointed out that people wear coats when it rains…
Conditionally Independent
slide credit: Barnabas; humor credit: xkcd (xkcd.com)
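The taxi story can be checked by simulation: rain causes both coats and accidents, so the two are correlated marginally but independent once we condition on rain. All probabilities below are made up for illustration:

```python
import random

random.seed(0)
samples = []
for _ in range(100_000):
    rain = random.random() < 0.3                          # it rains on 30% of days
    coat = random.random() < (0.9 if rain else 0.2)       # rain makes coats likely
    accident = random.random() < (0.2 if rain else 0.05)  # rain makes accidents likely
    samples.append((rain, coat, accident))

def cond_p(event, given):
    sel = [s for s in samples if given(s)]
    return sum(event(s) for s in sel) / len(sel)

# Marginally, coats "predict" accidents:
print(cond_p(lambda s: s[2], lambda s: s[1]))               # ≈ 0.15
print(cond_p(lambda s: s[2], lambda s: not s[1]))           # ≈ 0.06
# But given rain, the coat adds nothing:
print(cond_p(lambda s: s[2], lambda s: s[0] and s[1]))      # ≈ 0.20
print(cond_p(lambda s: s[2], lambda s: s[0] and not s[1]))  # ≈ 0.20
```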
More on the importance of conditioning
Samples
…
Recall: spam filtering
Bag of words
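A bag-of-words representation keeps only how often each word occurs and discards word order. A minimal sketch (the toy document is mine):

```python
# Bag of words: a document becomes a vector of word counts; order is discarded.
from collections import Counter

doc = "award email award for internet users"
counts = Counter(doc.split())
print(counts)  # 'award' appears twice; "award email ..." and any reordering give the same bag
```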
A ridiculously naive assumption
• Assume:
• Clearly false:
• Given this assumption, use Bayes rule
Graphical model
[figure: the naive Bayes graphical model: a "spam" node with arrows into word nodes x1 x2 . . . xn; equivalently, plate notation with node xi repeated for i = 1..n]
Naive Bayes
• P(spam | email ∧ award ∧ program ∧ for ∧ internet ∧ users ∧ lump ∧ sum ∧ of ∧ Five ∧ Million)
In log space
zspam = ln(P(email | spam) P(award | spam) ... P(Million | spam) P(spam))
z~spam = ln(P(email | ~spam) ... P(Million | ~spam) P(~spam))
Collect terms
zspam = ln(P(email | spam) P(award | spam) ... P(Million | spam) P(spam))
z~spam = ln(P(email | ~spam) ... P(Million | ~spam) P(~spam))
z = zspam – z~spam
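Putting the last two slides together in code: compute both log-scores and classify as spam exactly when z = zspam − z~spam > 0. The word-probability table below is made up for illustration:

```python
# Naive Bayes spam decision in log space. Working with sums of logs
# avoids underflow from multiplying many small probabilities.
import math

p_word = {            # (P(word | spam), P(word | ~spam)) -- illustrative values
    "award":   (0.02,  0.001),
    "million": (0.01,  0.0005),
    "meeting": (0.001, 0.01),
}
p_spam = 0.4          # prior P(spam), also made up

def z(words):
    z_spam = math.log(p_spam) + sum(math.log(p_word[w][0]) for w in words)
    z_ham = math.log(1 - p_spam) + sum(math.log(p_word[w][1]) for w in words)
    return z_spam - z_ham  # positive means "classify as spam"

print(z(["award", "million"]) > 0)  # True: classified as spam
print(z(["meeting"]) > 0)           # False: classified as not spam
```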
Linear discriminant
Intuitions
How to get probabilities?
Discrete Distributions
• Bernoulli distribution: Ber(p)
Suppose a coin with heads probability p is tossed n times. What is the probability of getting k heads and n−k tails?
• Binomial distribution: Bin(n, p)
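The answer to the coin question above is the binomial pmf, C(n, k) pᵏ (1−p)ⁿ⁻ᵏ. A quick check in code:

```python
# Binomial pmf: probability of exactly k heads in n tosses of a coin
# with heads-probability p, i.e. Bin(n, p) evaluated at k.
from math import comb

def binom_pmf(k, n, p):
    return comb(n, k) * p**k * (1 - p)**(n - k)

print(binom_pmf(2, 3, 0.5))  # 0.375: three of the eight equally likely sequences have 2 heads
```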
Improvements