Machine Learning: The Naïve Bayes Classifier

The Naïve Bayes Classifier - svivek.com


Page 1

Machine Learning

The Naïve Bayes Classifier

Page 2

Today's lecture

• The naïve Bayes Classifier

• Learning the naïve Bayes Classifier

• Practical concerns


Page 4

Where are we?

We have seen Bayesian learning
– Using a probabilistic criterion to select a hypothesis
– Maximum a posteriori and maximum likelihood learning

• Question: What is the difference between them?

We could also learn functions that predict probabilities of outcomes
– Different from using a probabilistic criterion to learn

Maximum a posteriori (MAP) prediction, as opposed to MAP learning


Page 6

MAP prediction

Let's use the Bayes rule for predicting y given an input x:

$P(y \mid x) = \frac{P(x \mid y)\, P(y)}{P(x)}$

Posterior probability of the label being y for this input x: $P(y \mid x)$


Page 9

MAP prediction

Let's use the Bayes rule for predicting y given an input x.

Predict y for the input x using

$\hat{y} = \arg\max_y P(y \mid x) = \arg\max_y \frac{P(x \mid y)\, P(y)}{P(x)} = \arg\max_y P(x \mid y)\, P(y)$

(The denominator P(x) does not depend on y, so it can be dropped from the argmax.)

Don't confuse this with MAP learning, which finds a hypothesis by $h_{MAP} = \arg\max_h P(h \mid D)$.

Page 10

MAP prediction

Predict y for the input x using

$\hat{y} = \arg\max_y P(x \mid y)\, P(y)$

Likelihood of observing this input x when the label is y: $P(x \mid y)$

Prior probability of the label being y: $P(y)$

All we need are these two sets of probabilities.

Page 11

Example: Tennis again

Likelihood:

Temperature  Wind    P(T, W | Tennis = Yes)
Hot          Strong  0.15
Hot          Weak    0.4
Cold         Strong  0.1
Cold         Weak    0.35

Temperature  Wind    P(T, W | Tennis = No)
Hot          Strong  0.4
Hot          Weak    0.1
Cold         Strong  0.3
Cold         Weak    0.2

Prior:

Play tennis  P(Play tennis)
Yes          0.3
No           0.7

Without any other information, what is the prior probability that I should play tennis?

On days that I do play tennis, what is the probability that the temperature is T and the wind is W?

On days that I don't play tennis, what is the probability that the temperature is T and the wind is W?



Page 17

Example: Tennis again

Likelihood:

Temperature  Wind    P(T, W | Tennis = Yes)
Hot          Strong  0.15
Hot          Weak    0.4
Cold         Strong  0.1
Cold         Weak    0.35

Temperature  Wind    P(T, W | Tennis = No)
Hot          Strong  0.4
Hot          Weak    0.1
Cold         Strong  0.3
Cold         Weak    0.2

Prior:

Play tennis  P(Play tennis)
Yes          0.3
No           0.7

Input: Temperature = Hot (H), Wind = Weak (W)

Should I play tennis?

argmax_y P(H, W | play?) P(play?)

P(H, W | Yes) P(Yes) = 0.4 × 0.3 = 0.12

P(H, W | No) P(No) = 0.1 × 0.7 = 0.07

MAP prediction = Yes
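To make the arithmetic concrete, here is a minimal Python sketch that evaluates the same MAP decision rule on the likelihood and prior tables above (the dictionary and function names are illustrative, not from the slides):

```python
# MAP prediction for the tennis example: argmax_y P(x | y) P(y)

# Likelihood tables P(Temperature, Wind | Play) copied from the slide
likelihood = {
    "Yes": {("Hot", "Strong"): 0.15, ("Hot", "Weak"): 0.40,
            ("Cold", "Strong"): 0.10, ("Cold", "Weak"): 0.35},
    "No":  {("Hot", "Strong"): 0.40, ("Hot", "Weak"): 0.10,
            ("Cold", "Strong"): 0.30, ("Cold", "Weak"): 0.20},
}

# Prior P(Play)
prior = {"Yes": 0.3, "No": 0.7}

def map_predict(x):
    """Return the label maximizing P(x | y) P(y), plus all the scores."""
    scores = {y: likelihood[y][x] * prior[y] for y in prior}
    return max(scores, key=scores.get), scores

label, scores = map_predict(("Hot", "Weak"))
print(scores)   # {'Yes': 0.12, 'No': 0.07}
print(label)    # Yes
```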


Page 19

How hard is it to learn probabilistic models?

     O  T  H  W  Play?
 1   S  H  H  W   -
 2   S  H  H  S   -
 3   O  H  H  W   +
 4   R  M  H  W   +
 5   R  C  N  W   +
 6   R  C  N  S   -
 7   O  C  N  S   +
 8   S  M  H  W   -
 9   S  C  N  W   +
10   R  M  N  W   +
11   S  M  N  S   +
12   O  M  H  S   +
13   O  H  N  W   +
14   R  M  H  S   -

Outlook: S(unny), O(vercast), R(ainy)
Temperature: H(ot), M(edium), C(ool)
Humidity: H(igh), N(ormal), L(ow)
Wind: S(trong), W(eak)

We need to learn
1. The prior P(Play?)
2. The likelihoods P(X | Play?)

Page 20

How hard is it to learn probabilistic models?

(Training data: the Play? dataset from Page 19)

Prior P(Play?)
• A single number (Why only one?)

Likelihood P(X | Play?)
• There are 4 features
• For each value of Play? (+/-), we need a value for each possible assignment: P(x1, x2, x3, x4 | Play?)
• (2^4 – 1) parameters in each case (one for each assignment), if each feature were binary


Page 23

How hard is it to learn probabilistic models?

(Training data: the Play? dataset from Page 19; the four features take 3, 3, 3 and 2 values respectively)

Prior P(Play?)
• A single number (Why only one?)

Likelihood P(X | Play?)
• There are 4 features
• For each value of Play? (+/-), we need a value for each possible assignment: P(x1, x2, x3, x4 | Play?)
• (3 ⋅ 3 ⋅ 3 ⋅ 2 − 1) parameters in each case (one for each assignment)
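As a quick check, the counts above work out as follows:

$(3 \cdot 3 \cdot 3 \cdot 2) - 1 = 53$ likelihood parameters per label, so $2 \times 53 = 106$ likelihood parameters in total, plus one number for the prior, all to be estimated from just 14 training examples.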


Page 25

How hard is it to learn probabilistic models?

(Training data: the Play? dataset from Page 19)

In general:

Prior P(Y)
• If there are k labels, then k – 1 parameters (why not k?)

Likelihood P(X | Y)
• If there are d Boolean features:
• We need a value for each possible P(x1, x2, ..., xd | y), for each y
• k(2^d – 1) parameters

Need a lot of data to estimate this many numbers!
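To see how quickly the full joint model blows up compared with what the naïve Bayes assumption (introduced below) will need, here is a small illustrative Python snippet (not from the slides):

```python
# Number of parameters to estimate for k labels and d Boolean features:
#   full joint likelihood P(x1, ..., xd | y): k * (2**d - 1)
#   naive Bayes (one Bernoulli per feature):  k * d
k = 2
for d in (4, 10, 20, 30):
    full_joint = k * (2**d - 1)
    naive_bayes = k * d
    print(f"d={d:2d}  full joint: {full_joint:>14,}   naive Bayes: {naive_bayes}")
```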


Page 29

How hard is it to learn probabilistic models?

Prior P(Y)
• If there are k labels, then k – 1 parameters (why not k?)

Likelihood P(X | Y)
• If there are d Boolean features:
• We need a value for each possible P(x1, x2, ..., xd | y), for each y
• k(2^d – 1) parameters

Need a lot of data to estimate this many numbers!

High model complexity

If there is very limited data, high variance in the parameters

How can we deal with this?

Answer: Make independence assumptions

Page 30

Recall: Conditional independence

Suppose X, Y and Z are random variables.

X is conditionally independent of Y given Z if the probability distribution of X is independent of the value of Y when Z is observed:

$P(X \mid Y, Z) = P(X \mid Z)$

Or equivalently

$P(X, Y \mid Z) = P(X \mid Z)\, P(Y \mid Z)$


Page 32

Modeling the features

$P(x_1, x_2, \cdots, x_d \mid y)$ required $k(2^d - 1)$ parameters.

What if all the features were conditionally independent given the label?

That is, $P(x_1, x_2, \cdots, x_d \mid y) = P(x_1 \mid y)\, P(x_2 \mid y) \cdots P(x_d \mid y)$

Requires only d numbers for each label. kd parameters overall. Not bad!

This is the Naïve Bayes Assumption.


Page 35

The Naïve Bayes Classifier

Assumption: Features are conditionally independent given the label Y

To predict, we need two sets of probabilities
– Prior P(y)
– For each xj, we have the likelihood P(xj | y)

Decision rule

$h_{NB}(\boldsymbol{x}) = \arg\max_y P(y)\, P(x_1, x_2, \cdots, x_d \mid y) = \arg\max_y P(y) \prod_j P(x_j \mid y)$
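A minimal Python sketch of this decision rule (assuming the prior and per-feature likelihood tables have already been estimated; the data layout and names here are illustrative, not from the slides). Summing log probabilities avoids numerical underflow when there are many features:

```python
import math

def nb_predict(x, prior, likelihood):
    """Naive Bayes decision rule: argmax_y  log P(y) + sum_j log P(x_j | y).

    prior:      dict  label -> P(y)
    likelihood: dict  label -> list of dicts, one per feature j,
                mapping a feature value -> P(x_j = value | y)
    x:          list of feature values, one per feature j
    """
    best_label, best_score = None, -math.inf
    for y, p_y in prior.items():
        score = math.log(p_y)
        for j, value in enumerate(x):
            score += math.log(likelihood[y][j][value])
        if score > best_score:
            best_label, best_score = y, score
    return best_label
```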


Page 37

Decision boundaries of naïve Bayes

What is the decision boundary of the naïve Bayes classifier?

Consider the two class case. We predict the label to be + if

$P(y = +) \prod_j P(x_j \mid y = +) \;>\; P(y = -) \prod_j P(x_j \mid y = -)$

or equivalently, if

$\frac{P(y = +) \prod_j P(x_j \mid y = +)}{P(y = -) \prod_j P(x_j \mid y = -)} > 1$

Page 38

Decision boundaries of naïve Bayes

What is the decision boundary of the naïve Bayes classifier?

Taking the log and simplifying, we get

$\log \frac{P(y = - \mid \boldsymbol{x})}{P(y = + \mid \boldsymbol{x})} = \boldsymbol{w}^T \boldsymbol{x} + b$

This is a linear function of the feature space!

Easy to prove. See the note on the course website.
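A sketch of why this holds for binary features $x_j \in \{0, 1\}$ (the symbols $a_j$, $b_j$, $w_j$ and $b$ below are notation introduced here for the sketch, not taken from the slide): write $a_j = P(x_j = 1 \mid y = +)$ and $b_j = P(x_j = 1 \mid y = -)$. Since the evidence $P(\boldsymbol{x})$ cancels in the ratio of posteriors,

$\log \frac{P(y = - \mid \boldsymbol{x})}{P(y = + \mid \boldsymbol{x})} = \log \frac{P(y = -)}{P(y = +)} + \sum_j \left[ x_j \log \frac{b_j}{a_j} + (1 - x_j) \log \frac{1 - b_j}{1 - a_j} \right] = \boldsymbol{w}^T \boldsymbol{x} + b$

with $w_j = \log \frac{b_j (1 - a_j)}{a_j (1 - b_j)}$ and $b = \log \frac{P(y = -)}{P(y = +)} + \sum_j \log \frac{1 - b_j}{1 - a_j}$, which is linear in $\boldsymbol{x}$.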

Page 39

Today's lecture

• The naïve Bayes Classifier

• Learning the naïve Bayes Classifier

• Practical concerns


Page 43

Learning the naïve Bayes Classifier

• What is the hypothesis function h defined by?
  – A collection of probabilities
    • Prior for each label: P(y)
    • Likelihoods for feature xj given a label: P(xj | y)

Suppose we have a dataset $D = \{(\boldsymbol{x}_i, y_i)\}$ with m examples.

A note on convention for this section:
• Examples in the dataset are indexed by the subscript $i$ (e.g. $\boldsymbol{x}_i$)
• Features within an example are indexed by the subscript $j$
• The $j$-th feature of the $i$-th example will be $x_{ij}$

Page 44

Learning the naïve Bayes Classifier

• What is the hypothesis function h defined by?
  – A collection of probabilities
    • Prior for each label: P(y)
    • Likelihoods for feature xj given a label: P(xj | y)

If we have a dataset $D = \{(\boldsymbol{x}_i, y_i)\}$ with m examples, and we want to learn the classifier in a probabilistic way:
– What is a probabilistic criterion to select the hypothesis?

Page 45

Learning the naïve Bayes Classifier

Maximum likelihood estimation:

$h_{ML} = \arg\max_h P(D \mid h)$

Here h is defined by all the probabilities used to construct the naïve Bayes decision.


Page 47

Maximum likelihood estimation

Given a dataset $D = \{(\boldsymbol{x}_i, y_i)\}$ with m examples:

$h_{ML} = \arg\max_h P(D \mid h) = \arg\max_h \prod_{i=1}^{m} P(\boldsymbol{x}_i, y_i \mid h)$

Each example in the dataset is independent and identically distributed, so we can represent P(D | h) as this product.

Each factor asks: "What probability would this particular h assign to the pair $(\boldsymbol{x}_i, y_i)$?"


Page 49

Maximum likelihood estimation

Given a dataset $D = \{(\boldsymbol{x}_i, y_i)\}$ with m examples:

$h_{ML} = \arg\max_h \prod_{i=1}^{m} P(\boldsymbol{x}_i, y_i \mid h) = \arg\max_h \prod_{i=1}^{m} P(y_i \mid h) \prod_j P(x_{ij} \mid y_i, h)$

The second equality uses the Naïve Bayes assumption; $x_{ij}$ is the $j$-th feature of $\boldsymbol{x}_i$.
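Taking the log turns these products into sums, which is the form one typically maximizes:

$\log P(D \mid h) = \sum_{i=1}^{m} \left[ \log P(y_i \mid h) + \sum_j \log P(x_{ij} \mid y_i, h) \right]$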


Page 53

Learning the naïve Bayes Classifier

Maximum likelihood estimation

What next?

We need to make a modeling assumption about the functional form of these probability distributions.


Page 57

Learning the naïve Bayes Classifier

Maximum likelihood estimation

For simplicity, suppose there are two labels 1 and 0 and all features are binary.

• Prior: P(y = 1) = p and P(y = 0) = 1 – p

• Likelihood for each feature given a label:
  • P(xj = 1 | y = 1) = aj and P(xj = 0 | y = 1) = 1 – aj
  • P(xj = 1 | y = 0) = bj and P(xj = 0 | y = 0) = 1 – bj

h consists of p, all the a's and all the b's.


Page 59

Learning the naïve Bayes Classifier

Maximum likelihood estimation

• Prior: P(y = 1) = p and P(y = 0) = 1 – p, which can be written compactly as

$P(y_i) = p^{[y_i = 1]} (1 - p)^{[y_i = 0]}$

[z] is called the indicator function or the Iverson bracket. Its value is 1 if the argument z is true and zero otherwise.

Page 60

Learning the naïve Bayes Classifier

Maximum likelihood estimation

Likelihood for each feature given a label:
• P(xj = 1 | y = 1) = aj and P(xj = 0 | y = 1) = 1 – aj
• P(xj = 1 | y = 0) = bj and P(xj = 0 | y = 0) = 1 – bj

With the same Iverson bracket notation:

$P(x_{ij} \mid y_i) = \left( a_j^{[x_{ij} = 1]} (1 - a_j)^{[x_{ij} = 0]} \right)^{[y_i = 1]} \left( b_j^{[x_{ij} = 1]} (1 - b_j)^{[x_{ij} = 0]} \right)^{[y_i = 0]}$


Page 63

Learning the naïve Bayes Classifier

Substituting and deriving the argmax, we get the count-based estimates

$P(y = 1) = p = \frac{|\{i : y_i = 1\}|}{m}$

$P(x_j = 1 \mid y = 1) = a_j = \frac{|\{i : x_{ij} = 1 \text{ and } y_i = 1\}|}{|\{i : y_i = 1\}|}$

$P(x_j = 1 \mid y = 0) = b_j = \frac{|\{i : x_{ij} = 1 \text{ and } y_i = 0\}|}{|\{i : y_i = 0\}|}$
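A small Python sketch of these estimates (illustrative; the function name and data layout are assumptions, not from the slides):

```python
def mle_estimates(X, y):
    """Maximum likelihood estimates for binary-feature, binary-label naive Bayes.

    X: list of examples, each a list of 0/1 feature values
    y: list of 0/1 labels (assumes both labels appear in the data)
    Returns (p, a, b): prior P(y=1), and per-feature a_j = P(x_j=1 | y=1),
    b_j = P(x_j=1 | y=0), obtained by counting and normalizing.
    Zero counts for individual features are the problem smoothing fixes later.
    """
    m, d = len(X), len(X[0])
    pos = [X[i] for i in range(m) if y[i] == 1]
    neg = [X[i] for i in range(m) if y[i] == 0]
    p = len(pos) / m
    a = [sum(x[j] for x in pos) / len(pos) for j in range(d)]
    b = [sum(x[j] for x in neg) / len(neg) for j in range(d)]
    return p, a, b
```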


Page 68

Let's learn a naïve Bayes classifier

     O  T  H  W  Play?
 1   S  H  H  W   -
 2   S  H  H  S   -
 3   O  H  H  W   +
 4   R  M  H  W   +
 5   R  C  N  W   +
 6   R  C  N  S   -
 7   O  C  N  S   +
 8   S  M  H  W   -
 9   S  C  N  W   +
10   R  M  N  W   +
11   S  M  N  S   +
12   O  M  H  S   +
13   O  H  N  W   +
14   R  M  H  S   -

P(Play = +) = 9/14    P(Play = -) = 5/14

P(O = S | Play = +) = 2/9

P(O = R | Play = +) = 3/9

P(O = O | Play = +) = 4/9

And so on, for the other attributes and also for Play = -
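The counts above can be reproduced with a few lines of Python (a sketch; the data encoding below is an assumption made for illustration):

```python
from collections import Counter

# (Outlook, Temperature, Humidity, Wind, Play?) rows from the table
data = [("S","H","H","W","-"), ("S","H","H","S","-"), ("O","H","H","W","+"),
        ("R","M","H","W","+"), ("R","C","N","W","+"), ("R","C","N","S","-"),
        ("O","C","N","S","+"), ("S","M","H","W","-"), ("S","C","N","W","+"),
        ("R","M","N","W","+"), ("S","M","N","S","+"), ("O","M","H","S","+"),
        ("O","H","N","W","+"), ("R","M","H","S","-")]

# Priors: fraction of examples with each label
labels = Counter(row[-1] for row in data)
print({y: f"{c}/{len(data)}" for y, c in labels.items()})        # {'-': '5/14', '+': '9/14'}

# Likelihoods for the Outlook feature given Play = +
outlook_given_plus = Counter(row[0] for row in data if row[-1] == "+")
print({v: f"{c}/{labels['+']}" for v, c in outlook_given_plus.items()})
# {'O': '4/9', 'R': '3/9', 'S': '2/9'}
```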

Page 69

Naïve Bayes: Learning and Prediction

• Learning
– Count how often features occur with each label. Normalize to get likelihoods.
– Priors from fraction of examples with each label
– Generalizes to multiclass

• Prediction
– Use learned probabilities to find the highest scoring label

Page 70

Today's lecture

• The naïve Bayes Classifier

• Learning the naïve Bayes Classifier

• Practical concerns + an example

Page 71

Important caveats with Naïve Bayes

1. Features need not be conditionally independent given the label
– Just because we assume that they are doesn't mean that that's how they behave in nature
– We made a modeling assumption because it makes computation and learning easier

2. Not enough training data to get good estimates of the probabilities from counts

Page 72

Important caveats with Naïve Bayes

1. Features are not conditionally independent given the label

All bets are off if the naïve Bayes assumption is not satisfied.

And yet, naïve Bayes is very often used in practice because of its simplicity. It works reasonably well even when the assumption is violated.


Page 75

Important caveats with Naïve Bayes

2. Not enough training data to get good estimates of the probabilities from counts

The basic operation for learning likelihoods is counting how often a feature occurs with a label.

What if we never see a particular feature with a particular label?
Eg: Suppose we never observe Temperature = cold with PlayTennis = Yes.

Should we treat those counts as zero? But that will make the probabilities zero.

Answer: Smoothing
• Add fake counts (very small numbers, so that the counts are not zero)
• The Bayesian interpretation of smoothing: priors on the hypothesis (MAP learning)
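A common concrete choice is add-one (Laplace) smoothing; the sketch below is illustrative, and the particular fake count of 1 is an assumption, since the slides only say to add small fake counts:

```python
def smoothed_likelihood(feature_label_count, label_count, num_feature_values, alpha=1.0):
    """Estimate P(x_j = v | y) with additive (Laplace) smoothing.

    feature_label_count: #{examples with x_j = v and label y}
    label_count:         #{examples with label y}
    num_feature_values:  number of possible values of feature x_j
    alpha:               fake count added to every (value, label) pair
    """
    return (feature_label_count + alpha) / (label_count + alpha * num_feature_values)

# A value never observed with a label, out of 9 examples of that label,
# for a feature with 3 possible values:
print(smoothed_likelihood(0, 9, 3))   # 1/12 instead of 0
```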

Page 76

Example: Classifying text

• Instance space: Text documents
• Labels: Spam or NotSpam

• Goal: To learn a function that can predict whether a new document is Spam or NotSpam

How would you build a Naïve Bayes classifier?

Let us brainstorm:
How to represent documents?
How to estimate probabilities?
How to classify?


Pages 81-82

Example: Classifying text

1. Represent documents by a vector of words
   – A sparse vector consisting of one feature per word

2. Learning from N labeled documents
   1. Priors: how often does each label occur?
   2. For each word w in the vocabulary: how often does the word occur with each label? (estimated with smoothing)
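Putting the pieces together, here is a compact Python sketch of a naïve Bayes text classifier with one presence/absence feature per vocabulary word (illustrative code with add-one smoothing as one concrete choice; none of the names come from the slides):

```python
import math
from collections import Counter, defaultdict

def train(docs, labels, alpha=1.0):
    """docs: list of token lists; labels: parallel list of labels (e.g. 'Spam'/'NotSpam')."""
    n = len(docs)
    label_counts = Counter(labels)
    doc_freq = defaultdict(Counter)             # label -> Counter of documents containing each word
    for tokens, y in zip(docs, labels):
        doc_freq[y].update(set(tokens))         # presence/absence features
    vocab = {w for counts in doc_freq.values() for w in counts}
    prior = {y: c / n for y, c in label_counts.items()}
    # Smoothed P(word present | label) for every vocabulary word
    likelihood = {y: {w: (doc_freq[y][w] + alpha) / (label_counts[y] + 2 * alpha)
                      for w in vocab}
                  for y in label_counts}
    return prior, likelihood, vocab

def predict(tokens, prior, likelihood, vocab):
    present = set(tokens)
    scores = {}
    for y in prior:
        score = math.log(prior[y])
        for w in vocab:                          # one binary feature per vocabulary word
            p = likelihood[y][w]
            score += math.log(p) if w in present else math.log(1.0 - p)
        scores[y] = score
    return max(scores, key=scores.get)
```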

Page 83

Continuous features

• So far, we have been looking at discrete features
– P(xj | y) is a Bernoulli trial (i.e. a coin toss)

• We could model P(xj | y) with other distributions too
– This is a separate assumption from the independence assumption that naive Bayes makes
– Eg: For real valued features, (Xj | Y) could be drawn from a normal distribution

• Exercise: Derive the maximum likelihood estimate when the features are assumed to be drawn from the normal distribution
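As a sketch of how this looks at prediction time (the estimation itself is left as the exercise above; the function names are illustrative and assume the per-class means and variances have already been estimated somehow), the Gaussian density simply replaces the Bernoulli likelihood in the decision rule:

```python
import math

def gaussian_log_likelihood(x_j, mean, var):
    """log N(x_j; mean, var), used as log P(x_j | y) for a real-valued feature."""
    return -0.5 * (math.log(2 * math.pi * var) + (x_j - mean) ** 2 / var)

def gnb_score(x, log_prior, means, variances):
    """Score one label: log P(y) + sum_j log N(x_j; mean_jy, var_jy)."""
    return log_prior + sum(gaussian_log_likelihood(x_j, m, v)
                           for x_j, m, v in zip(x, means, variances))
```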

Page 84

Summary: Naïve Bayes

• Independence assumption
– All features are independent of each other given the label

• Maximum likelihood learning: learning is simple
– Generalizes to real valued features

• Prediction via MAP estimation
– Generalizes beyond binary classification

• Important caveats to remember
– Smoothing
– The independence assumption may not be valid

• Decision boundary is linear for binary classification