CS4705: Probability Review and Naïve Bayes
Slides from Dragomir Radev and modified
Announcements
• Reading for today: C.4, 4.5 NLP
• Reading for next class: C3, NLP
• Next class will be taught by Chris Kedzie
• For new students in class:
• No laptop policy
• Class participation using Poll Everywhere or in-class comments
Today
• SciKit Learn tutorial
• Wrap up on optimization
• Generative methods
Regularization
• Consider the case where one or more documents are mislabeled
• Text from a novel may be mislabeled as social media if posted as a quote
• The classifier will attempt to learn weights that promote words characteristic of novels as predictors of social media
• Overfitting can also occur when the social media documents in the training set are not representative
Loss
• To prevent overfitting, a regularization term R(Θ) is added to the loss:
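The regularized objective itself did not survive the slide export; a standard reconstruction (with λ as an assumed regularization-strength hyperparameter, n training examples, and loss L) is:

```latex
\hat{\Theta} = \arg\min_{\Theta}\; \frac{1}{n}\sum_{i=1}^{n} L\big(f(x_i;\Theta),\, y_i\big) \;+\; \lambda\, R(\Theta)
```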
Two Common Regularizers
• L2 regularization
• Keeps the sum of squares of the parameter values low
• Gaussian prior or weight decay (here W is the weights, not including b)
• Prefers to decrease one parameter with a high weight rather than ten parameters with low weights
• L1 regularization
• Keeps the sum of absolute values of the parameters low
• Punishes high and low values uniformly
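The contrast between the two penalties can be sketched in a few lines; the weight vectors below are made-up examples:

```python
import numpy as np

def l2_penalty(w):
    # Sum of squares of the parameter values (Gaussian prior / weight decay)
    return np.sum(w ** 2)

def l1_penalty(w):
    # Sum of absolute values; penalizes all magnitudes uniformly
    return np.sum(np.abs(w))

# One large weight vs. the same total mass spread over ten small weights
concentrated = np.array([10.0])
spread = np.full(10, 1.0)

# L2 strongly prefers the spread-out configuration...
print(l2_penalty(concentrated), l2_penalty(spread))  # 100.0 vs 10.0
# ...while L1 penalizes both configurations equally
print(l1_penalty(concentrated), l1_penalty(spread))  # 10.0 vs 10.0
```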
Gradient-Based Optimization
• Repeat until L (loss) < margin:
• Compute L over the training set
• Compute the gradient of L with respect to Θ
• Move the parameters in the opposite direction of the gradient
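The loop above can be sketched for least-squares linear regression; the learning rate, margin, and data here are illustrative assumptions:

```python
import numpy as np

def gradient_descent(X, y, lr=0.1, margin=1e-6, max_iters=10_000):
    """Repeat until L < margin: compute loss, compute gradient, step against it."""
    theta = np.zeros(X.shape[1])
    for _ in range(max_iters):
        preds = X @ theta
        loss = np.mean((preds - y) ** 2)       # L over the training set
        if loss < margin:
            break
        grad = 2 * X.T @ (preds - y) / len(y)  # gradient of L w.r.t. theta
        theta -= lr * grad                     # move opposite the gradient
    return theta

# Recover the weights [2, -1] from noise-free synthetic data
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))
y = X @ np.array([2.0, -1.0])
theta = gradient_descent(X, y)
print(theta)  # close to [ 2. -1.]
```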
Stochastic Gradient Descent
Problem
• The error is calculated based on just one training sample
• May not be representative of corpus-wide loss
• Instead, calculate the error based on a set of training examples: a minibatch
• → Minibatch stochastic gradient descent
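A minimal sketch of the minibatch variant, assuming a least-squares loss and made-up data; each step estimates the gradient from a small random subset rather than one sample or the whole set:

```python
import numpy as np

def minibatch_sgd(X, y, batch_size=16, lr=0.1, epochs=200, seed=0):
    """Estimate the gradient on a random minibatch at each update step."""
    rng = np.random.default_rng(seed)
    theta = np.zeros(X.shape[1])
    for _ in range(epochs):
        idx = rng.permutation(len(y))          # shuffle once per epoch
        for start in range(0, len(y), batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            grad = 2 * Xb.T @ (Xb @ theta - yb) / len(batch)
            theta -= lr * grad
    return theta

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2))
y = X @ np.array([1.5, -0.5])
theta = minibatch_sgd(X, y)
print(theta)  # close to [ 1.5 -0.5]
```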
Computing Gradients
Summary
• Smoothing helps to account for zero-valued n-grams
• Text classification using feature vectors representing n-grams and other properties
• Discriminative learning
• Methods for optimization, loss functions, and regularization
Classification Using a Generative Approach
• Start with Naïve Bayes and maximum likelihood estimation
• But we need some background in probability first
Probabilities in NLP
• Very important for language processing
• Example in speech recognition:
• "recognize speech" vs. "wreck a nice beach"
• Example in machine translation:
• "l'avocat général": "the attorney general" vs. "the general avocado"
• Example in information retrieval:
• If a document includes three occurrences of "stir" and one of "rice", what is the probability that it is a recipe?
• Probabilities make it possible to combine evidence from multiple sources systematically
Probabilities
• Probability theory
• predicting how likely it is that something will happen
• Experiment (trial)
• e.g., tossing a coin
• Possible outcomes
• heads or tails
• Sample spaces
• discrete (number of "rice") or continuous (e.g., temperature)
• Events
• Ω is the certain event
• ∅ is the impossible event
• Event space: all possible events
Sample Space
• Random experiment: an experiment with an uncertain outcome
• e.g., flipping a coin, picking a word from text
• Sample space: all possible outcomes, e.g.:
• Tossing 2 fair coins: Ω = {HH, HT, TH, TT}
Events
• Event: a subspace of the sample space
• E ⊆ Ω; E happens iff the outcome is in E, e.g.:
• E = {HH} (all heads)
• E = {HH, TT} (same face)
• Probability of an event: 0 ≤ P(E) ≤ 1, s.t.
• P(Ω) = 1 (the outcome is always in Ω)
• P(A ∪ B) = P(A) + P(B) if A ∩ B = ∅ (e.g., A = same face, B = different face)
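These axioms can be checked directly on the two-coin sample space with a short enumeration:

```python
from fractions import Fraction
from itertools import product

# Sample space for tossing two fair coins
omega = set(product("HT", repeat=2))  # {HH, HT, TH, TT} as tuples

def p(event):
    # Uniform probability over equally likely outcomes: |E| / |Omega|
    return Fraction(len(event), len(omega))

all_heads = {("H", "H")}
same_face = {("H", "H"), ("T", "T")}
diff_face = omega - same_face

assert p(omega) == 1                                             # P(Omega) = 1
assert p(set()) == 0                                             # P(empty) = 0
assert p(same_face | diff_face) == p(same_face) + p(diff_face)   # disjoint union
print(p(all_heads), p(same_face))  # 1/4 1/2
```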
Example: Toss a Die
• Sample space: Ω = {1, 2, 3, 4, 5, 6}
• Fair die:
• p(1) = p(2) = p(3) = p(4) = p(5) = p(6) = 1/6
• Unfair die: p(1) = 0.3, p(2) = 0.2, ...
• N-dimensional die:
• Ω = {1, 2, 3, 4, …, N}
• Example in modeling text:
• Toss a die to decide which word to write in the next position
• Ω = {cat, dog, tiger, …}
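The "die as word chooser" idea can be sketched with an assumed three-word vocabulary and made-up probabilities:

```python
import random

# Hypothetical "N-dimensional die" over a tiny vocabulary
vocab = ["cat", "dog", "tiger"]
probs = [0.5, 0.3, 0.2]  # unfair die: weights assumed for illustration

random.seed(0)
# Each toss of the die picks the word for the next position
words = random.choices(vocab, weights=probs, k=10)
print(" ".join(words))
```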
Example: Flip a Coin
• Ω = {Head, Tail}
• Fair coin:
• p(H) = 0.5, p(T) = 0.5
• Unfair coin, e.g.:
• p(H) = 0.3, p(T) = 0.7
• Flipping two fair coins:
• Sample space: {HH, HT, TH, TT}
• Example in modeling text:
• Flip a coin to decide whether or not to include a word in a document
• Sample space = {appear, absence}
Probabilities
• Probabilities
• numbers between 0 and 1
• Probability distribution
• distributes a probability mass of 1 throughout the sample space Ω
• Example:
• A fair coin is tossed three times.
• What is the probability of 3 heads?
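The question can be answered by enumerating the sample space: each of the 2³ = 8 outcomes is equally likely, so P(3 heads) = 1/8. A quick check:

```python
from fractions import Fraction
from itertools import product

omega = list(product("HT", repeat=3))  # all 8 equally likely outcomes
three_heads = [o for o in omega if o == ("H", "H", "H")]
p = Fraction(len(three_heads), len(omega))
print(p)  # 1/8
```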
Probabilities
• Joint probability: P(A ∩ B), also written as P(A, B)
• Conditional probability: P(A|B) = P(A ∩ B) / P(B)
• P(A ∩ B) = P(A) P(B|A) = P(B) P(A|B)
• So, P(A|B) = P(B|A) P(A) / P(B) (Bayes' rule)
• For independent events, P(A ∩ B) = P(A) P(B), so P(A|B) = P(A)
• Total probability: if A1, …, An form a partition of S, then
• P(B) = P(B ∩ S) = P(B, A1) + … + P(B, An)
• So, P(Ai|B) = P(B|Ai) P(Ai) / P(B) = P(B|Ai) P(Ai) / [P(B|A1) P(A1) + … + P(B|An) P(An)]
• This allows us to compute P(Ai|B) based on P(B|Ai)
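A worked instance of total probability and Bayes' rule, using made-up numbers for a two-event partition:

```python
from fractions import Fraction as F

# Hypothetical partition A1, A2 of the sample space (numbers are made up)
p_a = {"A1": F(3, 10), "A2": F(7, 10)}        # priors P(Ai)
p_b_given_a = {"A1": F(4, 5), "A2": F(1, 5)}  # likelihoods P(B|Ai)

# Total probability: P(B) = sum over i of P(B|Ai) P(Ai)
p_b = sum(p_b_given_a[a] * p_a[a] for a in p_a)

# Bayes' rule: P(Ai|B) = P(B|Ai) P(Ai) / P(B)
posterior = {a: p_b_given_a[a] * p_a[a] / p_b for a in p_a}
print(p_b, posterior)
```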
Properties of Probabilities
• p(∅) = 0
• P(certain event) = 1
• p(X) ≤ p(Y) if X ⊆ Y
• p(X ∪ Y) = p(X) + p(Y) if X ∩ Y = ∅
Conditional Probability
• Prior and posterior probability
• Conditional probability
• P(A|B) = P(A ∩ B) / P(B)
[Venn diagram: events A and B overlapping inside Ω, with the intersection A ∩ B shaded]
Conditional Probability
• Six-sided fair die
• P(D even) = ?
• P(D ≥ 4) = ?
• P(D even | D ≥ 4) = ?
• P(D odd | D ≥ 4) = ?
• Multiple conditions
• P(D odd | D ≥ 4, D ≤ 5) = ?
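The questions above can be answered by enumerating the six equally likely outcomes; a minimal sketch:

```python
from fractions import Fraction

omega = range(1, 7)  # six-sided fair die

def p(cond):
    # P(A) = |A| / |Omega| for a fair die
    return Fraction(sum(1 for d in omega if cond(d)), 6)

def p_given(a, b):
    # P(A|B) = P(A and B) / P(B), computed by counting outcomes
    num = sum(1 for d in omega if a(d) and b(d))
    den = sum(1 for d in omega if b(d))
    return Fraction(num, den)

even = lambda d: d % 2 == 0
odd = lambda d: d % 2 == 1
ge4 = lambda d: d >= 4

print(p(even))                                    # 1/2
print(p(ge4))                                     # 1/2
print(p_given(even, ge4))                         # 2/3
print(p_given(odd, ge4))                          # 1/3
print(p_given(odd, lambda d: ge4(d) and d <= 5))  # 1/2
```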
Independence
• Two events are independent when P(A ∩ B) = P(A) P(B)
• Unless P(B) = 0, this is equivalent to saying that P(A) = P(A|B)
• If two events are not independent, they are considered dependent
[slide from Brendan O'Connor]
Naïve Bayes Classifier
• We use Bayes' rule:
• P(C|D) = P(D|C) P(C) / P(D), where C = class, D = document
• We can simplify and ignore P(D) since it is independent of the class choice:
• P(C|D) ≅ P(D|C) P(C) ≅ P(C) Π_{i=1..n} P(wi|C)
• This estimates the probability of D being in class C, assuming that D has n tokens and wi is a token in D.
Use Labeled Training Data
• P(C) is estimated as the number of labeled documents in the class divided by the total number of documents:
• P(C) = Dc / D
• P(wi|C) is estimated as the number of times wi occurs with label C divided by the number of times all words in the vocabulary V occur with label C:
• P(wi|C) = Count(wi, C) / Σ_{v∈V} Count(v, C)
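Both estimates can be computed from raw counts; the tiny labeled corpus below is hypothetical:

```python
from collections import Counter
from fractions import Fraction

# Tiny made-up labeled corpus: (tokens, class)
docs = [
    ("stir the rice".split(), "recipe"),
    ("stir stir fry".split(), "recipe"),
    ("the attorney general".split(), "news"),
]

classes = Counter(c for _, c in docs)
word_counts = {c: Counter() for c in classes}
for tokens, c in docs:
    word_counts[c].update(tokens)

def p_class(c):
    # P(C) = Dc / D
    return Fraction(classes[c], len(docs))

def p_word_given_class(w, c):
    # P(wi|C) = Count(wi, C) / sum over v in V of Count(v, C)
    return Fraction(word_counts[c][w], sum(word_counts[c].values()))

print(p_class("recipe"))                     # 2/3
print(p_word_given_class("stir", "recipe"))  # 3/6 = 1/2
```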
Multinomial Naïve Bayes: Independence Assumptions
• Bag-of-words assumption
• Assume position doesn't matter
• Conditional independence
• Assume the feature probabilities P(wi|c) are independent given the class c:
• P(w1, …, wn | c) = Π_{i=1..n} P(wi|c)
[Jurafsky and Martin]
Multinomial Naïve Bayes Classifier
• C_MAP = argmax_C P(w1, …, wn | C) P(C)
• C_NB = argmax_Cj P(Cj) Π_{w∈W} P(w|Cj)
This is why it's naïve!
[Jurafsky and Martin]
Laplace Smoothing: Needed because counts may be zero
P̂(wi | c) = (count(wi, c) + 1) / Σ_{w∈V} (count(w, c) + 1)
           = (count(wi, c) + 1) / (Σ_{w∈V} count(w, c) + |V|)

compared with the unsmoothed estimate:

P̂(wi | c) = count(wi, c) / Σ_{w∈V} count(w, c)
[Jurafsky and Martin]
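The add-1 estimate can be computed directly from counts; the counts and vocabulary below are made up:

```python
from collections import Counter
from fractions import Fraction

def p_hat(w, c, word_counts, vocab):
    # Add-1 (Laplace) estimate: (count(w, c) + 1) / (sum of counts in c + |V|)
    return Fraction(word_counts[c][w] + 1,
                    sum(word_counts[c].values()) + len(vocab))

word_counts = {"recipe": Counter({"stir": 3, "rice": 1})}
vocab = {"stir", "rice", "attorney"}

print(p_hat("stir", "recipe", word_counts, vocab))      # (3+1)/(4+3) = 4/7
print(p_hat("attorney", "recipe", word_counts, vocab))  # (0+1)/(4+3) = 1/7, not 0
```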
Questions?
SciKit Learn
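A minimal scikit-learn sketch of the pipeline this deck discusses: bag-of-words counts feeding a multinomial Naïve Bayes classifier. The training texts are made up, and MultinomialNB's default alpha=1.0 corresponds to add-1 (Laplace) smoothing:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB

# Toy training set (made-up examples)
texts = [
    "stir the rice and season",
    "stir fry the vegetables",
    "the attorney general spoke",
    "the general election results",
]
labels = ["recipe", "recipe", "news", "news"]

vectorizer = CountVectorizer()   # bag-of-words count features
X = vectorizer.fit_transform(texts)
clf = MultinomialNB(alpha=1.0)   # alpha=1.0 is add-1 (Laplace) smoothing
clf.fit(X, labels)

test = vectorizer.transform(["stir the rice"])
print(clf.predict(test))  # ['recipe']
```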