Dan Jurafsky
Positive or negative movie review?
• unbelievably disappointing
• Full of zany characters and richly applied satire, and some great plot twists
• this is the greatest screwball comedy ever filmed
• It was pathetic. The worst part about it was the boxing scenes.

Twitter sentiment versus Gallup Poll of Consumer Confidence
Brendan O'Connor, Ramnath Balasubramanyan, Bryan R. Routledge, and Noah A. Smith. 2010. From Tweets to Polls: Linking Text Sentiment to Public Opinion Time Series. In ICWSM-2010

Twitter sentiment:
Johan Bollen, Huina Mao, Xiaojun Zeng. 2011. Twitter mood predicts the stock market, Journal of Computational Science 2:1, 1-8. 10.1016/j.jocs.2010.12.007.

[Figure: Dow Jones vs. CALM]
• CALM predicts DJIA 3 days later
• At least one current hedge fund uses this algorithm
Bollen et al. (2011)

Target Sentiment on Twitter
• Twitter Sentiment App
• Alec Go, Richa Bhayani, Lei Huang. 2009. Twitter Sentiment Classification using Distant Supervision

Sentiment analysis has many other names
• Opinion extraction
• Opinion mining
• Sentiment mining
• Subjectivity analysis

Why sentiment analysis?
• Movie: is this review positive or negative?
• Products: what do people think about the new iPhone?
• Public sentiment: how is consumer confidence? Is despair increasing?
• Politics: what do people think about this candidate or issue?
• Prediction: predict election outcomes or market trends from sentiment

Scherer Typology of Affective States
• Emotion: brief organically synchronized … evaluation of a major event
  • angry, sad, joyful, fearful, ashamed, proud, elated
• Mood: diffuse non-caused low-intensity long-duration change in subjective feeling
  • cheerful, gloomy, irritable, listless, depressed, buoyant
• Interpersonal stances: affective stance toward another person in a specific interaction
  • friendly, flirtatious, distant, cold, warm, supportive, contemptuous
• Attitudes: enduring, affectively colored beliefs, dispositions towards objects or persons
  • liking, loving, hating, valuing, desiring
• Personality traits: stable personality dispositions and typical behavior tendencies
  • nervous, anxious, reckless, morose, hostile, jealous

Sentiment Analysis
• Sentiment analysis is the detection of attitudes: “enduring, affectively colored beliefs, dispositions towards objects or persons”
  1. Holder (source) of attitude
  2. Target (aspect) of attitude
  3. Type of attitude
     • From a set of types
       • Like, love, hate, value, desire, etc.
     • Or (more commonly) simple weighted polarity:
       • positive, negative, neutral, together with strength
  4. Text containing the attitude
     • Sentence or entire document

Sentiment Analysis
• Simplest task:
  • Is the attitude of this text positive or negative?
• More complex:
  • Rank the attitude of this text from 1 to 5
• Advanced:
  • Detect the target, source, or complex attitude types

Sentiment Classification in Movie Reviews
• Polarity detection:
  • Is an IMDB movie review positive or negative?
• Data: Polarity Data 2.0:
  • http://www.cs.cornell.edu/people/pabo/movie-review-data
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
Bo Pang and Lillian Lee. 2004. A Sentimental Education: Sentiment Analysis Using Subjectivity Summarization Based on Minimum Cuts. ACL, 271-278

IMDB data in the Pang and Lee database
✓ when _star wars_ came out some twenty years ago, the image of traveling throughout the stars has become a commonplace image. […] when han solo goes light speed, the stars change to bright lines, going towards the viewer in lines that converge at an invisible point. cool. _october sky_ offers a much simpler image – that of a single white dot, traveling horizontally across the night sky. [...]
✗ “snake eyes” is the most aggravating kind of movie: the kind that shows so much potential then becomes unbelievably disappointing. it’s not just because this is a brian depalma film, and since he’s a great director and one who’s films are always greeted with at least some fanfare. and it’s not even because this was a film starring nicolas cage and since he gives a brauvara performance, this film is hardly worth his talents.

Baseline Algorithm (adapted from Pang and Lee)
• Tokenization
• Feature Extraction
• Classification using different classifiers
  • Naïve Bayes
  • MaxEnt
  • SVM

Sentiment Tokenization Issues
• Deal with HTML and XML markup
• Twitter mark-up (names, hash tags)
• Capitalization (preserve for words in all caps)
• Phone numbers, dates
• Emoticons
• Useful code:
  • Christopher Potts sentiment tokenizer
  • Brendan O’Connor twitter tokenizer

Potts emoticons:
[<>]?                       # optional hat/brow
[:;=8]                      # eyes
[\-o\*\']?                  # optional nose
[\)\]\(\[dDpP/\:\}\{@\|\\]  # mouth
|                           #### reverse orientation
[\)\]\(\[dDpP/\:\}\{@\|\\]  # mouth
[\-o\*\']?                  # optional nose
[:;=8]                      # eyes
[<>]?                       # optional hat/brow
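
The emoticon pattern above can be tried directly in Python. The following is a minimal sketch, assuming the pattern is compiled with `re.VERBOSE` so that its comments and line breaks are ignored; the names `EMOTICON_RE` and `find_emoticons` are illustrative and not part of any tokenizer named on the slide.

```python
import re

# Pattern transcribed from the slide. With re.VERBOSE, whitespace and
# "#" comments outside character classes are ignored by the regex engine.
EMOTICON_RE = re.compile(r"""
    [<>]?                        # optional hat/brow
    [:;=8]                       # eyes
    [\-o\*\']?                   # optional nose
    [\)\]\(\[dDpP/\:\}\{@\|\\]   # mouth
    |                            # reverse orientation
    [\)\]\(\[dDpP/\:\}\{@\|\\]   # mouth
    [\-o\*\']?                   # optional nose
    [:;=8]                       # eyes
    [<>]?                        # optional hat/brow
""", re.VERBOSE)


def find_emoticons(text):
    """Return every emoticon-like substring in text, left to right."""
    return EMOTICON_RE.findall(text)


if __name__ == "__main__":
    print(find_emoticons("loved it :-) hated it :("))
```

Because the pattern has no word-boundary checks, it can also fire inside ordinary tokens (for example a letter "d" followed by "8"), so a real tokenizer would likely apply it only to whitespace-delimited tokens.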

Extracting Features for Sentiment Classification
• How to handle negation
  • I didn’t like this movie
    vs
  • I really like this movie
• Which words to use?
  • Only adjectives
  • All words
    • All words turns out to work better, at least on this data

Negation
Add NOT_ to every word between negation and following punctuation:
didn’t like this movie , but I
didn’t NOT_like NOT_this NOT_movie but I
Das, Sanjiv and Mike Chen. 2001. Yahoo! for Amazon: Extracting market sentiment from stock message boards. In Proceedings of the Asia Pacific Finance Association Annual Conference (APFA).
Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
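
The NOT_ marking rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the cited authors' code: the negation cue list (not, no, never, and any token ending in n't) and the set of punctuation marks that end the negation scope are choices made here, and the input is assumed to be already tokenized.

```python
import re

# Assumed negation cues: "not", "no", "never", or a token ending in n't / n’t.
NEGATION_RE = re.compile(r"^(?:not|no|never)$|n['’]t$", re.IGNORECASE)
# Assumed clause-level punctuation that closes the negation scope.
CLAUSE_PUNCT = {".", ",", ";", ":", "!", "?"}


def mark_negation(tokens):
    """Prefix NOT_ to every token between a negation cue and the next punctuation mark."""
    marked = []
    in_scope = False
    for tok in tokens:
        if tok in CLAUSE_PUNCT:
            in_scope = False          # punctuation ends the scope
            marked.append(tok)
        elif in_scope:
            marked.append("NOT_" + tok)
        else:
            marked.append(tok)
            if NEGATION_RE.search(tok):
                in_scope = True       # start marking from the next token
    return marked


if __name__ == "__main__":
    print(mark_negation("didn't like this movie , but I".split()))
```

Unlike the slide's example, this sketch keeps the punctuation token in the output; dropping it is a one-line change if the downstream features should not include punctuation.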

Reminder: Naïve Bayes
\[ \hat{P}(w \mid c) = \frac{\mathrm{count}(w, c) + 1}{\mathrm{count}(c) + |V|} \]
\[ c_{NB} = \operatorname*{argmax}_{c_j \in C} \; P(c_j) \prod_{i \in \mathit{positions}} P(w_i \mid c_j) \]

Binarized (Boolean feature) Multinomial Naïve Bayes
• Intuition:
  • For sentiment (and probably for other text classification domains)
  • Word occurrence may matter more than word frequency
    • The occurrence of the word fantastic tells us a lot
    • The fact that it occurs 5 times may not tell us much more.
• Boolean Multinomial Naïve Bayes
  • Clips all the word counts in each document at 1

Boolean Multinomial Naïve Bayes: Learning
• From training corpus, extract Vocabulary
• Calculate P(c_j) terms
  • For each c_j in C do
    docs_j ← all docs with class = c_j
    \( P(c_j) \leftarrow \frac{|docs_j|}{|\text{total \# documents}|} \)
• Calculate P(w_k | c_j) terms
  • Remove duplicates in each doc:
    • For each word type w in doc_j
      • Retain only a single instance of w
  • Text_j ← single doc containing all docs_j
  • For each word w_k in Vocabulary
    n_k ← # of occurrences of w_k in Text_j
    \( P(w_k \mid c_j) \leftarrow \frac{n_k + \alpha}{n + \alpha\,|Vocabulary|} \)

Boolean Multinomial Naïve Bayes on a test document d
• First remove all duplicate words from d
• Then compute NB using the same equation:
\[ c_{NB} = \operatorname*{argmax}_{c_j \in C} \; P(c_j) \prod_{i \in \mathit{positions}} P(w_i \mid c_j) \]

Normal vs. Boolean Multinomial NB

Normal     Doc  Words                                Class
Training   1    Chinese Beijing Chinese              c
           2    Chinese Chinese Shanghai             c
           3    Chinese Macao                        c
           4    Tokyo Japan Chinese                  j
Test       5    Chinese Chinese Chinese Tokyo Japan  ?

Boolean    Doc  Words                                Class
Training   1    Chinese Beijing                      c
           2    Chinese Shanghai                     c
           3    Chinese Macao                        c
           4    Tokyo Japan Chinese                  j
Test       5    Chinese Tokyo Japan                  ?
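
To make the comparison concrete, here is a small sketch of multinomial Naïve Bayes with an optional binarization step, run on the two tables above. It assumes add-one smoothing (α = 1, kept as an integer so exact fractions can be used) and takes n to be the total token count of a class after any deduplication; the function names `train_nb` and `score_nb` are illustrative.

```python
from collections import Counter
from fractions import Fraction


def train_nb(docs, binarize=False, alpha=1):
    """docs: list of (tokens, label). alpha must be an integer here (exact fractions)."""
    vocab = {w for tokens, _ in docs for w in tokens}
    labels = sorted({label for _, label in docs})
    priors, likelihoods = {}, {}
    for c in labels:
        class_docs = [tokens for tokens, label in docs if label == c]
        priors[c] = Fraction(len(class_docs), len(docs))
        counts = Counter()
        for tokens in class_docs:
            # Boolean variant: each word type counts at most once per document.
            counts.update(set(tokens) if binarize else tokens)
        n = sum(counts.values())
        likelihoods[c] = {
            w: Fraction(counts[w] + alpha, n + alpha * len(vocab)) for w in vocab
        }
    return priors, likelihoods, vocab


def score_nb(tokens, model, binarize=False):
    """Return {class: P(c) * product of P(w|c)} for the test tokens."""
    priors, likelihoods, vocab = model
    if binarize:
        tokens = list(dict.fromkeys(tokens))  # drop duplicates, keep order
    scores = {}
    for c, prior in priors.items():
        p = prior
        for w in tokens:
            if w in vocab:                     # unknown words are skipped
                p *= likelihoods[c][w]
        scores[c] = p
    return scores


TRAIN = [
    ("Chinese Beijing Chinese".split(), "c"),
    ("Chinese Chinese Shanghai".split(), "c"),
    ("Chinese Macao".split(), "c"),
    ("Tokyo Japan Chinese".split(), "j"),
]
TEST = "Chinese Chinese Chinese Tokyo Japan".split()

if __name__ == "__main__":
    for flag in (False, True):
        scores = score_nb(TEST, train_nb(TRAIN, binarize=flag), binarize=flag)
        print("boolean" if flag else "normal", max(scores, key=scores.get), scores)
```

Under these assumptions the two variants disagree on the test document: with full counts class c scores higher (81/268912 against 8/59049), while with clipped counts class j scores higher (2/729 against 1/576), because the repeated "Chinese" no longer dominates.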

Binarized (Boolean feature) Multinomial Naïve Bayes
• Binary seems to work better than full word counts
  • This is not the same as Multivariate Bernoulli Naïve Bayes
    • MBNB doesn’t work well for sentiment or other text tasks
• Other possibility: log(freq(w))
B. Pang, L. Lee, and S. Vaithyanathan. 2002. Thumbs up? Sentiment Classification using Machine Learning Techniques. EMNLP-2002, 79-86.
V. Metsis, I. Androutsopoulos, G. Paliouras. 2006. Spam Filtering with Naive Bayes – Which Naive Bayes? CEAS 2006 - Third Conference on Email and Anti-Spam.
K.-M. Schneider. 2004. On word frequency information and negative evidence in Naive Bayes text classification. ICANLP, 474-485.
JD Rennie, L Shih, J Teevan. 2003. Tackling the poor assumptions of naive bayes text classifiers. ICML 2003

Cross-Validation
• Break up data into 10 folds
  • (Equal positive and negative inside each fold?)
• For each fold
  • Choose the fold as a temporary test set
  • Train on 9 folds, compute performance on the test fold
• Report average performance of the 10 runs
[Figure: iterations 1-5, each using a different fold as Test and the remaining folds as Training]
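
The fold procedure can be sketched as follows. This is a minimal illustration that keeps the class balance roughly equal inside each fold (the parenthetical question on the slide) by dealing each class's examples round-robin; `train_fn` and `eval_fn` are hypothetical callables supplied by the caller, not functions from any library.

```python
import random


def stratified_folds(labels, k=10, seed=0):
    """Split example indices into k folds with near-equal class proportions."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)      # deal indices round-robin
    return folds


def cross_validate(examples, labels, train_fn, eval_fn, k=10, seed=0):
    """Average eval_fn(model, test) over k train/test splits."""
    folds = stratified_folds(labels, k, seed)
    scores = []
    for held_out in folds:
        test_idx = set(held_out)
        train = [(examples[i], labels[i])
                 for i in range(len(labels)) if i not in test_idx]
        test = [(examples[i], labels[i]) for i in held_out]
        model = train_fn(train)             # train on the other k-1 folds
        scores.append(eval_fn(model, test)) # score on the held-out fold
    return sum(scores) / len(scores)
```

Any classifier can be plugged in through `train_fn`, and `eval_fn` can return accuracy or an F-score depending on how balanced the data is.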

Genetic Algorithms for binary sentiment
• As before, create a vocabulary of n words
• GA representation is:
  p(w_1|c) … p(w_i|c) … p(w_n|c)
  • Each gene is a float in the range (0, 1). c is either positive or negative sentiment, as the other value will be 1 − p(w_i|c)
• Use the decision rule from slide 27 (Boolean Multinomial Naïve Bayes on a test document) to predict the sentiment class. Fitness is error in prediction

GA Fitness
• If document d_i has class cs_i (+1 for positive or 0 for negative sentiment) and cp_i^k is the class predicted by the kth population member
• Normalized fitness for the kth population member for m documents:
\[ F_k = 1 - \frac{1}{m} \sum_{i=1}^{m} \left| cs_i - cp_i^k \right| \]
• Here, we would be maximizing fitness, F_k. This is in range (0, 1)
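
As a rough sketch of how one population member could be scored, the code below decodes a chromosome into a prediction and computes the normalized fitness. It assumes that each gene stores p(w_i | positive), that p(w_i | negative) is 1 minus the gene, that the class prior is 0.5, and that the per-document error is the absolute difference of 0/1 labels; all names are illustrative.

```python
import math


def predict(genes, vocab_index, tokens, prior_pos=0.5):
    """Naive Bayes decision with gene i = p(w_i | positive); returns 1 or 0."""
    log_pos = math.log(prior_pos)
    log_neg = math.log(1 - prior_pos)
    for w in set(tokens):                    # Boolean features: each word once
        if w in vocab_index:
            p = genes[vocab_index[w]]        # assumed strictly inside (0, 1)
            log_pos += math.log(p)
            log_neg += math.log(1 - p)
    return 1 if log_pos > log_neg else 0


def fitness(true_labels, predicted_labels):
    """F_k = 1 - (1/m) * sum_i |cs_i - cp_i^k| for labels in {0, 1}."""
    if len(true_labels) != len(predicted_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    m = len(true_labels)
    errors = sum(abs(cs - cp) for cs, cp in zip(true_labels, predicted_labels))
    return 1 - errors / m


if __name__ == "__main__":
    vocab_index = {"great": 0, "awful": 1}
    genes = [0.9, 0.1]                       # one hypothetical population member
    docs = [["great", "film"], ["awful", "plot"], ["great", "awful", "great"]]
    gold = [1, 0, 1]
    preds = [predict(genes, vocab_index, d) for d in docs]
    print(preds, fitness(gold, preds))
```

A genetic algorithm would call `fitness` once per population member per generation and keep or recombine the highest-scoring chromosomes.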

Problems: What makes reviews hard to classify?
• Subtlety:
  • Perfume review in Perfumes: the Guide:
    • “If you are reading this because it is your darling fragrance, please wear it at home exclusively, and tape the windows shut.”
  • Dorothy Parker on Katherine Hepburn
    • “She runs the gamut of emotions from A to B”

Thwarted Expectations and Ordering Effects
• “This film should be brilliant. It sounds like a great plot, the actors are first grade, and the supporting cast is good as well, and Stallone is attempting to deliver a good performance. However, it can’t hold up.”
• Well as usual Keanu Reeves is nothing special, but surprisingly, the very talented Laurence Fishbourne is not so good either, I was surprised.

The General Inquirer
• Home page: http://www.wjh.harvard.edu/~inquirer
• List of Categories: http://www.wjh.harvard.edu/~inquirer/homecat.htm
• Spreadsheet: http://www.wjh.harvard.edu/~inquirer/inquirerbasic.xls
• Categories:
  • Positiv (1915 words) and Negativ (2291 words)
  • Strong vs Weak, Active vs Passive, Overstated versus Understated
  • Pleasure, Pain, Virtue, Vice, Motivation, Cognitive Orientation, etc
• Free for Research Use
Philip J. Stone, Dexter C Dunphy, Marshall S. Smith, Daniel M. Ogilvie. 1966. The General Inquirer: A Computer Approach to Content Analysis. MIT Press

LIWC (Linguistic Inquiry and Word Count)
Pennebaker, J.W., Booth, R.J., & Francis, M.E. (2007). Linguistic Inquiry and Word Count: LIWC 2007. Austin, TX
• Home page: http://www.liwc.net/
• 2300 words, >70 classes
• Affective Processes
  • negative emotion (bad, weird, hate, problem, tough)
  • positive emotion (love, nice, sweet)
• Cognitive Processes
  • Tentative (maybe, perhaps, guess), Inhibition (block, constraint)
• Pronouns, Negation (no, never), Quantifiers (few, many)
• $30 or $90 fee

MPQA Subjectivity Cues Lexicon
• Home page: http://www.cs.pitt.edu/mpqa/subj_lexicon.html
• 6885 words from 8221 lemmas
  • 2718 positive
  • 4912 negative
• Each word annotated for intensity (strong, weak)
• GNU GPL
Theresa Wilson, Janyce Wiebe, and Paul Hoffmann (2005). Recognizing Contextual Polarity in Phrase-Level Sentiment Analysis. Proc. of HLT-EMNLP-2005.
Riloff and Wiebe (2003). Learning extraction patterns for subjective expressions. EMNLP-2003.

Bing Liu Opinion Lexicon
• Bing Liu's Page on Opinion Mining
• http://www.cs.uic.edu/~liub/FBS/opinion-lexicon-English.rar
• 6786 words
  • 2006 positive
  • 4783 negative
Minqing Hu and Bing Liu. Mining and Summarizing Customer Reviews. ACM SIGKDD-2004.

SentiWordNet
Stefano Baccianella, Andrea Esuli, and Fabrizio Sebastiani. 2010. SENTIWORDNET 3.0: An Enhanced Lexical Resource for Sentiment Analysis and Opinion Mining. LREC-2010
• Home page: http://sentiwordnet.isti.cnr.it/
• All WordNet synsets automatically annotated for degrees of positivity, negativity, and neutrality/objectiveness
• [estimable(J,3)] “may be computed or estimated”
  Pos 0  Neg 0  Obj 1
• [estimable(J,1)] “deserving of respect or high regard”
  Pos .75  Neg 0  Obj .25

Disagreements between polarity lexicons

                   Opinion Lexicon   General Inquirer   SentiWordNet      LIWC
MPQA               33/5402 (0.6%)    49/2867 (2%)       1127/4214 (27%)   12/363 (3%)
Opinion Lexicon                      32/2411 (1%)       1004/3994 (25%)   9/403 (2%)
General Inquirer                                        520/2306 (23%)    1/204 (0.5%)
SentiWordNet                                                              174/694 (25%)
LIWC

Christopher Potts, Sentiment Tutorial, 2011

Analyzing the polarity of each word in IMDB
• How likely is each word to appear in each sentiment class?
• Count(“bad”) in 1-star, 2-star, 3-star, etc.
• But can’t use raw counts:
• Instead, likelihood:
\[ P(w \mid c) = \frac{f(w, c)}{\sum_{w \in c} f(w, c)} \]
• Make them comparable between words
  • Scaled likelihood:
\[ \frac{P(w \mid c)}{P(w)} \]
Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636-659.
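
A short sketch of the scaled likelihood computation follows. The slide does not define P(w); the code assumes it is the word's relative frequency over all rating classes pooled together, and the toy counts in the usage example are invented for illustration only.

```python
from collections import Counter


def scaled_likelihood(counts, word):
    """counts: {rating_class: Counter of f(w, c)}. Returns {class: P(w|c) / P(w)}.

    Assumption: P(w) is the relative frequency of the word over all classes pooled.
    """
    total_tokens = sum(sum(c.values()) for c in counts.values())
    word_total = sum(c[word] for c in counts.values())
    if word_total == 0:
        raise ValueError(f"{word!r} does not occur in any class")
    p_w = word_total / total_tokens
    result = {}
    for cls, c in counts.items():
        p_w_given_c = c[word] / sum(c.values())   # f(w, c) / sum over w in c
        result[cls] = p_w_given_c / p_w
    return result


if __name__ == "__main__":
    # Invented toy counts for two rating classes (1 star and 10 stars).
    toy = {1: Counter(bad=30, film=70), 10: Counter(bad=10, film=190)}
    print(scaled_likelihood(toy, "bad"))
```

A value above 1 means the word is over-represented in that rating class relative to its overall frequency, which is what makes curves for frequent and rare words comparable.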

Analyzing the polarity of each word in IMDB
[Figure: scaled likelihood P(w|c)/P(w) and Pr(c|w) plotted against rating (1-10). POS panels: good (883,417 tokens), amazing (103,509 tokens), great (648,110 tokens), awesome (47,142 tokens). NEG panels: good (20,447 tokens), depress(ed/ing) (18,498 tokens), bad (368,273 tokens), terrible (55,492 tokens).]
Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636-659.

Other sentiment feature: Logical negation
• Is logical negation (no, not) associated with negative sentiment?
• Potts experiment:
  • Count negation (not, n’t, no, never) in online reviews
  • Regress against the review rating
Potts, Christopher. 2011. On the negativity of negation. SALT 20, 636-659.

Semi-supervised learning of lexicons
• Use a small amount of information
  • A few labeled examples
  • A few hand-built patterns
• To bootstrap a lexicon

Hatzivassiloglou and McKeown intuition for identifying word polarity
• Adjectives conjoined by “and” have same polarity
  • Fair and legitimate, corrupt and brutal
  • *fair and brutal, *corrupt and legitimate
• Adjectives conjoined by “but” do not
  • fair but brutal
Vasileios Hatzivassiloglou and Kathleen R. McKeown. 1997. Predicting the Semantic Orientation of Adjectives. ACL, 174-181

Hatzivassiloglou & McKeown 1997 Step 1
• Label seed set of 1336 adjectives (all with frequency > 20 in a 21 million word WSJ corpus)
  • 657 positive
    • adequate central clever famous intelligent remarkable reputed sensitive slender thriving…
  • 679 negative
    • contagious drunken ignorant lanky listless primitive strident troublesome unresolved unsuspecting…

Hatzivassiloglou & McKeown 1997 Step 2
• Expand seed set to conjoined adjectives
  nice, helpful
  nice, classy

Hatzivassiloglou & McKeown 1997 Step 3
• Supervised classifier assigns “polarity similarity” to each word pair, resulting in graph:
[Figure: graph over the nodes classy, nice, helpful, fair, brutal, irrational, corrupt]

Hatzivassiloglou & McKeown 1997 Step 4
• Clustering for partitioning the graph into two
[Figure: the same graph (classy, nice, helpful, fair, brutal, irrational, corrupt) partitioned into a + cluster and a - cluster]

Output polarity lexicon
• Positive
  • bold decisive disturbing generous good honest important large mature patient peaceful positive proud sound stimulating straightforward strange talented vigorous witty…
• Negative
  • ambiguous cautious cynical evasive harmful hypocritical inefficient insecure irrational irresponsible minor outspoken pleasant reckless risky selfish tedious unsupported vulnerable wasteful…

Turney Algorithm
1. Extract a phrasal lexicon from reviews
2. Learn polarity of each phrase
3. Rate a review by the average polarity of its phrases
Turney (2002): Thumbs Up or Thumbs Down? Semantic Orientation Applied to Unsupervised Classification of Reviews

Extract two-word phrases with adjectives

First Word         Second Word          Third Word (not extracted)
JJ                 NN or NNS            anything
RB, RBR, or RBS    JJ                   Not NN nor NNS
JJ                 JJ                   Not NN nor NNS
NN or NNS          JJ                   Not NN nor NNS
RB, RBR, or RBS    VB, VBD, VBN, VBG    anything
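
The five tag patterns in the table can be applied to text that has already been part-of-speech tagged. The sketch below assumes Penn Treebank tags supplied as (word, tag) pairs by some external tagger; the function name `extract_phrases` is illustrative.

```python
NOUN = {"NN", "NNS"}
ADJ = {"JJ"}
ADV = {"RB", "RBR", "RBS"}
VERB = {"VB", "VBD", "VBN", "VBG"}


def extract_phrases(tagged):
    """tagged: list of (word, tag) pairs. Returns two-word phrases matching the table."""
    phrases = []
    for i in range(len(tagged) - 1):
        (w1, t1), (w2, t2) = tagged[i], tagged[i + 1]
        # Tag of the third word, or None at the end of the sequence.
        t3 = tagged[i + 2][1] if i + 2 < len(tagged) else None
        third_not_noun = t3 not in NOUN
        keep = (
            (t1 in ADJ and t2 in NOUN)                         # row 1
            or (t1 in ADV and t2 in ADJ and third_not_noun)    # row 2
            or (t1 in ADJ and t2 in ADJ and third_not_noun)    # row 3
            or (t1 in NOUN and t2 in ADJ and third_not_noun)   # row 4
            or (t1 in ADV and t2 in VERB)                      # row 5
        )
        if keep:
            phrases.append(f"{w1} {w2}")
    return phrases


if __name__ == "__main__":
    sentence = [("very", "RB"), ("handy", "JJ"), ("online", "JJ"),
                ("service", "NN"), (".", ".")]
    print(extract_phrases(sentence))
```

In the usage example, "handy online" is rejected by the third-word constraint because it is followed by a noun, while "online service" is kept by the first row.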

How to measure polarity of a phrase?
• Positive phrases co-occur more with “excellent”
• Negative phrases co-occur more with “poor”
• But how to measure co-occurrence?

Pointwise Mutual Information
• Mutual information between 2 random variables X and Y
\[ I(X, Y) = \sum_{x} \sum_{y} P(x, y) \log_2 \frac{P(x, y)}{P(x)\,P(y)} \]
• Pointwise mutual information:
  • How much more do events x and y co-occur than if they were independent?
\[ \mathrm{PMI}(x, y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)} \]

Pointwise Mutual Information
• Pointwise mutual information:
  • How much more do events x and y co-occur than if they were independent?
\[ \mathrm{PMI}(x, y) = \log_2 \frac{P(x, y)}{P(x)\,P(y)} \]
• PMI between two words:
  • How much more do two words co-occur than if they were independent?
\[ \mathrm{PMI}(\mathit{word}_1, \mathit{word}_2) = \log_2 \frac{P(\mathit{word}_1, \mathit{word}_2)}{P(\mathit{word}_1)\,P(\mathit{word}_2)} \]
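
As a quick numeric illustration of the definition, the sketch below computes PMI from raw counts. It assumes the three probabilities are estimated as counts divided by the same total; the function name and the counts in the usage example are made up.

```python
import math


def pmi(joint_count, count_x, count_y, total):
    """log2( P(x, y) / (P(x) P(y)) ) with each probability estimated as count / total."""
    if min(joint_count, count_x, count_y, total) <= 0:
        raise ValueError("all counts must be positive")
    # Algebraically equal to (joint/total) / ((x/total) * (y/total)).
    return math.log2(joint_count * total / (count_x * count_y))


if __name__ == "__main__":
    # Two words that each occur 100 times in 1000 windows.
    print(pmi(10, 100, 100, 1000))   # co-occur exactly as often as chance predicts
    print(pmi(40, 100, 100, 1000))   # co-occur four times more often than chance
```

A PMI of zero corresponds to independence, positive values to words that attract each other, and negative values to words that avoid each other.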

How to Estimate Pointwise Mutual Information
• Query search engine (Altavista)
  • P(word) estimated by hits(word)/N
  • P(word1, word2) by hits(word1 NEAR word2)/N
  • (More correctly the bigram denominator should be kN, because there are a total of N consecutive bigrams (word1, word2), but kN bigrams that are k words apart, but we just use N on the rest of this slide and the next.)
\[ \mathrm{PMI}(\mathit{word}_1, \mathit{word}_2) = \log_2 \frac{\frac{1}{N}\,\mathrm{hits}(\mathit{word}_1 \text{ NEAR } \mathit{word}_2)}{\frac{1}{N}\,\mathrm{hits}(\mathit{word}_1)\;\frac{1}{N}\,\mathrm{hits}(\mathit{word}_2)} \]
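
Does phrase appear more with “poor” or “excellent”?
\[ \mathrm{Polarity}(\mathit{phrase}) = \mathrm{PMI}(\mathit{phrase}, \text{"excellent"}) - \mathrm{PMI}(\mathit{phrase}, \text{"poor"}) \]
\[ = \log_2 \frac{\frac{1}{N}\,\mathrm{hits}(\mathit{phrase} \text{ NEAR "excellent"})}{\frac{1}{N}\,\mathrm{hits}(\mathit{phrase})\;\frac{1}{N}\,\mathrm{hits}(\text{"excellent"})} - \log_2 \frac{\frac{1}{N}\,\mathrm{hits}(\mathit{phrase} \text{ NEAR "poor"})}{\frac{1}{N}\,\mathrm{hits}(\mathit{phrase})\;\frac{1}{N}\,\mathrm{hits}(\text{"poor"})} \]
\[ = \log_2 \left( \frac{\mathrm{hits}(\mathit{phrase} \text{ NEAR "excellent"})}{\mathrm{hits}(\mathit{phrase})\,\mathrm{hits}(\text{"excellent"})} \cdot \frac{\mathrm{hits}(\mathit{phrase})\,\mathrm{hits}(\text{"poor"})}{\mathrm{hits}(\mathit{phrase} \text{ NEAR "poor"})} \right) \]
\[ = \log_2 \left( \frac{\mathrm{hits}(\mathit{phrase} \text{ NEAR "excellent"})\;\mathrm{hits}(\text{"poor"})}{\mathrm{hits}(\mathit{phrase} \text{ NEAR "poor"})\;\mathrm{hits}(\text{"excellent"})} \right) \]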
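
The final line of the derivation needs only four hit counts, since N and hits(phrase) cancel. The sketch below computes it and averages phrase polarities into a review score. The `smoothing` constant added to the two NEAR counts is an assumption introduced here to avoid division by zero, not something stated on the slides, and the hit counts in the usage example are invented.

```python
import math


def turney_polarity(hits_near_excellent, hits_near_poor,
                    hits_excellent, hits_poor, smoothing=0.01):
    """log2( hits(p NEAR excellent) * hits(poor) / (hits(p NEAR poor) * hits(excellent)) )."""
    numerator = (hits_near_excellent + smoothing) * hits_poor
    denominator = (hits_near_poor + smoothing) * hits_excellent
    return math.log2(numerator / denominator)


def review_polarity(phrase_polarities):
    """Rate a review by the average polarity of its phrases."""
    if not phrase_polarities:
        raise ValueError("review has no extracted phrases")
    return sum(phrase_polarities) / len(phrase_polarities)


if __name__ == "__main__":
    # Invented counts: the phrase appears near "excellent" four times as often as near "poor".
    score = turney_polarity(100, 25, hits_excellent=1000, hits_poor=1000)
    print(score, review_polarity([score, -1.0]))
```

A review would then be labeled thumbs-up when its average polarity is positive and thumbs-down otherwise.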

Phrases from a thumbs-up review

Phrase                   POS tags   Polarity
online service           JJ NN       2.8
online experience        JJ NN       2.3
direct deposit           JJ NN       1.3
local branch             JJ NN       0.42
…
low fees                 JJ NNS      0.33
true service             JJ NN      -0.73
other bank               JJ NN      -0.85
inconveniently located   JJ NN      -1.5
Average                              0.32

Phrases from a thumbs-down review

Phrase                   POS tags   Polarity
direct deposits          JJ NNS      5.8
online web               JJ NN       1.9
very handy               RB JJ       1.4
…
virtual monopoly         JJ NN      -2.0
lesser evil              RBR JJ     -2.3
other problems           JJ NNS     -2.8
low funds                JJ NNS     -6.8
unethical practices      JJ NNS     -8.5
Average                             -1.2

Results of Turney algorithm
• 410 reviews from Epinions
  • 170 (41%) negative
  • 240 (59%) positive
• Majority class baseline: 59%
• Turney algorithm: 74%
• Phrases rather than words
• Learns domain-specific information

Using WordNet to learn polarity
• WordNet: online thesaurus (covered in later lecture).
• Create positive (“good”) and negative seed-words (“terrible”)
• Find Synonyms and Antonyms
  • Positive Set: Add synonyms of positive words (“well”) and antonyms of negative words
  • Negative Set: Add synonyms of negative words (“awful”) and antonyms of positive words (“evil”)
• Repeat, following chains of synonyms
• Filter
S.M. Kim and E. Hovy. 2004. Determining the sentiment of opinions. COLING 2004
M. Hu and B. Liu. Mining and summarizing customer reviews. In Proceedings of KDD, 2004
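
The expansion loop can be sketched without WordNet itself by using plain dictionaries as a stand-in thesaurus. Everything in the toy data below is hypothetical, and the final filter (dropping words that land in both sets) is only one plausible reading of the unspecified "Filter" step.

```python
def expand_lexicon(pos_seeds, neg_seeds, synonyms, antonyms, rounds=2):
    """Grow positive/negative sets by following synonym and antonym links."""
    positive, negative = set(pos_seeds), set(neg_seeds)
    for _ in range(rounds):                       # "Repeat, following chains of synonyms"
        new_pos, new_neg = set(positive), set(negative)
        for w in positive:
            new_pos.update(synonyms.get(w, ()))   # synonyms keep polarity
            new_neg.update(antonyms.get(w, ()))   # antonyms flip polarity
        for w in negative:
            new_neg.update(synonyms.get(w, ()))
            new_pos.update(antonyms.get(w, ()))
        positive, negative = new_pos, new_neg
    # Assumed filter: discard words that were reached with both polarities.
    conflict = positive & negative
    return positive - conflict, negative - conflict


if __name__ == "__main__":
    # Hypothetical stand-in for thesaurus lookups.
    toy_synonyms = {"good": ["well"], "terrible": ["awful"], "awful": ["dreadful"]}
    toy_antonyms = {"good": ["evil"]}
    print(expand_lexicon({"good"}, {"terrible"}, toy_synonyms, toy_antonyms))
```

With a real thesaurus the number of rounds matters, since long synonym chains tend to drift away from the seed's meaning.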

Summary on Learning Lexicons
• Advantages:
  • Can be domain-specific
  • Can be more robust (more words)
• Intuition
  • Start with a seed set of words (‘good’, ‘poor’)
  • Find other words that have similar polarity:
    • Using “and” and “but”
    • Using words that occur nearby in the same document
    • Using WordNet synonyms and antonyms
• Use seeds and semi-supervised learning to induce lexicons

Finding sentiment of a sentence
• Important for finding aspects or attributes
  • Target of sentiment
• The food was great but the service was awful

Finding aspect/attribute/target of sentiment
• Frequent phrases + rules
  • Find all highly frequent phrases across reviews (“fish tacos”)
  • Filter by rules like “occurs right after sentiment word”
    • “…great fish tacos” means fish tacos is a likely aspect

Casino              casino, buffet, pool, resort, beds
Children’s Barber   haircut, job, experience, kids
Greek Restaurant    food, wine, service, appetizer, lamb
Department Store    selection, department, sales, shop, clothing

M. Hu and B. Liu. 2004. Mining and summarizing customer reviews. In Proceedings of KDD.
S. Blair-Goldensohn, K. Hannan, R. McDonald, T. Neylon, G. Reis, and J. Reynar. 2008. Building a Sentiment Summarizer for Local Service Reviews. WWW Workshop.
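
The "frequent phrases + rules" idea can be illustrated with a toy sketch. It assumes phrases are bigrams, that "highly frequent" means at least `min_count` occurrences, and that the only rule is "occurs right after a sentiment word" at least once; the reviews and sentiment word list are invented.

```python
from collections import Counter


def candidate_aspects(reviews, sentiment_words, min_count=2):
    """reviews: list of token lists. Returns frequent bigrams seen right after a sentiment word."""
    freq = Counter()
    after_sentiment = set()
    for tokens in reviews:
        for i in range(len(tokens) - 1):
            bigram = (tokens[i], tokens[i + 1])
            freq[bigram] += 1
            # Rule: the bigram is immediately preceded by a sentiment word.
            if i > 0 and tokens[i - 1] in sentiment_words:
                after_sentiment.add(bigram)
    return sorted(" ".join(bg) for bg, n in freq.items()
                  if n >= min_count and bg in after_sentiment)


if __name__ == "__main__":
    toy_reviews = [
        "great fish tacos here".split(),
        "the fish tacos were cold".split(),
        "awful service today".split(),
    ]
    print(candidate_aspects(toy_reviews, {"great", "awful"}))
```

In the toy data "service today" follows a sentiment word but occurs only once, so the frequency threshold removes it and only "fish tacos" survives.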

Finding aspect/attribute/target of sentiment
• The aspect name may not be in the sentence
• For restaurants/hotels, aspects are well-understood
• Supervised classification
  • Hand-label a small corpus of restaurant review sentences with aspect
    • food, décor, service, value, NONE
  • Train a classifier to assign an aspect to a sentence
    • “Given this sentence, is the aspect food, décor, service, value, or NONE”

Putting it all together: Finding sentiment for aspects
[Figure: pipeline. Reviews → Text Extractor → Sentences & Phrases → Sentiment Classifier → Sentences & Phrases → Aspect Extractor → Sentences & Phrases → Aggregator → Final Summary]
S. Blair-Goldensohn, K. Hannan, R. McDonald, T. Neylon, G. Reis, and J. Reynar. 2008. Building a Sentiment Summarizer for Local Service Reviews. WWW Workshop

Results of Blair-Goldensohn et al. method
Rooms (3/5 stars, 41 comments)
(+) The room was clean and everything worked fine – even the water pressure ...
(+) We went because of the free room and was pleasantly pleased ...
(-) … the worst hotel I had ever stayed at ...
Service (3/5 stars, 31 comments)
(+) Upon checking out another couple was checking early due to a problem ...
(+) Every single hotel staff member treated us great and answered every ...
(-) The food is cold and the service gives new meaning to SLOW.
Dining (3/5 stars, 18 comments)
(+) our favorite place to stay in biloxi. the food is great also the service ...
(+) Offer of free buffet for joining the Play

Baseline methods assume classes have equal frequencies!
• If not balanced (common in the real world)
  • can’t use accuracies as an evaluation
  • need to use F-scores
• Severe imbalancing also can degrade classifier performance
• Two common solutions:
  1. Resampling in training
     • Random undersampling
  2. Cost-sensitive learning
     • Penalize SVM more for misclassification of the rare thing
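
Two of the points above, random undersampling and F-scores, are easy to illustrate. The sketch below is a minimal version under simple assumptions: undersampling shrinks every class to the size of the rarest one, and the F-score is the balanced F1 for a single class of interest; function names are illustrative.

```python
import random


def undersample(examples, labels, seed=0):
    """Randomly drop examples so every class has as many items as the rarest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    n = min(len(xs) for xs in by_class.values())
    balanced = [(x, y) for y, xs in by_class.items() for x in rng.sample(xs, n)]
    rng.shuffle(balanced)
    return balanced


def f1_score(gold, predicted, positive):
    """Harmonic mean of precision and recall for the class named by `positive`."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == positive and p == positive)
    fp = sum(1 for g, p in pairs if g != positive and p == positive)
    fn = sum(1 for g, p in pairs if g == positive and p != positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    gold = ["neg"] * 9 + ["pos"]
    always_neg = ["neg"] * 10
    accuracy = sum(g == p for g, p in zip(gold, always_neg)) / len(gold)
    print(accuracy, f1_score(gold, always_neg, positive="pos"))
```

The usage example shows why accuracy misleads on skewed data: a classifier that always answers "neg" looks strong on accuracy yet has an F1 of zero on the rare class.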

How to deal with 7 stars?
1. Map to binary
2. Use linear or ordinal regression
   • Or specialized models like metric labeling
Bo Pang and Lillian Lee. 2005. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. ACL, 115-124

Summary on Sentiment
• Generally modeled as classification or regression task
  • predict a binary or ordinal label
• Features:
  • Negation is important
  • Using all words (in naïve bayes) works well for some tasks
  • Finding subsets of words may help in other tasks
    • Hand-built polarity lexicons
    • Use seeds and semi-supervised learning to induce lexicons

Scherer Typology of Affective States
• Emotion: brief organically synchronized … evaluation of a major event
  • angry, sad, joyful, fearful, ashamed, proud, elated
• Mood: diffuse non-caused low-intensity long-duration change in subjective feeling
  • cheerful, gloomy, irritable, listless, depressed, buoyant
• Interpersonal stances: affective stance toward another person in a specific interaction
  • friendly, flirtatious, distant, cold, warm, supportive, contemptuous
• Attitudes: enduring, affectively colored beliefs, dispositions towards objects or persons
  • liking, loving, hating, valuing, desiring
• Personality traits: stable personality dispositions and typical behavior tendencies
  • nervous, anxious, reckless, morose, hostile, jealous

Computational work on other affective states
• Emotion:
  • Detecting annoyed callers to dialogue system
  • Detecting confused/frustrated versus confident students
• Mood:
  • Finding traumatized or depressed writers
• Interpersonal stances:
  • Detection of flirtation or friendliness in conversations
• Personality traits:
  • Detection of extroverts

Detection of Friendliness
• Friendly speakers use collaborative conversational style
  • Laughter
  • Less use of negative emotional words
  • More sympathy
    • That’s too bad, I’m sorry to hear that
  • More agreement
    • I think so too
  • Fewer hedges
    • kind of, sort of, a little …
Ranganath, Jurafsky, McFarland