Identifying Sentiment and Situation Frames in Low Resource Languages
Julia Hirschberg and Zixiaofan (Brenda) Yang, Columbia University


• Thanks to Gideon Mendels, Sara Stolbach Rosenthal, Axinia Radeva


Challenge

• In disaster situations, international responders may not speak the language of the area in distress and may have little reliable access to local informants
– 7,100+ active languages in the world -- hard to predict which languages will be needed next
• 44 in the Boko Haram area (Hausa, Kanuri); ~522 languages in all of Nigeria
• 19 in Ebola outbreak areas in Liberia, Sierra Leone, and Guinea
• 20+ Mayan languages spoken by Central American refugee children
– Current methods require 3 years and $10M's per language (mostly to prepare training corpora)
• Would require $70B and 230K person-years to handle all languages

Humanitarian Assistance and Disaster Relief (HADR) in DARPA Lorelei

• How can we develop language technologies quickly to help first responders understand text and speech information vital to their mission (social media, hotline messages, news broadcasts)?
– Triage information by urgency and sentiment/emotion (anger, stress, fear, happiness)
– Display information in a form that relief workers can easily understand

Situation Frames

• Civil unrest or wide-spread crime
• Elections and politics
• Evacuation
• Food supply
• Urgent rescue
• Utilities, energy, or sanitation
• Infrastructure
• Intervention
• Medical assistance
• Missing, dead, or injured
• Shelter
• Terrorism or other extreme violence
• Water supply

Our Goal

• Identify sentiment and emotion in text and speech to share with relief workers
– Provide additional, extra-propositional meaning
• Fear and stress of victims
• Happiness at success of relief efforts
• Anger at relief workers
– Approach: Develop ways to recognize and interpret sentiment and emotion in LRLs by training on High Resource Languages and other LRLs

Three Methods

• Can we recognize emotions relevant to Lorelei from labeled speech (e.g. anger, stress, fear)?
• Can systems trained on emotion/sentiment in speech of one language be used to recognize emotion/sentiment in another?
• Can text-trained sentiment systems be used to label unlabeled speech transcripts to train sentiment recognition in speech?
• Can we augment available text and speech data by web scraping?

Labeled Mandarin: Anger vs. Neutral

• Corpus: Mandarin Affective Speech
• Language: Mandarin
– Neutral sentences (e.g. "It will rain tonight.") and words (e.g. "train," "apple")
– 5 basic emotions (neutral, anger, elation, panic, sadness) simulated by 68 students
• Our study: Anger: 5100 vs. Neutral: 5100

Feature Extraction Using openSMILE

• Baseline features (384) (extraction sketched below)
– 'Standard' simple low-level acoustic features (e.g., MFCCs; max, min, and mean frame energy)
– 'Unique' features (e.g. slope and offset of a linear approximation of MFCC 1-12)
• Larger feature set (6552)
– More Functionals and Low-Level Descriptors
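A minimal extraction sketch, assuming the audEERING `opensmile` Python wrapper (pip install opensmile). The slide's 384- and 6552-dimensional sets come from specific openSMILE configs that the wrapper does not bundle, so the built-in ComParE_2016 functionals stand in here purely for illustration:

```python
# Hedged sketch: utterance-level openSMILE features via the `opensmile`
# pip package; ComParE_2016 is a stand-in for the slide's feature sets.
import opensmile

smile = opensmile.Smile(
    feature_set=opensmile.FeatureSet.ComParE_2016,
    feature_level=opensmile.FeatureLevel.Functionals,
)
features = smile.process_file('utterance.wav')  # one row of functionals per file
```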

Machine Learning Results

• Random forest (Scikit-learn; sketched below)
– Train decision tree classifiers on various sub-samples of the training set using the 384 feature set
– Uses averaging to improve predictive accuracy and control over-fitting
• Weighted F-measure: 0.88 (0.50 baseline); P = .88; R = .88
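A minimal sketch of this setup, assuming X is the (clips × 384) feature matrix from the extraction step and y the anger/neutral labels; the hyperparameters are illustrative, not the ones actually used:

```python
# Hedged sketch: random forest with 10-fold CV, scored by weighted F-measure.
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict
from sklearn.metrics import f1_score

clf = RandomForestClassifier(n_estimators=100, random_state=0)
pred = cross_val_predict(clf, X, y, cv=10)    # X, y assumed from the openSMILE step
print(f1_score(y, pred, average='weighted'))  # slide reports 0.88
```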

Useful Features

• Arithmetic mean and max value of MFCC[1] (mel frequency cepstral coefficients)
• The offset of the linear approximation of root-mean-square frame energy
• Arithmetic mean and max value of MFCC[2]
• Range, max value, quadratic error, and standard deviation of the 1st-order delta coefficient of MFCC[1]
• Offset of the linear approximation of MFCC[1]
• Arithmetic mean of root-mean-square frame energy
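The slides do not say how this ranking was produced; one common route with a scikit-learn forest is impurity-based feature importance, sketched below (X, y, and feature_names, the 384 openSMILE feature names, are assumed):

```python
# Hedged sketch: rank openSMILE features by the forest's impurity-based
# importances; feature_names is an assumed list of the 384 feature names.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
for i in np.argsort(clf.feature_importances_)[::-1][:10]:
    print(feature_names[i], round(clf.feature_importances_[i], 4))
```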

Labeled English: Stress vs. Neutral

• Corpus: SUSAS (Speech Under Simulated and Actual Stress)
– Neutral words (e.g. "break" or "eight") simulated by 9 speakers
– Stress produced doing single tracking tasks
– Stress: 630; Neutral: 631
• Classification result with random forest model: Weighted F-measure: 0.7031 (.50 baseline); P = .70; R = .70

• Queen’sUBelfast,http://semaine-db.eu• NaturalinteractionsinEnglishbetweenusersandan’operator’simulatingaSensitiveArtificialListener(SAL)agent

• SAL agent examples:– ‘Dotellmeallthedeliciousdetails.’– ‘Ohh....thatwouldbelovely.’– ‘Whatareyourweaknesses?’– ‘It'sallrubbish.’

Multi-labeledEnglishSemaine Corpus

• Annotationsby6-8raters for each conversation– Fullratingforvalence,activation,power,expectation/anticipation,intensity

– Optional rating for basicemotions:anger,happiness,sadness,fear,contempt…

• SolidSALpart: 87conversations,eachlastingapproximately5minutes

Cross-Lingual Training

• Given a corpus of anger in English, can we predict anger in Mandarin? And vice versa?
• Train on English Semaine, test on Mandarin Affect Corpus: F1 = 0.56 (cf. Mand/Mand 0.88)
• Train on Mandarin Affect, test on English Semaine: F1 = 0.62 (cf. Eng/Eng: F1 = .77)
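The mechanics of the transfer are simple because both corpora are represented in the same openSMILE feature space; a minimal sketch, with X_en/y_en (Semaine) and X_zh/y_zh (Mandarin Affective Speech) as assumed feature matrices and binary labels:

```python
# Hedged sketch of cross-corpus transfer: fit on one language, score the other.
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_en, y_en)
print(f1_score(y_zh, clf.predict(X_zh), average='weighted'))  # Eng -> Mand
```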

Comparing Human Sentiment Labels to Automatic Labels

• Question:
– Suppose we have unlabeled speech: can we annotate transcripts automatically with a sentiment annotation system and use those labels for the unlabeled speech instead of manual labels?
• Test on Semaine Corpus:
– Segment transcripts into sentences and align with speech
– Translate Semaine manual, continuous pos/neg labels into binary for use as gold standard
– Label training transcript sentences using a text-trained sentiment analyzer (positive/negative/neutral)
– Build a classifier from sentiment-labeled speech and compare to a classifier built using manual Semaine speech labels

Manually Labeled Examples

• Audio examples with manual valence scores: -0.8819, 0.1125, 0.5803, 0.8308

English Text-based Sentiment Analysis

• Sentiment detection system (Rosenthal 2014)
• Features (lexical, syntactic):
– Dictionary of Affect and Language (DAL)
– WordNet 3.0
– Wiktionary
– POS tags
– Top 500 n-gram features
• Output label: positive/negative/neutral
• Examples (* marks cases where the automatic label disagrees with the manual valence score):
– Anyway he would probably do all the wrong shopping.
• Sentiment analysis output label: Negative
• Valence score: -0.4420
– There must be lot's of happy things in your life.
• Sentiment analysis output label: Positive
• Valence score: 0.7451
– *And how am I going to wrap all the presents?
• Sentiment analysis output label: Neutral
• Valence score: -0.4090
– *Life is very bad, I don't suppose yours is any better.
• Sentiment analysis output label: Positive
• Valence score: -0.7500

Comparison of Sentiment Labels vs. Valence Scores

• Sentiment: Positive: 1301, Negative: 978, Neutral: 1177
• Distribution of automatic sentiment labels over manual valence scores: [figure]

Results of Sentiment Analysis of Transcripts

• Manually annotated valence scores are unbalanced:
– 2363 sentences with positive score (score >= 0)
– 1093 sentences with negative score (score < 0)
• Set 'neutral' threshold to 0.118
– 1728 sentences with positive/negative score
• Precision of sentiment labels using new threshold:
– Positive label precision: 57.88%
– Negative label precision: 60.22%
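One consistent reading of the 0.118 threshold, given the "moving threshold score for a balanced division" idea on the Method slide, is that it shifts the positive/negative boundary so the 3456 scored sentences split into 1728 per class; a minimal sketch under that assumption, with `valence` holding the manual scores:

```python
# Assumed reading (not stated on the slide): scores >= 0.118 count as
# positive, the rest negative, giving a balanced 1728/1728 split.
labels = ['positive' if v >= 0.118 else 'negative' for v in valence]
```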

Sentiment Labels vs. Valence Scores as Speech Labels

• openSMILE baseline (384) feature set
• 4 speech experiments:
– Train on sentiment labels; test on sentiment labels
– *Train on sentiment labels; test on (human) valence scores
– Train on (human) valence scores; test on sentiment labels
– *Train on (human) valence scores; test on (human) valence scores
• 10-fold cross validation; weighted f-measure

Method

• Unbalanced classes in training data:
– Moving threshold score for a balanced division
– Upsampling (see the sketch below)
– Downsampling
• Machine learning algorithms (Scikit-learn):
– Linear models: Linear regression; Ridge; Lasso
– Nearest neighbors model: KNN
– Tree model: Decision tree
– Ensemble models: Random forest; AdaBoost
• Unbalanced classes in test data:
– Evaluation: Weighted F-measure
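A minimal sketch of the upsampling option with scikit-learn's resample, assuming X_min/y_min and X_maj/y_maj are the minority and majority training splits:

```python
# Hedged sketch: upsample the minority class to the majority-class size.
import numpy as np
from sklearn.utils import resample

X_up, y_up = resample(X_min, y_min, replace=True,
                      n_samples=len(X_maj), random_state=0)
X_bal = np.vstack([X_maj, X_up])
y_bal = np.concatenate([y_maj, y_up])
```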

Experiments: Sentiment Labels vs. Valence Scores

• Baseline: Majority class (positive)
• Should improve when we add lexical features to acoustic ones

Weighted F-measure results:

Train on         Sentiment Labels             Semaine Valence Scores
Test on          Sentiment    Valence         Sentiment    Valence
Baseline         0.4140       0.5526          0.4140       0.5526
Random Forest    0.5425       0.6111          0.4979       0.6897

So…

• We can detect emotions like anger and stress from labeled Mandarin and English speech reasonably well
• We can detect emotions (e.g. anger) by training on one language and testing on another with performance above the baselines
• We can detect manually labeled English emotional speech from transcripts automatically labeled with sentiment, also with promising results
• Can we apply these techniques to Low Resource Languages?

IARPA Babel Turkish Corpus

• Collected by Appen for developing speech recognition technologies and keyword search in LRLs
• Contains approximately 213 hours of 993 natural conversations in telephone calls
• All conversations were fully transcribed with time-alignment at the turn level

Turkish Sentiment Lexicons

• EmoLex in Turkish
– Translated from English NRC EmoLex by Google
• SentiTurkNet
– Built using extensive human annotations
• Our Turkish sentiment lexicon
– Created automatically
– Merge English SentiWordNet with a bilingual English/Turkish dictionary (see the sketch below)
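A minimal sketch of the automatic construction, projecting SentiWordNet scores through a bilingual dictionary; `en2tr` (an English-to-Turkish translation dict) and the averaging rule are assumptions, since the slides name only the two resources being merged:

```python
# Hedged sketch: build a Turkish lexicon by pushing SentiWordNet scores
# through an assumed English->Turkish dictionary `en2tr`.
from collections import defaultdict
from nltk.corpus import sentiwordnet as swn   # needs the nltk sentiwordnet data

scores = defaultdict(list)
for en_word, tr_words in en2tr.items():
    synsets = list(swn.senti_synsets(en_word))
    if not synsets:
        continue
    # average positive-minus-negative score over the English word's synsets
    s = sum(ss.pos_score() - ss.neg_score() for ss in synsets) / len(synsets)
    for tr in tr_words:
        scores[tr].append(s)

turkish_lexicon = {w: sum(v) / len(v) for w, v in scores.items()}
```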

Comparison of Three Lexicons

Lexicon                         Sentiment words   Turns labeled in Babel    Machine learning   Accuracy
                                in lexicon        (positive / negative)     model              (baseline = 0.5)
Our Turkish sentiment lexicon   26262             29743 / 14642             Ridge regression   0.5724
SentiTurkNet                    15592             25901 / 13450             Ridge regression   0.5659
EmoLex                          3897              12442 / 6792              Random Forest      0.5702

• Sentiment labels provided by our automatically generated Turkish sentiment lexicon are comparable to the manually created SentiTurkNet lexicon: but no Gold Data yet…

Can We Detect Situation Frames in LRLs?

• Available LRL training packs
– Chinese, Turkish, Uzbek, Russian, Hausa, Amharic
• Incident language: Uyghur
• Task: Binary classification
– Whether a segment contains a situation frame
• Method:
– Extract speech features for all training languages
– Assemble the models (one possible scheme is sketched below)
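A minimal sketch of one way to "assemble the models": train one classifier per training-pack language and average their probabilities, with optional per-language weights. The soft-vote scheme and the train_packs structure are illustrative assumptions:

```python
# Hedged sketch: per-language forests combined by a weighted soft vote.
from sklearn.ensemble import RandomForestClassifier

weights = {'TUR': 2.0, 'UZB': 2.0}                  # upweight Turkic data (later slide)
models = {lang: RandomForestClassifier(max_depth=4, random_state=0).fit(X, y)
          for lang, (X, y) in train_packs.items()}  # train_packs is assumed

def frame_proba(X_test):
    """Weighted mean of per-language situation-frame probabilities."""
    total = sum(weights.get(l, 1.0) for l in models)
    return sum(weights.get(l, 1.0) * m.predict_proba(X_test)[:, 1]
               for l, m in models.items()) / total
```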

Cross-Language Experiments

train on \ test on    CHN     TUR     UZB     RUS     HAU     AMH
CHN                   0.68*   0.69    0.63    0.60    0.55    0.67
TUR                   0.57    0.82*   0.71    0.53    0.53    0.61
UZB                   0.53    0.73    0.77*   0.57    0.47    0.57
RUS                   0.52    0.69    0.73    0.75*   0.58    0.63
HAU                   0.56    0.65    0.58    0.64    0.78*   0.71
AMH                   0.56    0.64    0.65    0.58    0.58    0.93*

(* = train and test on the same language)

• Higher accuracy for language pairs within the same language family, e.g. Uzbek/Turkish (Turkic), Amharic/Hausa (Afroasiatic)

Testing on Uyghur (Turkic)

• Give higher weights to Turkish and Uzbek training data (see the sketch below)
• Obtain Uyghur labeled data from our Native Informant
• Best model: Random Forest with max_depth=4
– Prevents over-fitting to the training languages
– AUC = 0.657
– Highest maximum F1 among all participants
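An alternative to the per-language ensemble above is pooling all training languages into one model and upweighting the Turkic data; a hedged sketch, with X_pool/y_pool/langs (pooled features, labels, and language tags) and the Uyghur test data assumed:

```python
# Hedged sketch: pooled training with Turkic sample weights and a shallow
# forest (max_depth=4, as on the slide) to limit over-fitting.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

w = np.where(np.isin(langs, ['TUR', 'UZB']), 2.0, 1.0)
clf = RandomForestClassifier(max_depth=4, random_state=0)
clf.fit(X_pool, y_pool, sample_weight=w)
print(roc_auc_score(y_uyghur, clf.predict_proba(X_uyghur)[:, 1]))  # slide: AUC = 0.657
```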

Results for Binary Classification


Later Data Augmentation

• Lack of labeled speech data in LRLs
– Unbalanced training data in most languages
– No training data for the IL (Uyghur)
• Web scraping videos of news from YouTube (see the sketch below)
– Searching SF-related Uyghur keywords
• 'earthquake', 'flood', 'terrorism', etc.
– Hypothesis: News with incident words in captions contains situation frames
• Using the 1.5GB of web-scraped data, the AUC increased from 0.652 to 0.680
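A minimal sketch of the scraping step; the slides name YouTube and keyword search but not the tool, so yt-dlp's search support is an assumption, and UYGHUR_KEYWORDS is a placeholder for the actual query list:

```python
# Hedged sketch: download audio + captions for top search hits per keyword.
from yt_dlp import YoutubeDL

opts = {'format': 'bestaudio/best', 'writesubtitles': True,
        'outtmpl': 'scraped/%(id)s.%(ext)s'}
with YoutubeDL(opts) as ydl:
    for kw in UYGHUR_KEYWORDS:               # e.g. Uyghur words for 'earthquake', 'flood'
        ydl.download([f'ytsearch20:{kw}'])   # top-20 YouTube search results
```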

Conclusions

• Emotion Detection
– We can detect emotions like anger and stress from labeled Mandarin and English speech reasonably well
– We can detect emotions (e.g. anger) by training on one language and testing on another reasonably well
• Sentiment Labeling
– We can detect manually labeled English emotional speech from transcripts automatically labeled with sentiment
– We can create automatic sentiment detectors in LRLs (Turkish) that perform comparably to manually created ones
• We can detect Situation Frames using only speech features in LRLs by cross-lingual training plus web-collected data

Current/Future Research

• Validate the automatically generated sentiment labels in the Turkish Babel Corpus with gold data
• Work on other Babel LRLs
• Automate methods to collect (speech) web data for LRLs (image id?)
• Collect emotional data from social media to train text-based anger and fear detectors

Thank you!

[email protected]
