Identifying Sentiment and Situation Frames in Low Resource Languages
Julia Hirschberg and Zixiaofan (Brenda) Yang, Columbia University
• In disaster situations, international responders may not speak the language of the area in distress and may have little reliable access to local informants
  – 7100+ active languages in the world -- hard to predict which languages will be needed next
    • 44 in Boko Haram area (Hausa, Kanuri); ~522 languages in all of Nigeria
    • 19 in Ebola outbreak areas in Liberia, Sierra Leone, and Guinea
    • 20+ Mayan languages spoken by Central American refugee children
  – Current methods require 3 years and $10M's per language (mostly to prepare training corpora)
    • Would require $70B and 230K person-years to handle all languages
Humanitarian Assistance and Disaster Relief (HADR) in DARPA LORELEI
• How can we develop language technologies quickly to help first responders understand text and speech information vital to their mission (social media, hotline msgs, news broadcasts)?
  – Triage information by urgency and sentiment/emotion (anger, stress, fear, happiness)
  – Display information in a form that relief workers can easily understand
Challenge
Situation Frames
• Civil unrest or wide-spread crime
• Elections and politics
• Evacuation
• Food supply
• Urgent rescue
• Utilities, energy, or sanitation
• Infrastructure
• Intervention
• Medical assistance
• Missing, dead or injured
• Shelter
• Terrorism or other extreme violence
• Water supply
Our Goal
• Identify sentiment and emotion in text and speech to share with relief workers
  – Provide additional, extra-propositional meaning
    • Fear and stress of victims
    • Happiness at success of relief efforts
    • Anger at relief workers
  – Approach: Develop ways to recognize and interpret sentiment and emotion in LRLs by training on High Resource Languages and other LRLs
Three Methods
• Can we recognize emotions relevant to LORELEI from labeled speech (e.g. anger, stress, fear)?
• Can systems trained on emotion/sentiment in speech of one language be used to recognize emotion/sentiment in another?
• Can text-trained sentiment systems be used to label unlabeled speech transcripts to train sentiment recognition in speech?
• Can we augment available text and speech data by web scraping?
Labeled Mandarin: Anger vs. Neutral
• Corpus: Mandarin Affective Speech
• Language: Mandarin
  – Neutral sentences (e.g. "It will rain tonight.") and words (e.g. "train," "apple")
  – 5 basic emotions (neutral, anger, elation, panic, sadness) simulated by 68 students
• Our study: Anger: 5100 vs. Neutral: 5100
Feature Extraction Using openSMILE
• Baseline feature set (384)
  – 'Standard' simple low-level acoustic features (e.g., MFCCs; max, min, and mean frame energy)
  – 'Unique' features (e.g. slope and offset of a linear approximation of MFCC 1-12)
• Larger feature set (6552)
  – More Functionals and Low-Level Descriptors
• Random forest (Scikit-learn)
  – Train decision tree classifiers on various sub-samples of the training set using the 384-feature set
  – Use averaging to improve predictive accuracy and control over-fitting
• Weighted F-measure: 0.88 (0.50 baseline); P = .88; R = .88
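This classifier setup can be sketched with scikit-learn. The random arrays below are placeholders for real openSMILE feature vectors and anger/neutral labels, so the resulting score is not meaningful:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Placeholder for 384-dim openSMILE baseline feature vectors
# (one row per utterance); labels: 1 = anger, 0 = neutral.
X = rng.normal(size=(200, 384))
y = rng.integers(0, 2, size=200)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

# A random forest averages many decision trees trained on bootstrap
# sub-samples, improving accuracy and controlling over-fitting.
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)

# Weighted F-measure, the metric reported on the slide.
print(f1_score(y_te, pred, average="weighted"))
```

With the real Mandarin anger/neutral data this setup is reported to reach a weighted F-measure of 0.88 against the 0.50 baseline.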
Machine Learning Results
Useful Features
• Arithmetic mean and max value of MFCC[1] (mel-frequency cepstral coefficients)
• Offset of a linear approximation of root-mean-square frame energy
• Arithmetic mean and max value of MFCC[2]
• Range, max value, quadratic error, and standard deviation of the 1st-order delta coefficient of MFCC[1]
• Offset of a linear approximation of MFCC[1]
• Arithmetic mean of root-mean-square frame energy
Labeled English: Stress vs. Neutral
• Corpus: SUSAS (Speech Under Simulated and Actual Stress)
  – Neutral words (e.g. "break" or "eight") simulated by 9 speakers
  – Stress produced while performing single tracking tasks
  – Stress: 630; Neutral: 631
• Classification result with random forest model: Weighted F-measure: 0.7031 (.50 baseline); P = .70; R = .70
• Queen's University Belfast, http://semaine-db.eu
• Natural interactions in English between users and an 'operator' simulating a Sensitive Artificial Listener (SAL) agent
• SAL agent examples:
  – 'Do tell me all the delicious details.'
  – 'Ohh.... that would be lovely.'
  – 'What are your weaknesses?'
  – 'It's all rubbish.'
Multi-labeled English Semaine Corpus
• Annotations by 6-8 raters for each conversation
  – Full rating for valence, activation, power, expectation/anticipation, intensity
  – Optional rating for basic emotions: anger, happiness, sadness, fear, contempt…
• Solid SAL part: 87 conversations, each lasting approximately 5 minutes
• Given a corpus of anger in English, can we predict anger in Mandarin? And vice versa?
• Train on English Semaine, test on Mandarin Affect Corpus: F1 = 0.56 (cf. Mand/Mand: 0.88)
• Train on Mandarin Affect, test on English Semaine: F1 = 0.62 (cf. Eng/Eng: F1 = .77)
Cross-Lingual Training
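A minimal sketch of the cross-lingual experiment: fit on one corpus's features, score on the other's. The arrays stand in for the real English (Semaine) and Mandarin (Affective Speech) openSMILE features:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score

rng = np.random.default_rng(1)

# Placeholder feature matrices and anger/neutral labels for two corpora.
X_en, y_en = rng.normal(size=(300, 384)), rng.integers(0, 2, size=300)
X_zh, y_zh = rng.normal(size=(300, 384)), rng.integers(0, 2, size=300)

def cross_lingual_f1(X_train, y_train, X_test, y_test):
    """Train an anger/neutral classifier on one language, test on another."""
    clf = RandomForestClassifier(n_estimators=100, random_state=0)
    clf.fit(X_train, y_train)
    return f1_score(y_test, clf.predict(X_test), average="weighted")

# Both directions, as on the slide (En->Zh and Zh->En).
print(cross_lingual_f1(X_en, y_en, X_zh, y_zh))
print(cross_lingual_f1(X_zh, y_zh, X_en, y_en))
```

On real data the slide reports F1 = 0.56 for English-to-Mandarin and 0.62 for Mandarin-to-English.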
Comparing Human Sentiment Labels to Automatic Labels
• Question:
  – Suppose we have unlabeled speech: can we annotate its transcripts automatically with a sentiment annotation system and use those labels for the unlabeled speech instead of manual labels?
• Test on the Semaine Corpus:
  – Segment transcripts into sentences and align with speech
  – Translate Semaine's manual, continuous pos/neg labels into binary labels for use as a gold standard
  – Label training transcript sentences positive/negative/neutral using a text-trained sentiment analyzer
  – Build a classifier from the sentiment-labeled speech and compare it to a classifier built using manual Semaine speech labels
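A sketch of this weak-labeling pipeline, with a stand-in `text_sentiment` function in place of the real text-trained analyzer and random placeholder features:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(2)

# One 384-dim openSMILE vector per transcript sentence (placeholder data).
X_speech = rng.normal(size=(100, 384))
sentences = ["..."] * 100  # aligned transcript sentences

def text_sentiment(sentence):
    """Stand-in for a text-trained sentiment analyzer that would
    return 'positive', 'negative', or 'neutral' for a sentence."""
    return rng.choice(["positive", "negative", "neutral"])

# Step 1: label transcript sentences automatically (no manual labels needed).
auto_labels = np.array([text_sentiment(s) for s in sentences])

# Step 2: train a speech classifier on the automatically derived labels;
# on Semaine this classifier is then compared against one trained on
# the manual valence annotations.
clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_speech, auto_labels)
```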
Manually Labeled Examples
• Valence score: -0.8819
• Valence score: 0.1125
• Valence score: 0.5803
• Valence score: 0.8308
English Text-based Sentiment Analysis
• Sentiment detection system (Rosenthal 2014)
• Features (lexical, syntactic):
  – Dictionary of Affect and Language (DAL)
  – WordNet 3.0
  – Wiktionary
  – POS tags
  – Top 500 n-gram features
• Output label: positive/negative/neutral
• Examples:
  – Anyway he would probably do all the wrong shopping.
    • Sentiment analysis output label: Negative
    • Valence score: -0.4420
  – There must be lots of happy things in your life.
    • Sentiment analysis output label: Positive
    • Valence score: 0.7451
  – *And how am I going to wrap all the presents?
    • Sentiment analysis output label: Neutral
    • Valence score: -0.4090
  – *Life is very bad, I don't suppose yours is any better.
    • Sentiment analysis output label: Positive
    • Valence score: -0.7500
Comparison of Sentiment Labels vs. Valence Scores
• Sentiment: Positive: 1301, Negative: 978, Neutral: 1177
• Distribution of automatic sentiment labels over manual valence scores:
Results of Sentiment Analysis of Transcripts
• Manually annotated valence scores are unbalanced:
  – 2363 sentences with positive score (score >= 0)
  – 1093 sentences with negative score (score < 0)
• Set 'neutral' threshold to 0.118
  – 1728 sentences with positive/negative score
• Precision of sentiment labels using the new threshold:
  – Positive label precision: 57.88%
  – Negative label precision: 60.22%
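One possible reading of the thresholding step (2363 + 1093 = 3456 scored sentences; a boundary at 0.118 would split them into two halves of 1728 each, matching the "moving threshold for a balanced division" idea on a later slide) can be sketched as:

```python
def valence_to_label(score, threshold=0.118):
    """Binarize a continuous valence score; shifting the boundary
    from 0 up to 0.118 balances the positive/negative classes."""
    return "positive" if score >= threshold else "negative"

def label_precision(auto_labels, gold_labels, target):
    """Precision of the text analyzer's labels against thresholded gold:
    of the sentences auto-labeled `target`, what fraction agree?"""
    selected = [g for a, g in zip(auto_labels, gold_labels) if a == target]
    return sum(1 for g in selected if g == target) / len(selected) if selected else 0.0

# e.g. label_precision(text_labels, [valence_to_label(s) for s in scores], "positive")
```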
Sentiment Labels vs. Valence Scores as Speech Labels
• openSMILE baseline (384) feature set
• 4 speech experiments:
  – Train on sentiment labels; test on sentiment labels
  – *Train on sentiment labels; test on (human) valence scores
  – Train on (human) valence scores; test on sentiment labels
  – *Train on (human) valence scores; test on (human) valence scores
• 10-fold cross validation; weighted F-measure
Method
• Unbalanced classes in training data:
  – Moving the threshold score for a balanced division
  – Upsampling
  – Downsampling
• Machine learning algorithms (Scikit-learn):
  – Linear models: Linear regression; Ridge; Lasso
  – Nearest neighbors model: KNN
  – Tree model: Decision tree
  – Ensemble models: Random forest; AdaBoost
• Unbalanced classes in test data:
  – Evaluation: Weighted F-measure
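The up/downsampling steps can be sketched with scikit-learn's `resample`, using the class counts from the earlier slide and random placeholder features:

```python
import numpy as np
from sklearn.utils import resample

rng = np.random.default_rng(3)

# Unbalanced placeholder data: 2363 positive vs. 1093 negative sentences.
X_pos = rng.normal(size=(2363, 384))
X_neg = rng.normal(size=(1093, 384))

# Upsampling: resample the minority class WITH replacement up to the majority size.
X_neg_up = resample(X_neg, replace=True, n_samples=len(X_pos), random_state=0)

# Downsampling: resample the majority class WITHOUT replacement down to the minority size.
X_pos_down = resample(X_pos, replace=False, n_samples=len(X_neg), random_state=0)

X_balanced_up = np.vstack([X_pos, X_neg_up])      # 2363 + 2363 rows
X_balanced_down = np.vstack([X_pos_down, X_neg])  # 1093 + 1093 rows
```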
Experiments: Sentiment Labels vs. Valence Scores
• Baseline: Majority class (positive)
• Should improve when we add lexical features to acoustic ones
Train on:       Sentiment Labels        Semaine Valence Scores
Test on:        Sentiment   Valence     Sentiment   Valence
                Labels      Scores      Labels      Scores
Baseline        0.4140      0.5526      0.4140      0.5526
Random Forest   0.5425      0.6111      0.4979      0.6897
• We can detect emotions like anger and stress from labeled Mandarin and English speech reasonably well
• We can detect emotions (e.g. anger) by training on one language and testing on another with performance above the baselines
• We can detect manually labeled emotions in English speech from transcripts automatically labeled with sentiment, also with promising results
• Can we apply these techniques to Low Resource Languages?
So…
• Collected by Appen for developing speech recognition technologies and keyword search in LRLs
• Contains approximately 213 hours of 993 natural telephone conversations
• All conversations were fully transcribed with time alignment at the turn level
IARPA Babel Turkish Corpus
• EmoLex in Turkish
  – Translated from the English NRC EmoLex by Google
• SentiTurkNet
  – Built using extensive human annotations
• Our Turkish sentiment lexicon
  – Created automatically
  – Merges the English SentiWordNet with a bilingual English/Turkish dictionary
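A toy sketch of the automatic lexicon construction: project SentiWordNet-style English scores onto Turkish translations through a bilingual dictionary. The entries and the (pos, neg) score format are simplified illustrations (real SentiWordNet scores synsets, not surface words):

```python
# Toy English sentiment scores, SentiWordNet-style (positivity, negativity).
sentiwordnet = {"good": (0.75, 0.0), "bad": (0.0, 0.65), "flood": (0.0, 0.4)}

# Toy bilingual English -> Turkish dictionary entries.
en_tr_dict = {"good": ["iyi", "güzel"], "bad": ["kötü"], "flood": ["sel"]}

def build_turkish_lexicon(sentiwordnet, en_tr_dict):
    """Project English sentiment scores onto Turkish translations;
    average when several English words map to one Turkish word."""
    scores = {}
    for en_word, (pos, neg) in sentiwordnet.items():
        for tr_word in en_tr_dict.get(en_word, []):
            scores.setdefault(tr_word, []).append(pos - neg)
    return {w: sum(v) / len(v) for w, v in scores.items()}

turkish_lexicon = build_turkish_lexicon(sentiwordnet, en_tr_dict)
```

Averaging over multiple translations is one simple way to resolve many-to-one mappings; the actual merge procedure may differ.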
Turkish Sentiment Lexicons
Lexicon                         # sentiment words   # turns labeled in Babel corpus   ML model           Accuracy (baseline = 0.5)
                                in the lexicon      positive      negative
Our Turkish sentiment lexicon   26262               29743         14642               Ridge regression   0.5724
SentiTurkNet                    15592               25901         13450               Ridge regression   0.5659
EmoLex                          3897                12442         6792                Random Forest      0.5702
Comparison of Three Lexicons
• Sentiment labels provided by our automatically generated Turkish sentiment lexicon are comparable to those from the manually created SentiTurkNet lexicon: but no gold data yet…
• Available LRL training packs
  – Chinese, Turkish, Uzbek, Russian, Hausa, Amharic
• Incident language: Uyghur
• Task: Binary classification
  – Whether a segment contains a situation frame
• Method:
  – Extract speech features for all training languages
  – Assemble the models
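One simple way to combine per-language models, sketched with scikit-learn. Averaging predicted probabilities is an illustrative choice, not necessarily the combination scheme the actual system used; all data here is random placeholder:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(4)

LANGS = ["CHN", "TUR", "UZB", "RUS", "HAU", "AMH"]

# One placeholder labeled set per training language:
# speech features + binary "contains a situation frame" labels.
data = {l: (rng.normal(size=(150, 384)), rng.integers(0, 2, size=150)) for l in LANGS}

# Train one classifier per language, then average their predicted
# probabilities on the incident-language (Uyghur) test segments.
models = {l: RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)
          for l, (X, y) in data.items()}

X_uyghur = rng.normal(size=(40, 384))
probs = np.mean([m.predict_proba(X_uyghur)[:, 1] for m in models.values()], axis=0)
pred = (probs >= 0.5).astype(int)
```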
Can We Detect Situation Frames in LRLs?
Train on \ Test on   CHN     TUR     UZB     RUS     HAU     AMH
CHN                  0.68*   0.69    0.63    0.60    0.55    0.67
TUR                  0.57    0.82*   0.71    0.53    0.53    0.61
UZB                  0.53    0.73    0.77*   0.57    0.47    0.57
RUS                  0.52    0.69    0.73    0.75*   0.58    0.63
HAU                  0.56    0.65    0.58    0.64    0.78*   0.71
AMH                  0.56    0.64    0.65    0.58    0.58    0.93*
(* = train and test on the same language)
Cross-Language Experiments
• Higher accuracy for language pairs within the same language family, e.g. Uzbek/Turkish (Turkic), Amharic/Hausa (Afroasiatic)
• Give higher weights to Turkish and Uzbek training data
• Obtain Uyghur labeled data from our Native Informant
• Best model: Random Forest with max_depth=4
  – Prevents over-fitting to the training languages
  – AUC = 0.657
  – Highest maximum F1 among all participants
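A sketch of the depth-limited random forest with per-language sample weights. The 2.0 weight for the Turkic languages is illustrative (the slide does not give the actual weights), and the data is random placeholder:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(5)

# Pooled placeholder training data from several languages.
X_parts, y_parts, langs = [], [], []
for lang, n in [("TUR", 100), ("UZB", 100), ("CHN", 100), ("RUS", 100)]:
    X_parts.append(rng.normal(size=(n, 384)))
    y_parts.append(rng.integers(0, 2, size=n))
    langs += [lang] * n
X, y = np.vstack(X_parts), np.concatenate(y_parts)

# Higher weight for the languages closest to Uyghur (Turkish, Uzbek).
weights = np.array([2.0 if l in ("TUR", "UZB") else 1.0 for l in langs])

# Shallow trees (max_depth=4) keep the forest from over-fitting
# to the training languages.
clf = RandomForestClassifier(n_estimators=100, max_depth=4, random_state=0)
clf.fit(X, y, sample_weight=weights)
```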
Testing on Uyghur (Turkic)
• Lack of labeled speech data in LRLs
  – Unbalanced training data in most languages
  – No training data for the IL (Uyghur)
• Web scraping news videos from YouTube
  – Searching SF-related Uyghur keywords
    • 'earthquake', 'flood', 'terrorism', etc.
  – Hypothesis: News with incident words in their captions contain situation frames
• Using the 1.5GB of web-scraped data, the AUC increased from 0.652 to 0.680
Later Data Augmentation
• Emotion Detection
  – We can detect emotions like anger and stress from labeled Mandarin and English speech reasonably well
  – We can detect emotions (e.g. anger) by training on one language and testing on another reasonably well
• Sentiment Labeling:
  – We can detect manually labeled emotions in English speech from transcripts automatically labeled with sentiment
  – We can create automatic sentiment detectors in LRLs (Turkish) that perform comparably to manually created ones
• We can detect Situation Frames using only speech features in LRLs via cross-lingual training plus web-collected data
Conclusions
• Validate the automatically generated sentiment labels in the Turkish Babel Corpus with gold data
• Work on other Babel LRLs
• Automate methods to collect (speech) web data for LRLs (image id?)
• Collect emotional data from social media to train text-based anger and fear detectors
Current/Future Research