1. Jump features (word transitions): word-level forward and backward gaze jumps
2. Total jump distance: total gaze distance covered while evaluating
3. Inter-region jumps: gaze jumps between the translation and the reference
4. Dwell time: the time the eyes spend on a region
5. Lexicalized features:
• extract streams of lexical sequences
• score using a trigram language model
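The first four feature families can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the `Fixation` record, the region labels "translation" and "reference", and the feature names are all hypothetical, and the lexicalized features (which need a trained trigram language model) are left out.

```python
from collections import Counter
from dataclasses import dataclass
from math import hypot


@dataclass
class Fixation:
    """One gaze fixation (hypothetical record layout, assumed for this sketch)."""
    region: str         # "translation" or "reference" (assumed labels)
    word: int           # index of the fixated word within its region
    x: float            # horizontal screen position
    y: float            # vertical screen position
    duration_ms: float  # how long the eyes stayed on the word


def gaze_features(fixations):
    """Count jumps, sum jump distance, and accumulate dwell time per region."""
    feats = Counter()
    for prev, cur in zip(fixations, fixations[1:]):
        # Total jump distance: Euclidean distance between consecutive fixations.
        feats["total_jump_distance"] += hypot(cur.x - prev.x, cur.y - prev.y)
        if cur.region != prev.region:
            # Inter-region jump: the gaze moved between translation and reference.
            feats["inter_region_jumps"] += 1
        elif cur.word > prev.word:
            feats[f"{cur.region}_forward_jumps"] += 1
        elif cur.word < prev.word:
            feats[f"{cur.region}_backward_jumps"] += 1
    for fix in fixations:
        # Dwell time: total fixation duration spent on each region.
        feats[f"{fix.region}_dwell_ms"] += fix.duration_ms
    return dict(feats)


example = [
    Fixation("translation", 0, 0.0, 0.0, 200),
    Fixation("translation", 2, 30.0, 40.0, 150),  # forward jump
    Fixation("translation", 1, 24.0, 32.0, 100),  # backward jump
    Fixation("reference", 0, 24.0, 62.0, 250),    # inter-region jump
]
feats = gaze_features(example)
```

Keeping the jump counts separate per region makes it possible to compare reading behaviour on the translation with reading behaviour on the reference.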
[Figure labels: Identify sound, Find the letter]
Introduction & Motivation
Eyes Don’t Lie: Predicting Machine Translation Quality Using Eye Movement
Features
Experimental Setup
Results
Conclusion
Hassan Sajjad, Francisco Guzman, Nadir Durrani, Houda Bouamor*, Ahmed Abdelali, Irina Temnikova, Stephan Vogel
Qatar Computing Research Institute, HBKU, Qatar; Carnegie Mellon University, Qatar*
Problem: Human evaluation suffers from low inter- and intra-annotator agreement; evaluation scores are too subjective
Hypothesis: Reading patterns from evaluators can help to
• shed light on the evaluation process
• understand which parts of the sentences are difficult to evaluate
• develop a semi-automatic evaluation system based on reading patterns
Our Solution: use reading patterns as a method to distinguish between good and bad translations
In addition:
• identified novel features from gaze data
• modeled and predicted the quality of translations as perceived by evaluators
• Linear regression model with ridge regularization
• The ridge coefficients minimize the penalized squared error
• Parameter λ controls the amount of shrinkage applied to the regression coefficients
• Used the glmnet package for R, with cross-validation, to find the best value of λ on the training data
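The objective function is not legible in this copy of the poster. As an assumption, the textbook ridge regression objective for N training examples with feature vectors x_i, scores y_i and p features is shown below; the symbols are the standard ones and are not taken from the poster.

```latex
\hat{\beta} = \operatorname*{arg\,min}_{\beta_0,\,\beta}
  \sum_{i=1}^{N} \left( y_i - \beta_0 - x_i^{\top}\beta \right)^2
  + \lambda \sum_{j=1}^{p} \beta_j^{2}
```

With λ = 0 this reduces to ordinary least squares, and larger values of λ shrink the coefficients toward zero. glmnet minimizes a rescaled form of this objective (squared error divided by 2N, with the ridge penalty obtained by setting its mixing parameter alpha to 0), so its λ values are not numerically comparable to the λ in the form above.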
Data
• Subset of the Spanish-English WMT’12 Evaluation task
• Selected 60 medium-length sentences, evaluated by at least 2 different annotators
• Selected the best and the worst translations, according to a human evaluation score based on expected wins
• Total: 120 evaluation tasks x 6 different evaluators = 720 evaluations
Eye-tracking Annotations
• Present evaluators with a translation-reference pair
• The best and worst translations of the same sentence have been shown with at least 40 different tasks in between
• Assign a 0-100 score to each task
• Inter-annotator kappa = 0.321 (slightly higher than the overall IAA in WMT’12 for Spanish-English, 0.284)
Evaluation Protocol similar to WMT’12
• Pairwise evaluation
• Computed Kendall’s tau coefficient
• Evaluated using 10-fold cross-validation
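A pairwise Kendall's tau of the kind used in WMT-style evaluation can be sketched as follows. This is an illustration under stated assumptions rather than the authors' evaluation script: the function name and the tuple layout are hypothetical, pairs without a human preference are skipped, and a tie in the predicted scores is counted as discordant (an assumption made for this sketch).

```python
def pairwise_kendall_tau(pairs):
    """Return (concordant - discordant) / (concordant + discordant).

    pairs: iterable of (human_a, human_b, pred_a, pred_b), where the human
    values are evaluator scores for translations A and B of one sentence and
    the pred values are the model's predicted scores (hypothetical layout).
    """
    concordant = 0
    discordant = 0
    for human_a, human_b, pred_a, pred_b in pairs:
        if human_a == human_b:
            continue  # no human preference, so the pair carries no signal
        human_prefers_a = human_a > human_b
        # Assumption: a predicted tie counts as discordant.
        if pred_a != pred_b and (pred_a > pred_b) == human_prefers_a:
            concordant += 1
        else:
            discordant += 1
    total = concordant + discordant
    if total == 0:
        raise ValueError("no pairs with a human preference")
    return (concordant - discordant) / total


judgments = [
    (80, 30, 0.7, 0.2),  # model agrees with the human preference
    (20, 60, 0.1, 0.5),  # model agrees
    (90, 40, 0.3, 0.6),  # model disagrees
    (50, 50, 0.4, 0.9),  # human tie, skipped
]
tau = pairwise_kendall_tau(judgments)
```

The value ranges from -1 (every pair ordered against the human preference) to 1 (every pair ordered in agreement with it).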
Model
Tool
• EyeTribe eye-tracker
• Sampling frequency of 30 Hz
• Evaluation environment: iAppraise
Lack predictive power
Reading patterns on translation and inter-region bring useful information
Lexicalized gaze jumps bring additional value beyond an LM
Combinations with BLEU
Bbleu 0.34
Bbleu+EyeTrabj 0.38
Bbleu+EyeLexall 0.42
Combining the best features with BLEU brings an improvement (0.34 to 0.42): reading patterns capture more than just fluency and adequacy
Conclusions: The extracted eye-tracking features capture additional information and can complement traditional measures (BLEU).
Future work: more users, more language pairs, early termination features, deeper analysis.