PEAK: Pyramid Evaluation via Automated Knowledge Extraction
Qian Yang, Rebecca J. Passonneau, Gerard de Melo
PhD Candidate, Tsinghua University; Visiting Student, Columbia University
http://www.larayang.com/



Page 1: PEAK: Pyramid Evaluation via Automated Knowledge Extraction

PEAK: Pyramid Evaluation via Automated Knowledge Extraction

Qian Yang, Rebecca J. Passonneau, Gerard de Melo

PhD Candidate, Tsinghua University; Visiting Student, Columbia University

http://www.larayang.com/

Page 2:

Content

• Evaluating Summary Content
• Our Contribution
• How does PEAK work?
  – Semantic Content Analysis
  – Pyramid Induction
  – Automated Scoring
• Our Results
• Conclusion


Page 4:

Evaluating Summary Content

• Human assessors
  – Judge each summary individually
  – Very time-consuming and does not scale well
• ROUGE (Lin 2004)
  – Automatically compares n-grams with model summaries
  – Not reliable enough for individual summaries (Gillick 2011)
• Pyramid Method (Nenkova and Passonneau, 2004)
  – Semantic comparison, reliable for individual summaries
  – Has required manual annotation


Page 7:

Our Contribution

• No need for manually created pyramids
• Also good results on automatic assessment given a pyramid


Page 11:

Semantic Content Analysis

Source: http://www1.ccls.columbia.edu/~beck/pubs/2458_PassonneauEtAl.pdf

Page 12:

Figure 1: Sample SCU from Pyramid Annotation Guide: DUC 2006.

Semantic Content Analysis

Weight: 4

Page 13:

Semantic Content Analysis

• “The law of conservation of energy is the notion that energy can be transferred between objects but cannot be created or destroyed.”
• Open information extraction (OpenIE) methods split such sentences and extract ⟨subject, predicate, object⟩ triples

Page 14:

Semantic Content Analysis

• “These characteristics determine the properties of matter” yields the triple ⟨These characteristics, determine, the properties of matter⟩
• We use ClausIE (Del Corro and Gemulla 2013)
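The triple representation above can be sketched as a small data structure. The extraction itself is done by ClausIE, so the hard-coded triple below is only illustrative; ClausIE's actual output may segment clauses differently.

```python
from dataclasses import dataclass

# A <subject, predicate, object> triple as produced by an OpenIE system.
@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    object: str

# Illustrative output for the example sentence; the real segmentation
# comes from ClausIE (Del Corro and Gemulla 2013), not from this code.
sentence = "These characteristics determine the properties of matter"
triples = [Triple("These characteristics", "determine", "the properties of matter")]

for t in triples:
    print(f"<{t.subject}, {t.predicate}, {t.object}>")
```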

Page 15:

Semantic Content Analysis

Figure 2: Hypergraph to capture similarities between elements of triples, with salient nodes circled in red

• Similarity score: Align, Disambiguate and Walk (ADW) (Pilehvar, Jurgens, and Navigli 2013)
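The graph construction can be sketched minimally: triple elements become nodes, and an edge is added when a pairwise similarity score exceeds a threshold. Below, a toy word-overlap (Jaccard) score stands in for ADW, and the 0.5 threshold and degree-based salience heuristic are illustrative assumptions, not the paper's actual settings.

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Toy word-overlap similarity standing in for ADW."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def similarity_graph(elements, threshold=0.5):
    """Connect elements whose pairwise similarity meets the threshold."""
    edges = {e: set() for e in elements}
    for a, b in combinations(elements, 2):
        if jaccard(a, b) >= threshold:
            edges[a].add(b)
            edges[b].add(a)
    return edges

elements = [
    "the properties of matter",
    "properties of matter",
    "energy",
]
graph = similarity_graph(elements)
# Nodes with many neighbours are candidates for the salient elements.
salient = max(graph, key=lambda e: len(graph[e]))
```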


Page 18:

Pyramid Induction
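The induction step (shown as figures in the original slides) groups semantically similar triples from the model summaries into SCU clusters; under the pyramid method, a content unit's weight is the number of distinct model summaries expressing it. The sketch below groups by exact-match keys purely for illustration; PEAK clusters by semantic similarity over the hypergraph, and all names here are hypothetical.

```python
from collections import defaultdict

# (summary_id, triple) pairs extracted from the model summaries.
# Exact string keys stand in for semantic-similarity clustering.
extracted = [
    (1, ("energy", "cannot be", "created")),
    (2, ("energy", "cannot be", "created")),
    (3, ("energy", "cannot be", "created")),
    (1, ("matter", "has", "mass")),
]

clusters = defaultdict(set)
for summary_id, triple in extracted:
    clusters[triple].add(summary_id)

# SCU weight = number of distinct model summaries expressing the content unit.
pyramid = {triple: len(ids) for triple, ids in clusters.items()}
```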



Page 23:

Scoring – Pyramid Method

• Score a target summary against a pyramid
  – Annotators mark spans of text in the target summary that express an SCU
  – The SCU weights increment the raw score for the target summary
• An example
  – SCU label: Plaid Cymru wants full independence
  – Target summary: Plaid Cymru demands an independent Wales
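The raw-score computation just described can be written out directly: each matched SCU contributes its weight, and the standard pyramid score normalizes by the best score achievable with the same number of SCUs. The function names below are illustrative, not PEAK's actual code.

```python
def raw_score(matched_weights):
    """Sum of weights of the SCUs a target summary expresses."""
    return sum(matched_weights)

def pyramid_score(matched_weights, all_weights):
    """Raw score normalized by the maximum score achievable with
    the same number of SCUs (the standard pyramid score)."""
    n = len(matched_weights)
    best = sum(sorted(all_weights, reverse=True)[:n])
    return raw_score(matched_weights) / best if best else 0.0

# Pyramid with SCU weights 4, 3, 2, 1; the target summary matched the
# SCUs weighted 4 and 1, so its normalized score is 5 / (4 + 3) = 5/7.
score = pyramid_score([4, 1], [4, 3, 2, 1])
```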

Page 24:

Automated Scoring – PEAK


Page 27:

Dataset

• Student summary dataset from Perin et al. (2013), with 20 target summaries written by students
• Passonneau et al. (2013) had produced 5 reference model summaries and 2 manually created pyramids

Page 28:

Results


Page 30:

Results

• Machine-generated summaries
  – Dataset: the 2006 Document Understanding Conference (DUC), administered by NIST (“DUC06”)
  – The Pearson correlation between PEAK’s scores and the manual ones is 0.7094
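The reported agreement is a plain Pearson correlation between PEAK's automatic scores and the manual pyramid scores; it can be computed as follows. The score lists here are placeholders, not the actual DUC06 data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Placeholder scores for illustration only.
peak_scores = [0.42, 0.55, 0.31, 0.60]
manual_scores = [0.40, 0.58, 0.35, 0.57]
r = pearson(peak_scores, manual_scores)
```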


Page 33:

Conclusion

• The first fully automatic version of the pyramid method
• Not only evaluates target summaries but also generates the pyramids automatically
• Experiments show that
  – Our SCUs are similar to those created by humans
  – The method for assessing target summaries automatically has a high correlation with human assessors

Page 34:

• Overall, our research shows great promise for automated scoring and assessment of manual or automated summaries, opening up the possibility of widespread use in the education domain and in information management.

Page 35:

The data and code are available at http://www.larayang.com/peak/.

Thank you!