PEAK: Pyramid Evaluation via Automated Knowledge Extraction
Qian Yang, Rebecca J. Passonneau, Gerard de Melo
PhD Candidate, Tsinghua University; Visiting Student, Columbia University
http://www.larayang.com/



Page 1: PEAK: Pyramid Evaluation via Automated Knowledge Extraction

PEAK: Pyramid Evaluation via Automated Knowledge Extraction

Qian Yang, Rebecca J. Passonneau, Gerard de Melo

PhD Candidate, Tsinghua University; Visiting Student, Columbia University

http://www.larayang.com/

Page 2:

Content

• Evaluating Summary Content
• Our Contribution
• How does PEAK work?
  – Semantic Content Analysis
  – Pyramid Induction
  – Automated Scoring
• Our Results
• Conclusion


Page 4:

Evaluating Summary Content

• Human assessors
  – Judge each summary individually
  – Very time-consuming and does not scale well
• ROUGE (Lin 2004)
  – Automatically compares n-grams with model summaries
  – Not reliable enough for individual summaries (Gillick 2011)
• Pyramid Method (Nenkova and Passonneau, 2004)
  – Semantic comparison, reliable for individual summaries
  – Has required manual annotation


Page 7:

Our Contribution

• No need for manually created pyramids
• Also good results on automatic assessment given a pyramid


Page 11:

Semantic Content Analysis

Source: http://www1.ccls.columbia.edu/~beck/pubs/2458_PassonneauEtAl.pdf

Page 12:

Figure 1: Sample SCU from Pyramid Annotation Guide: DUC 2006.

Semantic Content Analysis

Weight: 4

Page 13:

Semantic Content Analysis

• “The law of conservation of energy is the notion that energy can be transferred between objects but cannot be created or destroyed.”
• Open information extraction (OpenIE) methods split such sentences and extract ⟨subject, predicate, object⟩ triples

Page 14:

Semantic Content Analysis

• “These characteristics determine the properties of matter” yields the triple ⟨These characteristics, determine, the properties of matter⟩
• We use ClausIE (Del Corro and Gemulla 2013)
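The triple representation above can be sketched as a small data structure. The extraction itself is done by ClausIE, so the hard-coded triple below is only illustrative; ClausIE's actual output may segment clauses differently.

```python
from dataclasses import dataclass

# A <subject, predicate, object> triple as produced by an OpenIE system.
@dataclass(frozen=True)
class Triple:
    subject: str
    predicate: str
    object: str

# Illustrative output for the example sentence; the real segmentation
# comes from ClausIE (Del Corro and Gemulla 2013), not from this code.
sentence = "These characteristics determine the properties of matter"
triples = [Triple("These characteristics", "determine", "the properties of matter")]

for t in triples:
    print(f"<{t.subject}, {t.predicate}, {t.object}>")
```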

Page 15:

Semantic Content Analysis

Figure 2: Hypergraph to capture similarities between elements of triples, with salient nodes circled in red

• Similarity score: Align, Disambiguate and Walk (ADW) (Pilehvar, Jurgens, and Navigli 2013)
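The graph construction can be sketched minimally: triple elements become nodes, and an edge is added when a pairwise similarity score exceeds a threshold. Below, a toy word-overlap (Jaccard) score stands in for ADW, and the 0.5 threshold and degree-based salience heuristic are illustrative assumptions, not the paper's actual settings.

```python
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Toy word-overlap similarity standing in for ADW."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    return len(sa & sb) / len(sa | sb)

def similarity_graph(elements, threshold=0.5):
    """Connect elements whose pairwise similarity meets the threshold."""
    edges = {e: set() for e in elements}
    for a, b in combinations(elements, 2):
        if jaccard(a, b) >= threshold:
            edges[a].add(b)
            edges[b].add(a)
    return edges

elements = [
    "the properties of matter",
    "properties of matter",
    "energy",
]
graph = similarity_graph(elements)
# Nodes with many neighbours are candidates for the salient elements.
salient = max(graph, key=lambda e: len(graph[e]))
```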


Page 18:

Pyramid Induction
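The induction step (shown as figures in the original slides) groups semantically similar triples from the model summaries into SCU clusters; under the pyramid method, a content unit's weight is the number of distinct model summaries expressing it. The sketch below groups by exact-match keys purely for illustration; PEAK clusters by semantic similarity over the hypergraph, and all names here are hypothetical.

```python
from collections import defaultdict

# (summary_id, triple) pairs extracted from the model summaries.
# Exact string keys stand in for semantic-similarity clustering.
extracted = [
    (1, ("energy", "cannot be", "created")),
    (2, ("energy", "cannot be", "created")),
    (3, ("energy", "cannot be", "created")),
    (1, ("matter", "has", "mass")),
]

clusters = defaultdict(set)
for summary_id, triple in extracted:
    clusters[triple].add(summary_id)

# SCU weight = number of distinct model summaries expressing the content unit.
pyramid = {triple: len(ids) for triple, ids in clusters.items()}
```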



Page 23:

Scoring – Pyramid Method

• Score a target summary against a pyramid
  – Annotators mark spans of text in the target summary that express an SCU
  – The SCU weights increment the raw score for the target summary
• An example
  – SCU label: Plaid Cymru wants full independence
  – Target summary: Plaid Cymru demands an independent Wales
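The raw-score computation just described can be written out directly: each matched SCU contributes its weight, and the standard pyramid score normalizes by the best score achievable with the same number of SCUs. The function names below are illustrative, not PEAK's actual code.

```python
def raw_score(matched_weights):
    """Sum of weights of the SCUs a target summary expresses."""
    return sum(matched_weights)

def pyramid_score(matched_weights, all_weights):
    """Raw score normalized by the maximum score achievable with
    the same number of SCUs (the standard pyramid score)."""
    n = len(matched_weights)
    best = sum(sorted(all_weights, reverse=True)[:n])
    return raw_score(matched_weights) / best if best else 0.0

# Pyramid with SCU weights 4, 3, 2, 1; the target summary matched the
# SCUs weighted 4 and 1, so its normalized score is 5 / (4 + 3) = 5/7.
score = pyramid_score([4, 1], [4, 3, 2, 1])
```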

Page 24:

Automated Scoring – PEAK


Page 27:

Dataset

• Student summary dataset from Perin et al. (2013), with 20 target summaries written by students
• Passonneau et al. (2013) had produced 5 reference model summaries and 2 manually created pyramids

Page 28:

Results


Page 30:

Results

• Machine-generated summaries
  – Dataset: the 2006 Document Understanding Conference (DUC), administered by NIST (“DUC06”)
  – The Pearson correlation between PEAK’s scores and the manual ones is 0.7094
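The reported agreement is a plain Pearson correlation between PEAK's automatic scores and the manual pyramid scores; it can be computed as follows. The score lists here are placeholders, not the actual DUC06 data.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two score lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Placeholder scores for illustration only.
peak_scores = [0.42, 0.55, 0.31, 0.60]
manual_scores = [0.40, 0.58, 0.35, 0.57]
r = pearson(peak_scores, manual_scores)
```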


Page 33:

Conclusion

• The first fully automatic version of the pyramid method
• Not only evaluates target summaries but also generates the pyramids automatically
• Experiments show that
  – Our SCUs are similar to those created by humans
  – The method for assessing target summaries automatically has a high correlation with human assessors

Page 34:

• Overall, our research shows great promise for automated scoring and assessment of manual or automated summaries, opening up the possibility of widespread use in the education domain and in information management.

Page 35:

The data and code are available at http://www.larayang.com/peak/.

Thank you!