A. Overwijk, D. Nguyen, C. Hauff, R.B. Trieschnigg, D. Hiemstra, F.M.G. de Jong

On the Evaluation of Snippet Selection for Information Retrieval


Page 1: On the Evaluation of Snippet Selection for Information Retrieval

A. Overwijk, D. Nguyen, C. Hauff,

R.B. Trieschnigg, D. Hiemstra, F.M.G. de Jong

Page 2: On the Evaluation of Snippet Selection for Information Retrieval

Contents
- Properties of a good evaluation method
- Evaluation method of WebCLEF
- Approach
- Results
- Analysis
- Conclusion

Page 3: On the Evaluation of Snippet Selection for Information Retrieval

Good evaluation method
- Reflects the quality of the system
- Reusability

Page 4: On the Evaluation of Snippet Selection for Information Retrieval

Evaluation method of WebCLEF

Recall
The sum of the character lengths of all spans in the system's response that are linked to nuggets (i.e. aspects the user includes in his article), divided by the total sum of span lengths in the responses for that topic across all submitted runs.

Precision
The number of characters that belong to at least one span linked to a nugget, divided by the total character length of the system's response.
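Written as formulas (a paraphrase of the definitions above; the notation is introduced here and is not part of the WebCLEF material):

\[
\text{recall} = \frac{\displaystyle\sum_{s \in L(r)} |s|}{\displaystyle\sum_{r' \in R_t}\ \sum_{s' \in S(r')} |s'|}
\qquad
\text{precision} = \frac{\bigl|\{\, c \in r : c \text{ is covered by at least one span in } L(r) \,\}\bigr|}{|r|}
\]

where \(r\) is the system's response to topic \(t\), \(S(r)\) the spans in a response, \(L(r) \subseteq S(r)\) the spans linked to a nugget, \(R_t\) the responses to topic \(t\) over all submitted runs, and \(|\cdot|\) length in characters.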

Page 5: On the Evaluation of Snippet Selection for Information Retrieval

Approach
- Better system, better performance scores?
- Similar system, same performance scores?
- Worse system, lower performance scores?

Page 6: On the Evaluation of Snippet Selection for Information Retrieval

Better system
Last year's best performing system contains a bug:

    our %stopwords = qw(
        's
        a
        …
        zwischen
    );

    for my $w … {
        next if exists $stopwords{$w};
        …
    }
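One plausible reading of this fragment, though the slide does not say so explicitly, is the common Perl pitfall of assigning a qw() list directly to a hash: the words are paired up as key => value, so only every other word actually becomes a key of %stopwords and the exists test silently misses the rest. A minimal, self-contained illustration (invented word lists, not the authors' code):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # qw() assigned straight to a hash pairs alternating words as key => value.
    my %broken = qw( a an the zwischen );                    # keys: a, the
    my %fixed  = map { $_ => 1 } qw( a an the zwischen );    # keys: all four words

    print join( ' ', sort keys %broken ), "\n";   # a the
    print join( ' ', sort keys %fixed ),  "\n";   # a an the zwischen

    for my $w (qw( a cat sat on the mat )) {
        next if exists $fixed{$w};   # stop words are skipped
        print "$w\n";                # cat, sat, on, mat
    }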

Page 7: On the Evaluation of Snippet Selection for Information Retrieval

Better system

System                      Precision   Recall
With bug                    0.2018      0.2561
Without bug                 0.1328      0.1685
Not filtering stop words    0.1087      0.1380

Page 8: On the Evaluation of Snippet Selection for Information Retrieval

Similar system

General idea:
Almost identical snippets should have almost the same precision and recall.

Experiment:
Remove the last word from every snippet in the output of last year's best performing system.
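A minimal sketch of this perturbation (illustrative code; the example snippets are invented, not taken from the system's output):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Drop the last whitespace-separated word from every snippet.
    my @snippets  = ( 'the treaty was signed in 1992',
                      'snippet selection for information retrieval' );
    my @perturbed = map { ( my $s = $_ ) =~ s/\s*\S+\s*$//; $s } @snippets;

    print "$_\n" for @perturbed;   # "the treaty was signed in",
                                   # "snippet selection for information"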

Page 9: On the Evaluation of Snippet Selection for Information Retrieval

Similar system

System              Precision   Recall
Original            0.2018      0.2561
Last word removed   0.0597      0.0758

Page 10: On the Evaluation of Snippet Selection for Information Retrieval

Worse system
Delivering snippets based on occurrence:
- 1st snippet = 1st paragraph of 1st document
- 2nd snippet = 2nd paragraph of 2nd document
- ...

No different from search engines, except that documents are split up into snippets.
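A rough sketch of such an occurrence-based baseline, following the slide literally (illustrative code; the document contents are invented, and the real system presumably read retrieved documents):

    #!/usr/bin/perl
    use strict;
    use warnings;

    # Each document is an array of its paragraphs, in order of occurrence.
    my @documents = (
        [ 'doc 1, paragraph 1', 'doc 1, paragraph 2' ],
        [ 'doc 2, paragraph 1', 'doc 2, paragraph 2', 'doc 2, paragraph 3' ],
    );

    # i-th snippet = i-th paragraph of the i-th document.
    my @snippets;
    for my $i ( 0 .. $#documents ) {
        push @snippets, $documents[$i][$i] if defined $documents[$i][$i];
    }

    print "$_\n" for @snippets;   # "doc 1, paragraph 1", "doc 2, paragraph 2"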

Page 11: On the Evaluation of Snippet Selection for Information Retrieval

Worse system

          Original               First occurrence
Topic     Precision   Recall     Precision   Recall
17        0.0389      0.0436     0.0389      0.0436
18        0.1590      0.6190     0.1590      0.6190
21        0.4083      0.6513     0.4083      0.6513
23        0.1140      0.1057     0.1140      0.1057
25        0.4240      0.4041     0.4240      0.4041
26        0.0780      0.1405     0.0780      0.1405
Avg.      0.2018      0.2561     0.0536      0.0680

Page 12: On the Evaluation of Snippet Selection for Information Retrieval

Analysis
- Pool of snippets
- Implementation
- Assessments

Page 13: On the Evaluation of Snippet Selection for Information Retrieval

Conclusion
Evaluation method is not sufficient:
- Biased towards participating systems
- Correctness of a snippet is too strict

Recommendations:
- N-grams (e.g. ROUGE)
- Multiple assessors per topic
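As context for the n-gram recommendation: ROUGE-N scores a candidate text by its n-gram overlap with reference texts. In its usual recall-oriented form (standard ROUGE notation, not from the slides):

\[
\mathrm{ROUGE\text{-}N} =
\frac{\displaystyle\sum_{S \in \mathrm{References}}\ \sum_{\mathrm{gram}_n \in S} \mathrm{Count}_{\mathrm{match}}(\mathrm{gram}_n)}
     {\displaystyle\sum_{S \in \mathrm{References}}\ \sum_{\mathrm{gram}_n \in S} \mathrm{Count}(\mathrm{gram}_n)}
\]

where \(\mathrm{Count}_{\mathrm{match}}(\mathrm{gram}_n)\) is the maximum number of times the n-gram co-occurs in the candidate and the reference.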

Page 14: On the Evaluation of Snippet Selection for Information Retrieval

Questions