Exploring Sentence Level Query Expansion in Language Modeling Based Information Retrieval
Debasis Ganguly, Johannes Leveling, Gareth Jones

Page 1: Exploring Sentence Level Query Expansion in Language Modeling Based Information Retrieval

Exploring Sentence Level Query Expansion in Language Modeling Based Information Retrieval

Debasis Ganguly, Johannes Leveling, Gareth Jones

Page 2

Outline

Standard blind relevance feedback

Sentence based query expansion

Does it fit into LM?

Evaluation on FIRE Bengali and English ad-hoc topics

Comparison with term based query expansion

Conclusions

Page 3

Standard Blind Relevance Feedback (BRF)

Assume top R documents from initial retrieval as relevant.

Extract feedback terms from these documents:

Choose terms that occur in the largest number of pseudo-relevant documents (e.g. VSM)

Choose terms with the highest RSV (retrieval status value) scores (e.g. BM25)

Choose terms with the highest LM scores (e.g. LM)

Expand the query with these terms and perform the final retrieval step
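The document-frequency criterion above can be sketched as follows. This is a minimal illustration, not the SMART implementation; the function name and toy documents are invented for the example.

```python
from collections import Counter

def brf_expansion_terms(query_terms, ranked_docs, R=10, T=10):
    """Standard BRF term selection: assume the top R documents are
    relevant and pick the T terms that occur in the largest number
    of these pseudo-relevant documents."""
    df = Counter()
    for doc in ranked_docs[:R]:
        # count each candidate term once per document
        for term in set(doc.split()):
            if term not in query_terms:
                df[term] += 1
    return [term for term, _ in df.most_common(T)]

# toy example: two pseudo-relevant documents
docs = ["cricket match india bengal", "cricket score india australia"]
print(brf_expansion_terms(["india"], docs, R=2, T=1))  # ['cricket']
```

Here "cricket" wins because it appears in both pseudo-relevant documents, while every other candidate appears in only one.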

Page 4

What standard BRF assumes (wrongly)

The whole document is relevant

All R feedback documents are equally relevant

(Figure: query with candidate expansion terms t1 and t2 drawn from anywhere in the feedback documents)

Page 5

Ideal scenario

Ideally, the whole of each feedback document would be relevant.

(Figure: query with expansion terms t1 and t2 drawn only from the relevant segments)

Restrict the choice of feedback terms to the relevant segments of the documents

Page 6

Can we get closer to the ideal?

It is impossible to know the relevant segments of a document exactly.

Instead, extract the sentences most similar to the query, assuming that these sentences constitute the relevant text chunks.

(Figure: query-similar sentences selected from each feedback document)
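A minimal sketch of this selection step, with word-overlap similarity standing in for whatever sentence–query similarity the authors used; the function name and sentences are illustrative:

```python
def most_similar_sentences(query_terms, sentences, m=2):
    """Rank a document's sentences by the fraction of words shared
    with the query and keep the m most similar ones as the assumed
    relevant text chunk."""
    q = set(query_terms)
    def similarity(sentence):
        words = sentence.lower().split()
        return sum(w in q for w in words) / len(words) if words else 0.0
    return sorted(sentences, key=similarity, reverse=True)[:m]

sentences = ["the match was held in kolkata",
             "india won the cricket match",
             "ticket sales were slow"]
print(most_similar_sentences(["cricket", "india"], sentences, m=1))
# ['india won the cricket match']
```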

Page 7

Sentence selection using rank

Not all feedback documents are equally relevant.

So vary the number of sentences added from a document with its retrieval rank: higher-ranked documents contribute more sentences.

(Figure: more sentences selected from higher-ranked documents)
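The slides state only that the sentence count varies with rank; one plausible linear schedule (an assumption for illustration, with m sentences taken from the top-ranked document) looks like this:

```python
def sentences_per_rank(m, R):
    """Sentences to take from the document at rank r (0-based, r < R):
    the top document contributes m, decreasing linearly with rank,
    with at least one sentence per feedback document.
    (The linear schedule is an assumption, not the paper's formula.)"""
    return [max(1, m * (R - r) // R) for r in range(R)]

print(sentences_per_rank(m=5, R=10))
# [5, 4, 4, 3, 3, 2, 2, 1, 1, 1]
```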

Page 8

In short

Documents are often composed of a few main topics and a series of short, sometimes densely discussed subtopics.

Feedback terms chosen from a whole document might introduce a topic shift.

Good expansion terms might exist in a particular subtopic.

Terms in close proximity to the query terms might be useful for feedback.

Page 9

Does this fit into LM?

Noisy-channel view: each document is a possible source that generates the query.

(Figure: documents D1, D2, …, Dn as candidate generators of the query)

Adding a part of D1 and a part of D2 to Q makes the expanded query Qexp resemble D1 and D2, which increases the likelihood that these documents generate Qexp.
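Why this helps under the query-likelihood model can be seen with Jelinek–Mercer smoothing, a standard LM formulation; λ = 0.7 and the toy texts below are purely illustrative:

```python
import math
from collections import Counter

def log_p_query(query_terms, doc_terms, coll_terms, lam=0.7):
    """log P(Q|D) under Jelinek-Mercer smoothing:
    P(t|D) = lam * tf(t,D)/|D| + (1 - lam) * cf(t)/|C|."""
    tf, cf = Counter(doc_terms), Counter(coll_terms)
    return sum(math.log(lam * tf[t] / len(doc_terms)
                        + (1 - lam) * cf[t] / len(coll_terms))
               for t in query_terms)

d1 = "cricket match india bengal".split()
d2 = "stock market india trade".split()
coll = d1 + d2                               # toy two-document collection
q, q_exp = ["india"], ["india", "cricket"]   # q_exp borrows 'cricket' from d1
# Before expansion, d1 and d2 generate q with identical likelihood; after
# expansion, Qexp resembles d1, so d1's generation likelihood is higher.
print(log_p_query(q_exp, d1, coll) > log_p_query(q_exp, d2, coll))  # True
```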

Page 10

Tools

The FIRE collection comprises newspaper articles from different genres (e.g. sports, business) in several Indian languages

The MorphAdorner package was used for sentence demarcation

Stopword lists:

Standard SMART stopword list for English

Default stopword list provided by the FIRE organizers for Bengali

Stemmers:

Rule-based stemmer for Bengali

Porter's stemmer for English

The LM implementation in SMART was used for indexing and retrieval

Page 11

Setup

The baseline is standard BRF using terms that occur in the largest number of pseudo-relevant documents.

Two variants of sentence-based expansion were tried:

BRFcns: a constant number of sentences from each feedback document

BRFvns: a variable number of sentences per document (depending on retrieval rank)

Page 12

Parameter Settings

R: number of documents assumed to be relevant, varied in [10, 40]

T: number of terms to add, varied in [10, 40]

m: number of sentences to add from the top-ranked document, varied in [2, 10]

Page 13

Best MAPs

BRF:
Topics    R   T   MAP
EN-2008   10  10  0.5682
EN-2010   10  30  0.4953
BN-2008   20  40  0.3885
BN-2010   10  30  0.4537

BRFcns:
Topics    R   m   MAP
EN-2008   30  5   0.5964
EN-2010   20  4   0.5032
BN-2008   20  4   0.4226
BN-2010   10  5   0.4467

BRFvns:
Topics    R   m   MAP
EN-2008   30  10  0.6015
EN-2010   20  8   0.5102
BN-2008   30  10  0.4302
BN-2010   10  8   0.4581

Page 14

Query drift analysis

Adding too many terms can drift the expanded query away from the original information need.

Drift is measured by per-query changes in precision.

An easy query is one for which the P@20 of the initial retrieval is high.

Queries are categorized into groups by their initial-retrieval P@20.

A good feedback algorithm improves many (ideally bad) queries and hurts the performance of only a few (ideally good) queries.
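The categorization step can be sketched as below. The 0.5 easy/hard threshold and the function names are assumptions for illustration; the paper bins queries by P@20 value.

```python
def precision_at_k(retrieved, relevant, k=20):
    """P@k: fraction of the top-k retrieved documents that are relevant."""
    return sum(doc in relevant for doc in retrieved[:k]) / k

def categorize_by_p20(initial_p20, threshold=0.5):
    """Label each query 'easy' or 'hard' from its initial-retrieval P@20.
    (Binary split with an assumed 0.5 threshold, for illustration.)"""
    return {qid: ("easy" if p >= threshold else "hard")
            for qid, p in initial_p20.items()}

p20 = {"q1": 0.80, "q2": 0.10}
print(categorize_by_p20(p20))  # {'q1': 'easy', 'q2': 'hard'}
```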

Page 15

Query drift analysis

(Figures: per-query precision changes for BRF, BRFcns, and BRFvns)

Page 16

Comparison to True Relevance Feedback

The best possible feedback performance (an oracle) is obtained with True Relevance Feedback (TRF), which uses actual relevance judgements.

A BRF method should come as close as possible to this oracle.

Topic  |TRF|  o(|TBRF|)  o(|Tvns|)
EN08   937    743        912
EN10   433    407        432
BN08   979    744        955
BN10   991    728        933

Page 17

Conclusions

The new approach improves over standard BRF by:

using sentences instead of whole documents, and

distinguishing between feedback documents by their degree of pseudo-relevance.

It significantly improves MAP over standard BRF on four ad-hoc topic sets in two languages.

It also adds more truly relevant terms than standard BRF.

Page 18

Queries?