Evaluating Multi-Query Sessions. Evangelos Kanoulas*, Ben Carterette+, Paul Clough*, Mark Sanderson$ (* University of Sheffield, UK; + University of Delaware, USA; $ RMIT University, Australia)

Evangelos Kanoulas — Advances in Information Retrieval Evaluation


Page 1: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Evaluating Multi-Query Sessions

Evangelos Kanoulas*, Ben Carterette+, Paul Clough*, Mark Sanderson$

* University of Sheffield, UK
+ University of Delaware, USA
$ RMIT University, Australia

Page 2: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Why sessions?

•  Current evaluation framework
   – Assesses the effectiveness of systems over one-shot queries

•  Users reformulate their initial query

•  Still fine if …
   – optimizing a system for one-shot queries led to optimal performance over an entire session

Page 3: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Why sessions?

When was the DuPont Science Essay Contest created?

Initial Query : DuPont Science Essay Contest

Reformulation : When was the DSEC created?

•  e.g. retrieval systems should accumulate information along a session

Page 4: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Extend the evaluation framework

From one-query evaluation

To multi-query session evaluation

Page 5: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Construct appropriate test collections

Rethink evaluation measures

Page 6: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

What is the appropriate collection?

Page 7: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Test collections we built…

•  Text REtrieval Conference (TREC)
   – sponsored by NIST
   – many competitions; among them Session Track 2010, 2011, …

Page 8: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Test collection we built in 2010…

•  Corpus: ClueWeb09
   – 1 billion web pages (5TB compressed)

•  Queries and Reformulations
   – 150 query pairs: initial query, reformulation
   – 3 types of reformulations (not disclosed to participants)
      •  Specification (52 query pairs)
      •  Generalization (48 query pairs)
      •  Drifting / Parallel Reformulation (50 query pairs)

Page 9: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Some Criticism…

•  Artificial reformulations
•  Short reformulations
   – just 2 queries

•  No other user interaction data
   – clicks, dwell times, etc.

•  Reformulations are static (do not depend on the search engine's (SE's) response)
   – The collection does not allow early abandonment
   – The reformulation itself does not change based on the SE's response

Page 10: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Test Collection in 2011

•  Corpus: ClueWeb09
   – 1 billion web pages (5TB compressed)

•  Queries and Reformulations
   – Real users searching ClueWeb09
   – 76 sessions of 2 up to 10 reformulations

•  Other interactions
   – Clicks, dwell times, mouse movements, relevance judgments

•  But… reformulations are still static

Page 11: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Basic test collection

•  A set of information needs, e.g. "What do we know about black powder ammunition?"
   – A static sequence of m queries

Initial Query : black powder ammunition

1st Reformulation : black powder wiki

2nd Reformulation : gun powder wiki

…

(m-1)th Reformulation : history of gunpowder

Page 12: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Experiment

[Figure: ranked lists (ranks 1–10) for the queries "black powder ammunition", "black powder wiki" and "gun powder wiki"]

Page 13: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Experiment

Evaluation over a single ranked list

[Figure: ranked lists (ranks 1–10) for the queries "black powder ammunition", "black powder wiki" and "gun powder wiki"]

Page 14: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Construct appropriate test collections

Rethink evaluation measures

Page 15: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

What is a good system?

Page 16: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

How can we measure “goodness”?

Page 17: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Measuring “goodness”

The user steps down a ranked list of documents and observes each one of them until a decision point, and either

a)  abandons the search, or

b)  reformulates

While stepping down or sideways, the user accumulates utility

Page 18: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

What are the challenges?

Page 19: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Evaluation over a single ranked list

Evaluation over multiple ranked lists

[Figure: ranked lists (ranks 1–10) for the queries "black powder ammunition", "black powder wiki" and "gun powder wiki"]

Page 20: Evangelos Kanoulas — Advances in Information Retrieval Evaluation
Page 21: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Existing measures

•  Session DCG [Järvelin et al ECIR 2008]
   The user steps down the ranked list until rank k and reformulates
   [Deterministic; no early abandonment]

•  Expected session utility [Yang and Lad ICTIR 2009]
   The user steps down a ranked list of documents until a decision point and reformulates
   [Stochastic; no early abandonment]

Page 22: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Evaluating over paths

Optimize → Model-free measures

Integrate out → Model-based measures

Page 23: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Evaluation measures

•  Evaluating over paths

•  Model-free measures

•  Model-based measures

Page 24: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Model-free measures

The user is an oracle that knows when to reformulate

Ω(k,j) : paths of length k, ending at reformulation j

Count the number of relevant docs on the optimal path ω of length k ending at query j

Page 25: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Model-free measures

Rank   Q1   Q2   Q3
1-5    N    R    R
6-10   N    N    R
…      …    …    …

ω(10,3) : length 10, ending at 3rd query

Define : Precision@k,j   Recall@k,j   Precision@recall,j
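The optimal-path count behind these measures can be sketched with a small dynamic program. This is a minimal illustration rather than the authors' implementation: the function name `best_path_relevant` is hypothetical, relevance is taken to be binary (R = 1, N = 0), the oracle user is assumed to view a non-empty prefix of every ranked list up to query j, and Precision@k,j is assumed to be the optimal count divided by k.

```python
from typing import List


def best_path_relevant(session: List[List[int]], k: int, j: int) -> int:
    """Most relevant documents on any path of length k ending at query j.

    session[q] holds binary relevance labels (1 = R, 0 = N) for the ranked
    list of query q+1. Assumption: the user views a non-empty prefix of each
    of the first j lists, and the prefix lengths add up to k.
    """
    if not 1 <= j <= len(session):
        raise ValueError("j must index a query in the session")
    unreachable = -1
    # best[t]: max relevant docs after viewing exactly t docs in the queries so far
    best = [0] + [unreachable] * k
    for rels in session[:j]:
        # prefix[d]: number of relevant docs in the top d of this ranked list
        prefix = [0]
        for rel in rels:
            prefix.append(prefix[-1] + rel)
        new_best = [unreachable] * (k + 1)
        for seen, count in enumerate(best):
            if count == unreachable:
                continue
            for depth in range(1, min(len(rels), k - seen) + 1):
                total = count + prefix[depth]
                new_best[seen + depth] = max(new_best[seen + depth], total)
        best = new_best
    if best[k] == unreachable:
        raise ValueError("no path of length k ends at query j")
    return best[k]


# Relevance grid from the slide: Q1 all N, Q2 relevant at ranks 1-5, Q3 all R.
session = [[0] * 10, [1] * 5 + [0] * 5, [1] * 10]
precision_at_10_3 = best_path_relevant(session, 10, 3) / 10
```

Under these assumptions the best path of length 10 ending at the third query spends one step on Q1 and the remaining nine on relevant documents, so the count is 9.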

Page 26: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Model-free measures

Rank   Q1   Q2   Q3
1-5    N    R    R
6-10   N    N    R
…      …    …    …

[Figure: 3-D plot with axes recall, reformulation and precision]

Page 27: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Model-free measures

Rank   Q1   Q2   Q3
1-5    N    R    R
6-10   N    N    R
…      …    …    …

[Figure: precision (y-axis) vs. recall (x-axis) curves, both on a 0.0–1.0 scale, in three panels: ranking 1, ranking 2, ranking 3]

Page 28: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Model-free measures

Rank   Q1   Q2   Q3
1-5    N    R    R
6-10   N    N    R
…      …    …    …

[Figure: 3-D plot with axes recall, reformulation and precision]

Page 29: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Evaluation measures

•  Evaluating over paths

•  Model-free measures

•  Model-based measures

Page 30: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Model-based measures

Probabilistic space of users following different paths

•  Ω is the space of all paths
•  P(ω) is the probability of a user following a path ω in Ω
•  M_ω is a measure over a path ω

$$esM = \sum_{\omega \in \Omega} P(\omega)\, M_\omega$$

[Yang and Lad ICTIR 2009]

Page 31: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Model Browsing Behavior

Position-based models

The chance of observing a document depends on the position of the document in the ranked list.

[Figure: ranked list (ranks 1–10) for the query "black powder ammunition"]

Page 32: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Rank Biased Precision [Moffat and Zobel, TOIS08]

[Figure: user-model flowchart with states Query, View Next Item and Stop, next to the ranked list (ranks 1–10) for the query "black powder ammunition"]
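As a small illustration of the position-based user model, the sketch below computes Rank Biased Precision in its usual form, RBP = (1 − p) · Σ_i rel_i · p^(i−1), where p is the probability that the user moves on to the next item. The function name and the default p = 0.8 are illustrative choices, not taken from the slides.

```python
from typing import Sequence


def rank_biased_precision(rels: Sequence[float], p: float = 0.8) -> float:
    """RBP of a ranked list; rels[i] is the relevance (0..1) at rank i+1.

    p is the persistence: the probability of viewing the next item
    rather than stopping.
    """
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    # Rank i+1 is reached with probability p**i under this user model.
    return (1.0 - p) * sum(rel * p ** i for i, rel in enumerate(rels))


# Relevant documents at ranks 1 and 3, persistence 0.5.
score = rank_biased_precision([1, 0, 1], p=0.5)
```

A lower p models an impatient user, so relevant documents far down the list contribute very little.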

Page 33: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Model Browsing Behavior

Cascade-based models

The chance of observing a document depends on the position of the document in the ranked list and the relevance of documents/snippets already viewed.

[Figure: ranked list (ranks 1–10) for the query "black powder ammunition"]

Page 34: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Expected Reciprocal Rank [Chapelle et al CIKM09]

[Figure: user-model flowchart with states Query, View Next Item, Relevant? (no / somewhat / highly) and Stop, next to the ranked list (ranks 1–10) for the query "black powder ammunition"]
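The cascade model can be made concrete with a short sketch of Expected Reciprocal Rank in its commonly cited form: the user stops at rank r with probability R_r = (2^g − 1) / 2^g_max, where g is the relevance grade, and ERR sums 1/r weighted by the probability of stopping exactly at r. The grade scale (0 = no, 1 = somewhat, 2 = highly) and the function name are assumptions made for illustration.

```python
from typing import Sequence


def expected_reciprocal_rank(grades: Sequence[int], max_grade: int = 2) -> float:
    """ERR of a ranked list of graded relevance labels (0..max_grade)."""
    err = 0.0
    p_reach = 1.0  # probability that the user reaches the current rank
    for rank, grade in enumerate(grades, start=1):
        # Probability that the document at this rank satisfies the user.
        p_stop = (2 ** grade - 1) / 2 ** max_grade
        err += p_reach * p_stop / rank
        # The user continues only if this document did not satisfy them.
        p_reach *= 1.0 - p_stop
    return err


# Highly relevant at rank 1, non-relevant at rank 2, somewhat relevant at rank 3.
score = expected_reciprocal_rank([2, 0, 1], max_grade=2)
```

Unlike the position-based model, a highly relevant document near the top sharply reduces the credit available to everything below it.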

Page 35: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Expected Browsing Utility [Yilmaz et al CIKM10]

$$D_{EBU}(r) = P(E_r) \cdot P(C \mid R_r)$$

$$EBU = \sum_{r=1}^{n} D_{EBU}(r) \cdot R_r$$
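The two EBU formulas translate directly into code once the browsing-model probabilities are given. In the sketch below both P(E_r), the probability of examining rank r, and P(C | R_r), the probability of a click given the relevance at rank r, are passed in as plain inputs; the numbers used are made up for illustration and the function name is hypothetical.

```python
from typing import Mapping, Sequence


def expected_browsing_utility(
    rels: Sequence[int],
    p_examine: Sequence[float],
    p_click_given_rel: Mapping[int, float],
) -> float:
    """EBU = sum over ranks r of D_EBU(r) * R_r.

    rels[r] is the relevance R at rank r+1, p_examine[r] is P(E) at that
    rank, and p_click_given_rel maps a relevance value to P(C | R).
    """
    total = 0.0
    for rel, p_exam in zip(rels, p_examine):
        discount = p_exam * p_click_given_rel[rel]  # D_EBU(r)
        total += discount * rel
    return total


# Hypothetical probabilities, for illustration only.
score = expected_browsing_utility(
    rels=[1, 0, 1],
    p_examine=[1.0, 0.5, 0.25],
    p_click_given_rel={0: 0.1, 1: 0.8},
)
```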

Page 36: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Probability of a path

Rank   Q1   Q2   Q3
1-5    N    R    R
6-10   N    N    R
…      …    …    …

Joint probability of

(1) abandoning at reformulation 2

(2) reformulating at rank 3 of the first query

Page 37: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Probability of a path

Rank   Q1   Q2   Q3
1-5    N    R    R
6-10   N    N    R
…      …    …    …

(1) Probability of abandoning at reformulation 2

×

(2) Probability of reformulating at rank 3 of the first query

Page 38: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Rank   Q1   Q2   Q3
1-5    N    R    R
6-10   N    N    R
…      …    …    …

(1) Probability of abandoning the session at reformulation i

Geometric w/ parameter p_reform

Page 39: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Rank   Q1   Q2   Q3
1-5    N    R    R
6-10   N    N    R
…      …    …    …

(1) Probability of abandoning the session at reformulation i

Truncated Geometric w/ parameter p_reform

Page 40: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Rank   Q1   Q2   Q3
1-5    N    R    R
6-10   N    N    R
…      …    …    …

(1) Probability of abandoning the session at reformulation i

Truncated Geometric w/ parameter p_reform

(2) Probability of reformulating at rank j (of reformulations 1 to i-1)

Geometric w/ parameter p_down
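The two components of the path probability can be sketched as follows. The exact parametrization is an assumption: p_reform is read as the probability of issuing another query, so abandoning at query i (counting the initial query as 1) has probability proportional to p_reform^(i−1) · (1 − p_reform), renormalized over the m available queries; p_down is read as the probability of stepping down one more rank, so reformulating at rank j has probability p_down^(j−1) · (1 − p_down). Function names are hypothetical.

```python
from typing import Sequence


def p_abandon_at(i: int, m: int, p_reform: float) -> float:
    """Truncated geometric: probability that query i (1..m) is the last one."""
    if not 1 <= i <= m:
        raise ValueError("i must be between 1 and m")
    # Divide by the total mass on 1..m so the probabilities sum to one.
    return p_reform ** (i - 1) * (1 - p_reform) / (1 - p_reform ** m)


def p_reformulate_at_rank(j: int, p_down: float) -> float:
    """Geometric: probability of leaving a ranked list right after rank j."""
    if j < 1:
        raise ValueError("j must be at least 1")
    return p_down ** (j - 1) * (1 - p_down)


def path_probability(
    reform_ranks: Sequence[int], m: int, p_reform: float, p_down: float
) -> float:
    """Probability of a path that reformulates at the given ranks.

    reform_ranks[q] is the rank at which the user leaves query q+1; the
    session is abandoned at query len(reform_ranks) + 1.
    """
    prob = p_abandon_at(len(reform_ranks) + 1, m, p_reform)
    for rank in reform_ranks:
        prob *= p_reformulate_at_rank(rank, p_down)
    return prob


# The slide's example: reformulate at rank 3 of the first query and abandon
# at the second query, in a session of m = 3 queries (parameter values made up).
example = path_probability([3], m=3, p_reform=0.5, p_down=0.8)
```

Treating the two events as independent is what allows the joint probability to be written as a product.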

Page 41: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Model-based measures

Probabilistic space of users following different paths

•  Ω is the space of all paths
•  P(ω) is the probability of a user following a path ω in Ω
•  M_ω is a measure over a path ω

$$esM = \sum_{\omega \in \Omega} P(\omega)\, M_\omega$$

Page 42: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Evaluation measures

•  Evaluating over paths

•  Model-free measures

•  Model-based measures

Page 43: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Evaluation measures

•  Properties

– How do the new measures correlate with previously introduced ones?

– Do they behave as expected, i.e. do they reward early retrieval of relevant documents?

Page 44: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Correlations

[Figure: two scatter plots. nsDCG (x-axis) vs. esAP (y-axis), Kendall's tau : 0.5247. nsDCG (x-axis) vs. esNDCG (y-axis), Kendall's tau : 0.7972.]

•  TREC 2010 Session track
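Kendall's tau compares how two measures order the same set of systems. A minimal sketch of the tau-a variant, (concordant pairs − discordant pairs) / (n(n − 1)/2), is given below; the per-system scores are invented for illustration and are not the TREC 2010 results.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a between two score lists over the same systems."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally long lists with at least 2 items")
    concordant = discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1  # both measures order this pair the same way
        elif sign < 0:
            discordant += 1  # the measures disagree on this pair
    n = len(x)
    return (concordant - discordant) / (n * (n - 1) / 2)


# Hypothetical scores of four systems under two measures.
measure_a = [0.10, 0.20, 0.30, 0.40]
measure_b = [0.05, 0.07, 0.06, 0.09]
tau = kendall_tau(measure_a, measure_b)
```

A tau of 1 means the two measures rank the systems identically, and −1 means the rankings are exactly reversed.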

Page 45: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Reward early retrieval

                   esMPC@20   esMRC@20   esMAP
“good” -> “good”   0.378      0.036      0.122
“good” -> “bad”    0.363      0.034      0.112
“bad” -> “good”    0.271      0.023      0.083
“bad” -> “bad”     0.254      0.022      0.073

•  TREC9 Query track
   – 50 topics and 23 query sets (formulations)

•  Simulate sessions

Page 46: Evangelos Kanoulas — Advances in Information Retrieval Evaluation

Conclusions

•  Extend the evaluation framework to sessions
   – Built the appropriate test collection
   – Rethought the evaluation measures

•  Basic test collection
•  Model-free and model-based measures

•  Did not talk about:
   – Duplicate documents
   – Efficient computation of the measures