Transcript
Page 1: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

IRGIR Group @ UAM

Explicit Relevance Models in Intent-Aware IR Diversification 35th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012)

Portland, OR, 13 August 2012

http://ir.ii.uam.es

Explicit Relevance Models in Intent-Aware IR Diversification

35th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012)

Saúl Vargas, Pablo Castells and David Vallet, Universidad Autónoma de Madrid

Page 2: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Outline

Context: IR diversification formulation and algorithms

Proposed approach: relevance-based reformulation of diversification algorithms

Experiments

Adjustable tolerance to redundancy

Conclusion

Page 3: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

IR diversity - Brief recap

Appliance

Golf

Chemical element

Nutrition / Health

Mining / Metallurgy

Page 4: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

IR diversity - Brief recap

Appliance

Golf

Chemical element

Nutrition / Health

Mining / Metallurgy

Diversity as a means to address uncertainty in user queries

- The same query may have different intents or aspects in the underlying information need

Revision of document relevance independence

- Marginal utility of additional relevant documents decreases fast

Trade diminishing marginal utility for increased intent coverage

- Thus maximize the number of users who obtain at least some useful document

Page 5: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

IR diversification - Problem statement

Given a query $q$ on a collection, find a subset $S$ of the collection, of given size, maximizing:

$p(\text{some } d \in S \text{ relevant} \mid q)$

Agrawal 2009, Santos 2010, Chen 2006, ...

NP-hard

Greedy approx:

$\arg\max_{d \in R - S} \varphi(d, S \mid q)$

$\varphi(d, S \mid q) \propto p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$

$S$: diversified ranking

$R - S$: baseline ranking, $p(d \mid q)$

Page 6: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

IR diversity - Instantiations of objective function

IA-Select scheme (Agrawal 2009)

$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(z \mid d)\, p(d \mid q) \prod_{d' \in S} \bigl(1 - p(z \mid d')\, p(d' \mid q)\bigr)$

xQuAD scheme (Santos 2010)

$\varphi(d, S \mid q) = (1 - \lambda)\, p(d \mid q) + \lambda\, p(d, \neg S \mid q)$

$= (1 - \lambda)\, p(d \mid q) + \lambda \sum_z p(z \mid q)\, p(d \mid q, z) \prod_{d' \in S} \bigl(1 - p(d' \mid q, z)\bigr)$

Explicit query aspects ($z$)

State of the art aspect-based approaches
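
The two objective functions and the greedy selection loop from the problem statement can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function names, the dictionary layout (`p_zq[z]`, `p_dq[d]`, `p_zd[d][z]`, `p_dqz[d][z]`) and all numeric values are assumptions made up for the example.

```python
from math import prod


def ia_select(d, S, p_zq, p_zd, p_dq):
    """IA-Select: sum_z p(z|q) p(z|d) p(d|q) * prod_{d' in S} (1 - p(z|d') p(d'|q))."""
    return sum(
        p_zq[z] * p_zd[d].get(z, 0.0) * p_dq[d]
        * prod(1.0 - p_zd[s].get(z, 0.0) * p_dq[s] for s in S)
        for z in p_zq
    )


def xquad(d, S, p_zq, p_dqz, p_dq, lam):
    """xQuAD: (1 - lam) p(d|q) + lam * sum_z p(z|q) p(d|q,z) * prod_{d' in S} (1 - p(d'|q,z))."""
    novelty = sum(
        p_zq[z] * p_dqz[d].get(z, 0.0)
        * prod(1.0 - p_dqz[s].get(z, 0.0) for s in S)
        for z in p_zq
    )
    return (1.0 - lam) * p_dq[d] + lam * novelty


def greedy_diversify(candidates, objective, n):
    """Greedy approximation: repeatedly move arg max_{d in R - S} objective(d, S) into S."""
    S, rest = [], list(candidates)
    while rest and len(S) < n:
        best = max(rest, key=lambda d: objective(d, S))
        S.append(best)
        rest.remove(best)
    return S


# Toy inputs (hypothetical values, for illustration only).
p_dq = {"d1": 0.5, "d2": 0.3, "d3": 0.2}                    # baseline p(d|q)
p_zq = {"a": 0.5, "b": 0.5}                                  # aspect prior p(z|q)
p_dqz = {"d1": {"a": 0.5}, "d2": {"a": 0.4}, "d3": {"b": 0.3}}
p_zd = {"d1": {"a": 1.0}, "d2": {"a": 1.0}, "d3": {"b": 1.0}}

baseline = greedy_diversify(p_dq, lambda d, S: xquad(d, S, p_zq, p_dqz, p_dq, 0.0), 3)
by_xquad = greedy_diversify(p_dq, lambda d, S: xquad(d, S, p_zq, p_dqz, p_dq, 1.0), 3)
by_ia_select = greedy_diversify(p_dq, lambda d, S: ia_select(d, S, p_zq, p_zd, p_dq), 3)
```

With these toy numbers, setting λ = 0 reproduces the baseline order (d1, d2, d3), while both diversifiers promote d3, the only document covering aspect "b", to second position.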

Page 7: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

IR diversity - Instantiations of objective function

IA-Select scheme (Agrawal 2009)

$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(z \mid d)\, p(d \mid q) \prod_{d' \in S} \bigl(1 - p(z \mid d')\, p(d' \mid q)\bigr)$

xQuAD scheme (Santos 2010)

$\varphi(d, S \mid q) = (1 - \lambda)\, p(d \mid q) + \lambda\, p(d, \neg S \mid q)$

$= (1 - \lambda)\, p(d \mid q) + \lambda \sum_z p(z \mid q)\, p(d \mid q, z) \prod_{d' \in S} \bigl(1 - p(d' \mid q, z)\bigr)$

Query aspect coverage

State of the art aspect-based approaches

Page 8: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

IR diversity - Instantiations of objective function

IA-Select scheme (Agrawal 2009)

$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(z \mid d)\, p(d \mid q) \prod_{d' \in S} \bigl(1 - p(z \mid d')\, p(d' \mid q)\bigr)$

xQuAD scheme (Santos 2010)

$\varphi(d, S \mid q) = (1 - \lambda)\, p(d \mid q) + \lambda\, p(d, \neg S \mid q)$

$= (1 - \lambda)\, p(d \mid q) + \lambda \sum_z p(z \mid q)\, p(d \mid q, z) \prod_{d' \in S} \bigl(1 - p(d' \mid q, z)\bigr)$

Document "relevance" for query aspect

State of the art aspect-based approaches

Page 9: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

IR diversity - Instantiations of objective function

Redundancy penalization

State of the art aspect-based approaches

IA-Select scheme (Agrawal 2009)

$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(z \mid d)\, p(d \mid q) \prod_{d' \in S} \bigl(1 - p(z \mid d')\, p(d' \mid q)\bigr)$

xQuAD scheme (Santos 2010)

$\varphi(d, S \mid q) = (1 - \lambda)\, p(d \mid q) + \lambda\, p(d, \neg S \mid q)$

$= (1 - \lambda)\, p(d \mid q) + \lambda \sum_z p(z \mid q)\, p(d \mid q, z) \prod_{d' \in S} \bigl(1 - p(d' \mid q, z)\bigr)$

Page 10: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

IR diversity - Instantiations of objective function

IA-Select scheme (Agrawal 2009)

$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(z \mid d)\, p(d \mid q) \prod_{d' \in S} \bigl(1 - p(z \mid d')\, p(d' \mid q)\bigr)$

xQuAD scheme (Santos 2010)

$\varphi(d, S \mid q) = (1 - \lambda)\, p(d \mid q) + \lambda\, p(d, \neg S \mid q)$

$= (1 - \lambda)\, p(d \mid q) + \lambda \sum_z p(z \mid q)\, p(d \mid q, z) \prod_{d' \in S} \bigl(1 - p(d' \mid q, z)\bigr)$

Mixture with baseline

State of the art aspect-based approaches

$\lambda$: degree of diversification

Page 11: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

IR diversity - Instantiations of objective function

IA-Select scheme (Agrawal 2009)

$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(z \mid d)\, p(d \mid q) \prod_{d' \in S} \bigl(1 - p(z \mid d')\, p(d' \mid q)\bigr)$

xQuAD scheme (Santos 2010)

$\varphi(d, S \mid q) = (1 - \lambda)\, p(d \mid q) + \lambda\, p(d, \neg S \mid q)$

$= (1 - \lambda)\, p(d \mid q) + \lambda \sum_z p(z \mid q)\, p(d \mid q, z) \prod_{d' \in S} \bigl(1 - p(d' \mid q, z)\bigr)$

Probability to observe documents

$\varphi(d, S \mid q) \propto p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$

Page 12: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

IR diversity - Relevance-based instantiation of objective function

IA-Select scheme - relevance-based

$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$

xQuAD scheme - relevance-based

$\varphi(d, S \mid q) = (1 - \lambda)\, p(r_d \mid q) + \lambda\, p(r_d, \neg r_S \mid q)$

$= (1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$

Probability of relevance

Our proposal

$\varphi(d, S \mid q) \propto p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$

Page 13: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

IR diversity - Relevance-based instantiation of objective function

IA-Select scheme - relevance-based

$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$

xQuAD scheme - relevance-based

$\varphi(d, S \mid q) = (1 - \lambda)\, p(r_d \mid q) + \lambda\, p(r_d, \neg r_S \mid q)$

$= (1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$

More literal interpretation of initial problem statement

$\varphi(d, S \mid q) \propto p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$

Page 14: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

IR diversity - Relevance-based instantiation of objective function

IA-Select scheme - relevance-based

$\varphi(d, S \mid q) = \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$

xQuAD scheme - relevance-based

$\varphi(d, S \mid q) = (1 - \lambda)\, p(r_d \mid q) + \lambda\, p(r_d, \neg r_S \mid q)$

$= (1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$

Equivalent for $\lambda = 1$

$\varphi(d, S \mid q) \propto p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$

Page 15: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Relevance distribution vs. document distribution

$p(r \mid d, \cdot)$ vs. $p(d \mid \cdot)$: the difference does matter (in this context)

$\sum_d p(r \mid d, q, z) = \mathrm{E}[\text{nr. relevant docs}] \geq 1$

$\sum_d p(d \mid q, z) = 1$

[Figure: values over documents $d$ on a 0 to 1 scale]

$(1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$

Different potential behavior, e.g. stronger redundancy penalization

Potential rank equivalences do not apply here
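
A small numeric sketch may help show why the choice matters for the redundancy product. All values below are hypothetical (they are not taken from the slides): three already-selected documents are given relevance probabilities for one aspect, and the same scores are then renormalized into a distribution over an assumed pool of 100 candidates.

```python
from math import prod

# Hypothetical p(r|d',q,z) for three already-selected documents (illustrative only).
p_rel = [0.8, 0.6, 0.5]

# A document distribution p(d'|q,z) must sum to 1 over ALL candidates.
# Assumption: 100 candidates, the other 97 each carrying a raw score of 0.1.
n_candidates = 100
total = sum(p_rel) + 0.1 * (n_candidates - len(p_rel))
p_doc = [p / total for p in p_rel]

# Redundancy factor prod_{d' in S} (1 - p(. | d')) under each interpretation.
remaining_rel = prod(1.0 - p for p in p_rel)   # relevance probabilities
remaining_doc = prod(1.0 - p for p in p_doc)   # normalized document distribution

print(f"relevance-based factor: {remaining_rel:.3f}")
print(f"distribution-based factor: {remaining_doc:.3f}")
```

Under these assumptions the relevance-based factor drops to 0.04, whereas the distribution-based factor stays above 0.8, so an aspect that is already well covered is penalized far more strongly in the relevance-based form.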

Page 16: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Relevance-based greedy diversification

Relevance-based reformulation of diversification algorithm

1. Need to estimate $p(r \mid d, q, z)$

2. Does it work? Test empirically

3. Further development: parameterized tolerance to redundancy
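
As a rough sketch of the relevance-based reformulation, the code below writes both relevance-based objectives on top of the same aspect-level term, which makes the stated equivalence at λ = 1 visible. Function names, the dictionary layout (`p_rdq[d]` for p(r|d,q), `p_rdqz[d][z]` for p(r|d,q,z)) and the toy values are assumptions for illustration, not the authors' code.

```python
from math import prod


def rel_ia_select(d, S, p_zq, p_rdqz):
    """sum_z p(z|q) p(r|d,q,z) * prod_{d' in S} (1 - p(r|d',q,z))."""
    return sum(
        p_zq[z] * p_rdqz[d].get(z, 0.0)
        * prod(1.0 - p_rdqz[s].get(z, 0.0) for s in S)
        for z in p_zq
    )


def rel_xquad(d, S, p_zq, p_rdq, p_rdqz, lam):
    """(1 - lam) p(r|d,q) + lam * (relevance-based IA-Select term)."""
    return (1.0 - lam) * p_rdq[d] + lam * rel_ia_select(d, S, p_zq, p_rdqz)


def diversify(candidates, objective, n):
    """Greedy re-ranking: pick the arg max of objective(d, S) among unselected documents."""
    S, rest = [], list(candidates)
    while rest and len(S) < n:
        best = max(rest, key=lambda d: objective(d, S))
        S.append(best)
        rest.remove(best)
    return S


# Toy inputs (hypothetical values).
p_zq = {"a": 0.5, "b": 0.5}
p_rdq = {"d1": 0.9, "d2": 0.8, "d3": 0.4}
p_rdqz = {"d1": {"a": 0.9}, "d2": {"a": 0.8}, "d3": {"b": 0.4}}

ranking = diversify(p_rdq, lambda d, S: rel_xquad(d, S, p_zq, p_rdq, p_rdqz, 0.9), 3)
```

In this toy setting, d2 is highly relevant but redundant with d1 on aspect "a", so with λ = 0.9 the less relevant d3 is ranked ahead of it.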

Page 17: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Aspect-based relevance model

Estimate $p(r \mid d, q, z)$

Cannot use odds, logs, constant removal... or any other rank-preserving step (we need the specific values)

Components shown in the diagram: $p(r \mid d, q)$, $p(r \mid d, q, z)$, $p(z \mid d)$, $p(z \mid q)$, $p(d \mid q)$, $p(z)$

$p(r \mid d, q)$: positional relevance, $p(r \mid \mathrm{rank}(d, q))$

$p(d \mid q)$: normalized baseline IR system score (as in e.g. Bache 2009)

Estimate $p(z \mid d)$ or $p(z \mid q)$ depending on available observations:

• $z$ as document classes (e.g. ODP)

• $z$ as subqueries (e.g. reformulations)

Then derive the other two parameters

Page 18: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Positional relevance distribution estimate

$p(r \mid d, q) \sim p(r \mid \mathrm{rank}(d, q)) = p(r \mid k)$

[Figure: $p(r \mid k)$ on a log scale (1E-05 to 1E+00) against rank position $k$ (0 to 200). Series: pLSA, Lemur, AOL. Annotations: click log statistics, precision estimates.]

Page 19: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Relevance-based greedy diversification

Relevance-based reformulation of diversification algorithm

1. Need to estimate $p(r \mid d, q, z)$

2. Does it work? Test empirically

3. Further development: parameterized tolerance to redundancy

Page 20: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Experiments

Search diversity

Collection: ClueWeb09 category B (50M documents)

Query/subtopic set: TREC 2009/10 diversity task (100 queries)

Baseline ranking: Lemur Indri search engine (Web service)

Diversified top n: 100

Query aspect space:

a) ODP categories level 4 (~7K categories)

b) TREC subtopics (oracle for reference)

Specific parameter estimates:

$p(z \mid q)$: uniform

$p(z \mid d)$:

• ODP categories: semi-supervised text classification by Textwise

• TREC subtopics: Indri search system run on $z$ as if a query

$p(r \mid k)$:

i. P@k estimates with TREC relevance judgments (2-fold 2009/10 cross validation)

ii. Click statistics from AOL log (thus different IR system)
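
One plausible way to obtain a positional relevance estimate from relevance judgments is to count, for each rank position k, the share of training queries whose baseline ranking holds a judged-relevant document at that position. The sketch below assumes exactly that reading; the function name, data layout and toy data are hypothetical, and the slides do not specify any smoothing or the exact estimator.

```python
def positional_relevance(rankings, qrels, depth):
    """Estimate p(r|k) for k = 1..depth as the fraction of queries
    with a judged-relevant document at rank k (no smoothing)."""
    counts = [0] * depth
    for query, ranking in rankings.items():
        relevant = qrels.get(query, set())
        for k, doc in enumerate(ranking[:depth]):
            if doc in relevant:
                counts[k] += 1
    n_queries = max(len(rankings), 1)  # avoid division by zero on empty input
    return [c / n_queries for c in counts]


# Toy training fold (hypothetical): baseline rankings and relevance judgments.
rankings = {"q1": ["a", "b", "c"], "q2": ["d", "e", "f"]}
qrels = {"q1": {"a", "c"}, "q2": {"d"}}

p_r_given_k = positional_relevance(rankings, qrels, depth=3)
```

In a 2-fold setting such as the 2009/10 split mentioned above, the estimate would be computed on one fold and applied when diversifying the queries of the other.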

Page 21: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Experiments - Search diversity on TREC

[Figure: ERR-IA vs. λ for the xQuAD scheme, with $p(r \mid k)$ from qrels. Panels: ODP categories, TREC subtopics. Series: based on $p(d \mid q, z)$; based on $p(r \mid d, q, z)$.]

Page 22: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Experiments - Search diversity on TREC

Aspect space         Diversifier                  λ     α-nDCG@20   ERR-IA@20   nDCG-IA@20   S-recall@20
(baseline)           Lemur                        -     0.2587      0.1630      0.2396       0.4636
a) ODP categories    IA-Select                    -     0.2651      0.1681      0.2423       0.4483
a) ODP categories    xQuAD                        0.9   0.2675      0.1656      0.2451       0.4864
a) ODP categories    Rel-based xQuAD, i. Qrels    0.1   0.2858△▲    0.1828△▲    0.2655△▲     0.4898▲△
a) ODP categories    Rel-based xQuAD, ii. Clicks  0.4   0.2841▲△    0.1831△△    0.2605△▲     0.4830▲▽
b) TREC subtopics    IA-Select                    -     0.3541      0.2346      0.3213       0.5787
b) TREC subtopics    xQuAD                        1.0   0.3445      0.2241      0.3127       0.5704
b) TREC subtopics    Rel-based xQuAD, i. Qrels    1.0   0.3543△△    0.2349△△    0.3192▽△     0.5782▽△
b) TREC subtopics    Rel-based xQuAD, ii. Clicks  1.0   0.3512▽△    0.2320▽△    0.3166▽△     0.5748▽△

λ chosen by "informally" maximizing ERR-IA in 0.1 steps for each diversifier

Best value in bold green (on the original slide)

▲ ▼: $p < 0.05$

Page 23: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Experiments

Recommendation diversity

Adaptation of IR diversity paradigm (Vargas, Castells & Vallet SIGIR 2011):

• Queries: users

• Documents: items (movies, music artists)

• Subtopics: item features (genres, tags)

• Relevance judgments: test ratings from data split

Dataset 1: MovieLens 1M

• Collection: 6K users, 4K movies, 1M ratings

• Subtopic set: 10 movie genres

Dataset 2: Last.fm crawl

• Collection: 1K users, 175K artists, 20M playcounts

• Subtopic set: 120K social tags on artists by Last.fm users

Baseline rankings:

a) pLSA

b) Popularity-based recommendation

Diversified top n: 100

Specific parameter estimates:

$p(z \mid q)$: uniform

$p(z \mid d)$: uniform on $d$ (based on binary aspect/item association)

$p(r \mid k)$: P@k estimates with 2-fold cross-validation on test users

Page 24: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Experiments - Recommendation diversity on MovieLens and Last.fm

[Figure: ERR-IA vs. λ. Columns: MovieLens 1M, Last.fm. Rows: pLSA recommender, recommendation by item popularity. Series: based on $p(d \mid q, z)$; based on $p(r \mid d, q, z)$.]

Page 25: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Relevance-based greedy diversification

Relevance-based reformulation of diversification algorithm

1. Need to estimate $p(r \mid d, q, z)$

2. Does it work? Test empirically

3. Further development: parameterized tolerance to redundancy

Page 26: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Adjustable tolerance to redundancy

Generalization of relevance-based diversification scheme

Formally support adjustable redundancy penalization

Approach: generalize relevance to browsing model

$\varphi(d, S \mid q) = (1 - \lambda)\, p(r \mid d, q) + \lambda\, p(r_d, \neg \mathrm{stop}_S \mid q) = \cdots$

$= (1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\, p(\mathrm{stop} \mid r)\bigr)$

Adjustable redundancy tolerance parameter $p(\mathrm{stop} \mid r) \in [0, 1]$

- High $p(\mathrm{stop} \mid r)$ for aggressive penalization, low for e.g. high-recall searches

- In this view, original formulations would implicitly assume $p(\mathrm{stop} \mid r) = 1$, i.e. a single relevant document is sought

Tolerance to redundancy
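
The effect of the stop probability can be illustrated with a short sketch of the generalized objective. The function name, argument layout and numbers are assumptions for the example only; `p_stop` stands for $p(\mathrm{stop} \mid r)$.

```python
from math import prod


def xquad_stop(d, S, p_zq, p_rdq, p_rdqz, lam, p_stop):
    """(1 - lam) p(r|d,q)
    + lam * sum_z p(z|q) p(r|d,q,z) * prod_{d' in S} (1 - p(r|d',q,z) * p_stop)."""
    novelty = sum(
        p_zq[z] * p_rdqz[d].get(z, 0.0)
        * prod(1.0 - p_rdqz[s].get(z, 0.0) * p_stop for s in S)
        for z in p_zq
    )
    return (1.0 - lam) * p_rdq[d] + lam * novelty


# Toy inputs (hypothetical values): d2 is redundant with the already selected d1.
p_zq = {"a": 0.5, "b": 0.5}
p_rdq = {"d1": 0.9, "d2": 0.8}
p_rdqz = {"d1": {"a": 0.9}, "d2": {"a": 0.8}}

# Score of d2 given S = [d1], with lam = 1, for three stop probabilities.
scores = {
    p_stop: xquad_stop("d2", ["d1"], p_zq, p_rdq, p_rdqz, 1.0, p_stop)
    for p_stop in (0.0, 0.5, 1.0)
}
```

With `p_stop = 1.0` the function reduces to the relevance-based xQuAD form (score 0.04 for the redundant document), while `p_stop = 0.0` removes the redundancy penalty altogether (score 0.4); intermediate values interpolate between the two.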

Page 27: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Adjustable tolerance to redundancy

Empirical observation: $p(\mathrm{stop} \mid r)$ vs. λ in α-nDCG

[Figure: two panels with both axes ranging from 0 to 1, one axis being $p(\mathrm{stop} \mid r)$. Panels: search task (Lemur on TREC / subtopics); recommendation task (pLSA on MovieLens / genres). Legend: for each column, best and worst α-nDCG value.]

Page 28: SIGIR 2012 - Explicit Relevance Models in Intent-Oriented  Information Retrieval Diversification

Conclusion

Alternative, relevance-based formulation of greedy aspect-based diversification

- Unifies two previous aspect-based algorithms

- More literal expression of formal problem statement (and metrics?)

$p(r \mid d, q, z)$ vs. $p(d \mid q, z)$

- Literal value estimates needed (rather than rank-equivalent approximations)

- Estimate based on positional relevance (relevance or click data needed)

Seems to perform well empirically

- Light requirements on relevance or click data for training positional relevance

- Improvement trend, but needs to be tested under further optimizations

Formal support for redundancy tolerance adjustment

