IRGIR Group @ UAM
Explicit Relevance Models in Intent-Aware IR Diversification 35th ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012)
Portland, OR, 13 August 2012
http://ir.ii.uam.es
Explicit Relevance Models in Intent-Aware IR Diversification
35th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR 2012)
Saúl Vargas, Pablo Castells and David Vallet, Universidad Autónoma de Madrid
http://ir.ii.uam.es
Portland, OR, 13 August 2012
Outline
Context: IR diversification formulation and algorithms
Proposed approach: relevance-based reformulation of diversification algorithms
Experiments
Adjustable tolerance to redundancy
Conclusion
IR diversity: brief recap
Appliance
Golf
Chemical element
Nutrition / Health
Mining / Metallurgy
Diversity as a means to address uncertainty in user queries
- The same query may have different intents or aspects in the underlying information need
Revision of document relevance independence
- The marginal utility of additional relevant documents decreases fast
Trade diminishing marginal utility for increased intent coverage
- Thus maximize the number of users who obtain at least some useful document
IR diversification: problem statement
Given a query $q$ on a collection, find $S \subset R$ of given size maximizing:
$p(\text{some } d \in S \text{ relevant} \mid q)$
This problem is NP-hard.
Greedy approximation (Agrawal 2009, Santos 2010, Chen 2006, ...):
$\arg\max_{d \in R \setminus S} f(d, S \mid q)$
$f(d, S \mid q) \approx p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$
$S$: diversified ranking
$R$: baseline ranking, scored by $p(d \mid q)$
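The greedy approximation can be sketched as a simple selection loop. This is a minimal illustration under stated assumptions, not the authors' implementation: `greedy_diversify`, the `objective` callback, and the toy scores and topics are hypothetical names and values introduced here.

```python
from typing import Callable, Hashable, List, Sequence

Doc = Hashable


def greedy_diversify(
    candidates: Sequence[Doc],
    objective: Callable[[Doc, List[Doc]], float],
    size: int,
) -> List[Doc]:
    """Greedily build a ranking S by repeatedly adding the candidate
    that maximizes objective(d, S) among those not yet selected."""
    remaining = list(candidates)
    selected: List[Doc] = []
    while remaining and len(selected) < size:
        # max() keeps the first maximum, so ties follow the baseline order.
        best = max(remaining, key=lambda d: objective(d, selected))
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy objective (an assumption for illustration only): the baseline score,
# halved when the document shares a topic with something already selected.
scores = {"d1": 0.9, "d2": 0.8, "d3": 0.5}
topics = {"d1": "golf", "d2": "golf", "d3": "metal"}


def toy_objective(d: str, selected: List[str]) -> float:
    seen = {topics[s] for s in selected}
    return scores[d] * (0.5 if topics[d] in seen else 1.0)


ranking = greedy_diversify(["d1", "d2", "d3"], toy_objective, size=3)
```

In this toy run the second pick is the lower-scored document on a new topic, because the redundant one is penalized. Any of the objective functions discussed in the talk could be plugged in as `objective`.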
IR diversity: instantiations of objective function
State of the art aspect-based approaches
IA-Select scheme (Agrawal 2009)
$f(d, S \mid q) = \sum_z p(z \mid q)\, p(z \mid d)\, p(d \mid q) \prod_{d' \in S} \bigl(1 - p(z \mid d')\, p(d' \mid q)\bigr)$
xQuAD scheme (Santos 2010)
$f(d, S \mid q) = (1 - \lambda)\, p(d \mid q) + \lambda\, p(d, \neg S \mid q)$
$= (1 - \lambda)\, p(d \mid q) + \lambda \sum_z p(z \mid q)\, p(d \mid q, z) \prod_{d' \in S} \bigl(1 - p(d' \mid q, z)\bigr)$
Components of both schemes:
- Explicit query aspects: $z$
- Query aspect coverage: $p(z \mid q)$
- Document "relevance" for query aspect: $p(z \mid d)\, p(d \mid q)$ in IA-Select, $p(d \mid q, z)$ in xQuAD
- Redundancy penalization: the product over $d' \in S$
- Mixture with baseline: $\lambda$ sets the degree of diversification
- Probability to observe documents: both schemes build on document distributions $p(d \mid \cdot)$
Compare with the problem statement, which is expressed in terms of relevance:
$f(d, S \mid q) \approx p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$
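To make the shared structure of the two schemes concrete, here is a hedged Python sketch. The function names, dictionary layouts, and toy probabilities are assumptions made for illustration; the IA-Select variant scores a document for an aspect as $p(z \mid d)\, p(d \mid q)$, which is one reading of the slide formula rather than a confirmed implementation detail.

```python
from math import prod
from typing import Dict, List, Tuple

AspectScores = Dict[Tuple[str, str], float]  # (doc, aspect) -> score


def diversity_term(d: str, selected: List[str],
                   p_z_q: Dict[str, float], score: AspectScores) -> float:
    """sum_z p(z|q) * score(d, z) * prod_{d' in S} (1 - score(d', z))."""
    return sum(
        p_z * score.get((d, z), 0.0)
        * prod(1.0 - score.get((s, z), 0.0) for s in selected)
        for z, p_z in p_z_q.items()
    )


def xquad(d, selected, p_d_q, p_z_q, p_d_qz, lam):
    """(1 - lambda) * p(d|q) + lambda * diversity term over p(d|q,z)."""
    return ((1.0 - lam) * p_d_q[d]
            + lam * diversity_term(d, selected, p_z_q, p_d_qz))


def ia_select(d, selected, p_d_q, p_z_q, p_z_d):
    """Diversity term only, scoring (d, z) as p(z|d) * p(d|q)."""
    score = {(doc, z): p * p_d_q[doc] for (doc, z), p in p_z_d.items()}
    return diversity_term(d, selected, p_z_q, score)


# Toy values (assumptions for illustration only).
p_z_q = {"golf": 0.5, "metal": 0.5}
p_d_q = {"d1": 0.5, "d2": 0.3, "d3": 0.2}
p_d_qz = {("d1", "golf"): 0.6, ("d2", "golf"): 0.4, ("d3", "metal"): 1.0}
p_z_d = {("d1", "golf"): 1.0, ("d2", "golf"): 1.0, ("d3", "metal"): 1.0}

x_d2 = xquad("d2", ["d1"], p_d_q, p_z_q, p_d_qz, lam=0.5)
x_d3 = xquad("d3", ["d1"], p_d_q, p_z_q, p_d_qz, lam=0.5)
ia_d2 = ia_select("d2", ["d1"], p_d_q, p_z_q, p_z_d)
ia_d3 = ia_select("d3", ["d1"], p_d_q, p_z_q, p_z_d)
```

With `d1` already selected, both objectives rank `d3` (a new aspect) above `d2` (the aspect already covered), even though `d2` has the higher baseline score.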
IR diversity: relevance-based instantiation of objective function
Our proposal: probability of relevance
IA-Select scheme, relevance-based
$f(d, S \mid q) = \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$
xQuAD scheme, relevance-based
$f(d, S \mid q) = (1 - \lambda)\, p(r_d \mid q) + \lambda\, p(r_d, \neg r_S \mid q)$
$= (1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$
- More literal interpretation of the initial problem statement:
$f(d, S \mid q) \approx p(d \text{ is relevant} \wedge \text{no } d' \in S \text{ is relevant} \mid q)$
- The two schemes are equivalent for $\lambda = 1$
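The equivalence of the two relevance-based schemes at $\lambda = 1$ can be checked with a small sketch. The function names and the toy relevance probabilities below are assumptions for illustration; how $p(r \mid d, q, z)$ is actually estimated is a separate question.

```python
from math import prod
from typing import Dict, List, Tuple

RelModel = Dict[Tuple[str, str], float]  # (doc, aspect) -> p(r|d,q,z)


def rel_ia_select(d: str, selected: List[str],
                  p_z_q: Dict[str, float], p_r_dqz: RelModel) -> float:
    """sum_z p(z|q) p(r|d,q,z) prod_{d' in S} (1 - p(r|d',q,z))."""
    return sum(
        p_z * p_r_dqz.get((d, z), 0.0)
        * prod(1.0 - p_r_dqz.get((s, z), 0.0) for s in selected)
        for z, p_z in p_z_q.items()
    )


def rel_xquad(d: str, selected: List[str], p_r_dq: Dict[str, float],
              p_z_q: Dict[str, float], p_r_dqz: RelModel,
              lam: float) -> float:
    """(1 - lambda) p(r|d,q) + lambda * relevance-based diversity term."""
    return ((1.0 - lam) * p_r_dq[d]
            + lam * rel_ia_select(d, selected, p_z_q, p_r_dqz))


# Toy values (assumptions for illustration only).
p_z_q = {"golf": 0.7, "metal": 0.3}
p_r_dqz = {("d1", "golf"): 0.9, ("d2", "golf"): 0.8, ("d3", "metal"): 0.6}
p_r_dq = {"d1": 0.63, "d2": 0.56, "d3": 0.18}

ia_d2 = rel_ia_select("d2", ["d1"], p_z_q, p_r_dqz)
ia_d3 = rel_ia_select("d3", ["d1"], p_z_q, p_r_dqz)
xq_d2 = rel_xquad("d2", ["d1"], p_r_dq, p_z_q, p_r_dqz, lam=1.0)
xq_d3 = rel_xquad("d3", ["d1"], p_r_dq, p_z_q, p_r_dqz, lam=1.0)
```

At `lam=1.0` the baseline term vanishes, so `rel_xquad` returns exactly the `rel_ia_select` value for every candidate.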
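Relevance distribution vs. document distribution
$f(d, S \mid q) = (1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, q, z) \prod_{d' \in S} \bigl(1 - p(r \mid d', q, z)\bigr)$
Relevance distribution: $\sum_d p(r \mid d, q, z) = E[\text{nr. relevant docs}] \geq 1$, with each value in $[0, 1]$
Document distribution: $\sum_d p(d \mid q, z) = 1$
Different potential behavior, e.g. stronger redundancy penalization
$p(r \mid d, \cdot)$ vs. $p(d \mid \cdot)$: the difference does matter (in this context)
Potential rank equivalences do not apply here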
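A small numeric sketch shows why the two kinds of distribution can behave differently in the redundancy product. All values below are toy assumptions: 100 documents with reciprocal-rank raw scores, normalized to sum to 1 for the document distribution, and scaled into $[0, 1]$ without normalization for the relevance probabilities.

```python
from math import prod
from typing import Sequence


def remaining_weight(selected_scores: Sequence[float]) -> float:
    """prod_{d' in S} (1 - score(d', z)): how much of aspect z still
    counts as uncovered after the documents in S have been selected."""
    return prod(1.0 - s for s in selected_scores)


# Toy setting (assumed values): 100 documents retrieved for aspect z.
raw_scores = [1.0 / (rank + 1) for rank in range(100)]

# Document distribution: normalized so the scores sum to 1 over documents.
total = sum(raw_scores)
p_d = [s / total for s in raw_scores]

# Relevance probabilities: each in [0, 1], not normalized across documents.
p_r = [0.9 * s for s in raw_scores]

# Remaining weight of the aspect after selecting the top three documents.
top3_doc = remaining_weight(p_d[:3])
top3_rel = remaining_weight(p_r[:3])
```

In this toy setting the document distribution leaves most of the aspect weight in place after three selections, while the relevance probabilities discount it far more strongly, which is the "stronger redundancy penalization" behavior mentioned above.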
Relevance-based greedy diversification
Relevance-based reformulation of diversification algorithm
1. Need to estimate $p(r \mid d, q, z)$
2. Does it work? Test empirically
3. Further development: parameterized tolerance to redundancy
Aspect-based relevance model
Estimate $p(r \mid d, q, z)$
Cannot use odds, logs, constant removal... or any other rank-preserving step (we need the specific values)
Components: $p(r \mid d, q)$, $p(r \mid d, q, z)$, $p(z \mid d)$, $p(z \mid q)$, $p(d \mid q)$, $p(z)$
- Normalized baseline IR system score (as in e.g. Bache 2009)
- Positional relevance: $p(r \mid \text{rank}(d, q))$
Estimate $p(z \mid d)$ or $p(z \mid q)$ depending on available observations:
• $z$ as document classes (e.g. ODP)
• $z$ as subqueries (e.g. reformulations)
Then derive the other two parameters
Positional relevance distribution estimate
$p(r \mid d, q) \sim p(r \mid \text{rank}(d, q)) = p(r \mid k)$
[Figure: $p(r \mid k)$ against rank $k$ (0 to 200), log scale from 1E-05 to 1E+00. Series: pLSA and Lemur (precision estimates), AOL (click log statistics).]
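One simple way to obtain such a positional estimate from relevance judgments is sketched below. This is an assumption-laden illustration, not the procedure used in the talk: it takes $p(r \mid k)$ to be the fraction of training queries whose document at rank $k$ is judged relevant, with no smoothing, and all names and toy data are hypothetical.

```python
from typing import Dict, List, Sequence, Set


def estimate_positional_relevance(
    rankings: Dict[str, Sequence[str]],
    relevant: Dict[str, Set[str]],
    depth: int,
) -> List[float]:
    """Estimate p(r|k) for k = 1..depth as the fraction of training
    queries whose document at rank k is judged relevant."""
    hits = [0] * depth
    counts = [0] * depth
    for query, ranking in rankings.items():
        for k, doc in enumerate(ranking[:depth]):
            counts[k] += 1
            if doc in relevant.get(query, set()):
                hits[k] += 1
    # Positions never observed in training get probability 0.0.
    return [h / c if c else 0.0 for h, c in zip(hits, counts)]


def p_rel(doc: str, ranking: Sequence[str], p_r_k: List[float]) -> float:
    """p(r|d,q) ~ p(r|rank(d,q)); 0.0 beyond the estimated depth."""
    k = ranking.index(doc)
    return p_r_k[k] if k < len(p_r_k) else 0.0


# Toy training data (assumptions for illustration only).
rankings = {"q1": ["a", "b", "c"], "q2": ["d", "e", "f"]}
relevant = {"q1": {"a", "c"}, "q2": {"d"}}

p_r_k = estimate_positional_relevance(rankings, relevant, depth=3)
p_c = p_rel("c", rankings["q1"], p_r_k)
```

Because the estimate depends only on rank position, it can be trained on one set of queries and applied to another, or replaced by click-through rates per position when judgments are not available.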
Experiments: search diversity
Collection: ClueWeb09 category B (50M documents)
Query/subtopic set: TREC 2009/10 diversity task (100 queries)
Baseline ranking: Lemur Indri search engine (Web service)
Diversified top n: 100
Query aspect space:
a) ODP categories level 4 (~7K categories)
b) TREC subtopics (oracle for reference)
Specific parameter estimates:
$p(z \mid q)$: uniform
$p(z \mid d)$:
- ODP categories: semi-supervised text classification by Textwise
- TREC subtopics: Indri search system run on $z$ as if a query
$p(r \mid k)$:
i. P@k estimates with TREC relevance judgments (2-fold 2009/10 cross validation)
ii. Click statistics from AOL log (thus different IR system)
Experiments: search diversity on TREC
[Figure: ERR-IA against $\lambda$ for the xQuAD scheme, with $p(r \mid k)$ from qrels. Panels: ODP categories; TREC subtopics. Curves: based on $p(d \mid q, z)$ vs. based on $p(r \mid d, q, z)$.]
Experiments: search diversity on TREC
                              λ     α-nDCG@20   ERR-IA@20   nDCG-IA@20   S-recall@20
Lemur                         -     0.2587      0.1630      0.2396       0.4636
a) ODP categories
   IA-Select                  -     0.2651      0.1681      0.2423       0.4483
   xQuAD                      0.9   0.2675      0.1656      0.2451       0.4864
   Rel-based xQuAD i. Qrels   0.1   0.2858△▲    0.1828△▲    0.2655△▲     0.4898▲△
   Rel-based xQuAD ii. Clicks 0.4   0.2841▲△    0.1831△△    0.2605△▲     0.4830▲▽
b) TREC subtopics
   IA-Select                  -     0.3541      0.2346      0.3213       0.5787
   xQuAD                      1.0   0.3445      0.2241      0.3127       0.5704
   Rel-based xQuAD i. Qrels   1.0   0.3543△△    0.2349△△    0.3192▽△     0.5782▽△
   Rel-based xQuAD ii. Clicks 1.0   0.3512▽△    0.2320▽△    0.3166▽△     0.5748▽△
λ: value "informally" maximizing ERR-IA, in 0.1 steps, for each diversifier
△ / ▽: higher / lower than IA-Select (first marker) and xQuAD (second marker) in the same block; filled ▲ ▼: p < 0.05
(Best values were highlighted in bold green on the slide.)
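For readers unfamiliar with the main metric, the following sketch computes an intent-aware expected reciprocal rank. It is an illustration under assumptions, not the official TREC evaluation tool: relevance is binary per subtopic, a relevant document satisfies the user with probability `r_relevant` (0.5 here, a hypothetical default), and no further normalization is applied. All names and toy data are introduced here.

```python
from typing import Dict, Sequence, Set


def err(rels: Sequence[float], cutoff: int) -> float:
    """Expected reciprocal rank: sum_k (1/k) * R_k * prod_{j<k} (1 - R_j),
    where R_k is the probability that rank k satisfies the user."""
    score, not_satisfied = 0.0, 1.0
    for k, r in enumerate(rels[:cutoff], start=1):
        score += not_satisfied * r / k
        not_satisfied *= 1.0 - r
    return score


def err_ia(ranking: Sequence[str], subtopic_rels: Dict[str, Set[str]],
           p_z_q: Dict[str, float], cutoff: int,
           r_relevant: float = 0.5) -> float:
    """Intent-aware ERR: the p(z|q)-weighted average of per-subtopic ERR."""
    return sum(
        p_z_q[z] * err([r_relevant if d in docs else 0.0 for d in ranking],
                       cutoff)
        for z, docs in subtopic_rels.items()
    )


# Toy judgments (assumptions for illustration only).
subtopic_rels = {"golf": {"d1", "d2"}, "metal": {"d3"}}
p_z_q = {"golf": 0.5, "metal": 0.5}

baseline = err_ia(["d1", "d2", "d3"], subtopic_rels, p_z_q, cutoff=20)
diversified = err_ia(["d1", "d3", "d2"], subtopic_rels, p_z_q, cutoff=20)
```

In this toy case the ranking that covers both subtopics earlier obtains the higher score, which is the behavior intent-aware metrics are designed to reward.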
Experiments: recommendation diversity
Adaptation of IR diversity paradigm (Vargas, Castells & Vallet, SIGIR 2011):
- Queries: users
- Documents: items (movies, music artists)
- Subtopics: item features (genres, tags)
- Relevance judgments: test ratings from data split
Dataset 1: MovieLens 1M
- Collection: 6K users, 4K movies, 1M ratings
- Subtopic set: 10 movie genres
Dataset 2: Last.fm crawl
- Collection: 1K users, 175K artists, 20M playcounts
- Subtopic set: 120K social tags on artists by Last.fm users
Baseline rankings:
a) pLSA
b) Popularity-based recommendation
Diversified top n: 100
Specific parameter estimates:
$p(z \mid q)$: uniform
$p(z \mid d)$: uniform on $d$ (based on binary aspect/item association)
$p(r \mid k)$: P@k estimates with 2-fold cross-validation on test users
Experiments: recommendation diversity on MovieLens and Last.fm
[Figure: ERR-IA against $\lambda$. Columns: MovieLens 1M; Last.fm. Rows: pLSA recommender; recommendation by item popularity. Curves: based on $p(d \mid q, z)$ vs. based on $p(r \mid d, q, z)$.]
Adjustable tolerance to redundancy
Generalization of relevance-based diversification scheme
Formally support adjustable redundancy penalization
Approach: generalize relevance to browsing model
$f(d, S \mid q) = (1 - \lambda)\, p(r \mid d, q) + \lambda\, p(r_d, \neg \text{stop}_S \mid q)$
$= (1 - \lambda)\, p(r \mid d, q) + \lambda \sum_z p(z \mid q)\, p(r \mid d, z, q) \prod_{d' \in S} \bigl(1 - p(r \mid d', z, q)\, p(\text{stop} \mid r)\bigr)$
Tolerance to redundancy: adjustable parameter $p(\text{stop} \mid r) \in [0, 1]$
- High $p(\text{stop} \mid r)$ for aggressive penalization, low for e.g. high-recall searches
- In this view, the original formulations would implicitly assume $p(\text{stop} \mid r) = 1$, i.e. a single relevant document is sought
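The effect of the tolerance parameter can be illustrated with a short sketch of the generalized objective. The function name, argument layout, and toy probabilities are assumptions introduced for illustration; only the formula itself comes from the slide.

```python
from math import prod
from typing import Dict, List, Tuple

RelModel = Dict[Tuple[str, str], float]  # (doc, aspect) -> p(r|d,z,q)


def rel_xquad_tolerant(d: str, selected: List[str],
                       p_r_dq: Dict[str, float], p_z_q: Dict[str, float],
                       p_r_dqz: RelModel, lam: float,
                       p_stop: float) -> float:
    """(1 - lam) p(r|d,q) + lam * sum_z p(z|q) p(r|d,z,q)
    * prod_{d' in S} (1 - p(r|d',z,q) * p(stop|r))."""
    diversity = sum(
        p_z * p_r_dqz.get((d, z), 0.0)
        * prod(1.0 - p_r_dqz.get((s, z), 0.0) * p_stop for s in selected)
        for z, p_z in p_z_q.items()
    )
    return (1.0 - lam) * p_r_dq[d] + lam * diversity


# Toy values (assumptions for illustration only).
p_z_q = {"golf": 0.7, "metal": 0.3}
p_r_dqz = {("d1", "golf"): 0.9, ("d2", "golf"): 0.8, ("d3", "metal"): 0.6}
p_r_dq = {"d1": 0.63, "d2": 0.56, "d3": 0.18}


def score(d: str, p_stop: float) -> float:
    # Score a candidate after d1 has been selected, with lambda = 1.
    return rel_xquad_tolerant(d, ["d1"], p_r_dq, p_z_q, p_r_dqz,
                              lam=1.0, p_stop=p_stop)


strict = {d: score(d, p_stop=1.0) for d in ("d2", "d3")}
tolerant = {d: score(d, p_stop=0.0) for d in ("d2", "d3")}
halfway = score("d2", p_stop=0.5)
```

With `p_stop=1.0` the redundant document `d2` drops below `d3`, reproducing the original relevance-based scheme; with `p_stop=0.0` no penalization is applied and `d2` stays ahead.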
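Adjustable tolerance to redundancy
Empirical observation: $p(\text{stop} \mid r)$ vs. $\lambda$ in $\alpha$-nDCG
[Figure: two panels showing $\alpha$-nDCG over $\lambda$ and $p(\text{stop} \mid r)$, each ranging from 0 to 1. Panels: search task (Lemur on TREC / subtopics); recommendation task (pLSA on MovieLens / genres). For each column, markers show the best and the worst $\alpha$-nDCG value.]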
Conclusion
Alternative, relevance-based formulation of greedy aspect-based diversification
- Unifies two previous aspect-based algorithms
- More literal expression of formal problem statement (and metrics?)
$p(r \mid d, q, z)$ vs. $p(d \mid q, z)$
- Literal value estimates needed (rather than rank-equivalent approximations)
- Estimate based on positional relevance (relevance or click data needed)
Seems to perform well empirically
- Light requirements on relevance or click data for training positional relevance
- Improvement trend, but needs to be tested under further optimizations
Formal support for redundancy tolerance adjustment