IR Group @ UAM
Recommender Systems Evaluation Beyond Accuracy
ACM Latin American School on Recommender Systems (LARS 2019)
Pablo Castells, Universidad Autónoma de Madrid
http://ir.ii.uam.es/castells
Fortaleza, Brazil, October 10, 2019
Outline
1. Motivation: beyond relevance
2. Measuring novelty and diversity
3. Enhancing novelty and diversity
4. Biases in recommendation
Motivation
What is the purpose of recommendation?
Satisfying users …by making suggestions they like
If we recommend things that a user likes,
then the user will be satisfied
Ergo…
Motivation
Do you like this?
Movie Book Tourist attraction Music album
Motivation
Would you find it useful to recommend it?
Probably not: everybody knows those already
Movie Book Tourist attraction Music album
Motivation
Would you find it useful to recommend this?
Maybe, provided they are liked…
Movie Book Tourist attraction Music album
Motivation
Would you find it useful to recommend this?
Not obvious or widely known
…but too much of the same genre?
Sci-fi Sci-fi Sci-fi Sci-fi Sci-fi
Motivation
Would you find it useful to recommend this?
Sci-fi Animation Comedy Adventure Documentary
Seems better?
Beyond accuracy…
How to improve?
Define
Understand
Measure
…then try to improve
Novelty, diversity
Definition: novelty
How different recommendations are from “something else”
E.g. user knowledge or experience
(Vargas & Castells RecSys 2011, Castells et al. Handbook 2015)
Definition: diversity
How different recommendations are from each other
How novel each item is to the other recommended items
(Vargas & Castells RecSys 2011, Castells et al. Handbook 2015)
Why diverse and novel recommendations
For the sake of it: direct user satisfaction
Natural variety-seeking drive in human behavior
– Within a recommendation and over time
– Desire for the unfamiliar, alternation among the familiar
– Ideal level of stimulation
Broaden the user’s horizon / avoid bubbles
The task is often explicitly about discovery
(Castells et al. Handbook 2015, Kaminskas & Bridge ACM TIIS 2017)
Why diverse and novel recommendations
For enhanced business performance
Sales diversity: mitigate risk, expand the business
Long tail: draw revenues from market niches
– “Sell less of more”
– Higher profit margin on cheaper long-tail products
Fairness! Give all stakeholders a fair chance
(Castells et al. Handbook 2015, Kaminskas & Bridge ACM TIIS 2017)
Why diverse and novel recommendations
For better system effectiveness (“a safer bet”)
Uncertainty about user preferences
– System observations are ambiguous, very incomplete
– User preferences are multiple, dynamic, contextual…
Increase chances of at least some relevant item
(Castells et al. Handbook 2015, Kaminskas & Bridge ACM TIIS 2017)
Outline
1. Motivation: beyond relevance
2. Measuring novelty and diversity
3. Enhancing novelty and diversity
4. Biases in recommendation
Measuring accuracy
The “rating” matrix: abstraction of user-item interaction
[Figure: users × items rating matrix; some cells hold ratings between 1 and 5, most cells are empty, and a few are marked “?” as values to predict]
Rating matrix with some available cell values, most cells empty
Rank items by predicting missing ratings
Evaluation: see if predictions match reality
Offline evaluation: just hide a few cell values and use them as test
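The offline protocol on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the setup used in the lecture: the function names, the 30% test fraction, the toy rating triples and the choice of precision at k as the accuracy metric are all assumptions.

```python
import random

def split_ratings(ratings, test_fraction=0.2, seed=0):
    """Hide a random fraction of the known ratings and keep them as test data."""
    rng = random.Random(seed)
    shuffled = list(ratings)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (train, test)

def precision_at_k(ranking, relevant, k):
    """Fraction of the top-k recommended items that are relevant in the test data."""
    return sum(1 for item in ranking[:k] if item in relevant) / k

# Toy (user, item, rating) triples, invented for illustration.
ratings = [("u1", "a", 4), ("u1", "b", 2), ("u1", "c", 5),
           ("u2", "a", 4), ("u2", "d", 1), ("u2", "e", 4),
           ("u3", "b", 3), ("u3", "c", 5), ("u3", "e", 2), ("u3", "f", 4)]
train, test = split_ratings(ratings, test_fraction=0.3)
```

A recommender would be trained on `train` only, and each user's ranking would then be scored against that user's hidden ratings in `test`.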
Measuring diversity and novelty
Different metrics for different notions
Common notions
– Unpopularity
– Unexpectedness
– Serendipity
– Intra-list dissimilarity
– Sales diversity
– Aspect-based diversity
Many other, more particular metrics
[Figure: a recommendation 𝑅 = 𝑖1, 𝑖2, 𝑖3, 𝑖4, 𝑖5, … delivered to user 𝑢, annotated with “Diversity” and “Novelty”]
Measuring novelty
Novelty: never seen vs. not familiar
I have not seen this movie
But I have seen these movies…
(Vargas & Castells RecSys 2011, Castells et al. Handbook 2015)
Measuring novelty – never seen
Long-tail novelty
What is the chance the user has never seen the items
How “not popular” are the recommended items
E.g. mean self-information
$$\mathrm{MSI} = -\frac{1}{|R|} \sum_{i \in R} \log_2 p_i$$
Popularity of 𝑖:
$$p_i = \frac{\#\,\text{users who have interacted with } i}{\text{total}\ \#\,\text{users}}$$
[Figure: popularity 𝑝𝑖 plotted over items 𝑖; the short head is not novel, the long tail is novel]
(Zhou et al. PNAS 2010, Vargas & Castells RecSys 2011, Zhang et al. WSDM 2012, etc.)
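As a worked example of mean self-information, the sketch below computes 𝑝𝑖 from a list of (user, item) interactions and then MSI for a recommendation. The function names and the toy data are assumptions made for illustration; an item nobody has interacted with has 𝑝𝑖 = 0, for which the logarithm is undefined, so the sketch assumes every recommended item has at least one interaction.

```python
import math

def item_popularity(interactions, n_users):
    """p_i = (#users who have interacted with i) / (total #users)."""
    users_per_item = {}
    for user, item in interactions:
        users_per_item.setdefault(item, set()).add(user)
    return {item: len(users) / n_users for item, users in users_per_item.items()}

def mean_self_information(recommendation, popularity):
    """MSI = -(1/|R|) * sum of log2(p_i) over the recommended items i in R."""
    return -sum(math.log2(popularity[i]) for i in recommendation) / len(recommendation)

# Toy data, invented for illustration: 4 users, one blockbuster, one niche item.
interactions = [("u1", "hit"), ("u2", "hit"), ("u3", "hit"), ("u4", "hit"),
                ("u1", "niche")]
pop = item_popularity(interactions, n_users=4)
```

Here the blockbuster contributes 0 bits (everybody has seen it) and the niche item contributes 2 bits, so a list with both has an MSI of 1 bit.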
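Measuring novelty – not familiar
Unexpectedness
User-specific
How unfamiliar the items are to the user experience
E.g. average distance to items in user profile
$$\mathrm{Unexp} = \frac{1}{|R|\,|\mathbf{u}|} \sum_{i \in R} \sum_{j \in \mathbf{u}} d(i, j)$$
𝐮 = items “rated” by 𝑢
[Figure: distances 𝑑(𝑖, 𝑗) between the items in 𝑅 and the items “rated” by 𝑢]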
(Adamopoulos & Tuzhilin ACM TIST 2014, Hurley & Zhang ACM TOIT 2011, Zhang et al. WSDM 2012, etc.)
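A minimal sketch of the unexpectedness formula follows. The slide leaves the distance 𝑑(𝑖, 𝑗) open; the Jaccard distance over item feature sets used here is only one possible choice, and the function names, item names and genres are invented for illustration.

```python
def jaccard_distance(features_i, features_j):
    """d(i, j) = 1 - |intersection| / |union| of two feature sets (assumed distance)."""
    union = features_i | features_j
    if not union:
        return 0.0
    return 1.0 - len(features_i & features_j) / len(union)

def unexpectedness(recommendation, profile, features):
    """Average distance between each recommended item and each item in the user profile."""
    total = sum(jaccard_distance(features[i], features[j])
                for i in recommendation for j in profile)
    return total / (len(recommendation) * len(profile))

# Toy item features (genres), invented for illustration.
features = {
    "alien": {"sci-fi", "horror"},
    "blade_runner": {"sci-fi"},
    "dune": {"sci-fi"},
    "up": {"animation"},
}
profile = ["alien", "blade_runner"]  # items "rated" by the user
```

With this profile, recommending `"up"` is maximally unexpected (1.0), while recommending `"dune"` is much closer to what the user already knows (0.25).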
Measuring novelty – serendipity
Serendipity: novelty + relevance
[Figure: novel items and relevant items]
E.g. compute a novelty metric counting only relevant items
(Iaquinta HIS 2008, Ge et al. RecSys 2010, Zhang et al. WSDM 2012)
Measuring diversity
Intra-list dissimilarity: average pairwise distance
$$\mathrm{ILD} = \frac{2}{|R|\,(|R| - 1)} \sum_{\substack{i, j \in R \\ i \neq j}} d(i, j)$$
$$d(i, j) = 1 - sim(i, j)$$ (based on item features)
(Smyth & McClave ICCBR 2001, Ziegler et al. WWW 2005, etc.)
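The sketch below computes intra-list dissimilarity for a small list. It reads the sum as running over unordered pairs of distinct items, so that the factor 2 / (|R| (|R| − 1)) yields the average pairwise distance the slide describes. Jaccard similarity over genre sets stands in for 𝑠𝑖𝑚(𝑖, 𝑗); that choice, the function names and the toy items are assumptions.

```python
from itertools import combinations

def jaccard_distance(features_i, features_j):
    """d(i, j) = 1 - sim(i, j), with Jaccard similarity as an assumed sim."""
    union = features_i | features_j
    if not union:
        return 0.0
    return 1.0 - len(features_i & features_j) / len(union)

def intra_list_dissimilarity(recommendation, features):
    """Average distance over all unordered pairs of distinct items in the list."""
    pairs = list(combinations(recommendation, 2))
    if not pairs:
        return 0.0  # a list with fewer than two items has no pairs
    return sum(jaccard_distance(features[i], features[j]) for i, j in pairs) / len(pairs)

# Toy genre sets, invented for illustration.
features = {
    "s1": {"sci-fi"}, "s2": {"sci-fi"}, "s3": {"sci-fi"},
    "c1": {"comedy"}, "d1": {"documentary"},
}
```

An all sci-fi list scores 0, a list with three different genres scores 1, and mixed lists fall in between.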
Aspect-based diversity
With respect to a space of user “subtastes”:
genres, categories, etc.
Inspired by intent-oriented search diversity
(Vargas et al. SIGIR 2011, Wasilewski & Hurley UMAP 2018, Kaya & Bridge UMUAI 2019, etc.)
Diversity in search
“Avoid redundancy of possible user intents (aspects) as a means to cope with the uncertainty in the query”
(Carbonell & Goldstein SIGIR 1998, Clarke et al. SIGIR 2008, Agrawal et al. WSDM 2009, Santos et al. WWW 2010, Santos et al. Found. & Trends in IR 2015)
Search result diversity
[Figure: relevant document rank against query senses / aspects (query ambiguity / incompleteness), comparing the added utility of uniform results and of diverse results]
Search diversity evaluation
Metrics
Aspect recall
$$\text{Aspect recall} = \frac{1}{|\mathcal{A}_q|}\ \#\{a \in \mathcal{A}_q \mid \exists d \in R \text{ that covers } a\}$$
Intent-aware metrics
$$\text{ERR-IA} = \sum_{d_k \in R} \frac{1}{k} \sum_{a \in \mathcal{A}_q} rel(d_k \mid a) \prod_{j < k} \bigl(1 - \alpha\, rel(d_j \mid a)\bigr)$$
[Annotations on the ERR-IA formula: ranking, relevance, diversity, novelty]
Aspects?
Query aspects: manually defined (e.g. TREC), Wikipedia disambiguation, suggested query reformulations…
Document aspects: categories, clusters…
(Clarke et al. SIGIR 2008, Agrawal et al. WSDM 2009, Chapelle et al. Inf. Ret. 2011, Santos et al. WWW 2010)
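The two metrics can be sketched as follows, term by term as written on the slide: a 1/𝑘 rank discount, the relevance of 𝑑𝑘 to each aspect, and a product that discounts aspects already covered higher in the ranking. Published formulations of ERR-IA typically also weight each aspect by its probability given the query; this sketch omits that weight because it does not appear in the formula above. Function names, the binary relevance and the toy “jaguar” query are assumptions.

```python
def aspect_recall(ranking, aspects, covers):
    """Fraction of the query aspects covered by at least one ranked document."""
    covered = {a for a in aspects if any(a in covers.get(d, ()) for d in ranking)}
    return len(covered) / len(aspects)

def err_ia(ranking, aspects, rel, alpha):
    """Sum over ranks k and aspects a of (1/k) * rel(d_k|a) * prod_{j<k} (1 - alpha * rel(d_j|a))."""
    score = 0.0
    for a in aspects:
        not_yet_satisfied = 1.0  # running product over the documents ranked above d_k
        for k, d in enumerate(ranking, start=1):
            r = rel(d, a)
            score += r * not_yet_satisfied / k
            not_yet_satisfied *= 1.0 - alpha * r
    return score

# Toy ambiguous query with two aspects; documents and coverage are invented.
aspects = ["jaguar (car)", "jaguar (animal)"]
covers = {"d1": {"jaguar (car)"}, "d2": {"jaguar (car)"}, "d3": {"jaguar (animal)"}}

def binary_rel(d, a):
    """Assumed relevance model: 1 if the document covers the aspect, else 0."""
    return 1.0 if a in covers.get(d, ()) else 0.0

uniform = ["d1", "d2"]   # both results about the car
diverse = ["d1", "d3"]   # one result per aspect
```

With `alpha=0.5`, the diverse ranking scores 1.5 against 1.25 for the uniform one, and its aspect recall is 1.0 against 0.5.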
Aspect-based diversity in recommendation
“Avoid redundancy of possible user intents (aspects) as a means to cope with the uncertainty in the observed evidence of user interests”
The same principle as in search, with the query replaced by the observed evidence of user interests
(Vargas et al. SIGIR 2011, Wasilewski & Hurley UMAP 2018, Kaya & Bridge UMUAI 2019, etc.)
Aspect-based diversity in recommendation
[Figure: users × items rating matrix; the row of user 𝑢 (e.g. ratings 4 3 2 5 2) is the user profile, and an items × item features matrix links the profile to “user aspects”]
Aspects from item features, using a “meaningful” item feature space
Derive user aspect distributions
IR diversity metrics and algorithms can now be applied
Other approaches to user interest subdivision have been considered
(Vargas et al. SIGIR 2011, Wasilewski & Hurley UMAP 2018, Kaya & Bridge UMUAI 2019, Vargas et al. OAIR 2013)
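One simple way to derive a user aspect distribution from item features is to count how often each feature occurs among the items in the user profile and normalize the counts. This is only an illustrative possibility, not necessarily the estimate used in the cited papers; the function name and the toy genres are assumptions.

```python
from collections import Counter

def user_aspect_distribution(profile, item_features):
    """Share of each feature (aspect) among the features of the items in the user profile."""
    counts = Counter()
    for item in profile:
        for feature in item_features.get(item, ()):
            counts[feature] += 1
    total = sum(counts.values())
    if total == 0:
        return {}
    return {aspect: count / total for aspect, count in counts.items()}

# Toy profile and genres, invented for illustration.
item_features = {"i1": {"sci-fi"}, "i2": {"sci-fi"}, "i3": {"comedy"}, "i4": {"sci-fi"}}
profile = ["i1", "i2", "i3", "i4"]
aspects_of_u = user_aspect_distribution(profile, item_features)
```

The resulting distribution plays the role that query aspects play in search, so intent-aware metrics can be computed per user.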
Sales diversity
Seller perspective
How spread are recommendations over the item inventory
Catalog exposure to sales
[Figure: nr. of users to whom each item is recommended, plotted over items, for Recommender A and Recommender B]
(Adomavicius & Kwon TKDE 2012, Li & Murata WI 2012, Vargas & Castells RecSys 2014, Jannach et al. UMUAI 2015, etc.)
Sales diversity
[Figure: the set of all recommendations seen as an “ecosystem”: each item in the set of all items is one “species”, and each recommendation “slot” holds one “individual of some species”]
Metrics: function over set of recommendations
Metrics adapted from ecology and other fields
Aggregate diversity
Total number of different items recommended in top 𝑛
Equivalent to “species richness”
$$\mathrm{Aggdiv} = \Bigl|\bigcup_u R_u\Bigr|$$
Gini-Simpson index
$$\mathrm{GSI} = 1 - \sum_i p_i^2$$
Entropy
$$H = -\sum_i p_i \log_2 p_i$$
Gini coefficient
$$G = \frac{1}{|\mathcal{I}| - 1} \sum_{k=1}^{|\mathcal{I}|} \bigl(2k - |\mathcal{I}| - 1\bigr)\, p_{i_k}$$
𝑝𝑖 = ratio of users to whom 𝑖 is recommended
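The four sales diversity metrics can be sketched over a dictionary that maps each user to a top-𝑛 list. Two points are assumptions of this sketch rather than statements from the slide: the 𝑝𝑖 values are normalized so that they sum to 1 over the catalog, and the Gini coefficient takes the items 𝑖𝑘 sorted by increasing 𝑝. Function names and toy data are invented for illustration.

```python
import math
from collections import Counter

def recommendation_shares(recommendations, catalog):
    """p_i: share of all recommendation slots taken by item i (assumed to sum to 1)."""
    counts = Counter(i for r in recommendations.values() for i in set(r))
    total = sum(counts.values())
    return {i: counts[i] / total for i in catalog}

def aggregate_diversity(recommendations):
    """Number of different items recommended to at least one user."""
    return len(set().union(*recommendations.values()))

def gini_simpson(shares):
    return 1.0 - sum(p * p for p in shares.values())

def entropy(shares):
    return -sum(p * math.log2(p) for p in shares.values() if p > 0)

def gini(shares):
    """Gini coefficient with items sorted by increasing share (assumption)."""
    p_sorted = sorted(shares.values())
    n = len(p_sorted)
    return sum((2 * k - n - 1) * p for k, p in enumerate(p_sorted, start=1)) / (n - 1)

# Toy recommenders over a 4-item catalog, invented for illustration.
catalog = ["a", "b", "c", "d"]
rec_even = {"u1": ["a", "b"], "u2": ["c", "d"], "u3": ["a", "b"], "u4": ["c", "d"]}
rec_skewed = {"u1": ["a", "b"], "u2": ["a", "b"], "u3": ["a", "c"], "u4": ["a", "d"]}
even = recommendation_shares(rec_even, catalog)
skewed = recommendation_shares(rec_skewed, catalog)
```

Both toy recommenders reach an aggregate diversity of 4, yet the even one has higher Gini-Simpson and entropy and a lower Gini coefficient (0 means a perfectly even spread), which shows why aggregate diversity alone can hide concentration.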
Sales diversity
Aggregate diversity: A as good as B
Gini, Gini-Simpson, Entropy: B better than A
[Figure: nr. of users to whom each item is recommended, plotted over items, for Recommender A and Recommender B; both recommend 𝑛 different items]
Item novelty model
Common underlying principle to different diversity notions: distance or identity of a recommended item with respect to a context
Context and resulting notion:
– Target user’s experience: unexpectedness
– Everyone else’s experience: long-tail novelty
– Other items in the same recommendation: intra-list diversity
– Everyone else’s recommendations: sales diversity
(Vargas & Castells RecSys 2011, Castells et al. Handbook 2015)
Relation between metrics
Pearson correlation (on MF baseline recommender), shown as a triangular matrix with one row per metric; ERR-IA and aspect recall are the aspect-based diversity metrics
nDCG
ERR-IA 0.64
Aspect recall -0.02 0.03
ILD 0.71 -0.09 0.03
Unexpectedness 0.85 0.62 -0.06 0.07
Long-tail novelty (MSI) -0.19 -0.21 -0.19 0.10 0.02
Sales diversity (IUD) 0.87 -0.23 -0.27 -0.20 0.14 0.06
(Castells et al. Handbook 2015)
Outline
1. Motivation: beyond relevance
2. Measuring novelty and diversity
3. Enhancing novelty and diversity
4. Biases in recommendation
Novelty and diversity enhancement
Diversity and accuracy viewed as opposing objectives
• Enhancing diversity (or novelty) is expected to involve some sacrifice in accuracy
• The goal is to achieve an optimal trade-off: a multiobjective optimization problem
• Results assessed by two metrics: relevance vs. diversity/novelty
[Figure: accuracy plotted against diversity]
Greedy reranking for novelty / diversity
[Figure: input data (observations) → accuracy algorithm → initial ranking 𝑅 → greedy version of target metric → diversified ranking 𝑆 → end user]
$$\phi(i \mid S, u) = (1 - \lambda)\, rel(u, i) + \lambda\, div(i \mid S, u)$$
(Ziegler et al. WWW 2005, Carbonell SIGIR 1998, Agrawal et al. WSDM 2009, Santos et al. WWW 2010, etc.)
Greedy reranking for novelty / diversity
$$\phi(i \mid S, u) = (1 - \lambda)\, rel(u, i) + \lambda\, div(i \mid S, u)$$
For instance…
IL Diversity (ILD): $$div(i \mid S, u) = \sum_{j \in S} d(i, j)$$
Unexpectedness: $$div(i \mid S, u) = \sum_{j \in \mathbf{u}} d(i, j)$$
Long-tail novelty (MSI): $$div(i \mid S, u) = -\log_2 p_i$$
(Ziegler et al. WWW 2005, Carbonell SIGIR 1998, Agrawal et al. WSDM 2009, Santos et al. WWW 2010, etc.)
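A minimal sketch of greedy reranking with the ILD-style objective: at each step, the item that maximizes 𝜙(𝑖 | 𝑆, 𝑢) is moved from the initial ranking into 𝑆. The function name, the relevance scores, the genre-based distance and the value of 𝜆 are assumptions made for illustration. The diversity term is the plain sum over 𝑆 as in the formula above, so its scale grows with the size of 𝑆; practical implementations often normalize both terms before mixing them.

```python
def greedy_rerank(candidates, rel, distance, lam, n):
    """Build S greedily: repeatedly pick the item maximizing (1 - lam) * rel + lam * div."""
    remaining = list(candidates)   # items of the initial ranking R
    selected = []                  # diversified ranking S, in selection order

    def phi(i):
        div = sum(distance(i, j) for j in selected)  # ILD-style term: sum of d(i, j), j in S
        return (1 - lam) * rel[i] + lam * div

    while remaining and len(selected) < n:
        best = max(remaining, key=phi)   # ties go to the earlier item in the initial ranking
        selected.append(best)
        remaining.remove(best)
    return selected

# Toy candidates, invented for illustration: predicted relevance and one genre per item.
rel = {"s1": 0.9, "s2": 0.8, "c1": 0.7, "d1": 0.6}
genre = {"s1": "sci-fi", "s2": "sci-fi", "c1": "comedy", "d1": "documentary"}

def genre_distance(i, j):
    """Assumed distance: 0 for items of the same genre, 1 otherwise."""
    return 0.0 if genre[i] == genre[j] else 1.0

initial = ["s1", "s2", "c1", "d1"]
```

With `lam=0` the initial order is kept; with `lam=0.5` the second sci-fi item is pushed out of the top 3 in favor of the comedy and the documentary.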
Novelty and diversity enhancement approaches
Many other specific approaches…
More sophisticated multiobjective optimization
User subprofiles, latent factors
Graph-based, clustering, portfolio theory…
Progressive transition towards the long tail
External vs. internal to algorithm
(Smyth & McClave ICCBR 2001, Ziegler et al. WWW 2005, Celma & Herrera RecSys 2008, Zhou et al. PNAS 2010, Hurley & Zhang TOIT 2011, Zhang et al. WSDM 2012, Shi et al. SIGIR 2012, Adomavicius & Kwon IEEE TKDE 2012, etc.)
Novelty and diversity enhancement
Novelty and diversity by weighted recommender ensembles
Multi-objective maximization of accuracy & novelty (MSI) & diversity (ILD) with an evolutionary algorithm
Find the Pareto frontier on tradeoffs between the 3 metrics
[Figure: accuracy (recall) against MSI and against ILD, on MovieLens and Last.fm]
(Ribeiro-Neto et al. RecSys 2012, Veloso et al. ACM TIST 2014)
Sales diversity enhancement
Recommend users to items
By taking inverse kNN neighborhoods
(Vargas & Castells RecSys 2014)
Outline
1. Motivation: beyond relevance
2. Measuring novelty and diversity
3. Enhancing novelty and diversity
4. Biases in recommendation
Think about novelty again
Would you find it useful to recommend these?
Why not?
Movie Book Tourist attraction Music album
Think about novelty again
We do not always want the same amount of novelty
(Kapoor et al. RecSys 2015, McAlister & Pessemier Cons. Res. 2010, etc.)
Think about novelty again
Degrees of popularity beyond the short head
[Images: example items with ~300K ratings, ~80K ratings, ~13K ratings, 25 ratings and 5 ratings]
What is the effect of popularity?
There is a relation between popularity and accuracy
Ratings are missing not at random (MNAR)
[Figure: users × items matrix split into popular items (short head) and rest of items (long tail); cells are either observed user-item interaction or unobserved preference]
(Marlin et al. RecSys 2010, Steck RecSys 2010, 2011, etc.)
There is a relation between popularity and accuracy
[Figure: users × items matrix split into popular items (short head) and rest of items (long tail); cells are training data, test data (relevant items) or unobserved preference]
avg P@𝑘 ∼𝑘
𝑘
Ratings are missing not at random (MNAR)
(Marlin et al. RecSys 2010, Steck RecSys 2010, 2011, etc.)
There is a relation between popularity and accuracy
[Figure: nDCG@10 (axis from 0 to 0.3) on MovieLens 1M for four recommenders: random, nr. positive ratings, user-based kNN and matrix factorization]
(Cremonesi et al. RecSys 2010, etc.)
Popularity bias in recommendation algorithms
[Figure: three panels (popularity, user-based kNN, matrix factorization) plotting # times recommended in top 10 against # positive ratings for each item]
(Jannach et al. UMUAI 2015, Cañamares & Castells SIGIR 2017)
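Can we trust our experiments?
Observed metric value: computed on available user taste observations
True metric value: computed with full knowledge of user tastes
Observed metric value ≈ true metric value?
[Figure: two users × items matrices, one with relevant, non relevant and missing ratings, the other with every cell known as relevant or non relevant]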
(Cañamares & Castells SIGIR 2018)
Get rid of the popularity bias
In the data
[Figure: test designs shown as # ratings over items (flat test, popularity strata) and over time (temporal split); legend: test data (relevant items), training data, unobserved preference]
(Bellogín et al. Inf. Ret. 2017)
Get rid of the popularity bias
In the metrics
– Stratified recall
– Off-policy evaluation
– Inverse propensity scoring
– ···
Divide the relevance of items by the probability to be discovered
$$P = \frac{1}{|R|} \sum_{i \in R} rel(i, u)\, obs(i, u) \;\rightarrow\; \frac{1}{|R|} \sum_{i \in R} \frac{rel(i, u)\, obs(i, u)}{p\bigl(obs(i, u)\bigr)}$$
Problem: how to estimate propensity
In the algorithms: unbiased learning
In the data: unbiased datasets, e.g. Yahoo! R3, CM100k
(Steck RecSys 2011, Schnabel et al. ICML 2016, Swaminathan et al. NIPS 2017, Yang et al. RecSys 2018, etc.)
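The correction above can be illustrated with precision: each observed relevant item counts 1 / 𝑝(𝑜𝑏𝑠(𝑖, 𝑢)) instead of 1, so relevant items that were unlikely to be discovered weigh more. The sketch assumes the propensities are given, which is exactly the hard part the slide points out; the function names and the propensity values are invented for illustration.

```python
def precision(ranking, observed_relevant):
    """Plain precision: observed relevant items count 1 each (rel * obs = 1)."""
    return sum(1 for i in ranking if i in observed_relevant) / len(ranking)

def ips_precision(ranking, observed_relevant, propensity):
    """Inverse propensity scored precision: each observed relevant item counts 1 / p(obs)."""
    return sum(1.0 / propensity[i] for i in ranking if i in observed_relevant) / len(ranking)

# Toy example, invented for illustration: the niche item is half as likely
# to be observed as the popular one.
propensity = {"hit": 0.5, "niche": 0.25, "other": 0.25}
observed_relevant = {"hit", "niche"}
```

Unlike plain precision, the reweighted value is not bounded by 1; it is an estimate of what would be measured under full observation, and its variance grows when propensities are small.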
Should we really get rid of the popularity bias?
[Figure: # interactions per item, highlighting a popular item 𝑎 and a less popular item 𝑏]
What made 𝑎 be so much more popular than 𝑏?
(Cañamares & Castells SIGIR 2018)
Should we really get rid of the popularity bias?
The popularity bias may not be “bad”
– If item discovery and user rating are aligned with relevance, popularity is a relevance signal
– Rational herd behavior
But it can distort evaluation
– If popularity is generated independently from relevance
– E.g. marketing, conformity, manipulation, randomness
Implications on state of the art algorithms
Can have unfair implications if tied to sensitive features
(Cañamares & Castells SIGIR 2018)
The recommendation feedback loop
Recommender systems bias themselves
– Self-reinforced (popularity) concentration
– Increasingly poor sales diversity
Biases in offline evaluation with the logged observations
[Figure: feedback loop: input data (observations) → recommendation algorithm → recommendation → feedback → input data; external sources (search, browsing, questionnaires, etc.) also feed the input data; recommendation serves both learning (exploration) and satisfaction (exploitation)]
(Fleder & Hosanagar Mgt. Sci. 2009, Chaney et al. RecSys 2018, etc.)
Breaking the feedback loop: multi-armed bandits
Multi-armed bandit problem:
Choose an arm iteratively and maximize total payoff without knowing reward distributions in advance
[Figure: a bandit policy loops over the arms: 1. select arm, 2. get reward, 3. update estimated reward model of arm; each arm has a true (unobserved) reward distribution with mean 𝜇 and an estimated one (model)]
(Sutton & Barto RL book 2018, Chapelle & Li NIPS 2011, etc.)
Breaking the feedback loop: multi-armed bandits
Recommendation keeps an ingredient of randomness (exploration) in its actions
– Aware (explicit model) of uncertainty in present knowledge about the user
– Gives apparently suboptimal options a chance to be reconsidered
Actions can be items, latent factors, clusters, neighbors, algorithms…
Do much better in the mid/long run!!
[Figure: bandit policy loop over the arms: 1. select arm, 2. get reward, 3. update estimated reward model of arm]
(Li et al. SIGIR 2016, Lacerda Neurocomputing 2017, McInerney et al. RecSys 2018, etc.)
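The select / reward / update loop can be sketched with Thompson sampling on Bernoulli rewards, one of several possible bandit policies and not necessarily the one used in the cited works. Each arm keeps a Beta posterior over its unknown mean reward; sampling from the posteriors supplies the exploration. The function name, the uniform Beta(1, 1) prior, the simulated click rewards and the toy means are assumptions.

```python
import random

def thompson_sampling(true_means, n_rounds, seed=0):
    """Simulate a Bernoulli bandit; returns (pulls per arm, total reward)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    successes = [1] * n_arms   # Beta(1, 1) prior for every arm (assumption)
    failures = [1] * n_arms
    pulls = [0] * n_arms
    total_reward = 0
    for _ in range(n_rounds):
        # 1. Select arm: sample a plausible mean per arm and pick the highest sample.
        samples = [rng.betavariate(successes[a], failures[a]) for a in range(n_arms)]
        arm = max(range(n_arms), key=samples.__getitem__)
        # 2. Get reward: simulated here from the true (unobserved) mean of the arm.
        reward = 1 if rng.random() < true_means[arm] else 0
        # 3. Update the estimated reward model of the selected arm.
        successes[arm] += reward
        failures[arm] += 1 - reward
        pulls[arm] += 1
        total_reward += reward
    return pulls, total_reward

# Toy arms (e.g. items with different click probabilities), invented for illustration.
pulls, total_reward = thompson_sampling([0.1, 0.5, 0.9], n_rounds=2000)
```

Early rounds spread pulls across the arms while the posteriors are wide; as evidence accumulates, pulls concentrate on the arm with the highest mean, though weaker arms still get an occasional chance.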
Conclusions
Novelty, bias and reinforcement learning are related problems
Novelty & diversity are now state of the art
– Different notions and metrics for different angles
Bias: popular items score high in accuracy in offline experiments
– Progress made in understanding and seeking to avoid
Reinforcement loop bias: multi-armed bandits and
reinforcement learning can greatly help
– And improve sales diversity
Open directions
Large room for research, matters to industry
Better understand the role of novelty and diversity in user needs
Unbiased evaluation
– How to estimate propensity
– Model complex biases e.g. involving user pairs
– Build unbiased datasets
Multi-armed bandits and reinforcement learning
– How to map the task, algorithmic research
– How to evaluate methods and represent different scenarios
(Nguyen et al. WWW 2014, Kapoor et al. RecSys 2015, Karumur et al. CSCW 2016, etc.)
Thank you for your attention!
Questions?
References
Adamopoulos, P., Tuzhilin, A. On Unexpectedness in Recommender Systems: Or How to Expect the Unexpected. ACM TIST 5(4), Special Issue on Novelty and Diversity in Recommender Systems, January 2015.
Adomavicius, G., Kwon, Y. Improving Aggregate Recommendation Diversity Using Ranking-Based Techniques. IEEE Trans. on Knowl. and Data Eng. 24(5), May 2012.
Agrawal, R., Gollapudi, S., Halverson, A., Ieong, S. Diversifying search results. WSDM 2009, Barcelona, Spain, pp. 5-14.
Anderson, C. The Long Tail: Why the Future of Business is Selling Less of More. Hyperion, New York, NY, USA, 2006.
Bellogín, A., Castells, P. and Cantador, I. Statistical Biases in Information Retrieval Metrics for Recommender Systems. Information Retrieval 20(6), July 2017, pp. 606-634.
Brickman, P., D’Amato, B. Exposure Effects in a Free Choice Situation. Journal of Personality and Social Psychology 32(3), 1975, pp. 415-420.
Cañamares, R. and Castells, P. Should I Follow the Crowd? A Probabilistic Analysis of the Effectiveness of Popularity in Recommender Systems. SIGIR 2018, Ann Arbor, MI, USA, pp. 415-424.
Cañamares, R. and Castells, P. A Probabilistic Reformulation of Memory-Based Collaborative Filtering – Implications on Popularity Biases. SIGIR 2017, Tokyo, Japan, pp. 215-224.
Cañamares, R., Redondo, M., Castells, P. Multi-Armed Recommender System Bandit Ensembles. RecSys 2019, Copenhagen, Denmark, pp. 432-436.
Carbonell, J. G. and Goldstein, J. The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries. SIGIR 1998, Melbourne, Australia, pp. 335-336.
Castells, P., Hurley, N. J., Vargas, S. Novelty and Diversity in Recommender Systems. In: Recommender Systems Handbook, 2nd edition, F. Ricci, L. Rokach, B. Shapira (Eds.). Springer, New York, NY, USA, 2015, pp. 881-918.
Castells, P., Wang, J., Lara, R., Zhang, D. Workshop on novelty and diversity in recommender systems – DiveRS 2011. RecSys 2011, Chicago, Illinois, USA, pp. 393-394.
Celma, O. and Herrera, P. A New Approach to Evaluating Novel Recommendations. RecSys 2008, Lausanne, Switzerland, pp. 179-186.
Chaney, A. J. B., Stewart, B. M., Engelhardt, B. E. How algorithmic confounding in recommendation systems increases homogeneity and decreases utility. RecSys 2018, Vancouver, Canada, pp. 224-232.
Chapelle, O., Ji, S., Liao, C., Velipasaoglu, E., Lai, L., Wu, S-L. Intent-based diversification of web search results: metrics and algorithms. Information Retrieval 14(6), December 2011, pp. 572-592.
Chapelle, O. and Li, L. An empirical evaluation of Thompson Sampling. NIPS 2011, Granada, Spain, pp. 2249-2257.
Chen, H. and Karger, D. R. Less is More. SIGIR 2006, Seattle, WA, USA, pp. 429-436.
Clarke, C. L. A., Kolla, M., Cormack, G. V., Vechtomova, O., Ashkan, A., Büttcher, S., MacKinnon, I. Novelty and diversity in information retrieval evaluation. SIGIR 2008, Singapore, pp. 659-666.
Clarke, C. L. A., Craswell, N., Soboroff, I., Cormack, G. V. Overview of the TREC 2010 Web Track. TREC 2010, Gaithersburg, MD, USA.
Clarke, C. L. A., Craswell, N., Soboroff, I., Ashkan, A. A Comparative Analysis of Cascade Measures for Novelty and Diversity. WSDM 2011, Hong Kong, China, pp. 75-84.
Cremonesi, P., Koren, Y. and Turrin, R. Performance of recommender algorithms on top-n recommendation tasks. RecSys 2010, Barcelona, Spain, pp. 39-46.
Fleder, D. M. and Hosanagar, K. Blockbuster Culture’s Next Rise or Fall: The Impact of Recommender Systems on Sales Diversity. Management Science 55(5), May 2009, pp. 697-712.
Ge, M., Delgado-Battenfeld, C., Jannach, D. Beyond accuracy: evaluating recommender systems by coverage and serendipity. RecSys 2010, Barcelona, Spain, pp. 257-260.
Hurley, N., Zhang, M. Novelty and Diversity in Top-N Recommendation – Analysis and Evaluation. ACM TOIT 10(4), March 2011.
Iaquinta, L., de Gemmis, M., Lops, P., Semeraro, G., Filannino, M., Molino, P. Introducing Serendipity in a Content-based Recommender System. HIS 2008, Barcelona, Spain, September 2008.
Jalili, M., Javari, A. Accurate and novel recommendations: An algorithm based on popularity forecasting. ACM TIST 5(4), Special Issue on Novelty and Diversity in Recommender Systems, Jan. 2015.
Jannach, D., Lerche, L., Kamehkhosh, I., Jugovac, M. What recommenders recommend: an analysis of recommendation biases and possible countermeasures. UMUAI 25(5), Dec. 2015, pp. 427-491.
Kahn, B. E. Consumer variety-seeking among goods and services: An integrative review. Journal of Retailing and Consumer Services 2(3), July 1995, pp. 139-148.
Kaminskas, M., Bridge, D. Diversity, Serendipity, Novelty, and Coverage: A Survey and Empirical Analysis of Beyond-Accuracy Objectives in Recommender Systems. ACM TIIS 7(1), March 2017.
Kapoor, K., Kumar, V., Terveen, L. G., Konstan, J. A., Schrater, P. R. “I like to explore sometimes”: Adapting to Dynamic User Novelty Preferences. RecSys 2015, Vienna, Austria, pp. 19-26.
Karumur, R. P., Nguyen, T. T., Konstan, J. A. Early Activity Diversity: Assessing Newcomer Retention from First-Session Activity. CSCW 2016, San Francisco, CA, USA, pp. 594-607.
Lacerda, A. Multi-Objective Ranked Bandits for Recommender Systems. Neurocomputing 246, July 2017, pp. 12-24.
Lathia, N., Hailes, S., Capra, L., Amatriain, X. Temporal Diversity in Recommender Systems. SIGIR 2010, Geneva, Switzerland, pp. 210-217.
Li, S., Karatzoglou, A. and Gentile, C. Collaborative Filtering Bandits. SIGIR 2016, Pisa, Italy, pp. 539-548.
Maddi, S. R. The Pursuit of Consistency and Variety. In Abelson, R. P. et al. (Eds.), Theories of Cognitive Consistency: A Sourcebook, Rand McNally, Chicago, 1968, pp. 61-85.
Marlin, B. M., Zemel, R. S. Collaborative prediction and ranking with non-random missing data. RecSys 2009, New York, NY, USA, pp. 5-12.
McAlister, L. Choosing Multiple Items from a Product Class. Journal of Consumer Research 6, December 1979, pp. 213-224.
McAlister, L., Pessemier, E. A. Variety seeking behavior: an interdisciplinary review. Journal of Consumer Research 9, December 1982.
McNee, S. M., Riedl, J., Konstan, J. A. Being Accurate is Not Enough: How Accuracy Metrics have hurt Recommender Systems. CHI 2006, Montréal, Canada, pp. 1097-1101.
McInerney, J., Lacker, B., Hansen, S., Higley, K., Bouchard, H., Gruson, A. and Mehrotra, R. Explore, exploit, and explain: personalizing explainable recommendations with bandits. RecSys 2018, Vancouver, Canada, pp. 31-39.
Mourão, F., Fonseca, C., Araújo, C., Meira Jr., W. The Oblivion Problem: Exploiting Forgotten Items to Improve Recommendation Diversity. Workshop on Novelty and Diversity in Recommender Systems (DiveRS 2011) at RecSys 2011, Chicago, Illinois, October 2011, pp. 27-34.
Murakami, T., Mori, K., Orihara, R. Metrics for Evaluating the Serendipity of Recommendation Lists. JSAI 2007, Miyazaki, Japan, June 2007. Also in Springer Verlag LNCS Vol. 4914, 2008, pp. 40-46.
Nguyen, T. T., Hui, P-M., Harper, F. M., Terveen, L. G., Konstan, J. A. Exploring the filter bubble: the effect of using recommender systems on content diversity. WWW 2014, Seoul, Korea, pp. 677-686.
Onuma, K., Tong, H., Faloutsos, C. TANGENT: a novel, ‘Surprise me’, recommendation algorithm. KDD 2009, pp. 657-666.
Park, Y-J., Tuzhilin, A. The long tail of recommender systems and how to leverage it. RecSys 2008, Lausanne, Switzerland, pp. 11-18.
Patil, G. P., Taillie, C. Diversity as a Concept and its Measurement. Journal of the American Statistical Association 77(379), September 1982, pp. 548-561.
Raju, P. S. Optimum Stimulation Level: Its Relationship to Personality, Demographics and Exploratory Behavior. Journal of Consumer Research 7(3), December 1980, pp. 272-282.
Ribeiro, M. T., Lacerda, A., Veloso, A. and Ziviani, N. Pareto-efficient hybridization for multi-objective recommender systems. RecSys 2012, Dublin, Ireland, September 2012, pp. 19-26.
Salganik, M. J., Dodds, P. S. and Watts, D. J. Experimental Study of Inequality and Unpredictability in an Artificial Cultural Market. Science 311(5762), February 2006, pp. 854-856.
Santos, R. L. T., Macdonald, C., Ounis, I. Exploiting query reformulations for web search result diversification. WWW 2010, Raleigh, NC, USA, April 2010, pp. 881-890.
Santos, R. L. T., Macdonald, C., Ounis, I. Search Result Diversification. Foundations and Trends in Information Retrieval 9(1), 2015.
Sanz-Cruzado, J., Castells, P. Enhancing Structural Diversity in Social Networks by Recommending Weak Ties. RecSys 2018, Vancouver, Canada, pp. 233-241.
Sanz-Cruzado, J., Castells, P., López, E. A Simple Multi-Armed Nearest-Neighbor Bandit for Interactive Recommendation. RecSys 2019, Copenhagen, Denmark, pp. 358-362.
Schnabel, T., Swaminathan, A., Singh, A., Chandak, N. and Joachims, T. Recommendations as Treatments: Debiasing Learning and Evaluation. ICML 2016, New York, NY, USA, pp. 1670-1679.
Shi, Y., Zhao, X., Wang, J., Larson, M., Hanjalic, A. Adaptive diversification of recommendation results via latent factor portfolio. SIGIR 2012, Portland, OR, USA, pp. 175-184.
Sinha, A., Gleich, D. F., Ramani, K. Deconvolving Feedback Loops in Recommender Systems. NIPS 2016, Barcelona, Spain, December 2016, pp. 3243-3251.
Smyth, B., McClave, P. Similarity vs. diversity. ICCBR 2001, London, UK, pp. 347-361.
Steck, H. Training and Testing of Recommender Systems on Data Missing not at Random. KDD 2010, Washington, D.C., USA, pp. 713-722.
Steck, H. Item popularity and recommendation accuracy. RecSys 2011, Chicago, IL, pp. 125-132.
Sutton, R. and Barto, A. Reinforcement Learning: An Introduction (2nd ed.). MIT Press, Cambridge, MA, USA, 2018.
Swaminathan, A., Krishnamurthy, A., Agarwal, A., Dudik, M., Langford, J., Jose, D. and Zitouni, I. Off-policy Evaluation for Slate Recommendation. NIPS 2017, Long Beach, CA, USA, pp. 3635-3645.
Vallet, D. and Castells, P. Personalized Diversification of Search Results. SIGIR 2012, Portland, OR, USA, pp. 841-850.
Varadarajan, P. Product Diversity and Firm Performance: An Empirical Investigation. Journal of Marketing 50(3), July 1986, pp. 43-57.
Vargas, S., Castells, P. and Vallet, D. Intent-Oriented Diversity in Recommender Systems. SIGIR 2011, Beijing, China, pp. 1211-1212.
Vargas, S. and Castells, P. Rank and Relevance in Novelty and Diversity Metrics for Recommender Systems. RecSys 2011, Chicago, Illinois, USA, pp. 109-116.
Vargas, S. and Castells, P. Exploiting the Diversity of User Preferences for Recommendation. OAIR 2013, Lisbon, Portugal, May 2013.
Veloso, A., Ribeiro, M., Lacerda, A., Moura, E., Hata, I. and Ziviani, N. Multi-Objective Pareto-Efficient Approaches for Recommender Systems. ACM TIST 5(4), Special Issue on Novelty and Diversity in Recommender Systems, January 2015.
Yang, L., Cui, Y., Xuan, Y., Wang, C., Belongie, S. and Estrin, D. Unbiased Offline Recommender Evaluation for Missing-Not-At-Random Implicit Feedback. RecSys 2018, Vancouver, Canada, pp. 279-287.
Zhang, M. and Hurley, N. Avoiding Monotony: Improving the Diversity of Recommendation Lists. RecSys 2008, Lausanne, Switzerland, pp. 123-130.
Zhang, M., Hurley, N. Novel Item Recommendation by User Profile Partitioning. Web Intelligence 2009, pp. 508-515.
Zhang, Y. C., Ó Séaghdha, D., Quercia, D., Jambor, T. Auralist: introducing serendipity into music recommendation. WSDM 2012, Seattle, WA, USA, pp. 13-22.
Zhou, T., Kuscsik, Z., Liu, J-G., Medo, M., Wakeling, J. R., Zhang, Y-C. Solving the apparent diversity-accuracy dilemma of recommender systems. PNAS 107(10), March 2010, pp. 4511-4515.
Ziegler, C-N., McNee, S. M., Konstan, J. A., Lausen, G. Improving recommendation lists through topic diversification. WWW 2005, Chiba, Japan, pp. 22-32.