Recommender Systems Evaluation Beyond Accuracy

ACM Latin American School on Recommender Systems (LARS 2019)

Fortaleza, Brazil, October 10, 2019

Pablo Castells, Universidad Autónoma de Madrid

IR Group @ UAM

http://ir.ii.uam.es/castells

Slides: ir.ii.uam.es/castells/lars2019.pdf



Outline

1. Motivation: beyond relevance

2. Measuring novelty and diversity

3. Enhancing novelty and diversity

4. Biases in recommendation




Motivation

What is the purpose of recommendation?

Satisfying users …by making suggestions they like

If we recommend things that a user likes,

then the user will be satisfied

Ergo…




Motivation

Do you like this?

Movie, Book, Tourist attraction, Music album



Motivation

Would you find it useful to recommend it?

Probably not: everybody knows those already

Movie, Book, Tourist attraction, Music album



Motivation

Would you find it useful to recommend this?

Maybe, provided they are liked…





Motivation


Not obvious or widely known

…but too much of the same genre?

Sci-fi Sci-fi Sci-fi Sci-fi Sci-fi




Motivation


Sci-fi, Animation, Comedy, Adventure, Documentary

Seems better?



Beyond accuracy…

How to improve?

Define

Understand

Measure

…then try to improve

Novelty, Diversity



Definition

How different recommendations are

from “something else”

E.g. user knowledge or experience

Novelty

(Vargas & Castells RecSys 2011, Castells et al. Handbook 2015)




Definition

How different recommendations are

to each other

How novel each item is to the other

recommended items

Diversity





Why diverse and novel recommendations

For the sake of it: direct user satisfaction

Natural variety-seeking drive in human behavior

– Within a recommendation and over time

– Desire for the unfamiliar, alternation among the familiar

– Ideal level of stimulation

Broaden the user’s horizon / avoid bubbles

The task is often explicitly about discovery

(Castells et al. Handbook 2015, Kaminskas & Bridge ACM TIIS 2017)





For enhanced business performance

Sales diversity: mitigate risk, expand the business

Long tail: draw revenues from market niches

– “Sell less of more”

– Higher profit margin on cheaper long-tail products

Fairness! Give all stakeholders a fair chance






For better system effectiveness (“a safer bet”)

Uncertainty about user preferences

– System observations are ambiguous, very incomplete

– User preferences are multiple, dynamic, contextual…

Increase chances of at least some relevant item





Measuring accuracy

Rating matrix with some available cell values, most cells empty

Rank items by predicting missing ratings

[Figure: the “rating” matrix, an abstraction of user-item interaction, with users as rows and items as columns]



Measuring accuracy

[Figure: the rating matrix (users × items), with “?” marking missing cells whose ratings are to be predicted]

Evaluation: see if predictions match reality


Offline evaluation: just hide a few cell values and use them as test
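
The hide-and-test procedure can be sketched as follows. This is a minimal illustration, not the tutorial's own protocol: the split ratio, the RMSE error measure, the global-mean predictor and the toy ratings are all assumptions made for the example.

```python
import random

def split_ratings(ratings, test_ratio=0.2, seed=0):
    """Hide a fraction of the known (user, item, rating) cells as test data."""
    rng = random.Random(seed)
    shuffled = list(ratings)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]  # train, test

def rmse(predict, test):
    """Root mean squared error of predict(user, item) on the hidden cells."""
    return (sum((predict(u, i) - r) ** 2 for u, i, r in test) / len(test)) ** 0.5

# Toy rating data (hypothetical values).
ratings = [("u1", "a", 4), ("u1", "b", 2), ("u2", "a", 4), ("u2", "c", 1),
           ("u3", "b", 3), ("u3", "c", 5), ("u4", "a", 4), ("u4", "d", 2),
           ("u5", "c", 1), ("u5", "d", 5)]
train, test = split_ratings(ratings)

# Baseline predictor: the mean of the training ratings, whatever the cell.
global_mean = sum(r for _, _, r in train) / len(train)
error = rmse(lambda u, i: global_mean, test)
```

Any rating predictor with the same `predict(user, item)` shape could be plugged in place of the global mean.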




Measuring diversity and novelty

Different metrics for different notions

Common notions

– Unpopularity

– Unexpectedness

– Serendipity

– Intra-list dissimilarity

– Sales diversity

– Aspect-based diversity

Many other, more particular metrics

[Figure: a recommendation list R = (i1, i2, i3, i4, i5) delivered to user u, annotated with “Diversity” and “Novelty”]



Novelty: never seen vs. not familiar

I have not seen this movie

But I have seen these movies…

Measuring novelty


Measuring novelty – never seen

Long-tail novelty

What is the chance the user has never seen the items

How “not popular” are the recommended items

E.g. mean self-information

$\mathrm{MSI} = -\frac{1}{|R|} \sum_{i \in R} \log_2 p_i$

Popularity of $i$: $p_i = \frac{\#\text{users who have interacted with } i}{\text{total } \#\text{users}}$

[Figure: popularity $p_i$ plotted over items $i$; the short head is not novel, the long tail is novel]

(Zhou et al. PNAS 2010, Vargas & Castells RecSys 2011, Zhang et al. WSDM 2012, etc.)
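
As a minimal sketch of mean self-information, assuming interactions arrive as (user, item) pairs and that every recommended item has at least one interaction (otherwise the logarithm is undefined). The function names and toy data are hypothetical.

```python
import math

def item_popularity(interactions, n_users):
    """p_i = (#users who have interacted with i) / (total #users)."""
    users_per_item = {}
    for user, item in interactions:
        users_per_item.setdefault(item, set()).add(user)
    return {item: len(users) / n_users for item, users in users_per_item.items()}

def mean_self_information(recommended, popularity):
    """MSI = -(1/|R|) * sum over i in R of log2(p_i)."""
    return -sum(math.log2(popularity[i]) for i in recommended) / len(recommended)

# Toy data: "hit" was seen by all 4 users, "niche" by one of them.
interactions = [("u1", "hit"), ("u2", "hit"), ("u3", "hit"), ("u4", "hit"),
                ("u1", "niche")]
p = item_popularity(interactions, n_users=4)
msi_head = mean_self_information(["hit"], p)    # p_i = 1.0
msi_tail = mean_self_information(["niche"], p)  # p_i = 0.25
```

Recommending the item everybody has seen yields no novelty by this measure, while the long-tail item scores higher.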

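Measuring novelty – not familiar

Unexpectedness

User-specific

How unfamiliar the items are to the user experience

E.g. average distance to items in user profile

$\mathrm{Unexp} = \frac{1}{|R|\,|\mathbf{u}|} \sum_{i \in R} \sum_{j \in \mathbf{u}} d(i, j)$, where $\mathbf{u}$ is the set of items “rated” by $u$

[Figure: distances $d(i, j)$ between the items in $R$ and the items in the profile of user $u$]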

(Adamopoulos & Tuzhilin ACM TIST 2014, Hurley & Zhang ACM TOIT 2011, Zhang et al. WSDM 2012, etc.)
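
A minimal sketch of the average-distance form of unexpectedness. The slides leave the distance d open; Jaccard distance over item feature sets is assumed here purely for illustration, and the item names and features are made up.

```python
def jaccard_distance(a, b):
    """Assumed d(i, j): 1 - |a ∩ b| / |a ∪ b| over item feature sets."""
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def unexpectedness(recommended, profile, features):
    """Average distance between recommended items and the items in the user profile."""
    total = sum(jaccard_distance(features[i], features[j])
                for i in recommended for j in profile)
    return total / (len(recommended) * len(profile))

# Hypothetical item features.
features = {
    "alien": {"sci-fi", "horror"},
    "blade_runner": {"sci-fi"},
    "up": {"animation", "comedy"},
}
far = unexpectedness(["up"], ["alien", "blade_runner"], features)
near = unexpectedness(["blade_runner"], ["alien"], features)
```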

Measuring novelty – serendipity

Serendipity: novelty + relevance

E.g. compute a novelty metric counting only relevant items

[Figure: the “Novel” and “Relevant” item sets]

(Iaquinta et al. HIS 2008, Ge et al. RecSys 2010, Zhang et al. WSDM 2012)

Measuring diversity

Intra-list dissimilarity: average pairwise distance

$\mathrm{ILD} = \frac{2}{|R|\,(|R| - 1)} \sum_{i,j \in R,\ i \neq j} d(i, j)$ (sum over unordered pairs of items in $R$)

$d(i, j) = 1 - sim(i, j)$ (based on item features)

(Smyth & McClave ICCBR 2001, Ziegler et al. WWW 2005, etc.)
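
A minimal sketch of intra-list dissimilarity as the average distance over unordered item pairs. The distance function is passed in; the genre-based distance used in the example is a hypothetical stand-in for 1 − sim(i, j).

```python
from itertools import combinations

def intra_list_dissimilarity(recommended, dist):
    """ILD = 2 / (|R| (|R| - 1)) * sum of d(i, j) over unordered pairs of R."""
    n = len(recommended)
    if n < 2:
        return 0.0
    total = sum(dist(i, j) for i, j in combinations(recommended, 2))
    return 2.0 * total / (n * (n - 1))

genres = {"m1": {"sci-fi"}, "m2": {"sci-fi"}, "m3": {"comedy"}}

def genre_distance(i, j):
    # Hypothetical d(i, j): 0 for identical genre sets, 1 otherwise.
    return 0.0 if genres[i] == genres[j] else 1.0

ild_same = intra_list_dissimilarity(["m1", "m2"], genre_distance)
ild_mixed = intra_list_dissimilarity(["m1", "m2", "m3"], genre_distance)
```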




Aspect-based diversity

With respect to a space of user “subtastes”:

genres, categories, etc.

Inspired by intent-oriented search diversity

(Vargas et al. SIGIR 2011, Wasilewski & Hurley UMAP 2018, Kaya & Bridge UMUAI 2019, etc.)




“Avoid redundancy of possible user intents (aspects)

as a means to cope with the uncertainty in the query”

Diversity in search

(Carbonell & Goldstein SIGIR 1998, Clarke et al. SIGIR 2008, Agrawal et al. WSDM 2009, Santos et al. WWW 2010, Santos et al. Found. & Trends in IR 2015)

Search result diversity

[Figure: relevant documents by rank position across query senses / aspects (query ambiguity / incompleteness), comparing the added utility of uniform results and of diverse results]

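Search diversity evaluation

Metrics

Aspect recall $= \frac{1}{|\mathcal{A}_q|}\, \#\{ a \in \mathcal{A}_q \mid \exists d \in R \text{ that covers } a \}$

Intent-aware metrics

$\text{ERR-IA} = \sum_{d_k \in R} \frac{1}{k} \sum_{a \in \mathcal{A}_q} rel(d_k \mid a) \prod_{j<k} \bigl(1 - \alpha\, rel(d_j \mid a)\bigr)$

[Formula annotations: Ranking, Relevance, Novelty, Diversity]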

(Clarke et al. SIGIR 2008, Agrawal et al. WSDM 2009, Chapelle et al. Inf. Ret. 2011)
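
The two metrics can be sketched as below, following the ERR-IA expression as it appears on the slide (no per-aspect weights). The value of α, the binary relevance judgments and the “jaguar”-style ambiguous query are assumptions for illustration only.

```python
def aspect_recall(ranking, aspects, covers):
    """Fraction of query aspects covered by at least one document in the ranking."""
    covered = sum(1 for a in aspects if any(covers(d, a) for d in ranking))
    return covered / len(aspects)

def err_ia(ranking, aspects, rel, alpha=0.5):
    """Sum over ranks k of (1/k) * sum_a rel(d_k|a) * prod_{j<k} (1 - alpha * rel(d_j|a))."""
    score = 0.0
    remaining = {a: 1.0 for a in aspects}  # running product over earlier ranks, per aspect
    for k, doc in enumerate(ranking, start=1):
        for a in aspects:
            score += rel(doc, a) * remaining[a] / k
            remaining[a] *= 1.0 - alpha * rel(doc, a)
    return score

# Hypothetical judgments for an ambiguous query with two aspects.
aspects = ["animal", "car"]
judgments = {("d1", "animal"): 1.0, ("d2", "animal"): 1.0, ("d3", "car"): 1.0}
rel = lambda d, a: judgments.get((d, a), 0.0)
covers = lambda d, a: rel(d, a) > 0

uniform = ["d1", "d2"]   # both documents about the same aspect
diverse = ["d1", "d3"]   # one document per aspect
scores = {name: (aspect_recall(r, aspects, covers), err_ia(r, aspects, rel))
          for name, r in [("uniform", uniform), ("diverse", diverse)]}
```

The second document on an already covered aspect is discounted by the product term, so the diverse ranking scores higher on both metrics.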

Search diversity evaluation

Aspects?

Query aspects: manually defined (e.g. TREC), Wikipedia disambiguation, suggested query reformulations…

Document aspects: categories, clusters…

(Clarke et al. SIGIR 2008, Agrawal et al. WSDM 2009, Chapelle et al. Inf. Ret. 2011, Santos et al. WWW 2010)




Aspect-based diversity in recommendation

“Avoid redundancy of possible user intents (aspects) as a means to cope with the uncertainty in the observed evidence of user interests”

(the search formulation, with “in the query” replaced by “in the observed evidence of user interests”)

(Vargas et al. SIGIR 2011, Wasilewski & Hurley UMAP 2018, Kaya & Bridge UMUAI 2019, etc.)




[Figure: the row of user u in the rating matrix is the user profile; an item features × items matrix maps the profile to “user aspects”]

Aspects from item features, using a “meaningful” item feature space

Derive user aspect distributions

IR diversity metrics and algorithms can now be applied

Other approaches to user interest subdivision have been considered

(Vargas et al. SIGIR 2011, Wasilewski & Hurley UMAP 2018, Kaya & Bridge UMUAI 2019, Vargas et al. OAIR 2013)

Sales diversity

Seller perspective

How spread are recommendations over the item inventory

Catalog exposure to sales

[Figure: number of users to whom each item is recommended, plotted over items, for Recommender A and Recommender B]

(Adomavicius & Kwon TKDE 2012, Li & Murata WI 2012, Vargas & Castells RecSys 2014, Jannach et al. UMUAI 2015, etc.)

Sales diversity

Metrics: function over set of recommendations

Metrics adapted from ecology and other fields

[Figure: ecology analogy. The set of all recommendations is the “ecosystem”, each item in the set of all items is one “species”, and each recommendation “slot” is one “individual of some species”]




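Sales diversity

Aggregate diversity

Total number of different items recommended in top $n$

Equivalent to “species richness”

$\mathrm{Aggdiv} = \bigl| \bigcup_u R_u \bigr|$

Gini-Simpson index

$\mathrm{GSI} = 1 - \sum_i p_i^2$

Entropy

$H = -\sum_i p_i \log_2 p_i$

Gini coefficient

$G = \frac{1}{|\mathcal{I}| - 1} \sum_{k=1}^{|\mathcal{I}|} \bigl(2k - |\mathcal{I}| - 1\bigr)\, p_{i_k}$, with $i_k$ the $k$-th item in increasing order of $p_i$

$p_i$ = ratio of users to whom $i$ is recommended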
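
A minimal sketch of the four sales diversity metrics over a set of top-n lists. One assumption is made loudly here: p_i is normalized over all recommendation slots so that it sums to 1, which the Gini-Simpson index, entropy and Gini coefficient require. The function name and the toy recommenders A and B are hypothetical.

```python
import math
from collections import Counter

def sales_diversity(recs_by_user, catalog):
    """Aggregate diversity, Gini-Simpson, entropy and Gini over all users' lists."""
    counts = Counter(i for recs in recs_by_user.values() for i in set(recs))
    total = sum(counts.values())
    # Assumption: p_i normalized to sum to 1; sorted increasingly for the Gini formula.
    p = sorted(counts.get(i, 0) / total for i in catalog)
    n = len(p)  # catalog size, assumed >= 2
    return {
        "aggdiv": len(counts),
        "gsi": 1.0 - sum(x * x for x in p),
        "entropy": -sum(x * math.log2(x) for x in p if x > 0),
        "gini": sum((2 * k - n - 1) * x for k, x in enumerate(p, start=1)) / (n - 1),
    }

catalog = ["a", "b", "c", "d"]
# Recommender A concentrates on item "a"; recommender B spreads evenly.
recs_a = {"u1": ["a", "b"], "u2": ["a", "c"], "u3": ["a", "d"], "u4": ["a", "b"]}
recs_b = {"u1": ["a", "b"], "u2": ["c", "d"], "u3": ["a", "b"], "u4": ["c", "d"]}
a = sales_diversity(recs_a, catalog)
b = sales_diversity(recs_b, catalog)
```

Both recommenders reach the whole catalog, so aggregate diversity cannot tell them apart, while the distribution-sensitive metrics favor B (higher Gini-Simpson and entropy, lower Gini).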

Sales diversity

Aggregate diversity: A as good as B

Gini, Gini-Simpson, Entropy: B better than A

[Figure: number of users to whom each item is recommended, plotted over items, for Recommender A and Recommender B; both panels are marked with $n$]

Common underlying principle to different diversity notions

Item novelty model: distance or identity between the recommended item and a context

– Context: target user’s experience → Unexpectedness

– Context: other items in the same recommendation → Intra-list diversity

– Context: everyone else’s experience → Long-tail novelty

– Context: everyone else’s recommendations → Sales diversity


Relation between metrics

Pearson correlation (on MF baseline recommender), triangular matrix:

nDCG

ERR-IA: 0.64

Aspect recall: -0.02, 0.03

ILD: 0.71, -0.09, 0.03

Unexpectedness: 0.85, 0.62, -0.06, 0.07

Long-tail novelty (MSI): -0.19, -0.21, -0.19, 0.10, 0.02

Sales diversity (IUD): 0.87, -0.23, -0.27, -0.20, 0.14, 0.06

Aspect-based diversity: ERR-IA, Aspect recall

(Castells et al. Handbook 2015)




Novelty and diversity enhancement

Diversity and accuracy viewed as opposing objectives

• Enhancing diversity (or novelty) is expected to involve some sacrifice in accuracy

• The goal is to achieve an optimal trade-off: a multiobjective optimization problem

• Results assessed by two metrics: relevance vs. diversity/novelty

[Figure: trade-off plot of accuracy (vertical axis) against diversity (horizontal axis)]

Greedy reranking for novelty / diversity

[Figure: pipeline. Input data (observations) → accuracy algorithm → initial ranking $R$ → greedy reranking → diversified ranking $S$ → end user]

Greedy version of target metric:

$\phi(i \mid S, u) = (1 - \lambda)\, rel(u, i) + \lambda\, div(i \mid S, u)$

(Ziegler et al. WWW 2005, Carbonell & Goldstein SIGIR 1998, Agrawal et al. WSDM 2009, Santos et al. WWW 2010, etc.)

Greedy reranking for novelty / diversity

$\phi(i \mid S, u) = (1 - \lambda)\, rel(u, i) + \lambda\, div(i \mid S, u)$

For instance…

– IL Diversity (ILD): $div(i \mid S, u) = \sum_{j \in S} d(i, j)$

– Unexpectedness: $div(i \mid S, u) = \sum_{j \in \mathbf{u}} d(i, j)$

– Long-tail novelty (MSI): $div(i \mid S, u) = -\log_2 p_i$

(Ziegler et al. WWW 2005, Carbonell & Goldstein SIGIR 1998, Agrawal et al. WSDM 2009, Santos et al. WWW 2010, etc.)
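
A minimal sketch of the greedy reranking loop for a single, fixed user, with the ILD-style gain as the diversity component. It assumes that the relevance scores and the diversity gain are on comparable scales; the genre-based distance, the scores and all names are hypothetical.

```python
def greedy_rerank(candidates, rel, div, lam=0.5, n=10):
    """Build S greedily: at each step add the item that maximizes
    (1 - lam) * rel(i) + lam * div(i, S)."""
    remaining = list(candidates)
    selected = []
    while remaining and len(selected) < n:
        best = max(remaining,
                   key=lambda i: (1 - lam) * rel(i) + lam * div(i, selected))
        selected.append(best)
        remaining.remove(best)
    return selected

# Hypothetical initial ranking scores and item genres for one user.
score = {"s1": 0.9, "s2": 0.8, "s3": 0.7, "c1": 0.6}
genre = {"s1": "sci-fi", "s2": "sci-fi", "s3": "sci-fi", "c1": "comedy"}

def ild_gain(i, S):
    # ILD-style div(i | S): sum of d(i, j) over j in S, with d = 1 when genres differ.
    return sum(1.0 for j in S if genre[i] != genre[j])

top2_accuracy = greedy_rerank(score, score.get, ild_gain, lam=0.0, n=2)
top2_diverse = greedy_rerank(score, score.get, ild_gain, lam=0.5, n=2)
```

With λ = 0 the initial ranking is reproduced; with λ = 0.5 the lower-scored comedy enters the top 2 because it differs from the item already selected.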




Many other specific approaches…

More sophisticated multiobjective optimization

User subprofiles, latent factors

Graph-based, clustering, portfolio theory…

Progressive transition towards the long tail

External vs. internal to algorithm

Novelty and diversity enhancement approaches

(Smyth & McClave ICCBR 2001, Ziegler et al. WWW 2005, Celma & Herrera RecSys 2008, Zhou et al. PNAS 2010, Hurley & Zhang TOIT 2011, Zhang et al. WSDM 2012, Shi et al. SIGIR 2012, Adomavicius & Kwon IEEE TKDE 2012, etc.)

Novelty and diversity enhancement

Novelty and diversity by weighted recommender ensembles

Multi-objective maximization of accuracy & novelty (MSI) & diversity (ILD): evolutionary algorithm

Find the Pareto frontier on tradeoffs between the 3 metrics

[Figure: Pareto frontiers over accuracy (recall), MSI and ILD, on MovieLens and Last.fm]

(Ribeiro et al. RecSys 2012, Veloso et al. ACM TIST 2014)




Sales diversity enhancement

Recommend users to items

By taking inverse kNN neighborhoods

(Vargas & Castells RecSys 2014)




Think about novelty again

Would you find it useful to recommend these?

Why not?


We do not always want the same amount of novelty

(Kapoor et al. RecSys 2015, McAlister & Pessemier J. Cons. Res. 1982, etc.)

What is the effect of popularity?

Degrees of popularity beyond the short head

[Figure: example items with ~300K, ~80K, ~13K, 25 and 5 ratings]

There is a relation between popularity and accuracy

Ratings are missing not at random (MNAR)

[Figure: users × items matrix divided into popular items (short head) and rest of items (long tail), showing observed user-item interaction and unobserved preference]

(Marlin & Zemel RecSys 2009, Steck KDD 2010, Steck RecSys 2011, etc.)

Ratings are missing not at random (MNAR)

[Figure: users × items matrix divided into popular items (short head) and rest of items (long tail), split into training data and test data (relevant items), annotated with the average P@$k$]

(Marlin & Zemel RecSys 2009, Steck KDD 2010, Steck RecSys 2011, etc.)

[Figure: nDCG@10 on MovieLens 1M (axis from 0 to 0.3) for Random, Nr. positive ratings, User-based kNN and Matrix factorization]

(Cremonesi et al. RecSys 2010, etc.)

Popularity bias in recommendation algorithms

[Figure: number of times each item is recommended in the top 10 against its number of positive ratings, with one panel each for Matrix factorization, User-based kNN and Popularity]

(Jannach et al. UMUAI 2015, Cañamares & Castells SIGIR 2017)

Can we trust our experiments?

Observed metric value (computed on available user taste observations) ≈? True metric value (computed with full knowledge of user tastes)

[Figure: two users × items matrices with relevant, non-relevant and missing ratings]

(Cañamares & Castells SIGIR 2018)

Get rid of the popularity bias

In the data

[Figure: training data and test data (relevant items) shown as # ratings over items for a flat test and for popularity strata, and over time for a temporal split]

(Bellogín et al. Inf. Ret. 2017)

Get rid of the popularity bias

In the metrics

– Stratified recall

– Off-policy evaluation

– Inverse propensity scoring

– ···

Divide the relevance of items by the probability to be discovered

$P = \frac{1}{|R|} \sum_{i \in R} rel(i, u)\, obs(i, u) \;\rightarrow\; \frac{1}{|R|} \sum_{i \in R} \frac{rel(i, u)\, obs(i, u)}{p(obs(i, u))}$

Problem: how to estimate propensity

In the algorithms: unbiased learning

In the data: unbiased datasets, e.g. Yahoo! R3, CM100k

(Steck RecSys 2011, Schnabel et al. ICML 2016, Swaminathan et al. NIPS 2017, Yang et al. RecSys 2018, etc.)
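
The effect of dividing by the propensity can be sketched as below. The sketch assumes known propensities and independent observation of each item, which is exactly what is hard to obtain in practice; the two items and their propensity values are made up. It enumerates every possible observation pattern to compute the expected value of both estimators.

```python
from itertools import product

def precision(recommended, rel, obs):
    """Observed precision: only relevant items that were observed count."""
    return sum(rel[i] * obs[i] for i in recommended) / len(recommended)

def ips_precision(recommended, rel, obs, propensity):
    """Same sum, with each observed relevant item divided by its observation probability."""
    return sum(rel[i] * obs[i] / propensity[i] for i in recommended) / len(recommended)

# Hypothetical setting: both items are truly relevant (true precision = 1.0),
# but the popular item is far more likely to have been observed.
rel = {"head": 1, "tail": 1}
propensity = {"head": 0.8, "tail": 0.2}
R = ["head", "tail"]

expected_naive = expected_ips = 0.0
for o_head, o_tail in product([0, 1], repeat=2):
    obs = {"head": o_head, "tail": o_tail}
    prob = ((propensity["head"] if o_head else 1 - propensity["head"])
            * (propensity["tail"] if o_tail else 1 - propensity["tail"]))
    expected_naive += prob * precision(R, rel, obs)
    expected_ips += prob * ips_precision(R, rel, obs, propensity)
```

In expectation the naive estimate undercounts the rarely observed item, while the propensity-weighted estimate recovers the true precision; individual weighted estimates can exceed 1.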

Should we really get rid of the popularity bias?

[Figure: # interactions per item, highlighting two items $a$ and $b$]

What made $a$ be so much more popular than $b$?


Should we really get rid of the popularity bias?

The popularity bias may not be “bad”

– If item discovery and user rating are aligned with relevance, popularity is a relevance signal

– Rational herd behavior

But it can distort evaluation

– If popularity is generated independently from relevance

– E.g. marketing, conformity, manipulation, randomness

Implications on state of the art algorithms

Can have unfair implications if tied to sensitive features


The recommendation feedback loop

Recommender systems bias themselves

– Self-reinforced (popularity) concentration

– Increasingly poor sales diversity

Biases in offline evaluation with the logged observations

External sources: search, browsing, questionnaires, etc.

[Figure: feedback loop. Input data (observations) → recommendation algorithm → recommendation → feedback → input data; labels: Learning (exploration), Satisfaction (exploitation)]

(Fleder & Hosanagar Mgt. Sci. 2009, Chaney et al. RecSys 2018, etc.)

Breaking the feedback loop: multi-armed bandits

Multi-armed bandit problem:

Choose an arm iteratively and maximize total payoff without knowing reward distributions in advance

Bandit policy:

1. Select arm

2. Get reward

3. Update estimated reward model of arm

[Figure: arms with their reward distributions; the true means $\mu$ are unobserved, and the bandit policy keeps estimated models of them]

(Sutton & Barto RL book 2018, Chapelle & Li NIPS 2011, etc.)
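
The select / reward / update cycle can be sketched with an ε-greedy policy, chosen here only because it is the simplest policy to read; it is not the method of the cited works (Thompson sampling, for instance, is a common alternative). The Bernoulli rewards, the hidden means and the class name are assumptions for illustration.

```python
import random

class EpsilonGreedyBandit:
    """Keeps a running mean reward estimate per arm and explores with probability epsilon."""

    def __init__(self, n_arms, epsilon=0.1, seed=0):
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.pulls = [0] * n_arms
        self.estimates = [0.0] * n_arms  # estimated mean reward per arm

    def select_arm(self):
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.pulls))  # explore: random arm
        # exploit: arm with the highest current estimate
        return max(range(len(self.pulls)), key=lambda a: self.estimates[a])

    def update(self, arm, reward):
        self.pulls[arm] += 1
        # incremental update of the running mean
        self.estimates[arm] += (reward - self.estimates[arm]) / self.pulls[arm]

true_means = [0.2, 0.5, 0.8]  # hypothetical, hidden from the policy
env = random.Random(1)
bandit = EpsilonGreedyBandit(n_arms=3, epsilon=0.1)
total_reward = 0
for _ in range(5000):
    arm = bandit.select_arm()                               # 1. select arm
    reward = 1 if env.random() < true_means[arm] else 0     # 2. get reward
    bandit.update(arm, reward)                              # 3. update estimate
    total_reward += reward
```

In a recommendation setting the arms could be items, clusters or whole algorithms, and the reward a click or a positive rating.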

Breaking the feedback loop: multi-armed bandits

Recommendation keeps an ingredient of randomness (exploration) in its actions

– Aware (explicit model) of uncertainty in present knowledge about the user

– Gives apparently suboptimal options a chance to be reconsidered

Actions can be items, latent factors, clusters, neighbors, algorithms…

Do much better in the mid/long run!!

(Li et al. SIGIR 2016, Lacerda Neurocomputing 2017, McInerney et al. RecSys 2018, etc.)




Conclusions

Novelty, bias and reinforcement learning are related problems

Novelty & diversity are now state of the art

– Different notions and metrics for different angles

Bias: popular items score high in accuracy in offline experiments

– Progress made in understanding and seeking to avoid

Reinforcement loop bias: multi-armed bandits and

reinforcement learning can greatly help

– And improve sales diversity




Open directions

Large room for research, matters to industry

Better understand the role of novelty and diversity in user needs

Unbiased evaluation

– How to estimate propensity

– Model complex biases e.g. involving user pairs

– Build unbiased datasets

Multi-armed bandits and reinforcement learning

– How to map the task, algorithmic research

– How to evaluate methods and represent different scenarios

(Nguyen et al. WWW 2014, Kapoor et al. RecSys 2015, Karumur et al. CSCW 2016, etc.)




Thank you for your attention!

Questions?




References

Adamopoulos, P., Tuzhilin, A. On Unexpectedness in Recommender Systems: Or How to Expect the Unexpected. ACM TIST 5(4), Special Issue on Novelty and Diversity in Recommender Systems, January 2015.

Adomavicius, G., Kwon, Y. Improving Aggregate Recommendation Diversity Using Ranking-Based Techniques. IEEE Trans. on Knowl. and Data Eng. 24(5), May 2012.

Agrawal, R., Gollapudi, S., Halverson, A., Ieong, S. Diversifying search results. WSDM 2009, Barcelona, Spain, pp. 5-14.

Anderson, C. The Long Tail: Why the Future of Business is Selling Less of More. Hyperion, New York, NY, USA, 2006.

Bellogín, A. , Castells, P. and Cantador, I. Statistical Biases in Information Retrieval Metrics for Recommender Systems. Information Retrieval 20(6), July 2017, 606-634.

Brickman, P., D’Amato, B. Exposure Effects in a Free Choice Situation. Journal of Personality and Social Psychology 32(3), 1975, pp. 415-420.

Cañamares, R. and Castells, P. Should I follow the Crowd? A Probabilistic Analysis of the Effectiveness of Popularity in Recommender Systems. SIGIR 2018, Ann Arbor, MI, USA, pp. 415-424.

Cañamares, R. and Castells, P. A Probabilistic Reformulation of Memory-Based Collaborative Filtering – Implications on Popularity Biases. SIGIR 2017, Tokyo, Japan, pp. 215-224.

Cañamares, R., Redondo, M., Castells, P. Multi-Armed Recommender System Bandit Ensembles. RecSys 2019, Copenhagen, Denmark, pp. 432-436.




Carbonell, J. G. and Goldstein, J. The Use of MMR, Diversity-Based Reranking for Reordering Documents and Producing Summaries. SIGIR 1998, Melbourne, Australia, 335-336.

Castells, P., Hurley, N. J., Vargas, S. Novelty and Diversity in Recommender Systems. In: Recommender Systems Handbook, 2nd edition, F. Ricci, L. Rokach, B. Shapira (Eds.). Springer, New York, NY, USA, 2015, pp. 881-918.

Castells, P., Wang, J., Lara, R., Zhang, D. Workshop on novelty and diversity in recommender systems – DiveRS 2011. RecSys 2011, Chicago, Illinois, USA, pp. 393-394.

Celma, O. and Herrera, P. A New Approach to Evaluating Novel Recommendations. RecSys 2008, Lausanne, Switzerland, pp. 179-186.

Chaney, A. J. B., Stewart, B. M., Engelhardt, B. E. How algorithmic confounding in recommendation systems increases homogeneity and decreases utility. RecSys 2018, Vancouver, Canada, pp. 224-232.

Chapelle, O., Ji, S., Liao, C., Velipasaoglu, E., Lai, L., Wu, S-L. Intent-based diversification of web search results: metrics and algorithms. Information Retrieval 14(6), December 2011, pp. 572-592.

Chapelle, O. and Li, L. An empirical evaluation of Thompson Sampling. NIPS 2011, Granada, Spain, pp. 2249-2257.

Chen, H. and Karger, D. R. Less is More. 29th Annual International ACM Conference on Research and Development in Information Retrieval (SIGIR 2006). Seattle, WA, USA, pp. 429-436.

Clarke, C. L. A., Kolla, M., Cormack, G. V., Vechtomova, O., Ashkan, A., Büttcher, S., MacKinnon, I. Novelty and diversity in information retrieval evaluation. SIGIR 2008, Singapore, pp. 659-666.




Clarke, C. L. A., Craswell, N., Soboroff, I, Cormack, G. V. Overview of the TREC 2010 Web Track. TREC 2010, Gaithersburg, MD, USA.

Clarke, C. L. A., Craswell, N., Soboroff, I., Ashkan, A. A Comparative Analysis of Cascade Measures for Novelty and Diversity. WSDM 2011, Hong-Kong, China, pp. 75-84.

Cremonesi, P., Koren, Y. and Turrin, R. Performance of recommender algorithms on top-n recommendation tasks. RecSys 2010, Barcelona, Spain, pp. 39-46.

Fleder, D. M. and Hosanagar, K. Blockbuster Culture’s Next Rise or Fall: The Impact of Recommender Systems on Sales Diversity. Management Science 55(5), May 2009, pp. 697-712.

Ge, M., Delgado-Battenfeld, C., Jannach, D. Beyond accuracy: evaluating recommender systems by coverage and serendipity. RecSys 2010, Barcelona, Spain, pp. 257-260.

Hurley, N., Zhang, M. Novelty and Diversity in Top-N Recommendation – Analysis and Evaluation. ACM TOIT 10(4), March 2011.

Iaquinta, L., de Gemmis, M., Lops, P., Semeraro, G., Filannino, M., Molino, P. Introducing Serendipity in a Content-based Recommender System. HIS 2008, Barcelona, Spain, September 2008.

Jalili, M., Javari, A. Accurate and novel recommendations: An algorithm based on popularity forecasting. ACM TIST 5(4), Special Issue on Novelty and Diversity in Recommender Systems, Jan. 2015.

Jannach, D., Lerche, L., Kamehkhosh, I., Jugovac, M. What recommenders recommend: an analysis of recommendation biases and possible countermeasures. UMUAI 25(5), Dec. 2015, pp. 427-491.



Kahn, B. E. Consumer variety-seeking among goods and services: An integrative review. Journal of Retailing and Consumer Services 2(3), July 1995, pp.139-148.

Kaminskas, M., Bridge, D. Diversity, Serendipity, Novelty, and Coverage: A Survey and Empirical Analysis of Beyond-Accuracy Objectives in Recommender Systems. ACM TIIS 7(1), March 2017.

Kapoor, K., Kumar, V., Terveen, L. G., Konstan, J. A., Schrater, P. R. “I like to explore sometimes”: Adapting to Dynamic User Novelty Preferences. RecSys 2015, Vienna, Austria, pp. 19-26.

Karumur, R. P., Nguyen, T. T., Konstan, J. A. Early Activity Diversity: Assessing Newcomer Retention from First-Session Activity. CSCW 2016, San Francisco, CA, USA, pp. 594-607.

Lacerda, A. Multi-Objective Ranked Bandits for Recommender Systems. Neurocomputing 246, July 2017, 12-24.

Lathia, N., Hailes, S., Capra, L., Amatriain, X. Temporal Diversity in Recommender Systems. SIGIR 2010, Geneva, Switzerland, 210-217.

Li, S., Karatzoglou, A. and Gentile, C. Collaborative Filtering Bandits. SIGIR 2016, Pisa, pp. 539-548.

Maddi, S. R. The Pursuit of Consistency and Variety. In Abelson, R. P. et al. (Eds.), Theories of Cognitive Consistency: A Sourcebook, Rand McNally, Chicago, 1968, pp. 61-85.

Marlin, B. M., Zemel, R. S. Collaborative prediction and ranking with non-random missing data. RecSys 2009, New York, NY, USA, pp. 5-12.

McAlister, L. Choosing Multiple Items from a Product Class. Journal of Consumer Research 6, December 1979, pp. 213-224.




McAlister, L., Pessemier, E. A. Variety seeking behavior: an interdisciplinary review. Journal of Consumer Research 9, December 1982.

McNee, S. M., Riedl, J., Konstan, J. A. Being Accurate is Not Enough: How Accuracy Metrics have hurt Recommender Systems. CHI 2006, Montréal, Canada, pp. 1097-1101.

McInerney, J., Lacker, B., Hansen, S., Higley, K., Bouchard, H., Gruson, A. and Mehrotra, R. Explore, exploit, and explain: personalizing explainable recommendations with bandits. RecSys 2018, Vancouver, Canada, pp. 31-39.

Mourão, F., Fonseca, C., Araújo, C., Meira Jr., W. The Oblivion Problem: Exploiting Forgotten Items to Improve Recommendation Diversity. Workshop on Novelty and Diversity in Recommender Systems (DiveRS 2011) at RecSys 2011, Chicago, Illinois, October 2011, pp. 27-34.

Murakami, T., Mori, K., Orihara, R. Metrics for Evaluating the Serendipity of Recommendation Lists. JSAI 2007. Miyazaki, Japan, June 2007. Also in Springer Verlag LNCS Vol. 4914, 2008, pp. 40-46.

Nguyen T. T., Hui, P-M., Harper, F. M., Terveen, L. G., Konstan, J. A. Exploring the filter bubble: the effect of using recommender systems on content diversity. WWW 2014, Seoul, Korea, pp. 677-686.

Onuma, K., Tong, H., Faloutsos, C. TANGENT: a novel, ‘Surprise me’, recommendation algorithm. KDD 2009, pp. 657-666.

Park, Y-J., Tuzhilin, A. The long tail of recommender systems and how to leverage it. RecSys 2008, Lausanne, Switzerland, pp. 11-18.

Patil, G. P., Taillie, C. Diversity as a Concept and its Measurement. Journal of the American Statistical Association 77(379), September 1982, pp. 548-561.




Raju, P. S. Optimum Stimulation Level: Its Relationship to Personality, Demographics and Exploratory Behavior. Journal of Consumer Research 7(3), December 1980, pp. 272-282.

Ribeiro, M. T., Lacerda, A., Veloso, A. and Ziviani, N. Pareto-efficient hybridization for multi-objective recommender systems. RecSys 2012, Dublin, Ireland, September 2012, pp. 19-26.

Salganik, M. J., Dodds, P. S. and Watts, D. J. Experimental Study of Inequality and Unpredictability in an Artificial Cultural Market. Science 311(5762), February 2006, pp. 854-856.

Santos, R. L. T., Macdonald, C., Ounis, I. Exploiting query reformulations for web search result diversification. WWW 2010, Raleigh, NC, USA, April 2010, pp. 881-890.

Santos, R. L. T., Macdonald, C., Ounis, I. Search Result Diversification. Foundations and Trends in Information Retrieval 9(1), 2015.

Sanz-Cruzado, J., Castells, P. Enhancing Structural Diversity in Social Networks by Recommending Weak Ties. RecSys 2018, Vancouver, Canada, pp. 233-241.

Sanz-Cruzado, J., Castells, P., López, E. A Simple Multi-Armed Nearest-Neighbor Bandit for Interactive Recommendation. RecSys 2019, Copenhagen, Denmark, pp. 358-362.

Schnabel, T., Swaminathan, A., Singh, A., Chandak, N. and Joachims, T. Recommendations as Treatments: Debiasing Learning and Evaluation. ICML 2016, New York, NY, USA, pp. 1670-1679.

Shi, Y., Zhao, X., Wang, J., Larson, M., Hanjalic, A. Adaptive diversification of recommendation results via latent factor portfolio. SIGIR 2012, Portland, OR, USA, pp. 175-184.




Sinha, A., Gleich, D. F., Ramani, K. Deconvolving Feedback Loops in Recommender Systems. NIPS 2016, Barcelona, Spain, December 2016, pp. 3243-3251.

Smyth, B., McClave, P. Similarity vs. diversity. ICCBR 2001. London, UK, pp. 347-361.

Steck, H. Training and Testing of Recommender Systems on Data Missing not at Random. KDD 2010, Washington D. C., USA, pp. 713-722.

Steck, H. Item popularity and recommendation accuracy. RecSys 2011, Chicago, IL, pp. 125-132.

Sutton R. and Barto, A. Reinforcement Learning: An Introduction (2nd ed.). MIT Press, Cambridge, MA, USA, 2018.

Swaminathan, A., Krishnamurthy, A., Agarwal, A., Dudik, M., Langford, J., Jose, D. and Zitouni, I. Off-policy Evaluation for Slate Recommendation. NIPS 2017, Long Beach, CA, USA, pp. 3635-3645.

Vallet, D. and Castells, P. Personalized Diversification of Search Results. SIGIR 2012, Portland, OR, USA, pp. 841-850.

Varadarajan, P. Product Diversity and Firm Performance: An Empirical Investigation. Journal of Marketing 50(3), July 1986, pp. 43-57.

Vargas, S., Castells, P. and Vallet, D. Intent-Oriented Diversity in Recommender Systems. SIGIR 2011, Beijing, China, pp. 1211-1212.

Vargas, S. and Castells, P. Rank and Relevance in Novelty and Diversity Metrics for Recommender Systems. RecSys 2011. Chicago, Illinois, pp. 109-116.




Vargas, S. and Castells, P. Exploiting the Diversity of User Preferences for Recommendation. OAIR 2013, Lisbon, Portugal, May 2013.

Veloso, A., Ribeiro, M., Lacerda, A., Moura, E., Hata, I. and Ziviani, N. Multi-Objective Pareto-Efficient Approaches for Recommender Systems. ACM TIST 5(4), Special Issue on Novelty and Diversity in Recommender Systems, January 2015.

Yang, L., Cui, Y., Xuan, Y. , Wang, C. , Belongie, S. and Estrin, D. Unbiased Offline Recommender Evaluation for Missing-Not-At-Random Implicit Feedback. RecSys 2018, Vancouver, Canada, pp. 279-287.

Zhang, M. and Hurley, N. Avoiding Monotony: Improving the Diversity of Recommendation Lists. RecSys 2008, Lausanne, Switzerland, 123-130.

Zhang, M., Hurley, N. Novel Item Recommendation by User Profile Partitioning. Web Intelligence 2009, pp. 508-515.

Zhang, Y. C., Ó Séaghdha, D., Quercia, D., Jambor, T. Auralist: introducing serendipity into music recommendation. WSDM 2012, Seattle, WA, USA, pp. 13-22.

Zhou, T., Kuscsik, Z., Liu, J-G., Medo, M., Wakeling, J. R., Zhang, Y-C. Solving the apparent diversity-accuracy dilemma of recommender systems. PNAS 107(10), March 2010, pp. 4511-4515.

Ziegler, C-N., McNee, S. M., Konstan, J. A., Lausen, G. Improving recommendation lists through topic diversification. WWW 2005, Chiba, Japan, pp. 22-32.