
[Results figure: cumulative recall (and diversity) as a function of the number of iterations t for FourSquare New York, FourSquare Tokyo, Twitter 1 month, Twitter 200 tweets and MovieLens 1M. Curves: neighbor bandit with k = 1, k = 10 and k = 100, most popular, random.]

Bandit Recommender Systems

Use of bandit algorithms to generate recommendations

Usually arms = items

Personalized approaches: contextual bandits

o Stochastic versions of Probabilistic Matrix Factorization (PMF)

o Clusters of users/items

Our approach: user-based kNN with stochastic (bandit-based) neighborhood selection

A Simple Multi-Armed Nearest-Neighbor Bandit for Interactive Recommendation

Javier Sanz-Cruzado, Pablo Castells and Esther López
Universidad Autónoma de Madrid
{javier.sanz-cruzado,pablo.castells}@uam.es
[email protected]

IR Group @ UAM

Nearest-neighbor bandit

Neighbor bandit: given a target user u, choose the best neighbor v

Conditional preference: instead of a similarity, use P(u|v), the probability that u likes an item that v likes

Thompson sampling bandit
o Rewards are modeled as a Bernoulli distribution with mean P(u|v)
o P(u|v) is estimated from data through posterior Beta distributions: $p(u|v) \sim B(\alpha_{v|u}, \beta_{v|u})$
o Hits $\alpha_{v|u}$: positive ratings shared by u and v: $\alpha_{v|u} = \alpha_0 + \sum_{i \in \mathcal{I}} r(u,i)\, r(v,i)$
o Misses $\beta_{v|u}$: v's positive ratings not shared by u: $\beta_{v|u} = \beta_0 + \sum_{i \in \mathcal{I}} r(v,i) - \alpha_{v|u}$
o Select the neighbor v maximizing the sampled $p(u|v)$

Recommend the item i that v likes most (ties broken at random)

Update $\alpha_{w|u}$ for all users $w \in \mathcal{U}$ who have rated the recommended item i (one round is sketched below)
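A minimal Python sketch of one round of the neighbor bandit. It assumes a dense 0/1 NumPy matrix R of known positive ratings (rows = users) and per-target-user count matrices alpha and beta; the function names and the dense representation are illustrative choices, not the authors' released implementation:

```python
import numpy as np

def select_neighbor(alpha, beta, u, rng=None):
    """Thompson sampling over neighbors: sample p(u|v) ~ Beta(alpha[u,v], beta[u,v])
    for every candidate neighbor v of the target user u and return the argmax."""
    rng = rng or np.random.default_rng()
    samples = rng.beta(alpha[u], beta[u])
    samples[u] = -np.inf                      # the target user is not a candidate neighbor
    return int(np.argmax(samples))

def recommend_from_neighbor(R, u, v, rng=None):
    """With binary ratings, 'the item v likes most' reduces to a random pick among
    v's known positives that u has no known positive for (ties broken at random)."""
    rng = rng or np.random.default_rng()
    candidates = np.flatnonzero((R[v] == 1) & (R[u] == 0))
    return int(rng.choice(candidates)) if candidates.size else None

def update_counts(alpha, beta, R, u, i, reward):
    """After observing r(u, i), update u's counts for every user w with a positive
    rating for i: a hit if u also liked i, a miss otherwise."""
    raters = np.flatnonzero(R[:, i] == 1)
    alpha[u, raters] += reward
    beta[u, raters] += 1 - reward
```

Here alpha and beta would be n_users × n_users arrays initialised to the priors $\alpha_0$ and $\beta_0$ and grown with the hit/miss sums defined above as ratings are revealed.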

Multi-armed bandits

Dataset               #Users   #Items   #Ratings
FourSquare New York    1,083   38,333    227,428
FourSquare Tokyo       2,293   61,858    573,703
Twitter 1 month        9,511    9,511    650,937
Twitter 200 tweets     9,253    9,253    475,608
MovieLens 1M           6,040    3,706  1,000,209

[Diagram: the interaction cycle between user and system (the system recommends items, the user interacts, and the feedback becomes new input), illustrated on a binary rating matrix $\mathcal{R}: \mathcal{U} \times \mathcal{I} \to \{0, 1, ?\}$ with users $v_1, \dots, v_4$ and items $i_1, \dots, i_5$. Steps: 1) choose the target user u; 2) select a neighbor v; 3) the neighbor v selects the item i to recommend; 4) obtain $r(u,i) \in \{0,1\}$; 5) update u's conditional preferences with respect to all $v \in \mathcal{U}_i = \{w \in \mathcal{U} : r(w,i) \neq\ ?\}$; 6) add r(u,i) to the rating matrix.]

Evaluation approach

Offline evaluation: simulate user feedback using offline data
Extreme cold start: start with no ratings
Random target user selection (one user at a time)
Metric: cumulative recall – the fraction of positive ratings discovered so far (growth curve); the loop is sketched below
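A sketch of this simulation protocol. It assumes a held-back 0/1 matrix of logged positives and a recommender object exposing recommend(u) and update(u, i, reward) methods; these names are illustrative, not taken from the paper:

```python
import numpy as np

def simulate(ground_truth, recommender, n_iters, rng=None):
    """Offline interactive evaluation from an extreme cold start: pick a random user,
    ask for one recommendation, reveal the logged feedback, and track cumulative
    recall = discovered positives / total positives in the log."""
    rng = rng or np.random.default_rng()
    n_users = ground_truth.shape[0]
    total_positives = int(ground_truth.sum())
    discovered, curve = 0, []
    for _ in range(n_iters):
        u = int(rng.integers(n_users))       # random target user, one at a time
        i = recommender.recommend(u)         # assumed never to repeat an item for u
        reward = int(ground_truth[u, i])     # feedback simulated from the offline log
        recommender.update(u, i, reward)     # reveal r(u, i) to the system
        discovered += reward
        curve.append(discovered / total_positives)
    return curve
```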

Algorithms: bandits vs. exploitation-only baselines (full list below)

Data

Implicit feedback datasets
o 2 Twitter follow graphs (1 month, 200 tweets)
o 2 FourSquare venue recommendation datasets (New York, Tokyo)
Explicit ratings dataset: MovieLens 1M
o Ratings ≥ 4 are taken as positive (r(u,i) = 1)

Extension: k neighbors

Given a target user u, select the best k neighbors
Multiple-play Thompson sampling bandit
Pick $\mathcal{N}_k(u)$: the k users maximizing the sampled $p(u|v)$
Recommend the item $i^* = \arg\max_{i \in \mathcal{I}} \sum_{v \in \mathcal{N}_k(u)} p(u|v)\, r(v,i)$
Again, update $\alpha_{w|u}$ for all users $w \in \mathcal{U}$ who have rated i
The extension penalizes exploration in favor of exploitation (see the sketch below)
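A sketch of the k-neighbor selection and item scoring, reusing the alpha/beta counts and the binary matrix R from the single-neighbor sketch above (names are illustrative):

```python
import numpy as np

def recommend_k_neighbors(alpha, beta, R, u, k, rng=None):
    """Multiple-play Thompson sampling: sample p(u|v) for every v, keep the top-k
    neighbors N_k(u), and return argmax_i of sum over v in N_k(u) of p(u|v) * r(v, i)."""
    rng = rng or np.random.default_rng()
    samples = rng.beta(alpha[u], beta[u])
    samples[u] = -np.inf
    neighbors = np.argsort(samples)[-k:]          # N_k(u): the k largest sampled values
    scores = samples[neighbors] @ R[neighbors]    # neighborhood-weighted score per item
    scores[R[u] == 1] = -np.inf                   # skip items u already has a positive for
    return int(np.argmax(scores))
```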

Experiments

Results

Motivation

[Diagram: a multi-armed bandit depicted as a row of slot machines (arms), each yielding a reward. Which arm maximizes the gain?]

Select the best among several actions (arms)
Exploration vs. exploitation
o Select the arm with the highest estimated value (exploit)
o Select an arm to gain knowledge (explore)
Algorithms: ε-greedy, UCB1, Thompson sampling, etc. (an ε-greedy sketch is shown below)
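As a minimal illustration of the exploration vs. exploitation trade-off (and not the method proposed in this poster), an ε-greedy loop over a generic reward function; the pull callback and function name are hypothetical:

```python
import numpy as np

def eps_greedy(pull, n_arms, n_rounds, eps=0.1, rng=None):
    """Generic epsilon-greedy bandit: with probability eps explore a random arm,
    otherwise exploit the arm with the highest average observed reward."""
    rng = rng or np.random.default_rng()
    counts = np.zeros(n_arms)
    means = np.zeros(n_arms)
    for _ in range(n_rounds):
        a = int(rng.integers(n_arms)) if rng.random() < eps else int(np.argmax(means))
        reward = pull(a)                              # observe the reward of the chosen arm
        counts[a] += 1
        means[a] += (reward - means[a]) / counts[a]   # incremental mean update
    return means
```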

Multi-armed bandits    Recommender bandits      Our approach
Arms                   Items i ∈ ℐ              Neighbors v ∈ 𝒰
Rewards                Ratings r(u,i)           Ratings r(u,i)
Estimated arm value    Metric (e.g. CTR)        Cond. preference P(u|v)
Context                Target user u ∈ 𝒰        Target user u ∈ 𝒰

13th ACM Conference on Recommender Systems
Code for this paper is available at https://github.com/ir-uam/kNNBandit


Our approach: nearest-neighbor bandit
Item-oriented bandits: ε-greedy, Thompson sampling
Collaborative filtering: matrix factorization, user-based kNN (cosine)
Sanity checks: most popular, random

Interactive recommendation

More realistic perspective

Recommendation operates in cycles

o Users react to and interact with recommendations

o The system gains further input from this interaction

o The input is used in the following recommendations


[Figure: cumulative recall vs. number of iterations (up to 500,000), comparing the neighbor bandit, the neighbor bandit with k = 10, user-based kNN, Thompson sampling, ε-greedy, implicit MF, most popular and random.]