Contextual Recommendation in Multi-User Devices Raz Nissim, Michal Aharon, Eshcar Hillel, Amit Kagian, Ronny Lempel, Hayim Makabee





Recommendation in Personal Devices and Accounts


Challenge: Recommendations in Shared Accounts and Devices

“I am a 34 yo man who enjoys action and sci-fi movies. This is what my children have done to my netflix account”


Our Focus: Recommendations for Smart TVs


Main problems:
• Inferring who has consumed each item in the past
• Inferring who is currently requesting the recommendations
• “Who” can be a subset of users

Smart TVs can track what is being watched on them

Solution: Using Context


Previous work: time of day

Context in this Work: Current Item Being Watched


This Work: Contextual Personalized Recommendations


WatchItNext problem: it is 8:30pm and “House of Cards” is on. What should we recommend to be watched next on this device?

Implicit assumption: there’s a good chance that whoever is in front of the set now will remain there

Technically, think of an HMM where the hidden state corresponds to who is watching the set, and states don’t change too often

WatchItNext Inputs and Output


• Input: available programs, a.k.a. “line-up”
• Output: ranked recommendations

Recommendation Settings: Exploratory and Habitual

One typically doesn’t buy the same book twice, nor do people typically read the same news story twice

But people listen to the songs they like over and over again, and watch movies they like multiple times as well


In the TV setting, people regularly watch series and sports events

Habitual setting: all line-up items are eligible for recommendation to a device

Exploratory setting: only items that were not previously watched on the device are eligible for recommendation
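The two eligibility rules can be sketched as a simple filter (a sketch with hypothetical item names, not the paper’s code):

```python
def eligible_items(lineup, watched_on_device, setting):
    """Return the items eligible for recommendation under each setting."""
    if setting == "habitual":
        # Habitual: every line-up item is eligible, including re-watches
        # (series episodes, recurring sports events, favorite movies).
        return list(lineup)
    if setting == "exploratory":
        # Exploratory: only items never before watched on this device.
        return [item for item in lineup if item not in watched_on_device]
    raise ValueError(f"unknown setting: {setting}")

lineup = ["news", "cartoon", "match", "series_ep"]   # hypothetical line-up
history = {"series_ep", "news"}                      # hypothetical device history
```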

[Diagram: recommendations at the intersection of Contextual, Personalized, and Popular]


Contextual Recommendations in a Different Context

How can contextualized and personalized recommendations be served together?

Collaborative Filtering:
• A fundamental principle in recommender systems
• Taps similarities in patterns of consumption/enjoyment of items by users
• Recommends to a user what users with detected similar tastes have consumed/enjoyed


Consider a consumption matrix R of users and items:
• r_{u,i} = 1 whenever person u consumed item i; in other cases, r_{u,i} might be person u’s rating of item i
• The matrix R is typically very sparse … and often very large

Collaborative Filtering – Mathematical Abstraction

[Diagram: consumption matrix R (users × items) of size |U| × |I|]

• Real-life task: top-k recommendation – predict which yet-to-be-consumed items the user would most enjoy
• Related task on ratings data: matrix completion – predict users’ ratings for items they have yet to rate, i.e. “complete” missing values


Latent factor models (LFM):
• Map both users and items to some f-dimensional space R^f, i.e. produce f-dimensional vectors v_u and w_i for each user and item
• Define rating estimates as inner products: q_{u,i} = <v_u, w_i>
• Main problem: finding a mapping of users and items to the latent factor space that produces “good” estimates

Collaborative Filtering – Matrix Factorization

[Diagram: factorization R ≈ V·W, where R is |U| × |I|, V is |U| × f, and W is f × |I|]

Closely related to dimensionality reduction techniques for the ratings matrix R (e.g. Singular Value Decomposition)

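As a toy illustration of LFM scoring (random vectors and hypothetical dimensions, not the trained model from this work), rating estimates and top-k ranking reduce to inner products:

```python
import numpy as np

rng = np.random.default_rng(0)
f = 8                        # latent dimensionality (this work uses 80)
V = rng.random((3, f))       # toy user factors: |U| x f
W = rng.random((5, f))       # toy item factors: |I| x f

def score(u: int, i: int) -> float:
    """Rating estimate q_{u,i} = <v_u, w_i>."""
    return float(V[u] @ W[i])

# Top-k recommendation: rank all items for user 0 by estimated score.
top_k = np.argsort([-score(0, i) for i in range(W.shape[0])])[:3]
```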

LFMs’ Rise to Fame: the Netflix Prize


Used extensively by Challenge winners “Bellkor’s Pragmatic Chaos” (2006-2009)

Originally devised as a generative model of documents in a corpus, where documents are represented as bags-of-words

• k is a parameter representing the number of “topics” in the corpus
• V is a stochastic matrix: V[d,t] = P(topic_t | document_d), t=1,…,k
• U is a stochastic matrix: U[t,w] = P(word_w | topic_t), t=1,…,k
• L is a vector holding the documents’ lengths (#words per document)

Latent Dirichlet Allocation (LDA) [Blei, Ng, Jordan 2003]


[Diagram: document–word count matrix (|D| × |W|) factored as V (|D| × k) times U (k × |W|), alongside the length vector L]

In our case: given a parameter k and the collection of devices (=documents) and their viewing history (=bags of shows), output:

• k “profiles”, where each profile is a distribution over items
• An association of each device with a distribution over the profiles

Profiles, hopefully, will represent viewing preferences such as “Kids shows”, “Cooking reality and home improvement”, “News and Late Night”, “History and Science”, and “Redneck reality: fishing & hunting shows, MMA”

A-priori probability of an item being watched on a device:

Score(item|device) = Σ_{profile=1,…,k} P(item|profile) × P(profile|device)

Latent Dirichlet Allocation (cont.)

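A minimal sketch of the a-priori score above, with hypothetical profile and device distributions (the real model learns its profiles from viewing logs; all numbers here are made up):

```python
import numpy as np

# P_item_given_profile[p, i] = P(item_i | profile_p); each row sums to 1.
P_item_given_profile = np.array([
    [0.7, 0.1, 0.1, 0.1],   # e.g. a "Kids shows" profile
    [0.1, 0.6, 0.2, 0.1],   # e.g. "News and Late Night"
    [0.1, 0.1, 0.2, 0.6],   # e.g. "History and Science"
])
# P_profile_given_device[d, p] = P(profile_p | device_d); each row sums to 1.
P_profile_given_device = np.array([
    [0.8, 0.1, 0.1],        # a device dominated by the kids profile
    [0.2, 0.3, 0.5],        # a mixed-taste device
])
n_items = P_item_given_profile.shape[1]

def score(device: int, item: int) -> float:
    """Score(item|device) = sum over profiles of P(item|profile) * P(profile|device)."""
    return float(P_profile_given_device[device] @ P_item_given_profile[:, item])
```

Because each factor is a proper distribution, the scores for a device form a distribution over items.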

Contextualizing Recommendations: Three Main Approaches

1. Contextual pre-filtering: use context to restrict the data to be modeled

2. Contextual post-filtering: use context to filter or weight the recommendations produced by conventional models

3. Contextual modeling: context information is incorporated in the model itself
   • Typically requires denser data due to many more parameters
   • Computationally intensive
   • E.g. Tensor Factorization, Karatzoglou et al., 2010


Main Contribution: “3-Way” Technique

• Learn a standard matrix factorization model (LFM/LDA)
• When recommending to a device d currently watching context item c, score each target item t as follows:

  S(t follows c | d) = Σ_{j=1..k} v_d(j) · w_c(j) · w_t(j)

• With LFM, this requires an additive shift to all vectors to get rid of negative values
• Results in “Sequential LFM/LDA” – a personalized contextual recommender
• The score is high for targets that agree with both the context and the device
• Again – no need to model context or change the learning algorithm; learn as usual, just apply the change when scoring

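The scoring rule can be sketched in a few lines (random non-negative vectors standing in for a trained, shifted LFM model; dimensions are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(7)
k = 8                                  # latent dimensionality (this work uses 80)
v_d = rng.random(k)                    # device vector (non-negative, as after the LFM shift)
W = rng.random((10, k))                # one non-negative vector per item

def three_way_score(v_d, w_c, w_t):
    """S(t follows c | d) = sum_j v_d(j) * w_c(j) * w_t(j)."""
    return float(np.sum(v_d * w_c * w_t))

context = 2                            # item currently on the screen
lineup = [t for t in range(W.shape[0]) if t != context]
ranked = sorted(lineup,
                key=lambda t: three_way_score(v_d, W[context], W[t]),
                reverse=True)          # ranked recommendations for this device
```

Note the model itself is unchanged; the 3-way product is applied only at scoring time.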

Data: Historical Viewing Logs
• Triplets of the form (device ID, program ID, timestamp)
• Don’t know who watched the device at that time
• Actually, don’t know whether anyone watched at all

[Illustration: a device’s viewing timeline – is anyone watching?]


Data by the Numbers
• Training data: three months’ worth of viewership data
• Test data: derived from one month of viewership data


* Items are {movie, sports event, series} – not at the individual episode level

Devices | Unique items* | Triplets
339,647 | 17,232 | More than 19M

Setting | Test Instances | Average Line-up Size
Habitual | ~3.8M | 390
Exploratory | ~1.7M | 349

Metric: Avg. Rank Percentile (ARP)

Note: with large line-ups, ARP is practically equivalent to average AUC


[Illustration: four candidate items with rank percentiles 1.0, 0.75, 0.50, and 0.25; the item actually watched next determines the instance’s RP]

Rank Percentile properties:
• Ranges in (0,1]
• Higher is better
• Random scores ~0.5 in large line-ups
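A small sketch of the metric (toy scores, not the paper’s data): the watched item’s rank percentile is 1.0 when it is scored highest, and ARP is the mean RP over all test instances.

```python
def rank_percentile(scores, watched_idx):
    """RP of the item actually watched next: 1.0 when it is ranked first,
    approaching 0 as it falls to the bottom of the line-up."""
    n = len(scores)
    # Rank 1 = highest score (ties ignored in this toy version).
    rank = sorted(scores, reverse=True).index(scores[watched_idx]) + 1
    return (n - rank + 1) / n

# Line-up of 4: the watched item received the 2nd-highest score -> RP = 0.75.
rp = rank_percentile([0.9, 0.7, 0.4, 0.1], watched_idx=1)

# ARP is simply the mean RP over test instances (toy values here).
arp = sum([1.0, 0.75, 0.5]) / 3
```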

Baselines


Name | Personalized? | Contextual?
General popularity | No | No
Sequential popularity | No | Yes
Temporal popularity | No | Yes
Device popularity* | Yes | No
LFM | Yes | No
LDA | Yes | No

* Only applicable to habitual recommendations

Contextual Personalized Recommenders


SequentialLDA [LFM]: 3-way element-wise multiplication of the device, context-item, and target-item vectors

TemporalLDA[LFM]: regular LDA/LFM score, multiplied by Temporal Popularity

TempSeqLDA[LFM]: 3-way score multiplied by Temporal Popularity

All LDA/LFM models are 80-dimensional

Results (1): Sequential Context Matters

Degradation when using a random item as context indicates that the correct context item reflects the current viewing session, and implicitly the current watchers of the device

Results (2): Sequential Context Matters

Device Entropy: the entropy of p(topic | device) as computed by LDA on the training data; high values correspond to diverse distributions


Results (3) - Exploratory Setting


Results (4) - Habitual Setting



Conclusions
• Multi-user or shared devices pose challenging recommendation problems
• TV recommendations are characterized by two use cases – habitual and exploratory
• Sequential context helps – it “narrows” the topical variety of the program to be watched next on the device
• Intuitively, context serves to implicitly disambiguate the current user or users of the device
• The 3-Way technique is an effective way of incorporating sequential context that has no impact on learning
• Future: explore applications of Hidden Topic Markov Models [Gruber, Rosen-Zvi, Weiss 2007]


Thank You – Questions?


rlempel [at] yahoo-inc [dot] com