Optimal Radio Channel Recommendations with Explicit and Implicit Feedback


Omar Moling, Free University of Bozen-Bolzano

Linas Baltrunas, Telefonica Research

Francesco Ricci, Free University of Bozen-Bolzano

Dublin, RecSys 2012, 11 September

Issue #1

• Usually RSs run "server-side" [1]

• We need more client-side RSs, allowing the user to (dynamically) choose the content providers to take items from [2]

• For example: music can be streamed from several alternative internet radio channels

[1] G. Adomavicius and A. Tuzhilin. An Architecture of e-butler: A consumer-centric online personalization system. International Journal of Computational Intelligence and Applications, 2(3):313-327, 2002.

[2] F. J. Martin, J. Donaldson, A. Ashenfelter, M. Torrens, and R. Hangartner. The big promise of recommender systems. AI Magazine, 32(3):19-27, 2011.

Optimal Radio Channel Recommendations with Explicit and Implicit Feedback - RecSys12 - Omar Moling, Linas Baltrunas, Francesco Ricci

Issue #2

• Sequential recommendations [1]

• In some domains, items are consumed in a sequence: music, books, games, travels

• Recommendations should take the sequence into account

• Ex: Music preferences usually change during a listening session and are influenced by the music listened to so far

• "I do love Stravinsky but after one hour of that music I need something different …"

[1] G. Shani, D. Heckerman, and R. I. Brafman. An MDP-based recommender system. Journal of Machine Learning Research, 6:1265-1295, 2005.


Issue #3

• Explicit preferences (ratings) are typically used in RSs [1]

• There is a trend towards using "implicit" feedback, i.e. user actions that are interpreted by the system as preferences

• Example 1: total listening time for an artist

• Example 2: in comparison-based approaches, selected items are considered better than those only viewed

[1] D. Oard and J. Kim. Implicit feedback for recommender systems. In Proceedings of the AAAI Workshop on Recommender Systems, pages 81-83, 1998.


Application scenario

• I drive my car listening to a radio channel

• Then it starts to rain heavily and I slow down: I will be late

• My mood changes, my "situational" music preferences may change too

• I could switch to another radio channel

• Or get irritated because I no longer like that music

• A truly intelligent system should do that for me: detect a situation change, e.g., by recognizing different listening patterns, and propose the right music for the current situation


Overview

• RLradio

• Experimental Study

• Results


RLradio - Music Preferences


RLradio - Music Player


Baseline System

• RLradio P, probabilistic

• Baseline system which chooses radio channels based on the explicit music preferences entered by the user

Example: [Bar chart: preference percentage for Pop, Rock, Jazz]

on avg.: 50% Pop, 30% Rock, 20% Jazz
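A minimal sketch of how a probabilistic baseline of this kind could pick channels, assuming it samples each channel with probability proportional to the entered preference. The function name `choose_channel` and the sampling scheme are illustrative assumptions, not the RLradio P implementation.

```python
import random


def choose_channel(preferences, rng=random):
    """Sample one channel with probability proportional to its preference.

    `preferences` maps channel name -> preference weight (hypothetical format).
    """
    channels = [c for c, w in preferences.items() if w > 0]
    if not channels:
        raise ValueError("at least one channel needs a positive preference")
    weights = [preferences[c] for c in channels]
    return rng.choices(channels, weights=weights, k=1)[0]


# Preferences from the slide's example: 50% Pop, 30% Rock, 20% Jazz.
prefs = {"Pop": 50, "Rock": 30, "Jazz": 20}
rng = random.Random(0)  # seeded only to make the sketch reproducible
plays = [choose_channel(prefs, rng) for _ in range(10_000)]
shares = {c: plays.count(c) / len(plays) for c in prefs}
```

Over many tracks the empirical shares approach the entered preferences, which is what "on avg." in the example expresses.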


Research Hypothesis

• Is it possible to improve the performance of the baseline system by exploiting the knowledge acquired from the click of the Next button?

• Performance is measured as the average percentage of the track length which is actually listened to
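The performance measure can be illustrated with a short sketch. The log format (pairs of listened seconds and track length) and the function names are assumptions made for this example.

```python
def listened_fraction(listened_seconds, track_seconds):
    """Fraction of a track actually listened to, clamped to [0, 1]."""
    if track_seconds <= 0:
        raise ValueError("track length must be positive")
    return min(max(listened_seconds / track_seconds, 0.0), 1.0)


def average_listening_percentage(plays):
    """Average percentage of track length listened to over a list of plays."""
    if not plays:
        raise ValueError("no plays to average")
    total = sum(listened_fraction(l, t) for l, t in plays)
    return 100.0 * total / len(plays)


# Hypothetical log: one full play, one early skip (Next clicked), one half play.
plays = [(200, 200), (30, 240), (90, 180)]
score = average_listening_percentage(plays)
```

A click on Next early in a track lowers the score, so the measure doubles as the implicit feedback signal.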


Listening sessions

• We have observed users entering preference values for several channels, hence switching channels makes sense

[Chart: frequency of sessions with a given number of channels with non-null preference]

> 500 listening sessions

Reinforcement Learning

[Diagram: interaction loop between the Recommender System and User + Player]

• Action: one among the 9 available channels

• Reward: percentage of the track actually listened to (0, 1, 2)

• State: history and user's music preferences (ex: Pop > Rock)


State Model

• s1-s2: The channels recommended and listened to in the previous two listening steps

• s3-s4: How much the user listened to these tracks, discretized in 3 levels (0-15%, 15-60%, 60-100%)

• s5-s13: The user preference for each channel, discretized in 4 levels (<15%, 15-40%, 40-60%, >60%)

Example of a state:

• prev. channel: Pop; 2-last channel: Rock

• listening levels of these two tracks: p < 15% and 15% < p < 60%

• preferences: Rock > 60%, 15% < Pop < 40%
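The 13-component state can be sketched as a tuple of discretized values. This is an illustrative encoding under stated assumptions: channels are represented by hypothetical indices 0-8 (0 = Pop, 1 = Rock), a value exactly on a threshold is assigned to the upper level (the slide leaves this open), and the concrete percentages in the example are made up to fall inside the levels shown.

```python
from bisect import bisect_right

LISTEN_EDGES = (15, 60)      # thresholds in percent -> levels 0, 1, 2
PREF_EDGES = (15, 40, 60)    # thresholds in percent -> levels 0, 1, 2, 3


def level(value, edges):
    """Index of the interval that `value` falls into (boundary -> upper level)."""
    return bisect_right(edges, value)


def encode_state(prev_channels, prev_listened_pct, preferences_pct):
    """Build (s1, ..., s13) from the last two channels, how much of the last
    two tracks was listened to, and the preference for each of 9 channels."""
    if len(prev_channels) != 2 or len(prev_listened_pct) != 2:
        raise ValueError("exactly two previous listening steps are expected")
    if len(preferences_pct) != 9:
        raise ValueError("one preference per channel (9 channels) is expected")
    s1_s2 = tuple(prev_channels)
    s3_s4 = tuple(level(p, LISTEN_EDGES) for p in prev_listened_pct)
    s5_s13 = tuple(level(p, PREF_EDGES) for p in preferences_pct)
    return s1_s2 + s3_s4 + s5_s13


# Previous channel Pop (0), the one before Rock (1); 10% and 30% listened;
# preferences: Pop 30%, Rock 70%, all other channels 0%.
state = encode_state((0, 1), (10, 30), (30, 70, 0, 0, 0, 0, 0, 0, 0))
```

Using a hashable tuple lets the state serve directly as a dictionary key for transition counts or value tables.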






Experimental Study

• First, a group of users tested the baseline system (RLradio P), which uses only explicit feedback

• We collected data on the user listening behavior, from which

  • we obtained state-transition probabilities

  • we computed the optimal policy with the Policy Iteration algorithm

• Users then used the system (RLradio RL), which starts from the optimal policy and updates it at run time with R-learning
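The two offline steps above can be sketched as follows: estimate transition probabilities and expected rewards from logged tuples, then run Policy Iteration on the estimated model. This is a toy illustration, not the authors' code. The log format, the function names, the discount factor `gamma`, and the two-state example are all assumptions.

```python
from collections import defaultdict


def estimate_model(log):
    """log: iterable of (state, action, reward, next_state) tuples.
    Returns {(s, a): [(next_state, probability, mean_reward), ...]}."""
    counts = defaultdict(lambda: defaultdict(int))
    reward_sum = defaultdict(float)
    for s, a, r, s2 in log:
        counts[(s, a)][s2] += 1
        reward_sum[(s, a, s2)] += r
    model = {}
    for (s, a), nxt in counts.items():
        total = sum(nxt.values())
        model[(s, a)] = [(s2, n / total, reward_sum[(s, a, s2)] / n)
                         for s2, n in nxt.items()]
    return model


def policy_iteration(model, gamma=0.9, tol=1e-8):
    """Policy Iteration on the estimated model (gamma is an assumed discount)."""
    states = sorted({s for s, _ in model}
                    | {s2 for outs in model.values() for s2, _, _ in outs})
    actions = {s: sorted(a for s0, a in model if s0 == s) for s in states}

    def q(s, a, V):
        return sum(p * (r + gamma * V[s2]) for s2, p, r in model[(s, a)])

    # Only states with at least one observed action get a policy entry.
    policy = {s: acts[0] for s, acts in actions.items() if acts}
    V = {s: 0.0 for s in states}
    while True:
        while True:  # policy evaluation (in-place sweeps)
            delta = 0.0
            for s, a in policy.items():
                v = q(s, a, V)
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        stable = True  # policy improvement
        for s in policy:
            best = max(actions[s], key=lambda a: q(s, a, V))
            if q(s, best, V) > q(s, policy[s], V) + 1e-12:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


# Toy log with two abstract states and rewards on the 0-2 scale.
log = [("A", "Rock", 2, "A")] * 3 + [("A", "Rock", 0, "B"), ("A", "Pop", 0, "B"),
                                     ("B", "Rock", 1, "A"), ("B", "Pop", 0, "B")]
model = estimate_model(log)
policy, V = policy_iteration(model)
```

State-action pairs that never occur in the log have no estimate, which is one reason for continuing to learn online afterwards.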


Optimal policy

• The optimal policy chooses, for each state, the actions that jointly maximize the expected cumulative reward obtained in a full interaction session

[Formula: optimal policy] with terms:

• policy in state s

• transition probability from state s and action a to state s'

• expected reward when choosing action a in state s and landing in s'

• state value of state s'

• index of the action with the highest value
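The terms labelled on this slide match the greedy policy-improvement step of Policy Iteration. A plausible reconstruction in standard textbook notation is given here. The symbols and the discount factor \gamma are assumptions, since the slide does not label a discount factor.

```latex
\pi(s) \;=\; \arg\max_{a} \sum_{s'} P^{a}_{ss'} \left[ R^{a}_{ss'} + \gamma \, V(s') \right]
```

Here \pi(s) is the policy in state s, P^{a}_{ss'} the transition probability from state s and action a to state s', R^{a}_{ss'} the expected reward for that transition, V(s') the state value of s', and \arg\max_{a} the index of the action with the highest value.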


R-Learning

• Starting from the optimal policy, the system that we developed updates the channel selection policy using R-Learning

• R-Learning fits continuing tasks with no natural end (music listening, a server receiving new tasks, etc.)

[Formula: R-Learning update] with term:

• state-action value of state s and action a
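A minimal sketch of an R-Learning update, following the common textbook formulation of the algorithm (average-reward learning with a state-action value table and an average-reward estimate). The slide does not show its exact update rule, so the class name, the step sizes `alpha` and `beta`, and the way initial values are supplied are assumptions.

```python
from collections import defaultdict


class RLearner:
    """Tabular R-Learning: values are relative to the average reward rho."""

    def __init__(self, actions, alpha=0.1, beta=0.01, initial_q=None):
        self.actions = list(actions)
        self.alpha = alpha          # step size for state-action values
        self.beta = beta            # step size for the average-reward estimate
        self.rho = 0.0              # estimated average reward per step
        self.q = defaultdict(float)
        if initial_q:               # e.g. values derived from an offline policy
            self.q.update(initial_q)

    def best_value(self, s):
        return max(self.q[(s, a)] for a in self.actions)

    def greedy_action(self, s):
        return max(self.actions, key=lambda a: self.q[(s, a)])

    def update(self, s, a, r, s2):
        """Apply one update after taking action a in s, observing r and s2."""
        target = r - self.rho + self.best_value(s2)
        self.q[(s, a)] += self.alpha * (target - self.q[(s, a)])
        # rho is adjusted only when the taken action is greedy in s.
        if self.q[(s, a)] == self.best_value(s):
            self.rho += self.beta * (
                r - self.rho + self.best_value(s2) - self.best_value(s)
            )


learner = RLearner(["Pop", "Rock"], alpha=0.5, beta=0.5)
learner.update("s", "Pop", 2, "s")    # track fully listened to: reward 2
rho_after_first = learner.rho
learner.update("s", "Rock", 0, "s")   # track skipped early: reward 0
```

Because the values are measured against the average reward rather than discounted, no session end needs to be defined, which suits an open-ended listening stream.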


Experimental Study Summary

1.  First, a group of users tested RLradio P

2.  Then the same group tested RLradio RL

3.  To overcome ordering effects, a second, distinct group of users tested the systems in the opposite order

4.  Users were asked to take a short questionnaire after testing each of the systems

5.  70 users






Avg. Track Listening Time %

• Total implicit feedback items: > 7800

• RLradio P (baseline): 64.35

• RLradio RL: 67.41

• Improvement: 4.76%, statistically significant (p = 0.028)

Avg. Daily Listening Time

• RLradio P: 62.6 minutes

• RLradio RL: 75.5 minutes

• Improvement of 20% with p = 0.043
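The reported improvements are relative gains over the baseline, which can be checked from the figures on the slides. The helper name is illustrative.

```python
def relative_improvement(baseline, new):
    """Relative gain of `new` over `baseline`, in percent."""
    return 100.0 * (new - baseline) / baseline


# Avg. track listening time %: 64.35 (RLradio P) vs 67.41 (RLradio RL)
track_pct_gain = relative_improvement(64.35, 67.41)

# Avg. daily listening time in minutes: 62.6 vs 75.5
daily_time_gain = relative_improvement(62.6, 75.5)
```

The first value is about 4.76%, matching the slide, and the second is about 20.6%, which the slide reports as 20%.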


Percentage of users

• 63% of users had a higher listening percentage with RLradio RL than with the baseline


Online Learning (R-Learning)

• 1606 states were visited collectively

• In 29.8% of these states the initial policy changed

• This indicates that RLradio RL ended up with a channel selection policy different from the initial one

• Confirmed by the analysis of the log files, where several sequence patterns could be recognized

• Example: Assigning high preferences to the Rock and Pop channels leads to a policy which stays on one channel until the listening percentage is high, and then switches to the other


Conclusions

• A novel RS that autonomously switches among a collection of radio channels

• RLradio works client-side and offers items from several content providers

• It exploits and combines explicit preferences and implicit feedback, using Reinforcement Learning

• Research Hypothesis holds

• Increase in the average listening time percentage of the proposed music tracks, compared with a system exploiting only the explicitly entered music preferences


Thank you for your attention. Any questions?

