36
Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks 7th Italian Information Retrieval Workshop Venezia (Italy), May 30-31 2016 Cataldo Musto, Claudio Greco, Alessandro Suglia and Giovanni Semeraro Work supported by the IBM Faculty Award ”Deep Learning to boost Cognitive Question Answering” Titan X GPU used for this research donated by the NVIDIA Corporation 1

Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

Embed Size (px)

Citation preview

Page 1: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

Ask Me Any Rating: A Content-basedRecommender System based onRecurrent Neural Networks7th Italian Information Retrieval WorkshopVenezia (Italy), May 30-31 2016

Cataldo Musto, Claudio Greco, Alessandro Suglia and Giovanni Semeraro

Work supported by the IBM Faculty Award ”Deep Learning to boost Cognitive Question Answering”Titan X GPU used for this research donated by the NVIDIA Corporation

1

Page 2: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

Overview

1. Background

Content-based recommender systems

Neural network models

2. Research work

Ask me Any Rating (AMAR)

Experimental evaluation

3. Conclusions

Lesson-learnt

Vision

2

Page 3: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

Background

Page 4: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

Content-based recommender systems

Consists in matching up the attributes of a user profile withthe attributes of a content object (item) [1]

[1] P. Lops, M. De Gemmis, and G. Semeraro. “Content-based recommender systems:State of the art and trends”. In: Recommender systems handbook. Springer, 2011 3

Page 5: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

Deep learning

Definition

Allows computational models that are composed ofmultiple processing layers to learn representations of data

with multiple levels of abstraction [2]

• Discovers intricate structure in large data sets by using thebackpropagation algorithm [3];

• Leads to progressively more abstract features at higher layers ofrepresentations;

• More abstract concepts are generally invariant to most localchanges of the input;

[2] Y. LeCun, Y. Bengio, and G. Hinton. “Deep learning”. In: Nature 521 (2015)[3] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. “Learning representations by

back-propagating errors”. In: Cognitive modeling (1988)

4

Page 6: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

Recurrent Neural Networks

• Recurrent Neural Networks (RNN) are architectures suitable tomodel variable-length sequential data [4];

• The connections between their units may contain loops whichlet them consider past states in the learning process;

• Their roots are in the Dynamical System Theory in which thefollowing relation is true:

s(t) = f(s(t−1); x(t); θ)

where s(t) represents the current system state computed by ageneric function f evaluated on the previous state s(t−1), x(t)represents the current input and θ are the network parameters.

[4] D. E. Rumelhart, G. E. Hinton, and R. J. Williams. Learning internal representationsby error propagation. Tech. rep. DTIC Document, 1985

5

Page 7: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

RNN pros and cons

Pros

• Appropriate to represent sequential data;• A versatile framework which can be applied to different tasks;• Can learn short-term and long-term temporal dependencies.

Cons

• Vanishing/exploding gradient problem [5];• Difficulties to reach satisfying minima during the optimization of

the loss function;• Difficult to parallelize the training process.

[5] Y. Bengio, P. Simard, and P. Frasconi. “Learning long-term dependencies withgradient descent is difficult”. In: Neural Networks, IEEE Transactions on 5 (1994)

6

Page 8: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

Long Short Term Memory (LSTM)

• A specific RNN introduced to solve the vanishing/explodinggradient problem;

• Each cell presents a complex structure which is more powerfulthan simple RNN cells.

Figure: LSTM architecture [6]

forget gate (f) considers thecurrent input and the previousstate to remove or preserve the

most appropriate information forthe given task

[6] A. Graves, A. Mohamed, and G. Hinton. “Speech recognition with deep recurrentneural networks”. In: Acoustics, Speech and Signal Processing (ICASSP), IEEE 2013

7

Page 9: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

Long Short Term Memory (LSTM)

• A specific RNN introduced to solve the vanishing/explodinggradient problem;

• Each cell presents a complex structure which is more powerfulthan simple RNN cells.

Figure: LSTM architecture [6]

input gate (i) considers thecurrent input and the previous

state to determine how the inputinformation will be used to

update the state cell

[6] A. Graves, A. Mohamed, and G. Hinton. “Speech recognition with deep recurrentneural networks”. In: Acoustics, Speech and Signal Processing (ICASSP), IEEE 2013

7

Page 10: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

Long Short Term Memory (LSTM)

• A specific RNN introduced to solve the vanishing/explodinggradient problem;

• Each cell presents a complex structure which is more powerfulthan simple RNN cells.

Figure: LSTM architecture [6]

output gate (o) considers thecurrent input, the previous state

and the updated state cell togenerate an appropriate output

for the given task

[6] A. Graves, A. Mohamed, and G. Hinton. “Speech recognition with deep recurrentneural networks”. In: Acoustics, Speech and Signal Processing (ICASSP), IEEE 2013

7

Page 11: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

Research work

Page 12: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

Ask Me Any Rating (AMAR)

“Mirror, mirror, here I stand.What is the fairest movie in the

land?”

• Inspired by a neural network modelused to solve Question Answeringtoy tasks [7];

• Name adapted from “Ask MeAnything” [8];

• Very simple Factoid QuestionAnswering system where userprofiles are questions and ratingsare answers.

[7] J. Weston et al. “Towards AI-Complete Question Answering: A Set of PrerequisiteToy Tasks”. In: CoRR abs/1502.05698 (2015)

[8] A. Kumar et al. “Ask Me Anything: Dynamic Memory Networks for NaturalLanguage Processing”. In: CoRR abs/1506.07285 (2015)

8

Page 13: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

Ask Me Any Rating (AMAR)

• Two different modules togenerate:

• User embedding• Item embedding

• User embedding associatedto a user identifier;

• Item embedding generatedfrom an item description;

• Concatenation of user anditem embeddings given to alogistic regression layer topredict the probability of a“like”.

b b bw1 w2 wm

User u Item description id

User LT Word LT

v(u)

v(id)

LSTM LSTM LSTM

v(w1) v(w2) v(wm)

h(w1) h(w2) h(wm)

Mean pooling layer

Concatenation layer

Logistic regression layer

9

Page 14: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

Ask Me Any Rating (AMAR)

User embedding

• An identifier u is associated to each user;• The identifier is given as input to a lookup table (User LT);• User LT converts it to a learnt user embedding v(u).

Item embedding

• Each word w1 . . .wm of the item description id is associated to aunique identifier specific of the item descriptions corpus;

• Words identifiers are given as input to a lookup table (Item LT);• Item LT converts them to learnt words embeddings v(wk);• Words embeddings v(wk) are sequentially passed through anRNN with LSTM cells (LSTM module);

• The LSTM module generates a latent representation h(wk) foreach word;

• A mean pooling layer averages the words representationsgenerating an item embedding v(id) for the item i.

10

Page 15: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

Ask Me Any Rating (AMAR)

“Like” probability estimation

• Item and user embeddings, v(id) and v(u), are concatenated in asingle representation;

• The resulting representation is used as feature for theprediction task;

• A Logistic regression layer is used to estimate the probability ofa “like” given by user u to a specific item i;

• The generated score is used to build a sorted list ofrecommended items for user u.

Optimization criterion

• The neural network is trained minimizing the BinaryCross-entropy loss function.

11

Page 16: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

AMAR extended

• AMAR extended adds to the AMARarchitecture an additional modulefor items genres;

• An identifier gk is associated to eachitem genre;

• Genres identifier are given as inputto a lookup table (Genre LT);

• Genres LT converts them to learntgenres embeddings v(gk);

• A mean pooling layer averages thegenres representations generating agenres embedding v(ig).

g1 g2 gn

Item genres igj

Genre LT

v(u) v(id)

Mean pooling layer

Concatenation layer

Logistic regression layer

b b bv(g1) v(g2) v(gn)

v(ig)

Item description idUser u

b b b

12

Page 17: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

Experimental protocol

• Datasets: Movielens 1M (ML1M) e DBbook;• Text preprocessing: tokenization and stopword removal;• Evaluation strategy: 5-fold cross validation for Movielens 1M,holdout for DBbook;

• Recommendation task: top-N recommendation leveragingbinary user feedback;

• Evaluation strategy for recommendation: TestRatings [9];• Metric: F1-measure evaluated at 5, 10 and 15.

[9] A. Bellogin, P. Castells, and I. Cantador. “Precision-oriented evaluation ofrecommender systems: an algorithmic comparison”. In: Proceedings of the fifth ACMconference on Recommender systems. 2011

13

Page 18: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

ML1M

A film dataset created by the research group GroupLens of theUniversity of Minnesota which contains user ratings on a 5-starsscale.

Each rating has been binarized according to the following formula:

bin_rating(r) ={

1, if r ≥ 40, otherwise

#ratings 1000209#users 6040#item 3301avg ratings per user 31.423avg positive ratings per user 17.985avg negative ratings per user 13.439sparsity 0.95

14

Page 19: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

DBbook

A book dataset released for the Linked open data-enabledrecommender systems: ESWC 2014 challenge [10].

It contains binary user preferences (e.g., I like it, I don’t like it).

#ratings 72371#users 6181#item 8170avg ratings per user 11.392avg positive ratings per user 6.727avg negative ratings per user 4.665sparsity 0.998

[10] T. Di Noia, I. Cantador, and V. C. Ostuni. “Linked open data-enabledrecommender systems: ESWC 2014 challenge on book recommendation”.In: Semantic Web Evaluation Challenge. Springer, 2014

15

Page 20: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

Models configurations

Embedding-based recommenders

W2V Google News (W2V-news)• Method: SG• Embedding size: 300• Corpus: Google News

GloVe• Embedding size: 300• Corpus: Wikipedia 2014 +

Gigaword 5

Baseline recommenders

Item to item CF (I2I) *• Neighbours: 30, 50, 80

User to user CF (U2U) *• Neighbours: 30, 50, 80

SLIM with BPR-Opt (BPRSlim) *TF-IDF

Bayesian Personalized RankingMatrix Factorization (BPRMF) *

• Latent factors: 10, 30, 50Weighted Matrix FactorizationMethod (WRMF) *

• Latent factors: 10, 30, 50

* MyMediaLite implementations 16

Page 21: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

Models configurations

AMAR

• Opt. method: RMSprop [11]• α: 0.9• Learning rate: 0.001

• Epochs: 25;• User embedding size: 10;• Item embedding size: 10;• LSTM output size: 10;• Batch size:

• ML1M: 1536• DBbook: 512

AMAR extended

• Opt. method: RMSprop• α: 0.9• Learning rate: 0.001

• Epochs: 25;• User embedding size: 10;• Item embedding size: 10;• Genre embedding size: 10;• LSTM output size: 10;• Batch size:

• ML1M: 1536• DBbook: 512

[11] T. Tieleman and G. E. Hinton. “rmsprop”. In: COURSERA: Neural Networks forMachine Learning Lecture 6.5 (2012) 17

Page 22: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

DBbook results

0.662 0.662

0.6550.656

0.640.639

0.631

0.636

0.632

0.662

0.62

0.62

0.63

0.63

0.64

0.64

0.65

0.65

0.66

0.66

0.67

AMAR AMARextended

GloVe W2V-News I2I-30 U2U-30 BPRMF-30 WRMF-50 BPRSlim TF-IDF

F1@

10

RECOMMENDER CONFIGURATIONS

Differences statistically significant according to Wilcoxon test (ρ ≤ 0.05)

18

Page 23: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

ML1M results

0.641 0.644

0.575

0.587

0.527 0.525 0.524 0.525

0.548

0.59

0.40

0.45

0.50

0.55

0.60

0.65

AMAR AMAR extended GloVe W2V-News I2I-30 U2U-30 BPRMF-30 WRMF-50 BPRSlim TF-IDF

F1@

10

RECOMMENDER CONFIGURATIONS

Only differences between U2U and GloVe, BPRSlim and GloVe, GloVe and Word2vecare not statistically significant according to Wilcoxon test (ρ ≤ 0.05)

19

Page 24: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

Conclusions

Page 25: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

AMAR pros and cons

Pros

• High improvement on ML1M;• Able to learn more suitable item and user representations for

the recommendation task;• Item and user embeddings are not generated using a simple

mean, but they are adapted during training.

Cons

• It does not deal well with very sparse datasets:• Small improvement on DBbook

• High training times:• DBbook: 50 minutes per epoch• ML1M: 90 minutes per epoch

20

Page 26: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

AMAR Improvements

Optimization

• Use alternative training methods and regularization techniques;• Use pretrained word embeddings;• More appropriate cost functions for top-N recommendation;• Increase embedding dimensions.

Architecture

• Item modeling may be improved by using different neuralnetwork architectures;

• Classification step may be done by using deeper fully connectedlayers.

Additional features

Leverage important data silos to enrich item representations:

• Linked Open Data;• Web and social media. 21

Page 27: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

Thanks for your attention

• Design of recommender systemsusing deep neural networks;

• Experimental evaluation onwell-known datasets on thetop-N recommendation task;

• Higher performance using deepmodels than using shallowmodels.

Alessandro [email protected]

Claudio [email protected]

22

Page 28: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

Technical details(Warning: for geeks only)

Page 29: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

Cross entropy

DefinitionGiven two probability distributions over the same underlying set ofevents, p and q, it measures the average number of bits needed toidentify an event drawn from a set of possibilities, if a codingscheme is used based on an “unnatural” probability distribution q,rather than the “true” distribution p.

Given discrete probability distributions p and q, the cross entropy isdefined as follows:

H(p,q) = −∑xp(x) logq(x)

23

Page 30: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

RNN

Given an input vector x(t), bias vectors b, c and weight matrices U, Vand W, a forward step of an RNN neural network is computed in thisway:

at = b+Wst−1 + Uxtst = tanh(at)ot = c+ Vstpt = softmax(ot)

In this case, the activation function are the hyperbolic tangent(tanh) for the hidden layer and the multinomial logistic function(softmax) for the output layer.

24

Page 31: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

LSTM

The information flow in an LSTM module is much more complex thanthe one in an RNN. The architecture used in this work uses thefollowing equations, presented in [6]:

it = σ(Wxixt +Whiht−1 +Wcict−1 + bi)ft = σ(Wxfxt +Whfht−1 +Wcfct−1 + bf)ct = ftct−1 + it tanh(Wxcxt +Whcht−1 + bc)ot = σ(Wxoxt +Whoht−1 +Wcoct + bo)ht = ot tanh(ct)

where σ is the logistic sigmoid function, and i, f, o and c arerespectively the input gate, forget gate,output gate and cellactivation vectors, all of which are the same size as the hiddenvector h.

25

Page 32: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

Corpus stats

Google News• # tokens: 6B• Vocabulary size: 40K• # matched words:

• DBbook: 44636 (41.52%)• ML1M: 35150 (49.13%)

GloVe• # tokens: 100B• Vocabulary size: 3M• # matched words:

• DBbook: 65013 (60.48%)• ML1M: 49893 (69.74%)

26

Page 33: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

References

Page 34: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

[1] Pasquale Lops, Marco De Gemmis, and Giovanni Semeraro.“Content-based recommender systems: State of the art andtrends”. In: Recommender systems handbook. Springer, 2011.

[2] Yann LeCun, Yoshua Bengio, and Geoffrey Hinton. “Deeplearning”. In: Nature 521 (2015).

[3] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams.“Learning representations by back-propagating errors”. In:Cognitive modeling 5 (1988).

[4] David E Rumelhart, Geoffrey E Hinton, and Ronald J Williams.Learning internal representations by error propagation.Tech. rep. DTIC Document, 1985.

[5] Yoshua Bengio, Patrice Simard, and Paolo Frasconi. “Learninglong-term dependencies with gradient descent is difficult”. In:Neural Networks, IEEE Transactions on 5 (1994).

[6] Alan Graves, Abdel-rahman Mohamed, and Geoffrey Hinton.“Speech recognition with deep recurrent neural networks”. In:

26

Page 35: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

Acoustics, Speech and Signal Processing (ICASSP), 2013 IEEEInternational Conference on. IEEE. 2013.

[7] Jason Weston et al. “Towards AI-Complete Question Answering:A Set of Prerequisite Toy Tasks”. In: CoRR abs/1502.05698 (2015).

[8] Ankit Kumar et al. “Ask Me Anything: Dynamic MemoryNetworks for Natural Language Processing”. In: CoRRabs/1506.07285 (2015).

[9] Alejandro Bellogin, Pablo Castells, and Ivan Cantador.“Precision-oriented evaluation of recommender systems: analgorithmic comparison”. In: Proceedings of the fifth ACMconference on Recommender systems. 2011.

[10] Tommaso Di Noia, Iván Cantador, and Vito Claudio Ostuni.“Linked open data-enabled recommender systems: ESWC 2014challenge on book recommendation”. In: Semantic WebEvaluation Challenge. Springer, 2014.

26

Page 36: Ask Me Any Rating: A Content-based Recommender System based on Recurrent Neural Networks

[11] Tijmen Tieleman and Geoffrey E. Hinton. “rmsprop”. In:COURSERA: Neural Networks for Machine Learning Lecture 6.5(2012).

26