Recommender Systems

Navisro Analytics

@navisro [email protected]

http://www.navisro.com

ACM Data Mining Hackathon

8/18/2012

Collaborative Filtering and Recommender Systems By Navisro Analytics


DESCRIPTION

Overview of recommender systems, the main types of recommender system, and the open-source tools and libraries available.




Capturing the Long Tail…


Recommender Approaches

• Item Hierarchy (You bought a printer, so you will also need ink - BestBuy)

• Collaborative Filtering – User-User Similarity (People like you who bought beer also bought diapers - Target)

• Attribute-based recommendations (You like action movies starring Clint Eastwood, so you might like “The Good, the Bad and the Ugly” - Netflix)

• Collaborative Filtering – Item-Item Similarity (You like Godfather, so you will like Scarface - Netflix)

• Social+Interest Graph Based (Your friends like Lady Gaga, so you will like Lady Gaga; PYMK – Facebook, LinkedIn)

• Model Based Training (SVM, LDA, SVD for implicit features)
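To make the item-item approach concrete, here is a minimal sketch. It assumes cosine similarity between item rating vectors; the toy ratings and the function names are hypothetical and do not come from the slides or from any of the companies named above.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating}.
ratings = {
    "alice": {"Godfather": 5, "Scarface": 4, "Toy Story": 1},
    "bob":   {"Godfather": 4, "Scarface": 5},
    "carol": {"Godfather": 1, "Toy Story": 5},
}

def item_vectors(ratings):
    """Invert user -> item ratings into item -> {user: rating}."""
    items = {}
    for user, user_ratings in ratings.items():
        for item, value in user_ratings.items():
            items.setdefault(item, {})[user] = value
    return items

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def most_similar_items(ratings, target, n=2):
    """Rank all other items by cosine similarity to `target`."""
    items = item_vectors(ratings)
    scores = [(cosine(items[target], vec), other)
              for other, vec in items.items() if other != target]
    return sorted(scores, reverse=True)[:n]

similar = most_similar_items(ratings, "Godfather")
```

In this toy data the item closest to "Godfather" is "Scarface", because the same users rated both highly. A user-user variant would apply the same similarity function to the rows of the original `ratings` dict instead of the inverted one.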


Other/Model-based Approaches

• Slope one recommender

• Latent factor Models for Web Data

– Matrix factorization using SVD, ALS, with Regularization

– LDA, SVM, Bayesian Clustering
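As an illustration of the first bullet, the following is a minimal sketch of a weighted Slope One predictor. The toy ratings and the function names are hypothetical; this is one common formulation, not necessarily the variant the presenters had in mind.

```python
from collections import defaultdict

# Hypothetical toy ratings: user -> {item: rating}.
ratings = {
    "u1": {"A": 5.0, "B": 3.0, "C": 2.0},
    "u2": {"A": 3.0, "B": 4.0},
    "u3": {"B": 2.0, "C": 5.0},
}

def build_deviations(ratings):
    """Average rating difference dev[i][j] = mean(r_i - r_j) over co-raters."""
    diff_sum = defaultdict(lambda: defaultdict(float))
    count = defaultdict(lambda: defaultdict(int))
    for user_ratings in ratings.values():
        for i, r_i in user_ratings.items():
            for j, r_j in user_ratings.items():
                if i != j:
                    diff_sum[i][j] += r_i - r_j
                    count[i][j] += 1
    dev = {i: {j: diff_sum[i][j] / count[i][j] for j in diff_sum[i]}
           for i in diff_sum}
    return dev, count

def predict(user, item, ratings, dev, count):
    """Weighted Slope One prediction of `user`'s rating for `item`."""
    numerator = 0.0
    denominator = 0
    for j, r_j in ratings[user].items():
        if j != item and j in dev.get(item, {}):
            # Weight each deviation by how many users co-rated the pair.
            weight = count[item][j]
            numerator += (dev[item][j] + r_j) * weight
            denominator += weight
    return numerator / denominator if denominator else None

dev, count = build_deviations(ratings)
prediction = predict("u2", "C", ratings, dev, count)
```

Slope One needs no similarity measure and no training loop: it only stores the average pairwise rating differences, which makes it a convenient baseline.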


General Steps

• Problem definition (user-based, item-based, ratings/binary…)

• Data Prep

– Map-Reduce, cleansing, massaging data (input matrix)

– Training set, validation set

• Bias removal (Z-score, mean-centering, log normalization)

• Similarity weights/Neighbors

– Pearson correlation coefficient

– Cosine similarity

– K-nearest neighbor

• Train

– Training model (only in model-based approaches)

• Predict

– Predict missing ratings

– Top-N predictions for every user

• Denormalize

– Reverse of normalization

• Evaluate Accuracy

– Accuracy, Precision, Recall, F1, ROC
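The bias removal and denormalize steps can be sketched as a round trip. This is a minimal example assuming per-user normalization; the sample ratings and the function names are hypothetical.

```python
from statistics import mean, pstdev

def mean_center(user_ratings):
    """Subtract the user's mean rating; return (centered ratings, mean)."""
    mu = mean(user_ratings.values())
    return {item: r - mu for item, r in user_ratings.items()}, mu

def z_score(user_ratings):
    """Mean-center, then divide by the user's standard deviation."""
    mu = mean(user_ratings.values())
    sigma = pstdev(user_ratings.values()) or 1.0  # avoid division by zero
    scaled = {item: (r - mu) / sigma for item, r in user_ratings.items()}
    return scaled, mu, sigma

def denormalize(value, mu, sigma=1.0):
    """Map a normalized value back onto the original rating scale."""
    return value * sigma + mu

# Hypothetical ratings for one user.
user = {"A": 5.0, "B": 3.0, "C": 4.0}
centered, mu = mean_center(user)
scaled, mu_z, sigma = z_score(user)
restored = denormalize(scaled["A"], mu_z, sigma)
```

Normalizing per user removes the effect of users who rate everything high or everything low. Predictions made on the normalized scale have to be passed through `denormalize` before they are shown or evaluated against raw ratings.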


User-based CF

Reference: Recommenderlab vignette, http://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf
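A minimal sketch of user-based CF follows. It assumes Pearson similarity and a mean-centered weighted average over the k nearest neighbors; the toy ratings, the function names, and k=2 are illustrative choices and are not taken from the recommenderlab vignette.

```python
from math import sqrt

# Hypothetical toy ratings: user -> {item: rating}.
ratings = {
    "ann":  {"A": 5.0, "B": 4.0, "C": 1.0},
    "ben":  {"A": 4.0, "B": 5.0, "C": 2.0, "D": 5.0},
    "cara": {"A": 1.0, "B": 2.0, "C": 5.0, "D": 1.0},
    "dan":  {"A": 2.0, "B": 1.0, "C": 4.0, "D": 2.0},
}

def pearson(a, b):
    """Pearson correlation over the items both users rated."""
    common = sorted(a.keys() & b.keys())
    if len(common) < 2:
        return 0.0
    mean_a = sum(a[i] for i in common) / len(common)
    mean_b = sum(b[i] for i in common) / len(common)
    cov = sum((a[i] - mean_a) * (b[i] - mean_b) for i in common)
    var_a = sum((a[i] - mean_a) ** 2 for i in common)
    var_b = sum((b[i] - mean_b) ** 2 for i in common)
    if var_a == 0 or var_b == 0:
        return 0.0
    return cov / sqrt(var_a * var_b)

def predict(user, item, ratings, k=2):
    """Predict a rating from the k most similar users who rated `item`."""
    user_mean = sum(ratings[user].values()) / len(ratings[user])
    neighbors = []
    for other, other_ratings in ratings.items():
        if other != user and item in other_ratings:
            sim = pearson(ratings[user], other_ratings)
            if sim > 0:  # keep only positively correlated users
                neighbors.append((sim, other))
    neighbors = sorted(neighbors, reverse=True)[:k]
    if not neighbors:
        return user_mean
    # Weighted average of the neighbors' deviations from their own means,
    # added back to the target user's mean.
    num = 0.0
    den = 0.0
    for sim, other in neighbors:
        other_mean = sum(ratings[other].values()) / len(ratings[other])
        num += sim * (ratings[other][item] - other_mean)
        den += sim
    return user_mean + num / den

prediction = predict("ann", "D", ratings)
```

In this toy data only "ben" correlates positively with "ann", so the prediction for item "D" is ann's mean rating plus ben's deviation on "D". Falling back to the user's own mean when no neighbor qualifies is one simple choice among several.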


Challenges

• High dimensionality (apply dimensionality reduction, e.g. PCA)

• Input data sparsity, including the cold start problem for new users and items

• Overfitting to training data set (use regularization)

• Data wrangling, in general…
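The regularization point can be illustrated with a small matrix factorization trained by stochastic gradient descent. This is a sketch under stated assumptions: the learning rate, the regularization weight `reg`, the number of factors, and the toy ratings are all hypothetical values chosen for the example.

```python
import random

def factorize(ratings, n_users, n_items, n_factors=2, lr=0.01,
              reg=0.05, epochs=500, seed=0):
    """Fit user and item factors by SGD on squared error with an L2 penalty."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)]
         for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(n_factors)]
         for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
            err = r - pred
            for f in range(n_factors):
                pu, qi = P[u][f], Q[i][f]
                # The `reg` term shrinks the factors toward zero,
                # which limits overfitting to the observed ratings.
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def rmse(ratings, P, Q):
    """Root mean squared error on (user, item, rating) triples."""
    total = 0.0
    for u, i, r in ratings:
        pred = sum(pu * qi for pu, qi in zip(P[u], Q[i]))
        total += (r - pred) ** 2
    return (total / len(ratings)) ** 0.5

# Hypothetical observed ratings as (user index, item index, rating).
observed = [(0, 0, 5.0), (0, 1, 3.0), (1, 0, 4.0), (1, 2, 1.0),
            (2, 1, 1.0), (2, 2, 5.0), (3, 0, 1.0), (3, 2, 4.0)]
P0, Q0 = factorize(observed, 4, 3, epochs=0)  # untrained starting point
P, Q = factorize(observed, 4, 3)
```

Comparing `rmse(observed, P, Q)` with `rmse(observed, P0, Q0)` shows the training error dropping. To check for overfitting, the same comparison should be made on a held-out validation set while varying `reg`.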


Just How Good is your Recommender?

• Evaluation of predicted ratings (Mean Absolute Error, Root Mean Squared Error)

• Evaluation of top-N recommendations

– Mean Absolute Error

– Accuracy

– Precision & Recall (F1 score)

– ROC curve
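These measures are simple to compute by hand. The sketch below assumes held-out ratings paired with predictions for the error measures, and a single user's top-N list for precision, recall and F1; all sample values and function names are hypothetical.

```python
from math import sqrt

def mae(actual, predicted):
    """Mean absolute error over paired ratings."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def rmse(actual, predicted):
    """Root mean squared error over paired ratings."""
    squared = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    return sqrt(squared / len(actual))

def precision_recall_f1(recommended, relevant):
    """Precision, recall and F1 for one user's top-N list."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical held-out ratings and the model's predictions for them.
actual = [4.0, 3.0, 5.0, 1.0]
predicted = [3.5, 3.0, 4.0, 2.0]

# Hypothetical top-3 list and the items the user actually liked.
top_n = ["A", "B", "C"]
liked = ["B", "C", "D", "E"]

precision, recall, f1 = precision_recall_f1(top_n, liked)
```

RMSE penalizes large errors more heavily than MAE, so the two can rank models differently. For top-N lists, the per-user values are usually averaged over all test users.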


Tools


Open Source Tools

Software | Description | Language | URL
Apache Mahout | Hadoop ML library that includes Collaborative Filtering | Java | http://mahout.apache.org/
Cofi | Collaborative Filtering Library | Java | http://www.nongnu.org/cofi/
Crab | Components to create recommender systems | Python | https://github.com/muricoca/crab
easyrec | Recommender for web pages | Java | http://easyrec.org/
LensKit | Collaborative Filtering algorithms from GroupLens Research | Java | http://lenskit.grouplens.org/
MyMediaLite | Recommender system algorithms | C#/Mono | http://mloss.org/software/view/282/
SVDFeature | Toolkit for Feature based Matrix Factorization | C++ | http://mloss.org/software/view/333/
Vogoo PHP LIB | Collaborative Filtering for personalized web sites | PHP | http://sourceforge.net/projects/vogoo/
recommenderlab | R library for developing and testing collaborative filtering systems | R | http://cran.r-project.org/web/packages/recommenderlab/index.html
Scikit-learn | Python module integrating classic ML algorithms in scientific Python packages (numpy, scipy, matplotlib) | Python | http://scikit-learn.org/stable/


recommenderlab

Reference: Recommenderlab vignette, http://cran.r-project.org/web/packages/recommenderlab/vignettes/recommenderlab.pdf


Mahout

// Load user-item preferences from a file
DataModel model = new FileDataModel(new File("data.txt"));

// Construct the list of pre-computed correlations
Collection<GenericItemSimilarity.ItemItemSimilarity> correlations = ...;
ItemSimilarity itemSimilarity = new GenericItemSimilarity(correlations);

// Item-based recommender, wrapped so that results are cached
Recommender recommender = new GenericItemBasedRecommender(model, itemSimilarity);
Recommender cachingRecommender = new CachingRecommender(recommender);
...
// Top 10 recommendations for user ID 1234
List<RecommendedItem> recommendations = cachingRecommender.recommend(1234, 10);


Peter Harrington’s Sample Py Code


References & Reading

• High Level Reading

– Programming Collective Intelligence by Toby Segaran. The 2nd chapter gives a good introduction to collaborative filtering with Python examples (non-SVD).

– Matrix Factorization Techniques for Recommender Systems, Yehuda Koren; Robert Bell; Chris Volinsky, IEEE Computer, 2009, 8

• Singular Value Decomposition (SVD) Reading

– The Singular Value Decomposition, by Jody Hourigan and Lynn McIndoo, Linear Algebra – Math 45. http://online.redwoods.edu/INSTRUCT/darnold/LAPROJ/Fall98/JodLynn/report2.pdf (with Matlab and image examples)

– Numerical Recipes, 3rd Edition, Press et al., 2007, pp. 65-75.


References & Reading (continued)

• Collaborative Filtering Reading

– See papers on research.yahoo.com/Yehuda_Koren

– Collaborative Filtering for Implicit Feedback Datasets, Yifan Hu; Yehuda Koren; Chris Volinsky, IEEE International Conference on Data Mining (ICDM 2008), IEEE, 2008

– Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model, Yehuda Koren, ACM Int. Conference on Knowledge Discovery and Data Mining (KDD’08), 2008

– Collaborative Filtering with Temporal Dynamics, Yehuda Koren, KDD 2009, ACM, 2009

– James Thornton’s CF Blog http://original.jamesthornton.com/cf/

– Apache Mahout Recommender https://cwiki.apache.org/MAHOUT/recommender-documentation.html

– Flexible Collaborative Filtering In Java With Mahout Taste - Philippe Adjiman

– Books, Articles and Tutorials on Mahout/Cofi


Questions?