Agenda
● Introduction
● Mahout / Taste
● Taste Architecture
● Algorithms
● Evaluating algorithms
● Questions?
Recommendation Engines
● Amazon
● StumbleUpon
● YouTube
● Last.fm
● Netflix
● Digg
● Google News
Collaborative Filtering
Clustering
Classification
Is this SPAM?
Users & Items
Preferences
● Explicit: I rate (3 stars)
● Implicit: I am buying
Item-based recommendation
Which books are read by people who also read
User-based recommendation
We've got similar tastes, read any good books?
User neighborhood
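A user neighborhood can be pictured as the N users whose preferences are closest to those of a given user. The following is a minimal sketch in plain Java, not Taste's own implementation: the class `UserNeighborhoodSketch`, its method names, and the distance-based similarity are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Illustrative only: a hypothetical stand-in, not a Taste class.
final class UserNeighborhoodSketch {

    // Similarity in (0, 1] derived from the Euclidean distance over the items
    // both users rated; 0 when the two users share no rated items.
    static double similarity(Map<Long, Float> a, Map<Long, Float> b) {
        double sumOfSquares = 0.0;
        int shared = 0;
        for (Map.Entry<Long, Float> entry : a.entrySet()) {
            Float other = b.get(entry.getKey());
            if (other != null) {
                double diff = entry.getValue() - other;
                sumOfSquares += diff * diff;
                shared++;
            }
        }
        return shared == 0 ? 0.0 : 1.0 / (1.0 + Math.sqrt(sumOfSquares));
    }

    // Returns up to n other users, most similar first.
    // Assumes userId is present as a key in prefs.
    static List<Long> nearest(long userId, int n, Map<Long, Map<Long, Float>> prefs) {
        Map<Long, Float> target = prefs.get(userId);
        List<Long> others = new ArrayList<>();
        for (Long id : prefs.keySet()) {
            if (id != userId) {
                others.add(id);
            }
        }
        // Sort by descending similarity to the target user.
        others.sort((x, y) -> Double.compare(
                similarity(target, prefs.get(y)),
                similarity(target, prefs.get(x))));
        return new ArrayList<>(others.subList(0, Math.min(n, others.size())));
    }
}
```

A user-based recommender would then look at items these neighbors liked that the target user has not seen yet.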
Taste Architecture
DataModel
Recommender
ItemSimilarity or UserSimilarity
234, 854, 4.0
234, 598, 3.0
234, 458, 5.0
235, 289, 4.0
… , … , ...
Preferences CSV file
3 stars
Preferences
● Preference
  ● long userId;
  ● long itemId;
  ● float value;
● PreferenceArray
● Implicit: BooleanUserPreferenceArray & BooleanItemPreferenceArray
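To make the preference triple concrete, here is a small sketch that reads "userId, itemId, value" lines, like those in the preferences CSV file, into objects with the three fields listed above. The class `PreferenceSketch` and its methods are hypothetical and written in plain Java; they are not Taste's Preference or FileDataModel.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a preference: one (user, item, value) triple.
final class PreferenceSketch {
    final long userId;
    final long itemId;
    final float value;

    PreferenceSketch(long userId, long itemId, float value) {
        this.userId = userId;
        this.itemId = itemId;
        this.value = value;
    }

    // Parses one "userId, itemId, value" line, tolerating spaces around commas.
    static PreferenceSketch parseLine(String line) {
        String[] parts = line.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("Expected 3 fields: " + line);
        }
        return new PreferenceSketch(
                Long.parseLong(parts[0].trim()),
                Long.parseLong(parts[1].trim()),
                Float.parseFloat(parts[2].trim()));
    }

    // Parses every non-blank line into a preference, keeping input order.
    static List<PreferenceSketch> parseAll(List<String> lines) {
        List<PreferenceSketch> prefs = new ArrayList<>();
        for (String line : lines) {
            if (!line.trim().isEmpty()) {
                prefs.add(parseLine(line));
            }
        }
        return prefs;
    }
}
```

For implicit data the value column carries no information, which is presumably why the Boolean preference arrays exist.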
DataModels
● FileDataModel
● GenericJDBCDataModel
● MySQLJDBCDataModel
Similarity Algorithms
Class Explicit Implicit
TanimotoCoefficientSimilarity
LogLikelihoodSimilarity
EuclideanDistanceSimilarity
PearsonCorrelationSimilarity
SpearmanCorrelationSimilarity
UncenteredCosineSimilarity
Slope One
TanimotoCoefficientSimilarity
T(A,B) = (#Users preferring A AND B) / (#Users preferring A OR B)
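As a worked example, the Tanimoto coefficient in its usual intersection-over-union form can be computed from the sets of users who prefer each item. This is a plain-Java sketch, not the Taste class itself: `TanimotoSketch` and its method are hypothetical names, and returning 0 when both sets are empty is an assumption of this sketch.

```java
import java.util.HashSet;
import java.util.Set;

// Illustrative only: a hypothetical helper, not TanimotoCoefficientSimilarity.
final class TanimotoSketch {

    // |A ∩ B| / |A ∪ B| over the sets of users preferring item A and item B.
    static double coefficient(Set<Long> usersOfA, Set<Long> usersOfB) {
        Set<Long> both = new HashSet<>(usersOfA);
        both.retainAll(usersOfB);
        // Size of the union via inclusion-exclusion.
        int union = usersOfA.size() + usersOfB.size() - both.size();
        // Assumption of this sketch: no users at all means similarity 0.
        return union == 0 ? 0.0 : (double) both.size() / union;
    }
}
```

With users {1, 2, 3} preferring A and {2, 3, 4} preferring B, two users prefer both and four prefer at least one, so T(A,B) = 2/4 = 0.5. Only the presence of a preference matters, so the measure suits implicit data.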
LogLikelihoodSimilarity
● Hypothesis A = “Items are similar”
● Hypothesis B = “Items are not similar”
● L(A,B) = log (max likelihood A) – log (max likelihood B)
● See “Accurate Methods for the Statistics of Surprise and Coincidence” ~ Ted Dunning
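One common way to turn this into a number is the log-likelihood ratio statistic (G²) over a 2×2 table of user counts, which is twice the difference between the two maximized log-likelihoods. The sketch below assumes that formulation; the class `LogLikelihoodSketch`, the method name, and the k11..k22 parameter names are illustrative and not taken from Taste.

```java
// Illustrative only: a hypothetical helper, not LogLikelihoodSimilarity.
final class LogLikelihoodSketch {

    // k11: users who prefer both items, k12: only item A,
    // k21: only item B,               k22: neither item.
    // Returns G^2 = 2 * sum(k * ln(k * N / (rowTotal * colTotal))).
    static double ratio(long k11, long k12, long k21, long k22) {
        long n = k11 + k12 + k21 + k22;
        long row1 = k11 + k12;
        long row2 = k21 + k22;
        long col1 = k11 + k21;
        long col2 = k12 + k22;
        double sum = term(k11, row1, col1, n)
                + term(k12, row1, col2, n)
                + term(k21, row2, col1, n)
                + term(k22, row2, col2, n);
        return 2.0 * sum;
    }

    // One cell's contribution; an empty cell contributes nothing.
    private static double term(long k, long row, long col, long n) {
        if (k == 0) {
            return 0.0;
        }
        return k * Math.log((double) k * n / ((double) row * col));
    }
}
```

A table in which preferring A tells nothing about preferring B, such as (10, 10, 10, 10), scores 0; the more the counts depart from independence, the larger the score.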
Precomputed Similarities
● MySQLJDBCItemSimilarity
● Generic*Similarity
● GenericItemSimilarity.ItemItemSimilarity
● GenericUserSimilarity.UserUserSimilarity
Recommenders
long itemId = 345;
GenericItemBasedRecommender itemRec = …
itemRec.mostSimilarItems(itemId, 5);
long userId = 103;
GenericUserBasedRecommender userRec = …
userRec.recommend(userId, 5);
● User/Item-based recommendation
● Refresh logic
● Access to DataModel
● Recommended because
Evaluating algorithms
[Figure: original dataset → (Eval %, Train %) → training dataset and test dataset → Recommender → estimated preference compared with actual preference, e.g. 3.0]
● AverageAbsoluteDifferenceRecommenderEvaluator or RMSRecommenderEvaluator
● Evaluation %
● Training %
● RecommenderBuilder
● DataModelBuilder
● DataModel
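Both evaluators come down to comparing estimated preferences with the actual ones held back for testing. The two scores can be sketched in plain Java as follows; `EvaluationSketch` and its methods are hypothetical names, and the arrays stand in for whatever a real evaluator collects from the test dataset.

```java
// Illustrative only: hypothetical helpers, not the Taste evaluator classes.
final class EvaluationSketch {

    // Mean of |estimated - actual| over all test preferences.
    static double averageAbsoluteDifference(double[] estimated, double[] actual) {
        checkInputs(estimated, actual);
        double total = 0.0;
        for (int i = 0; i < estimated.length; i++) {
            total += Math.abs(estimated[i] - actual[i]);
        }
        return total / estimated.length;
    }

    // Square root of the mean squared difference; penalizes large misses more.
    static double rootMeanSquare(double[] estimated, double[] actual) {
        checkInputs(estimated, actual);
        double total = 0.0;
        for (int i = 0; i < estimated.length; i++) {
            double diff = estimated[i] - actual[i];
            total += diff * diff;
        }
        return Math.sqrt(total / estimated.length);
    }

    // Both arrays must be non-empty and pair up one-to-one.
    private static void checkInputs(double[] estimated, double[] actual) {
        if (estimated.length == 0 || estimated.length != actual.length) {
            throw new IllegalArgumentException("Need equally sized, non-empty arrays");
        }
    }
}
```

For estimates {3.0, 4.0, 5.0} against actual values {3.0, 5.0, 3.0}, the differences are 0, 1 and 2, giving an average absolute difference of 1.0 and a root mean square of sqrt(5/3). In both cases lower is better, and 0 means every estimate matched.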
Evaluation Demo
● Helper classes for doing evaluation
● TODO - Evaluation of implicit data
● Suggestions welcome
References
Mahout in Action EAP
http://blog.jteam.nl
Mailing list