Differences in Distributions and Their Effect on Recommendation System Performance
Why Collaborative Filtering Doesn’t Scale
(portions reference Prismatic’s Silicon Valley talk)
History of Recommendation
Overfitting
• Distribution of All Items Across Users
• Distribution of All Items Across All Users in the Future
• Concrete Set of Past Items Across Users
• Concrete Set of Future Items Across Users
Recommender Systems Dilemma
• Set of All Items Possible
• Set of Items Known to Users in the Future
• Set of Items Known to Users in the Past
• Set of Items Recommended By Recommenders
• Items Viewed Or Liked in the Future
• Items Users Viewed Or Rated in the Past
• Items Seen in Ground Truth Without Changes in Item Access
Collaborative Filtering in Music
• Construct correlations between items from the set of past known items
• Generate an estimated distribution for past users across all items
• Hope the ‘errors’ correspond to items future users will like
• The gap between the distributions escalates with the scale of the data
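A minimal sketch of the item-correlation step, assuming a toy user-ratings dictionary (all item names and rating values here are illustrative, not from the talk):

```python
from math import sqrt

# Toy past data: user -> {item: rating} (illustrative values)
ratings = {
    "u1": {"songA": 5, "songB": 4, "songC": 1},
    "u2": {"songA": 4, "songB": 5},
    "u3": {"songB": 2, "songC": 5},
}

def item_vector(item):
    """Collect one item's ratings, keyed by user; unseen items yield {}."""
    return {u: r[item] for u, r in ratings.items() if item in r}

def cosine(a, b):
    """Cosine similarity between two sparse item vectors over shared users."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

sim_ab = cosine(item_vector("songA"), item_vector("songB"))
# An item no past user ever saw has an empty vector, so its similarity
# to everything is 0 -- the correlation step cannot reach it at all.
sim_unseen = cosine(item_vector("songA"), item_vector("songNew"))
```

Note that the correlations are built entirely from the concrete set of past items: any item outside that set gets no signal at all, which is exactly the distribution gap described above.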
Resulting Biases
• Huge number of items: 50%+ of users only ever saw 20 songs a month out of 3 million
• Massive gap between the all-items and known-items distributions
• Cross-validation ground truth assumes those 50%+ of users only ever saw the new top 20 songs in the new set
• Results are reported as if users had seen every item set
• Continuous user testing assumes ‘all items seen’ distributions, but the only new items seen are the recommended ones
• User data itself is a biased subset of the whole
First Generation Problems
• Everyone likes The Beatles or Norah Jones: both are extremely frequent in biased data sets
• Since everyone listened to them before, everyone gets them recommended
• Recommendations usually repeat the top 40 of the data collection
• Users might like novel recommendations, but those will never appear in the cross-validation evaluation set: users never saw them
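The top-40 repetition effect can be reproduced with a toy simulation (the 90/10 play split, catalog size, and item names are assumptions for illustration):

```python
from collections import Counter
import random

random.seed(0)
catalog = [f"song{i}" for i in range(1000)]
top40 = catalog[:40]

# Biased histories: 18 of every 20 plays come from the same 40 items
histories = []
for _ in range(200):
    plays = [random.choice(top40) for _ in range(18)] + \
            [random.choice(catalog) for _ in range(2)]
    random.shuffle(plays)
    histories.append(plays)

# Cross-validation split per user: hold out the last 5 plays as ground truth
train = [h[:-5] for h in histories]
test = [h[-5:] for h in histories]

# Popularity recommender: just the most-played training items
counts = Counter(p for h in train for p in h)
recs = [item for item, _ in counts.most_common(40)]

# Hit rate looks excellent -- but only because the recommender
# repeats the top 40 of the data collection
hits = sum(1 for h in test for p in h if p in recs)
hit_rate = hits / sum(len(h) for h in test)

# A genuinely novel item can never score: it is absent from the ground truth
novel_hits = sum(1 for h in test for p in h if p == "songNew")
```

On this biased data the popularity list scores a high hit rate while any item outside the histories scores exactly zero, so cross-validation can never reward a novel recommendation.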
Problems Over Time
• The ground truth is heavily biased by recommendations controlling the set of known items
• Machine learning, including collaborative filtering, learns the algorithm’s distribution more than the users’ preferences
Performance Bias
• Future ground truth comes only from those who stayed in the system: they liked the system
• It doesn’t represent those who were unhappy and left
• Biases data toward keeping existing users happy without regard to ex-users
• In extreme cases, even new users are discarded
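A toy feedback loop shows how recommendations come to control the set of known items (the catalog size, list length of 50, play counts, and round count are arbitrary assumptions):

```python
from collections import Counter
import random

random.seed(1)
catalog = list(range(500))

# Round 0: organic listening spread across the whole catalog (toy data)
plays = [random.choice(catalog) for _ in range(2000)]
organic_variety = len(set(plays))

exposure = []
for _ in range(5):
    # The recommender surfaces only the 50 most-played items ...
    counts = Counter(plays)
    shown = [item for item, _ in counts.most_common(50)]
    # ... so the next round's "ground truth" can only contain shown items
    plays = [random.choice(shown) for _ in range(2000)]
    exposure.append(len(set(plays)))
```

After one round the ground truth collapses from nearly the whole catalog to at most the 50 recommended items: any model trained on later rounds is learning the recommender’s distribution, not the users’ preferences.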
Best Solution So Far
• Past data → idealized future distribution
• Idealized function: feature value => rating
Best Solution So Far
• Requires all items be categorized and quantized
• Requires accuracy and general agreement on these values (socially defined versus absolute)
• At least all features are present in all sets
• Transforms recommendation into optimization and personalization: the set of items with the highest score for a user
• Ability to predict poorly performing product or agent solutions
• Better able to incorporate additional data
• Prediction is usually linear time over the number of items
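A sketch of the idealized feature-value => rating function as a linear scorer (the feature names, values, and weights are invented for illustration):

```python
# Each item is described by the same quantized features (illustrative)
items = {
    "songA": {"tempo": 0.9, "acoustic": 0.1, "vocal": 0.7},
    "songB": {"tempo": 0.2, "acoustic": 0.8, "vocal": 0.4},
    "songC": {"tempo": 0.6, "acoustic": 0.5, "vocal": 0.9},
}

# A per-user weight vector, assumed already learned from past ratings
user_weights = {"tempo": 1.0, "acoustic": -0.5, "vocal": 0.8}

def score(features, weights):
    """Linear feature -> rating function; one pass over the features."""
    return sum(weights[f] * v for f, v in features.items())

# Prediction is linear in the number of items: score each item, then rank
ranked = sorted(items, key=lambda i: score(items[i], user_weights), reverse=True)

# A brand-new item with known features can be scored immediately:
# no item cold start, because all features are present in all sets
new_score = score({"tempo": 0.5, "acoustic": 0.5, "vocal": 0.5}, user_weights)
```

Because every item, including a brand-new one, carries the same feature set, recommendation becomes per-user optimization: one score per item, hence linear time over the number of items.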
Evaluation Adjustments
• No replacement for real-world A/B testing
• Machine learning for the evaluation, not just the question
• Hidden dependencies and ‘cheating’
(Diagram: Learned Algorithm → Model Training → Evaluation Model, with the Business Objective and Ground Truth feeding Model Training)