Differences in Distributions and Their Effect on Recommendation System Performance
Why Collaborative Filtering Doesn’t Scale
(portions reference Prismatic’s Silicon Valley talk)
History of Recommendation
Overfitting
• Distribution of All Items Across Users
• Distribution of All Items Across All Users in the Future
• Concrete Set of Past Items Across Users
• Concrete Set of Future Items Across Users
Recommender Systems Dilemma
• Set of All Items Possible
• Set of Items Known to Users in the Future
• Set of Items Known to Users in the Past
• Set of Items Recommended By Recommenders
• Items Viewed Or Liked in the Future
• Items Users Viewed Or Rated in the Past
• Items Seen in Ground Truth Without Changes in Item Access
Collaborative Filtering in Music
• Construct correlations between items from the set of past known items
• Generate an estimated distribution for past users across all items
• Hope the ‘errors’ correspond to items future users will like
• The gap between the distributions escalates with the scale of the data
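A minimal sketch of the item-correlation step, assuming a toy user-ratings dictionary (all item names and rating values here are illustrative, not from the talk):

```python
from math import sqrt

# Toy past data: user -> {item: rating} (illustrative values)
ratings = {
    "u1": {"songA": 5, "songB": 4, "songC": 1},
    "u2": {"songA": 4, "songB": 5},
    "u3": {"songB": 2, "songC": 5},
}

def item_vector(item):
    """Collect one item's ratings, keyed by user; unseen items yield {}."""
    return {u: r[item] for u, r in ratings.items() if item in r}

def cosine(a, b):
    """Cosine similarity between two sparse item vectors over shared users."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[u] * b[u] for u in common)
    norm_a = sqrt(sum(v * v for v in a.values()))
    norm_b = sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b)

sim_ab = cosine(item_vector("songA"), item_vector("songB"))
# An item no past user ever saw has an empty vector, so its similarity
# to everything is 0 -- the correlation step cannot reach it at all.
sim_unseen = cosine(item_vector("songA"), item_vector("songNew"))
```

Note that the correlations are built entirely from the concrete set of past items: any item outside that set gets no signal at all, which is exactly the distribution gap described above.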
Resulting Biases
• Huge number of items: 50%+ of users only ever saw 20 songs a month out of 3 million
• Massive gap between the all-items and known-items distributions
• Cross-validation ground truth assumes those 50%+ of users only ever saw the new top 20 songs in the new set
• Results are reported as if users had seen every item set
• Continuous user testing assumes ‘all items seen’ distributions, but the only new items seen are the recommended ones
• User data itself is a biased subset of the whole
First Generation Problems
• Everyone likes The Beatles or Norah Jones: both are extremely frequent in biased data sets
• Since everyone listened to them before, everyone gets them recommended
• Recommendations usually repeat the top 40 of the data collection
• Users might like novel recommendations, but those will never appear in the cross-validation evaluation set: users never saw them
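The top-40 repetition effect can be reproduced with a toy simulation (the 90/10 play split, catalog size, and item names are assumptions for illustration):

```python
from collections import Counter
import random

random.seed(0)
catalog = [f"song{i}" for i in range(1000)]
top40 = catalog[:40]

# Biased histories: 18 of every 20 plays come from the same 40 items
histories = []
for _ in range(200):
    plays = [random.choice(top40) for _ in range(18)] + \
            [random.choice(catalog) for _ in range(2)]
    random.shuffle(plays)
    histories.append(plays)

# Cross-validation split per user: hold out the last 5 plays as ground truth
train = [h[:-5] for h in histories]
test = [h[-5:] for h in histories]

# Popularity recommender: just the most-played training items
counts = Counter(p for h in train for p in h)
recs = [item for item, _ in counts.most_common(40)]

# Hit rate looks excellent -- but only because the recommender
# repeats the top 40 of the data collection
hits = sum(1 for h in test for p in h if p in recs)
hit_rate = hits / sum(len(h) for h in test)

# A genuinely novel item can never score: it is absent from the ground truth
novel_hits = sum(1 for h in test for p in h if p == "songNew")
```

On this biased data the popularity list scores a high hit rate while any item outside the histories scores exactly zero, so cross-validation can never reward a novel recommendation.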
Problems Over Time
• The ground truth is heavily biased by recommendations controlling the set of known items
• Machine learning, including collaborative filtering, learns the algorithm’s distribution more than the users’ preferences
Performance Bias
• Future ground truth comes only from those who stayed in the system: they liked the system
• It doesn’t represent those who were unhappy and left
• Biases data toward keeping existing users happy without regard to ex-users
• In extreme cases, even new users are discarded
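A toy feedback loop shows how recommendations come to control the set of known items (the catalog size, list length of 50, play counts, and round count are arbitrary assumptions):

```python
from collections import Counter
import random

random.seed(1)
catalog = list(range(500))

# Round 0: organic listening spread across the whole catalog (toy data)
plays = [random.choice(catalog) for _ in range(2000)]
organic_variety = len(set(plays))

exposure = []
for _ in range(5):
    # The recommender surfaces only the 50 most-played items ...
    counts = Counter(plays)
    shown = [item for item, _ in counts.most_common(50)]
    # ... so the next round's "ground truth" can only contain shown items
    plays = [random.choice(shown) for _ in range(2000)]
    exposure.append(len(set(plays)))
```

After one round the ground truth collapses from nearly the whole catalog to at most the 50 recommended items: any model trained on later rounds is learning the recommender’s distribution, not the users’ preferences.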
Best Solution So Far
• Past data → idealized future distribution
• Idealized function: feature value => rating
Best Solution So Far
• Requires all items be categorized and quantized
• Requires accuracy and general agreement on these values (socially defined versus absolute)
• At least all features are present in all sets
• Transforms recommendation into optimization and personalization: the set of items with the highest score for a user
• Ability to predict poorly performing product or agent solutions
• Better able to incorporate additional data
• Prediction is usually linear time over the number of items
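A sketch of the idealized feature-value => rating function as a linear scorer (the feature names, values, and weights are invented for illustration):

```python
# Each item is described by the same quantized features (illustrative)
items = {
    "songA": {"tempo": 0.9, "acoustic": 0.1, "vocal": 0.7},
    "songB": {"tempo": 0.2, "acoustic": 0.8, "vocal": 0.4},
    "songC": {"tempo": 0.6, "acoustic": 0.5, "vocal": 0.9},
}

# A per-user weight vector, assumed already learned from past ratings
user_weights = {"tempo": 1.0, "acoustic": -0.5, "vocal": 0.8}

def score(features, weights):
    """Linear feature -> rating function; one pass over the features."""
    return sum(weights[f] * v for f, v in features.items())

# Prediction is linear in the number of items: score each item, then rank
ranked = sorted(items, key=lambda i: score(items[i], user_weights), reverse=True)

# A brand-new item with known features can be scored immediately:
# no item cold start, because all features are present in all sets
new_score = score({"tempo": 0.5, "acoustic": 0.5, "vocal": 0.5}, user_weights)
```

Because every item, including a brand-new one, carries the same feature set, recommendation becomes per-user optimization: one score per item, hence linear time over the number of items.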
Evaluation Adjustments
• No replacement for real-world A/B testing
• Machine learning for the evaluation, not just the question
• Hidden dependencies and ‘cheating’
(Diagram: Learned Algorithm → Model Training → Evaluation Model, with the Business Objective and Ground Truth feeding Model Training)