Jake Mannix, Applied Machine Learning Engineer, Twitter: Personalization and Recommenders with Content-Based Approaches

Page 1: Jake Mannix, MLconf 2013

Content-based Recommender Systems

jake@twitter.com · @pbrane · User Interest Modeling, Twitter Inc.

Apache Mahout PMC (previously: LinkedIn, a bunch of tiny startups)

Page 2: Jake Mannix, MLconf 2013

Overview

• Collaborative Filtering == RecSys?

• User/Item content and/or metadata

• RecSys training w/ user/item features

• Advantages / Disadvantages

• Historical production examples from Twitter and LinkedIn

Page 3: Jake Mannix, MLconf 2013

Recommender System

“traditional” RecSys

Page 4: Jake Mannix, MLconf 2013

Aside: “users” don’t have to be users

• At LinkedIn, the Recommender Systems team built a general-purpose entity-to-entity RecSys. Each product below is listed as Product [“user”, “item”]:

• TalentMatch [job posting, user-profile]

• GroupsYouMayLike [user, group]

• {Jobs for your group} [group, job-posting]

• AdsYouMayBeInterestedIn [user, ad]

Anmol Bhasin, Monica Rogati (now VP of Data at Jawbone), and I built these... So how do recommender systems work? Next page: “an artist’s depiction of collaborative filtering.”

Page 5: Jake Mannix, MLconf 2013

Collaborative Filtering

Page 6: Jake Mannix, MLconf 2013

CF is Generic!

• users / items reduced to GUIDs, could be anything

• large body of academic work on techniques:

• SVD, ALS + other matrix factorizations

• stacked RBM, etc.

• General purpose OSS CF recommender:

• Apache Mahout (http://mahout.apache.org)
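To make the genericness concrete, a minimal pure-CF sketch in Python (toy data; the rank and matrix values are invented): factor a small ratings matrix with a truncated SVD and read predicted scores for unobserved cells off the low-rank reconstruction.

    import numpy as np

    # toy ratings matrix: rows = users, cols = items, 0 = unobserved
    R = np.array([[5., 4., 0., 1.],
                  [4., 0., 0., 1.],
                  [1., 1., 0., 5.],
                  [0., 1., 5., 4.]])

    # rank-2 truncated SVD: R ~= U_k . diag(s_k) . Vt_k
    U, s, Vt = np.linalg.svd(R, full_matrices=False)
    k = 2
    R_hat = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

    # predicted preference for user 1 on item 2 (an unobserved cell)
    print(R_hat[1, 2])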

Page 7: Jake Mannix, MLconf 2013

What about Domain-Specific Knowledge?

• Items are more than just GUIDs

• Users are more than just account names

• Perhaps they are both much more:

• user profile text on Facebook / LinkedIn

• webdoc content + metadata

• movie genres, directors, description

• ad landing page content

could be derived data: at Twitter, every piece of text content passing through the system gets classified into topical categories, and users get classified according to their topical interests and things they’re “known for”

Page 8: Jake Mannix, MLconf 2013

User/Item Features

• each user has a feature-vector

• each item has a feature-vector

• dimensionalities may (will!) differ

• collectively, we thus have MOAR Matrices!

next page: more art!

Page 9: Jake Mannix, MLconf 2013

Feature Matrices

we could decompose this resultant user/item-feature matrix...

slight misrepresentation: user-features along rows of the first matrix, columns are user-ids. Note: the “multiplication” here could be actual matrix multiplication, OR a more Bayesian/statistical form: p(user | user-feature), p(item | user), p(item-feature | item) -> p(positive engagement | user-features, item-features). The full joint distribution is HARD. Naive Bayes? Or...
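As a sketch of the literal matrix-multiplication reading (shapes and values invented): pushing the observed preference matrix through the user-feature and item-feature matrices yields a user-feature-by-item-feature affinity matrix.

    import numpy as np

    n_users, n_items = 4, 5
    d_user, d_item = 3, 6                    # dimensionalities may (will!) differ

    Uf = np.random.rand(d_user, n_users)     # user-features along rows, user-ids along columns
    If = np.random.rand(d_item, n_items)     # item-features along rows, item-ids along columns
    R  = np.random.rand(n_users, n_items)    # observed preferences

    # affinity between each user-feature and each item-feature,
    # accumulated across all observed engagements: (d_user x d_item)
    M = Uf @ R @ If.T
    print(M.shape)                           # (3, 6)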

Page 10: Jake Mannix, MLconf 2013

Train a ranker/classifier

• take a column u ∈ R^N of the user-feature matrix

• take the corresponding row v ∈ R^M of the item-feature matrix

• embed the pair in R^(N+M) as x = [u; v]

• train a classifier to predict ratings given x

go back and forth between this page and the previous one; the next page has some notes about this
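A minimal sketch of the embedding-and-training step, assuming scikit-learn and invented data: concatenate each pair’s user and item feature vectors and fit a classifier on the joint vector.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    N, M = 20, 30                            # user-feature and item-feature dims
    rng = np.random.default_rng(0)

    def embed(u, v):
        # u in R^N, v in R^M  ->  x = [u; v] in R^(N+M)
        return np.concatenate([u, v])

    # fake training triples: (user-vector, item-vector, rating)
    X = np.array([embed(rng.random(N), rng.random(M)) for _ in range(500)])
    y = rng.integers(0, 2, size=500)         # stand-in for observed ratings

    clf = LogisticRegression(max_iter=1000).fit(X, y)
    print(clf.predict_proba(X[:1]))          # predicted engagement probability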

Page 11: Jake Mannix, MLconf 2013

Training, cont.

• note: there’s no need for any relationship between the user-features and the item-features

• if you apply a discretization technique, you don’t even need to care about the correlation between +/− values and “goodness/badness”
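One way to realize the discretization point, sketched with scikit-learn’s KBinsDiscretizer (bin counts invented): bucket each raw feature and one-hot the buckets, so the learner fits a weight per bucket and the sign or scale of the raw values stops mattering.

    import numpy as np
    from sklearn.preprocessing import KBinsDiscretizer

    rng = np.random.default_rng(1)
    X_raw = rng.normal(size=(100, 4))   # raw features: signs carry no fixed meaning

    # 5 quantile buckets per feature, one-hot encoded: downstream, the
    # classifier learns one weight per bucket rather than per raw value
    disc = KBinsDiscretizer(n_bins=5, encode="onehot-dense", strategy="quantile")
    X_binned = disc.fit_transform(X_raw)
    print(X_binned.shape)               # (100, 20)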

Page 12: Jake Mannix, MLconf 2013

Classifier/Ranker RecSys HOWTO

• incoming preferences are triples of (user-feature vector, item-feature vector, preference-value)

• train classifier (online if desired!), and

• trained classifier spits out predicted rating given user/item pairs

• note: may require some item preselection

predicted ratings may not be what you want; it may be a Learn-To-Rank setup
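A sketch of this HOWTO as an online pipeline, assuming scikit-learn’s SGDClassifier and invented triples; decision_function yields a raw score rather than a rating, which fits the Learn-To-Rank reading above.

    import numpy as np
    from sklearn.linear_model import SGDClassifier

    N, M = 20, 30
    rng = np.random.default_rng(2)
    clf = SGDClassifier(loss="log_loss")     # online logistic regression

    # consume preference triples (user-vector, item-vector, preference) as they arrive
    for _ in range(1000):
        u, v, pref = rng.random(N), rng.random(M), int(rng.integers(0, 2))
        x = np.concatenate([u, v]).reshape(1, -1)
        clf.partial_fit(x, [pref], classes=[0, 1])

    # score one (preselected) candidate item for a user
    x_new = np.concatenate([rng.random(N), rng.random(M)]).reshape(1, -1)
    print(clf.decision_function(x_new))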

Page 13: Jake Mannix, MLconf 2013

Variations

• What if your features have some structure?

Page 14: Jake Mannix, MLconf 2013

Structured data

• Item = { field1, field2, field3, ... }

• User = { fieldA, fieldB, ... }

• field1: tf-idf-weighted “position description”

• field2 : standardized categorical job title

• field3 : #years experience

• fieldA : tf-idf “job requirements”

• fieldB : #years of experience required

note: this is TalentMatch: here “items” are LinkedIn profiles, and “users” are job postings

Page 15: Jake Mannix, MLconf 2013

Pairwise-field similarity

• Some fields are naturally comparable to others: compute vector cosine, Jaccard similarity, etc.

• Others need a business-specific similarity, e.g. f(#years experience − #years required)

• Each such field pair generates a feature with an as-yet-untrained weight
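A sketch of these pairwise-field features, with TalentMatch-style field names; the text vectors and the shape of the experience function are invented.

    import numpy as np

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

    def experience_sim(years, years_required):
        # business-specific: penalize a shortfall, don't reward surplus (illustrative)
        gap = years - years_required
        return 1.0 if gap >= 0 else max(0.0, 1.0 + gap / 5.0)

    # pretend these tf-idf vectors came out of a shared vectorizer
    jobdesc, headline, currentdesc = (np.random.rand(50) for _ in range(3))

    features = {
        "sim(jobdesc, headline)":    cosine(jobdesc, headline),
        "sim(jobdesc, currentdesc)": cosine(jobdesc, currentdesc),
        "f(years - required)":       experience_sim(7, 5),
    }
    print(features)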

Page 16: Jake Mannix, MLconf 2013

Train a low-dimensional classifier/ranker

• take these O(|item_fields| × |user_fields|) pairwise similarities and feed them into the training of a ranker

• given the low number of features, the result is very interpretable

interpretation: p(user is good for job) = w_{jobtitle,headline} · sim(jobtitle, headline) + w_{jobdesc,headline} · sim(jobdesc, headline) + w_{jobdesc,currentdesc} · sim(jobdesc, currentdesc) + ...
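A sketch of that interpretation in code (features and labels invented): fit a logistic regression on the handful of pairwise similarities, then read one weight per field pair straight off the model.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    names = ["sim(jobtitle, headline)", "sim(jobdesc, headline)",
             "sim(jobdesc, currentdesc)", "f(years - required)"]

    rng = np.random.default_rng(3)
    X = rng.random((200, len(names)))        # stand-in pairwise-similarity features
    y = (X @ np.array([0.5, 1.0, 2.0, 1.5]) > 2.4).astype(int)

    clf = LogisticRegression().fit(X, y)
    for name, w in zip(names, clf.coef_[0]): # one interpretable weight per field pair
        print(f"{name}: {w:+.2f}")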

Page 17: Jake Mannix, MLconf 2013

Content-based RecSys: Pros

• Fixes cold-start problem

• Scales fantastically

• Flexible: can Learn To Rank using LR, SVM, GBDT, whatever

other content approach to cold-start: unsupervised similarity to engaged-with items/users, plus CF. Scaling: use as much data as your classifier/ranker needs to converge well; once trained, this can often be a very low-latency way to generate item scores. Many classifiers are extremely insensitive to the number of input features.

Page 18: Jake Mannix, MLconf 2013

Content-based RecSys: Cons

• Not always very general (although: pairwise crossed features are pretty general)

• Features may be too coarse

• Feature selection may be difficult

• Low-latency from large item sets is hard

• Underweights popularity, similarity to known good items

for item preselection: clustering doesn’t always work well when you’re using crossed features, but tricks like LSH can sometimes help
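A sketch of the LSH trick for preselection (random-hyperplane signatures; all sizes invented): hash items into buckets offline, then only run the expensive ranker over the query’s bucket.

    import numpy as np
    from collections import defaultdict

    rng = np.random.default_rng(4)
    d, n_items, n_planes = 50, 10_000, 8
    items = rng.normal(size=(n_items, d))
    planes = rng.normal(size=(n_planes, d))

    def signature(v):
        # random-hyperplane LSH: one sign bit per hyperplane
        return tuple(bool(b) for b in (planes @ v) > 0)

    buckets = defaultdict(list)
    for i, v in enumerate(items):
        buckets[signature(v)].append(i)

    # request time: rank only the candidates sharing the query's signature
    query = rng.normal(size=d)
    candidates = buckets[signature(query)]
    print(len(candidates), "candidates instead of", n_items)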

Page 19: Jake Mannix, MLconf 2013

Hybrid Models

• Classifiers/Regression models yield scores

• can combine this score with preferences from CF

• alternately, generate top-K items via CF, rerank with your content-based ranker (using CF rank as another feature)
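A sketch of that rerank variant, with stand-in scorers: take CF’s top-K, then rerank with the content model, feeding the CF rank in as one more feature.

    import numpy as np

    rng = np.random.default_rng(5)

    def cf_scores(user_id, item_ids):
        return rng.random(len(item_ids))     # stand-in for a real CF model

    def content_score(user_vec, item_vec, cf_rank):
        # stand-in content ranker; the CF rank enters as just another feature
        x = np.concatenate([user_vec, item_vec, [1.0 / (1.0 + cf_rank)]])
        return float(x.sum())                # pretend this is a trained model

    item_ids = np.arange(1000)
    top_k = list(item_ids[np.argsort(-cf_scores(42, item_ids))][:50])  # CF top-K

    user_vec = rng.random(20)
    item_vecs = {i: rng.random(30) for i in top_k}
    reranked = sorted(top_k, key=lambda i: -content_score(
        user_vec, item_vecs[i], cf_rank=top_k.index(i)))
    print(reranked[:10])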

Page 20: Jake Mannix, MLconf 2013

Examples

• LinkedIn’s original (2010) generic entity-to-entity RecSys was primarily content-based

• Twitter’s #discover product is a hybrid recommender with content, social, and CF features

Note: PYMK (People You May Know) is not primarily content-based. Also: personalized search is naturally a hybrid content-based recommender.

Page 21: Jake Mannix, MLconf 2013

Conclusion

• Free paper title: “On the unreasonable effectiveness of CF on the consumer web”

• But if you do know features about users/items: learn to rank using them!

• This is more common in industry than you might think. But everyone’s got different domain-specific features, so there’s less research about it

CF works absurdly well given how little it knows about the items it’s recommending. The title riffs on Wigner’s “The Unreasonable Effectiveness of Mathematics in the Natural Sciences.”

Page 22: Jake Mannix, MLconf 2013

Questions?

jake@twitter.com · @pbrane

LinkedIn/G+ : jakemannix