Polyvalent recommendations

Recent work in recommendation makes implementation remarkably simple while extending the inputs to multiple kinds of interactions, including interactions with items other than the ones being recommended.

1©MapR Technologies - Confidential

Polyvalent Recommendations

Multiple Kinds of Behavior for Recommending

Multiple Kinds of Things

Contact:
– tdunning@maprtech.com
– @ted_dunning

Slides and such (available late tonight):
– http://www.slideshare.net/tdunning

Hash tags: #mapr #recommendations

Polyvalent recommendation is a new approach to recommendation that is both simpler and much more powerful than traditional approaches. The idea is to combine user, item and content recommendations into a single query that can be implemented with a very simple architecture.

Recommendations

Often known (inaccurately) as collaborative filtering

Actors interact with items
– observe successful interaction

We want to suggest additional successful interactions

Observations inherently very sparse

Examples

Customers buying books (Linden et al)

Web visitors rating music (Shardanand and Maes) or movies (Riedl, et al), (Netflix)

Internet radio listeners not skipping songs (Musicmatch)

Internet video watchers watching >30 s

Dyadic Structure

Functional
– Interaction: actor -> item*

Relational
– Interaction ⊆ Actors x Items

Matrix
– Rows indexed by actor, columns by item
– Value is count of interactions

Predict missing observations

Recommendation Basics

History:

User  Thing
1     3
2     4
3     4
2     3
3     2
1     1
2     1

Recommendation Basics

History as matrix:

(t1, t3) cooccur 2 times, (t1, t4) once, (t2, t4) once

     t1  t2  t3  t4
u1   1   0   1   0
u2   1   0   1   1
u3   0   1   0   1
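The step from the history list to the matrix can be sketched in a few lines of plain Python. This is only an illustration: the function name `history_to_matrix` is made up here, and the matrix is binary because each (user, thing) pair occurs once in this history.

```python
# (user, thing) pairs copied from the "History" slide.
history = [(1, 3), (2, 4), (3, 4), (2, 3), (3, 2), (1, 1), (2, 1)]


def history_to_matrix(pairs):
    """Return a binary users x things matrix as a list of rows."""
    n_users = max(user for user, _ in pairs)
    n_things = max(thing for _, thing in pairs)
    matrix = [[0] * n_things for _ in range(n_users)]
    for user, thing in pairs:
        # Ids on the slide are 1-based, list indices are 0-based.
        matrix[user - 1][thing - 1] = 1
    return matrix


A = history_to_matrix(history)
for row in A:
    print(row)
```

Each printed row corresponds to one user (u1 to u3) and each column to one thing (t1 to t4).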

A Quick Simplification

Users who do h

Also do r

User-centric recommendations

Item-centric recommendations
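The formulas on this slide do not appear in the text. One common reading, assumed here rather than taken from the slide, is that h is a user's history vector over items, A h scores the users who share that history, and A’(A h) carries those scores back to items; regrouping the same product as (A’A) h gives the item-centric form. The sketch below checks that both groupings give the same scores on the small example matrix. The history vector `h` is hypothetical.

```python
# Users x items matrix from the "History as matrix" slide.
A = [
    [1, 0, 1, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
]


def transpose(m):
    return [list(col) for col in zip(*m)]


def mat_vec(m, v):
    return [sum(x * y for x, y in zip(row, v)) for row in m]


def mat_mul(a, b):
    b_cols = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in b_cols] for row in a]


# Hypothetical history: the user has interacted only with t1.
h = [1, 0, 0, 0]

# User-centric: find similar users first, then collect what they did.
user_centric = mat_vec(transpose(A), mat_vec(A, h))

# Item-centric: precompute item-item cooccurrence, then apply it to h.
item_centric = mat_vec(mat_mul(transpose(A), A), h)

print(user_centric, item_centric)
```

The item-centric grouping is the one that lets the items x items matrix be computed ahead of time.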

Recommendation Basics

Cooccurrence

     t1  t2  t3  t4
t1   2   0   2   1
t2   0   1   0   1
t3   2   0   2   1
t4   1   1   1   2
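A minimal sketch of how such a cooccurrence matrix can be computed from the users x items matrix, assuming the matrix is the product A’A (the helper name `cooccurrence` is illustrative):

```python
# Users x items matrix from the "History as matrix" slide.
A = [
    [1, 0, 1, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
]


def cooccurrence(a):
    """Entry [i][j] counts the users who interacted with both item i and item j."""
    n_items = len(a[0])
    return [
        [sum(row[i] * row[j] for row in a) for j in range(n_items)]
        for i in range(n_items)
    ]


K = cooccurrence(A)
for row in K:
    print(row)
```

The diagonal holds each item's own count, so the t3 entry comes out as 2 because t3 appears in the histories of two users.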

Problems with Raw Cooccurrence

Very popular items co-occur with everything
– Welcome document
– Elevator music

That isn’t interesting
– We want anomalous cooccurrence

Recommendation Basics

Cooccurrence

     t1  t2  t3  t4
t1   2   0   2   1
t2   0   1   0   1
t3   2   0   2   1
t4   1   1   1   2

         t3   not t3
t1       2    1
not t1   1    1

Root LLR Details

In R:

entropy = function(k) {
  -sum(k*log((k==0)+(k/sum(k))))
}
rootLLr = function(k) {
  sqrt((entropy(rowSums(k)) + entropy(colSums(k)) - entropy(k))/2)
}

Like sqrt(mutual information * N/2)
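For readers who do not use R, the same computation can be sketched in plain Python. This follows the slide's definition literally, including the division by 2; the `max(0.0, ...)` guard against tiny negative rounding errors is an addition of this sketch, not part of the slide.

```python
from math import log, sqrt


def entropy(counts):
    """Unnormalized entropy: -sum(k * log(k / total)), skipping zero counts."""
    total = sum(counts)
    return -sum(k * log(k / total) for k in counts if k > 0)


def root_llr(table):
    """Root LLR score of a contingency table given as a list of rows."""
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    cells = [k for row in table for k in row]
    value = (entropy(row_sums) + entropy(col_sums) - entropy(cells)) / 2
    # Guard against a tiny negative value caused by floating-point rounding.
    return sqrt(max(0.0, value))


print(round(root_llr([[1, 0], [0, 2]]), 2))        # about 0.98
print(round(root_llr([[10, 0], [0, 100000]]), 2))  # about 7.15
```

A table whose rows and columns are exactly independent, such as `[[1, 1], [1, 1]]`, scores zero.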

Spot the Anomaly

Root LLR is roughly like standard deviations

        A      not A
B       13     1000
not B   1000   100,000
Root LLR: 0.44

        A      not A
B       1      0
not B   0      2
Root LLR: 0.98

        A      not A
B       1      0
not B   0      10,000
Root LLR: 2.26

        A      not A
B       10     0
not B   0      100,000
Root LLR: 7.15

Threshold by Score

Cooccurrence

     t1  t2  t3  t4
t1   2   0   2   1
t2   0   1   0   1
t3   2   0   2   1
t4   1   1   1   2

Threshold by Score

Significant cooccurrence => Indicators

     t1  t2  t3  t4
t1   1   0   0   1
t2   0   1   0   1
t3   0   0   1   1
t4   1   0   0   1
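One way to turn scores into indicators can be sketched as follows. This is an illustration under stated assumptions, not the procedure used to produce the matrix on the slide: the interaction matrix is assumed to be binary, the threshold of 0.9 is arbitrary, and the extra check that a pair cooccurs more often than chance is an addition of this sketch (the unsigned score is also large for pairs that avoid each other). On a three-user toy example the output is not expected to match the slide's matrix.

```python
from math import log, sqrt


def entropy(counts):
    total = sum(counts)
    return -sum(k * log(k / total) for k in counts if k > 0)


def root_llr(table):
    # Same definition as the R snippet on the "Root LLR Details" slide.
    row_sums = [sum(row) for row in table]
    col_sums = [sum(col) for col in zip(*table)]
    cells = [k for row in table for k in row]
    value = (entropy(row_sums) + entropy(col_sums) - entropy(cells)) / 2
    return sqrt(max(0.0, value))


def indicators(a, threshold):
    """Map each item index to the items that cooccur with it anomalously often."""
    n_users = len(a)
    n_items = len(a[0])
    counts = [sum(row[i] for row in a) for i in range(n_items)]
    result = {i: [] for i in range(n_items)}
    for i in range(n_items):
        for j in range(n_items):
            if i == j:
                continue
            k11 = sum(row[i] * row[j] for row in a)  # users with both items
            k12 = counts[i] - k11                    # item i without item j
            k21 = counts[j] - k11                    # item j without item i
            k22 = n_users - k11 - k12 - k21          # users with neither item
            # Keep only pairs that cooccur more often than independence predicts.
            more_than_chance = k11 * n_users > counts[i] * counts[j]
            if more_than_chance and root_llr([[k11, k12], [k21, k22]]) > threshold:
                result[i].append(j)
    return result


A = [
    [1, 0, 1, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 1],
]
print(indicators(A, threshold=0.9))
```

With so few users only the (t1, t3) pair clears the threshold; realistic data would need far more observations per item.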

Decomposition for Cooccurrence

Can use SVD for cooccurrence

But first one or two singular vectors just encode popularity … ignore those

V^T projects items into concept space, V projects back into item space

Thresholding reconstructed cooccurrence matrix is another way to get indicators
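As a reminder of the linear algebra behind this slide (standard SVD identities, not spelled out in the deck), let A be the users x items matrix. The symbols m, for the number of leading singular vectors to ignore, and k, for the number kept after that, are introduced here only for illustration.

```latex
A = U \Sigma V^{T}
\quad\Longrightarrow\quad
A^{T} A = V \Sigma^{2} V^{T} = \sum_{i} \sigma_i^{2}\, v_i v_i^{T}

\hat{K} = \sum_{i=m+1}^{m+k} \sigma_i^{2}\, v_i v_i^{T}
```

Under these assumptions, thresholding the entries of the reconstructed matrix \hat{K} is what would yield indicators.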

What’s right about this?

Virtues of Current State of the Art

Lots of well publicized history
– Netflix, Amazon, Overstock

Lots of support
– Mahout, commercial offerings like Myrrix

Lots of existing code
– Mahout, commercial codes

Proven track record

Well socialized solution

What’s wrong about this?

Cross Occurrence

We don’t have to do co-occurrence

We can do cross-occurrence

Result is cross-recommendation

Fundamental Algorithmics

Cooccurrence: K = A’A

A is users x items, K is items x items

Product has general shape of matrix

K tells us “users who interacted with x also interacted with y”

Fundamental Algorithmic Structure

Cooccurrence

Matrix approximation by factoring

LLR

But Wait ...

Does it have to be that way?

But why not ...

Why just dyadic learning?

Why not triadic learning?

Why not cross learning?

For example

Users enter queries (A)
– (actor = user, item = query)

Users view videos (B)
– (actor = user, item = video)

A’A gives query recommendation
– “did you mean to ask for”

B’B gives video recommendation
– “you might like these videos”

The punch-line

B’A recommends videos in response to a query
– (isn’t that a search engine?)
– (not quite, it doesn’t look at content or meta-data)
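A small sketch of the cross-occurrence product B’A, using made-up data: the query strings, video names and both matrices below are hypothetical. It uses raw counts for brevity; the anomaly scoring described earlier in the deck would normally be applied before using the result.

```python
# Hypothetical labels for the columns of A and B.
queries = ["flamenco", "guitar lessons", "cat videos"]
videos = ["v_flamenco_dance", "v_classical_guitar", "v_kittens"]

A = [  # users x queries: 1 if the user entered the query
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]
B = [  # users x videos: 1 if the user viewed the video (same user order as A)
    [1, 0, 0],
    [1, 1, 0],
    [0, 0, 1],
    [0, 1, 0],
]


def cross_occurrence(b, a):
    """Entry [v][q] counts the users who both viewed video v and entered query q."""
    n_videos, n_queries = len(b[0]), len(a[0])
    return [
        [sum(rb[v] * ra[q] for rb, ra in zip(b, a)) for q in range(n_queries)]
        for v in range(n_videos)
    ]


def videos_for_query(query):
    """Rank videos by how often their viewers also entered the given query."""
    q = queries.index(query)
    scores = [(row[q], name) for row, name in zip(cross_occurrence(B, A), videos)]
    ranked = sorted((s for s in scores if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [name for _, name in ranked]


print(videos_for_query("flamenco"))
```

Nothing in this ranking looks at the text of the query or the video meta-data; it depends only on what the same users did.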

Real-life example

Query: “Paco de Lucia”

Conventional meta-data search results:
– “hombres del paco” times 400
– not much else

Recommendation based search:
– Flamenco guitar and dancers
– Spanish and classical guitar
– Van Halen doing a classical/flamenco riff

Hypothetical Example

Want a navigational ontology?

Just put labels on a web page with traffic
– This gives A = users x label clicks

Remember viewing history
– This gives B = users x items

Cross recommend
– B’A = label to item mapping

After several users click, results are whatever users think they should be

But wait, there’s more!

[Diagram: users x things]

[Diagram: users x thing type 1, users x thing type 2]

Summary

Input: Multiple kinds of behavior on one set of things

Output: Recommendations for one kind of behavior with a different set of things

Cross recommendation is a special case

Now again, without the scary math

Input Data

User transactions
– user id, merchant id
– SIC code, amount
– Descriptions, cuisine, …

Offer transactions
– user id, offer id
– vendor id, merchant id’s
– offers, views, accepts

Derived user data
– merchant id’s
– anomalous descriptor terms
– offer & vendor id’s

Derived merchant data
– local top40
– SIC code
– vendor code
– amount distribution

Cross-recommendation

Per merchant indicators
– merchant id’s
– chain id’s
– SIC codes
– indicator terms from text
– offer vendor id’s

Computed by finding anomalous (indicator => merchant) rates

Search-based Recommendations

Sample document
– Merchant Id
– Field for text description
– Phone
– Address
– Location
– Indicator merchant id’s
– Indicator industry (SIC) id’s
– Indicator offers
– Indicator text
– Local top40

Sample query
– Current location
– Recent merchant descriptions
– Recent merchant id’s
– Recent SIC codes
– Recent accepted offers
– Local top40
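The document and query shapes above can be mimicked with a toy in-memory scorer. This is only a stand-in for a text search engine and does not use any Solr API; the field names, merchant ids, SIC codes and offer ids are all hypothetical, and the score is a plain count of matching indicator terms rather than a real relevance score.

```python
# Hypothetical merchant documents: ordinary meta-data plus indicator fields.
merchants = [
    {
        "merchant_id": "m1",
        "description": "tapas bar",
        "indicator_merchant_ids": {"m7", "m9"},
        "indicator_sic_codes": {"5812"},
        "indicator_offers": {"o3"},
    },
    {
        "merchant_id": "m2",
        "description": "corner grocery",
        "indicator_merchant_ids": {"m4"},
        "indicator_sic_codes": {"5411"},
        "indicator_offers": {"o1"},
    },
    {
        "merchant_id": "m3",
        "description": "airport hotel",
        "indicator_merchant_ids": {"m5"},
        "indicator_sic_codes": {"7011"},
        "indicator_offers": set(),
    },
]


def score(doc, query):
    """Count the query terms that match the document, field by field."""
    return sum(len(doc.get(field, set()) & terms) for field, terms in query.items())


def recommend(docs, query, limit=10):
    """Return ids of matching merchants, best score first."""
    scored = [(score(doc, query), doc["merchant_id"]) for doc in docs]
    ranked = sorted((s for s in scored if s[0] > 0), key=lambda s: (-s[0], s[1]))
    return [merchant_id for _, merchant_id in ranked[:limit]]


# Hypothetical query built from a user's recent behavior.
recent_behavior = {
    "indicator_merchant_ids": {"m7"},
    "indicator_sic_codes": {"5812", "5411"},
    "indicator_offers": set(),
}
print(recommend(merchants, recent_behavior))
```

The point of the design is that recommendation becomes an ordinary search: the user's recent history is the query and the indicator fields are what it is matched against.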

[Diagram: indexing. Labels: Complete history, Cooccurrence (Mahout), Item meta-data, Solr indexers, Solr indexing, Index shards]

[Diagram: search. Labels: User history, Web tier, Item meta-data, Solr indexers, Solr search, Index shards]

Objective Results

At a very large credit card company

History is all transactions, all web interaction

Processing time cut from 20 hours per day to 3

Recommendation engine load time decreased from 8 hours to 3 minutes

Recommendation quality increased visibly

We are hiring!

Thank You
