Buzz Words Dunning Multi Modal Recommendations


DESCRIPTION

Multi-modal recommendation engines use multiple kinds of behavior as input and can be implemented using standard search engine technology. I show how and why, starting with basic recommendations and working all the way through to full multi-modal systems.


1©MapR Technologies - Confidential

Multi-Modal Recommendations


Multiple Kinds of Behavior for Recommending

Multiple Kinds of Things


What’s Up

What is this multi-modal stuff?
A simple recommendation architecture
Some scary math
Putting it into a deployable architecture
Final thoughts


Contact:
– tdunning@maprtech.com
– @ted_dunning
– @apachemahout
– user-subscribe@mahout.apache.org

Slides and such (available late tonight):
– http://www.slideshare.net/tdunning

Hash tags: #bbuzz #mapr #recommendations


Recommendations

Often known (inaccurately) as collaborative filtering
Actors interact with items
– observe successful interaction

We want to suggest additional successful interactions
Observations inherently very sparse


Examples of Recommendations

Customers buying books (Linden et al)
Web visitors rating music (Shardanand and Maes) or movies (Riedl, et al), (Netflix)
Internet radio listeners not skipping songs (Musicmatch)
Internet video watchers watching >30 s (Veoh)


What is this multi-modal stuff?

But people don’t just do one thing

One kind of behavior is useful for predicting other kinds

Having a complete picture is important for accuracy

What has the user said, viewed, clicked, closed, bought lately?


A simple recommendation architecture

Look at the history of interactions

Find significant item cooccurrence in user histories

Use these cooccurring items as “indicators”

For all indicators in user history, add up scores
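The last step, adding up scores for indicators found in a user's history, can be sketched as below. This is only an illustration: the indicator table, the `recommend` function, and the default weight of 1.0 per matching indicator are assumptions, not something the slides specify.

```python
from collections import defaultdict

# Hypothetical indicator table: item -> set of indicator items.
# In practice these would come from the cooccurrence analysis.
INDICATORS = {
    "t1": {"t3"},
    "t3": {"t1"},
    "t4": {"t2", "t3"},
}

def recommend(history, indicators, weights=None):
    """Score each unseen item by summing the weights of its indicators found in the history."""
    weights = weights or {}
    seen = set(history)
    scores = defaultdict(float)
    for item, item_indicators in indicators.items():
        if item in seen:
            continue  # do not recommend what the user already has
        for indicator in item_indicators & seen:
            # Assumed default: every matching indicator counts 1.0.
            scores[item] += weights.get((item, indicator), 1.0)
    # Highest score first; ties broken by item name for stable output.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))

print(recommend(["t2", "t3"], INDICATORS))
```

A search engine performs essentially this sum when the history is issued as a query against indicator fields, which is why the approach maps onto standard search technology.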


Recommendation Basics

History:

User  Thing
1     3
2     4
3     4
2     3
3     2
1     1
2     1


Recommendation Basics

History as matrix:

    t1  t2  t3  t4
u1   1   0   1   0
u2   1   0   1   1
u3   0   1   0   1

(t1, t3) cooccur 2 times, (t1, t4) once, (t2, t4) once, (t3, t4) once
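The pair counts quoted on this slide can be reproduced directly from the (user, thing) history. The sketch below is a minimal pure-Python version; the function name `cooccurrence` is an illustrative choice, and a production system would use a distributed job rather than an in-memory loop.

```python
from collections import Counter
from itertools import combinations

# (user, thing) pairs from the history slide.
HISTORY = [(1, 3), (2, 4), (3, 4), (2, 3), (3, 2), (1, 1), (2, 1)]

def cooccurrence(history):
    """Count, for each unordered pair of things, how many users interacted with both."""
    by_user = {}
    for user, thing in history:
        by_user.setdefault(user, set()).add(thing)
    counts = Counter()
    for things in by_user.values():
        for a, b in combinations(sorted(things), 2):
            counts[(a, b)] += 1
    return counts

counts = cooccurrence(HISTORY)
for pair, n in sorted(counts.items()):
    print(pair, n)
```

These are the off-diagonal entries of the product of the transposed history matrix with itself.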


A Quick Simplification

Users who do h

Also do r

User-centric recommendations

Item-centric recommendations


Recommendation Basics

Cooccurrence

    t1  t2  t3  t4
t1   2   0   2   1
t2   0   1   0   1
t3   2   0   2   1
t4   1   1   1   2


Problems with Raw Cooccurrence

Very popular items co-occur with everything
– Welcome document
– Elevator music

That isn’t interesting
– We want anomalous cooccurrence


Recommendation Basics

Cooccurrence

    t1  t2  t3  t4
t1   2   0   2   1
t2   0   1   0   1
t3   2   0   2   1
t4   1   1   1   2

         t3   not t3
t1        2        1
not t1    1        1


Spot the Anomaly

Root LLR is roughly like standard deviations

         A       not A
B        13      1000
not B    1000    100,000
Root LLR: 0.44

         A       not A
B        1       0
not B    0       2
Root LLR: 0.98

         A       not A
B        1       0
not B    0       10,000
Root LLR: 2.26

         A       not A
B        10      0
not B    0       100,000
Root LLR: 7.15


Root LLR Details

In R

  entropy = function(k) {
    -sum(k*log((k==0)+(k/sum(k))))
  }
  rootLLr = function(k) {
    sign = …
    sign * sqrt((entropy(rowSums(k)) + entropy(colSums(k)) - entropy(k))/2)
  }

Like sqrt(mutual information * N/2)
See http://bit.ly/16DvLVK
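For readers who prefer Python, the same computation can be sketched as follows. It keeps the slide's scaling (entropy difference divided by 2). The slide elides how the sign is chosen, so the sign rule here is an assumption: the score is negated when A and B occur together less often than independence would predict.

```python
import math

def entropy(counts):
    """Unnormalized entropy, -sum(k * log(k / N)); zero counts contribute nothing."""
    total = sum(counts)
    return -sum(k * math.log(k / total) for k in counts if k > 0)

def root_llr(k11, k12, k21, k22):
    """Signed root LLR for a 2x2 table, scaled as in the R snippet on the slide."""
    row = entropy([k11 + k12, k21 + k22])
    col = entropy([k11 + k21, k12 + k22])
    mat = entropy([k11, k12, k21, k22])
    # Clamp tiny negative values caused by floating-point rounding.
    score = math.sqrt(max(0.0, row + col - mat) / 2)
    # ASSUMPTION (not in the slides): negative sign when the joint count
    # is below what independent A and B would give.
    total = k11 + k12 + k21 + k22
    expected = (k11 + k12) * (k11 + k21) / total
    return -score if k11 < expected else score

# The four tables from the "Spot the Anomaly" slide.
for table in [(13, 1000, 1000, 100000), (1, 0, 0, 2),
              (1, 0, 0, 10000), (10, 0, 0, 100000)]:
    print(table, root_llr(*table))
```

Under these assumptions the four results agree with the scores on the "Spot the Anomaly" slide to within about 0.01.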


Threshold by Score

Cooccurrence

    t1  t2  t3  t4
t1   2   0   2   1
t2   0   1   0   1
t3   2   0   2   1
t4   1   1   1   2


Threshold by Score

Significant cooccurrence => Indicators

    t1  t2  t3  t4
t1   1   0   0   1
t2   0   1   0   1
t3   0   0   1   1
t4   1   0   0   1


So Far, So Good

Classic recommendation systems based on these approaches
– Musicmatch (ca 2000)
– Veoh Networks (ca 2005)

Currently available in Mahout
– See RowSimilarityJob

Very simple to deploy
– Compute indicators
– Store in search engine
– Works very well with enough data


What’s right about this?


Virtues of Current State of the Art

Lots of well publicized history
– Musicmatch, Veoh, Netflix, Amazon, Overstock

Lots of support
– Mahout, commercial offerings like Myrrix

Lots of existing code
– Mahout, commercial codes

Proven track record
Well socialized solution


What’s wrong about this?


Too Limited

People do more than one kind of thing
Different kinds of behaviors give different quality, quantity and kind of information

We don’t have to do co-occurrence
We can do cross-occurrence

Result is cross-recommendation


Heh?


Symmetry Gives Cross Recommendations

Why just dyadic learning?

Why not triadic learning?
Why not cross learning?


For example

Users enter queries (A)
– (actor = user, item = query)

Users view videos (B)
– (actor = user, item = video)

A’A gives query recommendation
– “did you mean to ask for”

B’B gives video recommendation
– “you might like these videos”


The punch-line

B’A recommends videos in response to a query
– (isn’t that a search engine?)
– (not quite, it doesn’t look at content or meta-data)
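To make B’A concrete, the sketch below counts, for each (video, query) pair, how many users both viewed the video and entered the query. The user, query, and video names are invented toy data, and the function names are illustrative. Raw counts are used here for brevity; in practice they would be filtered for anomalous cross-occurrence, for example with the root LLR score.

```python
# Hypothetical toy data: queries each user entered (A) and
# videos each user viewed (B).
QUERIES = {"u1": {"flamenco"}, "u2": {"flamenco", "guitar"}, "u3": {"cats"}}
VIEWS = {"u1": {"v_paco"}, "u2": {"v_paco", "v_segovia"}, "u3": {"v_kitten"}}

def cross_occurrence(b_history, a_history):
    """Entry (b, a) of B'A: number of users who did both b and a."""
    counts = {}
    for user, b_items in b_history.items():
        for b in b_items:
            for a in a_history.get(user, ()):
                counts[(b, a)] = counts.get((b, a), 0) + 1
    return counts

def videos_for_query(query, counts):
    """Rank videos by raw cross-occurrence with the query."""
    hits = [(video, n) for (video, q), n in counts.items() if q == query]
    return sorted(hits, key=lambda vn: (-vn[1], vn[0]))

b_t_a = cross_occurrence(VIEWS, QUERIES)
print(videos_for_query("flamenco", b_t_a))
```

Calling `cross_occurrence(QUERIES, QUERIES)` or `cross_occurrence(VIEWS, VIEWS)` gives the A’A and B’B cases, diagonal included, which is the symmetry the talk relies on.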


Real-life example

Query: “Paco de Lucia”
Conventional meta-data search results:
– “hombres del paco” times 400
– not much else

Recommendation based search:
– Flamenco guitar and dancers
– Spanish and classical guitar
– Van Halen doing a classical/flamenco riff


Hypothetical Example

Want a navigational ontology?
Just put labels on a web page with traffic
– This gives A = users x label clicks

Remember viewing history
– This gives B = users x items

Cross recommend
– B’A = label to item mapping

After several users click, results are whatever users think they should be


Nice. But can we do better?

[Figure labels: users; things]

[Figure labels: users; thing type 1; thing type 2]

[Figure labels: users; action 1, item type 1; action 2, item type 2]

Summary

Input: Multiple kinds of behavior on one set of things

Output: Recommendations for one kind of behavior with a different set of things

Cross recommendation is a special case


Now again, without the scary math


Input Data

User transactions
– user id, merchant id
– SIC code, amount
– Descriptions, cuisine, …

Offer transactions
– user id, offer id
– vendor id, merchant id’s
– offers, views, accepts

Derived user data
– merchant id’s
– anomalous descriptor terms
– offer & vendor id’s

Derived merchant data
– local top40
– SIC code
– vendor code
– amount distribution


Cross-recommendation

Per merchant indicators
– merchant id’s
– chain id’s
– SIC codes
– indicator terms from text
– offer vendor id’s

Computed by finding anomalous (indicator => merchant) rates


Search-based Recommendations

Sample document
– Merchant Id
– Field for text description
– Phone
– Address
– Location
– Indicator merchant id’s
– Indicator industry (SIC) id’s
– Indicator offers
– Indicator text
– Local top40

Sample query
– Current location
– Recent merchant descriptions
– Recent merchant id’s
– Recent SIC codes
– Recent accepted offers
– Local top40
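One way to picture the sample query is as a query string in the standard Lucene/Solr `field:(a OR b)` syntax, built from a user's recent behavior. The sketch below only assembles such a string. The field names (`indicator_merchant_ids` and so on) and the id values are hypothetical, since the slides do not give a schema, and sending the string to a search engine is left out.

```python
def build_query(recent):
    """Build a Lucene-style query string from recent user behavior.

    `recent` maps a (hypothetical) indicator field name to the list of
    values observed recently for this user.
    """
    clauses = []
    for field, values in recent.items():
        if not values:
            continue  # nothing recent for this kind of behavior
        # Quote each value so ids and multi-word terms are treated as phrases.
        terms = " OR ".join(
            '"{}"'.format(str(v).replace('"', '\\"')) for v in values
        )
        clauses.append("{}:({})".format(field, terms))
    # Any matching indicator contributes to the score, so clauses are OR-ed.
    return " OR ".join(clauses)

# Hypothetical recent behavior for one user.
recent_behavior = {
    "indicator_merchant_ids": ["m1042", "m77"],
    "indicator_sic_ids": ["5812"],
    "indicator_offers": [],
}
print(build_query(recent_behavior))
```

Because every clause is optional, a merchant document that matches more of the user's recent indicators scores higher, which is the "add up scores" step performed by the search engine's ranking.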


[Diagram labels, indexing side: complete history; cooccurrence (Mahout); item meta-data; Solr indexers; Solr indexing; index shards]


[Diagram labels, search side: user history; web tier; Solr search; Solr indexers; item meta-data; index shards]


We are hiring!


Objective Results

At a very large credit card company

History is all transactions, all web interaction

Processing time cut from 20 hours per day to 3

Recommendation engine load time decreased from 8 hours to 3 minutes

Recommendation quality increased visibly


Thank You