Collaborative Filtering in Map/Reduce

Ole-Martin Mørk - Open AdExchange

Tuesday, 14 September 2010

Vision

• Learn that Map/Reduce is simple

• Learn that Map/Reduce may be powerful

• Collaborative Filtering is fun!

Agenda

• Map/Reduce

• Collaborative Filtering

• Collaborative Filtering with Map/Reduce

• Amazon Elastic MapReduce

Map/Reduce

• Very scalable algorithm

• Inspired by map and reduce from functional programming

• Everything is based on key/value

6 phases

• Reader

• Map

• Partition

• Comparison

• Reduce

• Writer

Map

functional map

List("hello", "dude").map { x => x.substring(0, 1) }

Map/Reduce map

• Input is key/value

• Output is key/value

Simple Example, Map

• Count occurrences of words in a document

• Input is: <linenumber>, <content of line>

• For each word on the line, the output is <word>, <count>
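
As a minimal sketch of the map step described above, assuming plain Scala collections instead of the Hadoop API: the object name WordCountMap and the whitespace tokenization are illustrative choices, not taken from the talk.

```scala
object WordCountMap {
  // Map step: one input record is (line number, content of line).
  // Emits one (word, 1) pair for every word on the line.
  // Splitting on whitespace is an assumption made for this sketch.
  def map(lineNumber: Long, line: String): Seq[(String, Int)] =
    line.split("\\s+").toSeq.filter(_.nonEmpty).map(word => (word, 1))
}
```

The line number is part of the input key but is not used; only the words matter for the output.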

Reduce

functional reduce

val sum=List(32,40,23).reduceLeft{_+_}

Map/Reduce reduce

• Input is key/list of values

• Output is key/value

Simple Example, Reduce

• Input is: <word>, <list of counts>

• For each value in the list, we increase the count

• Output is <word>, <sum of counts>
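
The whole word count can be sketched end to end as follows. This is an in-memory illustration under stated assumptions, not Hadoop code: the object name WordCountSketch is hypothetical, and groupBy stands in for the partition and comparison phases that the framework would normally perform between map and reduce.

```scala
object WordCountSketch {
  // Map: (line number, line) -> one (word, 1) pair per word.
  def map(lineNumber: Long, line: String): Seq[(String, Int)] =
    line.split("\\s+").toSeq.filter(_.nonEmpty).map(word => (word, 1))

  // Reduce: (word, list of counts) -> (word, sum of counts).
  def reduce(word: String, counts: Seq[Int]): (String, Int) =
    (word, counts.foldLeft(0)(_ + _))

  // Driver: groupBy plays the role of the shuffle between map and reduce.
  def run(lines: Seq[String]): Map[String, Int] = {
    val mapped = lines.zipWithIndex.flatMap { case (line, i) => map(i.toLong, line) }
    val grouped = mapped.groupBy(_._1).map { case (w, pairs) => w -> pairs.map(_._2) }
    grouped.map { case (w, counts) => reduce(w, counts) }
  }
}
```

Calling WordCountSketch.run(Seq("to be or not", "to be")) yields a map in which "to" and "be" have count 2 and "or" and "not" have count 1.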

Collaborative Filtering

Amazon

Last.fm

Sceneami.com

User based

• Useful when we have

  • Small number of users

  • High correlation between users

  • Data that changes often

Item based

• Useful for big sites like Amazon

• Small overlap between users

• Mostly static data

My dream application (Min drømmeapplikasjon)

[Figure: labels shown are "Pattern Matching in Scala", "Euclidean Distance", "Rating" and "Match"]

Euclidean Distance

• Alf's presentations: 1, 25, 56, 57, 58, 98 (6)

• Kari's presentations: 2, 25, 98, 99 (4)

• Shared presentations: 25 and 98 (2)

• Unmatched presentations: (6 - 2) + (4 - 2) = 6

• Distance score: 1 / (1 + sqrt(6)) ≈ 0.29
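
The arithmetic above can be checked with a short sketch. It assumes each user's choices are modelled as a set of presentation ids; the object and method names are illustrative and not from the talk.

```scala
object SimilarityScore {
  // Score from the slide: 1 / (1 + sqrt(number of unmatched presentations)).
  def score(a: Set[Int], b: Set[Int]): Double = {
    val shared = (a intersect b).size
    // Presentations chosen by only one of the two users.
    val unmatched = (a.size - shared) + (b.size - shared)
    1.0 / (1.0 + math.sqrt(unmatched.toDouble))
  }
}
```

With Alf's and Kari's sets from the slide there are 6 unmatched presentations, so the score is about 0.29. Two identical sets give the maximum score of 1.0.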

Recommended sessions

• Me: 1, 2, 5, 6, 7

• Kate (0.31): 5, 6, 8, 9

• Paul (0.41): 1, 2, 4, 5, 6

• Mary (0.31): 1, 5, 8, 9

• Recommended: 8 (0.62), 9 (0.62), 4 (0.41)
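
One way to reproduce these numbers is to sum, for every session I have not chosen myself, the similarity scores of the users who chose it. The sketch below assumes that rule; the slide shows only the inputs and the result, so the rule, the object name and the tie-breaking by session id are assumptions.

```scala
object Recommender {
  // mine:   the sessions I have already chosen
  // others: (similarity score, sessions) for each other user
  // Returns (session, summed score), highest score first.
  def recommend(mine: Set[Int], others: Seq[(Double, Set[Int])]): Seq[(Int, Double)] = {
    // Every session I lack, tagged with the score of the user who chose it.
    val candidates = others.flatMap { case (score, sessions) =>
      (sessions -- mine).toSeq.map(s => (s, score))
    }
    candidates
      .groupBy(_._1)
      .map { case (s, hits) => s -> hits.map(_._2).sum }
      .toSeq
      .sortWith((a, b) => a._2 > b._2 || (a._2 == b._2 && a._1 < b._1))
  }
}
```

For the data above, sessions 8 and 9 each collect Kate's and Mary's scores (0.31 + 0.31), and session 4 collects only Paul's (0.41).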

Demo

More Map/Reduce

Several iterations

Iteration 1

Iteration 2

Iteration 3

Partitioning

[Diagram: the keys Paul, Mary, Kate, Lea, Jeff and Ali are partitioned across two reducers]
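
A partitioner decides which reducer receives a given key. The sketch below uses the same arithmetic as Hadoop's default HashPartitioner (hash code masked to be non-negative, modulo the number of reducers), but on plain Scala strings. Which reducer each name lands on in the talk's diagram is not known, and Hadoop's Text keys hash differently from Scala strings, so the concrete assignment here is illustrative only.

```scala
object PartitioningSketch {
  // Masking with Int.MaxValue clears the sign bit so the result is never negative.
  def partition(key: String, numReducers: Int): Int =
    (key.hashCode & Int.MaxValue) % numReducers

  // Groups keys by the reducer index they are assigned to.
  def assign(keys: Seq[String], numReducers: Int): Map[Int, Seq[String]] =
    keys.groupBy(k => partition(k, numReducers))
}
```

The important property is that the same key always goes to the same reducer, so all values for one key end up in one place.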

Comparison

[Diagram: two reducers; after comparison the records are grouped by key: Pres 1 with Paul, Lea and Ali, and Pres 2 with Jeff, Mary and Kate]

Guidelines

• Never access external sources during computation.

• Your functions should be small and fast

• You might not have all the data available

Hadoop

• Hadoop reuses objects, so remember to clone them if you plan to keep them

• You can read and write any object that implements org.apache.hadoop.io.WritableComparable

  • write(DataOutput)

  • readFields(DataInput)

  • compareTo(Object)
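
The three methods can be illustrated with a small key type. To keep the sketch runnable without Hadoop on the classpath, it declares a local stand-in trait, WritableComparableLike, with the same three methods; in a real job the class would implement Hadoop's WritableComparable instead. The UserPair class is hypothetical and not taken from the talk.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream}
import java.io.{DataInput, DataInputStream, DataOutput, DataOutputStream}

// Local stand-in for Hadoop's WritableComparable (an assumption of this sketch).
trait WritableComparableLike[T] extends Comparable[T] {
  def write(out: DataOutput): Unit
  def readFields(in: DataInput): Unit
}

// A pair of user names, usable as a key.
class UserPair(var first: String, var second: String) extends WritableComparableLike[UserPair] {
  // No-argument constructor, so an empty instance can be created and then filled.
  def this() = this("", "")

  // Serialize both fields in a fixed order.
  def write(out: DataOutput): Unit = { out.writeUTF(first); out.writeUTF(second) }

  // Overwrite this instance's fields; this is why reused objects must be copied.
  def readFields(in: DataInput): Unit = { first = in.readUTF(); second = in.readUTF() }

  // Order by first name, then by second name.
  def compareTo(other: UserPair): Int = {
    val c = first.compareTo(other.first)
    if (c != 0) c else second.compareTo(other.second)
  }
}

object UserPairDemo {
  // Writes a pair to bytes and reads it back into a fresh instance.
  def roundTrip(p: UserPair): UserPair = {
    val bytes = new ByteArrayOutputStream()
    p.write(new DataOutputStream(bytes))
    val copy = new UserPair()
    copy.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray)))
    copy
  }
}
```

Because readFields mutates the instance in place, a framework can refill one object for every record, which is the reason for the advice to clone anything you keep.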

Collaborative Filtering, the Map/Reduce way

Overview

• Create an application that recommends JavaZone presentations.

• Overall goal: Scalable performance

• 4 iterations

• Reading input from text file

Iteration 1

• Map input: <user>, <presentations>

• Map output: <presentation>, <user>

• Reduce output: <presentation>, <userList>
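
The key/value shapes above amount to inverting the user-to-presentations relation. A minimal in-memory sketch, assuming string user names and integer presentation ids (the object name Iteration1Sketch and the sorted user list are illustrative, not the talk's code):

```scala
object Iteration1Sketch {
  // Map: (user, presentations) -> one (presentation, user) pair per presentation.
  def map(user: String, presentations: Seq[Int]): Seq[(Int, String)] =
    presentations.map(p => (p, user))

  // Reduce: (presentation, users) -> (presentation, userList).
  // Sorting is only there to make the output deterministic.
  def reduce(presentation: Int, users: Seq[String]): (Int, Seq[String]) =
    (presentation, users.sorted)

  // Driver: groupBy stands in for the framework's shuffle.
  def run(input: Map[String, Seq[Int]]): Map[Int, Seq[String]] = {
    val mapped = input.toSeq.flatMap { case (u, ps) => map(u, ps) }
    mapped.groupBy(_._1).map { case (p, pairs) => reduce(p, pairs.map(_._2)) }
  }
}
```

With Alf's and Kari's presentations from the Euclidean distance example as input, presentations 25 and 98 each map to the list (Alf, Kari), and every other presentation maps to a single user.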

Iteration 2

• Map input: <presentation>, <userList>

• Map output: <user>, <userList>

• Reduce input: <user>, <list of userList>

• Reduce output: <userTuplet>, <match count>
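
A possible reading of this iteration, sketched in memory: every user in a presentation's userList receives the whole list, and the reducer counts how often each other user appears. Emitting each pair only once, with the alphabetically smaller user first, is an assumption of this sketch; the talk does not say how the tuple is ordered.

```scala
object Iteration2Sketch {
  // Map: (presentation, userList) -> (user, userList) for every user in the list.
  def map(presentation: Int, users: Seq[String]): Seq[(String, Seq[String])] =
    users.map(u => (u, users))

  // Reduce: (user, list of userList) -> ((user, other), match count).
  // Only pairs with user < other are emitted, so each pair appears once.
  def reduce(user: String, userLists: Seq[Seq[String]]): Seq[((String, String), Int)] = {
    val others = userLists.flatten.filter(o => user.compareTo(o) < 0)
    others.groupBy(identity).toSeq.map { case (other, hits) => ((user, other), hits.size) }
  }

  // Driver: groupBy stands in for the framework's shuffle.
  def run(input: Map[Int, Seq[String]]): Map[(String, String), Int] = {
    val mapped = input.toSeq.flatMap { case (p, us) => map(p, us) }
    mapped.groupBy(_._1).toSeq.flatMap { case (u, pairs) => reduce(u, pairs.map(_._2)) }.toMap
  }
}
```

For two presentations that both list Alf and Kari, the result is a single entry: the pair (Alf, Kari) with a match count of 2.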

Iteration 3

• Map input: <userTuplet>, <match count>

• Map output: <userTuplet>, <diff>

• Map output: <userTuplet reversed>, <diff>

• Reduce output: <user>, <similaruser>

Iteration 4

• Map input: <user>, <similaruser>

• Map output: <user>, <presentation with score>

• Reduce output: <user>, <presentations>

Demo

Map/Reduce on EC2

Elastic Map/Reduce

• Same code

• Same input

• Different configuration

Upload files

s3cmd.rb put oax-jz10:jar/oax-jz10.jar target/oax.jz10.jar

s3cmd.rb put oax-jz10:input/data.txt data.txt

Create job flow

elastic-mapreduce --create --alive --log-uri s3n://oax-jz10/log

Register iterations

elastic-mapreduce --jobflow j-1NLAIW45QUN4B --jar s3n://oax-jz10/jar/oax-jz10.jar --arg com.openadex.pres.iterations.Iteration1 --arg s3n://oax-jz10/input --arg s3n://oax-jz10/output1

Download output

s3cmd.rb get oax-jz10:output4/part-00000 out

Demo

Summary

• Map/Reduce may be simple

• Map/Reduce can be really powerful

• Collaborative filtering is fun :-)

Thank you

Ole-Martin Mørk

[email protected]/olemartin

del.icio.us/olemartin/jz10

All images are licensed under Creative Commons. See http://bit.ly/mr-photos for details.