Acts As Recommendable


Ruby Manor talk on using recommendation systems in production.


Recommendations in Production

Alex MacCaw

Netflix Prize

Amazon.com

Facebook

Last.fm

StumbleUpon

Google Suggest

iTunes

Rotten Tomatoes

Yelp

Google Search

Chicken or Egg

• Google Reader

• IMDB

Acts As Recommendable

Types of recommendations

• Content Based

• User Based

• Item Based

Programming Collective Intelligence

Has Many Through Relationship

User Book

UserBooks

Has Many Has Many

Has Many Through

Can have score (rating)

User

class User < ActiveRecord::Base
  has_many :user_books
  has_many :books, :through => :user_books
  acts_as_recommendable :books, :through => :user_books
end

Gives you

User#similar_users

User#recommended_books

Book#similar_books

The algorithms

• Manhattan Distance

• Euclidean distance

• Cosine

• Pearson correlation coefficient

• Jaccard

• Levenshtein
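Most of the measures listed above can be sketched in a few lines of plain Ruby. This is an illustration only, not the plugin's code: the method names are hypothetical, each user's ratings are assumed to be a hash of `{ item_id => score }`, and the distances compare only items both users have rated. Levenshtein works on strings rather than ratings, so it is left out.

```ruby
# Each argument is one user's ratings as { item_id => score } (assumed layout).

def shared_items(a, b)
  a.keys & b.keys
end

# Sum of absolute rating differences over shared items.
def manhattan_distance(a, b)
  shared_items(a, b).sum { |id| (a[id] - b[id]).abs }
end

# Straight-line distance over shared items.
def euclidean_distance(a, b)
  Math.sqrt(shared_items(a, b).sum { |id| (a[id] - b[id])**2 })
end

# Cosine of the angle between the two rating vectors.
def cosine_similarity(a, b)
  dot = shared_items(a, b).sum { |id| a[id] * b[id] }
  norm_a = Math.sqrt(a.values.sum { |v| v * v })
  norm_b = Math.sqrt(b.values.sum { |v| v * v })
  denominator = norm_a * norm_b
  denominator.zero? ? 0.0 : dot / denominator
end

# Pearson correlation coefficient over shared items (-1.0 to 1.0).
def pearson_correlation(a, b)
  ids = shared_items(a, b)
  n = ids.size.to_f
  return 0.0 if n.zero?

  sum_a = ids.sum { |id| a[id] }
  sum_b = ids.sum { |id| b[id] }
  sum_a_sq = ids.sum { |id| a[id]**2 }
  sum_b_sq = ids.sum { |id| b[id]**2 }
  sum_ab = ids.sum { |id| a[id] * b[id] }

  num = sum_ab - (sum_a * sum_b / n)
  spread = (sum_a_sq - sum_a**2 / n) * (sum_b_sq - sum_b**2 / n)
  return 0.0 unless spread > 0

  num / Math.sqrt(spread)
end

# Size of the overlap between the two item sets, ignoring scores.
def jaccard_index(a, b)
  union = (a.keys | b.keys).size
  union.zero? ? 0.0 : shared_items(a, b).size.to_f / union
end
```

Pearson is the measure the talk later rewrites in C, so it is the one most worth checking against the plugin's source.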

How does it work?

Strategy

• Map data into Euclidean Space

• Calculate similarity

• Use similarities to recommend

         The Black Knight   John Tucker Must Die
James    4                  5
Jonah    3                  2
George   5                  3
Alex     4                  2

[Plots: each user's two ratings drawn as a point, with The Black Knight and John Tucker Must Die as the axes, each running from 0 to 5.00]
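As a worked example of the first two strategy steps, the four users' ratings can be treated as 2D points and compared by distance. This is a sketch under assumptions: the method names are hypothetical, and the conversion `1 / (1 + distance)` is one common way to turn a distance into a similarity, not necessarily what the plugin does.

```ruby
# Ratings as [The Black Knight, John Tucker Must Die], taken from the table.
def example_ratings
  {
    "James"  => [4, 5],
    "Jonah"  => [3, 2],
    "George" => [5, 3],
    "Alex"   => [4, 2]
  }
end

# Straight-line distance between two users' points.
def point_distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Assumed conversion: 1.0 for identical ratings, falling towards 0 with distance.
def point_similarity(a, b)
  1.0 / (1.0 + point_distance(a, b))
end

# Other users ranked by similarity to `name`, most similar first.
def rank_similar_users(ratings, name)
  ratings.reject { |other, _| other == name }
         .map { |other, point| [other, point_similarity(ratings[name], point)] }
         .sort_by { |_, similarity| -similarity }
end

rank_similar_users(example_ratings, "Alex").each do |user, similarity|
  puts format("%-6s %.3f", user, similarity)
end
```

For Alex this ranks Jonah first (0.500), then George (0.414), then James (0.250), which matches how close the points sit on the plot.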

item id => { user id => score }

{ 1 => { 1 => 1.0, 2 => 0.0, ... }, ...}

[[1, 0.5554], [2, 0.888], [3, 0.8843], ...]
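The third strategy step, using similarities to recommend, might look roughly like this. It is a sketch rather than the plugin's algorithm: the dataset values and method names are made up, the dataset is assumed to map item id to `{ user id => score }`, and similarity is measured by plain user overlap (Jaccard) to keep the example short.

```ruby
# Hypothetical dataset: item_id => { user_id => score }.
def sample_dataset
  {
    1 => { 1 => 1.0, 2 => 1.0 },
    2 => { 1 => 1.0, 2 => 1.0, 3 => 1.0 },
    3 => { 3 => 1.0 }
  }
end

# Share of users the two items have in common (Jaccard index).
def user_overlap(a, b)
  union = (a.keys | b.keys).size
  union.zero? ? 0.0 : (a.keys & b.keys).size.to_f / union
end

# [[item_id, similarity], ...] for every other item, most similar first.
def similar_items(dataset, item_id)
  dataset.reject { |id, _| id == item_id }
         .map { |id, users| [id, user_overlap(dataset[item_id], users)] }
         .sort_by { |_, similarity| -similarity }
end

# Score each item the user lacks by summing its similarity to items they have.
def recommend_items(dataset, user_id)
  owned = dataset.select { |_, users| users.key?(user_id) }.keys
  totals = Hash.new(0.0)
  owned.each do |item_id|
    similar_items(dataset, item_id).each do |other_id, similarity|
      totals[other_id] += similarity unless owned.include?(other_id)
    end
  end
  totals.sort_by { |_, score| -score }
end
```

With the sample data, user 3 has items 2 and 3, so the only candidate is item 1, scored by its overlap with item 2.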

Problem 1

It was far too slow to calculate on the fly (obvious)

SELECT * FROM "users" WHERE ("users"."id" = 2)
SELECT * FROM "books"
SELECT * FROM "users"
SELECT "user_books".* FROM "user_books" WHERE ("user_books".user_id IN (1,2,3,4,5,6,7,8,9,10))
SELECT * FROM "books" WHERE ("books"."id" IN (11,6,12,7,13,8,14,9,15,1,2,19,20,3,10,4,5))
SELECT * FROM "books" WHERE ("books"."id" IN (20,3,19,6))

That loads all books and all user_books

Solution

Cache the dataset

rake recommendations:build

Build offline

SELECT * FROM "user_books" WHERE ("user_books".user_id = 2)
SELECT * FROM "books" WHERE ("books"."id" = 5)
SELECT * FROM "books" WHERE ("books"."id" = 4)
SELECT * FROM "books" WHERE ("books"."id" = 8)
SELECT * FROM "books" WHERE ("books"."id" = 7)
SELECT * FROM "books" WHERE ("books"."id" = 2)
SELECT * FROM "books" WHERE ("books"."id" = 1)

Problem 2

Fetching the dataset took too long since it was so massive

Solution

Split up the cache by item

Rails.cache.write("aar_books_1", scores)
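Splitting the cache by item could be sketched as follows. The key pattern is inferred from the single `"aar_books_1"` example above, and the in-memory cache is a hypothetical stand-in for `Rails.cache` (anything that responds to `write` and `read`) so the sketch runs outside Rails.

```ruby
# Hypothetical stand-in for Rails.cache, exposing only write and read.
def build_memory_cache
  Class.new do
    def initialize
      @store = {}
    end

    def write(key, value)
      @store[key] = value
      true
    end

    def read(key)
      @store[key]
    end
  end.new
end

# Assumed key pattern, based on the "aar_books_1" example.
def scores_cache_key(table_name, item_id)
  "aar_#{table_name}_#{item_id}"
end

# One cache entry per item, so a lookup never loads the whole matrix.
def write_item_scores(cache, table_name, similarities)
  similarities.each do |item_id, scores|
    cache.write(scores_cache_key(table_name, item_id), scores)
  end
end

# Scores for a single item, or an empty list if nothing has been built yet.
def read_item_scores(cache, table_name, item_id)
  cache.read(scores_cache_key(table_name, item_id)) || []
end
```

Reading one book's similar items then costs a single small cache fetch instead of deserializing the full dataset.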

Problem 3

The dataset was so big it crashed Ruby!

Solution

Get rid of ActiveRecord

Only deal with integers

items = options[:on_class].connection.select_values("SELECT id from #{options[:on_class].table_name}").collect(&:to_i)

Problem 4

It still crashed Ruby!

{ 1 => { 1 => 1.0, 2 => 0.0, ... }, ...}

Solution

Remove unnecessary cruft from dataset

{ 1 => { 1 => 1.0, ... }, ...}
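One way to read "remove unnecessary cruft" is to stop storing zero scores and treat a missing key as 0.0 on lookup, which is also what the C hash lookup later in the talk does. A minimal sketch under that assumption, with hypothetical method names:

```ruby
# Drop zero scores, and drop items that end up with no scores at all.
def compact_dataset(dataset)
  dataset.each_with_object({}) do |(item_id, scores), compacted|
    kept = scores.reject { |_, score| score.zero? }
    compacted[item_id] = kept unless kept.empty?
  end
end

# A missing entry reads as 0.0, so callers see the same values as before.
def score_for(dataset, item_id, user_id)
  dataset.fetch(item_id, {}).fetch(user_id, 0.0)
end
```

For a sparse matrix, where most users have not touched most items, this removes the bulk of the entries without changing any similarity result.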

Problem 5

It was too slow

Solution

Re-write the slow bits in C

Details

• RubyInline

• Implemented Pearson

• Monkey patched original Ruby methods

• Very fast

Ruby Object

InlineC = Module.new do
  inline do |builder|
    builder.c '
      #include <math.h>
      #include "ruby.h"
      double c_sim_pearson(VALUE items) {

No Floats :(

InlineC = Module.new do
  inline do |builder|
    builder.c '
      #include <math.h>
      #include "ruby.h"
      double c_sim_pearson(VALUE items) {

Hash Lookup

if (!st_lookup(RHASH(prefs1)->tbl, items_a[i], &prefs1_item_ob)) {
  prefs1_item = 0.0;
} else {
  prefs1_item = NUM2DBL(prefs1_item_ob);
}

Conversion

return num / den;

Design Decisions

• Not too many relationships

• Not too many ‘items’

• Similarity matrix for items, not users

Changing data

Scaling Even Further

• K Means clustering

• Split cluster by category
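The slides do not say how clustering would be wired into the plugin, so the following is only a generic K-means (Lloyd's algorithm) sketch over rating points, with hypothetical method names and caller-supplied starting centroids. The idea is that similarities would then only need to be computed within each cluster.

```ruby
# Straight-line distance between a point and a centroid.
def centroid_distance(a, b)
  Math.sqrt(a.zip(b).sum { |x, y| (x - y)**2 })
end

# Coordinate-wise mean of a non-empty list of points.
def mean_point(points)
  points.transpose.map { |coords| coords.sum.to_f / coords.size }
end

# Returns { centroid_index => [points] } once assignments stop changing
# or max_iterations is reached.
def k_means(points, centroids, max_iterations = 20)
  clusters = {}
  max_iterations.times do
    # Assignment step: each point joins its nearest centroid.
    clusters = points.group_by do |point|
      centroids.each_index.min_by { |i| centroid_distance(point, centroids[i]) }
    end
    # Update step: move each centroid to the mean of its points;
    # a centroid with no points stays where it is.
    updated = centroids.each_index.map do |i|
      clusters.key?(i) ? mean_point(clusters[i]) : centroids[i]
    end
    break if updated == centroids
    centroids = updated
  end
  clusters
end
```

Called with the four users' ratings from earlier and starting centroids `[[4, 5], [4, 2]]`, this puts James in one cluster and Jonah, George and Alex in the other.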

Adding ratings

ActiveRecord::Schema.define(:version => 1) do
  create_table "books", :force => true do |t|
    t.string   "name"
    t.datetime "created_at"
    t.datetime "updated_at"
  end

  create_table "user_books", :force => true do |t|
    t.integer "user_id", :null => false
    t.integer "book_id", :null => false
    t.integer "rating",  :default => 0
  end

  create_table "users", :force => true do |t|
    t.string   "name"
    t.datetime "created_at"
    t.datetime "updated_at"
  end
end

class User < ActiveRecord::Base
  has_many :user_books
  has_many :books, :through => :user_books
  acts_as_recommendable :books, :through => :user_books, :score => :rating
end

That’s it

Improvements?

• Better API

• Perform calculations over a cluster (EC2) using Map/Nanite

class AARN < Nanite::Actor
  expose :sim_pearson

  def sim_pearson(item1, item2)
    Optimizations.c_sim_pearson(item1, item2)
  end
end

http://eribium.org/blog

twitter: maccman

email/jabber: maccman@gmail.com

Questions?

http://rubyurl.com/kUpk

http://github.com/maccman/acts_as_recommendable
