RecStore An Extensible and Adaptive Framework for Online Recommender Queries inside the Database Engine

Authors

• Microsoft Research: Justin J. Levandoski

• University of Minnesota: Mohamed Sarwat, Mohamed F. Mokbel, Michael D. Ekstrand


Recommender Systems – Basic Idea

• Users: provide opinions on items consumed/watched/listened to…

• The system: provides the user suggestions for new items

Recommender Systems – Basic Idea

• Analyze user behavior to recommend personalized and interesting things for users to do/read/see

[Figure: offline, users rate movies and the movie ratings are used to build a recommendation model (similar users, similar items); online, a recommendation query such as “Recommend user A five movies” is answered from that model.]

Things have changed!

• We live in an increasingly social and “real-time” world
– Number of things to recommend is growing exponentially
– Users expressing opinions faster than ever
– Recommendations change second-to-second

“Offline” step can no longer be tolerated

[Figure: examples of real-time opinions: “Like” button, NY Times “Recommend” button, Facebook posts, blog/news items.]

Existing Recommender Systems

• No work has explored recommender system performance
– Performance has always been synonymous with “quality”

“[Our] solution is based on a huge amount of models and predictors which would not be practical as part of a commercial recommender system. However, this result is a direct consequence of the nature and goal of the competition: obtain the highest possible accuracy at any cost, disregarding completely the complexity of the solution and the execution performance.”
Team BellKor’s Pragmatic Chaos, winner of the 2009 Netflix Prize

“We have chosen not to discuss computation performance of recommender algorithms. Such performance is certainly important, and in the future we expect there to be work on the quality of time-limited and memory-limited recommendations.”
Herlocker et al., “Evaluating Collaborative Filtering Recommender Systems”, ACM TOIS 2004

Recommender Systems in DBMS

• Incoming stream of rating data: (user, item, rating)
• Ratings are used to build a recommendation model as:
– Item-based collaborative filtering: (item, item, similarity)
– User-based collaborative filtering: (user, user, similarity)
• Recommendation query:
– Item-based collaborative filtering: given a user u, find the top-k items that are most similar to the items that u has liked before
– User-based collaborative filtering: given a user u, find the top-k items that the users who are similar to u have liked

“Online” recommendation environments have all the pieces of a data management problem
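The item-based recommendation query described above can be sketched in a few lines of Python. This is a minimal illustration, not RecStore's implementation: the function name `recommend_item_based`, the dictionary layouts, and the scoring rule (sum of similarity times the user's rating) are all assumptions made for the example; the slides only say that the top-k most similar items are returned.

```python
import heapq
from collections import defaultdict


def recommend_item_based(user, ratings, model, k=5):
    """Return the ids of the k unrated items most similar to what `user` rated.

    ratings: dict user -> {item: rating}          (the rating stream, stored)
    model:   dict item -> {related_item: sim}     (the (item, item, similarity) model)
    Scoring by sum(similarity * rating) is an assumption of this sketch.
    """
    seen = ratings.get(user, {})
    scores = defaultdict(float)
    for item, rating in seen.items():
        for other, sim in model.get(item, {}).items():
            if other not in seen:  # never recommend something already rated
                scores[other] += sim * rating
    # Highest score first; ties broken by item id so the result is deterministic.
    best = heapq.nsmallest(k, scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in best]


if __name__ == "__main__":
    ratings = {"A": {"m1": 5, "m2": 3}}
    model = {
        "m1": {"m2": 0.9, "m3": 0.8, "m4": 0.1},
        "m2": {"m1": 0.9, "m3": 0.2, "m5": 0.5},
    }
    print(recommend_item_based("A", ratings, model, k=2))
```

The query itself is a join between the user's ratings and the model followed by a top-k selection, which is why the slide frames it as a data management problem.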

Talk Outline

• RecStore Main Idea
• RecStore System Architecture
• RecStore System Features
• RecStore Experimental Results
• Conclusion


RecStore – Main Idea

Let's NOT try to find a new way of doing recommendation*

RecStore pushes recommender systems inside the database engine to provide online support and scale up the computations of existing recommender methods.

* The ACM RecSys community is already doing an excellent job on this frontier. Let's start from there.


RecStore – System Architecture

[Figure: RecStore architecture. Labels: Rating Updates, Rating Data, Intermediate Filter, Intermediate Store, Model Filter, Model Table, Access Methods (Index, Scan), Recommendation Queries; numbered steps 1 to 3; two FAST / MEDIUM / SLOW scales running in opposite order.]


RecStore – System Features

– Adaptivity: RecStore is adaptive to different system workloads (query-intensive vs. update-intensive).

– Extensibility: RecStore is extensible to support many recommendation methods (e.g., item-based CF, user-based CF).

RecStore – Adaptivity (1/6)

Materialize-All (α = β = M)

– Low-latency recommendation queries.
– High storage and maintenance cost.

[Figure: the RecStore architecture diagram annotated with α and β.]

RecStore – Adaptivity (2/6)

Materialize-None (α = β = 0)

– High-latency recommendation queries.
– Low storage and maintenance cost.

[Figure: the RecStore architecture diagram annotated with α and β.]

RecStore – Adaptivity (3/6)

Intermediate Store Only (α = M, β = 0)

– Middle ground between Materialize-All and Materialize-None.

[Figure: the RecStore architecture diagram annotated with α and β.]

RecStore – Adaptivity (4/6)

Full Intermediate Store / Partial Model Store (α = M, β = N)

– Middle ground between Materialize-All and Intermediate-Only.

[Figure: the RecStore architecture diagram annotated with α, β, and N.]

RecStore – Adaptivity (5/6)

Partial Intermediate Store / Partial Model Store (α = K, β = N)

– Lies between Partial Model and Intermediate-Only.

[Figure: the RecStore architecture diagram annotated with α, β, N, and K.]

RecStore – Adaptivity (6/6)

• Materialize-All (α = β = M)
– Low-latency recommendation queries.
– High storage and maintenance cost.
• Materialize-None (α = β = 0)
– High-latency recommendation queries.
– Low storage and maintenance cost.
• Intermediate Store Only (α = M, β = 0)
– Middle ground between Materialize-All and Materialize-None.
• Full Intermediate Store / Partial Model Store (α = M, β = N)
– Middle ground between Materialize-All and Intermediate-Only.
• Partial Intermediate Store / Partial Model Store (α = K, β = N)
– Lies between Partial Model and Intermediate-Only.

[Figure: the RecStore architecture diagram annotated with α and β.]
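The five configurations above can be read as points in an (α, β) space. The sketch below is one hedged interpretation, not RecStore's code: it assumes α is the number of items whose intermediate statistics are materialized, β the number whose model entries are materialized, M the total number of items, and β ≤ α. It further assumes, purely for illustration, that items are ordered by a hypothetical popularity rank and that the FAST / MEDIUM / SLOW labels on the architecture slide correspond to answering a query from the model table, the intermediate store, and the raw rating data respectively. The function names are invented for the example.

```python
def strategy_name(alpha, beta, total):
    """Map thresholds (alpha, beta) to the configuration names on the slides.

    Assumes 0 <= beta <= alpha <= total, which holds for every
    configuration shown in the talk.
    """
    if not (0 <= beta <= alpha <= total):
        raise ValueError("expected 0 <= beta <= alpha <= total")
    if alpha == total and beta == total:
        return "Materialize-All"
    if alpha == 0:
        return "Materialize-None"
    if beta == 0:
        # alpha < total with beta == 0 is not named on the slides;
        # it is grouped here with the intermediate-only configuration.
        return "Intermediate Store Only"
    if alpha == total:
        return "Full Intermediate Store / Partial Model Store"
    return "Partial Intermediate Store / Partial Model Store"


def serving_tier(rank, alpha, beta):
    """Where a query for the item at popularity rank `rank` (0 = hottest) is answered.

    Rank-based selection is an assumption of this sketch; in RecStore the
    decision is made by the intermediate and model filters.
    """
    if rank < beta:
        return "model store"         # similarity already computed: fast
    if rank < alpha:
        return "intermediate store"  # finish from stored statistics: medium
    return "rating data"             # recompute from raw ratings: slow


if __name__ == "__main__":
    total = 100
    for alpha, beta in [(100, 100), (0, 0), (100, 0), (100, 20), (40, 20)]:
        print(alpha, beta, strategy_name(alpha, beta, total))
    print(serving_tier(30, alpha=40, beta=20))
```

Raising β moves more queries onto the fast path at the price of more maintenance work per rating update, which is the trade-off the six adaptivity slides walk through.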

RecStore – Extensibility

• RecStore is extensible to support various recommendation methods

• The application developer can define a new recommendation method using SQL code

• The recommendation method is registered using the SQL clause: DEFINE RECSTORE MODEL

[Figure: recommendation methods plugged into RecStore inside the DBMS: User-based CF, Item-based CF (Cosine), Item-based CF (Pearson), Item-based CF (Probabilistic), MyRec.]

RecStore – Extensibility

Simple SQL to plug in a new recommendation method: Item-based CF (Cosine)

    DEFINE RECSTORE MODEL ItemItemCosine
    FROM Ratings R1, Ratings R2
    WHERE R1.itemId <> R2.itemId AND R1.userId = R2.userId
    -- Intermediate stats: running sums kept per (item, rel_itm) pair
    WITH INTERMEDIATE STORE:
      (R1.itemId AS item, R2.itemId AS rel_itm,
       vector_lenp, vector_lenq, dot_prod, co_rate)
    WITH INTERMEDIATE FILTER:
      ALLOW UPDATE WITH My_IntFilterLogic(),
      UPDATE vector_lenp AS vector_lenp + R1.rating * R1.rating,
      UPDATE vector_lenq AS vector_lenq + R2.rating * R2.rating,
      UPDATE dot_prod AS dot_prod + R1.rating * R2.rating,
      UPDATE co_rate AS co_rate + 1
    -- Model store: similarity computed from the intermediate stats
    WITH MODEL STORE:
      (R1.itemId AS item, R2.itemId AS rel_itm, COMPUTED sim)
    WITH MODEL FILTER:
      ALLOW UPDATE WITH My_ModFilterLogic(),
      UPDATE sim AS
        if (co_rate < 50)
          co_rate * dot_prod / (50 * sqrt(vector_lenp) * sqrt(vector_lenq));
        else
          dot_prod / (sqrt(vector_lenp) * sqrt(vector_lenq));

[Figure: Item-based CF (Cosine) plugged into RecStore inside the DBMS.]
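To make the incremental maintenance in the model definition concrete, here is a small Python sketch of the same bookkeeping. It is an illustration under stated assumptions rather than RecStore's implementation: the class name `ItemItemCosine` is borrowed from the slide, but the method names are invented, each user is assumed to rate an item at most once, co_rate is assumed to count co-rating users (incremented by one per update), and the undamped branch is taken to be the ordinary cosine dot_prod / (sqrt(vector_lenp) * sqrt(vector_lenq)).

```python
import math
from collections import defaultdict


class ItemItemCosine:
    """Running statistics per (item, rel_itm) pair, updated once per new rating."""

    def __init__(self, damping=50):
        self.damping = damping  # the constant 50 from the slide
        self.ratings = defaultdict(dict)  # user -> {item: rating}
        # (item, rel_itm) -> [vector_lenp, vector_lenq, dot_prod, co_rate]
        self.stats = defaultdict(lambda: [0.0, 0.0, 0.0, 0])

    def add_rating(self, user, item, rating):
        """Fold one (user, item, rating) tuple into the intermediate statistics."""
        for other, other_rating in self.ratings[user].items():
            # Update both directions so sim(p, q) and sim(q, p) stay in step.
            for p, q, rp, rq in ((item, other, rating, other_rating),
                                 (other, item, other_rating, rating)):
                s = self.stats[(p, q)]
                s[0] += rp * rp  # vector_lenp
                s[1] += rq * rq  # vector_lenq
                s[2] += rp * rq  # dot_prod
                s[3] += 1        # co_rate
        self.ratings[user][item] = rating

    def sim(self, p, q):
        """Cosine similarity, scaled down while fewer than `damping` users co-rated."""
        if (p, q) not in self.stats:
            return 0.0
        lenp, lenq, dot, co = self.stats[(p, q)]
        if lenp == 0 or lenq == 0:
            return 0.0
        cosine = dot / (math.sqrt(lenp) * math.sqrt(lenq))
        if co < self.damping:
            return co / self.damping * cosine
        return cosine


if __name__ == "__main__":
    m = ItemItemCosine()
    for user, item, rating in [("u1", "a", 4), ("u1", "b", 2),
                               ("u2", "a", 2), ("u2", "b", 1)]:
        m.add_rating(user, item, rating)
    print(m.sim("a", "b"))
```

Keeping the four running sums means a new rating touches only the pairs that involve the rated item and the user's earlier items; the similarity itself can then be recomputed lazily or eagerly, which is the choice the model filter controls.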


RecStore – Experimental Evaluation (1/3)

• MovieLens data
– 10 million ratings
– 10k items, 70k users

• Machine
– Intel Core2 8400 at 3 GHz with 4 GB of RAM, running Ubuntu Linux 8.04

• PostgreSQL 8.4

• Techniques
– matall: materialize all (α = β = M)
– ionly: intermediate store only (α = M and β = 0)
– pm-m: partial model store (α = M and β = 20% of all movies)
– pm-mi: partial model/partial intermediate (α = 40% and β = 20% of all movies)
– viewreg: regular PostgreSQL view
– viewmat: simulated materialized view in PostgreSQL

RecStore – Experimental Evaluation (2/3)

Item-Based Cosine Similarity

[Figure: “Update Efficiency”. Series: matall, ionly, pm-m, viewmat, viewreg, pm-mi; axis ticks 0.5K to 7K and 0 to 7.]

[Figure: “Query Efficiency”. Series: matall, pm-m, ionly; axis ticks 5K to 70K and 0 to 0.8.]

RecStore is adaptive to a spectrum of workloads, ranging from query-intensive to update-intensive.

RecStore – Experimental Evaluation (3/3)

Item-Based Cosine Similarity

Real workload trace: continuous arrival of both rating updates and recommender queries against the MovieLens system.

[Figure: Queries and Updates compared for matall, viewmat, ionly, and pm-m; axis ticks 0 to 7.]



Conclusion: Wrap Up

Conclusion: Take-Away Message

• Recommender systems have all the ingredients of a data management problem.

• RecStore is a step toward incorporating recommender systems in the database engine.

• RecStore is adaptive to different system workloads (queries vs. updates).

• RecStore is extensible to support new recommendation methods.


Questions?