RecStore: An Extensible and Adaptive Framework for Online Recommender Queries inside the Database Engine

Authors
• Microsoft Research: Justin J. Levandoski
• University of Minnesota: Mohamed Sarwat, Mohamed F. Mokbel, Michael D. Ekstrand

Recommender Systems – Basic Idea
• Users: provide opinions on items consumed/watched/listened to…
• The system: provides the user suggestions for new items
• Analyze user behavior to recommend personalized and interesting things for users to do/read/see
• Offline: users rate movies; the movie ratings are used to build a recommendation model (similar users, similar items)
• Online: a recommendation query is answered from the model, e.g. “Recommend user A five movies”

Things have changed!
• We live in an increasingly social and “real-time” world
– The number of things to recommend is growing exponentially
– Users are expressing opinions faster than ever
– Recommendations change second-to-second
• Examples: the “Like” button, the NY Times “Recommend” button, Facebook posts, blog/news items
• The “offline” step can no longer be tolerated

Existing Recommender Systems
• No work has explored recommender system performance
– Performance has always been synonymous with “quality”
• Team BelKor’s Pragmatic Chaos, winner of the 2009 Netflix Prize: “[Our] solution is based on a huge amount of models and predictors which would not be practical as part of a commercial recommender system. However, this result is a direct consequence of the nature and goal of the competition: obtain the highest possible accuracy at any cost, disregarding completely the complexity of the solution and the execution performance.”
• Herlocker et al., “Evaluating Collaborative Filtering Recommender Systems”, ACM TOIS 2004: “We have chosen not to discuss computation performance of recommender algorithms. Such performance is certainly important, and in the future we expect there to be work on the quality of time-limited and memory-limited recommendations.”

Recommender Systems in DBMS
• Incoming stream of rating data: (user, item, rating)
• Ratings are used to build a recommendation model:
– Item-based collaborative filtering: (item, item, similarity)
– User-based collaborative filtering: (user, user, similarity)
• Recommendation query:
– Item-based collaborative filtering: given a user u, find the top-k items that are most similar to the items that u has liked before
– User-based collaborative filtering: given a user u, find the top-k items that the users who are similar to u have liked
• “Online” recommendation environments have all the pieces of a data management problem
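
The item-based recommendation query described on this slide can be sketched in Python as follows. This is only an illustration, not RecStore's implementation: the function name `recommend_item_based`, the `like_threshold` cutoff for "liked", the choice to score a candidate by summing its similarities to liked items, and the sample data are all assumptions made for the example.

```python
from collections import defaultdict


def recommend_item_based(user, ratings, item_sim, k=5, like_threshold=4.0):
    """Return up to k unrated items most similar to the items `user` liked.

    ratings  : list of (user, item, rating) tuples, as in the rating stream
    item_sim : dict mapping (item, item) -> similarity, as in the model
    """
    # Everything the user has already rated (these are never recommended).
    rated = {i: r for (u, i, r) in ratings if u == user}
    # Assumption: "liked" means a rating at or above like_threshold.
    liked = {i for i, r in rated.items() if r >= like_threshold}

    # Assumption: a candidate's score is the sum of its similarities
    # to the items the user liked.
    scores = defaultdict(float)
    for (p, q), sim in item_sim.items():
        if p in liked and q not in rated:
            scores[q] += sim

    # Highest score first; ties broken by item id for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, _ in ranked[:k]]


# Hypothetical sample data.
ratings = [("A", "m1", 5.0), ("A", "m2", 2.0), ("B", "m1", 4.0), ("B", "m3", 5.0)]
item_sim = {("m1", "m3"): 0.9, ("m1", "m4"): 0.4, ("m2", "m4"): 0.8, ("m1", "m2"): 0.1}

top = recommend_item_based("A", ratings, item_sim, k=5)
```

With the sample data, user A liked only `m1`, so the candidates are the unrated items related to `m1`. The user-based query would follow the same shape with a (user, user, similarity) model.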

Talk Outline
• RecStore Main Idea
• RecStore System Architecture
• RecStore System Features
• RecStore Experimental Results
• Conclusion

RecStore – Main Idea
• Let's NOT try to find a new way of doing recommendation*
• RecStore pushes recommender systems inside the database engine to provide online support and scale up the computations of existing recommender methods.
* The ACM RecSys community is already doing an excellent job on this front. Let's start from there.

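RecStore – System Architecture
[Figure: RecStore architecture diagram. Inputs: Rating Updates and Recommendation Queries. Components: Rating Data, Intermediate Filter, Intermediate Store, Model Filter, Model Table, and Access Methods (Index, Scan). Three numbered levels (1, 2, 3) are each labeled on two opposing scales: FAST / MEDIUM / SLOW and SLOW / MEDIUM / FAST.]
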
RecStore – System Features
• Adaptivity: RecStore adapts to different system workloads (query-intensive vs. update-intensive)
• Extensibility: RecStore is extensible to support many recommendation methods (e.g., item-based CF, user-based CF)

RecStore – Adaptivity (1/6)
Materialize-All (α = β = M)
• Low-latency recommendation queries
• High storage and maintenance cost
[Figure: RecStore architecture diagram annotated with α and β]

RecStore – Adaptivity (2/6)
Materialize-None (α = β = 0)
• High-latency recommendation queries
• Low storage and maintenance cost
[Figure: RecStore architecture diagram annotated with α and β]

RecStore – Adaptivity (3/6)
Intermediate Store Only (α = M, β = 0)
• Middle ground between Materialize-All and Materialize-None
[Figure: RecStore architecture diagram annotated with α and β]

RecStore – Adaptivity (4/6)
Full Intermediate Store / Partial Model Store (α = M, β = N)
• Middle ground between Materialize-All and Intermediate-Only
[Figure: RecStore architecture diagram annotated with α, β and N]

RecStore – Adaptivity (5/6)
Partial Intermediate Store / Partial Model Store (α = K, β = N)
• Lies between Partial Model and Intermediate-Only
[Figure: RecStore architecture diagram annotated with α, β, N and K]

RecStore – Adaptivity (6/6)
• Materialize-All (α = β = M): low-latency recommendation queries; high storage and maintenance cost
• Materialize-None (α = β = 0): high-latency recommendation queries; low storage and maintenance cost
• Intermediate Store Only (α = M, β = 0): middle ground between Materialize-All and Materialize-None
• Full Intermediate Store / Partial Model Store (α = M, β = N): middle ground between Materialize-All and Intermediate-Only
• Partial Intermediate Store / Partial Model Store (α = K, β = N): lies between Partial Model and Intermediate-Only
[Figure: RecStore architecture diagram annotated with α and β]
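
The named settings of α and β on the Adaptivity slides can be summarized as a small lookup. The sketch below is an illustration only: it assumes that α and β are counts of items materialized in the intermediate store and the model store respectively, and that M is the total number of items (the slides give the values but do not define the symbols this precisely). The function name `strategy_name` is hypothetical.

```python
def strategy_name(alpha: int, beta: int, total_items: int) -> str:
    """Map an (alpha, beta) setting to the strategy name used on the slides.

    Assumption: alpha = items kept in the intermediate store,
    beta = items kept in the model store, total_items = M.
    """
    m = total_items
    if m <= 0:
        raise ValueError("total_items must be positive")
    if not (0 <= alpha <= m and 0 <= beta <= m):
        raise ValueError("alpha and beta must lie in [0, M]")

    if alpha == m and beta == m:
        return "Materialize-All"
    if alpha == 0 and beta == 0:
        return "Materialize-None"
    if alpha == m and beta == 0:
        return "Intermediate Store Only"
    if alpha == m:  # here 0 < beta < M
        return "Full Intermediate Store / Partial Model Store"
    if 0 < alpha < m and 0 < beta < m:
        return "Partial Intermediate Store / Partial Model Store"
    # Any other combination is not given a name on the slides.
    return "not named on the slides"


# Example: M = 10,000 items, alpha = 40% and beta = 20% of all items.
example = strategy_name(4000, 2000, 10000)
```

The last example mirrors a setting with α = 40% and β = 20% of all items, which falls in the partial/partial case.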

RecStore – Extensibility
• RecStore is extensible to support various recommendation methods
[Figure: User-based CF, Item-based CF (Cosine), Item-based CF (Pearson), Item-based CF (Probabilistic) and MyRec plugged into RecStore, inside the DBMS]
• The application developer can define a new recommendation method using SQL code
• The recommendation method is registered using the SQL clause DEFINE RECSTORE MODEL

RecStore – Extensibility
Simple SQL to plug in a new recommendation method: Item-based CF (Cosine)

    DEFINE RECSTORE MODEL ItemItemCosine
    FROM Ratings R1, Ratings R2
    WHERE R1.itemId <> R2.itemId AND R1.userId = R2.userId
    WITH INTERMEDIATE STORE:
        (R1.itemId AS item, R2.itemId AS rel_itm, vector_lenp, vector_lenq, dot_prod, co_rate)
    WITH INTERMEDIATE FILTER:
        ALLOW UPDATE WITH My_IntFilterLogic(),
        UPDATE vector_lenp AS vector_lenp + R1.rating * R1.rating,
        UPDATE vector_lenq AS vector_lenq + R2.rating * R2.rating,
        UPDATE dot_prod AS dot_prod + R1.rating * R2.rating,
        UPDATE co_rate AS 1
    WITH MODEL STORE:
        (R1.itemId AS item, R2.itemId AS rel_itm, COMPUTED sim)
    WITH MODEL FILTER:
        ALLOW UPDATE WITH My_ModFilterLogic(),
        UPDATE sim AS
            if (co_rate < 50) co_rate * dot_prod / (50 * sqrt(vector_lenp) * sqrt(vector_lenq));
            else dot_prod / (sqrt(vector_lenp) * sqrt(vector_lenq));

The INTERMEDIATE STORE and INTERMEDIATE FILTER clauses define the intermediate statistics; the MODEL STORE and MODEL FILTER clauses define the model store.
[Figure: Item-based CF (Cosine) plugged into RecStore, inside the DBMS]
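
To make the intermediate statistics concrete, here is a minimal Python sketch of how per-pair statistics could be maintained incrementally as ratings arrive and then turned into a similarity score. It is not RecStore code. The class name `ItemCosineStats` and its methods are hypothetical, it assumes each user rates each item at most once, and it assumes `co_rate` counts the users who rated both items (the slide's update rule for `co_rate` is kept as written above and may differ).

```python
import math
from collections import defaultdict


class ItemCosineStats:
    """Incrementally maintained statistics for item-item cosine similarity."""

    def __init__(self):
        self.user_ratings = defaultdict(dict)  # user -> {item: rating}
        self.stats = defaultdict(
            lambda: {"vector_lenp": 0.0, "vector_lenq": 0.0, "dot_prod": 0.0, "co_rate": 0}
        )

    def add_rating(self, user, item, rating):
        # Assumption: a (user, item) pair is rated only once.
        for other, other_rating in self.user_ratings[user].items():
            if other == item:
                continue
            # Update both ordered pairs, mirroring the R1/R2 self-join.
            self._update(item, other, rating, other_rating)
            self._update(other, item, other_rating, rating)
        self.user_ratings[user][item] = rating

    def _update(self, p, q, rating_p, rating_q):
        s = self.stats[(p, q)]
        s["vector_lenp"] += rating_p * rating_p
        s["vector_lenq"] += rating_q * rating_q
        s["dot_prod"] += rating_p * rating_q
        s["co_rate"] += 1  # assumption: count of co-rating users

    def similarity(self, p, q, min_co_rate=50):
        s = self.stats.get((p, q))  # .get() does not create a new entry
        if s is None or s["co_rate"] == 0:
            return 0.0
        denom = math.sqrt(s["vector_lenp"]) * math.sqrt(s["vector_lenq"])
        if denom == 0.0:
            return 0.0
        cosine = s["dot_prod"] / denom
        # Damp the score when few users have co-rated the pair.
        if s["co_rate"] < min_co_rate:
            return cosine * s["co_rate"] / min_co_rate
        return cosine


# Hypothetical usage: two users give the same ratings to items "a" and "b".
model = ItemCosineStats()
for user in ("u1", "u2"):
    model.add_rating(user, "a", 3.0)
    model.add_rating(user, "b", 4.0)
```

Keeping the sums rather than the final score is what lets a new rating be absorbed with a few additions, while the similarity itself can be computed on demand or materialized.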

RecStore – Experimental Evaluation (1/3)
• Data: MovieLens
– 10 million ratings
– 10K items, 70K users
• Machine
– Intel Core2 8400 at 3 GHz with 4 GB of RAM, running Ubuntu Linux 8.04
• System: PostgreSQL 8.4
• Techniques
– matall: materialize all (α = β = M)
– ionly: intermediate store only (α = M and β = 0)
– pm-m: partial model store (α = M and β = 20% of all movies)
– pm-mi: partial model/partial intermediate (α = 40% and β = 20% of all movies)
– viewreg: regular PostgreSQL view
– viewmat: simulated materialized view in PostgreSQL

RecStore – Experimental Evaluation (2/3)
Item-Based Cosine Similarity
[Figure: Update Efficiency. x-axis ticks: 0.5K, 2.5K, 4.5K, 7K; y-axis: 0 to 7; series: matall, ionly, pm-m, viewmat, viewreg, pm-mi]
[Figure: Query Efficiency. x-axis ticks: 5K, 25K, 45K, 70K; y-axis: 0 to 0.8; series: matall, pm-m, ionly]
• RecStore adapts to a spectrum of workloads, ranging from query-intensive to update-intensive

RecStore – Experimental Evaluation (3/3)
Item-Based Cosine Similarity
• Real workload trace: continuous arrival of both rating updates and recommender queries against the MovieLens system
[Figure: bar chart with groups Queries and Updates; y-axis: 0 to 7; series: matall, viewmat, ionly, pm-m]

Conclusion: Take-Away Message
• Recommender systems have all the ingredients of a data management problem.
• RecStore is a step toward incorporating recommender systems into the database engine.
• RecStore adapts to different system workloads (queries vs. updates).
• RecStore is extensible to support new recommendation methods.