LCBM: Statistics-based Parallel Collaborative Filtering
Fabio Petroni 1, Leonardo Querzoni 1, Roberto Beraldi 1, Mario Paolucci 2
Cyber Intelligence and Information Security (CIS) Sapienza
1 Department of Computer, Control and Management Engineering Antonio Ruberti, Sapienza University of Rome - petroni|querzoni|[email protected]
2 Institute of Cognitive Sciences and Technologies, CNR
Larnaca, Cyprus, 22-23 May 2014
Personalization
I Most of today's internet businesses deeply root their success in the ability to provide users with strongly personalized experiences.
I Recommender systems are a subclass of information filtering systems that seek to predict the rating or preference a user would give to an item.
2 of 21
Collaborative Filtering
I Collaborative Filtering (CF) is today a widely adopted strategy to build recommendation engines.
I CF analyzes the known preferences of a group of users to make predictions of the unknown preferences of other users.
Filtering: filter the user data stream, satisfying the user's personal interests.
Discovery: recommend new content to the user that matches his interests.
3 of 21
Challenges
I Key factors for recommendation effectiveness:
B amount of data available: larger data sets improve quality more than better algorithms;
B timeliness: timely delivering output to users constitutes a business advantage.
7 Most CF algorithms need to go through a very lengthy training stage, incurring prohibitive computational costs and large time-to-prediction intervals when applied to large data sets.
7 This lack of efficiency will quickly limit the applicability of these solutions at the current rates of data production growth.
4 of 21
Taxonomy Of CF Algorithms
Collaborative Filtering
MEMORY-BASED
B neighborhood models (user-oriented, item-oriented):
- Pearson Correlation Similarity
- Cosine Similarity
MODEL-BASED
B cluster models, Bayesian networks, linear regression
B LATENT FACTOR MODELS: pLSA, neural networks, Latent Dirichlet Allocation
- MATRIX FACTORIZATION: ALS, SGD, SVD
5 of 21
Memory-Based CF Algorithms
I Operate on the entire set of ratings to generate predictions:
B compute a similarity score, which reflects the distance between two users or two items:
- Pearson Correlation Similarity
- Cosine Similarity
B predictions are computed by looking at the ratings of the k most similar users or items (k-nearest neighbors).
3 Simple design and implementation;
3 used in a lot of real-world systems.
7 Impose several scalability limitations;
7 their use is impractical when dealing with large amounts of data.
6 of 21
Model-Based CF Algorithms
I Use the set of available ratings to estimate or learn a model, and then apply this model to make rating predictions.
I The most successful are based on matrix factorization models.
7 of 21
Matrix Factorization 1/2
LOSS(P,Q) = Σ_(i,j) (R_ij − P_i · Q_j)²

where the sum runs over the pairs (i,j) with a known rating R_ij.
I The most popular techniques to minimize LOSS are AlternatingLeast Squares (ALS) and Stochastic Gradient Descent (SGD).
ALS: alternates between keeping P and Q fixed, each time solving a least-squares problem.
SGD: works by taking steps proportional to the negative of the gradient of the LOSS function.
I Both algorithms need several passes through the set of ratings.
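As an illustration, the SGD variant can be sketched in a few lines. This is a toy sketch in plain Python, not the parallel implementations evaluated later; the function names, learning rate, regularization and epoch counts are all our own illustrative choices.

```python
import random

def sgd_factorize(ratings, n_users, n_items, k=8, lr=0.05, reg=0.05,
                  epochs=500, seed=0):
    """Minimize the squared LOSS over known ratings by stochastic
    gradient descent on the factor matrices P (users) and Q (items)."""
    rng = random.Random(seed)
    P = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_users)]
    Q = [[rng.uniform(-0.1, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):                 # E passes over all X ratings
        for u, i, r in ratings:
            err = r - sum(P[u][f] * Q[i][f] for f in range(k))
            for f in range(k):              # step along the negative gradient
                pu, qi = P[u][f], Q[i][f]
                P[u][f] += lr * (err * qi - reg * pu)
                Q[i][f] += lr * (err * pu - reg * qi)
    return P, Q

def loss(ratings, P, Q):
    """Squared reconstruction error over the known ratings."""
    return sum((r - sum(pu * qi for pu, qi in zip(P[u], Q[i]))) ** 2
               for u, i, r in ratings)

# tiny binary rating matrix: (user, item, rating) triples
ratings = [(0, 0, 1.0), (0, 1, 0.0), (1, 0, 1.0), (1, 1, 1.0)]
P, Q = sgd_factorize(ratings, n_users=2, n_items=2)
```

The nested loop over epochs and ratings is exactly why these methods need several passes over the data.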
8 of 21
Matrix Factorization 2/2
3 Provide high quality results;
3 SGD was chosen by the top three solutions of KDD-Cup 2011;
3 there exist parallel and distributed implementations.
7 High computational costs;
7 large time-to-prediction intervals;
7 especially when applied to large data sets.
                   SGD             ALS                      LCBM
training time      O(X · K · E)    Ω(K³(N + M) + K²X)       O(X)
prediction time    O(K)            O(K)                     O(1)
memory usage       O(K(N + M))     O(M² + X)                O(N + M)

In the table X is the number of collected ratings, K the number of hidden features, E the number of iterations, N and M the number of users and items respectively.
9 of 21
Contributions
In this paper we present:
LCBM: Linear Classifier of Beta distributions Means
a novel collaborative filtering algorithm for binary ratings that:
3 is inherently parallelizable;
3 provides results whose quality is on par with state-of-the-art solutions;
3 in shorter time;
3 using less computational resources.
10 of 21
Overview
[Architecture diagram: incoming ratings feed two training-phase blocks - Item rating PDF inference (ratings per item, producing mean and standard error) and User profiler (ratings per user, producing a threshold) - whose outputs are compared in the working phase to produce predictions.]
I We consider:
B items as elements whose tendency to be rated positively/negatively can be statistically characterized using a probability density function;
B users as entities with different tastes that rate the same items using different criteria and that must thus be profiled.
I The algorithm needs two passes over the set of available ratings to train the model (build user and item profiles).
11 of 21
Item Rating PDF Inference
I Beta distribution:
B X = the relative frequency of positive votes that the item will receive;
B α = OK + 1, β = KO + 1.

Item profile
MEAN = α/(α+β) = (OK+1)/(OK+KO+2)
SE = (1/(OK+KO+2)) · √( (OK+1)(KO+1) / ((OK+KO)(OK+KO+3)) )

I Chebyshev's inequality:
Pr(MEAN − λ·SE ≤ X ≤ MEAN + λ·SE) ≥ 1 − 1/λ²
[Plot: the item's Beta PDF over X ∈ [0, 1].]
I Example:
B OK = 8, KO = 3
B MEAN ≈ 0.7
B SE ≈ 0.04
- with λ = 2: Pr(0.62 ≤ X ≤ 0.78) ≥ 0.75
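The item profile can be reproduced in a few lines. This is a sketch that encodes the formulas above; the function names are ours, and it is checked against the example on this slide (OK = 8, KO = 3).

```python
import math

def item_profile(ok, ko):
    """Beta(OK + 1, KO + 1) summary for X, the item's relative frequency
    of positive votes, using the MEAN and SE formulas from the slide."""
    a, b = ok + 1, ko + 1                    # alpha, beta
    mean = a / (a + b)                       # (OK+1)/(OK+KO+2)
    se = math.sqrt(a * b / ((ok + ko) * (ok + ko + 3))) / (a + b)
    return mean, se

def chebyshev_bound(mean, se, lam=2.0):
    """Interval [MEAN - lam*SE, MEAN + lam*SE] that holds X with
    probability at least 1 - 1/lam^2 by Chebyshev's inequality."""
    return mean - lam * se, mean + lam * se, 1.0 - 1.0 / lam ** 2

mean, se = item_profile(8, 3)                # the slide's example item
lo, hi, p = chebyshev_bound(mean, se)        # p = 0.75 for lam = 2
```

With OK = 8 and KO = 3 this gives MEAN ≈ 0.69 and SE ≈ 0.037, matching the rounded values on the slide.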
12 of 21
User Profiler
[Diagram: ratings as points with keys in [0, 1]; a discriminant function separates OK points from KO points, with OK errors on one side and KO errors on the other.]
I Every rating is represented by a point with two attributes:
B a boolean value containing the rating (OK or KO);
B a key ∈ [0, 1] that gives the point's rank in the order:
- if value = OK → key = MEAN + λ·SE
- if value = KO → key = MEAN − λ·SE
User profile
Quality threshold (QT) = the discriminant function of a 1D linearclassifier that separates the points with a minimal number of errors.
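A minimal 1D classifier of this kind can be sketched by scanning candidate cut points. This is our own brute-force illustration (the names and the candidate-enumeration strategy are ours, not necessarily how LCBM implements it), using the prediction rule from the next slide: keys above the threshold count as OK.

```python
def quality_threshold(points):
    """1D linear classifier over (key, is_ok) pairs, key in [0, 1].
    Returns the cut t that minimizes misclassifications under
    'predict OK when key > t': errors are OK points at or below t
    plus KO points above t."""
    best_t, best_err = 0.0, None
    for t in [0.0] + sorted(k for k, _ in points):    # candidate cuts
        err = sum(1 for k, is_ok in points
                  if (is_ok and k <= t) or (not is_ok and k > t))
        if best_err is None or err < best_err:
            best_t, best_err = t, err
    return best_t, best_err

# hypothetical user: two KO ratings at low keys, two OK ratings at high keys
qt, errors = quality_threshold([(0.2, False), (0.3, False),
                                (0.6, True), (0.8, True)])
```

On this separable toy input the classifier finds a threshold between 0.3 and 0.6 with zero errors.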
13 of 21
Working Phase
I In order to produce a prediction, the algorithm simply compares the QT value from the user profiler with the MEAN value from the item rating PDF inference block:
B if MEAN > QT → prediction = OK
B if MEAN ≤ QT → prediction = KO
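The working phase therefore reduces to a single comparison per (user, item) pair, which is what makes prediction O(1). A one-line sketch (names are ours):

```python
def predict(item_mean, user_qt):
    """OK (True) when the item's expected fraction of positive votes
    exceeds the user's personal quality threshold."""
    return item_mean > user_qt

# a user with QT = 0.55 would like an item with MEAN ~ 0.69
liked = predict(0.69, 0.55)
```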
14 of 21
Experiments: Setting and The Data Sets
Dataset    MovieLens 100k   MovieLens 10M   Netflix
users      943              69878           480189
items      1682             10677           17770
ratings    100000           10000054        100480507
I 5-fold cross-validation approach:
B this technique divides the dataset into 5 folds and then uses in turn one of the folds as test set and the remaining ones as training set.
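The 5-fold protocol can be sketched as follows; this is a generic illustration of the technique (names and the round-robin fold assignment are ours, not the evaluation harness actually used).

```python
import random

def five_fold_splits(ratings, k=5, seed=0):
    """Yield (train, test) pairs; each fold serves exactly once as
    the test set, the other k-1 folds form the training set."""
    data = list(ratings)
    random.Random(seed).shuffle(data)
    folds = [data[i::k] for i in range(k)]    # k near-equal folds
    for i in range(k):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        yield train, folds[i]

splits = list(five_fold_splits(range(10)))    # toy "ratings": 10 items
```

Every rating appears in exactly one test fold, so the five test sets together cover the whole dataset.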
I We compared LCBM against CF implementations in Apache Mahout:
B ParallelSGDFactorizer: lock-free parallel implementation of SGD;
B ALSWRFactorizer: ALS with Weighted-λ-Regularization.
15 of 21
Experiments: Performance Metrics
I Confusion matrix:
  true positives (TP)    false negatives (FN)
  false positives (FP)   true negatives (TN)

I Matthews correlation coefficient (MCC) ∈ [−1, 1]:

MCC = (TP × TN − FP × FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))

B 1 → perfect prediction;
B 0 → no better than random prediction;
B −1 → total disagreement between prediction and observation.
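The coefficient is a direct transcription of the formula above; a small sketch (the function name is ours):

```python
import math

def mcc(tp, fn, fp, tn):
    """Matthews correlation coefficient from a binary confusion matrix.
    Returns 0.0 when any marginal is empty (undefined denominator)."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return num / den if den else 0.0
```

For example, a perfect predictor (only TP and TN) scores 1, pure chance scores 0, and a fully inverted predictor scores −1.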
I time needed to run the test;
I the peak memory load during the test.
16 of 21
Results: Accuracy
[Plot: MCC vs. number of ratings on MovieLens (10^5), MovieLens (10^7) and Netflix (10^8) for LCBM, SGD (K = 256, 32, 8) and ALS.]
17 of 21
Results: Computational Resources
[Plot: peak memory usage (GB) vs. number of ratings on MovieLens (10^5), MovieLens (10^7) and Netflix (10^8) for LCBM, SGD (K = 256, 32, 8) and ALS, with zoom insets.]
18 of 21
Results: Timeliness
[Plot: running time (s) vs. number of ratings on MovieLens (10^5), MovieLens (10^7) and Netflix (10^8) for LCBM, SGD (K = 256, 32, 8) and ALS, with zoom insets.]
19 of 21
Conclusions
I LCBM is a novel CF algorithm for binary ratings.
I It works by analyzing collected ratings to:
B infer a probability density function of the relative frequency of positive votes that each item will receive;
B compute a personalized quality threshold for each user.
I These two pieces of information are used to predict missing ratings.
I LCBM is inherently parallelizable.

Future work:
I Handle multidimensional ratings.
I Provide recommendations: a list of the items the user will like the most.
20 of 21
Thank you!
Questions?
Fabio Petroni - Rome, Italy
Current position: PhD Student in Computer Engineering, Sapienza University of Rome
Research Areas: Recommendation Systems, Collaborative Filtering, Distributed Systems
21 of 21