http://umekoumeda.net/Summer Seminar 2008 @Susukakedai
Badrul Sarwar et al., "Item-Based Collaborative
Filtering Recommendation Algorithms",
WWW 2001
Deguchi Lab.
Takashi UMEDA
Mail: umeda07[at]cs.dis.titech.ac.jp
Web: http://umekoumeda.net/
Outline…
• Introduction
• Item-Based CF
• Experimental Procedure
• Experimental Result
• Conclusions
INTRODUCTION
Chap.1
1-1. My Research Domain
• Evaluating recommendation algorithms by ABM (agent-based modeling)
– Recommendation approaches:
• Rule-based approach
• Content-based approach
• Collaborative Filtering (CF)
• Bayesian network
– Why CF?
• It is the approach most widely used on real websites
– Why ABM?
• With ABM, algorithms can be optimized for the market environment
1-2. What's CF? (1/3)
• Have you used Amazon.com?
1-3. What's CF? (2/3)
Recommendation
Collaborative filtering (CF) algorithms are commonly
used on e-commerce websites.
1-4. What's CF? (3/3)
[Figure: the book lists of Prof. Kizima and Prof. Deguchi]
They have the same books
↓
They have similar preferences
CF will recommend the following book to Prof. Deguchi,
based on people who are similar to him.
1-5. Contribution of this paper
• Problems of the basic CF algorithm
– Basic CF: nearest neighbors
– Scalability (performance)
• High scalability: even with many users, the system can produce recommendations quickly
– Accuracy (quality)
• High accuracy: even when the data are sparse, the system can recommend items a user is likely to like
• In this paper, the authors propose a new algorithm
– Item-based CF
– Both performance and quality can be improved
1-6. Collaborative Filtering Process
[Figure: user-item matrix with rows u1..um, columns i1..in, and entries such as a1,2]
Input data (the user-item matrix):
• U = {u1, u2, .., um} : the set of users
• I = {i1, i2, .., in} : the set of items
• Iui : the items rated by user ui, Iui ⊆ I
• ai,j : user ui's rating of item ij
Output interface (produced by the CF algorithm):
• Prediction Pa,j : the predicted degree of likeness of item ij by user ua, for items the user has not yet rated
• Top-N recommendation: a list of the N items the user will like the most, Ir ⊂ I, Ir ∩ Iua = ∅
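As a minimal sketch of the input data above (the ratings and item names are hypothetical; a nested Python dict stands in for the user-item matrix):

```python
# User-item matrix: ratings[u][i] is user u's rating a_{u,i} of item i.
# Missing entries mean the user has not rated that item.
ratings = {
    "u1": {"i2": 3},
    "u2": {"i1": 2, "i3": 5},
    "u3": {"i2": 4, "i3": 1},
}

def top_n(user, predictions, n):
    """Top-N recommendation: the n unrated items with the highest
    predicted ratings, so Ir and Iu are disjoint by construction."""
    unrated = {i: p for i, p in predictions.items()
               if i not in ratings[user]}
    return sorted(unrated, key=unrated.get, reverse=True)[:n]
```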
1-7. Variations of the CF Algorithm
Memory-based approach
• Procedure (nearest neighbor)
1. The system defines a set of users known as neighbors, on-line
2. The system produces a prediction or a top-N recommendation
Model-based approach
• Procedure
1. The system develops a model of user ratings, off-line
2. Using the model, the system produces a prediction or a top-N recommendation
• How is the model developed? Bayesian networks, clustering
1-8. What's on-line and off-line?
Off-line computation: performed automatically at suitable intervals.
On-line computation: performed quickly while a user is using the system.
Example — Google:
• Off-line: crawling, indexing, ranking
• On-line: when you input a query, the search engine outputs the results
1-9. The problems of basic CF
Weaknesses of the nearest-neighbor algorithm:
• Accuracy — sparsity of the user-item matrix: many users may have
purchased well under 1% of all items → the accuracy of the
nearest-neighbor algorithm may be poor
• Scalability — with millions of users and items, the nearest-neighbor
algorithm may suffer serious scalability problems
We need new CF algorithms…
ITEM-BASED CF
Chap.2
2-1. Overview of Item-based CF
Off-line computation — item similarity computation:
• Si,j : the similarity between items ii and ij
On-line computation — prediction computation:
• Pu,i : the degree of likeness of item i by user u, based on the similarities between items, S
[Figure: user-item matrix (rows u1..um, columns i1..in) and the precomputed item-item similarity matrix]
2-2. Item Similarity Computation
• Cosine-based similarity
• Correlation-based similarity
• Adjusted cosine similarity
– compensates for the difference in rating scale between different users
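A minimal sketch of the adjusted cosine similarity from the paper: for the users who rated both items, each user's mean rating is subtracted from their ratings before computing the cosine, which compensates for rating-scale differences. The data layout (nested dicts) is an assumption for illustration.

```python
import math

def adjusted_cosine(ratings, i, j):
    """Adjusted cosine similarity between items i and j.
    ratings: {user: {item: rating}}. Subtracting each user's mean
    rating compensates for differences in rating scale between users."""
    common = [u for u in ratings if i in ratings[u] and j in ratings[u]]
    if not common:
        return 0.0
    num = den_i = den_j = 0.0
    for u in common:
        mean_u = sum(ratings[u].values()) / len(ratings[u])
        di = ratings[u][i] - mean_u  # R_{u,i} - mean(R_u)
        dj = ratings[u][j] - mean_u  # R_{u,j} - mean(R_u)
        num += di * dj
        den_i += di * di
        den_j += dj * dj
    if den_i == 0 or den_j == 0:
        return 0.0
    return num / (math.sqrt(den_i) * math.sqrt(den_j))
```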
2-3. Prediction Computation
• Weighted sum
– Pu,i = Σn∈N ( si,n × Ru,n ) / Σn∈N | si,n |
– N is the set of items most similar to item i; |N| is the neighborhood size
– the sum of absolute similarities acts as a normalization coefficient
• Regression
– Ru,n is estimated by a regression model
– Ri : the target item's ratings (explanatory variable)
– Rn : the similar item's ratings (explained variable)
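The weighted-sum prediction can be sketched as follows (the dict-based arguments are an illustrative assumption, not the paper's implementation):

```python
def predict_weighted_sum(user_ratings, sims):
    """Weighted-sum prediction P_{u,i}: the user's ratings of the
    neighbor items N, weighted by each neighbor's similarity to the
    target item i, normalized by the sum of absolute similarities.
    user_ratings: {item: rating for items the user has rated}
    sims: {neighbor item: similarity to the target item}"""
    num = sum(s * user_ratings[n] for n, s in sims.items()
              if n in user_ratings)
    den = sum(abs(s) for n, s in sims.items() if n in user_ratings)
    return num / den if den else 0.0
```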
2-4. Time Complexity (1/2)
Nearest neighbor: all computation is on-line.
• User similarity computation: computing one user-user similarity
scans n ratings → O(n); the system must compute m × m user-user
similarities → O(m × m)
• Prediction computation: computing one Pi,j value scans m
user-user similarities → O(m)
The time complexity of nearest neighbor is O(m²n) + O(m).
2-4. Time Complexity (2/2)
Item-based CF:
• Off-line computation — item similarity: item-item similarity is
static, as opposed to user-user similarity → it is possible to
precompute the item similarities ( = the model )
• On-line computation — prediction: computing one Pi,j value scans
n item similarities → O(n)
The on-line time complexity of item-based CF is better than that of
nearest neighbor.
EXPERIMENTAL PROCEDURE
Chap.3
3-1. Experimental Procedure
1. Data dividing: the data set is divided into a train portion and a test portion.
[Table: (user, item, rating) rows such as (u1, i2, 3), (u2, i1, 2), (u6, i3, 3), split into train and test]
2. Parameter learning: fix the optimal values of the parameters
• similarity algorithm
• train/test ratio (x): the sparsity level of the data
• neighborhood size
3. Full experiment: to evaluate item-based CF, the following are measured
• performance
• quality
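Step 1 above (dividing the ratings into train and test portions) can be sketched as a random split; the function name and row layout are assumptions for illustration:

```python
import random

def split_ratings(rows, x, seed=0):
    """Divide (user, item, rating) rows into train and test portions.
    x is the train/test ratio: x = 0.8 puts 80% of the ratings in
    the train portion and the remaining 20% in the test portion."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)  # fixed seed for reproducibility
    cut = int(len(rows) * x)
    return rows[:cut], rows[cut:]
```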
3-2. Data Sets
• Data Sets
– Data from the website "MovieLens"
– MovieLens is a web-based recommender system
– Hundreds of users visit MovieLens to rate movies and
receive recommendations.
– The data set was converted into a user-item
matrix ( 943 users × 1682 items )
3-3. Evaluation Metrics
• To evaluate the quality of a recommender system, MAE is used as the evaluation metric.
• MAE: Mean Absolute Error
– MAE = ( Σi | pi − qi | ) / N
– pi : the predicted rating for item i (predicted from the training data)
– qi : the true rating for item i (from the test data)
– The lower the MAE, the more accurately the recommendation engine predicts user ratings.
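A minimal sketch of the MAE computation over paired predicted and true ratings:

```python
def mae(predicted, actual):
    """Mean Absolute Error: the average of |p_i - q_i| over all N
    test ratings. Lower MAE means more accurate predictions."""
    assert len(predicted) == len(actual)
    return sum(abs(p - q) for p, q in zip(predicted, actual)) / len(predicted)
```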
EXPERIMENTAL RESULTS
Chap.4
4-1. Optimal Values of the Parameters (1/2)
• Item-similarity algorithm: adjusted cosine gives the best
quality
• Train/test ratio: x = 0.8 is the
optimum value
4-1. Optimal Values of the Parameters (2/2)
Considering both trends, the optimal choice of neighborhood size is 30.
In the full experiment, the basic parameters are as follows:
• similarity algorithm: adjusted cosine
• train/test ratio: 0.8
• neighborhood size: 30
4-2. Quality
• Item-based CF (weighted sum) outperforms nearest neighbor.
• Item-based CF (regression) outperforms the other two cases at low values of x and at small neighborhood sizes.
4-3. Performance (1/2)
• Model size:
– Full model: at item similarity computation, all item-item
similarities ( 1682 × 1682 ) are computed.
– Model size = 200: at item similarity computation, only the
200 × 200 item-item similarities are computed.
• If the model size is small, is good quality maintained?
– If so, on-line performance is higher than in the full-model case.
• Result: with a model size of 100–200, it is possible to
obtain reasonably good prediction quality.
Even without using all item-item similarities, prediction accuracy
does not degrade and performance improves.
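Shrinking the model amounts to keeping only the most similar items per item; a sketch under the assumption that the precomputed similarities live in a nested dict (the function name is hypothetical):

```python
def truncate_model(sim_matrix, model_size):
    """Retain only the model_size most similar items for each item.
    A small model (e.g. 100-200 neighbors) keeps prediction quality
    reasonably good while reducing on-line cost and memory use."""
    return {
        item: dict(sorted(neighbors.items(),
                          key=lambda kv: kv[1],
                          reverse=True)[:model_size])
        for item, neighbors in sim_matrix.items()
    }
```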
CONCLUSIONS
Chap.5
5. Conclusion
• Quality
– Item-based CF provides better prediction quality than
nearest-neighbor algorithms,
• independent of the neighborhood size and the train/test ratio
– The improvement in quality is not large
• Performance
– Item similarities can be precomputed
• because item similarity is static
– High on-line performance
– It is possible to retain only a small subset of items and still produce good prediction quality and high performance
THANK YOU