Yoda: An Accurate and Scalable Web-based Recommendation System
Cyrus Shahabi, Farnoush Banaei-Kashani, Yi-Shin Chen, and Dennis McLeod
Integrated Media Systems Center and Computer Science Department,
University of Southern California
E-mail: {shahabi, banaeika, yishinc, mcleod}@usc.edu
Outline
Motivation
Related Work: Content-based Filtering, Collaborative Filtering
Offline Process: Clustering, Voting, Aggregation
Online Process: Classification & Aggregation
Performance Evaluation
Conclusion & Future Work
Motivation
The amount of data on the Web is enormous
Users suffer from information overload
Recommendation systems can personalize and customize the Web environment in real time
Similar to Amazon.com’s “real-time” recommendations (people who bought this book also purchased …)
Yoda uses a different approach (vs. association-rule mining)
Challenges:
Scalability: as the numbers of items and users grow, the system must stay efficient
Sparsity: not enough information is available about the user
Related Work: Content-Based Filtering
From the Information Retrieval community [Maes 1994] [Shardanand and Maes 1995] [Balabanović and Shoham 1997]
Based on a comparison between the feature vectors of items (e.g., artist, style) in the database and the user’s interest list
Major weaknesses [Balabanović and Shoham 1997]:
Content limitation: can be applied to only a few kinds of content, and can capture only certain aspects of that content
Over-specialization: users can only obtain information that matches the content of their profiles
Related Work: Collaborative Filtering (CF)
Uses a user’s item evaluations (not the actual content) to find other, similar users: nearest-neighbor algorithm [Resnick et al. 1994]
Three major weaknesses:
Scalability: time complexity O(U*I) (U: # of users, I: # of items); addressed by clustering [Breese et al. 1998] and Bayesian networks [Kitts et al. 2000]
Sparsity: the profile matrix (users × evaluated items) is sparse; addressed by SVD [Sarwar et al. 2000]
Synonymy: latent associations between items are not considered; addressed by content analysis [Balabanović and Shoham 1997] and categorization [Kohrs and Merialdo 2000]
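The O(U*I) scalability cost can be seen in a minimal user-based nearest-neighbor sketch in the spirit of [Resnick et al. 1994]. It assumes a small dense rating matrix in which 0 means “not evaluated”, and it uses cosine similarity over co-rated items as a simplification (GroupLens itself used Pearson correlation). The function names are illustrative, not taken from the slides.

```python
import math


def cosine(u, v):
    """Cosine similarity over the items both users rated (0 = not rated)."""
    common = [(a, b) for a, b in zip(u, v) if a and b]
    if not common:
        return 0.0
    dot = sum(a * b for a, b in common)
    norm_u = math.sqrt(sum(a * a for a, _ in common))
    norm_v = math.sqrt(sum(b * b for _, b in common))
    return dot / (norm_u * norm_v)


def predict(ratings, user, item, k=2):
    """Predict ratings[user][item] from the k most similar users who rated it.

    Scoring every other user touches every item, hence O(U*I) per request.
    """
    scored = [
        (cosine(ratings[user], row), row[item])
        for other, row in enumerate(ratings)
        if other != user and row[item]
    ]
    top = sorted(scored, reverse=True)[:k]
    weight = sum(s for s, _ in top)
    if weight == 0:
        return 0.0
    return sum(s * r for s, r in top) / weight


# Toy users x items matrix; 0 means the item was not evaluated.
ratings = [
    [5, 3, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [1, 0, 0, 4],
    [0, 1, 5, 4],
]
print(round(predict(ratings, user=1, item=1), 2))
```

Every prediction compares the active user with all U users over all I items, which is the per-request cost that clustering-based approaches try to avoid.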
Offline Process
[Figure: User Navigation Behaviors (User 1 … User U) → PPED Similarity Measure and Clustering → Clusters → Voting → Favorite PVs (property values, e.g., Rock=High, Classical=Low, Pop=Low, Rap=High) → Fuzzy Aggregation with the Item Database → Cluster Wish-list (items with values such as 0.87, 0.83, 0.72, …)]
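The slides do not define the PPED similarity measure. The sketch below assumes one plausible reading: a Euclidean distance restricted (“projected”) to the dimensions the user’s session actually visited, converted to a similarity in (0, 1]. The 1/(1+d) conversion, the feature encoding, and all names are hypothetical. The same function can then assign a session to its most similar cluster profile.

```python
import math


def projected_distance(session, profile):
    """Euclidean distance over the dimensions visited in `session` only.

    Assumption: both are equal-length feature vectors (e.g. hits or time per
    page) and 0 in `session` means the page was not visited.
    """
    dims = [j for j, x in enumerate(session) if x != 0]
    return math.sqrt(sum((session[j] - profile[j]) ** 2 for j in dims))


def similarity(session, profile):
    """Map the distance to (0, 1]; 1 means identical on the visited pages."""
    return 1.0 / (1.0 + projected_distance(session, profile))


def nearest_cluster(session, centroids):
    """Return the index of the most similar cluster and all similarities."""
    sims = [similarity(session, c) for c in centroids]
    best = max(range(len(sims)), key=lambda j: sims[j])
    return best, sims


best, sims = nearest_cluster([2, 0, 1], [[0, 0, 0], [2, 5, 1]])
print(best, [round(s, 2) for s in sims])
```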
Voting Mechanism
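[Figure: example property values (Rock=High, Classical=Low, Pop=Mid, Rap=High, Blues=Low) and per-property vote counts over the fuzzy terms H/M/L for Rock, Classical, …, Blues; the winning term of each property gives the cluster’s Favorite PVs (Rock=High, Classical=Low, Pop=Low, Rap=High, Blues=Low)]
$C_{p,f}(k)$: number of votes for fuzzy term f of property p in cluster k
$M_{p,f} = \max_{f \in F} \{ C_{p,f}(k) \}$
$F_p(k) = f_{\max}$, where $f_{\max} \in F$ and $C_{p,f_{\max}}(k) = M_{p,f}$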
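The voting step can be sketched as a per-property majority vote. This assumes every user in cluster k already carries one fuzzy term per property (how those terms are derived from navigation is not shown in the slides), and it breaks ties by first occurrence, which is also an assumption.

```python
from collections import Counter


def favorite_pvs(cluster_users, properties):
    """Majority vote per property over the users of one cluster k.

    cluster_users: list of dicts mapping property -> fuzzy term.
    Ties go to the term encountered first (an assumption).
    """
    favorites = {}
    for p in properties:
        # C_{p,f}(k): number of users in cluster k whose value of p is term f
        counts = Counter(u[p] for u in cluster_users if p in u)
        if not counts:
            continue
        top = max(counts.values())  # M_{p,f}
        # F_p(k): a term whose vote count reaches the maximum
        favorites[p] = next(f for f, c in counts.items() if c == top)
    return favorites


users = [
    {"Rock": "High", "Classical": "Low", "Pop": "Mid"},
    {"Rock": "High", "Classical": "Low", "Pop": "Low"},
    {"Rock": "Mid", "Classical": "Mid", "Pop": "Low"},
]
print(favorite_pvs(users, ["Rock", "Classical", "Pop"]))
```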
Ranking Items
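[Figure: the cluster’s Favorite PVs (Rock=High, Classical=Low, Pop=Low, Rap=High, Blues=Low) are combined by Fuzzy Aggregation with each item’s property values from the Item Database (e.g., Rock=High, Classical=Low, Pop=Mid, Rap=Mid, Blues=Low), pairing terms such as (High*High), (Mid*Low), (Low*Low); the items are then sorted into the Cluster Wish-List (0.87, 0.83, 0.82, 0.79, …, 0.47, 0.42)]
$v_k(i) = \operatorname{fmax}_{p} \{ F_p(k) \ast \tilde{p}_i \}$
where $F_p(k)$ is cluster k’s favorite value of property p and $\tilde{p}_i$ is item i’s value of property p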
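A minimal sketch of the fuzzy aggregation $v_k(i) = \operatorname{fmax}_{p} \{ F_p(k) \ast \tilde{p}_i \}$ and the resulting cluster wish-list. It assumes hypothetical numeric degrees for the fuzzy terms (High=0.9, Mid=0.5, Low=0.1), the product as the “*” operator, and plain max as fmax; the slides fix none of these choices.

```python
# Hypothetical numeric degrees for the fuzzy terms (not given in the slides).
DEGREE = {"High": 0.9, "Mid": 0.5, "Low": 0.1}


def score(favorites, item_pvs):
    """v_k(i) = max over properties p of F_p(k) * p~_i, with '*' = product."""
    return max(DEGREE[favorites[p]] * DEGREE[item_pvs[p]] for p in favorites)


def cluster_wish_list(favorites, items, n=3):
    """Rank the item database for cluster k and keep the top n items."""
    scored = [(name, score(favorites, pvs)) for name, pvs in items.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


favorites = {"Rock": "High", "Classical": "Low", "Pop": "Low",
             "Rap": "High", "Blues": "Low"}
items = {
    # The item from the slide: its (High*High) Rock pair dominates.
    "item_a": {"Rock": "High", "Classical": "Low", "Pop": "Mid",
               "Rap": "Mid", "Blues": "Low"},
    "item_b": {"Rock": "Low", "Classical": "Low", "Pop": "Low",
               "Rap": "Low", "Blues": "Low"},
    "item_c": {"Rock": "Mid", "Classical": "High", "Pop": "High",
               "Rap": "Low", "Blues": "High"},
}
for name, value in cluster_wish_list(favorites, items):
    print(name, round(value, 2))
```

Under these assumptions the slide’s example item scores about 0.81 because its (High*High) pair wins the max; another t-norm such as min in place of the product would keep the same max-of-pairs structure.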
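Locality-Sensitive Hashing algorithm
Optimized Equation
Why optimize: the original equation takes O(#P*I) time (#P: # of properties, I: # of items)
Intuition: the $v_k(i)$ value comes from the maximum value among the products $F_p(k) \ast \tilde{p}_i$, so properties that share the same favorite term f can be grouped and only their largest item value kept
[Figure: the same Favorite PVs (Rock=High, Classical=Low, Pop=Low, Rap=High, Blues=Low) and item property values (Rock=High, Classical=Low, Pop=Mid, Rap=Mid, Blues=Low); only (High*High) and (Low*Mid) remain to be compared, e.g., via $M_{High}(k)$]
$v_k(i) = \operatorname{fmax}_{f \in F} \{ E_{k,f}(i) \}$
$E_{k,f}(i) = f \ast M_f(k)$
$M_f(k) = \operatorname{fmax}_{p \in G_f(k)} \{ \tilde{p}_i \}$
where $G_f(k)$ is the set of properties whose favorite value in cluster k is f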
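The regrouping is valid because, for a fixed term degree $f \ge 0$, the map $x \mapsto f \ast x$ is non-decreasing, so the best product inside each group $G_f(k)$ comes from that group’s largest item value. The sketch below reuses the same hypothetical term degrees and product operator (assumptions, as before) and checks that the grouped form gives the same $v_k(i)$ as the direct form for every possible item. It demonstrates the equivalence only, not how the O(f*I) bound is obtained.

```python
from itertools import product

# Same hypothetical term degrees as an assumption; '*' is the product.
DEGREE = {"High": 0.9, "Mid": 0.5, "Low": 0.1}


def group_by_term(favorites):
    """G_f(k): the properties whose favorite term in cluster k is f."""
    groups = {}
    for p, f in favorites.items():
        groups.setdefault(f, []).append(p)
    return groups


def score_grouped(groups, item_pvs):
    """v_k(i) = max_f E_{k,f}(i), E_{k,f}(i) = f * M_f(k)."""
    best = 0.0
    for f, props in groups.items():
        m_f = max(DEGREE[item_pvs[p]] for p in props)  # M_f(k) for item i
        best = max(best, DEGREE[f] * m_f)              # E_{k,f}(i)
    return best


def score_direct(favorites, item_pvs):
    """v_k(i) = max_p F_p(k) * p~_i, the unoptimized form."""
    return max(DEGREE[favorites[p]] * DEGREE[item_pvs[p]] for p in favorites)


favorites = {"Rock": "High", "Classical": "Low", "Pop": "Low",
             "Rap": "High", "Blues": "Low"}
groups = group_by_term(favorites)  # High -> [Rock, Rap]; Low -> [Classical, Pop, Blues]
props = list(favorites)

# Compare both forms on every possible item (3 terms ^ 5 properties).
mismatches = 0
for terms in product(DEGREE, repeat=len(props)):
    item = dict(zip(props, terms))
    if score_grouped(groups, item) != score_direct(favorites, item):
        mismatches += 1
print("mismatches:", mismatches)
```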
Optimized Equation
Time complexity: O(f*I) (f: # of fuzzy terms, I: # of items)
Requirement: the aggregation satisfies a triangular-norm form
Time complexity can be further reduced to O(N) (N: a constant) with Fagin’s A0 algorithm [Fagin 1996]
Online Process
[Figure: the Current User’s Navigation Behavior is compared with the Clusters by the PPED Similarity Measure, giving a list of similarity values (e.g., 0.65, 0.79, 0.32); Fuzzy Aggregation of the Cluster Wish-lists using these values yields the User Wish-List (0.87, 0.83, 0.82, …, 0.47, 0.42)]
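The slides do not give the online aggregation formula. The sketch assumes the same max-of-products shape as the offline equation, scoring item i as $\max_k s_k \ast v_k(i)$, where $s_k$ is the current user’s similarity to cluster k. The cluster wish-lists below are made up; only the similarity values 0.65, 0.79, and 0.32 come from the slide.

```python
def user_wish_list(similarities, cluster_wish_lists, n=5):
    """Combine cluster wish-lists into one ranked list for the current user.

    Assumption: score(i) = max over clusters k of s_k * v_k(i); an item that is
    absent from a cluster's wish-list contributes nothing for that cluster.
    """
    scores = {}
    for s_k, wish_list in zip(similarities, cluster_wish_lists):
        for item, v_k in wish_list.items():
            scores[item] = max(scores.get(item, 0.0), s_k * v_k)
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:n]


similarities = [0.65, 0.79, 0.32]  # similarity values from the slide
cluster_wish_lists = [               # made-up cluster wish-lists
    {"a": 0.87, "b": 0.83},
    {"b": 0.72, "c": 0.61},
    {"a": 0.47, "d": 0.61},
]
for item, value in user_wish_list(similarities, cluster_wish_lists):
    print(item, round(value, 3))
```

The nested loops visit every item of every cluster wish-list, which corresponds to the original O(K*I) cost.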
Optimized Method
Original time complexity: O(K*I) (K: # of clusters, I: # of items)
Time complexity of the optimized method: O(f*I) (f: # of fuzzy terms)
Time complexity can be further reduced to O(N) (N: a constant) with Fagin’s A0 algorithm [Fagin 1996]
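Fagin’s A0 algorithm [Fagin 1996] finds the top-k objects under any monotone aggregation function without scoring every object. The sketch below follows its three phases: parallel sorted access until k objects have been seen in every list, random access for all seen objects, then selection. Treating an object missing from a list as grade 0, and the function names, are assumptions for this illustration.

```python
def fagin_a0(lists, k, aggregate):
    """Top-k objects under a monotone `aggregate`, following Fagin's A0.

    lists: one dict per source, mapping object -> non-negative grade.
    An object missing from a list is treated as grade 0 (an assumption).
    """
    sorted_lists = [sorted(g.items(), key=lambda kv: kv[1], reverse=True)
                    for g in lists]
    m = len(lists)
    seen = {}  # object -> number of lists where sorted access reached it
    depth = 0
    max_depth = max(len(s) for s in sorted_lists)

    # Phase 1: sorted access to all lists in parallel until at least k
    # objects have been seen in every list (or the lists are exhausted).
    while depth < max_depth and sum(c == m for c in seen.values()) < k:
        for s in sorted_lists:
            if depth < len(s):
                obj = s[depth][0]
                seen[obj] = seen.get(obj, 0) + 1
        depth += 1

    # Phase 2: random access to fetch every grade of each seen object.
    scored = [(obj, aggregate([g.get(obj, 0.0) for g in lists])) for obj in seen]

    # Phase 3: the k best aggregated grades among the seen objects.
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


list_a = {"a": 0.9, "b": 0.8, "c": 0.3, "d": 0.2}
list_b = {"a": 0.1, "b": 0.7, "c": 0.95, "d": 0.4}
print(fagin_a0([list_a, list_b], 1, min))
print(fagin_a0([list_a, list_b], 2, max))
```

For the online step, each list could be one cluster wish-list and `aggregate` a hypothetical `lambda gs: max(s * g for s, g in zip(sims, gs))`, which is monotone as long as the similarities are non-negative.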
Experimental Methodology
[Figure: synthetic data generation: Clustering yields the Clusters, a cluster-user Similarity Matrix, the Cluster Favorite PVs, and the Ranking of Items in Clusters, which are used to generate the Item Database, the User Set, and the User Navigation Behaviors]
Assign property values to items:
• Item-PV = f(Cluster-PV, noise)
• noise ~ item-rank
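The slides specify only that Item-PV = f(Cluster-PV, noise), with noise depending on item rank. The sketch below is one hypothetical choice of f: each property keeps the cluster’s favorite term with probability 1 - rank/n_items and is otherwise redrawn at random. The term set, the linear noise schedule, and the names are assumptions.

```python
import random

TERMS = ["High", "Mid", "Low"]  # assumed fuzzy terms


def generate_item_pvs(cluster_pvs, rank, n_items, rng):
    """Hypothetical f(Cluster-PV, noise) with noise growing with item rank.

    rank 0 is the cluster's top-ranked item. Each property keeps the
    cluster's favorite term with probability 1 - rank / n_items and is
    otherwise redrawn uniformly from TERMS (possibly the same term).
    """
    noise = rank / n_items
    return {
        p: (f if rng.random() >= noise else rng.choice(TERMS))
        for p, f in cluster_pvs.items()
    }


rng = random.Random(42)  # seeded for reproducibility
cluster_pvs = {"Rock": "High", "Classical": "Low", "Pop": "Low",
               "Rap": "High", "Blues": "Low"}
for rank in (0, 125, 249):
    print(rank, generate_item_pvs(cluster_pvs, rank, 250, rng))
```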
Experimental Methodology
[Figure: Clusters, cluster-user Similarity Matrix, Cluster Favorite PVs, and Ranking of Items in Clusters are used to generate the User Navigation Behaviors over the Item Database and User Set, shown as a matrix of letter codes (H, L, M, N, F, …)]
Assign evaluation values to items:
• Item-Rating = f(Cluster-Ranking, weight)
• weight ~ user-cluster similarities
Experimental Methodology
[Figure: User Navigation Behaviors generated from the Item Database and User Set, divided into Training and Testing data for Current Session Recommendation]
Accuracy Comparison
[Figure: Harmonic Mean and Improvement vs. Number of Items (1000, 5000) for the Nearest Neighbor Method and Yoda]
$\text{Harmonic Mean} = \dfrac{2}{\frac{1}{recall} + \frac{1}{precision}}$
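The accuracy metric is the harmonic mean of precision and recall. The sketch computes precision and recall for one recommendation list against the items a test user actually visited, using the standard set-based definitions. How the experiments matched recommendations to test sessions is not shown in the slides, so that pairing, and the sample data, are assumptions.

```python
def precision_recall(recommended, relevant):
    """Set-based precision and recall of one recommendation list."""
    hits = len(set(recommended) & set(relevant))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


def harmonic_mean(precision, recall):
    """2 / (1/recall + 1/precision); taken as 0 when either value is 0."""
    if precision == 0 or recall == 0:
        return 0.0
    return 2.0 / (1.0 / recall + 1.0 / precision)


# Hypothetical recommendation list vs. pages the test user actually visited.
p, r = precision_recall(["a", "b", "c", "d"], ["a", "c", "x"])
print(p, round(r, 3), round(harmonic_mean(p, r), 3))
```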
Processing Time Comparison
[Figure: CPU Time (milliseconds/user) vs. Number of Users (0 to 4500) for Yoda and BNN (Basic Nearest Neighbor Method)]
Processing Time = CPU + IO
In the BNN process: #Items = 5000; #Users = 1000
In the Yoda process: #Items in each cluster wish-list = 250; #Clusters = 18
Conclusion
Yoda scales as the numbers of users and items grow
Higher accuracy than the nearest-neighbor method
Future Work
Compare with other techniques
Run more experiments with real data
Incorporate the content-based filtering mechanism into the user clustering & classification phases
Incorporate the user profiles
Reference
[Shardanand and Maes 1995] U. Shardanand and P. Maes, Social Information Filtering: Algorithms for Automating “Word of Mouth”, Proceedings of the Conference on Human Factors in Computing Systems, Denver, CO, USA, pp. 210-217, May 1995
[Maes 1994] P. Maes, Agents that Reduce Work and Information Overload, Communications of the ACM, 37(7), pp. 30-40, 1994
[Balabanović and Shoham 1997] M. Balabanović and Y. Shoham, Fab: Content-Based, Collaborative Recommendation, Communications of the ACM, 40(3), pp. 66-72, 1997
[Resnick et al. 1994] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, GroupLens: An Open Architecture for Collaborative Filtering of Netnews, Proceedings of the ACM Conference on Computer-Supported Cooperative Work, Chapel Hill, NC, USA, pp. 175-186, 1994
[Sarwar et al. 2000] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, Application of Dimensionality Reduction in Recommender System: A Case Study, ACM WebKDD 2000 Web Mining for E-Commerce Workshop, 2000
[Kohrs and Merialdo 2000] A. Kohrs and B. Merialdo, Using Category-Based Collaborative Filtering in the Active WebMuseum, Proceedings of the IEEE International Conference on Multimedia and Expo, vol. 1, pp. 351-354, 2000
[Kitts et al. 2000] B. Kitts, D. Freed, and M. Vrieze, Cross-sell: A Fast Promotion-Tunable Customer-Item Recommendation Method Based on Conditionally Independent Probabilities, Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, pp. 437-446, August 2000
[Breese et al. 1998] J. Breese, D. Heckerman, and C. Kadie, Empirical Analysis of Predictive Algorithms for Collaborative Filtering, Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, Madison, WI, USA, pp. 43-52, July 1998
[Shahabi et al. 1997] C. Shahabi, A.M. Zarkesh, J. Adibi, and V. Shah, Knowledge Discovery from Users Web-Page Navigation, Proceedings of the IEEE RIDE’97 Workshop, April 1997
[Shahabi et al. 2001] C. Shahabi, F. Banaei-Kashani, J. Faruque, and A. Faisal, Feature Matrices: A Model for Efficient and Anonymous Web Usage Mining, EC-Web 2001, Germany, September 2001
[Fagin 1996] R. Fagin, Combining Fuzzy Information from Multiple Systems, Proceedings of the Fifteenth ACM Symposium on Principles of Database Systems, Montreal, pp. 216-226, 1996
[Shahabi and Chen 2001] C. Shahabi and Y. Chen, A Unified Framework to Incorporate Soft Query into Image Retrieval Systems, International Conference on Enterprise Information Systems, Setubal, Portugal, July 2001