Yoda: An Accurate and Scalable Web-based Recommendation System

Cyrus Shahabi, Farnoush Banaei-Kashani, Yi-Shin Chen, and Dennis McLeod

Integrated Media Systems Center and Computer Science Department, University of Southern California

E-mail: {shahabi, banaeika, yishinc, mcleod}@usc.edu

Outline

Motivation

Related Work
• Content-based Filtering
• Collaborative Filtering

Offline Process: Clustering, Voting, Aggregation

Online Process: Classification & Aggregation

Performance Evaluation

Conclusion & Future Work

Motivation

The amount of data on the Web is enormous

Users suffer from information overload

Recommendation systems can personalize and customize the Web environment in real time
• Similar to Amazon.com “real-time” recommendations (people who bought this book also purchased …)
• Different approach (vs. association-rule mining)

Challenges:
• Scalability: the system must stay efficient as the number of items and users grows
• Sparsity: not enough information is available about each user

Related Work: Content-Based Filtering

From the Information Retrieval community [Maes 1994] [Shardanand and Maes 1995] [Balabanović and Shoham 1997]

Based on a comparison between the feature vectors of items (e.g., artist, style) in the database and the user’s interest list

Major weaknesses [Balabanović and Shoham 1997]
• Content limitation: applies to only a few kinds of content and captures only certain aspects of that content
• Over-specialization: users only obtain information that matches the content of their profiles

Related Work: Collaborative Filtering (CF)

Employs a user’s item evaluations (not the actual content) to find other similar users: nearest-neighbor algorithm [Resnick et al. 1994]

Three major weaknesses (and proposed remedies)

Scalability: time complexity is O(U*I) (U: # of users, I: # of items)
• Clustering [Breese et al. 2000]
• Bayesian network [Kitts et al. 2000]

Sparsity: the profile matrix is sparse (i.e., each user has evaluated only a few items)
• SVD [Sarwar et al. 2000]

Synonymy: latent associations between items are not considered
• Content analysis [Balabanović and Shoham 1997]
• Categorization [Kohrs and Merialdo 2000]
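The nearest-neighbor step can be pictured with a minimal sketch. It assumes ratings are stored as per-user dictionaries and uses cosine similarity, which is a stand-in chosen for brevity (GroupLens [Resnick et al. 1994] uses a correlation-based measure). The names `cosine` and `nearest_neighbor` and the sample ratings are hypothetical. Scanning every user and every rated item for each request is what leads to the O(U*I) cost.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse rating dictionaries (item -> rating)."""
    common = set(u) & set(v)
    if not common:
        return 0.0
    dot = sum(u[i] * v[i] for i in common)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def nearest_neighbor(target, profiles):
    """Return the name of the stored user most similar to the target user.

    Every stored user is compared against the target, so the work grows
    with both the number of users and the number of rated items.
    """
    return max(profiles, key=lambda name: cosine(target, profiles[name]))


# Hypothetical ratings, for illustration only.
profiles = {
    "u1": {"a": 5, "b": 1},
    "u2": {"a": 1, "b": 5},
}
target = {"a": 4, "b": 1}
print(nearest_neighbor(target, profiles))
```

The neighbor's highly rated items that the target has not seen would then be recommended.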

Offline Process

[Figure: offline process. User navigation behaviors (User 1 … User U) are grouped into clusters by the PPED similarity measure and clustering. Within each cluster, voting produces the favorite PVs (e.g., Rock=High, Classical=Low, Pop=Low, Rap=High). Fuzzy aggregation of the favorite PVs with the item database yields a cluster wish-list of scored items (e.g., 0.87, 0.83, 0.72, 0.61, 0.47).]

Voting Mechanism

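Property Values (example)

Property: Rock, Classical, Pop, Rap, Blues
Value: High, Low, Mid, High, Low

[Figure: voting histogram. For each property (Rock, Classical, …, Blues), bars show the vote count $C_{p,f}(k)$ for each fuzzy term $f$ (H, M, L).]

$M_p = \max_{f \in F} \{\, C_{p,f}(k) \,\}$

$F_p(k) = \mathrm{fmax}\{\, f \mid f \in F,\ C_{p,f}(k) = M_p \,\}$

Favorite PVs: (Rock=High, Classical=Low, Pop=Low, Rap=High, Blues=Low)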
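The voting step can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each user in a cluster casts one fuzzy-term vote per property, that the terms are ordered Low < Mid < High, and that ties are resolved toward the highest term (one reading of fmax). The function name `favorite_pvs` and the sample votes are hypothetical.

```python
from collections import Counter

# Assumed ordering of fuzzy terms, lowest to highest (not given in the slides).
FUZZY_ORDER = ["Low", "Mid", "High"]


def favorite_pvs(votes_by_property):
    """Pick, for each property, the fuzzy term with the largest vote count.

    Ties are resolved toward the highest fuzzy term, which is an assumption
    about how fmax behaves.
    """
    favorites = {}
    for prop, votes in votes_by_property.items():
        counts = Counter(votes)              # vote count per fuzzy term
        best = max(counts.values())          # largest count for this property
        tied = [f for f in FUZZY_ORDER if counts.get(f, 0) == best]
        favorites[prop] = tied[-1]           # highest term among the tied ones
    return favorites


# Hypothetical votes from the users of one cluster.
cluster_votes = {
    "Rock": ["High", "High", "Mid", "High"],
    "Classical": ["Low", "Low", "Mid"],
    "Blues": ["Low", "High"],                # tie between Low and High
}
result = favorite_pvs(cluster_votes)
print(result)
```

Because the vote counts are computed once per cluster, this step belongs to the offline process.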

Ranking Items

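[Figure: fuzzy aggregation of the item database with the cluster’s favorite PVs produces the cluster wish-list, with items ranked by score: 0.87, 0.83, 0.82, 0.79, 0.72, 0.70, 0.68, 0.65, 0.63, 0.61, 0.54, 0.47, 0.42.]

$v_k(i) = \mathrm{fmax}\{\, \tilde{p}_i \times F_p(k) \,\}$ over all properties $p$

Example

Favorite PVs $F_p(k)$: (Rock=High, Classical=Low, Pop=Low, Rap=High, Blues=Low)

Item property values $\tilde{p}_i$: (Rock=High, Classical=Low, Pop=Mid, Rap=Mid, Blues=Low)

$v_k(i)$ = fmax{ (High*High), (Mid*Low), (Low*Low), … }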
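A minimal sketch of the ranking step is given here. The slides do not give a numeric encoding for the fuzzy terms, so the values in `TERM_VALUE` are an assumption made only so that the products can be computed; `item_score`, `rank_items`, and the sample items are likewise hypothetical.

```python
# Assumed numeric encoding of fuzzy terms; the slides do not specify one.
TERM_VALUE = {"Low": 0.25, "Mid": 0.5, "High": 1.0}


def item_score(item_pvs, favorite_pvs):
    """Score of an item for a cluster: the maximum over all properties of
    (item property value * cluster favorite value)."""
    return max(
        TERM_VALUE[item_pvs[p]] * TERM_VALUE[favorite_pvs[p]]
        for p in favorite_pvs
    )


def rank_items(items, favorite_pvs):
    """Return (item_id, score) pairs sorted from best to worst."""
    scored = [(item_id, item_score(pvs, favorite_pvs)) for item_id, pvs in items.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


favorites = {"Rock": "High", "Classical": "Low", "Pop": "Low", "Rap": "High", "Blues": "Low"}
items = {
    "item_a": {"Rock": "High", "Classical": "Low", "Pop": "Mid", "Rap": "Mid", "Blues": "Low"},
    "item_b": {"Rock": "Low", "Classical": "High", "Pop": "High", "Rap": "Mid", "Blues": "Mid"},
}
wish_list = rank_items(items, favorites)
print(wish_list)
```

Each item is compared against every property, so scoring all items this way costs time proportional to the number of properties times the number of items.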

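Optimized Equation

Why optimize: the original equation has time complexity O(#P*I) (#P: # of properties, I: # of items)

Intuition: the $v_k(i)$ value comes from the maximum value among $\tilde{p}_i \times F_p(k)$

Locality Sensitive Hashing algorithm

$M_f(k) = \mathrm{fmax}\{\, \tilde{p}_i \mid p \in G_f(k) \,\}$, e.g., $M_{high}(k)$ (here $G_f(k)$ is the group of properties whose favorite term in cluster $k$ is $f$)

$E_k(i, f) = f \times M_f(k)$

$v_k(i) = \mathrm{fmax}_{f}\{\, E_k(i, f) \,\}$

Example

Favorite PVs: (Rock=High, Classical=Low, Pop=Low, Rap=High, Blues=Low)

Item property values: (Rock=High, Classical=Low, Pop=Mid, Rap=Mid, Blues=Low)

$M_{high}(k)$ = High (from Rock, Rap) and $M_{low}(k)$ = Mid (from Classical, Pop, Blues), so $v_k(i)$ = fmax{ (High*High), (Low*Mid) }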

Optimized Equation

$M_f(k) = \mathrm{fmax}\{\, \tilde{p}_i \mid p \in G_f(k) \,\}$

$E_k(i, f) = f \times M_f(k)$

$v_k(i) = \mathrm{fmax}_{f}\{\, E_k(i, f) \,\}$

Time complexity: O(f*I) (I: # of items, f: # of fuzzy terms)

Satisfies the triangular norm form

Time complexity can be further reduced to O(N) (N: a constant) by Fagin’s A0 algorithm [Fagin 1996]
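The equivalence between the per-property form and the grouped form can be checked with a short sketch. The numeric encoding in `TERM_VALUE` is an assumption (the slides give none), and `score_direct`, `group_by_favorite`, and `score_grouped` are hypothetical names. The sketch only verifies that the two forms agree. It still scans every property to find each group maximum, so it does not by itself demonstrate the O(f*I) bound.

```python
# Assumed numeric encoding of fuzzy terms; the slides do not specify one.
TERM_VALUE = {"Low": 0.25, "Mid": 0.5, "High": 1.0}


def score_direct(item_pvs, favorite_pvs):
    """Per-property form: maximum of (item value * favorite value)."""
    return max(TERM_VALUE[item_pvs[p]] * TERM_VALUE[favorite_pvs[p]] for p in favorite_pvs)


def group_by_favorite(favorite_pvs):
    """Group properties by their favorite fuzzy term (done once per cluster)."""
    groups = {}
    for prop, term in favorite_pvs.items():
        groups.setdefault(term, []).append(prop)
    return groups


def score_grouped(item_pvs, groups):
    """Grouped form: for each fuzzy term, multiply the term by the largest
    item value within its group, then take the maximum over terms."""
    best = 0.0
    for term, props in groups.items():
        group_max = max(TERM_VALUE[item_pvs[p]] for p in props)
        best = max(best, TERM_VALUE[term] * group_max)
    return best


favorites = {"Rock": "High", "Classical": "Low", "Pop": "Low", "Rap": "High", "Blues": "Low"}
groups = group_by_favorite(favorites)
item = {"Rock": "High", "Classical": "Low", "Pop": "Mid", "Rap": "Mid", "Blues": "Low"}
print(score_direct(item, favorites), score_grouped(item, groups))
```

The two scores agree because, within a group, the favorite term is a common non-negative factor that can be pulled out of the maximum.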



Online Process

[Figure: online process. The current user’s navigation behavior is classified against the clusters, giving a list of similarity values (e.g., 0.65, 0.79, 0.32). Fuzzy aggregation of the cluster wish-lists with these similarity values yields the user wish-list of ranked items (0.87, 0.83, 0.82, 0.79, 0.72, 0.70, 0.68, 0.65, 0.63, 0.61, 0.54, 0.47, 0.42).]

Optimized Method

Original time complexity: O(K*I) (K: # of clusters, I: # of items)

Time complexity of the optimized method: O(f*I) (f: # of fuzzy terms)

Time complexity can be further reduced to O(N) (N: a constant) by Fagin’s A0 algorithm [Fagin 1996]
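The unoptimized online aggregation, with its O(K*I) cost, can be pictured with the sketch below. The slides do not state the exact online aggregation formula, so the rule used here (an item's score is the maximum over clusters of similarity times cluster score) is an assumption. The function name `user_wish_list` and the sample data are hypothetical, and the sketch does not implement Fagin's algorithm.

```python
def user_wish_list(similarities, cluster_wish_lists, top_n=3):
    """Combine cluster wish-lists into one ranked list for the current user.

    Assumed rule: score(item) = max over clusters of
    (user-cluster similarity * item's score in that cluster's wish-list).
    Every cluster wish-list is scanned in full, hence the K * I cost.
    """
    scores = {}
    for cluster, sim in similarities.items():
        for item, score in cluster_wish_lists[cluster].items():
            weighted = sim * score
            if weighted > scores.get(item, 0.0):
                scores[item] = weighted
    ranked = sorted(scores.items(), key=lambda pair: pair[1], reverse=True)
    return ranked[:top_n]


# Hypothetical similarity values and cluster wish-lists.
similarities = {"c1": 0.5, "c2": 0.25}
cluster_wish_lists = {
    "c1": {"x": 1.0, "y": 0.5},
    "c2": {"y": 1.0, "z": 0.5},
}
recommendations = user_wish_list(similarities, cluster_wish_lists)
print(recommendations)
```

Only the similarity values depend on the current session, so the cluster wish-lists can be prepared offline.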

Experimental Methodology

[Figure: experimental methodology. Generated inputs: clusters, a similarity matrix (cluster × user), cluster favorite PVs, and a ranking of items in clusters. Generated outputs: an item database and a user set. Other labels in the figure: Clustering, Training, Testing, Current Session, Recommendation.]

Assign property values to items:
• Item-PV = f(Cluster-PV, noise)
• noise ~ item-rank

Assign evaluation values to items:
• Item-Rating = f(Cluster-Ranking, weight)
• weight ~ user-cluster similarities

Accuracy Comparison

[Figure: accuracy of the Nearest Neighbor Method and Yoda, with the improvement of Yoda. x-axis: Number of Items (1000, 5000); y-axes: Harmonic Mean (0 to 0.45) and Improvement (0 to 1.1).]

$\text{Harmonic Mean} = \dfrac{2}{\frac{1}{\text{precision}} + \frac{1}{\text{recall}}}$
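The accuracy metric is straightforward to compute. The sketch below assumes precision and recall are already available as fractions between 0 and 1. Returning 0 when either value is 0 is a convention chosen here to avoid division by zero, not something stated in the slides, and the sample inputs are hypothetical.

```python
def harmonic_mean(precision, recall):
    """Harmonic mean of precision and recall: 2 / (1/precision + 1/recall)."""
    if precision <= 0.0 or recall <= 0.0:
        return 0.0  # assumed convention to avoid division by zero
    return 2.0 / (1.0 / precision + 1.0 / recall)


# Hypothetical precision and recall values.
score = harmonic_mean(0.5, 0.25)
print(round(score, 4))
```

The harmonic mean is low whenever either precision or recall is low, so a method cannot score well by maximizing only one of them.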

Processing Time Comparison

[Figure: processing time of Yoda and BNN. x-axis: Number of Users (0 to 4500); y-axis: CPU Time (milliseconds/user), 0 to 2500.]

BNN: Basic Nearest Neighbor Method

Processing Time = CPU + I/O

In BNN process: #Items = 5000; #Users = 1000

In Yoda process: #Items in each cluster wish-list = 250; #Clusters = 18

Conclusion

Yoda scales as the # of users/items grows

Higher accuracy than the nearest-neighbor method in the experiments

Future Work

Compare with other techniques

Run more experiments with real data

Incorporate the content-based filtering mechanism into the user clustering & classification phases

Incorporate the user profiles

References

[Shardanand and Maes 1995] U. Shardanand and P. Maes, Social Information Filtering: Algorithms for Automating "Word of Mouth", Proceedings on Human Factors in Computing Systems, Denver, CO, USA, p. 210-217, May 1995

[Maes 1994] P. Maes, Agents that Reduce Work and Information Overload, Communications of the ACM, 37(7), p. 30-40, 1994

[Balabanović and Shoham 1997] M. Balabanović and Y. Shoham, Fab: Content-based, Collaborative Recommendation, Communications of the ACM, 40(3), p. 66-72, 1997

[Resnick et al. 1994] P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, GroupLens: An Open Architecture for Collaborative Filtering of Netnews, Proceedings of ACM Conference on Computer-Supported Cooperative Work, Chapel Hill, NC, p. 175-186, 1994

[Sarwar et al. 2000] B. Sarwar, G. Karypis, J. Konstan, and J. Riedl, Application of Dimensionality Reduction in Recommender System -- A Case Study, ACM WebKDD 2000 Web Mining for E-Commerce Workshop, 2000

[Kohrs and Merialdo 2000] A. Kohrs and B. Merialdo, Using Category-based Collaborative Filtering in the Active WebMuseum, Proceedings of IEEE International Conference on Multimedia and Expo, 1, p. 351-354, 2000

[Kitts et al. 2000] B. Kitts, D. Freed, and M. Vrieze, Cross-sell: A Fast Promotion-tunable Customer-item Recommendation Method Based on Conditionally Independent Probabilities, Proceedings of the Sixth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Boston, MA, USA, p. 437-446, August 2000

[Breese et al. 2000] J. Breese, D. Heckerman, and C. Kadie, Empirical Analysis of Predictive Algorithms for Collaborative Filtering, Proceedings of the Fourteenth Conference on Uncertainty in Artificial Intelligence, Madison, WI, USA, p. 43-52, July 1998

Shahabi C., A.M. Zarkesh, J. Adibi, and V. Shah: Knowledge Discovery from Users Web Page Navigation, Proceedings of the IEEE RIDE97 Workshop, April 1997

Shahabi C., F. Banaei-Kashani, J. Faruque, and A. Faisal: Feature Matrices: A Model for Efficient and Anonymous Web Usage Mining, EC-Web 2001, Germany, September 2001

Fagin R.: Combining Fuzzy Information from Multiple Systems, Proceedings of Fifteenth ACM Symposium on Principles of Database Systems, Montreal, pp. 216-226, 1996

Shahabi C., and Y. Chen: A Unified Framework to Incorporate Soft Query into Image Retrieval Systems, International Conference on Enterprise Information Systems, Setubal, Portugal, July 2001
