Mining customer ratings for product recommendation using the support vector machine and
the latent class model
William K. Cheung, James T. Kwok, Martin H. Law,
Kwok-Ching Tsui
Intelligent Systems Research Group, BT Laboratories
Hong Kong Baptist University
What is a Recommender System?
[Diagram: a Recommender System draws on records of other customers (possibly with ratings)]
Product Recommendation in E-commerce
[Screenshot: products and recommendations at www.amazon.com]
Product Recommendation in E-commerce
[Screenshot: products and recommendations at www.cdnow.com]
Overview
[Diagram: a Content-based Recommender System works from the customer's Personal Profile and product Ratings, using the Support Vector Machine (SVM); a Collaborative Recommender System works from product Ratings and records of other customers (possibly with ratings), using the Extended Latent Class Model (ELCM)]
Presentation Outline
• Content-based Recommendation– Existing Solutions and Their Limitations– Our Proposed Solution - the SVM
• Collaborative Recommendation– Existing Solutions and Their Limitations– Our Proposed Solution - the Extended LCM
• Experimental Evaluation• Conclusion and Future Works
Content-based Recommendation
• Matching between the personal profile and the features extracted from product descriptions.
• Assumptions:
  – Customer personal profiles are available.
  – Detailed product descriptions are available so that a set of representative features can be extracted.
  – Both the profiles and the product descriptions share the same representation.
[Diagram: a Content-based Recommender System driven by the customer's Personal Profile]
Some Existing Solutions
• Keyword Matching
  – problems of synonymy and polysemy.
• Pattern Classification Approaches (a sketch follows below)
  – f(y) = { f1(y), f2(y), ..., fm(y) }: the set of features for product y
  – ax(f(y)): the classifier output for customer x's interest, obtained via training, such that
      ax(f(y)) = 1 if x is interested in y
      ax(f(y)) = 0 if x is NOT interested in y
  – Examples of classifiers: Naïve Bayes, k-NN, C4.5 (decision tree)
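A minimal sketch (not from the paper) of this pattern-classification view, assuming a toy binary feature matrix and a standard Naïve Bayes classifier from scikit-learn; all names and data below are illustrative.

```python
# Train a per-customer binary classifier a_x over product feature vectors f(y).
# The feature matrix and interest labels are made-up placeholders.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# f(y): one row of binary content features per product (e.g. genre/director flags)
product_features = np.array([[1, 0, 1, 0],
                             [0, 1, 0, 1],
                             [1, 1, 0, 0],
                             [0, 0, 1, 1]])
interested = np.array([1, 0, 1, 0])      # 1 = customer x interested, 0 = not

a_x = MultinomialNB().fit(product_features, interested)

new_product = np.array([[1, 0, 0, 1]])   # features of an unseen product
print(a_x.predict(new_product))          # 1 => recommend, 0 => do not
```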
Feature Selection Problem
• The performance of content-based recommendation depends heavily on the discriminative power of the features selected to be extracted.
  – Too few features => hard to learn useful profiles (shallow analysis)
  – Too many features => hard to estimate the classifier's parameters with good generalisation performance.
Our Proposed Solution - the use of SVM
• The Support Vector Machine has been shown to achieve good generalisation performance on high-dimensional classification problems, and its training can be framed as solving a quadratic programming problem.
• => one can simply use all extracted features as the input; there is no need for feature selection at all (a sketch follows below).
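A minimal sketch of the proposed approach, assuming a sparse high-dimensional feature matrix like the one described in Experiment One (~6620 features); the data here are randomly generated placeholders, not the EachMovie/IMDb data.

```python
# Linear SVM trained on all extracted features, with no feature selection step.
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = sparse_random(300, 6620, density=0.01, format="csr", random_state=0)  # all features kept
y = rng.integers(0, 2, size=300)          # 1 = interested, 0 = not interested

svm = LinearSVC(C=1.0, max_iter=5000)     # C controls the margin/error trade-off
print(cross_val_score(svm, X, y, cv=5).mean())   # cross-validated accuracy
```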
Pattern Classification...
Which line is the best? (Training and Generalization)
• Intuitively, maximize the margin between classes
• Theoretically sound
  – related to minimizing the VC-dimension under the theory of structural risk minimization
Support Vector Machine (SVM)
margin
Solving for the line
• Computationally, this leads to a quadratic programming problem
  – maximize a quadratic objective function subject to some linear constraints
  – no local maximum (cf. neural networks)
Support Vectors
• The line depends only on a small number of training examples.
Nonlinear Cases
• use another coordinate system such that the “curve” becomes a “line”
Kernels
• Only inner products, φ(x)^T φ(y), are involved in the calculation
• Under certain conditions, there exists a kernel K such that K(x,y) = φ(x)^T φ(y)
  – e.g. polynomial of degree d: K(x,y) = (x^T y + 1)^d  (checked numerically below)
• replace x^T y by φ(x)^T φ(y)
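A small numerical check (illustrative, not from the slides) that the degree-2 polynomial kernel really is an inner product in an explicitly expanded feature space, using the standard expansion for 2-dimensional inputs.

```python
import numpy as np

def phi(v):
    """Explicit feature map whose inner product equals (x^T y + 1)^2."""
    x1, x2 = v
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

x = np.array([0.5, -1.0])
y = np.array([2.0, 3.0])

print((x @ y + 1) ** 2)   # kernel value K(x, y)
print(phi(x) @ phi(y))    # identical inner product in the expanded space
```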
Overlapping Cases
• Impossible to perfectly separate the two classes
  – include an error term
• Instead of only maximizing the margin, minimize a weighted sum of the error and the inverse margin (formulated below)
• Again, involves only quadratic programming
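The standard soft-margin formulation makes this trade-off explicit (a widely used form of the objective, with C as the assumed error weight; the exact notation on the original slide is not recoverable):

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;\;
  \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2} \;+\; C \sum_{i=1}^{n} \xi_i
\qquad \text{s.t.} \quad
  y_i\left(\mathbf{w}^{T}\mathbf{x}_i + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0 .
```

Minimising ||w||^2 maximises the margin, the slack variables ξ_i measure the classification error, and the whole problem remains a quadratic programme.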
Collaborative Recommendation
• Matching between the customer's ratings and the ratings of others (the word-of-mouth approach).
• Assumptions:
  – Customer ratings from a reasonably large group of customers are available.
  – Each product has been rated by some of the customers.
  – The product ratings overlap to a certain degree.
[Diagram: a Collaborative Recommender System matches the customer's Product Ratings against records of other customers (possibly with ratings)]
Some Existing Solutions
• Memory-based Approach
  – Pearson Correlation Coefficient ... and its variants
  – suffer from the sparsity and the first-rater problems.
• Model-based Approach
  – solve the sparsity problem by incorporating a priori models.
  – e.g., Naïve Bayes Classifier, Bayesian Network, Latent Class Model
w(x_A, x_B) = \frac{(\mathbf{v}_A - \bar{v}_A)^T (\mathbf{v}_B - \bar{v}_B)}{\sqrt{(\mathbf{v}_A - \bar{v}_A)^T (\mathbf{v}_A - \bar{v}_A)} \; \sqrt{(\mathbf{v}_B - \bar{v}_B)^T (\mathbf{v}_B - \bar{v}_B)}}

where v_A and v_B are the rating vectors of customers x_A and x_B over their co-rated products, and v̄_A, v̄_B are the corresponding mean ratings (a sketch follows below).
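A minimal sketch of this memory-based weight, assuming NaN marks an unrated product; data are illustrative.

```python
# Pearson correlation weight between two customers, computed over the
# products that both of them have rated.
import numpy as np

def pearson_weight(ratings_a, ratings_b):
    both = ~np.isnan(ratings_a) & ~np.isnan(ratings_b)   # co-rated products
    if both.sum() < 2:
        return 0.0                     # the sparsity problem: too little overlap
    va = ratings_a[both] - ratings_a[both].mean()
    vb = ratings_b[both] - ratings_b[both].mean()
    denom = np.sqrt((va @ va) * (vb @ vb))
    return float(va @ vb / denom) if denom > 0 else 0.0

x_a = np.array([5, 4, np.nan, 1, np.nan])
x_b = np.array([4, 5, 2, np.nan, np.nan])
print(pearson_weight(x_a, x_b))
```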
Limitations
• The sparsity problem (lacking sufficient ratings)
• The first-rater problem (encountering new products)
[Figure: a sparse customer-product rating matrix for customers x1, x2, x3 and a new customer xn; most entries are unrated ("-")]
Grouping Preference Ratings
[Figure: the same rating matrix with customers grouped into Preference Pattern #1 and Preference Pattern #2 to solve the sparsity problem; unrated products that match a customer's pattern are marked "Recommended!"]
Integrating Product Contents
[Figure: the grouped rating matrix with product contents integrated to solve the first-rater problem; a brand-new product matching a preference pattern is marked "Recommended!"]
Our Proposed Solution - the use of LCM
• The latent class model was proposed by Thomas Hofmann et al. (IJCAI'99) for clustering preference ratings, with promising results.
• Limitation: only capable of recommending products to customers in the training set.
• We extend their model so that
  – a) existing products can be recommended to customers not in the training set
  – b) new products can be recommended to existing customers (not described in the paper).
Latent Class Model
Customer X and Product Y are observed; Preference Pattern Z is hidden.
Model training: learn P(z), P(x|z) and P(y|z) using the EM algorithm; the model initialisation is done by K-means clustering (a minimal EM sketch follows below).

P(x, y) = \sum_z P(z) P(x|z) P(y|z)
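A minimal sketch of EM training for this model, assuming the customer-product ratings are used directly as co-occurrence counts n(x, y); it uses random initialisation rather than the K-means initialisation mentioned on the slide, and all data are illustrative.

```python
import numpy as np

def train_lcm(n_xy, n_classes=2, n_iters=200, seed=0):
    """EM for P(x, y) = sum_z P(z) P(x|z) P(y|z) on a count matrix n(x, y)."""
    rng = np.random.default_rng(seed)
    n_x, n_y = n_xy.shape
    p_z = rng.random(n_classes);          p_z /= p_z.sum()
    p_x_z = rng.random((n_x, n_classes)); p_x_z /= p_x_z.sum(axis=0)
    p_y_z = rng.random((n_y, n_classes)); p_y_z /= p_y_z.sum(axis=0)

    for _ in range(n_iters):
        # E-step: P(z|x, y) proportional to P(z) P(x|z) P(y|z)
        post = p_z[None, None, :] * p_x_z[:, None, :] * p_y_z[None, :, :]
        post /= post.sum(axis=2, keepdims=True) + 1e-12
        # M-step: re-estimate the parameters from the expected counts
        weighted = n_xy[:, :, None] * post
        p_z = weighted.sum(axis=(0, 1)); p_z /= p_z.sum()
        p_x_z = weighted.sum(axis=1);    p_x_z /= p_x_z.sum(axis=0) + 1e-12
        p_y_z = weighted.sum(axis=0);    p_y_z /= p_y_z.sum(axis=0) + 1e-12
    return p_z, p_x_z, p_y_z

ratings = np.array([[5, 4, 0, 0],        # n(x, y); 0 = unrated
                    [4, 5, 0, 1],
                    [0, 0, 5, 4],
                    [1, 0, 4, 5]], dtype=float)
p_z, p_x_z, p_y_z = train_lcm(ratings)
```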
Existing Products to Existing Customers
• Compute the probability that x is interested in y:

  P(y|x) = \sum_z P(z|x) P(y|z) = \sum_z \frac{P(x|z) P(z)}{\sum_{z'} P(x|z') P(z')} P(y|z)

• Products can then be sorted according to the values of P(y|x) for recommendation (a small ranking sketch follows below).
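Continuing the sketch above (reusing the trained p_z, p_x_z and p_y_z), the ranking step can be written as:

```python
import numpy as np

def recommend_existing(x_index, p_z, p_x_z, p_y_z, top_k=3):
    """Rank products by P(y|x) = sum_z P(z|x) P(y|z), with P(z|x) proportional to P(x|z) P(z)."""
    p_z_x = p_x_z[x_index] * p_z
    p_z_x /= p_z_x.sum()
    p_y_x = p_y_z @ p_z_x                  # P(y|x) for every product y
    return np.argsort(-p_y_x)[:top_k]      # indices of the top-ranked products

print(recommend_existing(0, p_z, p_x_z, p_y_z))
```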
Extension 1: Existing Products to New Customers
• x_n is not inside the training set, so P(z|x_n) is not available from training. It is approximated from x_n's own ratings, essentially an inner product of the distribution of pattern z over products and the ratings of x_n (sketched below):

  P(y|x_n) = \sum_z \hat{P}(z|x_n) P(y|z)

  \hat{P}(z|x_n) = \hat{P}(z|x_n, Y_h) \propto \sum_{y_h \in Y_h} \hat{P}(z|y_h) \, n(x_n, y_h), \qquad \hat{P}(z|y_h) \propto P(y_h|z) P(z)

  where Y_h is the set of products rated by x_n and n(x_n, y_h) is x_n's rating of product y_h.
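A sketch of this fold-in step as reconstructed above, again reusing the trained p_z and p_y_z from the earlier LCM sketch; the new customer's rating vector is a made-up example.

```python
import numpy as np

def recommend_new_customer(ratings_new, p_z, p_y_z, top_k=3):
    # P(z|y) proportional to P(y|z) P(z), normalised over z for each product y
    p_z_y = p_y_z * p_z
    p_z_y /= p_z_y.sum(axis=1, keepdims=True)
    # P(z|x_n) proportional to sum_y P(z|y) n(x_n, y): the "inner product"
    # of the pattern distribution with the new customer's ratings
    p_z_xn = ratings_new @ p_z_y
    p_z_xn /= p_z_xn.sum()
    p_y_xn = p_y_z @ p_z_xn                # P(y|x_n) = sum_z P(z|x_n) P(y|z)
    return np.argsort(-p_y_xn)[:top_k]

new_ratings = np.array([5, 0, 0, 4], dtype=float)   # 0 = unrated
print(recommend_new_customer(new_ratings, p_z, p_y_z))
```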
Extension 2: New Products to Existing Customers
• y_n is not inside the training set, so P(y_n|z) is not available from training. It is estimated from the content features of y_n (sketched below):

  P(y_n|x) = \sum_z P(z|x) P(y_n|z)

  P(y_n|z) = \frac{P(z|y_n) \, \hat{P}(y_n)}{P(z)}, \qquad P(z|y_n) = \frac{\exp(-d(f(y_n), f(z)))}{\sum_{z'} \exp(-d(f(y_n), f(z')))}

  where d(f(y_n), f(z)) is the distance between y_n and pattern z in the feature space.
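A sketch of this second extension as reconstructed above, assuming each preference pattern z is summarised by a content-feature vector f(z) (e.g. a centroid of the features of its products); the feature vectors and the squared Euclidean distance are illustrative assumptions.

```python
import numpy as np

def p_z_given_new_product(f_yn, f_patterns):
    """P(z|y_n): softmax of the negative feature-space distances to each pattern."""
    d2 = ((f_patterns - f_yn) ** 2).sum(axis=1)   # distance of y_n to each f(z)
    w = np.exp(-d2)
    return w / w.sum()

f_patterns = np.array([[1.0, 0.0, 0.0],   # one content-feature vector per pattern z
                       [0.0, 1.0, 1.0]])
f_yn = np.array([0.9, 0.1, 0.0])          # content features of the new product y_n

print(p_z_given_new_product(f_yn, f_patterns))
# P(y_n|z) then follows from Bayes' rule as on the slide, and the new product is
# scored for customer x via P(y_n|x) = sum_z P(z|x) P(y_n|z).
```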
Performance Measures
• accuracy: the percentage of correct recommendations
• recall: the percentage of interesting products that can be located in the output list
• precision: the percentage of products in the output list which are really interesting to the customer
• break-even point: the point where recall = precision
• expected utility (sketched below):

  R_i = \sum_j \frac{\max(v_{ij} - d, 0)}{2^{(j-1)/(\alpha - 1)}}, \qquad utility_i = \frac{R_i - R_i^{min}}{R_i^{max} - R_i^{min}}

  – its value is high if the products rated high appear early in the output list
  (v_{ij}: the rating of the product at rank j in customer i's list; d: a neutral rating; \alpha: the rank half-life; R_i^{min}, R_i^{max}: the minimum and maximum attainable values of R_i)
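A minimal sketch of these measures, assuming a 0-5 rating scale with the neutral rating d and half-life alpha as free parameters (their values in the experiments are not given on this slide).

```python
import numpy as np

def precision_recall(recommended, interesting):
    hits = len(set(recommended) & set(interesting))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(interesting) if interesting else 0.0
    return precision, recall

def expected_utility_R(ratings_in_rank_order, d=3.0, alpha=5.0):
    """R_i = sum_j max(v_ij - d, 0) / 2^((j-1)/(alpha-1)), j = 1-based rank."""
    v = np.asarray(ratings_in_rank_order, dtype=float)
    j = np.arange(1, len(v) + 1)
    return float(np.sum(np.maximum(v - d, 0.0) / 2 ** ((j - 1) / (alpha - 1))))

prec, rec = precision_recall(recommended=[3, 7, 9], interesting=[3, 9, 11, 12])
print(prec, rec)                            # 0.67 and 0.5 for this toy example
print(expected_utility_R([5, 4, 2, 5, 1]))  # higher when high ratings come early
```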
Experiment One: Setup (content-based by SVM)
• Product ratings data set
  – EachMovie (from DEC)
• Product description data set
  – Internet Movie Database (http://www.imdb.com)
  – Size of feature set = 6620, including
    • release date, runtime, language, director, producer, original music, writing credit, ...
• No. of products = 1628
  – 5-fold cross-validation
  – ~1200 for training and the remainder for testing
• No. of customers = 100
Experiment One: Results (content-based by SVM)

Method             Accuracy (%)   Break-even Point (%)   Utility (%)
SVM                77             80.3                   65
Naïve Bayes        76             78.8                   61
C4.5rules (100)    74             76.0                   52
C4.5rules (400)    75             75.1                   52
1-NN               69             76.2                   45
C4.5 (100)         74             -                      -
C4.5 (400)         74             -                      -
majority           75             -                      -
Experiment Two: Setup (collaborative by ELCM)
• Ratings data set
  – EachMovie (from DEC)
• Training
  – No. of products = 500
  – No. of customers = 90
• Testing
  – No. of customers = 10
  – No. of products = 250
  – Size of the product set whose ratings are considered for matching: L = {10, 63, 83, 125, 250}
Experiment Two: Results (collaborative by ELCM)

        No. of latent   Accuracy (%)      Break-even point (%)   Utility (%)
L       classes         ELCM    P-Corr    ELCM     P-Corr        ELCM    P-Corr
250     6               63      61        75.6     75.6          57      65
        10              62                77.0                   62
        15              63                75.5                   59
125     6               61      60        75.6     73.5          57      61
        10              62                76.6                   63
        15              62                75.3                   60
83      6               61      50        75.6     73.3          58      52
        10              60                77.3                   61
        15              62                72.3                   60
63      6               61      51        75.6     71.0          59      52
        10              58                77.2                   62
        15              60                72.3                   59
10      6               62      40        72.0     63.1          56      45
        10              61                71.0                   56
        15              62                70.5                   55

(P-Corr = Pearson correlation baseline; its figures do not depend on the number of latent classes.)
Conclusion and Future Work
• SVM and ELCM are empirically shown to be promising for content-based recommendation and collaborative recommendation, respectively.
• Future work
  – ELCM
    • Model enhancement - BiELCM, hierarchical, ...
    • Scalability issue of the EM algorithm for ELCM
    • Modelling dynamic preference patterns
    • Applications to cross-selling?
  – Integration of SVM and ELCM for improvement