Mining customer ratings for product recommendation using the support vector machine and
the latent class model
William K. Cheung, James T. Kwok, Martin H. Law,
Kwok-Ching Tsui
Intelligent Systems Research Group, BT Laboratories
Hong Kong Baptist University
What is a Recommender System?
[Diagram: a Recommender System draws on records of other customers (possibly with ratings)]
Product Recommendation in E-commerce
[Screenshot: products and recommendations at www.amazon.com]
Product Recommendation in E-commerce
[Screenshot: products and recommendations at www.cdnow.com]
Overview
[Diagram: a Content-based Recommender System works from the customer's Personal Profile and product Ratings, using the Support Vector Machine (SVM); a Collaborative Recommender System works from product Ratings and records of other customers (possibly with ratings), using the Extended Latent Class Model (ELCM)]
Presentation Outline
• Content-based Recommendation– Existing Solutions and Their Limitations– Our Proposed Solution - the SVM
• Collaborative Recommendation– Existing Solutions and Their Limitations– Our Proposed Solution - the Extended LCM
• Experimental Evaluation• Conclusion and Future Works
Content-based Recommendation
• Matching between the personal profile and the features extracted from product descriptions.
• Assumptions:
  – Customer personal profiles are available.
  – Detailed product descriptions are available so that a set of representative features can be extracted.
  – Both the profiles and the product descriptions share the same representation.
[Diagram: a Content-based Recommender System driven by the customer's Personal Profile]
Some Existing Solutions
• Keyword Matching
  – problems of synonymy and polysemy.
• Pattern Classification Approaches (a sketch follows below)
  – f(y) = { f1(y), f2(y), ..., fm(y) }: the set of features for product y
  – ax(f(y)): the classifier output for customer x's interest, obtained via training, such that
      ax(f(y)) = 1 if x is interested in y
      ax(f(y)) = 0 if x is NOT interested in y
  – Examples of classifiers: Naïve Bayes, k-NN, C4.5 (decision tree)
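A minimal sketch (not from the paper) of this pattern-classification view, assuming a toy binary feature matrix and a standard Naïve Bayes classifier from scikit-learn; all names and data below are illustrative.

```python
# Train a per-customer binary classifier a_x over product feature vectors f(y).
# The feature matrix and interest labels are made-up placeholders.
import numpy as np
from sklearn.naive_bayes import MultinomialNB

# f(y): one row of binary content features per product (e.g. genre/director flags)
product_features = np.array([[1, 0, 1, 0],
                             [0, 1, 0, 1],
                             [1, 1, 0, 0],
                             [0, 0, 1, 1]])
interested = np.array([1, 0, 1, 0])      # 1 = customer x interested, 0 = not

a_x = MultinomialNB().fit(product_features, interested)

new_product = np.array([[1, 0, 0, 1]])   # features of an unseen product
print(a_x.predict(new_product))          # 1 => recommend, 0 => do not
```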
Feature Selection Problem
• The performance of content-based recommendation depends heavily on the discriminative power of the features selected to be extracted.
  – Too few features => hard to learn useful profiles (shallow analysis)
  – Too many features => hard to estimate the classifier's parameters with good generalisation performance.
Our Proposed Solution - the use of SVM
• The Support Vector Machine has been shown to achieve good generalisation performance on high-dimensional classification problems, and its training can be framed as solving a quadratic programming problem.
• => one can simply use all extracted features as the input; there is no need for feature selection at all (a sketch follows below).
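A minimal sketch of the proposed approach, assuming a sparse high-dimensional feature matrix like the one described in Experiment One (~6620 features); the data here are randomly generated placeholders, not the EachMovie/IMDb data.

```python
# Linear SVM trained on all extracted features, with no feature selection step.
import numpy as np
from scipy.sparse import random as sparse_random
from sklearn.svm import LinearSVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = sparse_random(300, 6620, density=0.01, format="csr", random_state=0)  # all features kept
y = rng.integers(0, 2, size=300)          # 1 = interested, 0 = not interested

svm = LinearSVC(C=1.0, max_iter=5000)     # C controls the margin/error trade-off
print(cross_val_score(svm, X, y, cv=5).mean())   # cross-validated accuracy
```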
Pattern Classification...
Which line is the best? (Training and Generalization)
• Intuitively, maximize the margin between classes
• Theoretically sound
  – related to minimizing the VC-dimension under the theory of structural risk minimization
Support Vector Machine (SVM)
margin
Solving for the line
• Computationally, this leads to a quadratic programming problem
  – maximize a quadratic objective function subject to some linear constraints
  – no local maximum (cf. neural networks)
Support Vectors
• The line depends only on a small number of training examples.
Nonlinear Cases
• use another coordinate system such that the “curve” becomes a “line”
Kernels
• Only inner products, φ(x)^T φ(y), are involved in the calculation
• Under certain conditions, there exists a kernel K such that K(x,y) = φ(x)^T φ(y)
  – e.g. polynomial of degree d: K(x,y) = (x^T y + 1)^d  (checked numerically below)
• replace x^T y by φ(x)^T φ(y)
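A small numerical check (illustrative, not from the slides) that the degree-2 polynomial kernel really is an inner product in an explicitly expanded feature space, using the standard expansion for 2-dimensional inputs.

```python
import numpy as np

def phi(v):
    """Explicit feature map whose inner product equals (x^T y + 1)^2."""
    x1, x2 = v
    return np.array([1.0,
                     np.sqrt(2) * x1, np.sqrt(2) * x2,
                     x1 ** 2, x2 ** 2,
                     np.sqrt(2) * x1 * x2])

x = np.array([0.5, -1.0])
y = np.array([2.0, 3.0])

print((x @ y + 1) ** 2)   # kernel value K(x, y)
print(phi(x) @ phi(y))    # identical inner product in the expanded space
```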
Overlapping Cases
• Impossible to perfectly separate the two classes
  – include an error term
• Instead of only maximizing the margin, minimize a weighted sum of the error and the inverse margin (formulated below)
• Again, involves only quadratic programming
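The standard soft-margin formulation makes this trade-off explicit (a widely used form of the objective, with C as the assumed error weight; the exact notation on the original slide is not recoverable):

```latex
\min_{\mathbf{w},\, b,\, \boldsymbol{\xi}} \;\;
  \tfrac{1}{2}\lVert \mathbf{w} \rVert^{2} \;+\; C \sum_{i=1}^{n} \xi_i
\qquad \text{s.t.} \quad
  y_i\left(\mathbf{w}^{T}\mathbf{x}_i + b\right) \ge 1 - \xi_i, \quad \xi_i \ge 0 .
```

Minimising ||w||^2 maximises the margin, the slack variables ξ_i measure the classification error, and the whole problem remains a quadratic programme.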
Collaborative Recommendation
• Matching between the customer's ratings and the ratings of others (the word-of-mouth approach).
• Assumptions:
  – Customer ratings from a reasonably large group of customers are available.
  – Each product has been rated by some of the customers.
  – The product ratings overlap to a certain degree.
[Diagram: a Collaborative Recommender System matches the customer's Product Ratings against records of other customers (possibly with ratings)]
Some Existing Solutions
• Memory-based Approach
  – Pearson Correlation Coefficient ... and its variants
  – suffer from the sparsity and the first-rater problems.
• Model-based Approach
  – solve the sparsity problem by incorporating a priori models.
  – e.g., Naïve Bayes Classifier, Bayesian Network, Latent Class Model
w(x_A, x_B) = \frac{(\mathbf{v}_A - \bar{v}_A)^T (\mathbf{v}_B - \bar{v}_B)}{\sqrt{(\mathbf{v}_A - \bar{v}_A)^T (\mathbf{v}_A - \bar{v}_A)} \; \sqrt{(\mathbf{v}_B - \bar{v}_B)^T (\mathbf{v}_B - \bar{v}_B)}}

where v_A and v_B are the rating vectors of customers x_A and x_B over their co-rated products, and v̄_A, v̄_B are the corresponding mean ratings (a sketch follows below).
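A minimal sketch of this memory-based weight, assuming NaN marks an unrated product; data are illustrative.

```python
# Pearson correlation weight between two customers, computed over the
# products that both of them have rated.
import numpy as np

def pearson_weight(ratings_a, ratings_b):
    both = ~np.isnan(ratings_a) & ~np.isnan(ratings_b)   # co-rated products
    if both.sum() < 2:
        return 0.0                     # the sparsity problem: too little overlap
    va = ratings_a[both] - ratings_a[both].mean()
    vb = ratings_b[both] - ratings_b[both].mean()
    denom = np.sqrt((va @ va) * (vb @ vb))
    return float(va @ vb / denom) if denom > 0 else 0.0

x_a = np.array([5, 4, np.nan, 1, np.nan])
x_b = np.array([4, 5, 2, np.nan, np.nan])
print(pearson_weight(x_a, x_b))
```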
Limitations
• The sparsity problem (lacking sufficient ratings)
• The first-rater problem (encountering new products)
[Figure: a sparse customer-product rating matrix for customers x1, x2, x3 and a new customer xn; most entries are unrated ("-")]
Grouping Preference Ratings
[Figure: the same rating matrix with customers grouped into Preference Pattern #1 and Preference Pattern #2 to solve the sparsity problem; unrated products that match a customer's pattern are marked "Recommended!"]
Integrating Product Contents
[Figure: the grouped rating matrix with product contents integrated to solve the first-rater problem; a brand-new product matching a preference pattern is marked "Recommended!"]
Our Proposed Solution - the use of LCM
• The latent class model was proposed by Thomas Hofmann et al. (IJCAI'99) for clustering preference ratings, with promising results.
• Limitation: only capable of recommending products to customers in the training set.
• We extend their model so that
  – a) existing products can be recommended to customers not in the training set
  – b) new products can be recommended to existing customers (not described in the paper).
Latent Class Model
Customer X and Product Y are observed; Preference Pattern Z is hidden.
Model training: learn P(z), P(x|z) and P(y|z) using the EM algorithm; the model initialisation is done by K-means clustering (a minimal EM sketch follows below).

P(x, y) = \sum_z P(z) P(x|z) P(y|z)
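A minimal sketch of EM training for this model, assuming the customer-product ratings are used directly as co-occurrence counts n(x, y); it uses random initialisation rather than the K-means initialisation mentioned on the slide, and all data are illustrative.

```python
import numpy as np

def train_lcm(n_xy, n_classes=2, n_iters=200, seed=0):
    """EM for P(x, y) = sum_z P(z) P(x|z) P(y|z) on a count matrix n(x, y)."""
    rng = np.random.default_rng(seed)
    n_x, n_y = n_xy.shape
    p_z = rng.random(n_classes);          p_z /= p_z.sum()
    p_x_z = rng.random((n_x, n_classes)); p_x_z /= p_x_z.sum(axis=0)
    p_y_z = rng.random((n_y, n_classes)); p_y_z /= p_y_z.sum(axis=0)

    for _ in range(n_iters):
        # E-step: P(z|x, y) proportional to P(z) P(x|z) P(y|z)
        post = p_z[None, None, :] * p_x_z[:, None, :] * p_y_z[None, :, :]
        post /= post.sum(axis=2, keepdims=True) + 1e-12
        # M-step: re-estimate the parameters from the expected counts
        weighted = n_xy[:, :, None] * post
        p_z = weighted.sum(axis=(0, 1)); p_z /= p_z.sum()
        p_x_z = weighted.sum(axis=1);    p_x_z /= p_x_z.sum(axis=0) + 1e-12
        p_y_z = weighted.sum(axis=0);    p_y_z /= p_y_z.sum(axis=0) + 1e-12
    return p_z, p_x_z, p_y_z

ratings = np.array([[5, 4, 0, 0],        # n(x, y); 0 = unrated
                    [4, 5, 0, 1],
                    [0, 0, 5, 4],
                    [1, 0, 4, 5]], dtype=float)
p_z, p_x_z, p_y_z = train_lcm(ratings)
```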
Existing Products to Existing Customers
• Compute the probability that x is interested in y:

  P(y|x) = \sum_z P(z|x) P(y|z) = \sum_z \frac{P(x|z) P(z)}{\sum_{z'} P(x|z') P(z')} P(y|z)

• Products can then be sorted according to the values of P(y|x) for recommendation (a small ranking sketch follows below).
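Continuing the sketch above (reusing the trained p_z, p_x_z and p_y_z), the ranking step can be written as:

```python
import numpy as np

def recommend_existing(x_index, p_z, p_x_z, p_y_z, top_k=3):
    """Rank products by P(y|x) = sum_z P(z|x) P(y|z), with P(z|x) proportional to P(x|z) P(z)."""
    p_z_x = p_x_z[x_index] * p_z
    p_z_x /= p_z_x.sum()
    p_y_x = p_y_z @ p_z_x                  # P(y|x) for every product y
    return np.argsort(-p_y_x)[:top_k]      # indices of the top-ranked products

print(recommend_existing(0, p_z, p_x_z, p_y_z))
```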
Extension 1: Existing Products to New Customers
• x_n is not inside the training set, so P(z|x_n) is not available from training. It is approximated from x_n's own ratings, essentially an inner product of the distribution of pattern z over products and the ratings of x_n (sketched below):

  P(y|x_n) = \sum_z \hat{P}(z|x_n) P(y|z)

  \hat{P}(z|x_n) = \hat{P}(z|x_n, Y_h) \propto \sum_{y_h \in Y_h} \hat{P}(z|y_h) \, n(x_n, y_h), \qquad \hat{P}(z|y_h) \propto P(y_h|z) P(z)

  where Y_h is the set of products rated by x_n and n(x_n, y_h) is x_n's rating of product y_h.
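A sketch of this fold-in step as reconstructed above, again reusing the trained p_z and p_y_z from the earlier LCM sketch; the new customer's rating vector is a made-up example.

```python
import numpy as np

def recommend_new_customer(ratings_new, p_z, p_y_z, top_k=3):
    # P(z|y) proportional to P(y|z) P(z), normalised over z for each product y
    p_z_y = p_y_z * p_z
    p_z_y /= p_z_y.sum(axis=1, keepdims=True)
    # P(z|x_n) proportional to sum_y P(z|y) n(x_n, y): the "inner product"
    # of the pattern distribution with the new customer's ratings
    p_z_xn = ratings_new @ p_z_y
    p_z_xn /= p_z_xn.sum()
    p_y_xn = p_y_z @ p_z_xn                # P(y|x_n) = sum_z P(z|x_n) P(y|z)
    return np.argsort(-p_y_xn)[:top_k]

new_ratings = np.array([5, 0, 0, 4], dtype=float)   # 0 = unrated
print(recommend_new_customer(new_ratings, p_z, p_y_z))
```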
Extension 2: New Products to Existing Customers
• y_n is not inside the training set, so P(y_n|z) is not available from training. It is estimated from the content features of y_n (sketched below):

  P(y_n|x) = \sum_z P(z|x) P(y_n|z)

  P(y_n|z) = \frac{P(z|y_n) \, \hat{P}(y_n)}{P(z)}, \qquad P(z|y_n) = \frac{\exp(-d(f(y_n), f(z)))}{\sum_{z'} \exp(-d(f(y_n), f(z')))}

  where d(f(y_n), f(z)) is the distance between y_n and pattern z in the feature space.
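A sketch of this second extension as reconstructed above, assuming each preference pattern z is summarised by a content-feature vector f(z) (e.g. a centroid of the features of its products); the feature vectors and the squared Euclidean distance are illustrative assumptions.

```python
import numpy as np

def p_z_given_new_product(f_yn, f_patterns):
    """P(z|y_n): softmax of the negative feature-space distances to each pattern."""
    d2 = ((f_patterns - f_yn) ** 2).sum(axis=1)   # distance of y_n to each f(z)
    w = np.exp(-d2)
    return w / w.sum()

f_patterns = np.array([[1.0, 0.0, 0.0],   # one content-feature vector per pattern z
                       [0.0, 1.0, 1.0]])
f_yn = np.array([0.9, 0.1, 0.0])          # content features of the new product y_n

print(p_z_given_new_product(f_yn, f_patterns))
# P(y_n|z) then follows from Bayes' rule as on the slide, and the new product is
# scored for customer x via P(y_n|x) = sum_z P(z|x) P(y_n|z).
```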
Performance Measures
• accuracy: the percentage of correct recommendations
• recall: the percentage of interesting products that can be located in the output list
• precision: the percentage of products in the output list which are really interesting to the customer
• break-even point: the point where recall = precision
• expected utility (sketched below):

  R_i = \sum_j \frac{\max(v_{ij} - d, 0)}{2^{(j-1)/(\alpha - 1)}}, \qquad utility_i = \frac{R_i - R_i^{min}}{R_i^{max} - R_i^{min}}

  – its value is high if the products rated high appear early in the output list
  (v_{ij}: the rating of the product at rank j in customer i's list; d: a neutral rating; \alpha: the rank half-life; R_i^{min}, R_i^{max}: the minimum and maximum attainable values of R_i)
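A minimal sketch of these measures, assuming a 0-5 rating scale with the neutral rating d and half-life alpha as free parameters (their values in the experiments are not given on this slide).

```python
import numpy as np

def precision_recall(recommended, interesting):
    hits = len(set(recommended) & set(interesting))
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(interesting) if interesting else 0.0
    return precision, recall

def expected_utility_R(ratings_in_rank_order, d=3.0, alpha=5.0):
    """R_i = sum_j max(v_ij - d, 0) / 2^((j-1)/(alpha-1)), j = 1-based rank."""
    v = np.asarray(ratings_in_rank_order, dtype=float)
    j = np.arange(1, len(v) + 1)
    return float(np.sum(np.maximum(v - d, 0.0) / 2 ** ((j - 1) / (alpha - 1))))

prec, rec = precision_recall(recommended=[3, 7, 9], interesting=[3, 9, 11, 12])
print(prec, rec)                            # 0.67 and 0.5 for this toy example
print(expected_utility_R([5, 4, 2, 5, 1]))  # higher when high ratings come early
```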
Experiment One: Setup (content-based by SVM)
• Product ratings data set
  – EachMovie (from DEC)
• Product description data set
  – Internet Movie Database (http://www.imdb.com)
  – Size of feature set = 6620, including
    • release date, runtime, language, director, producer, original music, writing credit, ...
• No. of products = 1628
  – 5-fold cross-validation
  – ~1200 for training and the remainder for testing
• No. of customers = 100
Experiment One: Results (content-based by SVM)

Method             Accuracy (%)   Break-even Point (%)   Utility (%)
SVM                77             80.3                   65
Naïve Bayes        76             78.8                   61
C4.5rules (100)    74             76.0                   52
C4.5rules (400)    75             75.1                   52
1-NN               69             76.2                   45
C4.5 (100)         74             -                      -
C4.5 (400)         74             -                      -
majority           75             -                      -
Experiment Two: Setup (collaborative by ELCM)
• Ratings data set
  – EachMovie (from DEC)
• Training
  – No. of products = 500
  – No. of customers = 90
• Testing
  – No. of customers = 10
  – No. of products = 250
  – Size of the product set whose ratings are considered for matching: L = {10, 63, 83, 125, 250}
Experiment Two: Results (collaborative by ELCM)

        No. of latent   Accuracy (%)      Break-even point (%)   Utility (%)
L       classes         ELCM    P-Corr    ELCM     P-Corr        ELCM    P-Corr
250     6               63      61        75.6     75.6          57      65
        10              62                77.0                   62
        15              63                75.5                   59
125     6               61      60        75.6     73.5          57      61
        10              62                76.6                   63
        15              62                75.3                   60
83      6               61      50        75.6     73.3          58      52
        10              60                77.3                   61
        15              62                72.3                   60
63      6               61      51        75.6     71.0          59      52
        10              58                77.2                   62
        15              60                72.3                   59
10      6               62      40        72.0     63.1          56      45
        10              61                71.0                   56
        15              62                70.5                   55

(P-Corr = Pearson correlation baseline; its figures do not depend on the number of latent classes.)
Conclusion and Future Work
• SVM and ELCM are empirically shown to be promising for content-based recommendation and collaborative recommendation, respectively.
• Future work
  – ELCM
    • Model enhancement - BiELCM, hierarchical, ...
    • Scalability issue of the EM algorithm for ELCM
    • Modelling dynamic preference patterns
    • Applications to cross-selling?
  – Integration of SVM and ELCM for improvement