8/9/2019 A Review of Information Filtering-CF
1/48
A Review of Information Filtering
Part II: Collaborative Filtering
Chengxiang Zhai
Language Technologies Institute
School of Computer Science, Carnegie Mellon University
Outline
A Conceptual Framework for Collaborative Filtering (CF)
Rating-based Methods (Breese et al. 98)
Memory-based methods
Model-based methods
Preference-based Methods (Cohen et al. 99 & Freund et al. 98)
Summary & Research Directions
What is Collaborative Filtering (CF)?
Making filtering decisions for an individual user based on the judgments of other users
Inferring an individual's interests/preferences from those of other, similar users
General idea
Given a user u, find similar users {u_1, ..., u_m}
Predict u's preferences based on the preferences of u_1, ..., u_m
CF: Applications
Recommender systems: books, CDs, videos, movies, potentially anything!
Can be combined with content-based filtering
Example (commercial) systems
GroupLens (Resnick et al. 94): Usenet news rating
Amazon: book recommendation
Firefly (purchased by Microsoft?): music recommendation
Alexa: web page recommendation
CF: Assumptions
Users with a common interest will have similar preferences
Users with similar preferences probably share the same interest
Examples
Interest is IR => read SIGIR papers
Read SIGIR papers => interest is IR
A sufficiently large number of user preferences is available
CF: Intuitions
User similarity
If Jamie liked the paper, I'll like the paper
If Jamie liked the movie, will I like the movie?
Suppose Jamie and I viewed similar movies in the past six months
Item similarity
Since 90% of those who liked Star Wars also liked Independence Day, and you liked Star Wars
You may also like Independence Day
Collaborative Filtering vs. Content-based Filtering
Basic filtering question: Will user U like item X?
Two different ways of answering it:
Look at what U likes => characterize X => content-based filtering
Look at who likes X => characterize U => collaborative filtering
Can be combined
Rating-based vs. Preference-based
Rating-based: users' preferences are encoded using numerical ratings on items
Complete ordering
Absolute values can be meaningful
But values must be normalized to combine
Preference-based: users' preferences are represented by partial orderings of items
Partial ordering
Easier to exploit implicit preferences
A Formal Framework for Rating
[Figure omitted: an m x n user-object rating matrix; rows are users u_1, ..., u_m, columns are objects o_1, ..., o_n, and each cell holds a rating X_ij = f(u_i, o_j), with most cells unknown]
The task
Unknown function f: U x O -> R
Assume f values are known for some (u, o) pairs
Predict f values for the other (u, o) pairs
Essentially function approximation, like other learning problems
Where are the intuitions?
Similar users have similar preferences
If u ≈ u', then for all o's, f(u, o) ≈ f(u', o)
Similar objects have similar user preferences
If o ≈ o', then for all u's, f(u, o) ≈ f(u, o')
In general, f is locally constant
If u ≈ u' and o ≈ o', then f(u, o) ≈ f(u', o')
Local smoothness makes it possible to predict unknown values by interpolation or extrapolation
What does "local" mean?
Two Groups of Approaches
Memory-based approaches
f(u, o) = g(u)(o) ≈ g(u')(o) if u ≈ u'
Find neighbors of u and combine the g(u')(o)'s
Model-based approaches
Assume structures/models: object clusters, user clusters, f defined on clusters
f(u, o) = f(c_u, c_o)
Estimation & probabilistic inference
Memory-based Approaches (Breese et al. 98)
General ideas:
x_ij: rating of object j by user i
\bar{x}_i: average rating of all objects by user i
Normalized ratings: v_ij = x_ij - \bar{x}_i
Memory-based prediction:
\hat{x}_{aj} = \bar{x}_a + \kappa \sum_{i=1}^{m} w(a,i)\, v_{ij}, \qquad \kappa = 1 \Big/ \sum_{i=1}^{m} |w(a,i)|
Specific approaches differ in w(a, i), the distance/similarity between users a and i
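The prediction formula above can be sketched in a few lines of Python (a minimal illustration, not Breese et al.'s implementation; the `ratings`, `means`, and `w` structures are hypothetical):

```python
def predict(a, j, ratings, means, w):
    """Memory-based CF prediction of user a's rating on object j.

    ratings[i][j] = x_ij (observed ratings, one dict per user),
    means[i]     = bar{x}_i (user i's average rating),
    w(a, i)      = similarity weight between users a and i.
    Implements x_aj = bar{x}_a + kappa * sum_i w(a,i) * (x_ij - bar{x}_i),
    with kappa = 1 / sum_i |w(a,i)|.
    """
    num = norm = 0.0
    for i in ratings:
        if i == a or j not in ratings[i]:
            continue
        num += w(a, i) * (ratings[i][j] - means[i])  # weighted v_ij
        norm += abs(w(a, i))
    if norm == 0.0:  # no neighbor rated j: fall back to a's mean rating
        return means[a]
    return means[a] + num / norm
```

Any similarity function can be plugged in as `w`, which is exactly where the specific approaches below differ.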
User Similarity Measures
Pearson correlation coefficient (sum over commonly rated items):
w_p(a,i) = \frac{\sum_j (x_{aj} - \bar{x}_a)(x_{ij} - \bar{x}_i)}{\sqrt{\sum_j (x_{aj} - \bar{x}_a)^2 \sum_j (x_{ij} - \bar{x}_i)^2}}
Cosine measure:
w_c(a,i) = \frac{\sum_{j=1}^{n} x_{aj} x_{ij}}{\sqrt{\sum_{j=1}^{n} x_{aj}^2}\, \sqrt{\sum_{j=1}^{n} x_{ij}^2}}
Many other possibilities!
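Both measures can be sketched directly from the formulas (an illustrative sketch; users are represented as hypothetical item->rating dicts, and the Pearson means are taken over the commonly rated items, as the slide suggests):

```python
import math

def pearson(xa, xi):
    """Pearson correlation w_p(a, i), summed over commonly rated items."""
    common = set(xa) & set(xi)
    if not common:
        return 0.0
    ma = sum(xa[j] for j in common) / len(common)
    mi = sum(xi[j] for j in common) / len(common)
    num = sum((xa[j] - ma) * (xi[j] - mi) for j in common)
    den = math.sqrt(sum((xa[j] - ma) ** 2 for j in common) *
                    sum((xi[j] - mi) ** 2 for j in common))
    return num / den if den else 0.0

def cosine(xa, xi):
    """Cosine measure w_c(a, i), treating each user as a sparse vector."""
    common = set(xa) & set(xi)
    num = sum(xa[j] * xi[j] for j in common)
    den = (math.sqrt(sum(v * v for v in xa.values())) *
           math.sqrt(sum(v * v for v in xi.values())))
    return num / den if den else 0.0
```

Note the design difference: Pearson centers each user's ratings (removing per-user rating bias), while cosine uses the raw values.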
Improving User Similarity Measures (Breese et al. 98)
Dealing with missing values: default ratings
Inverse User Frequency (IUF): similar to IDF
Case amplification: use w(a, i)^p, e.g., p = 2.5
Model-based Approaches (Breese et al. 98)
General ideas
Assume that the data/ratings are explained by a probabilistic model with parameter \theta
Estimate/learn the model parameter \theta from the data
Predict an unknown rating using E_\theta[x_{k+1} | x_1, ..., x_k], computed from the estimated model:
E_\theta[x_{k+1} | x_1, ..., x_k] = \sum_r r \cdot p(x_{k+1} = r | x_1, ..., x_k; \theta)
Specific methods differ in the model used and how the model is estimated
Probabilistic Clustering
Clustering users based on their ratings
Assume ratings are observations of a multinomial mixture model with parameters p(C), p(x_i | C)
Model estimated using standard EM
Predict ratings using E[x_{k+1} | x_1, ..., x_k]:
E[x_{k+1} | x_1, ..., x_k] = \sum_r r \cdot p(x_{k+1} = r | x_1, ..., x_k)
p(x_{k+1} = r | x_1, ..., x_k) = \sum_c p(x_{k+1} = r | C = c)\, p(C = c | x_1, ..., x_k)
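Assuming the mixture parameters p(C) and p(x_i | C) have already been estimated (the EM step is omitted here), the prediction step can be sketched as follows; the `prior`/`cond` dictionary layout is a hypothetical structure, not from Breese et al.:

```python
def expected_rating(item, history, prior, cond, ratings=(1, 2, 3, 4, 5)):
    """Multinomial-mixture prediction of E[x_item | observed ratings].

    prior[c]       = p(C = c)
    cond[c][i][r]  = p(x_i = r | C = c)
    history: list of (item, rating) pairs already observed for the user.
    """
    # Posterior over clusters: p(C=c | x_1..x_k) proportional to
    # p(C=c) * prod_i p(x_i = r_i | C=c)
    post = {c: prior[c] for c in prior}
    for it, r in history:
        for c in post:
            post[c] *= cond[c].get(it, {}).get(r, 0.0)
    z = sum(post.values())
    post = {c: p / z for c, p in post.items()}
    # E[x_item] = sum_r r * sum_c p(x_item = r | C=c) * p(C=c | history)
    return sum(r * sum(post[c] * cond[c].get(item, {}).get(r, 0.0)
                       for c in post)
               for r in ratings)
```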
Bayesian Network
Use a BN to capture object/item dependencies
Each item/object is a node
The (dependency) structure is learned from all the data
Model parameters: p(x_{k+1} | pa(x_{k+1})), where pa(x_{k+1}) is the set of parents/predictors of x_{k+1} (represented as a decision tree)
Predict ratings using E[x_{k+1} | x_1, ..., x_k]:
E[x_{k+1} | x_1, ..., x_k] = \sum_r r \cdot p(x_{k+1} = r | x_1, ..., x_k), with p(x_{k+1} = r | x_1, ..., x_k) given by the decision tree at node x_{k+1}
Three-way Aspect Model (Popescul et al. 2001)
CF + content-based
Generative model
(u, d, w) triples as observations
z as a hidden variable
Standard EM
Essentially clustering the joint data
Evaluation on ResearchIndex data
Found it is better to treat (u, w) pairs as observations
Evaluation Criteria (Breese et al. 98)
Rating accuracy
Average absolute deviation over P_a, the set of items predicted for user a:
S_a = \frac{1}{|P_a|} \sum_{j \in P_a} |x_{aj} - \hat{x}_{aj}|
Ranking accuracy
Expected utility with an exponentially decaying viewing probability; \alpha (the half-life) is the rank at which the viewing probability drops to 0.5, and d is the neutral rating:
R_a = \sum_j \frac{\max(x_{aj} - d, 0)}{2^{(j-1)/(\alpha - 1)}}
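Both criteria are straightforward to compute; here is a minimal sketch (the function names and data layout are illustrative, not from Breese et al.):

```python
def abs_deviation(actual, predicted):
    """Average absolute deviation S_a over the predicted item set P_a.

    actual[j] = observed rating x_aj, predicted[j] = predicted rating.
    """
    return (sum(abs(actual[j] - predicted[j]) for j in predicted)
            / len(predicted))

def expected_utility(ranked_ratings, d=3.0, halflife=5):
    """R_a = sum_j max(x_aj - d, 0) / 2^((j-1)/(halflife-1)).

    ranked_ratings: the user's actual ratings listed in the system's
    ranked order; d is the neutral rating; halflife is the rank at
    which the assumed viewing probability drops to 0.5.
    """
    return sum(max(x - d, 0.0) / 2 ** ((j - 1) / (halflife - 1))
               for j, x in enumerate(ranked_ratings, start=1))
```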
Datasets
Results
- BN & CR+ are generally better than VSIM & BC
- BN is best with more training data
- VSIM is better with little training data
- Inverse User Freq. is effective
- Case amplification is mostly effective
Summary of Rating-based Methods
Effectiveness
Both memory-based and model-based methods can be effective
The correlation method appears to be robust
The Bayesian network works well with plenty of training data, but not very well with little training data
The cosine similarity method works well with little training data
Summary of Rating-based Methods (cont.)
Efficiency
Memory-based methods are slower than model-based methods at prediction time
Learning can be extremely slow for model-based methods
Preference-based Methods (Cohen et al. 99, Freund et al. 98)
Motivation
Explicit ratings are not always available, but implicit orderings/preferences might be
Only relative ratings are meaningful, even when ratings are available
Combining preferences has other applications, e.g., merging results from different search engines
A Formal Model of Preferences
Instances: O = {o_1, ..., o_n}
Ranking function: R: (U x) O x O -> [0, 1]
R(u, v) = 1 means u is strongly preferred to v
R(u, v) = 0 means v is strongly preferred to u
R(u, v) = 0.5 means no preference
Feedback: F = {(u, v)}, where u is preferred to v
Minimize the loss over the hypothesis space H:
\hat{R} = \arg\min_{R \in H} L(R, F), \qquad L(R, F) = \frac{1}{|F|} \sum_{(u,v) \in F} (1 - R(u, v))
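The loss L(R, F) can be computed directly; a minimal sketch, with the ranking function R given as a hypothetical callable:

```python
def preference_loss(R, F):
    """L(R, F) = (1/|F|) * sum over (u, v) in F of (1 - R(u, v)).

    R(u, v) is the learned preference function in [0, 1]; F holds
    (u, v) pairs where u is preferred to v.  The loss is 0 when R
    fully agrees with the feedback and 1 when it fully disagrees.
    """
    return sum(1.0 - R(u, v) for u, v in F) / len(F)
```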
The Hypothesis Space H
Without constraints on H, the loss is minimized by any R that agrees with F
Appropriate constraints for collaborative filtering:
R(u, v) = \sum_{i \in U} w_i R_i(u, v), \qquad w_i \ge 0, \quad \sum_{i \in U} w_i = 1
Compare this with the memory-based rating prediction:
\hat{x}_{aj} = \bar{x}_a + \kappa \sum_{i=1}^{m} w(a,i)\, v_{ij}, \qquad \kappa = 1 \Big/ \sum_{i=1}^{m} |w(a,i)|
The Hedge Algorithm for Combining Preferences
Iterative updating of w_1, w_2, ..., w_n
Initialization: w_i is uniform
Updating, with \beta \in [0, 1]:
w_i^{t+1} = \frac{w_i^t\, \beta^{L(R_i, F^t)}}{Z_t}
L = 0 => the weight stays
L is large => the weight is decreased
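One Hedge round can be sketched as follows (a minimal illustration; \beta and the per-expert losses on the current feedback are assumed given):

```python
def hedge_update(weights, losses, beta=0.5):
    """One round of Hedge: w_i <- w_i * beta^{L_i}, then renormalize.

    losses[i] in [0, 1] is expert i's loss L(R_i, F^t) on this round's
    feedback; zero loss leaves the (unnormalized) weight unchanged,
    while the maximal loss shrinks it by a factor of beta.
    """
    unnorm = [w * beta ** l for w, l in zip(weights, losses)]
    z = sum(unnorm)
    return [w / z for w in unnorm]

def combined_rank(weights, experts, u, v):
    """The combined preference R(u, v) = sum_i w_i * R_i(u, v)."""
    return sum(w * R(u, v) for w, R in zip(weights, experts))
```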
Some Theoretical Results
The cumulative loss of the combined ranking R_A will not be much worse than that of the best ranking expert/feature
Preferences R_A => ordering \rho => R_\rho, evaluated by the loss L(R_\rho, F)
A Greedy Ordering Algorithm
Use a weighted graph to represent the preferences R
For each node, compute the potential value, i.e., outgoing weights minus incoming weights:
\pi(v) = \sum_{u \in O} R(v, u) - \sum_{u \in O} R(u, v)
Rank the node with the highest potential value above all the others
Remove this node and its edges, and repeat
At least half of the optimal agreement is guaranteed
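The greedy ordering can be sketched as follows (an illustrative implementation; `R` is a hypothetical pairwise preference function):

```python
def greedy_order(nodes, R):
    """Greedy ordering (after Cohen et al. 99).

    R(u, v) in [0, 1] is the preference weight for ranking u above v.
    Each step ranks first the remaining node v with the largest
    potential pi(v) = sum_u R(v, u) - sum_u R(u, v), then removes it;
    this guarantees at least half of the optimal agreement.
    """
    remaining = list(nodes)
    order = []
    while remaining:
        def potential(v):
            return (sum(R(v, u) for u in remaining if u != v)
                    - sum(R(u, v) for u in remaining if u != v))
        best = max(remaining, key=potential)
        order.append(best)
        remaining.remove(best)
    return order
```

Note that the potentials are recomputed over the shrinking `remaining` set each round, matching the "remove this node and its edges, repeat" step.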
Evaluation of Ordering Algorithms
Measure: weight coverage
Datasets: randomly generated small graphs
Observations
The basic greedy algorithm works better than a random-permutation baseline
The improved version is generally better, but the improvement is insignificant for large graphs
Metasearch Experiments
Task: known-item search
Search for an ML researcher's homepage
Search for a university homepage
Search expert = a variant of the query
Learn to merge the results of all search experts
Feedback
Complete: the known item is preferred to all others
Click data: the known item is preferred to all items ranked above it
Leave-one-out testing
Metasearch Results
Measures: compare the combined preferences with each individual ranking function
Sign test: to see which system tends to rank the known relevant article higher
Number of queries with the known relevant item ranked above rank k
Average rank of the known relevant item
The learned system is better than each individual expert by all measures (not surprising, why?)
Metasearch Results (cont.)
Direct Learning of an Ordering Function
Each expert is treated as a ranking feature f_i: O -> R \cup \{0\} (allowing partial rankings)
Given preference feedback \Phi: X x X -> R
Goal: learn H that minimizes the ranking loss
D(x_0, x_1): a distribution over X x X (actually a uniform distribution over the pairs with feedback order); D(x_0, x_1) = c \cdot \max\{0, \Phi(x_0, x_1)\}
rloss_D(H) = \sum_{x_0, x_1} D(x_0, x_1)\, [[H(x_1) \le H(x_0)]] = \Pr_{(x_0, x_1) \sim D}[H(x_1) \le H(x_0)]
The RankBoost Algorithm
Iterative updating of D(x_0, x_1)
Initialization: D_1 = D
For t = 1, ..., T:
Train a weak learner using D_t
Get a weak hypothesis h_t: X -> R
Choose \alpha_t > 0
Update: D_{t+1}(x_0, x_1) = \frac{D_t(x_0, x_1)\, \exp(\alpha_t (h_t(x_0) - h_t(x_1)))}{Z_t}
Final hypothesis: H(x) = \sum_{t=1}^{T} \alpha_t h_t(x)
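A minimal RankBoost sketch, using the weighted agreement r to set \alpha_t as in the analysis of Freund et al. (illustrative only; the candidate weak-learner pool and the clamping of r are assumptions of this sketch, not part of the original algorithm):

```python
import math

def rankboost(pairs, weak_learners, rounds=10):
    """RankBoost sketch.

    pairs: list of (x0, x1) pairs, with x1 preferred to x0.
    weak_learners: candidate ranking features h: x -> R.
    Each round picks the h with the largest |r|, where
    r = sum_p D(p) * (h(x1) - h(x0)), sets
    alpha = 0.5 * ln((1 + r) / (1 - r)), and reweights the pairs
    that the chosen h still ranks the wrong way.
    Returns H(x) = sum_t alpha_t * h_t(x).
    """
    D = {p: 1.0 / len(pairs) for p in pairs}
    alphas = []
    for _ in range(rounds):
        def r(h):
            return sum(D[(x0, x1)] * (h(x1) - h(x0)) for x0, x1 in pairs)
        h = max(weak_learners, key=lambda g: abs(r(g)))
        rv = max(min(r(h), 0.999), -0.999)  # clamp to avoid infinite alpha
        alpha = 0.5 * math.log((1 + rv) / (1 - rv))
        alphas.append((alpha, h))
        # pairs ranked the wrong way (h(x0) >= h(x1)) gain weight
        for x0, x1 in pairs:
            D[(x0, x1)] *= math.exp(alpha * (h(x0) - h(x1)))
        z = sum(D.values())
        for p in D:
            D[p] /= z
    return lambda x: sum(a * g(x) for a, g in alphas)
```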
How to Choose \alpha_t and Design h_t?
Bound on the ranking loss:
rloss_D(H) \le \prod_{t=1}^{T} Z_t
Thus, we should choose \alpha_t to minimize the bound
Three approaches:
Numerical search
Special case: h is either 0 or 1
Approximate Z, then find an analytic solution
Efficient RankBoost for Bipartite Feedback
Bipartite feedback: every preference pair goes from a set X_0 to a set X_1; essentially binary classification
The pair distribution factors into per-instance weights, D_t(x_0, x_1) = v_t^0(x_0)\, v_t^1(x_1), so the generic update
D_{t+1}(x_0, x_1) = \frac{D_t(x_0, x_1)\, \exp(\alpha_t (h_t(x_0) - h_t(x_1)))}{Z_t}
can be maintained separately on each side:
v_{t+1}^0(x_0) = \frac{v_t^0(x_0)\, e^{\alpha_t h_t(x_0)}}{Z_t^0}, \qquad v_{t+1}^1(x_1) = \frac{v_t^1(x_1)\, e^{-\alpha_t h_t(x_1)}}{Z_t^1}, \qquad Z_t = Z_t^0 Z_t^1
Complexity at each round: reduced from O(|X_0| |X_1|) to O(|X_0| + |X_1|)
Evaluation of RankBoost
Meta-search: same as in (Cohen et al. 99)
Perfect feedback
4-fold cross-validation
EachMovie Evaluation
[Figures omitted: results varying #users, #movies/user, and #feedback movies]
Performance Comparison: Cohen et al. 99 vs. Freund et al. 99
Summary
CF is easy
The user's expectations are low
Any recommendation is better than none
Making it practically useful, CF is hard
Data sparseness
Scalability
Domain dependence
Summary (cont.)
CF as a learning task
Rating-based formulation
Learn f: U x O -> R
Algorithms
Instance-based/memory-based (k-nearest neighbors)
Model-based (probabilistic clustering)
Preference-based formulation
Learn PREF: U x O x O -> R
Algorithms
General preference combination (Hedge), greedy ordering
Efficient restricted preference combination (RankBoost)
Summary (cont.)
Evaluation
Rating-based methods
Simple methods seem to be reasonably effective
The advantage of sophisticated methods seems to be limited
Preference-based methods
More effective than rating-based methods according to one evaluation
Evaluation on meta-search is weak
Research Directions
Exploiting complete information
CF + content-based filtering + domain knowledge + user model
More localized kernels for instance-based methods
Predicting movies needs different neighbor users than predicting books
Suggestion: use items similar to the target item as features for finding neighbors
Research Directions (cont.)
Modeling time
There might be sequential patterns in the items a user purchases (e.g., bread machine -> bread machine mix)
Probabilistic models of preferences
Making the preference function a probability function, e.g., P(A > B | U)
Clustering items and users
Minimizing preference disagreements
References
Cohen, W.W., Schapire, R.E., and Singer, Y. (1999). "Learning to Order Things." Journal of AI Research, Volume 10, pages 243-270.
Freund, Y., Iyer, R., Schapire, R.E., and Singer, Y. (1999). "An Efficient Boosting Algorithm for Combining Preferences." Machine Learning Journal, 1999.
Breese, J.S., Heckerman, D., and Kadie, C. (1998). "Empirical Analysis of Predictive Algorithms for Collaborative Filtering." In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pp. 43-52.
Popescul, A. and Ungar, L.H. (2001). "Probabilistic Models for Unified Collaborative and Content-Based Recommendation in Sparse-Data Environments." UAI 2001.
Good, N., Schafer, J.B., Konstan, J., Borchers, A., Sarwar, B., Herlocker, J., and Riedl, J. (1999). "Combining Collaborative Filtering with Personal Agents for Better Recommendations." Proceedings of AAAI-99, pp. 439-446.
The End
Thank you!