8/9/2019 A Review of Information Filtering-CF
1/48
A Review of Information Filtering
Part II: Collaborative Filtering
Chengxiang Zhai
Language Technologies Institute
School of Computer Science, Carnegie Mellon University
Outline
A Conceptual Framework for Collaborative Filtering (CF)
Rating-based Methods (Breese et al. 98)
Memory-based methods
Model-based methods
Preference-based Methods (Cohen et al. 99 & Freund et al. 98)
Summary & Research Directions
What is Collaborative Filtering (CF)?
Making filtering decisions for an individual user based on the judgments of other users
Inferring an individual's interests/preferences from those of other, similar users
General idea
Given a user u, find similar users {u_1, ..., u_m}
Predict u's preferences based on the preferences of u_1, ..., u_m
CF: Applications
Recommender systems: books, CDs, videos, movies, potentially anything!
Can be combined with content-based filtering
Example (commercial) systems
GroupLens (Resnick et al. 94): Usenet news rating
Amazon: book recommendation
Firefly (purchased by Microsoft?): music recommendation
Alexa: web page recommendation
CF: Assumptions
Users with a common interest will have similar preferences
Users with similar preferences probably share the same interest
Examples
Interest is IR => read SIGIR papers
Read SIGIR papers => interest is IR
A sufficiently large number of user preferences is available
CF: Intuitions
User similarity
If Jamie liked the paper, I'll like the paper
If Jamie liked the movie, will I like the movie?
Suppose Jamie and I viewed similar movies in the past six months
Item similarity
Since 90% of those who liked Star Wars also liked Independence Day, and you liked Star Wars
You may also like Independence Day
Collaborative Filtering vs. Content-based Filtering
Basic filtering question: Will user U like item X?
Two different ways of answering it:
Look at what U likes => characterize X => content-based filtering
Look at who likes X => characterize U => collaborative filtering
Can be combined
Rating-based vs. Preference-based
Rating-based: users' preferences are encoded using numerical ratings on items
Complete ordering
Absolute values can be meaningful
But values must be normalized to combine
Preference-based: users' preferences are represented by partial orderings of items
Partial ordering
Easier to exploit implicit preferences
A Formal Framework for Rating
[Figure omitted: an m x n user-object rating matrix; rows are users u_1, ..., u_m, columns are objects o_1, ..., o_n, and each cell holds a rating X_ij = f(u_i, o_j), with most cells unknown]
The task
Unknown function f: U x O -> R
Assume f values are known for some (u, o) pairs
Predict f values for the other (u, o) pairs
Essentially function approximation, like other learning problems
Where are the intuitions?
Similar users have similar preferences
If u ≈ u', then for all o's, f(u, o) ≈ f(u', o)
Similar objects have similar user preferences
If o ≈ o', then for all u's, f(u, o) ≈ f(u, o')
In general, f is locally constant
If u ≈ u' and o ≈ o', then f(u, o) ≈ f(u', o')
Local smoothness makes it possible to predict unknown values by interpolation or extrapolation
What does "local" mean?
Two Groups of Approaches
Memory-based approaches
f(u, o) = g(u)(o) ≈ g(u')(o) if u ≈ u'
Find neighbors of u and combine the g(u')(o)'s
Model-based approaches
Assume structures/models: object clusters, user clusters, f defined on clusters
f(u, o) = f(c_u, c_o)
Estimation & probabilistic inference
Memory-based Approaches (Breese et al. 98)
General ideas:
x_ij: rating of object j by user i
\bar{x}_i: average rating of all objects by user i
Normalized ratings: v_ij = x_ij - \bar{x}_i
Memory-based prediction:
\hat{x}_{aj} = \bar{x}_a + \kappa \sum_{i=1}^{m} w(a,i)\, v_{ij}, \qquad \kappa = 1 \Big/ \sum_{i=1}^{m} |w(a,i)|
Specific approaches differ in w(a, i), the distance/similarity between users a and i
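The prediction formula above can be sketched in a few lines of Python (a minimal illustration, not Breese et al.'s implementation; the `ratings`, `means`, and `w` structures are hypothetical):

```python
def predict(a, j, ratings, means, w):
    """Memory-based CF prediction of user a's rating on object j.

    ratings[i][j] = x_ij (observed ratings, one dict per user),
    means[i]     = bar{x}_i (user i's average rating),
    w(a, i)      = similarity weight between users a and i.
    Implements x_aj = bar{x}_a + kappa * sum_i w(a,i) * (x_ij - bar{x}_i),
    with kappa = 1 / sum_i |w(a,i)|.
    """
    num = norm = 0.0
    for i in ratings:
        if i == a or j not in ratings[i]:
            continue
        num += w(a, i) * (ratings[i][j] - means[i])  # weighted v_ij
        norm += abs(w(a, i))
    if norm == 0.0:  # no neighbor rated j: fall back to a's mean rating
        return means[a]
    return means[a] + num / norm
```

Any similarity function can be plugged in as `w`, which is exactly where the specific approaches below differ.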
User Similarity Measures
Pearson correlation coefficient (sum over commonly rated items):
w_p(a,i) = \frac{\sum_j (x_{aj} - \bar{x}_a)(x_{ij} - \bar{x}_i)}{\sqrt{\sum_j (x_{aj} - \bar{x}_a)^2 \sum_j (x_{ij} - \bar{x}_i)^2}}
Cosine measure:
w_c(a,i) = \frac{\sum_{j=1}^{n} x_{aj} x_{ij}}{\sqrt{\sum_{j=1}^{n} x_{aj}^2}\, \sqrt{\sum_{j=1}^{n} x_{ij}^2}}
Many other possibilities!
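Both measures can be sketched directly from the formulas (an illustrative sketch; users are represented as hypothetical item->rating dicts, and the Pearson means are taken over the commonly rated items, as the slide suggests):

```python
import math

def pearson(xa, xi):
    """Pearson correlation w_p(a, i), summed over commonly rated items."""
    common = set(xa) & set(xi)
    if not common:
        return 0.0
    ma = sum(xa[j] for j in common) / len(common)
    mi = sum(xi[j] for j in common) / len(common)
    num = sum((xa[j] - ma) * (xi[j] - mi) for j in common)
    den = math.sqrt(sum((xa[j] - ma) ** 2 for j in common) *
                    sum((xi[j] - mi) ** 2 for j in common))
    return num / den if den else 0.0

def cosine(xa, xi):
    """Cosine measure w_c(a, i), treating each user as a sparse vector."""
    common = set(xa) & set(xi)
    num = sum(xa[j] * xi[j] for j in common)
    den = (math.sqrt(sum(v * v for v in xa.values())) *
           math.sqrt(sum(v * v for v in xi.values())))
    return num / den if den else 0.0
```

Note the design difference: Pearson centers each user's ratings (removing per-user rating bias), while cosine uses the raw values.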
Improving User Similarity Measures (Breese et al. 98)
Dealing with missing values: default ratings
Inverse User Frequency (IUF): similar to IDF
Case amplification: use w(a, i)^p, e.g., p = 2.5
Model-based Approaches (Breese et al. 98)
General ideas
Assume that the data/ratings are explained by a probabilistic model with parameter \theta
Estimate/learn the model parameter \theta from the data
Predict an unknown rating using E_\theta[x_{k+1} | x_1, ..., x_k], computed from the estimated model:
E_\theta[x_{k+1} | x_1, ..., x_k] = \sum_r r \cdot p(x_{k+1} = r | x_1, ..., x_k; \theta)
Specific methods differ in the model used and how the model is estimated
Probabilistic Clustering
Clustering users based on their ratings
Assume ratings are observations of a multinomial mixture model with parameters p(C), p(x_i | C)
Model estimated using standard EM
Predict ratings using E[x_{k+1} | x_1, ..., x_k]:
E[x_{k+1} | x_1, ..., x_k] = \sum_r r \cdot p(x_{k+1} = r | x_1, ..., x_k)
p(x_{k+1} = r | x_1, ..., x_k) = \sum_c p(x_{k+1} = r | C = c)\, p(C = c | x_1, ..., x_k)
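Assuming the mixture parameters p(C) and p(x_i | C) have already been estimated (the EM step is omitted here), the prediction step can be sketched as follows; the `prior`/`cond` dictionary layout is a hypothetical structure, not from Breese et al.:

```python
def expected_rating(item, history, prior, cond, ratings=(1, 2, 3, 4, 5)):
    """Multinomial-mixture prediction of E[x_item | observed ratings].

    prior[c]       = p(C = c)
    cond[c][i][r]  = p(x_i = r | C = c)
    history: list of (item, rating) pairs already observed for the user.
    """
    # Posterior over clusters: p(C=c | x_1..x_k) proportional to
    # p(C=c) * prod_i p(x_i = r_i | C=c)
    post = {c: prior[c] for c in prior}
    for it, r in history:
        for c in post:
            post[c] *= cond[c].get(it, {}).get(r, 0.0)
    z = sum(post.values())
    post = {c: p / z for c, p in post.items()}
    # E[x_item] = sum_r r * sum_c p(x_item = r | C=c) * p(C=c | history)
    return sum(r * sum(post[c] * cond[c].get(item, {}).get(r, 0.0)
                       for c in post)
               for r in ratings)
```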
Bayesian Network
Use a BN to capture object/item dependencies
Each item/object is a node
The (dependency) structure is learned from all the data
Model parameters: p(x_{k+1} | pa(x_{k+1})), where pa(x_{k+1}) is the set of parents/predictors of x_{k+1} (represented as a decision tree)
Predict ratings using E[x_{k+1} | x_1, ..., x_k]:
E[x_{k+1} | x_1, ..., x_k] = \sum_r r \cdot p(x_{k+1} = r | x_1, ..., x_k), with p(x_{k+1} = r | x_1, ..., x_k) given by the decision tree at node x_{k+1}
Three-way Aspect Model (Popescul et al. 2001)
CF + content-based
Generative model
(u, d, w) triples as observations
z as a hidden variable
Standard EM
Essentially clustering the joint data
Evaluation on ResearchIndex data
Found it is better to treat (u, w) pairs as observations
Evaluation Criteria (Breese et al. 98)
Rating accuracy
Average absolute deviation over P_a, the set of items predicted for user a:
S_a = \frac{1}{|P_a|} \sum_{j \in P_a} |x_{aj} - \hat{x}_{aj}|
Ranking accuracy
Expected utility with an exponentially decaying viewing probability; \alpha (the half-life) is the rank at which the viewing probability drops to 0.5, and d is the neutral rating:
R_a = \sum_j \frac{\max(x_{aj} - d, 0)}{2^{(j-1)/(\alpha - 1)}}
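Both criteria are straightforward to compute; here is a minimal sketch (the function names and data layout are illustrative, not from Breese et al.):

```python
def abs_deviation(actual, predicted):
    """Average absolute deviation S_a over the predicted item set P_a.

    actual[j] = observed rating x_aj, predicted[j] = predicted rating.
    """
    return (sum(abs(actual[j] - predicted[j]) for j in predicted)
            / len(predicted))

def expected_utility(ranked_ratings, d=3.0, halflife=5):
    """R_a = sum_j max(x_aj - d, 0) / 2^((j-1)/(halflife-1)).

    ranked_ratings: the user's actual ratings listed in the system's
    ranked order; d is the neutral rating; halflife is the rank at
    which the assumed viewing probability drops to 0.5.
    """
    return sum(max(x - d, 0.0) / 2 ** ((j - 1) / (halflife - 1))
               for j, x in enumerate(ranked_ratings, start=1))
```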
Datasets
Results
- BN & CR+ are generally better than VSIM & BC
- BN is best with more training data
- VSIM is better with little training data
- Inverse User Freq. is effective
- Case amplification is mostly effective
Summary of Rating-based Methods
Effectiveness
Both memory-based and model-based methods can be effective
The correlation method appears to be robust
The Bayesian network works well with plenty of training data, but not very well with little training data
The cosine similarity method works well with little training data
Summary of Rating-based Methods (cont.)
Efficiency
Memory-based methods are slower than model-based methods at prediction time
Learning can be extremely slow for model-based methods
Preference-based Methods (Cohen et al. 99, Freund et al. 98)
Motivation
Explicit ratings are not always available, but implicit orderings/preferences might be
Only relative ratings are meaningful, even when ratings are available
Combining preferences has other applications, e.g., merging results from different search engines
A Formal Model of Preferences
Instances: O = {o_1, ..., o_n}
Ranking function: R: (U x) O x O -> [0, 1]
R(u, v) = 1 means u is strongly preferred to v
R(u, v) = 0 means v is strongly preferred to u
R(u, v) = 0.5 means no preference
Feedback: F = {(u, v)}, where u is preferred to v
Minimize the loss over the hypothesis space H:
\hat{R} = \arg\min_{R \in H} L(R, F), \qquad L(R, F) = \frac{1}{|F|} \sum_{(u,v) \in F} (1 - R(u, v))
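The loss L(R, F) can be computed directly; a minimal sketch, with the ranking function R given as a hypothetical callable:

```python
def preference_loss(R, F):
    """L(R, F) = (1/|F|) * sum over (u, v) in F of (1 - R(u, v)).

    R(u, v) is the learned preference function in [0, 1]; F holds
    (u, v) pairs where u is preferred to v.  The loss is 0 when R
    fully agrees with the feedback and 1 when it fully disagrees.
    """
    return sum(1.0 - R(u, v) for u, v in F) / len(F)
```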
The Hypothesis Space H
Without constraints on H, the loss is minimized by any R that agrees with F
Appropriate constraints for collaborative filtering:
R(u, v) = \sum_{i \in U} w_i R_i(u, v), \qquad w_i \ge 0, \quad \sum_{i \in U} w_i = 1
Compare this with the memory-based rating prediction:
\hat{x}_{aj} = \bar{x}_a + \kappa \sum_{i=1}^{m} w(a,i)\, v_{ij}, \qquad \kappa = 1 \Big/ \sum_{i=1}^{m} |w(a,i)|
The Hedge Algorithm for Combining Preferences
Iterative updating of w_1, w_2, ..., w_n
Initialization: w_i is uniform
Updating, with \beta \in [0, 1]:
w_i^{t+1} = \frac{w_i^t\, \beta^{L(R_i, F^t)}}{Z_t}
L = 0 => the weight stays
L is large => the weight is decreased
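One Hedge round can be sketched as follows (a minimal illustration; \beta and the per-expert losses on the current feedback are assumed given):

```python
def hedge_update(weights, losses, beta=0.5):
    """One round of Hedge: w_i <- w_i * beta^{L_i}, then renormalize.

    losses[i] in [0, 1] is expert i's loss L(R_i, F^t) on this round's
    feedback; zero loss leaves the (unnormalized) weight unchanged,
    while the maximal loss shrinks it by a factor of beta.
    """
    unnorm = [w * beta ** l for w, l in zip(weights, losses)]
    z = sum(unnorm)
    return [w / z for w in unnorm]

def combined_rank(weights, experts, u, v):
    """The combined preference R(u, v) = sum_i w_i * R_i(u, v)."""
    return sum(w * R(u, v) for w, R in zip(weights, experts))
```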
Some Theoretical Results
The cumulative loss of the combined ranking R_A will not be much worse than that of the best ranking expert/feature
Preferences R_A => ordering \rho => R_\rho, evaluated by the loss L(R_\rho, F)
A Greedy Ordering Algorithm
Use a weighted graph to represent the preferences R
For each node, compute the potential value, i.e., outgoing weights minus incoming weights:
\pi(v) = \sum_{u \in O} R(v, u) - \sum_{u \in O} R(u, v)
Rank the node with the highest potential value above all the others
Remove this node and its edges, and repeat
At least half of the optimal agreement is guaranteed
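The greedy ordering can be sketched as follows (an illustrative implementation; `R` is a hypothetical pairwise preference function):

```python
def greedy_order(nodes, R):
    """Greedy ordering (after Cohen et al. 99).

    R(u, v) in [0, 1] is the preference weight for ranking u above v.
    Each step ranks first the remaining node v with the largest
    potential pi(v) = sum_u R(v, u) - sum_u R(u, v), then removes it;
    this guarantees at least half of the optimal agreement.
    """
    remaining = list(nodes)
    order = []
    while remaining:
        def potential(v):
            return (sum(R(v, u) for u in remaining if u != v)
                    - sum(R(u, v) for u in remaining if u != v))
        best = max(remaining, key=potential)
        order.append(best)
        remaining.remove(best)
    return order
```

Note that the potentials are recomputed over the shrinking `remaining` set each round, matching the "remove this node and its edges, repeat" step.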
Evaluation of Ordering Algorithms
Measure: weight coverage
Datasets: randomly generated small graphs
Observations
The basic greedy algorithm works better than a random-permutation baseline
The improved version is generally better, but the improvement is insignificant for large graphs
Metasearch Experiments
Task: known-item search
Search for an ML researcher's homepage
Search for a university homepage
Search expert = a variant of the query
Learn to merge the results of all search experts
Feedback
Complete: the known item is preferred to all others
Click data: the known item is preferred to all items ranked above it
Leave-one-out testing
Metasearch Results
Measures: compare the combined preferences with each individual ranking function
Sign test: to see which system tends to rank the known relevant article higher
Number of queries with the known relevant item ranked above rank k
Average rank of the known relevant item
The learned system is better than each individual expert by all measures (not surprising, why?)
Metasearch Results (cont.)
Direct Learning of an Ordering Function
Each expert is treated as a ranking feature f_i: O -> R \cup \{0\} (allowing partial rankings)
Given preference feedback \Phi: X x X -> R
Goal: learn H that minimizes the ranking loss
D(x_0, x_1): a distribution over X x X (actually a uniform distribution over the pairs with feedback order); D(x_0, x_1) = c \cdot \max\{0, \Phi(x_0, x_1)\}
rloss_D(H) = \sum_{x_0, x_1} D(x_0, x_1)\, [[H(x_1) \le H(x_0)]] = \Pr_{(x_0, x_1) \sim D}[H(x_1) \le H(x_0)]
The RankBoost Algorithm
Iterative updating of D(x_0, x_1)
Initialization: D_1 = D
For t = 1, ..., T:
Train a weak learner using D_t
Get a weak hypothesis h_t: X -> R
Choose \alpha_t > 0
Update: D_{t+1}(x_0, x_1) = \frac{D_t(x_0, x_1)\, \exp(\alpha_t (h_t(x_0) - h_t(x_1)))}{Z_t}
Final hypothesis: H(x) = \sum_{t=1}^{T} \alpha_t h_t(x)
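A minimal RankBoost sketch, using the weighted agreement r to set \alpha_t as in the analysis of Freund et al. (illustrative only; the candidate weak-learner pool and the clamping of r are assumptions of this sketch, not part of the original algorithm):

```python
import math

def rankboost(pairs, weak_learners, rounds=10):
    """RankBoost sketch.

    pairs: list of (x0, x1) pairs, with x1 preferred to x0.
    weak_learners: candidate ranking features h: x -> R.
    Each round picks the h with the largest |r|, where
    r = sum_p D(p) * (h(x1) - h(x0)), sets
    alpha = 0.5 * ln((1 + r) / (1 - r)), and reweights the pairs
    that the chosen h still ranks the wrong way.
    Returns H(x) = sum_t alpha_t * h_t(x).
    """
    D = {p: 1.0 / len(pairs) for p in pairs}
    alphas = []
    for _ in range(rounds):
        def r(h):
            return sum(D[(x0, x1)] * (h(x1) - h(x0)) for x0, x1 in pairs)
        h = max(weak_learners, key=lambda g: abs(r(g)))
        rv = max(min(r(h), 0.999), -0.999)  # clamp to avoid infinite alpha
        alpha = 0.5 * math.log((1 + rv) / (1 - rv))
        alphas.append((alpha, h))
        # pairs ranked the wrong way (h(x0) >= h(x1)) gain weight
        for x0, x1 in pairs:
            D[(x0, x1)] *= math.exp(alpha * (h(x0) - h(x1)))
        z = sum(D.values())
        for p in D:
            D[p] /= z
    return lambda x: sum(a * g(x) for a, g in alphas)
```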
How to Choose \alpha_t and Design h_t?
Bound on the ranking loss:
rloss_D(H) \le \prod_{t=1}^{T} Z_t
Thus, we should choose \alpha_t to minimize the bound
Three approaches:
Numerical search
Special case: h is either 0 or 1
Approximate Z, then find an analytic solution
Efficient RankBoost for Bipartite Feedback
Bipartite feedback: every preference pair goes from a set X_0 to a set X_1; essentially binary classification
The pair distribution factors into per-instance weights, D_t(x_0, x_1) = v_t^0(x_0)\, v_t^1(x_1), so the generic update
D_{t+1}(x_0, x_1) = \frac{D_t(x_0, x_1)\, \exp(\alpha_t (h_t(x_0) - h_t(x_1)))}{Z_t}
can be maintained separately on each side:
v_{t+1}^0(x_0) = \frac{v_t^0(x_0)\, e^{\alpha_t h_t(x_0)}}{Z_t^0}, \qquad v_{t+1}^1(x_1) = \frac{v_t^1(x_1)\, e^{-\alpha_t h_t(x_1)}}{Z_t^1}, \qquad Z_t = Z_t^0 Z_t^1
Complexity at each round: reduced from O(|X_0| |X_1|) to O(|X_0| + |X_1|)
Evaluation of RankBoost
Meta-search: same as in (Cohen et al. 99)
Perfect feedback
4-fold cross-validation
EachMovie Evaluation
[Figures omitted: results varying #users, #movies/user, and #feedback movies]
Performance Comparison: Cohen et al. 99 vs. Freund et al. 99
Summary
CF is easy
The user's expectations are low
Any recommendation is better than none
Making it practically useful, CF is hard
Data sparseness
Scalability
Domain dependence
Summary (cont.)
CF as a learning task
Rating-based formulation
Learn f: U x O -> R
Algorithms
Instance-based/memory-based (k-nearest neighbors)
Model-based (probabilistic clustering)
Preference-based formulation
Learn PREF: U x O x O -> R
Algorithms
General preference combination (Hedge), greedy ordering
Efficient restricted preference combination (RankBoost)
Summary (cont.)
Evaluation
Rating-based methods
Simple methods seem to be reasonably effective
The advantage of sophisticated methods seems to be limited
Preference-based methods
More effective than rating-based methods according to one evaluation
Evaluation on meta-search is weak
Research Directions
Exploiting complete information
CF + content-based filtering + domain knowledge + user model
More localized kernels for instance-based methods
Predicting movies needs different neighbor users than predicting books
Suggestion: use items similar to the target item as features for finding neighbors
Research Directions (cont.)
Modeling time
There might be sequential patterns in the items a user purchases (e.g., bread machine -> bread machine mix)
Probabilistic models of preferences
Making the preference function a probability function, e.g., P(A > B | U)
Clustering items and users
Minimizing preference disagreements
References
Cohen, W.W., Schapire, R.E., and Singer, Y. (1999). "Learning to Order Things." Journal of AI Research, Volume 10, pages 243-270.
Freund, Y., Iyer, R., Schapire, R.E., and Singer, Y. (1999). "An Efficient Boosting Algorithm for Combining Preferences." Machine Learning Journal, 1999.
Breese, J.S., Heckerman, D., and Kadie, C. (1998). "Empirical Analysis of Predictive Algorithms for Collaborative Filtering." In Proceedings of the 14th Conference on Uncertainty in Artificial Intelligence, pp. 43-52.
Popescul, A. and Ungar, L.H. (2001). "Probabilistic Models for Unified Collaborative and Content-Based Recommendation in Sparse-Data Environments." UAI 2001.
Good, N., Schafer, J.B., Konstan, J., Borchers, A., Sarwar, B., Herlocker, J., and Riedl, J. (1999). "Combining Collaborative Filtering with Personal Agents for Better Recommendations." Proceedings of AAAI-99, pp. 439-446.
The End
Thank you!