Introduction to preference learning
Jose A. Rodriguez-Serrano
June 14, 2016
Introduction to preference learning
Preference learning
Inducing predictive preference models from empirical data.
Caveats:
This presentation: some supervised models only
Human preferences are very complex (and might be inconsistent)
Machine learning is not magic: it infers model parameters from data, and only works when the data can be transformed into a “signal” with sufficient information about the target task
Additional Sources
Preference Learning: A Tutorial Introduction (Fürnkranz and Hüllermeier), http://www.ke.tu-darmstadt.de/events/PL-12/slides/PL-Tutorial-1.pdf
http://www.preference-learning.org/
B. Kulis, Metric Learning: A Survey, 2012, http://web.cse.ohio-state.edu/~kulis/pubs/ftml_metric_learning.pdf
Where does this come from?
Learning user preferences in search engines using click-through data
Joachims, Optimizing search engines using clickthrough data, SIGKDD 2002
Learning preference models for transport
e.g. Chidlovskii, Improved Trip Planning by Learning from Travelers’ Choices, Mining Urban Data, 2015.
Option 1
Estimated duration = 38 min
# changes = 1
Frequency (waiting time) = 6 min
Walking time = 11 min
Cost = (1.7+2) EUR
Data: x1 = [38, 1, 6, 11, 3.7, . . .], x2 = . . . , x3 = . . . , x4 = . . .
Label: ℓ(x1, x2, x3, x4) = (1, 0, 0, 0)
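In code, this data construction might look as follows. A minimal sketch: the feature values for options 2–4 are invented for illustration, since the slide elides them; only option 1 is taken from the slide.

```python
# Each trip option becomes a feature vector; the chosen option gets label 1.
# Feature order (from the slide): duration (min), # changes, waiting (min),
# walking (min), total cost (EUR).
x1 = [38, 1, 6, 11, 1.7 + 2]   # Option 1, as on the slide (cost 1.7+2 EUR)
x2 = [45, 0, 10, 5, 2.0]       # hypothetical values for illustration
x3 = [30, 2, 4, 15, 4.5]       # hypothetical
x4 = [55, 1, 8, 3, 1.5]        # hypothetical

options = [x1, x2, x3, x4]
labels = [1, 0, 0, 0]          # the traveler chose Option 1
```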
Other examples: question answering, ad selection, . . .
“Semantic” search
A. Gordo, J. A. Rodriguez-Serrano, F. Perronnin, E. Valveny, Leveraging Category-Level Labels for Instance-Level Image Retrieval, CVPR 2012.
Preference learning
Inducing predictive preference models from empirical data.
Learning to rank preferences
Learning to compare items
Applications: model / explain / predict user preferences for products? Compare customers?
Outline
1 Motivation
2 Ranking Preferences
3 Comparing items
4 Perspectives
Ranking preferences
How to do that
(plotted using XKCDify)
SVM for preference learning
Compatibility features of user and product: f(u,p) = [f_up^1, . . . , f_up^D]
Relevance score: R(u,p) = wT f(u,p)
We want: w^T f(u,p+) > w^T f(u,p−) ⇒ w^T (f(u,p+) − f(u,p−)) > 0 (this is a linear classifier!)
w can be learned with a linear classifier:
1 Construct data. Each triplet (u, p1, p2) is a sample (x_i, ℓ_i) with
features: x_i = f(u,p1) − f(u,p2)
label: ℓ_i = +1 if p1 is preferred, ℓ_i = −1 if p2 is preferred
2 Train a binary linear classifier (e.g. SVM) on these data
Alternative: Structured output learning Nowozin and Lampert, Structured Learning and Prediction in Computer Vision, 2011
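A minimal sketch of this recipe in Python (not from the original deck): the triplets are synthetic, w_true is a made-up scoring direction used only to generate preference labels, and the classifier is a plain hinge-loss SGD rather than a full SVM solver.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (hypothetical): f(u, p) is given directly as a feature vector.
D = 5
w_true = rng.normal(size=D)            # unknown "true" preference direction

# 1) Construct data: each triplet (u, p1, p2) becomes x_i = f(u,p1) - f(u,p2)
#    with label +1 if p1 is preferred, -1 otherwise.
f_p1 = rng.normal(size=(200, D))
f_p2 = rng.normal(size=(200, D))
X = f_p1 - f_p2
y = np.where(X @ w_true > 0, 1.0, -1.0)

# 2) Train a binary linear classifier on (X, y) with hinge loss via SGD.
w = np.zeros(D)
eta, lam = 0.1, 1e-3                   # learning rate, L2 regularization
for epoch in range(50):
    for xi, yi in zip(X, y):
        w *= (1 - eta * lam)           # regularization shrink
        if yi * (w @ xi) < 1:          # margin violated
            w += eta * yi * xi

train_acc = np.mean(np.sign(X @ w) == y)
```

Because the data are separable by construction, the learned w ranks almost all training pairs correctly.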
Special case 1: Multi-class ranking SVM (one model per product)
Preference for product 1: x_u^T w_1
Preference for product 2: x_u^T w_2
Multi-class ranking SVM
Data comes in the form of preference triplets (x, c+, c−)
We want w_{c+}^T x > w_{c−}^T x
Multi-class ranking SVM (2)
Method to learn ω = (w_1, w_2, . . .):
1 Sample (x, c+, c−)
2 If w_{c+}^T x > w_{c−}^T x + 1, then δ_i = 0, else δ_i = 1
3 Update (the data terms act only when the margin is violated):
w_{c+} ← w_{c+}(1 − ηλ) + δ_i λ x_i
w_{c−} ← w_{c−}(1 − ηλ) − δ_i λ x_i
w_k ← w_k(1 − ηλ) for all other k
4 Go to (1) until convergence
We obtain the w’s that represent each product (they can be interpreted as an “encoding” of the products)
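The steps above can be sketched on synthetic data as follows (a sketch, not the original implementation: W_true exists only to generate consistent preference triplets, and the values of η and λ are illustrative assumptions).

```python
import numpy as np

rng = np.random.default_rng(1)

D, C = 4, 3                        # feature dim, number of products
W = np.zeros((C, D))               # one ranking model w_k per product
W_true = rng.normal(size=(C, D))   # hypothetical, used only to label triplets

eta, lam = 0.01, 0.1               # shrink factor (1 - eta*lam), step lam

for step in range(5000):
    # 1) Sample a preference triplet (x, c+, c-)
    x = rng.normal(size=D)
    cp, cn = rng.choice(C, size=2, replace=False)
    if W_true[cp] @ x < W_true[cn] @ x:
        cp, cn = cn, cp            # make cp the preferred product

    # 2) delta = 0 if the margin already holds, else 1
    delta = 0.0 if W[cp] @ x > W[cn] @ x + 1 else 1.0

    # 3) Shrink all w_k, then push the violated pair apart
    W *= (1 - eta * lam)
    W[cp] += delta * lam * x
    W[cn] -= delta * lam * x

# Sanity check: learned scores should reproduce the true preference order
test_x = rng.normal(size=(500, D))
true_pref = (test_x @ W_true[0]) > (test_x @ W_true[1])
pred_pref = (test_x @ W[0]) > (test_x @ W[1])
agreement = np.mean(true_pref == pred_pref)
```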
Stochastic gradient descent
L(ω) = ∑_{i,p,n} L_{i,p,n} = ∑_{i,p,n} max(0, 1 − w_p^T x_i + w_n^T x_i) + η ∑_j ‖w_j‖²   (1)
Gradient descent: take one step along the negative gradient: ω ← ω − λ ∂L/∂ω
Stochastic gradient descent:
1 Sample i, p, n
2 Compute ω ← ω − λ ∂L_{i,p,n}/∂ω
(This recovers the update rule of the previous slide)
Bottou, Large-Scale Learning with Stochastic Gradient Descent, 2010
Special case 2: Feature embeddings
Compatibility between a user and a product: g(u)^T W h(p)
x = g(u) = [x_1, . . . , x_D]^T (user features)
p = h(p) = [p_1, . . . , p_E]^T (product features)
1 model only; the customer and product sets are open.
MotivationRanking Preferences
Comparing itemsPerspectives
IntroductionSVM for preference learningMulti-class ranking SVMFeature embeddingsRecap
Special case 2: Feature embeddings (2)
1 Sample (x_i, p+, p−)
2 Check whether x^T W p+ > x^T W p− + 1
3 If false, update:
W ← W + λ x (p+ − p−)^T
4 Go to (1) until convergence
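A sketch of this bilinear update on synthetic data (W_true and the step size λ are illustrative assumptions; preferences are generated from W_true only to decide which product is p+).

```python
import numpy as np

rng = np.random.default_rng(2)

D, E = 6, 5                          # user-feature dim, product-feature dim
W = np.zeros((D, E))                 # bilinear compatibility x^T W p
W_true = rng.normal(size=(D, E))     # hypothetical, used only to order pairs

lam = 0.05                           # step size

for step in range(20000):
    # 1) Sample (x_i, p+, p-)
    x = rng.normal(size=D)
    p_pos = rng.normal(size=E)
    p_neg = rng.normal(size=E)
    if x @ W_true @ p_pos < x @ W_true @ p_neg:
        p_pos, p_neg = p_neg, p_pos  # p+ is the preferred product

    # 2) Check x^T W p+ > x^T W p- + 1; 3) if false, rank-1 update
    if not (x @ W @ p_pos > x @ W @ p_neg + 1):
        W += lam * np.outer(x, p_pos - p_neg)

# Sanity check on fresh samples: learned W should order products like W_true
xs = rng.normal(size=(500, D))
ps = rng.normal(size=(500, E))
qs = rng.normal(size=(500, E))
true = np.einsum('id,de,ie->i', xs, W_true, ps) > np.einsum('id,de,ie->i', xs, W_true, qs)
pred = np.einsum('id,de,ie->i', xs, W, ps) > np.einsum('id,de,ie->i', xs, W, qs)
agreement = np.mean(true == pred)
```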
Bai et al, Supervised Semantic Indexing, CIKM 2009
Simplicity matters
Recap & thoughts
Different ways to learn and predict preferences:
SVM for preference learning → when the function f(u,p) is known; 1 model only
Multi-class ranking SVM → closed set of products; 1 model per product
Label embedding → open set of products; 1 model only
Properties
Simple (cost of deploying is small)
Easy to personalize (initialize from a global model)
Potential applications:
Best next product recommender
Understand user preferences for channel
Outline
1 Motivation
2 Ranking Preferences
3 Comparing items
4 Perspectives
Metric Learning
Supervised notion that customer x should be more similar to y than to z.
Express as computing a similarity between customers: a(x,y) = x^T W y
Same solution as before
Example: Find similar customers to customers who have purchased aproduct.
Metric learning (2)
Low-rank decomposition W = U^T U (W is D×D, U is K×D, K < D)
Learning rule becomes U ← U + λ U(x_i (x_i^+ − x_i^−)^T + (x_i^+ − x_i^−) x_i^T) on violated triplets (a subgradient step on the ranking loss)
(Supervised) dimensionality reduction: a(x,y) = (Ux)^T (Uy)
Bai et al., Supervised Semantic Indexing, CIKM 2009
Chechik et al., Large Scale Online Learning of Image Similarity Through Ranking, JMLR 2010
Davis et al., Information Theoretic Metric Learning, ICML 2007
Hu et al., Discriminative deep metric learning for face verification in the wild, CVPR 2014
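A sketch of low-rank metric learning with this multiplicative update (everything here is an illustrative assumption: the supervision signal judges similarity on the first two raw coordinates, and the hyperparameters are made up; the update is applied as an ascent step on the violated margin).

```python
import numpy as np

rng = np.random.default_rng(3)

D, K = 8, 3                              # input dim, reduced dim (K < D)
U = 0.1 * rng.normal(size=(K, D))        # W = U^T U, so a(x, y) = (Ux)^T (Uy)
lam = 0.01

def sim(U, x, y):
    return (U @ x) @ (U @ y)

def positive_first(x, a, b):
    # Hypothetical supervision: similarity judged on the first two coordinates
    if x[:2] @ a[:2] < x[:2] @ b[:2]:
        a, b = b, a
    return a, b

for step in range(5000):
    x, a, b = rng.normal(size=(3, D))
    x_pos, x_neg = positive_first(x, a, b)
    if sim(U, x, x_pos) < sim(U, x, x_neg) + 1:   # ranking margin violated
        d = x_pos - x_neg
        U += lam * U @ (np.outer(x, d) + np.outer(d, x))

# Sanity check on fresh triplets
correct = 0
for _ in range(500):
    x, a, b = rng.normal(size=(3, D))
    x_pos, x_neg = positive_first(x, a, b)
    correct += sim(U, x, x_pos) > sim(U, x, x_neg)
accuracy = correct / 500
```

Since W = U^T U stays positive semi-definite, this step provably does not decrease the violated margin, which is what makes the factored update workable.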
Canonical Correlation Analysis
Multiple views of data:
x_1 = {x_{1,1}, . . . , x_{1,D}}   y_1 = {y_{1,1}, . . . , y_{1,D}}
x_2 = {x_{2,1}, . . . , x_{2,D}}   y_2 = {y_{2,1}, . . . , y_{2,D}}
. . . (e.g. transactionality, socio-demographic variables, etc.)
Canonical correlation analysis (CCA)
Project multiple views of data to a common subspace wherecorrelation is maximized.
C(w_k, u_k) = (w_k^T X^T Y u_k) / √((w_k^T X^T X w_k)(u_k^T Y^T Y u_k))   (2)
s.t. w_k^T X^T X w_k = 1, u_k^T Y^T Y u_k = 1   (3)
Solution: generalized eigenvalue problem X^T Y (Y^T Y + ρI)^{−1} Y^T X w_k = λ² (X^T X + ρI) w_k
H. Hotelling, Relations between two sets of variables, Biometrika, 1936.
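The objective and its eigenvalue solution can be exercised on toy two-view data (a sketch; the data-generating process, with one latent signal shared between the views, is an illustrative assumption).

```python
import numpy as np

rng = np.random.default_rng(4)

# Two views of the same subjects sharing one latent signal (toy data)
n = 500
z = rng.normal(size=n)                               # shared latent variable
X = np.column_stack([z, rng.normal(size=n)]) + 0.1 * rng.normal(size=(n, 2))
Y = np.column_stack([-z, rng.normal(size=n)]) + 0.1 * rng.normal(size=(n, 2))
X -= X.mean(axis=0)
Y -= Y.mean(axis=0)

rho = 1e-3                                           # regularizer
Cxx = X.T @ X + rho * np.eye(2)
Cyy = Y.T @ Y + rho * np.eye(2)
Cxy = X.T @ Y

# Generalized eigenvalue problem from the slide:
#   X^T Y (Y^T Y + rho I)^{-1} Y^T X w_k = lambda^2 (X^T X + rho I) w_k
A = Cxy @ np.linalg.solve(Cyy, Cxy.T)
evals, evecs = np.linalg.eig(np.linalg.solve(Cxx, A))
w = np.real(evecs[:, np.argmax(np.real(evals))])     # top direction for X
u = np.linalg.solve(Cyy, Cxy.T @ w)                  # paired direction for Y

# The projections of the two views should be strongly correlated
corr = np.corrcoef(X @ w, Y @ u)[0, 1]
```

The top canonical pair recovers the shared latent signal, so the magnitude of the correlation between the two projections is close to 1.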
Recap & thoughts
Different ways to “learn to compare” items:
Metric learning: 1 model only (supervised by proximity)
Canonical correlation analysis (consolidates multiple views)
Potential applications:
Find similar customers
Best next product recommender
Understand user preferences for channel
Improve predictions of similarity-based regression
Find new buyers
Outline
1 Motivation
2 Ranking Preferences
3 Comparing items
4 Perspectives
Relation to deep learning (1)
Projection-based methods are basically 1-layer perceptrons: h = Ux, h_i = ∑_j U_ij x_j
In metric learning, instead of minimizing the squared error, we minimize a ranking loss (a “ranking perceptron”). We can add all the “deep learning” creativity on top.
“Nothing is stronger than an idea whose time has come” (attributed to Victor Hugo)
Relation to deep learning (2)
Learn a metric on top of Restricted Boltzmann Machines, Deep Belief Networks, (Stacked) (Denoising) Autoencoders, etc.