Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking
Yuxin Chen, Electrical Engineering, Princeton University
Joint work with Jianqing Fan, Cong Ma and Kaizheng Wang
Ranking
A fundamental problem in a wide range of contexts: web search, recommendation systems, admissions, sports competitions, voting, ...
(figure: PageRank; figure credit: Dzenan Hamzic)
Top-K ranking 2/21
Rank aggregation from pairwise comparisons
(figure: pairwise comparisons for ranking top tennis players; figure credit: Bozoki, Csato, Temesi)
Top-K ranking 3/21
Parametric models
Assign a latent score to each of the n items: w∗ = [w∗1, · · · , w∗n]
(figure: i: rank; w∗i: preference score)
• This work: Bradley–Terry–Luce (logistic) model
  P{item j beats item i} = w∗j / (w∗i + w∗j)
• Other models: Thurstone model, low-rank model, ...
Top-K ranking 4/21
Typical ranking procedure
Estimate latent scores → rank items based on score estimates
Goal: identify the set of top-K items under minimal sample size
Top-K ranking 5/21
Model: random sampling
• Comparison graph: Erdős–Rényi graph G ∼ G(n, p)
(figure: a comparison graph on 12 nodes)
• For each (i, j) ∈ G, obtain L paired comparisons
  y(l)_{i,j} = 1 with prob. w∗j / (w∗i + w∗j), and 0 otherwise (independently, 1 ≤ l ≤ L)
  y_{i,j} = (1/L) Σ_{l=1}^{L} y(l)_{i,j}  (sufficient statistic)
Top-K ranking 6/21
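The sampling model above is easy to simulate. A minimal sketch in Python; the scores, graph size, and constants are illustrative choices, not taken from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

n, p, L = 12, 0.5, 50                      # items, edge probability, comparisons per pair
w_star = rng.uniform(0.5, 1.0, size=n)     # illustrative latent scores

# Erdős–Rényi comparison graph: keep pair (i, j), i < j, with probability p
edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

# For each edge, draw L Bernoulli outcomes: y_ij^(l) = 1 means "j beats i",
# with probability w*_j / (w*_i + w*_j); keep the empirical mean y_ij,
# which is the sufficient statistic for that pair
y = {}
for i, j in edges:
    prob_j_wins = w_star[j] / (w_star[i] + w_star[j])
    y[(i, j)] = rng.binomial(L, prob_j_wins) / L
```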
Prior art

Method         | mean square error for estimating scores | top-K ranking accuracy
Spectral method (Negahban et al. '12)            | ✔ |
MLE (Negahban et al. '12; Hajek et al. '14)      | ✔ |
Spectral MLE (Chen & Suh '15)                    | ✔ | ✔

Top-K ranking 7/21
Small ℓ2 loss ≠ high ranking accuracy
These two estimates have the same ℓ2 loss, but output different rankings.
Need to control entrywise error!
Top-K ranking 8/21
Optimality?
Is the spectral method or the MLE alone optimal for top-K ranking?
Partial answer (Jang et al. '16): the spectral method works if the comparison graph is sufficiently dense.
This work: an affirmative answer for both methods, over the entire regime (including sparse graphs).
Top-K ranking 9/21
Spectral method (Rank Centrality)
— Negahban, Oh, Shah '12
• Construct a probability transition matrix P whose off-diagonal entries obey
  P_{i,j} ∝ y_{i,j} if (i, j) ∈ G, and P_{i,j} = 0 if (i, j) ∉ G
• Return the score estimate as the leading left eigenvector of P
Top-K ranking 10/21
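A minimal Rank Centrality sketch, assuming the usual normalization P_{i,j} = y_{i,j}/d with d at least the maximum degree so that each row sums to one; the function name and the test scores are illustrative:

```python
import numpy as np

def rank_centrality(n, y, d):
    """Rank Centrality sketch: y maps edges (i, j), i < j, to the empirical
    frequency that j beats i; d is a normalizer >= the maximum degree."""
    P = np.zeros((n, n))
    for (i, j), yij in y.items():
        P[i, j] = yij / d            # move mass toward items that win more often
        P[j, i] = (1 - yij) / d
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))   # make each row sum to one
    pi = np.full(n, 1.0 / n)
    for _ in range(2000):                      # power iteration -> leading left eigenvector
        pi = pi @ P
    return pi / pi.sum()

# Noiseless check: a complete graph with exact BTL probabilities recovers the scores,
# since then the stationary distribution is exactly proportional to w
w = np.array([4.0, 3.0, 2.0, 1.0])
y_exact = {(i, j): w[j] / (w[i] + w[j]) for i in range(4) for j in range(i + 1, 4)}
scores = rank_centrality(4, y_exact, d=4)
```

With exact comparison probabilities the returned `scores` are proportional to `w`, so the induced ranking is the true one.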
Rationale behind the spectral method
In the large-sample limit, P → P∗, whose off-diagonal entries obey
  P∗_{i,j} ∝ w∗j / (w∗i + w∗j) if (i, j) ∈ G, and P∗_{i,j} = 0 if (i, j) ∉ G
• The stationary distribution of the reversible P∗ (check detailed balance) is proportional to the true scores:
  π∗ ∝ [w∗1, w∗2, . . . , w∗n]
Top-K ranking 11/21
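The reversibility claim can be checked numerically: with π∗_i ∝ w∗_i, detailed balance π∗_i P∗_{i,j} = π∗_j P∗_{j,i} holds entry by entry, which in turn makes π∗ stationary. A small sketch with arbitrary positive scores on a complete graph:

```python
import numpy as np

w = np.array([4.0, 3.0, 2.0, 1.0])   # arbitrary positive scores
n, d = len(w), 4.0

# P* on a complete graph: off-diagonal P*_{i,j} = w_j / ((w_i + w_j) d)
P = np.array([[w[j] / (w[i] + w[j]) / d if i != j else 0.0 for j in range(n)]
              for i in range(n)])
np.fill_diagonal(P, 1.0 - P.sum(axis=1))

pi = w / w.sum()                      # candidate stationary distribution

# Detailed balance: pi_i P_{i,j} == pi_j P_{j,i} for every pair,
# i.e. the flow matrix is symmetric; summing over i then gives pi P = pi
balance_ok = np.allclose(pi[:, None] * P, (pi[:, None] * P).T)
stationary_ok = np.allclose(pi @ P, pi)
```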
Regularized MLE
Negative log-likelihood:
  L(w) := − Σ_{(i,j)∈G} { y_{j,i} log [w_i / (w_i + w_j)] + (1 − y_{j,i}) log [w_j / (w_i + w_j)] }
• L(w) becomes convex after the reparametrization w → θ = [θ1, · · · , θn], θi = log wi
(Regularized MLE)  minimize_θ L_λ(θ) := L(θ) + (λ/2) ‖θ‖²₂,  with λ ≍ √(np log n / L)
Top-K ranking 12/21
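In the θ parametrization each likelihood term becomes log σ(θ_i − θ_j) with σ the logistic function, so plain gradient descent applies. A minimal sketch (the step size, iteration count, regularization level, and test scores are illustrative choices, not from the talk):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def regularized_mle(n, y, lam, step=0.5, iters=3000):
    """Gradient descent on the convex objective L(theta) + (lam/2)||theta||^2,
    where theta_i = log w_i and y maps edges (i, j), i < j, to the empirical
    frequency that j beats i."""
    theta = np.zeros(n)
    for _ in range(iters):
        grad = lam * theta
        for (i, j), yij in y.items():
            s = sigmoid(theta[i] - theta[j])   # model prob. that i beats j
            grad[i] += s - (1 - yij)           # residual: model minus observed
            grad[j] += (1 - yij) - s
        theta -= step * grad
    return theta

# Noiseless check: exact BTL frequencies on a complete graph recover the ordering
w = np.array([4.0, 3.0, 2.0, 1.0])
y_exact = {(i, j): w[j] / (w[i] + w[j]) for i in range(4) for j in range(i + 1, 4)}
theta_hat = regularized_mle(4, y_exact, lam=0.01)
```

With a small λ the minimizer is close to the centered log-scores, so `theta_hat` is strictly decreasing here, matching the true ranking.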
Main result
Comparison graph G(n, p); sample size ≍ pn²L
(figure: a comparison graph on 12 nodes)

Theorem 1 (Chen, Fan, Ma, Wang '17)
When p ≳ (log n)/n, both the spectral method and the regularized MLE achieve the optimal sample complexity for top-K ranking!

(figure: phase transition — the sample-size axis splits into an infeasible and a feasible regime)

• Δ_K := (w∗(K) − w∗(K+1)) / ‖w∗‖∞ : score separation

Top-K ranking 13/21
Comparison with Jang et al. '16
Jang et al. '16: the spectral method controls the entrywise error if p ≳ √((log n)/n) (the relatively dense regime).
(figure: required sample size vs. score separation Δ_K — our work attains the optimal sample size, achievable by both methods, over the whole range of Δ_K, whereas the guarantee of Jang et al. '16 matches it only when Δ_K ≳ ((log n)/n)^{1/4})
Top-K ranking 14/21
Empirical top-K ranking accuracy
(figure: top-K ranking accuracy vs. score separation Δ_K ∈ [0.1, 0.5] for the spectral method and the regularized MLE; n = 200, p = 0.25, L = 20)
Top-K ranking 15/21
Optimal control of entrywise error
(figure: true scores w∗1, . . . , w∗n and their estimates w1, . . . , wn on a line; the K-th and (K+1)-th true scores are Δ_K apart, and every estimate stays within Δ_K/2 of its true score, so the top-K set is preserved; figure credit: Opydo)

Theorem 2
Suppose p ≳ (log n)/n and the sample size ≳ (n log n)/Δ²_K. Then with high probability, the estimates w returned by both methods obey (up to global scaling)
  ‖w − w∗‖∞ / ‖w∗‖∞ < (1/2) Δ_K

Top-K ranking 16/21
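The point of the entrywise bound: if every coordinate of w is within Δ_K/2 of w∗ (in the normalized scale), every top-K item keeps a strictly higher estimated score than every non-top-K item, so the top-K set is recovered exactly. A toy check (the scores and the perturbation are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

w_star = np.array([1.0, 0.9, 0.8, 0.5, 0.4, 0.3])   # illustrative scores, ||w*||_inf = 1
K = 3
sorted_desc = np.sort(w_star)[::-1]
delta_K = (sorted_desc[K - 1] - sorted_desc[K]) / np.abs(w_star).max()

# Any estimate whose entrywise error is strictly below delta_K / 2
# must rank the top-K set correctly
w_hat = w_star + rng.uniform(-0.49 * delta_K, 0.49 * delta_K, size=w_star.size)

top_true = set(np.argsort(-w_star)[:K])
top_hat = set(np.argsort(-w_hat)[:K])
```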
Key ingredient: leave-one-out analysis
For each 1 ≤ m ≤ n, introduce a leave-one-out estimate w(m)
(figure: the data matrix y = [y_{i,j}]_{1≤i,j≤n} with the m-th row and column removed)
Top-K ranking 17/21
Exploit statistical independence
(figure: the data matrix y = [y_{i,j}]_{1≤i,j≤n} with the m-th row and column highlighted)
The leave-one-out estimate w(m) is statistically independent of all data related to the m-th item.
Top-K ranking 18/21
Leave-one-out stability
Leave-one-out estimate w(m) ≈ true estimate w
• Spectral method: eigenvector perturbation bound
  ‖π − π̃‖_{π∗} ≲ ‖π(P − P̃)‖_{π∗} / spectral gap
  ◦ a new Davis–Kahan-type bound for probability transition matrices (which are asymmetric)
• MLE: local strong convexity
  ‖θ − θ̃‖₂ ≲ ‖∇L_λ(θ; y)‖₂ / strong convexity parameter
Top-K ranking 19/21
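The eigenvector stability can be illustrated (not proved) numerically: a small perturbation of the transition matrix moves the stationary distribution only slightly, leaving the ranking intact. A toy sketch with illustrative numbers:

```python
import numpy as np

rng = np.random.default_rng(2)

def stationary(P, iters=5000):
    """Leading left eigenvector of a stochastic matrix via power iteration."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        pi = pi @ P
    return pi / pi.sum()

# A small BTL transition matrix on a complete graph (normalizer d = 4) ...
w = np.array([4.0, 3.0, 2.0, 1.0])
n, d = 4, 4.0
P = np.array([[w[j] / (w[i] + w[j]) / d if i != j else 0.0 for j in range(n)]
              for i in range(n)])
np.fill_diagonal(P, 1.0 - P.sum(axis=1))

# ... and a slightly perturbed version of it (still a valid stochastic matrix)
E = rng.uniform(-0.01, 0.01, size=(n, n))
np.fill_diagonal(E, 0.0)
P_pert = P + E
np.fill_diagonal(P_pert, 0.0)
np.fill_diagonal(P_pert, 1.0 - P_pert.sum(axis=1))

pi = stationary(P)          # proportional to w, by detailed balance
pi_pert = stationary(P_pert)
```

The gap between consecutive entries of π is much larger than the shift caused by the perturbation, so the induced ranking does not change.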
A small sample of related works
• Parametric models
  ◦ Ford '57
  ◦ Hunter '04
  ◦ Negahban, Oh, Shah '12
  ◦ Rajkumar, Agarwal '14
  ◦ Hajek, Oh, Xu '14
  ◦ Chen, Suh '15
  ◦ Rajkumar, Agarwal '16
  ◦ Jang, Kim, Suh, Oh '16
  ◦ Suh, Tan, Zhao '17
• Non-parametric models
  ◦ Shah, Wainwright '15
  ◦ Shah, Balakrishnan, Guntuboyina, Wainwright '16
  ◦ Chen, Gopi, Mao, Schneider '17
• Leave-one-out analysis
  ◦ El Karoui, Bean, Bickel, Lim, Yu '13
  ◦ Zhong, Boumal '17
  ◦ Abbe, Fan, Wang, Zhong '17
  ◦ Ma, Wang, Chi, Chen '17
  ◦ Chen, Chi, Fan, Ma '18
  ◦ Chen, Chi, Fan, Ma, Yan '19
Top-K ranking 20/21
Summary

Method          | Optimal sample complexity | Linear-time computational complexity
Spectral method | ✔                         | ✔
Regularized MLE | ✔                         | ✔

Novel entrywise perturbation analysis for the spectral method and convex optimization.

Paper: "Spectral method and regularized MLE are both optimal for top-K ranking", Y. Chen, J. Fan, C. Ma, K. Wang, Annals of Statistics, vol. 47, 2019.
Top-K ranking 21/21