Spectral Method and Regularized MLE Are Both Optimal for Top-K Ranking
Yuxin Chen, Electrical Engineering, Princeton University
Joint work with Jianqing Fan, Cong Ma and Kaizheng Wang
Ranking
A fundamental problem in a wide range of contexts: web search, recommendation systems, admissions, sports competitions, voting, ...
(figure: PageRank; figure credit: Dzenan Hamzic)
Top-K ranking 2/21
Rank aggregation from pairwise comparisons
(figure: pairwise comparisons for ranking top tennis players; figure credit: Bozoki, Csato, Temesi)
Top-K ranking 3/21
Parametric models
Assign a latent score to each of the n items: w∗ = [w∗1, · · · , w∗n]
(figure: i: rank; w∗i: preference score)
• This work: Bradley–Terry–Luce (logistic) model
  P{item j beats item i} = w∗j / (w∗i + w∗j)
• Other models: Thurstone model, low-rank model, ...
Top-K ranking 4/21
Typical ranking procedure
Estimate latent scores → rank items based on score estimates
Goal: identify the set of top-K items under minimal sample size
Top-K ranking 5/21
Model: random sampling
• Comparison graph: Erdős–Rényi graph G ∼ G(n, p)
(figure: a comparison graph on 12 nodes)
• For each (i, j) ∈ G, obtain L paired comparisons
  y(l)_{i,j} = 1 with prob. w∗j / (w∗i + w∗j), and 0 otherwise (independently, 1 ≤ l ≤ L)
  y_{i,j} = (1/L) Σ_{l=1}^{L} y(l)_{i,j}  (sufficient statistic)
Top-K ranking 6/21
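The sampling model above is easy to simulate. A minimal sketch in Python; the scores, graph size, and constants are illustrative choices, not taken from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)

n, p, L = 12, 0.5, 50                      # items, edge probability, comparisons per pair
w_star = rng.uniform(0.5, 1.0, size=n)     # illustrative latent scores

# Erdős–Rényi comparison graph: keep pair (i, j), i < j, with probability p
edges = [(i, j) for i in range(n) for j in range(i + 1, n) if rng.random() < p]

# For each edge, draw L Bernoulli outcomes: y_ij^(l) = 1 means "j beats i",
# with probability w*_j / (w*_i + w*_j); keep the empirical mean y_ij,
# which is the sufficient statistic for that pair
y = {}
for i, j in edges:
    prob_j_wins = w_star[j] / (w_star[i] + w_star[j])
    y[(i, j)] = rng.binomial(L, prob_j_wins) / L
```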
Prior art

Method         | mean square error for estimating scores | top-K ranking accuracy
Spectral method (Negahban et al. '12)            | ✔ |
MLE (Negahban et al. '12; Hajek et al. '14)      | ✔ |
Spectral MLE (Chen & Suh '15)                    | ✔ | ✔

Top-K ranking 7/21
Small ℓ2 loss ≠ high ranking accuracy
These two estimates have the same ℓ2 loss, but output different rankings.
Need to control entrywise error!
Top-K ranking 8/21
Optimality?
Is the spectral method or the MLE alone optimal for top-K ranking?
Partial answer (Jang et al. '16): the spectral method works if the comparison graph is sufficiently dense.
This work: an affirmative answer for both methods, over the entire regime (including sparse graphs).
Top-K ranking 9/21
Spectral method (Rank Centrality)
— Negahban, Oh, Shah '12
• Construct a probability transition matrix P whose off-diagonal entries obey
  P_{i,j} ∝ y_{i,j} if (i, j) ∈ G, and P_{i,j} = 0 if (i, j) ∉ G
• Return the score estimate as the leading left eigenvector of P
Top-K ranking 10/21
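A minimal Rank Centrality sketch, assuming the usual normalization P_{i,j} = y_{i,j}/d with d at least the maximum degree so that each row sums to one; the function name and the test scores are illustrative:

```python
import numpy as np

def rank_centrality(n, y, d):
    """Rank Centrality sketch: y maps edges (i, j), i < j, to the empirical
    frequency that j beats i; d is a normalizer >= the maximum degree."""
    P = np.zeros((n, n))
    for (i, j), yij in y.items():
        P[i, j] = yij / d            # move mass toward items that win more often
        P[j, i] = (1 - yij) / d
    np.fill_diagonal(P, 1.0 - P.sum(axis=1))   # make each row sum to one
    pi = np.full(n, 1.0 / n)
    for _ in range(2000):                      # power iteration -> leading left eigenvector
        pi = pi @ P
    return pi / pi.sum()

# Noiseless check: a complete graph with exact BTL probabilities recovers the scores,
# since then the stationary distribution is exactly proportional to w
w = np.array([4.0, 3.0, 2.0, 1.0])
y_exact = {(i, j): w[j] / (w[i] + w[j]) for i in range(4) for j in range(i + 1, 4)}
scores = rank_centrality(4, y_exact, d=4)
```

With exact comparison probabilities the returned `scores` are proportional to `w`, so the induced ranking is the true one.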
Rationale behind the spectral method
In the large-sample limit, P → P∗, whose off-diagonal entries obey
  P∗_{i,j} ∝ w∗j / (w∗i + w∗j) if (i, j) ∈ G, and P∗_{i,j} = 0 if (i, j) ∉ G
• The stationary distribution of the reversible P∗ (check detailed balance) is proportional to the true scores:
  π∗ ∝ [w∗1, w∗2, . . . , w∗n]
Top-K ranking 11/21
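The reversibility claim can be checked numerically: with π∗_i ∝ w∗_i, detailed balance π∗_i P∗_{i,j} = π∗_j P∗_{j,i} holds entry by entry, which in turn makes π∗ stationary. A small sketch with arbitrary positive scores on a complete graph:

```python
import numpy as np

w = np.array([4.0, 3.0, 2.0, 1.0])   # arbitrary positive scores
n, d = len(w), 4.0

# P* on a complete graph: off-diagonal P*_{i,j} = w_j / ((w_i + w_j) d)
P = np.array([[w[j] / (w[i] + w[j]) / d if i != j else 0.0 for j in range(n)]
              for i in range(n)])
np.fill_diagonal(P, 1.0 - P.sum(axis=1))

pi = w / w.sum()                      # candidate stationary distribution

# Detailed balance: pi_i P_{i,j} == pi_j P_{j,i} for every pair,
# i.e. the flow matrix is symmetric; summing over i then gives pi P = pi
balance_ok = np.allclose(pi[:, None] * P, (pi[:, None] * P).T)
stationary_ok = np.allclose(pi @ P, pi)
```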
Regularized MLE
Negative log-likelihood:
  L(w) := − Σ_{(i,j)∈G} { y_{j,i} log [w_i / (w_i + w_j)] + (1 − y_{j,i}) log [w_j / (w_i + w_j)] }
• L(w) becomes convex after the reparametrization w → θ = [θ1, · · · , θn], θi = log wi
(Regularized MLE)  minimize_θ L_λ(θ) := L(θ) + (λ/2) ‖θ‖²₂,  with λ ≍ √(np log n / L)
Top-K ranking 12/21
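In the θ parametrization each likelihood term becomes log σ(θ_i − θ_j) with σ the logistic function, so plain gradient descent applies. A minimal sketch (the step size, iteration count, regularization level, and test scores are illustrative choices, not from the talk):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def regularized_mle(n, y, lam, step=0.5, iters=3000):
    """Gradient descent on the convex objective L(theta) + (lam/2)||theta||^2,
    where theta_i = log w_i and y maps edges (i, j), i < j, to the empirical
    frequency that j beats i."""
    theta = np.zeros(n)
    for _ in range(iters):
        grad = lam * theta
        for (i, j), yij in y.items():
            s = sigmoid(theta[i] - theta[j])   # model prob. that i beats j
            grad[i] += s - (1 - yij)           # residual: model minus observed
            grad[j] += (1 - yij) - s
        theta -= step * grad
    return theta

# Noiseless check: exact BTL frequencies on a complete graph recover the ordering
w = np.array([4.0, 3.0, 2.0, 1.0])
y_exact = {(i, j): w[j] / (w[i] + w[j]) for i in range(4) for j in range(i + 1, 4)}
theta_hat = regularized_mle(4, y_exact, lam=0.01)
```

With a small λ the minimizer is close to the centered log-scores, so `theta_hat` is strictly decreasing here, matching the true ranking.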
Main result
Comparison graph G(n, p); sample size ≍ pn²L
(figure: a comparison graph on 12 nodes)

Theorem 1 (Chen, Fan, Ma, Wang '17)
When p ≳ (log n)/n, both the spectral method and the regularized MLE achieve the optimal sample complexity for top-K ranking!

(figure: phase transition — the sample-size axis splits into an infeasible and a feasible regime)

• Δ_K := (w∗(K) − w∗(K+1)) / ‖w∗‖∞ : score separation

Top-K ranking 13/21
Comparison with Jang et al. '16
Jang et al. '16: the spectral method controls the entrywise error if p ≳ √((log n)/n) (the relatively dense regime).
(figure: required sample size vs. score separation Δ_K — our work attains the optimal sample size, achievable by both methods, over the whole range of Δ_K, whereas the guarantee of Jang et al. '16 matches it only when Δ_K ≳ ((log n)/n)^{1/4})
Top-K ranking 14/21
Empirical top-K ranking accuracy
(figure: top-K ranking accuracy vs. score separation Δ_K ∈ [0.1, 0.5] for the spectral method and the regularized MLE; n = 200, p = 0.25, L = 20)
Top-K ranking 15/21
Optimal control of entrywise error
(figure: true scores w∗1, . . . , w∗n and their estimates w1, . . . , wn on a line; the K-th and (K+1)-th true scores are Δ_K apart, and every estimate stays within Δ_K/2 of its true score, so the top-K set is preserved; figure credit: Opydo)

Theorem 2
Suppose p ≳ (log n)/n and the sample size ≳ (n log n)/Δ²_K. Then with high probability, the estimates w returned by both methods obey (up to global scaling)
  ‖w − w∗‖∞ / ‖w∗‖∞ < (1/2) Δ_K

Top-K ranking 16/21
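The point of the entrywise bound: if every coordinate of w is within Δ_K/2 of w∗ (in the normalized scale), every top-K item keeps a strictly higher estimated score than every non-top-K item, so the top-K set is recovered exactly. A toy check (the scores and the perturbation are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)

w_star = np.array([1.0, 0.9, 0.8, 0.5, 0.4, 0.3])   # illustrative scores, ||w*||_inf = 1
K = 3
sorted_desc = np.sort(w_star)[::-1]
delta_K = (sorted_desc[K - 1] - sorted_desc[K]) / np.abs(w_star).max()

# Any estimate whose entrywise error is strictly below delta_K / 2
# must rank the top-K set correctly
w_hat = w_star + rng.uniform(-0.49 * delta_K, 0.49 * delta_K, size=w_star.size)

top_true = set(np.argsort(-w_star)[:K])
top_hat = set(np.argsort(-w_hat)[:K])
```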
Key ingredient: leave-one-out analysis
For each 1 ≤ m ≤ n, introduce a leave-one-out estimate w(m)
(figure: the data matrix y = [y_{i,j}]_{1≤i,j≤n} with the m-th row and column removed)
Top-K ranking 17/21
Exploit statistical independence
(figure: the data matrix y = [y_{i,j}]_{1≤i,j≤n} with the m-th row and column highlighted)
The leave-one-out estimate w(m) is statistically independent of all data related to the m-th item.
Top-K ranking 18/21
Leave-one-out stability
Leave-one-out estimate w(m) ≈ true estimate w
• Spectral method: eigenvector perturbation bound
  ‖π − π̃‖_{π∗} ≲ ‖π(P − P̃)‖_{π∗} / spectral gap
  ◦ a new Davis–Kahan-type bound for probability transition matrices (which are asymmetric)
• MLE: local strong convexity
  ‖θ − θ̃‖₂ ≲ ‖∇L_λ(θ; y)‖₂ / strong convexity parameter
Top-K ranking 19/21
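The eigenvector stability can be illustrated (not proved) numerically: a small perturbation of the transition matrix moves the stationary distribution only slightly, leaving the ranking intact. A toy sketch with illustrative numbers:

```python
import numpy as np

rng = np.random.default_rng(2)

def stationary(P, iters=5000):
    """Leading left eigenvector of a stochastic matrix via power iteration."""
    pi = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(iters):
        pi = pi @ P
    return pi / pi.sum()

# A small BTL transition matrix on a complete graph (normalizer d = 4) ...
w = np.array([4.0, 3.0, 2.0, 1.0])
n, d = 4, 4.0
P = np.array([[w[j] / (w[i] + w[j]) / d if i != j else 0.0 for j in range(n)]
              for i in range(n)])
np.fill_diagonal(P, 1.0 - P.sum(axis=1))

# ... and a slightly perturbed version of it (still a valid stochastic matrix)
E = rng.uniform(-0.01, 0.01, size=(n, n))
np.fill_diagonal(E, 0.0)
P_pert = P + E
np.fill_diagonal(P_pert, 0.0)
np.fill_diagonal(P_pert, 1.0 - P_pert.sum(axis=1))

pi = stationary(P)          # proportional to w, by detailed balance
pi_pert = stationary(P_pert)
```

The gap between consecutive entries of π is much larger than the shift caused by the perturbation, so the induced ranking does not change.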
A small sample of related works
• Parametric models
  ◦ Ford '57
  ◦ Hunter '04
  ◦ Negahban, Oh, Shah '12
  ◦ Rajkumar, Agarwal '14
  ◦ Hajek, Oh, Xu '14
  ◦ Chen, Suh '15
  ◦ Rajkumar, Agarwal '16
  ◦ Jang, Kim, Suh, Oh '16
  ◦ Suh, Tan, Zhao '17
• Non-parametric models
  ◦ Shah, Wainwright '15
  ◦ Shah, Balakrishnan, Guntuboyina, Wainwright '16
  ◦ Chen, Gopi, Mao, Schneider '17
• Leave-one-out analysis
  ◦ El Karoui, Bean, Bickel, Lim, Yu '13
  ◦ Zhong, Boumal '17
  ◦ Abbe, Fan, Wang, Zhong '17
  ◦ Ma, Wang, Chi, Chen '17
  ◦ Chen, Chi, Fan, Ma '18
  ◦ Chen, Chi, Fan, Ma, Yan '19
Top-K ranking 20/21
Summary

Method          | Optimal sample complexity | Linear-time computational complexity
Spectral method | ✔                         | ✔
Regularized MLE | ✔                         | ✔

Novel entrywise perturbation analysis for the spectral method and convex optimization.

Paper: "Spectral method and regularized MLE are both optimal for top-K ranking", Y. Chen, J. Fan, C. Ma, K. Wang, Annals of Statistics, vol. 47, 2019.
Top-K ranking 21/21