Near-Neighbor Methods in Random
Preference Completion
Ao Liu (RPI), Qiong Wu (WM), Zhenming Liu (WM), and Lirong Xia (RPI)
02/27/2019
Introduction to Preference Completion
Learning to rate (commonly used) vs. learning to rank, i.e., preference completion (more robust). Example of partial preferences:
• x1: R1 = {y1, y3} ≻ y2 ≻ y4 ≻ y5
• x2: R2 = y1 ≻ y3 and y4 ≻ y5
• x3: R3 = y5 ≻ y4 ≻ others, i.e., x3 commented, “I prefer y5 to y4; all others are worse.”
• x4: R4 = y1 ≻ y3 ≻ y4 ≻ y5
Settings for the (user-wise) preference completion problem in recommender systems:
• y1, ···, ym: m alternatives (items).
• x1, ···, xn: n agents (users), each with a given partial preference over y1, ···, ym.

Inputs: the n agents’ partial preferences, e.g.,
• x1: R1 = {y1, y3} ≻ y2 ≻ y4 ≻ y5
• x2: R2 = y1 ≻ y3 and y4 ≻ y5
• x3: R3 = y5 ≻ y4 ≻ y1 ≻ y3
• x4: R4 = y1 ≻ y3 ≻ y4 ≻ y5
Output: the i-th agent’s full ranking, e.g., xi: Ri = y1 ≻ y2 ≻ y3 ≻ y4 ≻ y5.

Near-Neighbor Method [Liu-2009]: find agents similar to xi and complete Ri using their rankings (e.g., filling in y2 ≻ y4 ≻ y5).
A Widely-Used Algorithm [Liu-2009, Katz-Samuels and Scott-2018]
Formal Definition of the KT-kNN Algorithm

Normalized Kendall-tau (NK) distance between rankings Ri and Rj:

NK(Ri, Rj) = (# pairs ranked opposite in Ri and Rj) / (# pairs ranked by both Ri and Rj)

Observable: the n agents’ partial preferences:
• x1: R1 = {y1, y3} ≻ y2 ≻ y4 ≻ y5
• x2: R2 = y3 ≻ y1 and y4 ≻ y5
• x3: R3 = y5 ≻ y4 ≻ y1 ≻ y3
• x4: R4 = y1 ≻ y3 ≻ y4 ≻ y5

Example: R2 and R4 both rank two pairs (y3 vs. y1 and y4 vs. y5) and rank one of them (y3 vs. y1) in opposite order, so NK(R2, R4) = 1/2. Likewise, NK(R1, R4) = 0/5 and NK(R3, R4) = 5/6, so x1 is the nearest neighbor of x4.
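To make the definition concrete, here is a minimal Python sketch of the NK distance (the function and helper names are ours, not the paper’s). A partial preference is represented as a set of ordered pairs (a, b) meaning a ≻ b:

```python
from itertools import combinations

def nk_distance(Ri, Rj):
    """Normalized Kendall-tau distance between two partial rankings,
    each given as a set of ordered pairs (a, b) meaning a is preferred
    to b.  Returns None when no pair is ranked by both rankings."""
    both = opposite = 0
    for (a, b) in Ri:
        if (a, b) in Rj:          # ranked by both, same direction
            both += 1
        elif (b, a) in Rj:        # ranked by both, opposite direction
            both += 1
            opposite += 1
    return opposite / both if both else None

def pairs_from_order(order):
    """Expand a full ranking, e.g. ['y1', 'y3', 'y4', 'y5'], into pairs."""
    return {(a, b) for a, b in combinations(order, 2)}

# The example above: R2 ranks y3 > y1 and y4 > y5; R4 = y1 > y3 > y4 > y5.
R2 = {('y3', 'y1'), ('y4', 'y5')}
R4 = pairs_from_order(['y1', 'y3', 'y4', 'y5'])
print(nk_distance(R2, R4))        # 0.5: one opposite pair out of two
```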
Is NK an effective metric?
• Rankings from adding deterministic noise to utilities: Yes [Katz-Samuels and Scott-2018].
• Rankings from adding random noise to utilities: open question (algorithms that work under deterministic settings usually also work under random settings).
Answering this requires a model for the ground truth.

A Very Simple and Classic Model: Latent Space
• All agents and alternatives are assigned a latent vector in R^d (the latent space).
• Alternatives closer to an agent have higher utility for that agent.
[Figure: agents (Kyle, Stan, Kenny, Eric) and alternatives plotted in a 2-D latent space, with axes Latent Feature 1 and Latent Feature 2.]
Noise Setup: Another Very Simple and Classic Model
Plackett-Luce model [Plackett-1975, Luce-1977], verified using real-world election data [Gormley and Murphy-2007]. Properties:
• An alternative closer to an agent has a higher probability of being more preferred.
• Agents close to each other have similar distributions over rankings.

KT-kNN’s Inefficacy under the Random Noise Setting
Main Contribution 1: KT-kNN is incorrect on datasets with Plackett-Luce noise.

Theorem 1 (informal): For a 1-D latent space, with at least 50% probability, ||x*_KT-kNN − xi|| = Θ(1).

For example, if DY = Uniform([−1, 1]) and xi ∈ [−1, −0.5], then with high probability x*_KT-kNN is close to −1, whereas the predicted near neighbor should be close to xi.
[Figure: a number line over [−1, 1] marking xi, the neighbor predicted by KT-kNN (near −1), and where the predicted NN should be.]
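A minimal sketch of this generative process, under our own illustrative choices (1-D latent positions and exp(−distance) as the Plackett-Luce weight; the paper’s exact parameterization may differ):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_pl_ranking(agent_x, item_xs, rng):
    """Sample one Plackett-Luce ranking (a list of item indices, most
    preferred first) for an agent at latent position agent_x.  Items
    closer to the agent get larger weights, so they are more likely to
    be ranked higher -- the first property above."""
    weights = np.exp(-np.abs(item_xs - agent_x))  # utility = -distance
    remaining = list(range(len(item_xs)))
    order = []
    while remaining:  # draw the next item without replacement
        w = weights[remaining]
        pick = rng.choice(len(remaining), p=w / w.sum())
        order.append(remaining.pop(pick))
    return order

# 1-D latent space: items and one agent drawn uniformly from [-1, 1].
item_xs = rng.uniform(-1, 1, size=20)
print(sample_pl_ranking(rng.uniform(-1, 1), item_xs, rng))
```

Agents that are close in latent space get nearly identical weight vectors and hence similar distributions over rankings, which is the second property above.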
Anchor-kNN, a Correct Algorithm
• Our algorithm uses information from other agents’ rankings.
• Features: F(xi) ≡ (Fi(1), … , Fi
(n)) = ( NK(R1, Ri) , · · · , NK(Rn, Ri) ). • Distance Function: D( xi , xj ) ≈ || F(xi) - F(xj) ||1.
Main Contribution2: New kNN algorithm able to find correct neighbors
6
NK Distance Matrix
NK(R2, R3)
F(xi) (Row i)
F(xj) (Row j)
xi xj
xk
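A sketch of the neighbor search these definitions induce (our own code, reusing nk_distance from the earlier snippet; skipping coordinates where NK is undefined is our implementation choice, not something the slides specify):

```python
import numpy as np

def anchor_knn_neighbors(rankings, i, k):
    """Return the k nearest neighbors of agent i under the anchor
    distance D(xi, xj) = ||F(xi) - F(xj)||_1, where F(xi) is row i of
    the pairwise NK distance matrix."""
    n = len(rankings)
    F = np.full((n, n), np.nan)   # NaN marks undefined NK entries
    for a in range(n):
        for b in range(n):
            d = nk_distance(rankings[a], rankings[b])
            if d is not None:
                F[a, b] = d
    dists = []
    for j in range(n):
        if j == i:
            continue
        ok = ~np.isnan(F[i]) & ~np.isnan(F[j])  # coords defined for both
        dists.append((np.abs(F[i, ok] - F[j, ok]).sum(), j))
    return [j for _, j in sorted(dists)[:k]]
```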
Theorem 2 (informal): For a 1-D latent space, if all agents rank at least poly-log(m) alternatives, then with probability 1 − o(n^−0.2),

||x*_Anchor-kNN − xi|| = o(1),

in contrast to Theorem 1: with at least 50% probability, ||x*_KT-kNN − xi|| = Θ(1).
Numerical Validations

Numerical validation of Anchor-kNN and KT-kNN:
• KT-kNN: incorrect algorithm
• Anchor-kNN: our (correct) algorithm
• Ground Truth-kNN: information-theoretically optimal
• k = 751, the optimal k for Ground Truth-kNN
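A toy version of such a comparison can be assembled from the earlier snippets (our own setup, with full rankings and arbitrary parameter choices, so it illustrates the pipeline rather than reproducing the paper’s plots):

```python
# Reuses rng, sample_pl_ranking, pairs_from_order, nk_distance, and
# anchor_knn_neighbors from the snippets above.
def kt_knn_neighbors(rankings, i, k):
    """Plain KT-kNN: the k agents with the smallest NK distance to i."""
    d = [(nk_distance(rankings[i], rankings[j]), j)
         for j in range(len(rankings)) if j != i]
    return [j for _, j in sorted(d)[:k]]

n, m, k = 200, 30, 5
agent_xs = rng.uniform(-1, 1, size=n)
item_xs = rng.uniform(-1, 1, size=m)
rankings = [pairs_from_order(sample_pl_ranking(x, item_xs, rng))
            for x in agent_xs]

i = 0
for name, fn in [("KT-kNN", kt_knn_neighbors),
                 ("Anchor-kNN", anchor_knn_neighbors)]:
    nbrs = fn(rankings, i, k)
    err = np.abs(agent_xs[nbrs] - agent_xs[i]).mean()
    print(f"{name}: mean latent distance to chosen neighbors = {err:.3f}")
```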
Real-World Experiments
• KT-kNN: incorrect algorithm
• Anchor-kNN: our (correct) algorithm
• Collaborative Filtering (CF): baseline algorithm, using cosine similarity.
         KT Coefficient               Spearman Rho                 Precision @ 5
         CF      KT-kNN  Anchor-kNN   CF      KT-kNN  Anchor-kNN   CF      KT-kNN  Anchor-kNN
k = 5    0.0531  0.0548  0.0989       0.0787  0.0811  0.1462       0.3286  0.3386  0.4045
k = 15   0.1199  0.1259  0.1646       0.1770  0.1860  0.2423       0.4291  0.4573  0.4850
k = 25   0.1403  0.157   0.1869       0.2214  0.2307  0.2742       0.4718  0.4823  0.5077
Dataset: standard Netflix dataset
Conclusion: Anchor-kNN ≫ KT-kNN ≈ Collaborative Filtering
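For reference, the first two metrics are available in SciPy, and Precision @ 5 can be written as a set overlap; a sketch with made-up scores for one agent (precision_at_k is our own helper, not a library function):

```python
from scipy.stats import kendalltau, spearmanr

truth = [0.9, 0.1, 0.5, 0.7, 0.3, 0.8]   # ground-truth item scores
pred  = [0.8, 0.2, 0.4, 0.6, 0.5, 0.7]   # one agent's predicted scores

kt_coef, _ = kendalltau(truth, pred)     # KT Coefficient
rho, _     = spearmanr(truth, pred)      # Spearman Rho

def precision_at_k(truth, pred, k=5):
    """Fraction of the predicted top-k items that are in the true top-k."""
    top = lambda s: set(sorted(range(len(s)), key=lambda i: -s[i])[:k])
    return len(top(truth) & top(pred)) / k

print(kt_coef, rho, precision_at_k(truth, pred))
```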
Generalizations

Generalization: high-dimensional latent spaces.
• We only proved the 1-D case, using a symmetry property.
• We believe the Anchor-kNN algorithm is also correct for higher-dimensional latent spaces, based on our simulations in higher dimensions.
Conclusions
• We proved that the widely used KT-kNN algorithm is incorrect on noisy datasets.
• We proposed the Anchor-kNN algorithm, which works on noisy datasets.
• We generalized all of the conclusions above to high-dimensional latent spaces, on both synthetic and real-world data.
References:
1. [Liu-2009] Tie-Yan Liu. Learning to rank for information retrieval. Foundations and Trends in Information Retrieval, 3(3):225–331, March 2009.
2. [Katz-Samuels and Scott-2018] Julian Katz-Samuels and Clayton Scott. Nonparametric preference completion. In International Conference on Artificial Intelligence and Statistics, pages 632–641, 2018.
3. [Gormley and Murphy-2007] Isobel Claire Gormley and Thomas Brendan Murphy. A latent space model for rank data. In Statistical Network Analysis: Models, Issues, and New Directions, pages 90–102. Springer, Berlin, Heidelberg, 2007.
4. [Plackett-1975] Robin L. Plackett. The analysis of permutations. Journal of the Royal Statistical Society.
Series C (Applied Statistics), 24(2):193–202, 1975.
5. [Luce-1977] R. Duncan Luce. The choice axiom after twenty years. Journal of Mathematical Psychology,
15(3):215–233, 1977.
Thanks for your time!