Spectral Methods Meet EM: A Provably Optimal Algorithm for Crowdsourcing

Xi Chen
Stern School of Business, New York University

[joint work with Yuchen Zhang, Michael Jordan (UC Berkeley) and Denny Zhou (Microsoft Research)]

Spectral Methods Meet EM: A Provably Optimal Algorithm for ...people.stern.nyu.edu/xchen3/TeachingMaterial/CMU_Summer_Schoo… · Denny Zhou (Microsoft Research)] Spectral Methods



Big Data Analysis

DOG

Binary Labeling Tasks in Crowdsourcing

Binary labeling task: $m$ workers and $n$ binary tasks/items with true labels $y_j \in \{-1, +1\}$, $j \in \{1, \dots, n\}$

Elliptical: $+1$   Spiral: $-1$

Binary Labeling Tasks in Crowdsourcing

[Figure: example images with the $+1$ / $-1$ labels collected for them]

Challenge: different workers have different reliability

Statistical Estimation in Crowdsourcing

Statistical estimation in crowdsourcing: given a partially observed labeling matrix for $n$ tasks from $m$ workers,
  • infer the true task labels
  • infer the accuracy of workers

How should the labeling process and the accuracy of workers be modeled?

Example labeling matrix for Workers 1 to 5 (observed labels per task; unobserved entries omitted):
  Task 1: +1  −1  −1  +1  −1
  Task 2: +1  −1  +1
  Task 3: −1  +1  +1  −1
  Task 4: −1  −1  −1  −1
  Task 5: +1  −1

Dawid & Skene Model (1979)

A generative model for multi-class labeling tasks ($k$ classes)

For any task $j \in [n]$ with true label $y_j$:
  $\Pr(y_j = l) = w_l$ for $l \in [k]$

For worker $i \in [m]$:
  $\Pr(\text{worker } i \text{ will label a randomly chosen task}) = \pi_i$

The label from worker $i \in [m]$ to task $j \in [n]$ is $z_{ij} \in \mathbb{R}^k$, with $z_{ij} \in \{e_1, \dots, e_k\}$:
  $\Pr(z_{ij} = e_c \mid y_j) = \mu_{i y_j c}$

Confusion matrix $C_i = \{\mu_{iyc}\}_{y,c}$ for each worker $i \in [m]$

[Figure: example confusion matrices $C_1$, $C_2$, $C_3$ for Workers 1, 2 and 3]
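The generative process of the Dawid & Skene model can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the function name `sample_dawid_skene`, the integer coding of labels (class index instead of the basis vector $e_c$), and the use of `-1` for "not labeled" are all assumptions made here.

```python
import numpy as np

def sample_dawid_skene(w, pi, mu, n, seed=0):
    """Draw true labels and worker labels from the Dawid & Skene model.

    w  : (k,)      class prior, Pr(y_j = l) = w[l]
    pi : (m,)      pi[i] = probability that worker i labels a given task
    mu : (m, k, k) mu[i, y, c] = Pr(worker i reports c | true label y)
    Returns y of shape (n,) and Z of shape (m, n), where Z[i, j] is the
    reported class index, or -1 if worker i did not label task j.
    """
    rng = np.random.default_rng(seed)
    m, k = len(pi), len(w)
    y = rng.choice(k, size=n, p=w)            # true labels
    Z = np.full((m, n), -1, dtype=int)        # -1 marks a missing label
    for i in range(m):
        labeled = rng.random(n) < pi[i]       # which tasks worker i labels
        for j in np.flatnonzero(labeled):
            Z[i, j] = rng.choice(k, p=mu[i, y[j]])
    return y, Z
```

Each row `mu[i, y]` is one row of worker $i$'s confusion matrix $C_i$, so a reliable worker has most of the mass on the diagonal.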

Dawid & Skene Model (1979)

The label from worker $i \in [m]$ to task $j \in [n]$ is $z_{ij}$:
  $\Pr(z_{ij} = e_c \mid y_j) = \mu_{i y_j c}$

Joint likelihood of $\mu_{i y_j c}$ given true labels $y_j$ and $z_{ij}$

Maximize the marginal log-likelihood (marginalizing over the true labels $y_j$)

EM Algorithm (Dawid & Skene, 1979)

1. Initialize the label estimates $\hat{y}_j$ for $j \in [n]$ by majority voting

2. Estimate the confusion matrices $\{\hat{\mu}_{iyc}\}_{y,c}$ for $i \in [m]$

3. Re-estimate the labels $\hat{y}_j$ for $j \in [n]$

4. Repeat the above two steps until convergence
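The EM steps listed for the Dawid & Skene model can be sketched as follows. This is a hedged illustration, not the reference implementation: the function name `dawid_skene_em`, the integer label coding with `-1` for missing entries, the small smoothing constant `smooth`, and the use of vote fractions (a soft form of majority voting) for initialization are assumptions made here.

```python
import numpy as np

def dawid_skene_em(Z, k, n_iter=50, tol=1e-6, smooth=1e-2):
    """EM for the Dawid & Skene model with majority-voting initialization.

    Z : (m, n) int array; Z[i, j] in {0..k-1} is worker i's label for
        task j, and -1 marks a missing label.
    Returns (y_hat, mu_hat, q): hard labels (n,), confusion matrices
    (m, k, k) with mu_hat[i, y, c], and label posteriors q of shape (n, k).
    """
    m, n = Z.shape
    # onehot[i, j, c] = 1 if worker i gave label c to task j (all zeros if missing).
    onehot = (Z[:, :, None] == np.arange(k)).astype(float)

    # Initialization: majority voting, expressed as vote fractions.
    votes = onehot.sum(axis=0) + 1e-12        # (n, k); tiny constant avoids 0/0
    q = votes / votes.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # M-step: class prior and confusion matrices from current label estimates.
        w = q.mean(axis=0)
        mu = np.einsum("jy,ijc->iyc", q, onehot) + smooth
        mu /= mu.sum(axis=2, keepdims=True)

        # E-step: posterior over true labels, computed in log space.
        log_q = np.log(w + 1e-12) + np.einsum("ijc,iyc->jy", onehot, np.log(mu))
        log_q -= log_q.max(axis=1, keepdims=True)
        q_new = np.exp(log_q)
        q_new /= q_new.sum(axis=1, keepdims=True)

        converged = np.abs(q_new - q).max() < tol
        q = q_new
        if converged:
            break

    return q.argmax(axis=1), mu, q
```

Because the objective is not concave, the result depends on the starting point; replacing the vote-based initialization with a different initial estimate changes only the lines before the loop.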

Drawback of EM

Maximizing the log-likelihood leads to an estimator that achieves the minimax rate in terms of labeling error (Gao & Zhou, 2013)

However, the log-likelihood is not concave, so EM may get trapped at local stationary points

Idea: give EM a good initialization! Recent work [Balakrishnan et al., 14; Wang et al., 14] requires that $\theta^{0} \in B_2(r, \theta^*)$, i.e., that the initial point lies in an $\ell_2$ ball of radius $r$ around the true parameter $\theta^*$

Use a spectral method to obtain a good initialization!

Spectral Methods (method of moments)

Spectral method: estimate latent parameters (i.e., confusion matrices) from the first three moments of the data

  • Hidden Markov Model (Hsu, Kakade and Zhang, 2012)
  • Multi-view Model (Anandkumar et al., 2012)
  • Gaussian Mixture Model (Hsu and Kakade, 2013)
  • Latent Dirichlet Model (Anandkumar et al., 2013)
  • Community Detection (Anandkumar et al., 2013)
  • Mixture of Linear Regression (Chaganty and Liang, 2013)
  • Mixture of Discrete Product Distributions (Jain and Oh, 2014)
  • …

Spectral Method for Crowdsourcing

The label from worker $i \in [m]$ to task $j \in [n]$ is $z_{ij}$:
  $\Pr(z_{ij} = e_c \mid y_j) = \mu_{i y_j c}$

$\mu_{iy} \in \mathbb{R}^k$: label distribution of worker $i$ when the true label is $y$

Assumption: $\{\mu_{iy}\}_{y \in [k]}$ are linearly independent for each $i \in [m]$

Spectral method:
  • Obtain unbiased estimators $\hat{M}_{i,2}$, $\hat{M}_{i,3}$ of the second- and third-order moments from raw data for each worker $i \in [m]$
  • Obtain estimates $\hat{w}$, $\hat{\mu}_{iy}$ from $\hat{M}_{i,2}$, $\hat{M}_{i,3}$ (tensor decomposition via the tensor power method of Anandkumar et al., 14)
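For orientation, here is a sketch of the moments being estimated. The exact form is an assumption made here (the standard symmetric form used by tensor-decomposition methods), written with the symbols $w_y$ and $\mu_{iy}$ defined above:

```latex
M_{i,2} = \sum_{y=1}^{k} w_y \, \mu_{iy} \otimes \mu_{iy},
\qquad
M_{i,3} = \sum_{y=1}^{k} w_y \, \mu_{iy} \otimes \mu_{iy} \otimes \mu_{iy}
```

When the vectors $\{\mu_{iy}\}_{y \in [k]}$ are linearly independent, moments of this form determine $w$ and $\mu_{iy}$ up to a permutation of the labels, which is what the tensor decomposition step exploits.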

Spectral Method for Crowdsourcing

How to obtain the unbiased estimators $\hat{M}_{i,2}$, $\hat{M}_{i,3}$?

Study the population version of these estimators

Naïve cross moments: for workers $a$, $b$ and task $j$

Symmetrization technique from the multi-view model (Anandkumar et al., 2012)
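As a rough illustration of why cross moments between workers are informative: under the Dawid & Skene model the labels of two workers are independent given the true label, so, assuming both workers label every task, the population cross moment is $E[z_{aj} \otimes z_{bj}] = \sum_{y} w_y \, \mu_{ay} \otimes \mu_{by}$. The sketch below compares this with its empirical average on simulated data. The parameter values and all function names are illustrative assumptions, and labels are coded as class indices rather than basis vectors.

```python
import numpy as np

def sample_labels(y, mu_i, rng):
    """Sample one worker's labels: label c for task j with prob mu_i[y[j], c]."""
    cdf = np.cumsum(mu_i, axis=1)[y]                  # (n, k) row-wise CDFs
    u = rng.random(len(y))[:, None]
    # Inverse-CDF sampling; the clip guards against round-off in the last CDF entry.
    return np.minimum((u > cdf).sum(axis=1), mu_i.shape[1] - 1)

def empirical_cross_moment(za, zb, k):
    """Average over tasks j of the outer product e_{za[j]} (x) e_{zb[j]}."""
    M = np.zeros((k, k))
    np.add.at(M, (za, zb), 1.0)
    return M / len(za)

def population_cross_moment(w, mu_a, mu_b):
    """sum_y w[y] * outer(mu_a[y], mu_b[y])."""
    return np.einsum("y,yc,yd->cd", w, mu_a, mu_b)

rng = np.random.default_rng(0)
w = np.array([0.3, 0.3, 0.4])
mu_a = np.array([[0.8, 0.1, 0.1], [0.2, 0.7, 0.1], [0.1, 0.2, 0.7]])
mu_b = np.array([[0.6, 0.2, 0.2], [0.1, 0.8, 0.1], [0.3, 0.1, 0.6]])

n = 200_000
y = rng.choice(3, size=n, p=w)
za, zb = sample_labels(y, mu_a, rng), sample_labels(y, mu_b, rng)

M_hat = empirical_cross_moment(za, zb, k=3)
M_pop = population_cross_moment(w, mu_a, mu_b)
print(np.abs(M_hat - M_pop).max())  # small, and shrinks as n grows
```

The cross moment mixes the parameters of two different workers, which is why a symmetrization step is needed before a symmetric tensor decomposition can be applied.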

Spectral Method for Crowdsourcing

Spectral initialization:
  • Compute the empirical moment estimates $\hat{M}_{i,2}$, $\hat{M}_{i,3}$
  • Recover $\hat{w}$, $\hat{\mu}_{iy}$ from $\hat{M}_{i,2}$, $\hat{M}_{i,3}$ by tensor decomposition

The recovery error (estimation error of the confusion matrices) depends on the estimation errors $\hat{M}_{i,2} - M_{i,2}$ and $\hat{M}_{i,3} - M_{i,3}$, and goes to zero as $n \to \infty$

Pros: provably consistent estimator

Cons: slow convergence / not robust for small data / does not provide labeling predictions or labeling accuracy

Spectral vs EM

  • EM algorithm: no theoretical guarantee
  • Spectral: provably consistent but slow convergence
  • Spectral + EM?

Spectral + EM

Spectral + EM Algorithm

1. Initialize the confusion matrix estimates $\{\hat{\mu}_{iy}\}_{i \in [m], y \in [k]}$ by the spectral method

2. Based on (1): estimate the labels $\hat{y}_j$ for $j \in [n]$

3. Based on (2): re-calculate the confusion matrix estimates $\{\hat{\mu}_{iy}\}_{i \in [m], y \in [k]}$

4. Repeat (2) and (3) one or more times

5. Output $\hat{\mu}_{iy}$ and $\hat{y}_j$
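Steps 2 and 3 can be written out as follows. This is a sketch assuming the standard EM updates for the Dawid & Skene model with a uniform prior over labels; $\hat{q}_{jy}$ is a symbol introduced here for the estimated posterior probability that task $j$ has true label $y$, and $\mathbb{1}(\cdot)$ is the indicator function.

```latex
% Step 2: label posteriors and hard label estimates from the current confusion matrices
\hat{q}_{jy} \propto \exp\Big( \sum_{i=1}^{m} \sum_{c=1}^{k} \mathbb{1}(z_{ij} = e_c) \log \hat{\mu}_{iyc} \Big),
\qquad
\hat{y}_j = \arg\max_{y \in [k]} \hat{q}_{jy}

% Step 3: confusion matrices from the current label posteriors
\hat{\mu}_{iyc} = \frac{\sum_{j=1}^{n} \hat{q}_{jy} \, \mathbb{1}(z_{ij} = e_c)}
                       {\sum_{c'=1}^{k} \sum_{j=1}^{n} \hat{q}_{jy} \, \mathbb{1}(z_{ij} = e_{c'})}
```

Tasks that worker $i$ did not label contribute nothing to either sum, so missing entries need no special handling.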

Assumption

There is a partition of the $m$ workers into three disjoint groups such that, for each group, the average (over workers in the group) probability of assigning the true label $y$ is greater than $1/k$

Much weaker than imposing the quality requirement on each individual worker
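A small check makes the difference between the group-level and the per-worker requirement concrete. This is a sketch under assumptions made here: the function name `groups_beat_chance` is hypothetical, `mu[i, y, c]` holds worker $i$'s confusion matrix, and the condition is read as "for every true label $y$, the group-average probability of reporting $y$ exceeds $1/k$".

```python
import numpy as np

def groups_beat_chance(mu, groups):
    """True if, in every group and for every true label y, the average over the
    group's workers of mu[i, y, y] exceeds 1/k."""
    k = mu.shape[1]
    diag = np.einsum("iyy->iy", mu)   # diag[i, y] = Pr(worker i is correct | y)
    return all((diag[list(g)].mean(axis=0) > 1.0 / k).all() for g in groups)

# Illustrative values: a reliable worker and one that is worse than random guessing.
good = np.array([[0.9, 0.1], [0.1, 0.9]])
bad = np.array([[0.4, 0.6], [0.6, 0.4]])
mu = np.stack([good, bad, good, bad, good, bad])

print(groups_beat_chance(mu, [[0, 1], [2, 3], [4, 5]]))   # mixed groups
print(groups_beat_chance(mu, [[0, 2], [4], [1, 3, 5]]))   # one group of bad workers only
```

In the first partition every group contains a poor worker yet still satisfies the condition on average, which is the sense in which the group-level requirement is weaker.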

Theoretical Analysis

$\bar{D}$ measures the average ability of workers in identifying distinct labels

$D_{KL}(\mu_{iy}, \mu_{iy'}) = \sum_{c=1}^{k} \mu_{iyc} \log \frac{\mu_{iyc}}{\mu_{iy'c}}$

$\bar{D} > 0$: every pair of labels can be distinguished by at least one worker

Smaller $\bar{D}$: needs more workers and more tasks

Other quantities: $w_{\min} = \min_{y \in [k]} w_y$, $\pi_{\min} = \min_{i \in [m]} \pi_i$
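The KL divergence between two rows of a confusion matrix is straightforward to compute. The sketch below also aggregates it into a single quantity, under an assumed definition (not recoverable from the slide text): the minimum over label pairs $y \ne y'$ of $\frac{1}{m} \sum_{i} \pi_i D_{KL}(\mu_{iy}, \mu_{iy'})$. The function names are hypothetical.

```python
import numpy as np

def kl(p, q):
    """D_KL(p, q) = sum_c p[c] * log(p[c] / q[c]) for strictly positive p, q."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

def d_bar(mu, pi):
    """Assumed definition: min over label pairs y != y' of the worker average
    (1/m) * sum_i pi[i] * D_KL(mu[i, y], mu[i, y'])."""
    m, k, _ = mu.shape
    return min(
        sum(pi[i] * kl(mu[i, y], mu[i, y2]) for i in range(m)) / m
        for y in range(k) for y2 in range(k) if y != y2
    )

# Illustrative values: one informative worker and one who answers uniformly at random.
mu = np.array([[[0.9, 0.1], [0.1, 0.9]],
               [[0.5, 0.5], [0.5, 0.5]]])
pi = np.array([1.0, 1.0])
print(d_bar(mu, pi))
```

The random worker contributes zero divergence, so the aggregate stays positive only because the informative worker can tell the two labels apart.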

Theoretical Results

Optimality of the Result

Are $m = \tilde{O}(1/\bar{D})$ workers necessary?

Is the estimation error rate $\|\hat{\mu}_{iy} - \mu_{iy}\|_2^2 \le \frac{1}{w_y \pi_i n}$ optimal?

Simulated Study: Convergence Rate

Opt-D&S: Spectral + EM
MV-D&S: Majority Voting + EM

Real Data Experiments

Other Problems in Crowdsourcing

Modeling task difficulty and wisely allocating the budget among tasks & workers:
  • Bayesian Markov decision process
  • Optimistic knowledge gradient policy (C and Lin and Zhou, JMLR 2015)

Adaptive worker selection and optimal stopping:
  • Adaptive sequential probability ratio test (Ada-SPRT) (Li and Chen and C and Liu and Ying, under submission)

Top $K$ worker identification with gold samples:
  • Top arm identification in multi-armed bandit (C and Li and Zhou, under submission)

Spectral vs Spectral + EM

Spectral does NOT provide labeling predictions or labeling accuracy
First step of EM is unavoidable

For any worker $i \in [m]$ and true label $y \in [k]$, for $\|\hat{\mu}_{iy} - \mu_{iy}\|_2^2 \le \epsilon$:

[Table comparing Spectral and Spectral + EM at error level $\epsilon$; columns: Error, Spectral, Spectral + EM]