
Contribution to missing values & principal components methods

Julie Josse
Habilitation (HDR) defense
Orsay University, 30 August 2016

Research activities

Principal Component Analysis ⇒ continuous variables
Correspondence Analysis ⇒ contingency table
Multiple Correspondence Analysis ⇒ categorical variables

[Figure: three example maps. A map of the multi-way glioma data set, Dim 1 (21.00%) / Dim 2 (13.51%), where characteristics of oligodendrogliomas are linked to modifications of the genomic status of genes located on the 1p and 19q positions. A regularized CA map of US presidents (Roos, Trum, Eise, ..., Obam) and speech words, Dim 1 (36.40%) / Dim 2 (13.31%). An MCA factor map of hobby categories (Reading, Cinema, ..., TV_0 to TV_4), Dim 1 (16.95%) / Dim 2 (6.91%).]


Research activities

⇒ Missing values
• PC methods with missing values
• Use PC methods for single and multiple imputation

⇒ Complete case
• Low-rank matrix estimation (point estimates / variability)
• Understanding the connections between PC methods and models

⇒ Transfer
• Selecting "tuning" parameters
• R packages: FactoMineR, missMDA, denoiseR

Outline

1 New practices in visualization with PC methods
• Singular values shrinkage
• Bootstrap framework

2 Inference with missing values using PC methods
• Single and multiple imputation with PCA
• Imputation with MCA

3 Models for PC methods
• A multilogit bilinear model for MCA

Outline

1 Low-rank matrix estimation
• Singular values shrinkage
• Bootstrap framework

2 Inference with missing values using PC methods
• Single and multiple imputation with PCA
• Imputation with MCA

3 A multilogit bilinear model for MCA

Low-rank matrix estimation

⇒ Model: $X \in \mathbb{R}^{n\times p} \sim \mathcal{L}(\mu)$ with $\mathbb{E}[X] = \mu$ of low rank $k$

⇒ Gaussian noise model: $X_{n\times p} = \mu_{n\times p} + \varepsilon_{n\times p}$, $\varepsilon_{ij} \overset{iid}{\sim} \mathcal{N}(0, \sigma^2)$
Images, recommendation systems...

⇒ Classical solution: truncated SVD $\hat\mu_k = \sum_{l=1}^{k} u_l d_l v_l^\top$
Hard thresholding: $d_l \cdot 1(l \le k) = d_l \cdot 1(d_l \ge \lambda)$

$\operatorname{argmin}_\mu \{ \|X - \mu\|_2^2 : \operatorname{rank}(\mu) \le k \}$

⇒ Selecting $k$? Generalized cross-validation (Josse & Husson, 2012)

⇒ Not the best in MSE to recover $\mu$!
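
A minimal base-R sketch of the truncated-SVD estimator above (the function name tsvd and the complete matrix X are our illustration, not from the talk):

    # Hard thresholding: keep the k leading singular values, zero out the rest,
    # giving mu_hat_k = sum_{l<=k} u_l d_l v_l'.
    tsvd <- function(X, k) {
      s <- svd(X)
      d <- s$d
      d[-seq_len(k)] <- 0
      s$u %*% diag(d, nrow = length(d)) %*% t(s$v)
    }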

Shrinking - thresholding SVD

Soft thresholding: $\psi(d_l) = d_l \max\big(1 - \frac{\lambda}{d_l},\, 0\big)$

$\hat\mu_\lambda = \sum_{l=1}^{\min\{n,p\}} u_l\, \psi(d_l)\, v_l^\top$

$\operatorname{argmin}_\mu \{ \|X - \mu\|_2^2 + \lambda\|\mu\|_* \}$

⇒ Selecting $\lambda$? Risk: $\mathrm{MSE}(\lambda) = \mathbb{E}\|\mu - \hat\mu_\lambda\|_2^2$
Stein Unbiased Risk Estimate (Candès et al., 2012) ($\sigma^2$ is known):

$\mathrm{SURE} = -np\sigma^2 + \mathrm{RSS} + 2\sigma^2\operatorname{div}(\hat\mu_\lambda), \qquad \mathrm{RSS} = \sum_{l=1}^{\min(n,p)} \min(\lambda^2, d_l^2)$
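
For reference, the soft-thresholding rule in a few lines of base R (a sketch on a complete matrix; soft_svd is our name):

    # Soft thresholding: psi(d_l) = d_l * max(1 - lambda/d_l, 0) = (d_l - lambda)_+.
    soft_svd <- function(X, lambda) {
      s <- svd(X)
      d <- pmax(s$d - lambda, 0)
      s$u %*% diag(d, nrow = length(d)) %*% t(s$v)
    }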

Shrinking - thresholding SVD

$X = \mu + \varepsilon$, with $\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)$
⇒ Nonlinear shrinkage: $\hat\mu_{shrink} = \sum_{l=1}^{\min\{n,p\}} u_l\, \psi(d_l)\, v_l^\top$

• Shabalin & Nobel (2013); Gavish & Donoho (2014). Asymptotics $n = n_p$ and $p \to \infty$, $n_p/p \to \beta$, $0 < \beta \le 1$:

$\psi(d_l) = \frac{1}{d_l}\sqrt{\big(d_l^2 - (\beta + 1)n\sigma^2\big)^2 - 4\beta n^2\sigma^4}\;\cdot\; 1\big(d_l^2 \ge (1 + \sqrt{\beta})^2 n\sigma^2\big)$

• Verbanck, Josse & Husson (2013). Asymptotics $n, p$ fixed, $\sigma \to 0$:

$\psi(d_l) = d_l\,\frac{d_l^2 - \sigma^2}{d_l^2}\cdot 1(l \le k) = \Big(d_l - \frac{\sigma^2}{d_l}\Big)\cdot 1(l \le k)$ (signal variance / total variance)

⇒ Efron & Morris (1972): empirical Bayes (Gaussian prior; PPCA)
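
A sketch of the Verbanck, Josse & Husson shrinker, assuming $k$ and the noise variance sigma2 are known (and that the first $k$ singular values satisfy $d_l^2 > \sigma^2$):

    # psi(d_l) = (d_l - sigma^2/d_l) * 1(l <= k): multiply each retained singular
    # value by its estimated signal-variance / total-variance ratio.
    shrink_vjh <- function(X, k, sigma2) {
      s <- svd(X)
      d <- ifelse(seq_along(s$d) <= k, s$d - sigma2 / s$d, 0)
      s$u %*% diag(d, nrow = length(d)) %*% t(s$v)
    }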

Shrinking - thresholding SVD

• Josse & Sardy (2014). Adaptive estimator. Finite sample.

$\psi(d_l) = d_l \max\Big(1 - \frac{\lambda^\gamma}{d_l^\gamma},\, 0\Big)$

$\operatorname{argmin}_\mu \Big\{ \|X - \mu\|_2^2 + \lambda\gamma\|\mu\|_{*,w}, \text{ with } w_l = 1/d_l^{\gamma-1} \Big\}$

⇒ Selecting $(\lambda, \gamma)$? Data dependent:

$\mathrm{SURE} = -np\sigma^2 + \sum_{l=1}^{\min(n,p)} d_l^2 \min\Big(\frac{\lambda^{2\gamma}}{d_l^{2\gamma}},\, 1\Big) + 2\sigma^2\operatorname{div}(\hat\mu_{\lambda,\gamma})$

$\mathrm{GSURE} = \frac{\mathrm{RSS}}{\big(1 - \operatorname{div}(\hat\mu_{\lambda,\gamma})/(np)\big)^2}$ ($\sigma^2$ unknown)
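
The adaptive shrinker itself is a one-liner; a sketch (here lambda and gamma would be chosen by minimizing SURE or GSURE over a grid):

    # psi(d_l) = d_l * max(1 - (lambda/d_l)^gamma, 0); gamma = 1 recovers soft
    # thresholding, large gamma approaches hard thresholding.
    shrink_adaptive <- function(X, lambda, gamma) {
      s <- svd(X)
      d <- s$d * pmax(1 - (lambda / s$d)^gamma, 0)
      s$u %*% diag(d, nrow = length(d)) %*% t(s$v)
    }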

SVD via an autoencoder

Gaussian model: $X_{n\times p} = \mu_{n\times p} + \varepsilon_{n\times p}$, with $\varepsilon_{ij} \overset{iid}{\sim} \mathcal{N}(0, \sigma^2)$

⇒ Classical formulation:
• $\hat\mu_k = \operatorname{argmin}_\mu \{ \|X - \mu\|_2^2 : \operatorname{rank}(\mu) \le k \}$

⇒ Another formulation: autoencoder (Bourlard & Kamp, 1988)
• $\hat\mu_k = X\hat B_k$, where $\hat B_k = \operatorname{argmin}_B \{ \|X - XB\|_2^2 : \operatorname{rank}(B) \le k \}$

TSVD: $\hat\mu_k = \sum_{l=1}^{k} u_l d_l v_l^\top = \sum_{l=1}^{k} f_l v_l^\top$

$F_{n\times k} = XV(V'V)^{-1}$
$V'_{k\times p} = (F'F)^{-1}F'X$
$\hat\mu_k = FV' = XV(V'V)^{-1}V' = XP_V$
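
Since $V$ has orthonormal columns, $(V'V)^{-1} = I$ and $\hat B_k = V_k V_k'$; a two-line sketch showing that the rank-$k$ autoencoder reproduces the truncated SVD:

    # X B_k = X V_k V_k' = X P_V: project the rows of X onto the span of the
    # k leading right singular vectors.
    autoencode <- function(X, k) {
      V <- svd(X)$v[, seq_len(k), drop = FALSE]
      X %*% V %*% t(V)
    }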

Parametric bootstrap

⇒ Autoencoder: compress $X$

$\hat\mu_k = X\hat B_k$, where $\hat B_k = \operatorname{argmin}_B \{ \|X - XB\|_2^2 : \operatorname{rank}(B) \le k \}$

⇒ Aim: recover $\mu$

$\mu^*_k = XB^*_k$, $B^*_k = \operatorname{argmin}_B \big\{ \mathbb{E}_{X\sim\mathcal{L}(\mu)}\big[\|\mu - XB\|_2^2\big] : \operatorname{rank}(B) \le k \big\}$

⇒ Parametric bootstrap: plug in $X$ for $\mu$

$\hat\mu^{stable}_k = X\hat B_k$, $\hat B_k = \operatorname{argmin}_B \big\{ \mathbb{E}_{\tilde X\sim\mathcal{L}(X)}\big[\|X - \tilde XB\|_2^2\big] : \operatorname{rank}(B) \le k \big\}$

Model $X = \mu + \varepsilon$ ⇒ $\tilde X = X + \tilde\varepsilon$

Stable Autoencoder (Josse & Wager, 2016)

⇒ Bootstrap = feature noising

$\hat\mu^{stable}_k = X\hat B_k$, $\hat B_k = \operatorname{argmin}_B \big\{ \mathbb{E}_{\tilde X\sim\mathcal{L}(X)}\big[\|X - \tilde XB\|_2^2\big] : \operatorname{rk} B \le k \big\}$

$\hat B_k = \operatorname{argmin}_B \big\{ \mathbb{E}_{\tilde\varepsilon}\big[\|X - (X + \tilde\varepsilon)B\|_2^2\big] : \operatorname{rk} B \le k \big\}$

Theorem
⇒ Equivalent to a ridge autoencoder problem:

$\hat\mu^{stable}_{\lambda,k} = X\hat B_\lambda$, $\hat B_\lambda = \operatorname{argmin}_B \big\{ \|X - XB\|_2^2 + \lambda\|B\|_2^2 : \operatorname{rk} B \le k \big\}$

⇒ Solution: a singular-value shrinkage estimator

$\hat\mu^{stable}_{\lambda,k} = \sum_{l=1}^{k} u_l \frac{d_l}{1 + \lambda/d_l^2} v_l^\top$ with $\lambda = n\sigma^2$

$\hat\mu^{stable}_\lambda = X\hat B_\lambda$, $\hat B_\lambda = \big(X^\top X + S\big)^{-1} X^\top X$, $S = \operatorname{diag}(\lambda)$
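
A sketch of the resulting shrinkage (names are ours; sigma2 assumed known):

    # Stable-autoencoder shrinkage: d_l / (1 + lambda/d_l^2) on the k leading
    # singular values, with lambda = n * sigma^2.
    stable_autoencoder <- function(X, k, sigma2) {
      s <- svd(X)
      lambda <- nrow(X) * sigma2
      d <- ifelse(seq_along(s$d) <= k, s$d / (1 + lambda / s$d^2), 0)
      s$u %*% diag(d, nrow = length(d)) %*% t(s$v)
    }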

Feature noising and regularization in regression

Bishop (1995). Training with noise is equivalent to Tikhonov regularization. Neural Computation.

$\hat\beta = \operatorname{argmin}_\beta \Big\{ \mathbb{E}_{\varepsilon_{ij} \overset{iid}{\sim} \mathcal{N}(0,\sigma^2)}\big[\|Y - (X + \varepsilon)\beta\|_2^2\big] \Big\}$

Averaging out the auxiliary noise over many noisy copies of the data is equivalent to ridge regularization with $\lambda = n\sigma^2$:

$\hat\beta^{(R)}_\lambda = \operatorname{argmin}_\beta \big\{ \|Y - X\beta\|_2^2 + \lambda\|\beta\|_2^2 \big\}$

⇒ Control overfitting by artificially corrupting the training data
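
A quick Monte Carlo check of the identity behind this equivalence, $\mathbb{E}\|Y - (X+\varepsilon)\beta\|^2 = \|Y - X\beta\|^2 + n\sigma^2\|\beta\|^2$ (toy sizes, all names ours):

    set.seed(1)
    n <- 50; p <- 4; sigma2 <- 0.3
    X <- matrix(rnorm(n * p), n, p)
    beta <- rnorm(p)
    Y <- X %*% beta + rnorm(n)
    noised <- replicate(2000, {
      E <- matrix(rnorm(n * p, sd = sqrt(sigma2)), n, p)   # feature noise
      sum((Y - (X + E) %*% beta)^2)
    })
    # the two numbers agree up to Monte Carlo error:
    c(mean(noised), sum((Y - X %*% beta)^2) + n * sigma2 * sum(beta^2))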

Iterative Stable Autoencoder

$\hat\mu_{ISA} = X\hat B$, $\hat B = \operatorname{argmin}_B \big\{ \mathbb{E}_{\tilde X\sim\mathcal{L}(\hat\mu)}\big[\|\hat\mu - \tilde XB\|_2^2\big] \big\}$

Iterative algorithm. Init: $\hat\mu = X$, $\hat B_\lambda = (X^\top X + S)^{-1}X^\top X$
1 $\hat\mu = X\hat B$
2 $\hat B = (\hat\mu^\top\hat\mu + S)^{-1}\hat\mu^\top\hat\mu$, $S = \operatorname{diag}(n\sigma^2)$

A centered parametric bootstrap:

$\mu^*_k = XB^*_k$, $B^*_k = \operatorname{argmin}_B \big\{ \mathbb{E}_{X\sim\mathcal{L}(\mu)}\big[\|\mu - XB\|_2^2\big] : \operatorname{rk} B \le k \big\}$

⇒ $X$ as a proxy for $\mu$:
$\hat\mu^{stb}_k = X\hat B_k$, $\hat B_k = \operatorname{argmin}_B \big\{ \mathbb{E}_{\tilde X\sim\mathcal{L}(X)}\big[\|X - \tilde XB\|_2^2\big] : \operatorname{rk} B \le k \big\}$

⇒ $\hat\mu = X\hat B$ as a proxy for $\mu$:
$\hat B = \operatorname{argmin}_B \big\{ \mathbb{E}_{\tilde X\sim\mathcal{L}(\hat\mu)}\big[\|\hat\mu - \tilde XB\|_2^2\big] \big\}$

Theorem
ISA converges to a low-rank solution.

Proposition
In the isotropic Gaussian case, ISA converges to

$\hat\mu_{ISA} = \sum_{l=1}^{\min\{n,p\}} u_l\, \psi(d_l)\, v_l^\top, \qquad \psi(d) = \begin{cases} \frac{1}{2}\big(d + \sqrt{d^2 - 4n\sigma^2}\big) & \text{for } d^2 \ge 4n\sigma^2, \\ 0 & \text{else.} \end{cases}$

⇒ Connection with optimal shrinkage under operator norm loss
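
A direct transcription of the two-step iteration as a sketch (isotropic Gaussian case, sigma2 known, fixed number of iterations; the fixed point is low rank by the theorem above):

    isa <- function(X, sigma2, n_iter = 100) {
      S <- diag(nrow(X) * sigma2, ncol(X))            # S = diag(n * sigma^2)
      B <- solve(crossprod(X) + S, crossprod(X))      # init: (X'X + S)^{-1} X'X
      for (i in seq_len(n_iter)) {
        mu <- X %*% B                                 # step 1: mu = X B
        B  <- solve(crossprod(mu) + S, crossprod(mu)) # step 2: update B
      }
      X %*% B
    }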

Properties

$X = \mu + \varepsilon$

⇒ Different noise regimes of predilection:

• low noise: truncated SVD
• moderate noise: asymptotic shrinkers, SA, ISA (nonlinear transformation)
• high noise (low SNR, large $k$): soft thresholding (collapses in the other regimes)

⇒ The adaptive estimator is very flexible! (selection of $(\lambda, \gamma)$ with SURE)

⇒ ISA performs well in MSE (even beyond Gaussian noise)

Principal component analysis

⇒ Do the genotypes have different performances in different environments?
$\hat\mu_k = \sum_{l=1}^{k} u_l d_l v_l'$

[Figure: individuals map "Genotypes" (C_1...C_6, S_1...S_8, S_F, S_M) and variables factor map (PCA) with the environments ALBACETE, BADAJOZ, CORDOBA, CORUNA, LLEIDA, ORENSE, SALAMANCA, SEVILLA, TOLEDO, SEVILLA_2; Dim 1 (71.44%), Dim 2 (16.40%).]

Scores $F = UD^{1/2}$ and variable coordinates $VD^{1/2}$

⇒ Separates Complete (C) and Substituted (S)

Exploratory multivariate analysis

⇒ $X = \mu + \varepsilon$, fixed effects: interest in rows (not i.i.d.) and columns

⇒ $\mu$ is better estimated with the shrunk estimator $\hat\mu_{ISA}$ than with $\hat\mu^{TSVD}_k$

[Figure: genotype maps for PCA (TSVD), Dim 1 (71.44%) / Dim 2 (16.40%), and regularized PCA (ISA), Dim 1 (86.27%) / Dim 2 (13.73%); a second pair of maps shows the same configurations with individuals colored by cluster (clusters 1 to 4).]

Discussion: low-rank estimation

• Bayesian SVD model (constraints): $X = \mu + \varepsilon = UDV' + \varepsilon$
Hoff, 2009: uniform priors on $U$, $V$, special cases of the von Mises-Fisher distribution on the Stiefel manifold.
⇒ $\hat\mu$ (shrunk); uncertainty $(\hat\mu_1, \dots, \hat\mu_B)$, also on $k$.

Future research:
• Count data: $X_{ij} \sim \mathcal{P}(\mu_{ij})$, $\log\mu_{ij} = \alpha_i + \beta_j + \sum_{l=1}^{k} d_l u_{il} v_{jl}$

$\operatorname{argmin}_{d,u,v}\Big\{ -\sum_{i,j}\Big[X_{ij}\sum_{l=1}^{k} d_l u_{il} v_{jl} - \exp\Big(\sum_{l=1}^{k} d_l u_{il} v_{jl}\Big)\Big] + \sum_{l=1}^{K} \gamma_l d_l \Big\}$

⇒ Optimal / data-driven choice of the weights $\gamma_l$?
⇒ Empirical Bayes: $\mu_{ij} \sim \operatorname{Gamma}(\alpha, \theta_{ij})$; $\log\theta_{ij} = \sum_{l=1}^{k} d_l u_{il} v_{jl}$; von Mises-Fisher / uniform priors

• Bayesian ISA?
• Bagging PCA to shrink; adaptive with outliers

Outline

1 Low-rank matrix estimation
• Singular values shrinkage
• Bootstrap framework

2 Inference with missing values using PC methods
• Single and multiple imputation with PCA
• Imputation with MCA

3 A multilogit bilinear model for MCA

Missing values

Missing values are everywhere: unanswered questions in a survey, lost data, damaged plants, machines that fail...

"The best thing to do with missing values is not to have any." Gertrude Mary Cox

⇒ Still an issue in the "big data" era

Data integration: data from different sources

Hospital data set
7000 patients / 250 variables

   Centre             Mecanisme                   Age Sexe Poids Taille BMI    PAS PAD
1  Beaujon            Chute d'une hauteur          54  m    85   NR     NR    180 110
2  Lille              Autre                        33  m    80   1.8    24.69 130  62
3  Pitie Salpetriere  AVP 2 roues motorise         26  m    NR   NR     NR    131  62
4  Beaujon            AVP 2 roues motorise         63  m    80   1.8    24.69 145  89
6  Pitie Salpetriere  AVP bicyclette               33  m    75   NR     NR    104  86
7  Pitie Salpetriere  AVP pieton                   30  m    NR   NR     NR    107  66
9  HEGP               Arme blanche                 16  m    98   1.92   26.58 118  54
10 Toulon             AVP (voiture, camion, bus)   20  m    NR   NR     NR    124  73
11 Bicetre            Chute d'une hauteur          61  m    84   1.7    29.07 144 105
12 Bicetre            AVP 2 roues motorise         20  m    80   1.85   23.37 124  67
13 Bicetre            AVP pieton                   38  m    95   1.9    26.32  88  60
...

   SpO2 Temperature Lactates Hb   Glasgow ...
1   97  35.6        <NA>     12.7 12
2  100  36.5        4.8      11.1 15
3  100  36          3.9      11.4  3
4  100  36.7        1.66     13   15
6  100  36          NF       14.4 15
7  100  36.6        NF       14.3 15
9  100  37.5        13       15.9 15
10 100  36.9        NF       13.7 15
11 100  36.6        1.2      14.2 14
12 100  37.2        NF       14.7 15
13  99  33.6        3.5      12.3 13
...

Recommended methods

⇒ Modify the method or the estimation process to deal with missing values. Maximum likelihood: an EM algorithm to obtain point estimates (plus other algorithms for their variability).

One specific algorithm for each statistical method...

⇒ (Multiple) imputation to get a completed data set on which any statistical method can be performed

Missing values in PCA

⇒ PCA: least squares

$\operatorname{argmin}_\mu \{ \|X - \mu\|_2^2 : \operatorname{rank}(\mu) \le k \}$

⇒ PCA with missing values: weighted least squares

$\operatorname{argmin}_\mu \{ \|W_{n\times p} * (X - \mu)\|_2^2 : \operatorname{rank}(\mu) \le k \}$

with $W_{ij} = 0$ if $X_{ij}$ is missing, $W_{ij} = 1$ otherwise

Many algorithms:
Gabriel & Zamir, 1979: weighted alternating least squares (without explicit imputation)
Kiers, 1997: iterative PCA (with imputation)

Iterative PCA

1 Initialization $\ell = 0$: $X^0$ (mean imputation)
2 Step $\ell$:
(a) PCA on the completed data → $(U^\ell, D^\ell, V^\ell)$; $k$ dimensions kept
(b) $(\hat\mu^\ell)_k = U^\ell D^\ell V^{\ell\prime}$; $X^\ell = W * X + (1 - W) * \hat\mu^\ell$
3 The estimation and imputation steps are repeated

⇒ $\hat\mu$ from incomplete data: EM algorithm for
$X = \mu + \varepsilon$, $\varepsilon_{ij} \overset{iid}{\sim} \mathcal{N}(0, \sigma^2)$, with $\mu$ of low rank

⇒ Completed data: good imputation (matrix completion, Netflix)
Reduction of variability (imputation by $UDV'$)

Selecting $k$? Generalized cross-validation (Josse & Husson, 2012)
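
A compact sketch of the algorithm (centering, convergence tests and the regularization used by missMDA::imputePCA are omitted for brevity):

    iterative_pca <- function(X, k, n_iter = 200) {
      W  <- !is.na(X)
      Xc <- X
      Xc[!W] <- (colMeans(X, na.rm = TRUE)[col(X)])[!W]  # step 0: mean imputation
      for (i in seq_len(n_iter)) {
        s  <- svd(Xc)                                    # (a) PCA of completed data
        d  <- ifelse(seq_along(s$d) <= k, s$d, 0)
        mu <- s$u %*% diag(d, nrow = length(d)) %*% t(s$v)
        Xc[!W] <- mu[!W]                                 # (b) impute missing cells
      }
      Xc
    }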

Soft thresholding iterative SVD

⇒ Overfitting issues of iterative PCA: many parameters $(U_{n\times k}, V_{k\times p})$ relative to the observed values ($k$ large, many NAs); noisy data

⇒ Regularized versions. Initialization - estimation - imputation steps:

the imputation $\hat\mu^{PCA}_{ij} = \sum_{l=1}^{k} d_l u_{il} v_{jl}$ is replaced by

a "shrunk" imputation $(\hat\mu^{Soft}_{ij})_\lambda = \sum_{l=1}^{\min(n,p)} (d_l - \lambda)_+\, u_{il} v_{jl}$

$\operatorname{argmin}_\mu \{ \|W * (X - \mu)\|_2^2 + \lambda\|\mu\|_* \}$

softImpute for large matrices (T. Hastie, R. Mazumder, 2015)
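
A sketch of one such regularized loop, with the soft-thresholded SVD inside the imputation step (for large matrices, the softImpute package of Hastie & Mazumder is the scalable implementation):

    soft_impute <- function(X, lambda, n_iter = 200) {
      W  <- !is.na(X)
      Xc <- X
      Xc[!W] <- 0                                   # start from zero imputation
      for (i in seq_len(n_iter)) {
        s  <- svd(Xc)
        d  <- pmax(s$d - lambda, 0)                 # (d_l - lambda)_+
        mu <- s$u %*% diag(d, nrow = length(d)) %*% t(s$v)
        Xc[!W] <- mu[!W]                            # impute missing cells only
      }
      Xc
    }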

Selecting regularization parameters

$(\hat\mu^{Soft}_{ij})_\lambda = \sum_{l=1}^{\min(n,p)} (d_l - \lambda)_+\, u_{il} v_{jl}$ ⇒ Selecting $\lambda$?

Complete: $X = \mu + \varepsilon$, $\varepsilon_{ij} \overset{iid}{\sim} \mathcal{N}(0, \sigma^2)$, $\mathrm{MSE}(\lambda) = \mathbb{E}\|\mu - \hat\mu_\lambda\|^2$

Stein Unbiased Risk Estimate (Candès et al., 2012) ($\sigma^2$ known):
$\mathrm{SURE} = -np\sigma^2 + \mathrm{RSS} + 2\sigma^2\operatorname{div}(\hat\mu_\lambda)$

Incomplete:

$\mathrm{SURE}_{miss} = -(np - |\mathrm{NA}|)\sigma^2 + \sum_{ij\in obs}\big(X_{ij} - (\hat\mu_{ij})^{miss}_\lambda\big)^2 + 2\sigma^2\operatorname{div}_{miss}(\hat\mu^{miss}_\lambda)$

$\operatorname{div}_{miss}(\hat\mu^{miss}_\lambda) = \sum_{ij\in obs}\frac{\hat\mu^{miss}_\lambda(X_{ij} + \delta) - \hat\mu^{miss}_\lambda(X_{ij})}{\delta}$

⇒ GSURE - iterative adaptive (the best for imputing) - iterative ISA...


Filled-in hospital data
7000 patients / 250 variables

   Centre             Mecanisme                   Age Sexe Poids  Taille BMI    PAS PAD
1  Beaujon            Chute d'une hauteur          54  m   85.00  1.84   27.04  83  13
2  Lille              Autre                        33  m   80.00  1.80   24.69  33  98
3  Pitie Salpetriere  AVP 2 roues motorise         26  m   81.78  1.85   24.33  34  98
4  Beaujon            AVP 2 roues motorise         63  m   80.00  1.80   24.69  48 125
6  Pitie Salpetriere  AVP bicyclette               33  m   75.00  1.83   24.53   6 122
7  Pitie Salpetriere  AVP pieton                   30  m   81.89  1.82   25.24   9 102
9  HEGP               Arme blanche                 16  m   98.00  1.92   26.58  21  90
10 Toulon             AVP (voiture, camion, bus)   20  m   81.68  1.82   25.05  27 109
11 Bicetre            Chute d'une hauteur          61  m   84.00  1.70   29.07  47   8
12 Bicetre            AVP 2 roues motorise         20  m   80.00  1.85   23.37  27 103
13 Bicetre            AVP pieton                   38  m   95.00  1.90   26.32 179  96

   SpO2 Temperature Lactates Hb Glasgow ...
1   46  61          289.07   33 14
2    2  72          464.00   16 14
3    2  65          416.00   19  7
4    2  74          130.00   36  6
6    2  65          285.91   50  6
7    2  73          244.99   49  6
9    2  83          196.00   65  6
10   2  76          262.44   43  6
11   2  73           84.00   48  5
12   2  79          276.17   53  6
13  48  39          388.00   29 14

Imputation based on forests (Stekhoven & Bühlmann, 2011) cannot extrapolate but handles nonlinear relationships.
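
As a sketch, the cited forest imputation is available in the missForest package (hospital is a hypothetical data frame with NAs, standing in for the real data):

    library(missForest)
    imp <- missForest(hospital)   # imp$ximp holds the completed data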

Multiple imputation (Rubin, 1987)

Single imputation: underestimation of the standard errors
⇒ a single value can't reflect the uncertainty of prediction

1 Generate $B$ imputed data sets
(different plausible values for each missing entry; e.g., for one cell: prediction 4.88, empirical interval [3.98; 5.89])

2 Perform the analysis on each imputed data set: $\hat\beta_b$, $\widehat{\operatorname{Var}}(\hat\beta_b)$

3 Combine: $\hat\beta = \frac{1}{B}\sum_{b=1}^{B}\hat\beta_b$

$T = \frac{1}{B}\sum_{b=1}^{B}\widehat{\operatorname{Var}}(\hat\beta_b) + \frac{1}{B-1}\sum_{b=1}^{B}\big(\hat\beta_b - \hat\beta\big)^2$
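
The combining step exactly as written on the slide, as a small sketch (beta_b and var_b are vectors of the B per-imputation estimates and variances):

    rubin <- function(beta_b, var_b) {
      B        <- length(beta_b)
      beta_bar <- mean(beta_b)                          # pooled estimate
      within   <- mean(var_b)                           # within-imputation variance
      between  <- sum((beta_b - beta_bar)^2) / (B - 1)  # between-imputation variance
      c(estimate = beta_bar, total_variance = within + between)
    }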

PCA multiple imputation (proper)

Model $X = \mu + \varepsilon$, $\varepsilon_{ij} \overset{iid}{\sim} \mathcal{N}(0, \sigma^2)$

1 Variability of parameters: $\hat\mu_1 = U^1 D^1 V^{1\prime}, \dots, \hat\mu_B = U^B D^B V^{B\prime}$
⇒ bootstrap / Bayesian (data augmentation)

2 Noise: for $b = 1, \dots, B$, missing $X^b_{ij}$ drawn from $\mathcal{N}\big((\hat\mu_{ij})_b, (\hat\sigma^2)_b\big)$

Husson, Josse, Wager (2014). Complete case: variability of the PCA estimates: asymptotic, residuals bootstrap, a cell-wise jackknife.
Audigier, Husson & Josse (2015). Multiple imputation for continuous variables using Bayesian PCA.
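
A sketch of the workflow with missMDA, the authors' package (assuming a numeric data matrix X with NAs; argument names as we recall them):

    library(missMDA)
    nb  <- estim_ncpPCA(X)                       # choose k by cross-validation
    res <- MIPCA(X, ncp = nb$ncp, nboot = 100)   # B = 100 imputed data sets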

Supplementary projection

Imputed by regularized PCA / $B$ imputed data sets from MIPCA
Same observed values (blue) / different predictions for the missing values

Regularized iterative PCA ⇒ reference configuration

⇒ Individuals' positions (and variables) with the other predictions projected as supplementary

Multiple imputation: visualization of the variance of prediction

[Figure: supplementary projection of the imputed configurations for the wines (S Michaud, S Renaudie, S Trotignon, S Buisse Domaine, S Buisse Cristal, V Aub Silex, V Aub Marigny, V Font Domaine, V Font Brules, V Font Coteaux) and the variable representation (sensory descriptors such as Odor.Intensity, O.fruity, Typicity, Sweetness, Acidity, ...); Dim 1 (43.53%), Dim 2 (26.27%).]

⇒ Does single imputation make sense?

Properties

⇒ Joint model: $X_{i.} \sim \mathcal{N}(\mu, \Sigma)$

1 Variability of parameters. Bootstrap the rows: $X^1, \dots, X^B$;
EM algorithm: $(\hat\mu^1, \hat\Sigma^1), \dots, (\hat\mu^B, \hat\Sigma^B)$
2 Noise. Imputation: $X^b_{ij}$ drawn from $\mathcal{N}(\hat\mu_b, \hat\Sigma_b)$

⇒ Conditional modeling: one model per variable

• Imputation model: $B$ multiply imputed sets with PCA, JM, CM
• Analysis model: $\hat\theta_b$, $\widehat{\operatorname{Var}}(\hat\theta_b)$; combine with Rubin's rules: $\hat\theta$, $T$

⇒ Good estimates of $\theta$ (bias) and coverage ≈ 0.95 (CI width OK): the variability due to missing values is taken into account
⇒ PCA: small and large $n/p$; strong and weak relationships; low and high % NA

Multiple Correspondence Analysis (MCA)

$X_{n\times m}$ coded with the indicator matrix $A$:

$A = \begin{pmatrix} 1 & 0 & \dots & 1 & 0 & 0 \\ 1 & 0 & \dots & 1 & 0 & 0 \\ 1 & 0 & \dots & 1 & 0 & 0 \\ 0 & 1 & \dots & 0 & 1 & 0 \\ 0 & 1 & \dots & 0 & 0 & 1 \\ 0 & 1 & \dots & 0 & 1 & 0 \end{pmatrix} \qquad D_p = \begin{pmatrix} p_1 & & 0 \\ & \ddots & \\ 0 & & p_J \end{pmatrix}$

An SVD on the weighted matrix: $Z = \frac{1}{\sqrt{mn}}\big(A - \mathbf{1}p^\top\big)D_p^{-1/2} = UDV'$

The principal components ($F = UD$) satisfy: $F_l = \operatorname{argmax}_{F_l\in\mathbb{R}^n} \frac{1}{m}\sum_{j=1}^{m}\eta^2(F_l, X_j)$

Benzécri, 1973: "All in all, doing a data analysis, in good mathematics, is simply finding eigenvectors; all the science (or the art) of it is just to find the right matrix to diagonalize"
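
A sketch of this weighted SVD for a complete indicator matrix A with m variables (here p is taken as the vector of column proportions of A; names are ours):

    mca_svd <- function(A, m) {
      n <- nrow(A)
      p <- colMeans(A)                                 # category proportions
      Z <- (A - rep(1, n) %*% t(p)) %*% diag(1 / sqrt(p)) / sqrt(m * n)
      svd(Z)                                           # F = U D gives the PCs
    }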

Regularized iterative MCA

• Initialization: imputation of the indicator matrix (proportions)
• Iterate until convergence:
1 estimation: MCA on the completed data → $U, D, V$
2 imputation with the fitted matrix $\hat\mu = U_k D_k V_k'$
3 the column margins are updated

          V1  V2  V3 ... V14             V1_a V1_b V1_c V2_e V2_f V3_g V3_h ...
ind 1      a  NA   g ...   u    ind 1     1    0    0   0.71 0.29  1    0   ...
ind 2     NA   f   g ...   u    ind 2    0.12 0.29 0.59  0    1    1    0   ...
ind 3      a   e   h ...   v    ind 3     1    0    0    1    0    0    1   ...
ind 4      a   e   h ...   v    ind 4     1    0    0    1    0    0    1   ...
ind 5      b   f   h ...   u    ind 5     0    1    0    0    1    0    1   ...
ind 6      c   f   h ...   u    ind 6     0    0    1    0    1    0    1   ...
ind 7      c   f  NA ...   v    ind 7     0    0    1    0    1   0.37 0.63 ...
...                             ...
ind 1232   c   f   h ...   v    ind 1232  0    0    1    0    1    0    1   ...

⇒ the imputed values can be seen as degrees of membership

Two ways to impute categories: majority or draw

Inference with missing values (e.g., logistic regression)
⇒ Multiple imputation with MCA (Audigier, Husson & Josse, 2016)
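
In practice this is a sketch of what missMDA does (don is a hypothetical data frame of factors with NAs):

    library(missMDA)
    ncp <- estim_ncpMCA(don)$ncp      # number of dimensions by cross-validation
    res <- imputeMCA(don, ncp = ncp)  # fuzzy indicator matrix + imputed categories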

Take home message

Principal component methods are powerful for single and multiple imputation with continuous and categorical variables: they require a small number of parameters and capture the similarities between rows and the relationships between variables.

From a practical point of view, they:
• are very fast (1 min for MIMCA versus 1h30 for forests on a data set with 6000 rows / 14 variables)
• can be applied to data of various sizes (many categories)
• provide correct inferences for analysis models based on relationships between pairs of variables
• require selecting tuning parameters ($k$, $\lambda$) in the presence of missing values

Outline

1 Low-rank matrix estimation
• Singular values shrinkage
• Bootstrap framework

2 Inference with missing values using PC methods
• Single and multiple imputation with PCA
• Imputation with MCA

3 A multilogit bilinear model for MCA

Multilogit bilinear model

$P(x_{ij} = c) = \pi_{ijc} = \frac{e^{\theta_{ij}(c)}}{\sum_{c'=1}^{C_j} e^{\theta_{ij}(c')}},$

$\theta_{ij}(c) = \beta_j(c) + \Gamma^j_i(c) = \beta_j(c) + \sum_{l=1}^{k} d_l u_{il} v_{jl}(c)$

$\tilde v_j(c) = \big(\sqrt{d_1}\,v_{j1}(c), \sqrt{d_2}\,v_{j2}(c)\big)$: question $j$, category $c$, with one point per category ($C_j$ points). The latent variables are $\tilde u_i = D_2^{1/2} u_i$.

$P(x_{ij} = c) \propto \exp\big\{\beta_j(c) - \tfrac{1}{2}\|\tilde v_j(c) - \tilde u_i\|^2\big\}$

[Figure: latent space (Dimension 1 / Dimension 2) with the category points $\tilde v_j(1), \dots, \tilde v_j(4)$ of variable $j$ and two individuals $u_1$, $u_2$.]

⇒ For variable $j$, individual 1 is more likely to choose response 3 or 4 than individual 2 is.
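
A toy sketch of the category probabilities for one cell $(i, j)$, with made-up dimensions and names:

    set.seed(1)
    k <- 2; Cj <- 4
    beta_j <- rnorm(Cj)                    # main effects beta_j(c)
    d      <- c(2, 1)                      # singular values d_l
    u_i    <- rnorm(k)                     # scores of individual i
    V_j    <- matrix(rnorm(Cj * k), Cj, k) # loadings v_jl(c)
    theta  <- beta_j + V_j %*% (d * u_i)   # theta_ij(c) over the C_j categories
    pi_ij  <- exp(theta) / sum(exp(theta)) # multilogit probabilities, sum to 1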

Relationship with MCA

Data: $X_{n\times m}$; $A = [A^1|\cdots|A^m]$, $A^j \in \{0,1\}^{n\times C_j}$

Model: $\pi_{ijc} = \frac{e^{\beta_j(c) + \Gamma^j_i(c)}}{\sum_{c'=1}^{C_j} e^{\beta_j(c') + \Gamma^j_i(c')}}$   Parameters: $\zeta = \begin{pmatrix}\beta \\ \operatorname{vec}(\Gamma)\end{pmatrix}$; $\zeta_0 = \begin{pmatrix}\log(p) \\ 0\end{pmatrix}$

⇒ Rationale: Taylor expand $\ell$ around the independence model $\zeta_0$:

$\tilde\ell(\beta, \Gamma) = \ell(\zeta_0) + \ell'(\zeta_0)^\top(\zeta - \zeta_0) + \tfrac{1}{2}(\zeta - \zeta_0)^\top\ell''(\zeta_0)(\zeta - \zeta_0)$

$\tilde\ell(\beta, \Gamma)$ is a quadratic function of its arguments, so maximizing it amounts to a generalized SVD ⇒ MCA.

⇒ The joint likelihood is $\prod_{i=1}^{n}\prod_{j=1}^{m}\prod_{c=1}^{C_j}\pi_{ijc}^{A^j_{ic}}$ (independence)

⇒ The log-likelihood of the multilogit bilinear model is:

$\ell = \sum_{i,j,c} A^j_{ic}\log(\pi_{ijc}) = \sum_{i,j,c} A^j_{ic}\log\Bigg(\frac{\exp\big(\beta_j(c) + \Gamma^j_i(c)\big)}{\sum_{c'=1}^{C_j}\exp\big(\beta_j(c') + \Gamma^j_i(c')\big)}\Bigg)$

Relationship with MCA

$\ell(\beta, \Gamma; A) = \sum_{i,j}\Big[\beta_j(A^j_i) + \Gamma^j_i(A^j_i) - \log\Big(\sum_{c=1}^{C_j} e^{\beta_j(c) + \Gamma^j_i(c)}\Big)\Big]$

Compute $\frac{\partial\ell}{\partial\Gamma^j_i(c)}$ and $\frac{\partial^2\ell}{\partial\Gamma^j_i(c)\,\partial\Gamma^{j'}_{i'}(c')}$, evaluate at $(\beta_0 = \log(p), 0)$ to get:

$\tilde\ell(\beta, \Gamma) \approx \langle\Gamma, A - \mathbf{1}p^\top\rangle - \tfrac{1}{2}\|\Gamma D_p^{1/2}\|_F^2$

Solution: rank-$k$ SVD of $(A - \mathbf{1}p^\top)D_p^{-1/2}$ ⇒ the SVD in MCA.

Theorem (Fithian & Josse, 2016): The one-step likelihood estimate for the multilogit bilinear model with rank constraint $k$, obtained by expanding around the independence model $(\beta_0 = \log p, \Gamma_0 = 0)$, is $(\beta_0, \hat\Gamma_{MCA})$.

Lemma: Let $G \in \mathbb{R}^{n\times m}$, $H_1 \in \mathbb{R}^{n\times n}$, $H_2 \in \mathbb{R}^{m\times m}$, with $H_1, H_2 \succ 0$. Then

$\operatorname{argmax}_{\Gamma:\,\operatorname{rank}(\Gamma)\le K}\;\langle\Gamma, G\rangle - \tfrac{1}{2}\|H_1\Gamma H_2\|_F^2$ is attained at $\Gamma^* = H_1^{-1}\big[\mathrm{SVD}_K(H_1^{-1}GH_2^{-1})\big]H_2^{-1}$

Maximizing the likelihood

Data: $A = [A^1, \dots, A^m]$

Model: $\pi_{ijc} = \frac{\exp(\theta_{ij}(c))}{\sum_{c'=1}^{C_j}\exp(\theta_{ij}(c'))}$, $\theta_{ij}(c) = \beta_j(c) + \sum_{l=1}^{k} d_l u_{il} v_{jl}(c)$

Estimation: MLE, but overfitting problems: $\hat\theta_{ij}(c) \to \infty$

Penalized likelihood: $-\sum_{i=1}^{n}\sum_{j=1}^{m}\sum_{c=1}^{C_j} A^j_{ic}\log(\pi_{ijc}) + \lambda\sum_{l=1}^{k^\star} d_l$

Algorithm: MM with quadratic majorization (Groenen & Josse, 2016)

$f_{ij}(\theta_i) = -\sum_{c=1}^{C_j} A^j_{ic}\log(\pi_{ijc}) = -\sum_{c=1}^{C_j} A^j_{ic}\log\Bigg(\frac{\exp(\theta_{ij}(c))}{\sum_{c'=1}^{C_j}\exp(\theta_{ij}(c'))}\Bigg)$

with $\nabla f_{ij}(\theta_i) = A_{ij} - \pi_{ij}$ and $H = \nabla^2 f_{ij}(\theta_i) = \operatorname{Diag}(\pi_{ij}) - \pi_{ij}\pi_{ij}'$

Majorization function

Theorem
$g_{ij}(\theta_i, \theta_i^{(0)}) = f_{ij}(\theta_i^{(0)}) + (\theta_i - \theta_i^{(0)})'\nabla f_{ij}(\theta_i^{(0)}) + \tfrac{1}{4}\|\theta_i - \theta_i^{(0)}\|^2$ is a majorizing function of $f_{ij}(\theta_i)$ (using De Leeuw, 2005):
$\tfrac{1}{2}I - \nabla^2 f_{ij}(\theta_i)$ is positive semi-definite (the largest eigenvalue of $\nabla^2 f_{ij}(\theta_i)$ is smaller than $1/2$)

$H = \begin{pmatrix} \pi_1(1-\pi_1) & -\pi_1\pi_2 & -\pi_1\pi_3 & \dots \\ -\pi_1\pi_2 & \pi_2(1-\pi_2) & -\pi_2\pi_3 & \dots \\ \dots & \dots & \pi_3(1-\pi_3) & \dots \\ \dots & \dots & \dots & \dots \end{pmatrix}$

Gerschgorin disks: an eigenvalue $\phi$ is always smaller than a diagonal element plus the sum of the absolute off-diagonal values of its row (or column):
$\phi \le \pi_{ijc} - \pi_{ijc}^2 + \pi_{ijc}\sum_{\ell\neq c}\pi_{ij\ell}$
$\phi \le \pi_{ijc} - \pi_{ijc}^2 + \pi_{ijc}\sum_{\ell=1}^{C_j}\pi_{ij\ell} - \pi_{ijc}^2$
$\phi \le 2(\pi_{ijc} - \pi_{ijc}^2) = 2\pi_{ijc}(1 - \pi_{ijc})$
$2\pi_{ijc}(1 - \pi_{ijc})$ reaches its maximum of $1/2$ at $\pi_{ijc} = 1/2$

Update of the parameters $\theta_{ij}(c) = \beta_j(c) + \sum_{l=1}^{k} d_l u_{il} v_{jl}(c)$:

$L(\beta, U, D, V) \le \frac{1}{4}\sum_{i,j,c} A^j_{ic}\big(z_{ijc} - \theta_{ij}(c)\big)^2 + \lambda\Big(\sum_{l=1}^{k^\star} d_l\Big) + c$

• Update $U$ and $V$: SVD of $Z = P\Phi Q'$, $U = P$, $V = Q$
• Update $D$: minimize $\sum_{l=1}^{k^\star}\big[(\phi_l - d_l)^2 + \lambda d_l\big]$ ⇒ $d_l = \max(0, \phi_l - \lambda)$

Page 75: Contribution to missing values & principal ... - Julie jossejuliejosse.com/wp-content/uploads/2017/09/miss_josse_HDR_comp_… · •Josse&Sardy(2014).Adaptiveestimator.Finitesample

Low-rank matrix estimation PC imputation A Model for MCA

Results

• Low interaction: MCA and MLE agree
• Strong signal: MCA less appropriate?
• MCA estimates better than MLE in difficult settings (small n, m, noisy data): overfitting! The penalized estimate improves on MCA

⇒ MCA is a proxy to estimate the model's parameters

• Initialization of the algorithms by PC methods like MCA
• Regularized MCA: influenced by the way the problem is written
• Select k with model-based criteria (asymptotic picture unclear)
• Solving log-linear models by iterative CA:
  log(θ_ijk) = λ_0 + λ^A_i + λ^B_j + λ^C_k + λ^AB_ij + λ^AC_ik + λ^BC_jk + …



Perspectives

• Multiple imputation for mixed data:
  - uncertainty of the parameters: PCA residuals bootstrap, MCA nonparametric bootstrap
  - extend the multilogit model

• Multiple imputation with "big data":
  - mixtures (of PCA or MCA) or biclustering to impute
  - theory of MI: asymptotics with n ≪ p

• Distributed computation with missing values (medical databases)


Outline

1 Low-rank matrix estimation: singular values shrinkage; bootstrap framework
2 Inference with missing values using PC methods: single and multiple imputation with PCA; imputation with MCA
3 A multilogit bilinear model for MCA


Estimation of σ²

⇒ Number of independent parameters:

σ̂² = RSS/df = n Σ_{l=k+1}^p λ_l / (np − p − nk − pk + k² + k)   (X_{n×p}; F_{n×k}; V_{p×k})

⇒ Trace of the PCA projection matrix. Two projection matrices for ‖X_{n×p} − F_{n×k} V′_{k×p}‖²:

V′ = (F′F)^{−1}F′X ⇒ P_F = F(F′F)^{−1}F′        F = XV(V′V)^{−1} ⇒ P_V = V(V′V)^{−1}V′

µ̂_k = FV′ = XP_V = P_F X

vec(µ̂) = P vec(X),  P_{np×np} = (P′_V ⊗ I_n) + (I_p ⊗ P_F) − (P′_V ⊗ P_F)

(Pazman & Denis, 2002; Candès & Tao, 2009)
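A small R sketch of this estimate (illustrative; it assumes k < min(n, p), with λ_l the eigenvalues of the empirical covariance matrix of the centred data):

sigma2_hat <- function(X, k) {
  n <- nrow(X); p <- ncol(X)
  lam <- svd(scale(X, scale = FALSE))$d^2 / n   # eigenvalues
  rss <- n * sum(lam[(k + 1):min(n, p)])        # residual sum of squares
  rss / (n * p - p - n * k - p * k + k^2 + k)   # divided by the df
}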


Cross-validation to select S?

[Figure: data matrix with scattered missing cells]

⇒ EM-CV (Bro et al., 2008): MSEP_S = (1/np) Σ_{i=1}^n Σ_{j=1}^p (X_ij − (µ̂^S_ij)_{−ij})², computationally costly

⇒ In regression, ŷ = Py (Craven & Wahba, 1979): y_i^{−i} − y_i = (ŷ_i − y_i)/(1 − P_{i,i})

⇒ Aim: write PCA as µ̂(S) = PX, so that (µ̂^S_ij)_{−ij} − x_ij ≈ (µ̂^S_ij − X_ij)/(1 − P_{ij,ij})


PCA as a smoother

Two projection matrices for ‖X_{n×p} − F_{n×S} V′_{S×p}‖²:

V′ = (F′F)^{−1}F′X ⇒ P_F = F(F′F)^{−1}F′        F = XV(V′V)^{−1} ⇒ P_V = V(V′V)^{−1}V′

µ̂_S = FV′ = XP_V = P_F X

vec(µ̂(S)) = P(S) vec(X),  P(S)_{np×np} = (P′_V ⊗ I_n) + (I_p ⊗ P_F) − (P′_V ⊗ P_F)

(Pazman & Denis, 2002; Candès & Tao, 2009)

⇒ Number of independent parameters:

σ̂² = RSS / tr(I_np − P(S)) = n Σ_{s=S+1}^{min(n,p)} λ_s / (np − (nS + pS − S²))
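The matrix P(S) can be built explicitly in R for small n and p (illustrative sketch; kronecker() implements ⊗, and the trace check recovers nS + pS − S²):

pca_hat <- function(X, S) {
  n <- nrow(X); p <- ncol(X)
  s <- svd(X)
  F <- s$u[, 1:S, drop = FALSE] %*% diag(s$d[1:S], S)  # scores
  V <- s$v[, 1:S, drop = FALSE]                        # loadings
  PF <- F %*% solve(crossprod(F), t(F))
  PV <- V %*% solve(crossprod(V), t(V))
  kronecker(t(PV), diag(n)) + kronecker(diag(p), PF) - kronecker(t(PV), PF)
}
# sum(diag(pca_hat(X, S))) equals n*S + p*S - S^2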


Cross-validation approximations

> nb <- estim_ncp(don)
> nb$criterion
        0         1         2         3         4         5
1.2884873 0.8069719 0.6400517 0.7045074 2.2257738 3.0274337

[Figure: CV, GCV and smoothed CV error curves against the number of components; the minimum is reached at 2]

CV_S = (1/np) Σ_{i,j} (X_ij − (µ̂^S_ij)_{−ij})²

ACV_S = (1/np) Σ_{i,j} ( (X_ij − µ̂^S_ij) / (1 − P_{ij,ij}) )²

GCV_S = np Σ_{i,j} (X_ij − µ̂^S_ij)² / (np − tr(P(S)))²

GCV^NA_S = np ‖W ∗ (X − µ̂_S)‖²_2 / (np − |NA| − (nS + pS − S²))²

Josse, J. & Husson, F. Selecting the number of components in PCA using cross-validation approximations. Computational Statistics and Data Analysis.
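A sketch of the GCV computation in R (illustrative; the estim_ncp function of the FactoMineR package implements these criteria in full):

gcv <- function(X, Smax) {
  n <- nrow(X); p <- ncol(X); s <- svd(X)
  sapply(1:Smax, function(S) {
    mu <- s$u[, 1:S, drop = FALSE] %*% (s$d[1:S] * t(s$v[, 1:S, drop = FALSE]))
    n * p * sum((X - mu)^2) / (n * p - (n * S + p * S - S^2))^2
  })
}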


Inference in PCA

[Figure: PCA factor map of the genotypes (Dim 1: 71.44%, Dim 2: 16.40%); individuals C_1–C_6, S_1–S_8, S_F, S_M]

Josse, Husson & Wager (2015)


Inference in PCA

[Figure: the same factor map (Dim 1: 71.44%, Dim 2: 16.40%) with bootstrap confidence areas around each genotype]

Josse, Husson & Wager (2015)


Confidence ellipses

Fixed-effects model: X_{n×p} = µ + ε, ε_ij iid ∼ N(0, σ²), with µ of rank k

MLE = LS: µ̂_k = U_{n×k} D_{k×k} V′_{k×p}

Rq: population data (PPCA: random effects)

⇒ Asymptotics σ² → 0 (Denis & Gower, 1994; Papadopoulo & Lourakis, 2000)

E(µ̂^k_ij) = µ_ij,  V(µ̂^k_ij) = σ² P_{ij,ij}

vec(µ̂_k) = P vec(X),  P_{np×np} = (P′_V ⊗ I_n) + (I_p ⊗ P_F) − (P′_V ⊗ P_F)

Two projections for ‖X − FV′‖²: µ̂_k = FV′ = XP_V = P_F X, with V′ = (F′F)^{−1}F′X ⇒ P_F = F(F′F)^{−1}F′ and F = XV(V′V)^{−1} ⇒ P_V = V(V′V)^{−1}V′

⇒ Draw (µ̂¹, …, µ̂^B)


Confidence ellipses

⇒ Bootstrap

⇒ Rows bootstrap (Holmes, 1985; Timmerman et al., 2007)
• random sample from a population - sampling variance
• confidence areas around the positions of the variables

⇒ Residuals bootstrap
• full population data - X = µ + ε - noise fluctuation
• confidence areas around the individuals and the variables

1 residuals ε̂ = X − µ̂_k; bootstrap them or draw from N(0, σ̂²): ε_b
2 X_b = µ̂_k + ε_b
3 PCA on X_b → (µ̂¹ = U¹D¹V¹′, …, µ̂^B = U^B D^B V^B′)
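A sketch of the residuals bootstrap in R (illustrative; here the residuals are resampled with replacement, but they could equally be drawn from N(0, σ̂²)):

residual_boot <- function(X, k, B = 100) {
  s <- svd(X)
  mu <- s$u[, 1:k, drop = FALSE] %*% (s$d[1:k] * t(s$v[, 1:k, drop = FALSE]))
  eps <- X - mu
  lapply(1:B, function(b) {
    eps_b <- matrix(sample(eps, length(eps), replace = TRUE), nrow(X), ncol(X))
    svd(mu + eps_b)   # each replicate gives mu^b = U^b D^b V^b'
  })
}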


Confidence ellipses

⇒ Cell-wise jackknife

µ̂^(k)_(−ij): the estimator from the matrix without the cell (ij)

µ̂^(k)(ij)_jackk = µ̂_k + √(np) (µ̂^(k)_(−ij) − µ̂_k)

Represent the pseudo-values (µ̂¹, …, µ̂^{np}). Requires a method to deal with missing values - costly

⇒ Approximate jackknife (Craven & Wahba, 1979 lemma)

In regression, ŷ = Py: y_i^{−i} − y_i = (ŷ_i − y_i)/(1 − P_{i,i})

In PCA, vec(µ̂_k) = P vec(X):

x_ij − (µ̂^(k)_(−ij))_ij ≈ (x_ij − µ̂^k_ij)/(1 − P_{ij,ij})


Visualizing confidence areas

⇒ The spread of (µ̂¹, …, µ̂^B) gives the variability of the PCA µ̂^k_{n×p}

[Diagram: each µ̂^b is projected onto the PCA of µ̂_k after a Procrustes rotation]


Simulations

Model: X = µ + ε, ε_ij ∼ N(0, σ²)
• n/p: 100/20, 50/50 and 20/100
• k: 2, 4 - the ratio d1/d2: 4, 1 - SNR: 4, 1, 0.8

Coverage properties

[Figure: factor map (Dim 1: 41.59%, Dim 2: 14.72%) with confidence ellipses]

⇒ Different regimes:
• High SNR - n large: asymptotic ≈ bootstrap
• Small SNR: jackknifes


Inference in PCA graphs

[Figure: bootstrap (left) and jackknife (right) confidence areas on the genotypes factor map (Dim 1: 71.44%, Dim 2: 16.40%)]

• Simulation based on the data
• Incorporating the uncertainty of k!
• Variance of other estimators


Bayesian treatment of SVD

X = µ + ε = UDV′ + ε

⇒ Overparametrization - priors and posteriors meeting the constraints?

• Hoff (2009): uniform priors for U and V, special cases of von Mises-Fisher distributions (Stiefel manifold); (d_l)_{l=1…k} ∼ N(0, s²_λ)
  • conditional posteriors of U and V are also VMF
  • draw from the posterior of µ. Point estimate µ̂, small MSE

• Josse et al. (2013): u_il ∼ N(0, 1), v_jl ∼ N(0, 1), (d_l) ∼ N(0, s²_λ)
  • posteriors do not meet the constraints
  • posterior for µ: draw µ¹, …, µ^B ⇒ postprocessing step, SVD on each matrix µ¹ = U¹D¹V¹, …, µ^B = U^B D^B V^B

⇒ Point estimate µ̂ (shrunk) - uncertainty (µ̂¹, …, µ̂^B), also on k.


Random effect models: Probabilistic PCA

⇒ A specific factor analysis model (Roweis, 1998; Tipping & Bishop, 1999)

x_ij = (Z_{n×k} B′_{p×k})_ij + ε_ij,  z_i ∼ N(0, I_k),  ε_i ∼ N(0, σ²I_p)

• Conditional independence: x_i | z_i ∼ N(Bz_i, σ²I_p)
• Distribution i.i.d.: x_i ∼ N(0, Σ = BB′ + σ²I_p)
• Max. lik.: σ̂² = (1/(p−k)) Σ_{l=k+1}^p λ_l,  B̂ = V(D − σ̂²I_k)^{1/2}

⇒ BLUP E(z_i | x_i): Ẑ = XB̂(B̂′B̂ + σ̂²I_k)^{−1}

µ̂_ppca = Ẑ B̂′,  µ̂^ppca_ij = Σ_{l=1}^k (d_l − σ²/d_l) u_il v_jl

⇒ Bayesian SVD with a prior on U: regularized PCA
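The resulting shrinkage can be sketched in R (illustrative; it follows the BLUP formula above, shrinking each singular value d_l to d_l − σ²/d_l):

rpca <- function(X, k, sigma2) {
  s <- svd(X)
  d_shrunk <- s$d[1:k] - sigma2 / s$d[1:k]   # shrunk singular values
  s$u[, 1:k, drop = FALSE] %*% (d_shrunk * t(s$v[, 1:k, drop = FALSE]))
}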


The log-bilinear model & CA

⇒ The saturated log-linear model (Christensen, 1990; Agresti, 2013):

log µ_ij = α_i + β_j + Γ_ij

⇒ The RC association model (Goodman, 1985; Gower, 2011; GAMMI):

log µ_ij = α_i + β_j + Σ_{k=1}^K d_k u_ik v_jk

Estimation: iterative weighted least squares, steps of GLM.

⇒ CA (Greenacre, 1984), text corpora, spectral clustering on graphs:

z_ij = (x_ij/N − r_i c_j)/√(r_i c_j),  i.e.  Z = D_r^{−1/2}(X/N − rc^T)D_c^{−1/2}

If X is an adjacency matrix, Z is the symmetric normalized graph Laplacian.
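CA can be written in a few lines of R directly from this definition (illustrative sketch for a contingency table X):

ca_svd <- function(X) {
  N <- sum(X)
  r <- rowSums(X) / N; c <- colSums(X) / N
  Z <- diag(1 / sqrt(r)) %*% (X / N - tcrossprod(r, c)) %*% diag(1 / sqrt(c))
  svd(Z)   # standardized residuals; singular vectors give the CA coordinates
}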


CA approximates the log-bilinear model

⇒ SVD: Z = Ũ_K D_K Ṽ_K^T. Standard row and column coordinates:

U_K = D_r^{−1/2} Ũ_K,  V_K = D_c^{−1/2} Ṽ_K. If the low-rank approximation is good:

U_K D_K V_K^T ≈ D_r^{−1/2} Z D_c^{−1/2} = D_r^{−1}(X/N − rc^T)D_c^{−1}   (1)

By "solving for X" in (1), we get the reconstruction formula:

X/N = rc^T + D_r (U_K D_K V_K^T) D_c,  i.e.  x_ij/N = r_i c_j (1 + Σ_{k=1}^K d_k u_ik v_jk)   (2)

⇒ Connection (Escofier, 1982): when Σ_{k=1}^K d_k u_ik v_jk ≪ 1, eq. (2) becomes:

log(x_ij) ≈ log(N) + log(r_i) + log(c_j) + Σ_{k=1}^K d_k u_ik v_jk


Alcohol data

INPES (Santé publique France): 52,941 respondents. Variables: region, sex, age, year, edu, drunk, alcohol, glasses, binge, Pbsleep, Tabac, with a few missing values per variable.

[Table: R summary of the factor levels, e.g. sex F: 29776 / M: 23165; year 2005: 27907 / 2010: 25034]

  region       sexe  age    year  edu  drunk  alcohol  glasses  binge
1 Rhône Alpes  M     45_54  2005  1    0      0        0-2      0
2 Rhône Alpes  M     45_54  2005  2    0      0        0-2      0
3 Rhône Alpes  M     55_64  2005  2    0      0        0-2      0
4 Rhône Alpes  M     18_25  2005  3    0      0        0-2      0
…


Regularized iterative MCA

• Initialization: imputation of the indicator matrix (with the column proportions)
• Iterate until convergence:
  1 estimation: MCA on the completed data → U, D, V
  2 imputation of the missing entries with the fitted matrix µ̂ = U_k D_k V′_k
  3 the column margins are updated


[Example: original categorical table (left) and completed indicator matrix (right); imputed cells hold proportions]

         V1  V2  V3 …            V1_a  V1_b  V1_c  V2_e  V2_f  V3_g  V3_h …
ind 1    a   NA  g        ind 1     1     0     0  0.71  0.29     1     0
ind 2    NA  f   g        ind 2  0.12  0.29  0.59     0     1     1     0
ind 3    a   e   h        ind 3     1     0     0     1     0     0     1
ind 7    c   f   NA       ind 7     0     0     1     0     1  0.37  0.63
…

⇒ the imputed values can be seen as degrees of membership


Two ways to impute the categories: majority category or random draw

Extension to mixed data (Audigier, Husson & Josse, 2014) via a weighted SVD
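In practice this algorithm is available in the missMDA package; a usage sketch (don is an illustrative data frame of factors with NAs):

library(missMDA)
ncp <- estim_ncpMCA(don)$ncp       # choose the number of dimensions
res <- imputeMCA(don, ncp = ncp)   # regularized iterative MCA
head(res$completeObs)              # completed categorical data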


Multiple imputation with MCA

1 Variability of the parameters: B sets (U_{n×k}, D_{k×k}, V^⊤_{J×k}) obtained by a non-parametric bootstrap of the rows

[Illustration: B completed indicator matrices X̂¹, …, X̂^B whose imputed cells differ, e.g. (0.01, 0.80, 0.19) vs (0.60, 0.20, 0.20)]

2 Categories drawn from a multinomial distribution using the values in (X̂^b)_{1≤b≤B}

[Illustration: the B resulting imputed categorical data sets]

Audigier, Husson & Josse (2016)
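A usage sketch with the MIMCA function of the missMDA package (don is an illustrative incomplete data frame of factors):

library(missMDA)
res <- MIMCA(don, ncp = 2, nboot = 100)   # 100 multiply imputed data sets
length(res$res.MI)                        # the list of completed tables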


Multiple imputation for categorical data

⇒ Joint modeling:
• Log-linear model (Schafer, 1997): problems with many levels
• Latent class models (Vermunt, 2014) - nonparametric Bayesian versions (Si & Reiter, 2014; Dunson, 2015)

⇒ Conditional modeling: logistic, multinomial logit, random forests

⇒ MIMCA provides valid inference (e.g. logistic regression with missing values)
• reduces the dimensionality
• applies to data of various sizes (many levels, rare levels)

Time (seconds)          Titanic         Galetas          Income
rows-variables-levels   (2000 - 4 - 4)  (1000 - 4 - 11)  (6000 - 14 - 9)
MIMCA                   2.750           8.972            58.729
Loglinear               0.740           4.597            NA
Nonparametric Bayes     10.854          17.414           143.652
Cond. logistic          4.781           38.016           881.188
Cond. forests           265.771         112.987          6329.514


Simulations

• Quantities of interest: θ = parameters of a logistic model
• 200 simulations from real data sets:
  • the real data set is considered as a population
  • draw one sample from the data set
  • generate 20% of missing values
  • multiple imputation using M = 5 imputed data sets
• Criteria: bias; CI width; coverage


Results - Inference

[Figure: coverage of the confidence intervals (y-axis 0.80-1.00, nominal 0.95) for MIMCA, Loglinear, Latent class, FCS-log and FCS-rf on the three data sets]

                       Titanic  Galetas  Income
Number of variables    4        4        14
Number of categories   ≤ 4      ≤ 11     ≤ 9


MI using the loglinear model

• Hypothesis X = (x_ijk)_{i,j,k}: X | θ ∼ M(n, θ), where

log(θ_ijk) = λ_0 + λ^A_i + λ^B_j + λ^C_k + λ^AB_ij + λ^AC_ik + λ^BC_jk + λ^ABC_ijk

1 Variability of the parameters:
  • prior on θ: θ ∼ D(α) on Θ
  • posterior: θ | x ∼ D(α′)
  • Data Augmentation (Tanner & Wong, 1987)
2 Imputation according to the loglinear model using the M sets of parameters

• Implemented in the R package cat (J.L. Schafer)


MI using a DPMPM model (Si and Reiter, 2013)

• Hypothesis: P(X = (x_1, …, x_K); θ) = Σ_{ℓ=1}^L ( θ_ℓ Π_{k=1}^K θ^{(ℓ)}_{x_k} )

1 Variability of the parameters:
  • a hierarchical prior on θ: α ∼ G(.25, .25), ζ_ℓ ∼ B(1, α), θ_ℓ = ζ_ℓ Π_{g<ℓ} (1 − ζ_g) for ℓ in 1, …, ∞
  • posterior on θ: untractable → Gibbs sampler and Data Augmentation
2 Imputation according to the mixture model using the M sets of parameters

• Implemented in the R package mi (Gelman et al.)


Iterative Random Forests imputation

1 Initial imputation: mean imputation
2 Fit a RF on X_1^obs given X_{−1} and predict X_1^miss;
  fit a RF on X_2^obs given X_{−2} and predict X_2^miss;
  … cycling through the variables
3 Repeat until convergence

⇒ Conditional modeling based on RF
• number of trees: 100
• number of variables randomly selected at each node: √p
• number of iterations: 4-5

⇒ Good for complex relationships between variables
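This scheme is implemented in the missForest package (Stekhoven & Buhlmann); a usage sketch, with don an illustrative incomplete data frame:

library(missForest)
res <- missForest(don, ntree = 100)  # cycles through the variables until convergence
head(res$ximp)                       # imputed data set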


Random Forests versus PCA

           Feat1  Feat2  Feat3  Feat4  Feat5
C1-C14     values 1 to 7, each repeated twice, identical across the five features
Igor        8      NA     NA     8      8
Frank       8      NA     NA     8      8
Bertrand    9      NA     NA     9      9
Alex        9      NA     NA     9      9
Yohann     10      NA     NA    10     10
Jean       10      NA     NA    10     10


Random Forests versus PCA

⇒ with Random Forests: the missing cells of Feat2 and Feat3 are all imputed by 6.87; forests cannot extrapolate outside the observed range

⇒ with PCA: the missing cells are imputed exactly (8, 8, 9, 9, 10, 10), since the rank-one linear structure is recovered

(Stekhoven & Buhlmann, 2011 - Bartlett & Carpenter, 2014)

⇒ Conversely, non-linear relationships are well handled by forests


Bayesian PCA complete case

X_ij = µ_ij + ε_ij,  ε_ij ∼ N(0, σ²)
     = Σ_{l=1}^S µ^(l)_ij + ε_ij = Σ_{l=1}^S √(d_l) R_il Q_jl + ε_ij

⇒ Priors on Q, D, R: overparametrization issue

• Hoff (2009): uniform priors for Q and R, von Mises-Fisher distributions (Stiefel manifold); (d_l)_{l=1…k} ∼ N(0, s²_λ)
• Josse & Denis (2013): disregarding the constraints; Q_il ∼ N(0, 1), R_jl ∼ N(0, 1), (d_l)_{l=1…k} ∼ N(0, s²_λ)

⇒ Priors on Q, D, R: prior on µ


Bayesian PCA complete case

Model: X_ij = Σ_{l=1}^S µ^(l)_ij + ε_ij,  ε_ij ∼ N(0, σ²)

Prior: µ^(l)_ij ∼ N(0, τ²_l)

Posterior: (µ^(l)_ij | X^(l)_ij) = N(Φ_l X^(l)_ij, σ²Φ_l)  with  Φ_l = τ²_l / (τ²_l + σ²)

Empirical Bayes: X^(l)_ij ∼ N(0, τ²_l + σ²); maximizing the likelihood in τ²_l gives

τ̂²_l = λ_l − σ²,   Φ̂_l = (λ_l − σ²)/λ_l = signal variance / total variance


Multiple imputation Bayes-PCA

Variability of the parameters: B plausible (µ̂_ij)¹, …, (µ̂_ij)^B

⇒ Posterior distribution: Bayesian PCA, (µ^(l)_ij | X^(l)_ij) = N(Φ_l X^(l)_ij, σ²Φ_l)

⇒ Data Augmentation (Tanner & Wong, 1987):

(I) given µ̂ and σ̂², impute the missing values X_ij by a draw from the predictive distribution N(µ̂_ij, σ̂²)
(P) draw µ_ij from its posterior distribution
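A sketch of the two steps for one dimension l in R (illustrative names, not a package function; lambda_l and sigma2 are the variance estimates defined above):

da_step <- function(x_l, lambda_l, sigma2) {
  Phi <- (lambda_l - sigma2) / lambda_l                      # empirical-Bayes shrinkage
  mu  <- rnorm(length(x_l), Phi * x_l, sqrt(sigma2 * Phi))   # (P) posterior draw
  x_new <- rnorm(length(x_l), mu, sqrt(sigma2))              # (I) predictive draw for NAs
  list(mu = mu, x_new = x_new)
}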


Properties

⇒ Joint modeling: X_i. ∼ N(µ, Σ)
1 Bootstrap the rows: X¹, …, X^B; EM algorithm: (µ̂¹, Σ̂¹), …, (µ̂^B, Σ̂^B)
2 Imputation: X^b_ij drawn from N(µ̂^b, Σ̂^b)

⇒ Conditional modeling: one model per variable
1 For a variable j:
  (a) draw (β̂_{−j}, σ̂_{−j}) from a bootstrap or a posterior distribution
  (b) imputation: stochastic regression, X_ij drawn from N(X_{−j}β̂_{−j}, σ̂_{−j})
2 Cycle through the variables - repeat B times

With continuous variables and one regression per variable: N(µ, Σ). More flexible? Tedious?


Simulations

The simulated data: N(µ, Σ)
• 2 underlying dimensions (control k)
• n (30, 200), p (6, 60), ρ (0.3, 0.8), %NA (10, 30)

[Illustration: block covariance matrix with within-block correlation 0.8]

⇒ Imputation with B = 100 imputed tables with PCA, JM, CM

Estimate (analysis model): θ̂_b, V̂ar(θ̂_b), with θ_1 = E[Y], θ_2 = β_1

Rubin: θ̂ = (1/B) Σ_{b=1}^B θ̂_b,   T = (1/B) Σ_b V̂ar(θ̂_b) + (1 + 1/B) (1/(B−1)) Σ_b (θ̂_b − θ̂)²

⇒ Bias, CI width, coverage - 1000 simulations
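Rubin's rules are a one-liner to pool; a minimal R sketch (theta_b and var_b are the B point estimates and their estimated variances):

rubin_pool <- function(theta_b, var_b) {
  B <- length(theta_b)
  theta <- mean(theta_b)
  W <- mean(var_b)                          # within-imputation variance
  Bv <- sum((theta_b - theta)^2) / (B - 1)  # between-imputation variance
  list(estimate = theta, variance = W + (1 + 1 / B) * Bv)
}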


Results for the expectation

 n    p   ρ    %NA   CI width (Joint, Cond, MIPCA)   coverage (Joint, Cond, MIPCA)
 30   6   0.3  0.1   0.803  0.805  0.781             0.955  0.953  0.950
 30   6   0.3  0.3   1.010  0.898                    0.971  0.949
 30   6   0.9  0.1   0.763  0.759  0.756             0.952  0.950  0.949
 30   6   0.9  0.3   0.818  0.783                    0.965  0.953
 30   60  0.3  0.1   0.775                           0.955
 30   60  0.3  0.3   0.864                           0.952
 30   60  0.9  0.1   0.742                           0.953
 30   60  0.9  0.3   0.759                           0.954
 200  6   0.3  0.1   0.291  0.294  0.292             0.947  0.947  0.946
 200  6   0.3  0.3   0.328  0.334  0.325             0.954  0.959  0.952
 200  6   0.9  0.1   0.281  0.281  0.281             0.953  0.950  0.952
 200  6   0.9  0.3   0.288  0.289  0.288             0.948  0.951  0.951
 200  60  0.3  0.1   0.304  0.289                    0.957  0.945
 200  60  0.3  0.3   0.384  0.313                    0.981  0.958
 200  60  0.9  0.1   0.282  0.279                    0.951  0.948
 200  60  0.9  0.3   0.296  0.283                    0.958  0.952

(rows with fewer entries: only the methods that could be run in that setting are reported)

⇒ Good estimates of θ and coverage ≈ 0.95: the variability due to missing values is taken into account
⇒ PCA: small and large n/p; strong and weak relationships; low and high %NA


Conclusion

Multiple imputation methods for continuous and categorical data using a dimensionality reduction method.

Properties:
• requires a small number of parameters
• captures the relationships between variables
• captures the similarities between individuals

From a practical point of view:
• can be applied to data sets of various dimensions
• provides correct inferences for analysis models based on relationships between pairs of variables
• requires choosing the number of dimensions S

Perspective:
• mixed data


To conclude

Take-home message:

• Principal component methods are powerful to impute large data sets with continuous or categorical variables: they reduce the dimensionality and take into account both the similarities between rows and the relationships between variables.

• Single imputation aims to complete a dataset as well as possible (prediction). Multiple imputation aims to perform other statistical analyses afterwards, estimating parameters and their variability while taking the missing-values uncertainty into account.

• R packages FactoMineR, missMDA, denoiseR

"The idea of imputation is both seductive and dangerous. It is seductive because it can lull the user into the pleasurable state of believing that the data are complete after all, and it is dangerous because it lumps together situations where the problem is sufficiently minor that it can be legitimately handled in this way and situations where standard estimators applied to the real and imputed data have substantial biases." (Dempster & Rubin, 1983)


Feature Noising = Ridge = Shrinkage (Proof)

E_ε[‖X − (X + ε)B‖²_2] = ‖X − XB‖²_2 + E_ε[‖εB‖²_2]
                       = ‖X − XB‖²_2 + Σ_{i,j,k} B²_jk Var[ε_ij]
                       = ‖X − XB‖²_2 + nσ² ‖B‖²_2

Let X = UDV^⊤ be the SVD of X and λ = nσ². Then

B̂_λ = V B̃_λ V^T,  where  B̃_λ = argmin_B { ‖D − DB‖²_2 + λ ‖B‖²_2 }

B̃_ii = argmin_{B_ii} { (1 − B_ii)² D²_ii + λB²_ii } = D²_ii / (λ + D²_ii)

µ̂_λ = Σ_i U_i. ( D_ii / (1 + λ/D²_ii) ) V^⊤_i.


Simulations: X_{200×500} = µ + ε

MSE (columns: Bootstrap SA (σ̂, k), Bootstrap ISA (σ̂), TSVD (k), Adaptive (λ, γ) SURE (σ̂), Asympt (σ̂), Soft (λ) SURE (σ̂)):

 k    SNR   SA     ISA    TSVD   Adaptive  Asympt  Soft
 10   4     0.004  0.004  0.004  0.004     0.004   0.008
 100  4     0.037  0.036  0.037  0.037     0.037   0.045
 10   2     0.017  0.017  0.017  0.017     0.017   0.033
 100  2     0.142  0.143  0.152  0.142     0.146   0.156
 10   1     0.067  0.067  0.072  0.067     0.067   0.116
 100  1     0.511  0.775  0.733  0.448     0.600   0.448
 10   0.5   0.277  0.251  0.321  0.253     0.250   0.353
 100  0.5   1.600  1.000  3.164  0.852     0.961   0.852

Rank (for the four rank-selecting estimators, in the order printed on the slide):

 10   4     10    11   10   65
 100  4     100   103  100  193
 10   2     10    11   10   63
 100  2     100   114  100  181
 10   1     10    11   10   59
 100  1     29.6  154  64   154
 10   0.5   10    15   10   51
 100  0.5   0     87   15   86


Genotypes - Environment data

16 triticales (6 complete - 8 substituted - 2 checks), 10 locations in Spain (heterogeneous pH)
⇒ Adaptation of the triticales to the Spanish soil

⇒ AMMI (Additive Main effects and Multiplicative Interaction):

y_ij = µ + α_i + β_j + Σ_{l=1}^k d_l u_il v_jl + ε_ij

[Diagram: the genotype × environment table Y decomposed into main effects and interaction]

⇒ PCA on the residuals matrix of the two-way ANOVA


Expression data

• 27 chickens submitted to 4 nutritional statuses
• 12664 gene expressions
• X = µ + ε. PCA: µ̂_k = Σ_{l=1}^k u_l d_l v_l^⊤

[Figure: individuals maps with scores F = UD for PCA (left) and regularized PCA (right)]


Expression data

Clustering on the denoised matrix µ̂_k = Σ_{l=1}^k u_l d_l v_l^⊤

[Figure: heatmaps after PCA (left) and regularized PCA (right)]

Bi-clustering from rPCA better recovers the 4 nutritional statuses!


Illustration of the regularization

X^sim_{4×10} = µ_{4×10} + ε^sim_{4×10},  sim = 1, …, 1000

[Figure: fitted values across simulations for PCA (left) and rPCA (right)]

⇒ rPCA is more biased but less variable


Rows representation

100 rows, 20 variables, 2 dimensions, SNR = 0.8, d1/d2 = 4

[Figure: true configuration (left) vs PCA (right)]


Rows representation

100 rows, 20 variables, 2 dimensions, SNR = 0.8, d1/d2 = 4

[Figure: PCA (left) vs non-linear shrinkage (right)]


Rows representation

100 rows, 20 variables, 2 dimensions, SNR = 0.8, d1/d2 = 4

[Figure: PCA (left) vs soft thresholding (right)]


Rows representation

100 rows, 20 variables, 2 dimensions, SNR = 0.8, d1/d2 = 4

[Figure: true configuration (left) vs non-linear shrinkage (right)]

⇒ Improves the recovery of the inner-product matrix µµ′


Variables representation

[Figure: true configuration (left) vs PCA (right)]

cor(X7, X9) = 1 (true) vs 0.68 (PCA)


Variables representation

[Figure: PCA (left) vs non-linear shrinkage (right)]

cor(X7, X9) = 0.68 (PCA) vs 0.81 (non-linear)


Variables representation

[Figure: PCA (left) vs soft thresholding (right)]

cor(X7, X9) = 0.68 (PCA) vs 0.82 (soft)


Variables representation

[Figure: true configuration (left) vs non-linear shrinkage (right)]

cor(X7, X9) = 1 (true) vs 0.81 (non-linear)
⇒ Improves the recovery of the covariance matrix µ′µ


Drop-out training

Drop-out (Hinton et al., 2012) randomly omits subsets of features at each iteration of a training algorithm (improves neural networks).

Wager et al. (2013). Dropout training as adaptive regularization.
• GLM: equivalence between noising schemes and regularization
• Full potential with drop-out noise: X_ij = 0 with probability δ, X_ij/(1 − δ) otherwise → a nice penalty (e.g. logistic regression with rare features)

Wager et al. (2014). Altitude training: bounds for single layer.
• "like a marathon runner who practices at altitude: once a classifier learns to perform well on training examples corrupted by dropout, it will do very well on the uncorrupted test set"


Count data

⇒ Model: X ∈ R^{n×p} ∼ L(µ) with E[X] = µ of low rank k

Poisson noise: X_ij ∼ Poisson(µ_ij)

⇒ µ̂_k = Σ_{l=1}^k u_l d_l v_l^⊤

⇒ Bootstrap estimator = noising: X̃_ij ∼ Poisson(X_ij)

µ̂_boot = XB̂,   B̂ = argmin_B { E_{X̃∼L(X)} [ ‖X − X̃B‖²_2 ] }

⇒ An estimator robust to subsampling of the observations used to build X


Bootstrap estimators

⇒ Feature noising = regularization:

B̂ = argmin_B { ‖X − XB‖²_2 + ‖S^{1/2}B‖²_2 },   S_jj = Σ_{i=1}^n Var_{X̃∼L(X)}[X̃_ij]

µ̂ = X(X^⊤X + S)^{−1}X^⊤X,  with S diagonal (for Poisson noise, S_jj = Σ_i X_ij, the sums of X over its rows)

⇒ A new estimator µ̂ that does not reduce to singular-value shrinkage: new singular vectors!

⇒ ISA estimator - iterative algorithm:
1 µ̂ = XB̂
2 B̂ = (µ̂^⊤µ̂ + S)^{−1} µ̂^⊤µ̂


Perfume data set

⇒ 12 luxury perfumes described by 39 words - N = 1075

                    floral  fruity  strong  soft  light …
Angel                    2      11      18     3      1
Aromatics Elixir         2       3      29     2      0
Chanel 5                 5       0      19     3      1
Cinéma                  14      14       3    12      9
Coco Mademoiselle       10      10       6    10      7
…

⇒ Correspondence Analysis:

M = R^{−1/2} (X − (1/N) rc^⊤) C^{−1/2},  where R = diag(r), C = diag(c)

If the low-rank approximation is good:

µ̂^CA_k = R^{1/2} M̂_k C^{1/2} + (1/N) rc^⊤   (3)

x_ij/N = r_i c_j (1 + Σ_{k=1}^K d_k u_ik v_jk)

Connection (Escofier, 1982): when Σ_{k=1}^K d_k u_ik v_jk ≪ 1, eq. (3) becomes:

log(x_ij) ≈ log(N) + log(r_i) + log(c_j) + Σ_{k=1}^K d_k u_ik v_jk


Perfume data set

[Figure: CA factor map of the perfumes and words (Dim 1: 52.04%, Dim 2: 17.85%)]


Perfume data set

[Figure: factor map from the regularized (FREE) estimator (Dim 1: 77.13%, Dim 2: 22.87%)]


Regularized Correspondence Analysis

[Figure: factor maps of the perfumes with clusters: CA (left, Dim 1: 65.41%, Dim 2: 34.59%, four clusters) vs regularized CA (right, Dim 1: 77.13%, Dim 2: 22.87%, three clusters)]


Ozone data set

Daily measurements of ozone (O3), temperature (T9, T12, T15), cloudiness (Ne9, Ne12, Ne15), wind (Vx9, Vx12, Vx15) and the previous day's ozone (O3v), with many missing values:

       O3    T9    T12   T15  Ne9  Ne12  Ne15     Vx9     Vx12     Vx15  O3v
0601   NA  15.6   18.5  18.4    4     4     8      NA  -1.7101  -0.6946   84
0602   82  17.0   18.4  17.7    5     5     7      NA       NA       NA   87
0603   92    NA   17.6  19.5    2     5     4  2.9544   1.8794   0.5209   82
…
0930   70  15.7   18.6  20.7   NA    NA    NA  0.0000  -1.0419  -4.0000   NA

Visualization with Multiple Correspondence Analysis

[Figure: MCA graph of the categories — Dim 1 (19.07%), Dim 2 (17.71%). Each ozone variable is recoded into a "missing" (_m) and an "observed" (_o) category: maxO3_m/maxO3_o, T9_m/T9_o, ..., Vx15_m/Vx15_o, maxO3v_m/maxO3v_o.]
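A minimal R sketch of how such a map can be built (an illustration assuming the FactoMineR package; the data here are simulated, not the ozone records): each variable is recoded by whether it is observed or missing, and MCA is run on the resulting categorical table.

library(FactoMineR)

set.seed(1)
don <- as.data.frame(matrix(rnorm(100 * 5), 100, 5))
don[1:10, 1] <- NA; don[8:20, 2] <- NA; don[15:25, 3] <- NA   # inject missing values
don[3:12, 4] <- NA; don[30:40, 5] <- NA

# recode every variable as observed ("o") vs missing ("m")
pattern <- as.data.frame(lapply(don, function(v) factor(ifelse(is.na(v), "m", "o"))))

# MCA on the missingness pattern; plot the categories only
res <- MCA(pattern, graph = FALSE)
plot(res, choix = "ind", invisible = "ind")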

Single imputation methods

[Figure: three scatterplots of a bivariate sample (X, Y) with the imputed values highlighted — mean imputation, regression imputation and stochastic regression imputation.]

True values: µy = 0, σy = 1, ρ = 0.6. Estimates and coverage of the 95% CI for µy:

                                    µ̂y    σ̂y    ρ̂    95% CI coverage for µy
Mean imputation                    0.01  0.50  0.30          39.4
Regression imputation              0.01  0.72  0.78          61.6
Stochastic regression imputation   0.01  0.99  0.59          70.8

"The idea of imputation is both seductive and dangerous" (Dempster and Rubin, 1983)

⇒ Standard errors of the parameters (σµ̂y) calculated from the imputed data set are underestimated
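A minimal sketch (not the code behind the slide) reproducing this pattern: simulate a bivariate normal with ρ = 0.6, delete half of Y completely at random, impute three ways, and compare sd(Y) and cor(X, Y) with the true values.

set.seed(1)
n <- 1000
x <- rnorm(n)
y <- 0.6 * x + sqrt(1 - 0.6^2) * rnorm(n)      # sd(y) = 1, cor(x, y) = 0.6
miss <- sample(n, n / 2)                       # MCAR missingness on y

y_mean <- y; y_mean[miss] <- mean(y[-miss])    # mean imputation: shrinks sd and cor

fit <- lm(y[-miss] ~ x[-miss])                 # regression imputation:
y_reg <- y                                     # inflates the correlation
y_reg[miss] <- coef(fit)[1] + coef(fit)[2] * x[miss]

y_sto <- y                                     # stochastic regression imputation:
y_sto[miss] <- coef(fit)[1] + coef(fit)[2] * x[miss] +   # adds residual noise,
  rnorm(length(miss), sd = summary(fit)$sigma)           # preserving sd and cor

sapply(list(mean = y_mean, regression = y_reg, stochastic = y_sto),
       function(v) c(sd = sd(v), cor = cor(x, v)))

Even the stochastic version does not fix the inferential problem: the standard error of µ̂y computed from any single completed data set ignores the imputation uncertainty, hence the undercoverage reported in the table.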




PCA reconstruction

[Figure: 10 points (x1, x2) = (-2.00, -2.74), (-1.56, -0.77), (-1.11, -1.59), (-0.67, -1.13), (-0.22, -1.22), (0.22, -0.52), (0.67, 1.46), (1.11, 0.63), (1.56, 1.10), (2.00, 1.00), with their projections onto the first principal axis.]

⇒ Minimizes the distance between the observations and their projection
⇒ Approximates $X_{n \times p}$ with a matrix of low rank $S < p$:

$$\operatorname{argmin}_{\mu} \left\{ \|X - \mu\|_2^2 : \operatorname{rank}(\mu) \le S \right\}$$

SVD of $X$: $\hat{\mu}_{\mathrm{PCA}} = U_{n \times S} \Lambda^{1/2}_{S \times S} V'_{p \times S} = F_{n \times S} V'_{p \times S}$

$F = U \Lambda^{1/2}$: principal components (scores); $V$: principal axes (loadings)
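A short numeric check of this reconstruction on the slide's toy data (a sketch assuming rank S = 1; the columns are centred first and the means added back, which is consistent with the fitted coordinates overlaid on the next build):

X <- cbind(x1 = c(-2.00, -1.56, -1.11, -0.67, -0.22, 0.22, 0.67, 1.11, 1.56, 2.00),
           x2 = c(-2.74, -0.77, -1.59, -1.13, -1.22, -0.52, 1.46, 0.63, 1.10, 1.00))

Xc <- scale(X, center = TRUE, scale = FALSE)   # centre the columns
sv <- svd(Xc)                                  # PCA via the SVD
S  <- 1
mu <- sv$u[, 1:S, drop = FALSE] %*% diag(sv$d[1:S], S) %*% t(sv$v[, 1:S, drop = FALSE])
round(sweep(mu, 2, colMeans(X), "+"), 2)       # rank-1 fitted coordinates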

[Same slide, successive builds: the rank-1 fitted coordinates — (-2.16, -2.58), (-0.96, -1.35), (-1.15, -1.55), (-0.70, -1.09), (-0.53, -0.92), (0.04, -0.34), (1.24, 0.89), (1.05, 0.69), (1.50, 1.15), (1.67, 1.33) — are overlaid on the data.]

Missing values in PCA

⇒ PCA: least squares

$$\operatorname{argmin}_{\mu} \left\{ \|X - \mu\|_2^2 : \operatorname{rank}(\mu) \le S \right\}$$

⇒ PCA with missing values: weighted least squares

$$\operatorname{argmin}_{\mu} \left\{ \|W_{n \times p} \ast (X - \mu)\|_2^2 : \operatorname{rank}(\mu) \le S \right\}$$

with $W_{ij} = 0$ if $X_{ij}$ is missing, $W_{ij} = 1$ otherwise

Many algorithms: weighted alternating least squares (Gabriel & Zamir, 1979); iterative PCA (Kiers, 1997)

Iterative PCA

Toy example: x1 = (-2.0, -1.5, 0.0, 1.5, 2.0), x2 = (-2.01, -1.48, -0.01, NA, 1.98)

Initialization $\ell = 0$: $X^0$ (mean imputation: NA ← 0.00)
Then iterate:
  (a) PCA on the completed data set → $(U^\ell, \Lambda^\ell, V^\ell)$
  (b) missing values imputed with the fitted matrix $\hat{\mu}^{\ell} = U^{\ell} (\Lambda^{\ell})^{1/2} (V^{\ell})'$
  (c) the new imputed data set is $X^{\ell} = W \ast X + (1 - W) \ast \hat{\mu}^{\ell}$

Steps are repeated until convergence; on the toy example the imputed value moves from 0.00 to 0.57, then 0.90, ..., and converges to 1.46 (a sketch follows below).

[Figure: successive positions of the imputed point, approaching the first principal axis.]
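A minimal sketch of the algorithm (plain alternating SVD/imputation; in practice the regularized version implemented in missMDA::imputePCA is preferred):

iterative_pca_impute <- function(X, S = 1, tol = 1e-8, maxit = 1000) {
  W <- !is.na(X)                               # TRUE where observed
  Xhat <- X
  for (j in seq_len(ncol(X)))                  # initialization: mean imputation
    Xhat[!W[, j], j] <- mean(X[, j], na.rm = TRUE)
  for (iter in seq_len(maxit)) {
    old <- Xhat
    m  <- colMeans(Xhat)
    sv <- svd(sweep(Xhat, 2, m))               # (a) PCA of the completed data
    mu <- sv$u[, 1:S, drop = FALSE] %*% diag(sv$d[1:S], S) %*%
          t(sv$v[, 1:S, drop = FALSE])
    mu <- sweep(mu, 2, m, "+")                 # (b) fitted matrix mu^l
    Xhat[!W] <- mu[!W]                         # (c) X^l = W*X + (1-W)*mu^l
    if (sum((Xhat - old)^2) < tol) break       # repeat until convergence
  }
  Xhat
}

X <- cbind(x1 = c(-2.0, -1.5, 0.0, 1.5, 2.0),
           x2 = c(-2.01, -1.48, -0.01, NA, 1.98))
iterative_pca_impute(X, S = 1)                 # the NA converges near 1.46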

Principal component method for mixed data (complete)

Factorial Analysis of Mixed Data — FAMD (Escofier, 1979), PCAMIX (Kiers, 1991)

[Schematic: the continuous variables are centred and scaled; the categorical variables are recoded as an indicator (dummy) matrix whose columns are divided by $\sqrt{I/I_k}$ and centred — a matrix which balances the influence of each variable.]

A PCA is performed on the weighted matrix: SVD$(X, D_\Sigma^{-1}, \tfrac{1}{I}\mathbb{I}_I)$, with $X$ the matrix concatenating the continuous variables and the indicator matrix, and $D_\Sigma$ the diagonal matrix with the standard deviations and the weights $\sqrt{I_s/I}$.

Properties of FAMD (complete)

Benzécri, 1973: "All in all, doing a data analysis, in good mathematics, is simply searching eigenvectors; all the science (or the art) of it is just to find the right matrix to diagonalize"

• The distance between observations is:

$$d^2(i, l) = \sum_{k=1}^{K_{\mathrm{cont}}} \frac{1}{\sigma_k^2} (x_{ik} - x_{lk})^2 + \sum_{q=1}^{Q} \sum_{k=1}^{K_q} \frac{1}{I_{k_q}} (x_{ik_q} - x_{lk_q})^2$$

• The principal component $F_s$ maximises:

$$\sum_{k=1}^{K_{\mathrm{cont}}} r^2(F_s, v_k) + \sum_{q=1}^{Q_{\mathrm{cat}}} \eta^2(F_s, v_q)$$

Iterative FAMD algorithm

1 Initialization: imputation with the mean (continuous variables) and the proportion (dummy variables)
2 Iterate until convergence:
  (a) estimation: FAMD on the completed data ⇒ U, Λ, V
  (b) imputation of the missing values with the fitted matrix $\hat{X} = U_S \Lambda_S^{1/2} V_S'$
  (c) means, standard deviations and column margins are updated

Incomplete mixed data:
age weight size alcohol   sex snore tobacco
 NA    100  190 NA         M  yes   no
 70     96  186 1-2 gl/d   M  NA    <=1
 NA    104  194 No         W  no    NA
 62     68  165 1-2 gl/d   M  no    <=1

Completed data returned by imputeAFDM:
age weight size alcohol   sex snore tobacco
 51    100  190 1-2 gl/d   M  yes   no
 70     96  186 1-2 gl/d   M  no    <=1
 48    104  194 No         W  no    <=1
 62     68  165 1-2 gl/d   M  no    <=1

Underlying coding (continuous | alcohol | sex | snore | tobacco), before and after imputation:
NA 100 190 | NA  NA  NA | 1 0 | 0 1   | 1 0 0        51 100 190 | 0.2 0.7 0.1 | 1 0 | 0 1     | 1 0 0
70  96 186 | 0   1   0  | 1 0 | NA NA | 0 1 0    ⇒   70  96 186 | 0   1   0   | 1 0 | 0.8 0.2 | 0 1 0
NA 104 194 | 1   0   0  | 0 1 | 1 0   | NA NA NA     48 104 194 | 1   0   0   | 0 1 | 1 0     | 0.1 0.8 0.1
62  68 165 | 0   1   0  | 1 0 | 1 0   | 0 1 0        62  68 165 | 0   1   0   | 1 0 | 1 0     | 0 1 0

⇒ Imputed values can be seen as degrees of membership (a sketch of the call follows below)
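A sketch of this imputation on the toy table (assuming the missMDA package, whose imputeFAMD implements the regularized version of the algorithm; ncp is set to 2 here purely for illustration):

library(missMDA)

don <- data.frame(
  age     = c(NA, 70, NA, 62),
  weight  = c(100, 96, 104, 68),
  size    = c(190, 186, 194, 165),
  alcohol = factor(c(NA, "1-2 gl/d", "No", "1-2 gl/d")),
  sex     = factor(c("M", "M", "W", "M")),
  snore   = factor(c("yes", NA, "no", "no")),
  tobacco = factor(c("no", "<=1", NA, "<=1"))
)

res <- imputeFAMD(don, ncp = 2)
res$completeObs    # completed table (most plausible categories)
res$tab.disj       # imputed indicator matrix: the fractional memberships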

Multi-blocks data set

L'OREAL: 100 000 women in different countries - 300 questions
• Self-assessment questionnaire: life style, skin and hair characteristics, care and consumer habits
• Clinical assessments by a dermatologist: facial skin complexion, wrinkles, scalp dryness, greasiness
• Hair assessments by a hairdresser: abundance, volume, breakage, curliness
• Skin and hair photographs and measurements: sebum quantity, etc.

Multi-blocks data set

• Sensory analysis: products described by people and by physico-chemical measurements (each judge can't taste more than 8 products: planned missing products per judge, experimental design: BIB)
• Biology: DNA/RNA (samples without expression data)

Continuous / categorical / contingency sets of variables
⇒ Missing rows per subtable
⇒ Regularized iterative Multiple Factor Analysis (Husson & Josse, 2013) — see the sketch below
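A sketch of the corresponding call (assuming the missMDA package, whose imputeMFA handles grouped variables with missing rows per subtable; the two groups of continuous variables here are simulated):

library(missMDA)

set.seed(5)
don <- as.data.frame(matrix(rnorm(50 * 6), 50, 6))
don[1:5, 1:3] <- NA                     # missing rows in the first subtable

res <- imputeMFA(don, group = c(3, 3), type = c("s", "s"), ncp = 2)
head(res$completeObs)                   # completed data, ready for MFA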

Journal impact factors

journalmetrics.com provides 27 000 journals / 15 years of metrics.

443 journals (Computer Science, Statistics, Probability and Mathematics). 45 metrics (15 years by 3 types of measures), some may be NA:

• IPP - Impact Per Publication (like the ISI impact factor, but over 3 rather than 2 years).
• SNIP - Source Normalized Impact Per Paper: tries to weight by the number of citations per subject field to adjust for different citation cultures.
• SJR - SCImago Journal Rank: tries to capture average prestige per publication.

MFA with missing values

[Figure: individual factor map of the journals — Dim 1 (74.03%), Dim 2 (8.29%) — labelling journals from ACM Transactions on Autonomous and Adaptive Systems to Annals of Statistics, Biometrika, Journal of the Royal Statistical Society. Series B, Journal of Machine Learning Research, R Journal and Journal of Statistical Software.]

MFA with missing values

[Figure: correlation circles — Dim 1 (74.03%), Dim 2 (8.29%) — for the IPP_1999, ..., IPP_2013 and SNIP_1999, ..., SNIP_2013 variables, one arrow per year and per group.]


MFA with missing values

Rows: 47 000 journals / Groups: 15 years of data / Variables: 3 scores each year. Many missing...

[Figure: individual factor map — Dim 1 (74.03%), Dim 2 (8.29%) — showing the trajectory of IEEE/ACM Transactions on Networking across year_1999, ..., year_2013.]

Outline

1 Low-rank matrix estimation
  Singular values shrinkage
  Bootstrap framework
2 Inference with missing values using PC methods
  Single and multiple imputation with PCA
  Imputation with MCA
3 A multilogit bilinear model for MCA

Models for categorical variables

• Log-linear models - the gold standard (Christensen, 1990; Agresti, 2013)
  ⇒ problematic with high-dimensional data.
• Latent variable models:
  • categorical latent variables: latent class models (Goodman, 1974) - unsupervised clustering with one latent variable; nonparametric Bayesian extensions (Dunson, 2009, 2012)
  • continuous latent variables: latent-trait models (Lazarsfeld, 1968) - item response theory (psychology & education, Van der Linden, 1997)
  • fixed: often one latent variable, difficult to estimate
  • random: Gaussian distribution on the latent variables (Moustaki, 2000; Sanchez, 2013)

⇒ Binary data: Collins, Dasgupta & Schapire (2001), Buntine (2002), Hoff (2009), De Leeuw (2006), Li & Tao (2013)

Relationship with MCA

Lemma. Let $G \in \mathbb{R}^{n \times m}$, $H_1 \in \mathbb{R}^{n \times n}$, $H_2 \in \mathbb{R}^{m \times m}$, with $H_1, H_2 \succ 0$. Then

$$\operatorname{argmax}_{\Gamma:\, \operatorname{rank}(\Gamma) \le K}\ \langle \Gamma, G \rangle - \tfrac{1}{2} \|H_1 \Gamma H_2\|_F^2 \;=\; \Gamma^* = H_1^{-1} \left[ \mathrm{SVD}_K\!\left(H_1^{-1} G H_2^{-1}\right) \right] H_2^{-1}$$

Thus, by the Lemma, the maximizer of $\langle \Gamma, A - \mathbf{1}p^T \rangle - \tfrac{1}{2}\|\Gamma D_p^{1/2}\|_F^2$ is given by the rank-$K$ SVD of $(A - \mathbf{1}p^T) D_p^{-1/2}$, which is precisely the SVD performed in MCA.

Theorem. The one-step likelihood estimate for the multilogit bilinear model with rank constraint $K$, obtained by expanding around the independence model $(\beta_0 = \log p, \Gamma_0 = 0)$, is $(\beta_0, \hat{\Gamma}_{\mathrm{MCA}})$.
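A small numeric sketch of the SVD the Lemma points to, with $H_1 = I$ and $H_2 = D_p^{1/2}$ (the indicator matrix A is simulated here and built with model.matrix):

set.seed(3)
x <- data.frame(v1 = factor(sample(c("a", "b", "c"), 30, replace = TRUE)),
                v2 = factor(sample(c("u", "v"), 30, replace = TRUE)))
A <- cbind(model.matrix(~ v1 - 1, x), model.matrix(~ v2 - 1, x))   # indicator matrix

p <- colMeans(A)                              # category proportions
Z <- sweep(A, 2, p) %*% diag(1 / sqrt(p))     # (A - 1 p^T) D_p^{-1/2}
K <- 2
sv <- svd(Z)
Gamma_MCA <- sv$u[, 1:K] %*% diag(sv$d[1:K]) %*% t(sv$v[, 1:K]) %*% diag(1 / sqrt(p))
dim(Gamma_MCA)                                # Gamma* = [SVD_K(Z)] D_p^{-1/2}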

Relationship with MCA

$$\ell(\beta, \Gamma; A) = \beta_j(A_i^j) + \Gamma_i^j(A_i^j) - \log \sum_{c=1}^{C_j} e^{\beta_j(c) + \Gamma_i^j(c)}$$

$$\frac{\partial \ell}{\partial \Gamma_i^j(c)} = 1_{x_{ij}=c} - \frac{e^{\beta_j(c) + \Gamma_i^j(c)}}{\sum_{c'=1}^{C_j} e^{\beta_j(c') + \Gamma_i^j(c')}} = A_{ic}^j - \pi_{ijc} \qquad (4)$$

$$\frac{\partial^2 \ell}{\partial \Gamma_i^j(c)\, \partial \Gamma_{i'}^{j'}(c')} = \begin{cases} \pi_{ijc}\,\pi_{ijc'} - \pi_{ijc}\, 1_{c=c'} & \text{if } j = j',\ i = i' \\ 0 & \text{otherwise} \end{cases} \qquad (5)$$

Evaluating (4) at $\zeta_0 = (\beta_0 = \log(p),\, 0)$ gives $A_{ic}^j - p_j(c)$; idem for (5).

$$\tilde{\ell}(\beta, \Gamma) \approx \langle \Gamma, A - \mathbf{1}p^T \rangle - \tfrac{1}{2} \|\Gamma D_p^{1/2}\|_F^2$$

Majorization

• Majorization (or MM) algorithms (De Leeuw & Heiser, 1977; Lange, 2004) use in each iteration a majorizing function $g(\theta, \theta_0)$.
• The current estimate $\theta_0$ is called the supporting point.
• Requirements:
  1 $f(\theta_0) = g(\theta_0, \theta_0)$
  2 $f(\theta) \le g(\theta, \theta_0)$
• Sandwich inequality: $f(\theta^+) \le g(\theta^+, \theta_0) \le g(\theta_0, \theta_0) = f(\theta_0)$ with $\theta^+ = \operatorname{argmin}_{\theta} g(\theta, \theta_0)$
• Any majorization algorithm is therefore guaranteed to descend — a toy illustration follows below.
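A toy MM illustration (not from the slides): compute the median of y by minimizing f(θ) = Σ|y_i − θ|, majorizing each |r| at the supporting point r0 by the quadratic r²/(2|r0|) + |r0|/2 ≥ |r|, whose minimizer is a weighted mean.

mm_median <- function(y, theta0 = mean(y), tol = 1e-10) {
  f <- function(th) sum(abs(y - th))
  repeat {
    w <- 1 / pmax(abs(y - theta0), 1e-10)   # weights from the supporting point
    theta <- sum(w * y) / sum(w)            # minimizer of the quadratic majorizer
    if (f(theta0) - f(theta) < tol) return(theta)   # sandwich: f never increases
    theta0 <- theta
  }
}

y <- c(1, 2, 3, 10, 50)
c(mm = mm_median(y), median = median(y))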

Selecting λ

⇒ 2 steps: QUT to select the rank K, then shrinkage with CV

Rationale, by analogy with the Lasso:

• the Lasso is often used for screening (Bühlmann & van de Geer, 2011)
• selecting λ with CV or Stein-type criteria focuses on predictive properties
• the optimal threshold for prediction ≠ the optimal threshold for variable selection

⇒ Quantile Universal Threshold (Sardy, 2016): select the threshold at the bulk edge of what a threshold should be under the null. Guaranteed variable screening with high probability. Be careful: biased!

Quantile Universal Threshold

Ex. PCA: $X = \mu + \varepsilon$, with $\varepsilon_{ij} \sim N(0, \sigma^2)$ → $\hat{\mu}_K = \sum_{k=1}^{K} d_k u_k v_k^{\top}$

Soft-thresholding: $\operatorname{argmin}_{\mu} \|X - \mu\|_2^2 + \lambda \|\mu\|_*$ → singular values $d_k \max(1 - \lambda/d_k,\ 0)$

⇒ Selecting λ to obtain a good estimate of the rank (see the sketch below):

1 Generate data under the null hypothesis of no signal, µ = 0
2 Compute the first singular value d1
3 Repeat steps 1 and 2 1000 times
4 Use the (1 − α)-quantile of the distribution of d1 as the threshold

(Exact results: Zanella, 2009; asymptotic results from random matrix theory: Shabalin, 2013; Paul, 2007; Baik, 2006; ...)

⇒ Assumes σ is known!
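A minimal sketch of this recipe (σ assumed known and equal to 1; the rank-1 signal of the test matrix is simulated):

set.seed(42)
n <- 50; p <- 20; alpha <- 0.05

d1_null <- replicate(1000, {
  E <- matrix(rnorm(n * p), n, p)       # data under the null, mu = 0
  svd(E, nu = 0, nv = 0)$d[1]           # first singular value
})
lambda <- quantile(d1_null, 1 - alpha)  # QUT threshold

X  <- tcrossprod(rnorm(n, sd = 2), rnorm(p)) + matrix(rnorm(n * p), n, p)
sv <- svd(X)
d_shrunk <- pmax(sv$d - lambda, 0)      # d_k * max(1 - lambda/d_k, 0)
sum(d_shrunk > 0)                       # estimated rank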

Selecting λ

⇒ 2 steps: QUT to select the rank K, then shrinkage with CV

Model: $\pi_{ijc} = \dfrac{\exp\left(\beta_j(c) + \sum_{k=1}^{K} d_k u_{ik} v_{jk}(c)\right)}{\sum_{c'=1}^{C_j} \exp\left(\beta_j(c') + \sum_{k=1}^{K} d_k u_{ik} v_{jk}(c')\right)}$

Likelihood: $L(\beta, U, D, V) = -\sum_{i=1}^{n} \sum_{j=1}^{m} \sum_{c=1}^{C_j} A_{ic}^j \log(\pi_{ijc}) + \lambda \sum_{k=1}^{K^\star} d_k$

1 Generate under the null of no interaction and take as λ the quantile of the distribution of d1: good rank recovery
2 For the rank $\hat{K}_{\mathrm{QUT}}$, estimate λ by cross-validation to determine the amount of shrinkage (Lasso + LS): k-fold CV, where the λ with the best out-of-sample deviance is chosen.

Simulations

• n: 50, 100, 300
• m: 20, 100, 300 - 3 categories per variable
• interaction rank K: 2, 6
• ratio (d1/d2): 2, 1
• the strength of the interaction (low, strong)

$$u_i \sim N_K\!\left(0, \begin{pmatrix} d_1 & 0 \\ 0 & d_K \end{pmatrix}\right), \qquad v_j(c) \sim N_K\!\left(0, \begin{pmatrix} d_1 & 0 \\ 0 & d_K \end{pmatrix}\right)$$

$$\theta_{ij}^c = -\tfrac{1}{2}\|u_i - v_j(c)\|^2, \qquad P(x_{ij} = c) \propto e^{\theta_{ij}^c}$$

(a generative sketch follows below)
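A sketch of this generative design (one configuration: K = 2, three categories per variable, ratio 2; the low/strong strength factor is omitted here and can be added by rescaling θ):

set.seed(7)
n <- 50; m <- 20; K <- 2; C <- 3
d <- c(2, 1)                                           # (d1, dK), ratio 2

U <- matrix(rnorm(n * K), n, K) %*% diag(sqrt(d))      # u_i ~ N_K(0, diag(d))
V <- array(rnorm(m * C * K), c(m, C, K))
for (k in 1:K) V[, , k] <- sqrt(d[k]) * V[, , k]       # v_j(c) ~ N_K(0, diag(d))

X <- matrix(NA_integer_, n, m)
for (i in 1:n) for (j in 1:m) {
  theta <- sapply(1:C, function(c) -0.5 * sum((U[i, ] - V[j, c, ])^2))
  prob  <- exp(theta - max(theta))                     # P(x_ij = c) prop. to exp(theta)
  X[i, j] <- sample.int(C, 1, prob = prob / sum(prob))
}
table(X)                                               # simulated categorical data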

Simulations results

      n    p  rank  ratio  strength  model             MCA
 1   50   20     2      1       0.1  0.044             0.035
 2   50   20     2      1       1    0.020             0.045
 3   50   20     2      2       0.1  0.048             0.036
 4   50   20     2      2       1    0.0206            0.042
 5   50   20     6      1       0.1  0.111             0.064
 6   50   20     6      1       1    0.045             0.026
 7   50   20     6      2       0.1  0.115 (0.028)     0.071
 8   50   20     6      2       1    0.032             0.051
 9  300  100     2      1       0.1  0.005             0.006
10  300  100     2      1       1    0.004             0.042
11  300  100     2      2       0.1  0.0047            0.005
12  300  100     2      2       1    0.0037 (0.00369)  0.040
13  300  300     2      1       0.1  0.003             0.004
14  300  300     2      1       1    0.002             0.039
15  300  300     2      2       0.1  0.003             0.004
16  300  300     2      2       1    0.002             0.039
17  300  100     6      1       0.1  0.019             0.015
18  300  100     6      1       1    0.011             0.023
19  300  100     6      2       0.1  0.018 (0.010)     0.017
20  300  100     6      2       1    0.010             0.056
21  300  300     6      1       0.1  0.011             0.008
22  300  300     6      1       1    0.006             0.022
23  300  300     6      2       0.1  0.009             0.012
24  300  300     6      2       1    0.006             0.061

Table: Root mean square error (RMSE) between the true probabilities and those estimated by the multilogit model and by MCA; the penalized version is given in brackets (the smallest errors are in bold in the original table).

When the strength of the interaction is low (0.1), both methods agree: the parameters estimated by MCA and by the majorization algorithm are highly correlated with the true parameters, and the MSEs are of the same order of magnitude. MCA is thus a straightforward alternative, computationally fast and easy to run, to accurately estimate the parameters of the multilogit bilinear model. When the signal is strong, different patterns occur.

MCA can provide better estimates than the model-based ones. This happens in what can be considered difficult settings, with small n and p and/or noisy data, where the strength of the relationship is weak and/or the rank K is large (cases 1, 3, 5, 6, 7). In such situations the majorization algorithm may encounter difficulties and it is necessary to resort to regularization. With a regularized version whose amount of shrinkage is determined by cross-validation, the error for case 7 drops to 0.028 instead of 0.115 and improves on MCA. By contrast, the impact of regularization is less crucial in "easy" frameworks (case 12).





Overfitting

⇒ n = 50, p = 20, r = 6, strength = 0.1

[Figure: deviance (essai$dev) versus iteration index, over 0-15 000 iterations (y-axis: 300-800).]
[Figure: deviance (essai$dev) versus iteration index, zoom on the first 600 iterations (y-axis: 20 000-26 000).]

Imputation close to data and extrapolation

150 observations of $X \sim N\!\left( \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ 1 & 4 \end{pmatrix} \right)$

Two panels: missing completely at random / missing (not) at random

[Figure: scatterplots of the data with the imputations by regression (♦), zonoid depth and random forest (▲); successive builds of the right panel report vol = 0.419775 and vol = 0.42061.]


Robust imputation

15% (left) and 100% (right) of the observations drawn from Cauchy$\!\left( \begin{pmatrix} 1 \\ 1 \end{pmatrix}, \begin{pmatrix} 1 & 1 \\ 1 & 4 \end{pmatrix} \right)$

Two panels: robust to outliers (MCAR) / robust to the distribution (MCAR)

[Figure: scatterplots with the imputations by regression (♦) and Tukey depth.]
