
Robust estimation of a mean in a multivariate Gaussian model: Part 1

Frejus, December 17, 2018

Arnak S. Dalalyan

ENSAE ParisTech / CREST

1. Various models of contamination


General notation

We first introduce the notation that is common to all the models of contamination considered in this talk.

Number of observations: n.

Dimension of the unknown parameter µ∗: p.

Observations: (X1, …, Xn) ∼ Pn.

Number of outliers (possibly random): s ∈ {1, …, n}.

Set of outliers: S ⊂ {1, …, n}.

Proportion of outliers: ε = E[s/n] = E[|S|/n].

Setting (informal)

Among the n observations X1, …, Xn, there is a small number s of outliers. If we remove the outliers, all the remaining Xi's are drawn i.i.d. from a reference distribution Pµ∗.

Dalalyan, A.S. Dec 17, 2018 3


Gaussian model with unknown mean

Assumption (model for inliers)

Throughout this presentation, we assume that the reference distribution Pµ∗ is the p-variate Gaussian Np(µ∗, Ip). The goal is to estimate the parameter µ∗ ∈ Rp.

[Figure: scatter plot of inliers and outliers together with the contour plot of the Gaussian density; n = 30, s = 5, µ∗ = [1, 1]⊤.]



Huber’s contamination

Assumption (HC model for outliers)

There are unobserved i.i.d. random variables Z1, …, Zn ∼ B(ε) and a distribution Q such that

L(Xi | Zi = 0) = Np(µ∗, Ip),  L(Xi | Zi = 1) = Q,

and the observations Xi corresponding to different i's are independent. This is equivalent to

Pn = {(1 − ε)Np(µ∗, Ip) + εQ}⊗n.

In this model,

S = {i : Zi = 1}  (set of outliers)  and  s = |S| ∼ B(n, ε)  (number of outliers)

are both random.


An example with n = 30 and ε = 0.2:

 i    Zi    Xi ∼
 1    0     Np(µ∗, Ip)
 2    0     Np(µ∗, Ip)
 3    1     Q
 4    0     Np(µ∗, Ip)
 5    1     Q
 6    0     Np(µ∗, Ip)
 7    0     Np(µ∗, Ip)
 …    …     …
 30   0     Np(µ∗, Ip)

Here the realized number of outliers is s = 5, a draw from B(30, 0.2).


We write Pn ∈ M^HC_n(p, ε, µ∗) for the model of Huber's contamination.


Huber’s deterministic contamination

Assumption (HDC model for outliers)

There is a set S ⊂ {1, …, n} of cardinality s = [nε] and a distribution Q such that

{Xi : i ∈ Sᶜ} i.i.d.∼ Np(µ∗, Ip) ⊥⊥ {Xi : i ∈ S} i.i.d.∼ Q.

Similar to HC: the outliers are i.i.d.

Different from HC: the set of outliers is deterministic.

Remark: The number of outliers s should be smaller than n/2; otherwise Q would play the role of the reference distribution and Np(µ∗, Ip) that of the contamination.

We write Pn ∈ M^HDC_n(p, ε, µ∗).


Parameter contamination

Assumption (PC model for outliers)

There is a set S ⊂ {1, …, n} of cardinality s = [nε] and a collection of vectors {µi : i ∈ S} such that

{Xi : i ∈ Sᶜ} i.i.d.∼ Np(µ∗, Ip) ⊥⊥ {Xi : i ∈ S} ∼ ⊗_{i∈S} Np(µi, Ip).

Similar to HC & HDC: the outliers are independent.

Different from HC & HDC: the outliers may have different distributions.

We write Pn ∈ M^PC_n(p, ε, µ∗).


Adversarial contamination

Assumption (AC model for outliers)

For a sequence Y1, …, Yn i.i.d.∼ Np(µ∗, Ip) and a random set S ⊂ {1, …, n} of cardinality s = [nε], we have

Xi = Yi  for all i ∈ Sᶜ.

The set S need not be independent of {Yi : i = 1, …, n}.

The observations {Xi : i ∈ S} may have an arbitrary dependence structure.

We write Pn ∈ M^AC_n(p, ε, µ∗).


Relation between the models

[Diagram relating the four models: M^HC_n(p, ε, µ∗), M^HDC_n(p, 2ε, µ∗), M^PC_n(p, 2ε, µ∗), M^AC_n(p, 2ε, µ∗).]


2. Problem formulation and overview of results


Historical approach: breakdown point

Assume the unknown parameter µ∗ is in Rp, and let µ̂ be an estimator of µ∗, i.e., a map

µ̂ : ⋃_{n=1}^∞ Xⁿ → Rp.

The breakdown point ε⋆_n of µ̂ is defined by

ε⋆_n = (1/n) min{ s ∈ {1, …, n} : sup_{y1,…,ys} ‖µ̂(x_{1:(n−s)}, y_{1:s})‖ = +∞ }.

Drawbacks:

- it does not take into account the impact of "mild" outliers,
- it is meaningless if the parameter space is bounded,
- it does not depend on the norm under consideration, ...
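A quick numerical illustration of the definition (a sketch assuming NumPy): moving a single observation arbitrarily far drives the sample mean to infinity, so its breakdown point is 1/n, while the coordinatewise median stays bounded (breakdown point roughly 1/2).

```python
import numpy as np

rng = np.random.default_rng(0)
mu_star = np.array([1.0, 1.0])
X = rng.normal(size=(100, 2)) + mu_star      # 100 clean observations

for magnitude in (1e3, 1e6, 1e9):
    Y = X.copy()
    Y[0] = magnitude                          # corrupt a single observation
    mean_err = np.linalg.norm(Y.mean(axis=0) - mu_star)
    med_err = np.linalg.norm(np.median(Y, axis=0) - mu_star)
    print(f"outlier at {magnitude:.0e}: mean error {mean_err:.2e}, "
          f"median error {med_err:.2e}")
```

The mean error grows proportionally to the magnitude of the corrupted point; the median error does not move.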


Minimax approach: in expectation

A more informative way of quantifying robustness is to evaluate the worst-case risk and compare it to the minimax risk.

Worst-case risk of an estimator µ̂n:

R⋆_{n,p,ε}(µ̂n) = sup_{µ∗} sup_{Pn ∈ M⋆_n(p,ε,µ∗)} E_{X∼Pn}[‖µ̂n(X) − µ∗‖2²].

Here, M⋆_n(p, ε, µ∗) is one of the 4 models of contamination considered in the previous slides. For instance, R^HC_{n,p,ε}(µ̂n) is the worst-case risk of µ̂n over Huber's contamination model.

Minimax risk:

R⋆_{n,p,ε} = inf_{µ̂n} R⋆_{n,p,ε}(µ̂n).


Minimax approach: in deviation

Most results in the literature provide bounds in deviation rather than in expectation.

Fix a confidence level δ ∈ (0, 1).

Worst-case deviation of an estimator µ̂n: r⋆_{n,p,ε}(µ̂n) is the solution of

minimize r  subject to  P_{X∼Pn}(‖µ̂n(X) − µ∗‖2² > r) ≤ δ,  ∀µ∗ ∈ Rp, ∀Pn ∈ M⋆_n(p, ε, µ∗).

Clearly, r⋆_{n,p,ε}(µ̂n) depends on δ, but we will not track this dependence.

Minimax risk:

r⋆_{n,p,ε} = inf_{µ̂n} r⋆_{n,p,ε}(µ̂n).

Tchebychev's inequality yields δ · r⋆_{n,p,ε}(µ̂n) ≤ R⋆_{n,p,ε}(µ̂n).
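The deviation radius can be illustrated by Monte Carlo for one fixed distribution Pn (the definition takes a supremum over all µ∗ and Pn, which we do not attempt here). A sketch assuming NumPy; the names `deviation_radius` and `sample_fn` are ours, for illustration only.

```python
import numpy as np

def deviation_radius(estimator, sample_fn, mu_star, delta=0.1, reps=2000, seed=0):
    """Monte Carlo estimate of the smallest r such that
    P(||mu_hat(X) - mu_star||_2^2 > r) <= delta, for a single fixed P_n."""
    rng = np.random.default_rng(seed)
    errs = [np.sum((estimator(sample_fn(rng)) - mu_star) ** 2)
            for _ in range(reps)]
    return float(np.quantile(errs, 1.0 - delta))

mu_star = np.array([1.0, 1.0])
clean = lambda rng: rng.normal(size=(100, 2)) + mu_star    # no contamination
r_mean = deviation_radius(lambda X: X.mean(axis=0), clean, mu_star)
# For the sample mean without outliers, n * ||error||_2^2 ~ chi^2_p,
# so r should be close to quantile(chi^2_2, 0.9) / 100, roughly 0.046.
print(r_mean)
```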


Common robust estimators of the mean

The most common robust estimators of the mean are perhaps the coordinatewise median, the geometric median and Huber's estimator.

All these estimators can be defined as M-estimators:

µ̂n ∈ argmin_{µ∈Rp} Σ_{i=1}^n Ψ(Xi − µ)

with

Ψ(x) = ‖x‖1  (coordinatewise median),
Ψ(x) = ‖x‖2  (geometric median),
Ψ(x) = (‖x‖2²/2) ∧ λ(‖x‖2 − 0.5λ)  (Huber's estimator).

In all three cases, the function Ψ is convex and the estimator is computable in polynomial time.
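For instance, the geometric median can be computed by Weiszfeld's fixed-point iteration, a classical iteratively-reweighted-mean scheme for minimizing µ ↦ Σi ‖Xi − µ‖2 (a minimal sketch assuming NumPy):

```python
import numpy as np

def geometric_median(X, n_iter=500, tol=1e-9):
    """Weiszfeld's algorithm for argmin_mu sum_i ||X_i - mu||_2."""
    mu = X.mean(axis=0)                       # initialize at the sample mean
    for _ in range(n_iter):
        d = np.maximum(np.linalg.norm(X - mu, axis=1), tol)  # avoid /0 at data pts
        w = 1.0 / d
        mu_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(mu_new - mu) < tol:
            break
        mu = mu_new
    return mu

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(size=(50, 2)) + [1.0, 1.0],   # inliers around [1, 1]
               np.full((5, 2), 50.0)])                  # 5 gross outliers
print(geometric_median(X))   # stays close to [1, 1] despite the outliers
```

Each iteration is a weighted average of the data, with weights inversely proportional to the current distances, which is why remote outliers have little influence.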


Overview of the results: minimax rates in deviation

[Table: minimax rates in deviation for the four contamination models.]

Overview of the results: tractable estimators

We will present three tractable estimators that improve on the coordinatewise median:

1. The ellipsoid method (Diakonikolas et al., 2016).
2. The spectral method (Lai et al., 2016).
3. The iterative soft thresholding (Collier and Dalalyan, 2017).


3. The minimax rate


Minimax lower bound

Theorem 1 (Chen et al., 2015)

There is a constant c > 0 such that for every ε ∈ [0, 1] and every δ ∈ (0, 1/2), it holds that

r^HC_{n,p,ε} ≥ c (p/n + ε²).

Some remarks:

- By Tchebychev's inequality, p/n + ε² is also a lower bound for the minimax risk in expectation.
- By inclusion, p/n + ε² is also a lower bound for the minimax risk in the models HDC and AC.
- The same lower bound p/n + ε² holds true for the model PC.


Proof of the lower bound 1

1. From the classical parametric minimax theory: r^HC_{n,p,ε} ≳ p/n.

2. Thus, we only need to show that r^HC_{n,p,ε} ≳ ε².

3. Main steps of the proof:

- Reduction to dimension 1: r^HC_{n,p,ε} ≥ r^HC_{n,1,ε}.
- Construct a probability density function fε such that f⊗n_ε ∈ M^HC_n(1, ε, 0) and f⊗n_ε ∈ M^HC_n(1, ε, ∆ε), with ∆ε ≍ ε.
- The parameter values µ∗ = 0 and µ∗ = ∆ε are then indistinguishable from the observations X1, …, Xn ∼ f⊗n_ε.
- Therefore r^HC_{n,p,ε} ≳ ‖∆ε − 0‖2² ≍ ε².


Proof of the lower bound 2

- For a ∆ > 0, define f°∆ = ϕ0 ∨ ϕ∆, the pointwise maximum of the densities of N(0, 1) and N(∆, 1).
- We have S∆ = ∫ f°∆(x) dx = 1 + a∆ + O(∆²) for a constant a > 0.
- Then f∆ = f°∆ / S∆ is a probability density.
- We choose ∆ε so that 1/S∆ε = 1 − ε, and set fε = f∆ε.
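The expansion of S∆ can be checked numerically. Since ϕ0 ∨ ϕ∆ equals ϕ0 on (−∞, ∆/2) and ϕ∆ on (∆/2, ∞), one finds S∆ = 2Φ(∆/2), so a = 1/√(2π) here; the closed form 2Φ(∆/2) is our own computation, not stated on the slide. A sketch using only the standard library:

```python
import math

def phi(x, mu=0.0):
    """Density of N(mu, 1)."""
    return math.exp(-0.5 * (x - mu) ** 2) / math.sqrt(2 * math.pi)

def Phi(x):
    """Standard Gaussian cdf, via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def S(delta, lo=-12.0, hi=13.0, n=200_000):
    """Midpoint-rule approximation of the integral of phi_0 v phi_delta."""
    h = (hi - lo) / n
    return h * sum(max(phi(lo + (k + 0.5) * h), phi(lo + (k + 0.5) * h, delta))
                   for k in range(n))

delta = 0.1
print(S(delta))                               # numerical integral
print(2 * Phi(delta / 2))                     # closed form: S_delta = 2 Phi(delta/2)
print(1 + delta / math.sqrt(2 * math.pi))     # 1 + a*delta with a = 1/sqrt(2 pi)
```

All three values agree to the stated order, confirming the first-order expansion of S∆.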


Proof of the lower bound 3

The density fε = (1 − ε)(ϕ0 ∨ ϕ∆ε) admits two decompositions:

fε = (1 − ε)ϕ0 + εq = (1 − ε)ϕ∆ε + εq′,

for suitable densities q and q′. Hence f⊗n_ε belongs to both M^HC_n(1, ε, 0) and M^HC_n(1, ε, ∆ε).


Minimax upper bound

Theorem 2 (Chen et al., 2015)

There are two constants C1, C2 > 0 such that for every ε ≤ 1/5, every p ≤ C1 n and every δ ≥ e^{−C1 n}, it holds that

r^HC_{n,p,ε} ≤ C2 (p/n + ε² + log(1/δ)/n).

Some remarks:

- The upper bound is attained by Tukey's median.
- The condition ε ≤ 1/5 can be replaced by ε ≤ 1/3 − c′, with an arbitrarily small c′ > 0.
- The estimator does not rely on the knowledge of ε.


Tukey’s median

The upper bound is attained by Tukey's median.

Tukey's median is any maximizer of Tukey's depth:

µ̂^TM_n ∈ argmax_{µ∈Rp} D(µ, X_{1:n}).

Tukey's (halfspace) depth is

D(µ, X_{1:n}) = min_{u ∈ S^{p−1}} Σ_{i=1}^n 1(u⊤Xi ≤ u⊤µ),

where S^{p−1} denotes the unit sphere of Rp.

µ̂^TM_n is computationally intractable for large p.
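Although the exact depth is hard to compute in high dimension, it is easy to approximate by relaxing the minimum over all directions u to a finite random sample of directions; this yields an upper bound on the exact depth. A sketch assuming NumPy:

```python
import numpy as np

def tukey_depth(mu, X, n_dirs=2000, seed=0):
    """Approximate halfspace depth of mu in the sample X (n x p):
    minimize over n_dirs random unit directions u the number of
    points X_i in the halfspace {x : u.x <= u.mu}."""
    rng = np.random.default_rng(seed)
    U = rng.normal(size=(n_dirs, X.shape[1]))
    U /= np.linalg.norm(U, axis=1, keepdims=True)    # directions on the sphere
    counts = (X @ U.T <= U @ mu).sum(axis=0)         # one count per direction
    return int(counts.min())

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 2))
print(tukey_depth(X.mean(axis=0), X))             # central point: large depth
print(tukey_depth(np.array([100.0, 100.0]), X))   # far point: depth 0
```

A point far outside the cloud has depth 0 (some halfspace through it contains no data), while a central point has depth of order n/3.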


Summary

We introduced four models of contamination by outliers:

- Huber's contamination M^HC_n(p, ε, µ∗).
- Huber's deterministic contamination M^HDC_n(p, ε, µ∗).
- Parameter contamination M^PC_n(p, ε, µ∗).
- Adversarial contamination M^AC_n(p, ε, µ∗).

We defined the worst-case risks in expectation and in deviation, R⋆_{n,p,ε}(µ̂) and r⋆_{n,p,ε}(µ̂), as well as the minimax risks R⋆_{n,p,ε} = inf_{µ̂} R⋆_{n,p,ε}(µ̂) and r⋆_{n,p,ε} = inf_{µ̂} r⋆_{n,p,ε}(µ̂).

For every ε < 1/3 − c′, with an arbitrarily small c′ > 0, we have r⋆_{n,p,ε} ≍ p/n + ε².

This minimax rate is attained by Tukey's median, which is hard to compute for large p.

Question: What is the smallest rate of the worst-case risk that can be attained by an estimator computable in poly(n, p, 1/ε) time?


References

M. Chen, C. Gao, and Z. Ren. Robust covariance and scatter matrix estimation under Huber's contamination model. ArXiv e-prints, to appear in the Annals of Statistics, 2015.

O. Collier and A. S. Dalalyan. Minimax estimation of a multidimensional linear functional in sparse Gaussian models and robust estimation of the mean. arXiv:1712.05495, December 2017. URL https://arxiv.org/abs/1712.05495.

I. Diakonikolas, G. Kamath, D. M. Kane, J. Li, A. Moitra, and A. Stewart. Robust estimators in high dimensions without the computational intractability. In IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS 2016), pages 655–664, 2016.

K. A. Lai, A. B. Rao, and S. Vempala. Agnostic estimation of mean and covariance. In IEEE 57th Annual Symposium on Foundations of Computer Science (FOCS 2016), pages 665–674, 2016.
