Robust Statistics, Revisited
Ankur Moitra (MIT)
joint work with Ilias Diakonikolas, Jerry Li, Gautam Kamath, Daniel Kane and Alistair Stewart
CLASSIC PARAMETER ESTIMATION
Given samples from an unknown distribution in some class, e.g. a 1-D Gaussian N(μ, σ²), can we accurately estimate its parameters? Yes!
empirical mean: μ̂ = (1/N) Σᵢ Xᵢ    empirical variance: σ̂² = (1/N) Σᵢ (Xᵢ − μ̂)²
R. A. Fisher: The maximum likelihood estimator is asymptotically efficient (1910-1920).
J. W. Tukey: What about errors in the model itself? (1960)
ROBUST STATISTICS
What estimators behave well in a neighborhood around the model?
Let's study a simple one-dimensional example…
ROBUST PARAMETER ESTIMATION
Given corrupted samples from a 1-D Gaussian:
observed model = ideal model + noise
can we accurately estimate its parameters?

How do we constrain the noise? Equivalently:
- L1-norm of the noise at most O(ε)
- Arbitrarily corrupt an O(ε)-fraction of samples (in expectation)
This generalizes Huber's Contamination Model: an adversary can add an ε-fraction of samples.
Outliers: points the adversary has corrupted. Inliers: points he hasn't.
In what norm do we want the parameters to be close?
Definition: The total variation distance between two distributions with pdfs f(x) and g(x) is
d_TV(f, g) = (1/2) ∫ |f(x) − g(x)| dx
From the bound on the L1-norm of the noise, we have: d_TV(observed, ideal) ≤ O(ε).
Goal: Find a 1-D Gaussian (the estimate) that satisfies d_TV(estimate, ideal) ≤ O(ε).
Equivalently, find a 1-D Gaussian that satisfies d_TV(estimate, observed) ≤ O(ε).
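As a small numerical aside (my own illustration, not from the talk), the total variation distance between two 1-D Gaussians can be approximated directly from the definition by integrating |f − g| on a grid:

```python
# Numerically estimate d_TV(f, g) = (1/2) * integral |f(x) - g(x)| dx
# for two 1-D Gaussian pdfs, via a midpoint Riemann sum.
import math

def gaussian_pdf(x, mu, sigma):
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def tv_distance(mu1, sigma1, mu2, sigma2, lo=-20.0, hi=20.0, n=20_000):
    """Midpoint-rule approximation of (1/2) * int |f - g| dx."""
    dx = (hi - lo) / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * dx
        total += abs(gaussian_pdf(x, mu1, sigma1) - gaussian_pdf(x, mu2, sigma2)) * dx
    return 0.5 * total

# Identical Gaussians have distance 0; widely separated ones approach 1.
print(tv_distance(0, 1, 0, 1))   # ~0.0
print(tv_distance(0, 1, 10, 1))  # ~1.0
```

For unit-variance Gaussians whose means differ by δ, this recovers the closed form 2Φ(δ/2) − 1.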
Do the empirical mean and empirical variance work? No!
A single corrupted sample can arbitrarily corrupt the estimates.
But the median and median absolute deviation do work.
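A minimal demonstration of this contrast (my own, not from the slides): one adversarial sample drags the empirical mean arbitrarily far, while the median barely moves.

```python
# One corrupted sample out of 1000 moves the empirical mean to ~1000,
# while the median of the same corrupted data stays near the true mean 0.
import random
import statistics

random.seed(0)
samples = [random.gauss(0.0, 1.0) for _ in range(999)]
clean_mean = statistics.fmean(samples)

corrupted = samples + [10**6]          # a single adversarial point

print(statistics.fmean(corrupted))     # dragged to roughly 1000
print(statistics.median(corrupted))    # still close to 0
```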
Fact [Folklore]: Given samples from a distribution that is ε-close in total variation distance to a 1-D Gaussian N(μ, σ²), the median and MAD recover estimates μ̂ and σ̂ that satisfy
|μ − μ̂| ≤ O(ε) σ and |σ − σ̂| ≤ O(ε) σ,
where μ̂ = median(X) and σ̂ = MAD(X) / Φ⁻¹(3/4).
Also called (properly) agnostically learning a 1-D Gaussian.
What about robust estimation in high-dimensions? e.g. microarrays with 10k genes
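The folklore estimator above can be sketched in a few lines (a sketch under my own assumptions about constants; 0.6745 ≈ Φ⁻¹(3/4) makes the MAD consistent for a Gaussian):

```python
# Robust 1-D Gaussian fit: median for the mean, scaled median absolute
# deviation (MAD) for the standard deviation.
import random
import statistics

def robust_gaussian_fit(xs):
    mu_hat = statistics.median(xs)
    mad = statistics.median(abs(x - mu_hat) for x in xs)
    sigma_hat = mad / 0.6745          # 0.6745 ~ Phi^{-1}(3/4)
    return mu_hat, sigma_hat

random.seed(1)
xs = [random.gauss(5.0, 2.0) for _ in range(10_000)]
xs[:100] = [1e9] * 100                # corrupt a 1% fraction arbitrarily
mu_hat, sigma_hat = robust_gaussian_fit(xs)
print(mu_hat, sigma_hat)              # close to (5, 2) despite the corruptions
```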
OUTLINE
Part I: Introduction
- Robust Estimation in One-dimension
- Robustness vs. Hardness in High-dimensions
- Our Results
Part II: Agnostically Learning a Gaussian
- Parameter Distance
- Detecting When an Estimator is Compromised
- Filtering and Convex Programming
- Unknown Covariance
Part III: Experiments and Extensions
Main Problem: Given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian N(μ, Σ), give an efficient algorithm to find parameters μ̂, Σ̂ that satisfy d_TV(N(μ̂, Σ̂), N(μ, Σ)) ≤ Õ(ε).
Special Cases:
(1) Unknown mean: N(μ, I)
(2) Unknown covariance: N(0, Σ)
A COMPENDIUM OF APPROACHES (Unknown Mean)

Estimator         | Error Guarantee | Running Time
Tukey Median      | O(ε)            | NP-Hard
Geometric Median  | O(ε√d)          | poly(d, N)
Tournament        | O(ε)            | N^O(d)
Pruning           | O(ε√d)          | O(dN)
…
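For concreteness, one row of the table can be made executable: the geometric median is commonly approximated by Weiszfeld's iteration (an illustrative sketch; iteration counts and tolerances are my own choices, not from the talk).

```python
# Weiszfeld's algorithm: repeatedly take a distance-weighted average,
# which converges to the geometric median of the point set.
import numpy as np

def geometric_median(X, iters=200, eps=1e-9):
    y = X.mean(axis=0)
    for _ in range(iters):
        d = np.linalg.norm(X - y, axis=1)
        d = np.maximum(d, eps)                  # avoid division by zero
        w = 1.0 / d
        y_new = (w[:, None] * X).sum(axis=0) / w.sum()
        if np.linalg.norm(y_new - y) < eps:
            break
        y = y_new
    return y

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(2000, 50))
X[:200] += 1000.0                               # corrupt a 10% fraction
print(np.linalg.norm(geometric_median(X)))      # stays small
print(np.linalg.norm(X.mean(axis=0)))           # the mean is dragged far away
```

Consistent with the table, the surviving error still grows with the dimension (the O(ε√d) row), which is exactly the loss the talk is concerned with.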
The Price of Robustness?
All known estimators are hard to compute or lose polynomial factors in the dimension.
Equivalently: computationally efficient estimators can only handle an ε ≲ 1/√d fraction of errors and get non-trivial (TV < 1) guarantees.
Is robust estimation algorithmically possible in high-dimensions?
OUR RESULTS
Theorem [Diakonikolas, Li, Kamath, Kane, Moitra, Stewart '16]: There is an algorithm that, when given samples from a distribution that is ε-close in total variation distance to a d-dimensional Gaussian N(μ, Σ), finds parameters μ̂, Σ̂ that satisfy d_TV(N(μ̂, Σ̂), N(μ, Σ)) ≤ Õ(ε).
Robust estimation in high-dimensions is algorithmically possible!
Moreover, the algorithm runs in time poly(N, d).
Alternatively: can approximate the Tukey median, etc., in interesting semi-random models.
Simultaneously, [Lai, Rao, Vempala '16] gave agnostic algorithms that achieve somewhat weaker error guarantees and work for non-Gaussian distributions too.
Many other applications across both papers: product distributions, mixtures of spherical Gaussians, SVD, ICA.
A GENERAL RECIPE
Robust estimation in high-dimensions:
- Step #1: Find an appropriate parameter distance
- Step #2: Detect when the naïve estimator has been compromised
- Step #3: Find good parameters, or make progress
  Filtering: fast and practical
  Convex programming: better sample complexity
Let's see how this works for unknown mean…
PARAMETER DISTANCE
Step #1: Find an appropriate parameter distance for Gaussians
A Basic Fact:
(1) d_TV(N(μ₁, I), N(μ₂, I)) ≤ O(‖μ₁ − μ₂‖₂)
This can be proven using Pinsker's inequality and the well-known formula for the KL-divergence between Gaussians.
Corollary: If our estimate (in the unknown mean case) satisfies ‖μ̂ − μ‖₂ ≤ O(ε), then d_TV(N(μ̂, I), N(μ, I)) ≤ O(ε).
Our new goal is to be close in Euclidean distance.
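The two ingredients combine in one line (a short worked derivation of Basic Fact (1) from standard facts; the presentation is my own):

```latex
\[
\mathrm{KL}\big(\mathcal{N}(\mu_1, I)\,\|\,\mathcal{N}(\mu_2, I)\big)
= \tfrac{1}{2}\,\|\mu_1 - \mu_2\|_2^2,
\qquad
d_{\mathrm{TV}}(P,Q) \le \sqrt{\tfrac{1}{2}\,\mathrm{KL}(P\,\|\,Q)}
\quad \text{(Pinsker)},
\]
\[
\text{so} \qquad
d_{\mathrm{TV}}\big(\mathcal{N}(\mu_1, I), \mathcal{N}(\mu_2, I)\big)
\le \tfrac{1}{2}\,\|\mu_1 - \mu_2\|_2 .
\]
```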
DETECTING CORRUPTIONS
Step #2: Detect when the naïve estimator has been compromised
[Scatter plot: uncorrupted vs. corrupted points; the corruptions create a direction of large (> 1) variance.]
Key Lemma: If X₁, X₂, …, X_N come from a distribution that is ε-close to N(μ, I) and N is large enough, then with probability at least 1 − δ:
(1) the empirical mean satisfies ‖μ̂ − μ‖₂ ≤ O(ε √(log 1/ε)) whenever
(2) the top eigenvalue of the empirical covariance is at most 1 + O(ε log 1/ε).
Take-away: An adversary needs to mess up the second moment in order to corrupt the first moment.
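The detection step can be sketched as a spectral test (an illustrative sketch; the threshold constant `c` is my placeholder, not the lemma's precise constant):

```python
# If the empirical covariance of the (possibly corrupted) samples has an
# eigenvalue noticeably larger than 1, the empirical mean cannot be trusted.
import numpy as np

def detect_compromised(X, eps, c=4.0):
    """Return (flag, top_eigenvector). flag=True means the top eigenvalue of
    the empirical covariance exceeds 1 + c*eps (threshold is illustrative)."""
    cov = np.cov(X, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # ascending eigenvalues
    return eigvals[-1] > 1.0 + c * eps, eigvecs[:, -1]

rng = np.random.default_rng(0)
clean = rng.normal(size=(10_000, 20))
flag_clean, _ = detect_compromised(clean, eps=0.1)

X = clean.copy()
X[:1000] = 30.0 * rng.normal(size=20)           # place 10% of points far away
flag_bad, _ = detect_compromised(X, eps=0.1)
print(flag_clean, flag_bad)                     # False True
```

The returned eigenvector is exactly the direction of large variance that the filtering step (next) projects onto.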
OUR ALGORITHM(S)
Step #3: Either find good parameters, or remove many outliers
Filtering Approach: Suppose there is a direction v in which the empirical variance is noticeably larger than 1. Then we can throw out more corrupted than uncorrupted points: remove every point X with |v · (X − μ̂)| > T, where v is the direction of largest variance and T has a formula.
If we continue too long, we'd have no corrupted points left! Eventually we find (certifiably) good parameters.
Running Time: poly(N, d). Sample Complexity: the analysis uses concentration of LTFs (linear threshold functions).
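A simplified filter loop looks as follows (a sketch under my own assumptions: the talk's threshold T has a precise formula, and here a quantile cutoff stands in for it; the variance threshold is also illustrative):

```python
# Filtering for robust mean estimation: while some direction has variance
# noticeably above 1, project onto it and remove the most extreme points;
# once no such direction exists, the empirical mean is certifiably good.
import numpy as np

def filter_mean(X, eps, var_thresh=1.5, max_rounds=50):
    X = X.copy()
    for _ in range(max_rounds):
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        vals, vecs = np.linalg.eigh(cov)
        if vals[-1] <= var_thresh:              # no large direction: done
            return mu
        v = vecs[:, -1]                         # direction of largest variance
        proj = np.abs((X - mu) @ v)
        T = np.quantile(proj, 1.0 - eps / 2)    # crude stand-in for the formula for T
        X = X[proj <= T]                        # throw out the extreme points
    return X.mean(axis=0)

rng = np.random.default_rng(1)
X = rng.normal(size=(20_000, 30))
X[:2000] += 8.0                                 # 10% corrupted, shifted in every coordinate
print(np.linalg.norm(filter_mean(X, eps=0.1)))  # small; the naive mean is far off
```

Each round removes a set of points that is mostly corrupted, which is why the process must terminate with good parameters: the adversary only had an ε-fraction to spend.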
A GENERAL RECIPE (recap)
- Step #1: Find an appropriate parameter distance
- Step #2: Detect when the naïve estimator has been compromised
- Step #3: Find good parameters, or make progress
How about for unknown covariance?
PARAMETER DISTANCE
Step #1: Find an appropriate parameter distance for Gaussians
Another Basic Fact:
(2) d_TV(N(0, Σ₁), N(0, Σ₂)) ≤ O(‖Σ₁^{-1/2} Σ₂ Σ₁^{-1/2} − I‖_F)
Again, proven using Pinsker's inequality.
Our new goal is to find an estimate Σ̂ that satisfies ‖Σ^{-1/2} Σ̂ Σ^{-1/2} − I‖_F ≤ O(ε).
The distance seems strange, but it's the right one to use to bound TV.
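To see where this distance comes from (a hedged sketch from standard facts; the presentation is my own): the KL divergence between centered Gaussians expands, to second order, into exactly this Frobenius norm, and Pinsker's inequality converts KL into TV.

```latex
\[
\mathrm{KL}\big(\mathcal{N}(0,\Sigma_2)\,\|\,\mathcal{N}(0,\Sigma_1)\big)
= \tfrac{1}{2}\Big( \operatorname{tr}(\Sigma_1^{-1}\Sigma_2) - d
  - \ln\det(\Sigma_1^{-1}\Sigma_2) \Big).
\]
Writing $\Delta = \Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2} - I$ with eigenvalues
$\lambda_i$, this equals
$\tfrac{1}{2}\sum_i \big(\lambda_i - \ln(1+\lambda_i)\big) \approx \tfrac{1}{4}\|\Delta\|_F^2$
for small $\Delta$, so Pinsker's inequality gives
\[
d_{\mathrm{TV}} \le \sqrt{\tfrac{1}{2}\,\mathrm{KL}}
\lesssim \big\|\Sigma_1^{-1/2}\Sigma_2\Sigma_1^{-1/2} - I\big\|_F .
\]
```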
UNKNOWN COVARIANCE
What if we are given samples from N(0, Σ)?
How do we detect if the naïve estimator is compromised?
Key Fact: Let Y = Σ^{-1/2} X for X ~ N(0, Σ), and let Z = Y ⊗ Y be the flattening of Y Yᵀ. Then the covariance of Z, restricted to flattenings of d × d symmetric matrices, has a known spectrum. The proof uses Isserlis's theorem.
Key Idea: Transform the data and look for restricted large eigenvalues (we need to project out the identity direction).
If Σ̂ were the true covariance, we would have Yᵢ = Σ̂^{-1/2} Xᵢ ~ N(0, I) for the inliers, in which case the covariance of the flattenings Yᵢ ⊗ Yᵢ would have small restricted eigenvalues.
Take-away: An adversary needs to mess up the (restricted) fourth moment in order to corrupt the second moment.
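The key idea can be sketched numerically (my own construction, with illustrative thresholds): whiten by the candidate covariance, flatten the outer products, and inspect the spectrum of their covariance. For genuine N(0, Σ) data the spectrum is predictable (via Isserlis); corruptions show up as large eigenvalues.

```python
# Whiten by a candidate covariance, flatten y y^T into d^2-dimensional
# vectors, and return the top eigenvalue of the covariance of the
# flattenings. For N(0, I) data this is ~2; corruptions inflate it.
import numpy as np

def flattened_top_eigenvalue(X, Sigma_hat):
    d = X.shape[1]
    w, U = np.linalg.eigh(Sigma_hat)
    W = U @ np.diag(w ** -0.5) @ U.T            # Sigma_hat^{-1/2}
    Y = X @ W.T                                 # whitened data
    Z = np.einsum('ni,nj->nij', Y, Y).reshape(len(Y), d * d)
    Zc = Z - Z.mean(axis=0)
    M = (Zc.T @ Zc) / len(Z)                    # covariance of the flattenings
    return np.linalg.eigvalsh(M)[-1]

rng = np.random.default_rng(0)
d = 10
clean = rng.normal(size=(20_000, d))
lam_clean = flattened_top_eigenvalue(clean, np.eye(d))

X = clean.copy()
X[:2000] *= 6.0                                 # corrupt 10% by inflating variance
lam_bad = flattened_top_eigenvalue(X, np.eye(d))
print(lam_clean, lam_bad)                       # lam_bad is far larger
```

This omits the talk's projection out of the identity direction and the restriction machinery; it only illustrates why a fourth-moment statistic exposes second-moment corruptions.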
ASSEMBLING THE ALGORITHM
Given samples that are ε-close in total variation distance to a d-dimensional Gaussian N(μ, Σ):
Step #1: Doubling trick. Pair up the samples and take scaled differences, (X₂ᵢ − X₂ᵢ₋₁)/√2 ~ N(0, Σ), which eliminates the unknown mean. Now use the algorithm for unknown covariance.
Step #2: (Agnostic) isotropic position. Apply Σ̂^{-1/2} to the samples. Now use the algorithm for unknown mean, which recovers the right distance in the general case.
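The two reductions are short enough to sketch directly (helper names are mine; the sample covariance below is only a stand-in for the robust covariance estimate the talk plugs in):

```python
# Step #1: pairwise differences remove the unknown mean.
# Step #2: whitening by the recovered covariance reduces to the
# unknown-mean, identity-covariance case.
import numpy as np

def doubling_trick(X):
    """(X_{2i} - X_{2i+1}) / sqrt(2) ~ N(0, Sigma): the mean cancels."""
    n = (len(X) // 2) * 2
    return (X[0:n:2] - X[1:n:2]) / np.sqrt(2)

def whiten(X, Sigma_hat):
    w, U = np.linalg.eigh(Sigma_hat)
    W = U @ np.diag(w ** -0.5) @ U.T
    return X @ W.T                              # approximately N(Sigma^{-1/2} mu, I)

rng = np.random.default_rng(0)
mu = np.full(5, 3.0)
A = rng.normal(size=(5, 5))
Sigma = A @ A.T + np.eye(5)
X = rng.multivariate_normal(mu, Sigma, size=20_000)

D = doubling_trick(X)                           # centered samples for the covariance step
Sigma_hat = np.cov(D, rowvar=False)             # stand-in for the robust estimate
Y = whiten(X, Sigma_hat)                        # near-identity covariance for the mean step
print(np.cov(Y, rowvar=False).round(1))         # ~ identity
```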
FURTHER RESULTS
Use restricted eigenvalue problems to detect outliers:
- Binary product distributions
- Mixtures of two c-balanced binary product distributions
- Mixtures of k spherical Gaussians
SYNTHETIC EXPERIMENTS
Error rates on synthetic data (unknown mean), +10% noise:
[Plots: excess ℓ2 error vs. dimension (100-400) for Filtering, LRVMean, sample mean w/ noise, Pruning, RANSAC, and Geometric Median.]
SYNTHETIC EXPERIMENTS
Error rates on synthetic data (unknown covariance, isotropic: Σ close to identity), +10% noise:
[Plots: excess ℓ2 error vs. dimension (20-100) for Filtering, LRVCov, sample covariance w/ noise, Pruning, and RANSAC.]
SYNTHETIC EXPERIMENTS
Error rates on synthetic data (unknown covariance, anisotropic: Σ far from identity), +10% noise:
[Plots: excess ℓ2 error vs. dimension (20-100) for the same methods.]
REAL DATA EXPERIMENTS
Famous study of [Novembre et al. '08]: take the top two singular vectors of the people × SNP matrix (POPRES).
[Scatter plot: Original Data]
"Genes Mirror Geography in Europe"
REAL DATA EXPERIMENTS
Can we find such patterns in the presence of noise? (+10% noise)
[Scatter plot: Pruning Projection. What PCA finds.]
[Scatter plot: RANSAC Projection. What RANSAC finds.]
[Scatter plot: XCS Projection. What robust PCA (via SDPs) finds.]
[Scatter plots: Filter Projection (what our methods find, with 10% noise) alongside the Original Data (no noise).]
The power of provably robust estimation: the filter's projection on noisy data closely matches the original, noise-free data.
LOOKING FORWARD
Can algorithms for agnostically learning a Gaussian help in exploratory data analysis in high-dimensions? Isn't this what we would have been doing with robust statistical estimators, if we had them all along?
Summary:
- Nearly optimal algorithm for agnostically learning a high-dimensional Gaussian
- General recipe using restricted eigenvalue problems
- Further applications to other mixture models
- Is practical, robust statistics within reach?
Thanks! Any Questions?