
High-dimensional Error Analysis of Regularized M-Estimators

Ehsan Abbasi, Christos Thrampoulidis, Babak Hassibi

Allerton Conference, Wednesday September 30, 2015


Linear Regression Model
Estimate an unknown signal x0 from noisy linear measurements y = A x0 + z:

A: measurement/design matrix

x0: unknown signal

z: noise vector


M-estimators
For some convex loss function L, solve:

x̂ = arg min_x Σᵢ L(yᵢ − aᵢᵀ x)

• Maximum Likelihood (ML) estimators
• least-squares, least-absolute deviations, Huber loss, etc.

Fisher information, consistency, asymptotic normality, Cramér-Rao bound, ML, robust statistics, Huber loss, optimal loss ...
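As a quick illustration (a sketch, not code from the talk), the classical losses named above can be written in a few lines; the Huber threshold `delta` and the grid-search location estimator are illustrative choices:

```python
import numpy as np

def squared_loss(r):
    # least-squares: rho(r) = r^2 / 2
    return 0.5 * r ** 2

def absolute_loss(r):
    # least-absolute deviations: rho(r) = |r|
    return np.abs(r)

def huber_loss(r, delta=1.0):
    # Huber: quadratic near zero, linear in the tails (robust to outliers)
    quad = 0.5 * r ** 2
    lin = delta * (np.abs(r) - 0.5 * delta)
    return np.where(np.abs(r) <= delta, quad, lin)

def m_estimate(y, rho):
    # M-estimate of location: minimize sum_i rho(y_i - x) over scalar x,
    # here by brute-force grid search (illustration only)
    grid = np.linspace(y.min(), y.max(), 2001)
    obj = np.array([rho(y - x).sum() for x in grid])
    return grid[np.argmin(obj)]
```

With the squared loss the minimizer is the sample mean; with the absolute loss it is a median, which is what makes LAD robust.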


Why revisit & what changes?

• Modern: the ambient dimension n is increasingly large (machine learning, image processing, sensor/social networks, DNA microarrays, ...)

• Structured signals: sparse, low-rank, block-sparse, slowly varying, ...

Regularized M-estimators

• Compressive sensing:

• Traditional: the number of measurements grows, but the ambient dimension n is fixed

• The regularizer is structure-inducing, convex, and typically non-smooth: L1, nuclear, L1/L2 norms, total variation, atomic norms, ...
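To make this concrete, here is a minimal sketch (not from the talk) of one regularized M-estimator, the LASSO (squared loss plus L1 regularizer), solved by proximal gradient descent (ISTA); the step size and iteration count are illustrative choices:

```python
import numpy as np

def soft_threshold(v, t):
    # proximal operator of t * ||.||_1
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, y, lam, n_iter=2000):
    # minimize 0.5 * ||y - A x||_2^2 + lam * ||x||_1 via proximal gradient (ISTA)
    step = 1.0 / np.linalg.norm(A, 2) ** 2  # 1 / Lipschitz constant of the smooth part
    x = np.zeros(A.shape[1])
    for _ in range(n_iter):
        grad = A.T @ (A @ x - y)            # gradient of the smooth (loss) term
        x = soft_threshold(x - step * grad, step * lam)
    return x
```

Swapping the squared loss for another convex loss, or the L1 norm for another structure-inducing regularizer, changes only the gradient and the proximal step.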


Classical question - Modern regime: New results & phenomena

• High-dimensional proportional regime: m, n → ∞ with m/n fixed


• The question goes back to the '50s (Huber, Kolmogorov, ...)
• Only very recent advances: special instances, strict assumptions
• No general theory!

Assumption: A has entries i.i.d. Gaussian

• benchmark in CS/statistics theory• universality


Contribution

• Assume m, n → ∞ at a proportional rate

• A has entries i.i.d. Gaussian

• mild regularity conditions on the loss, the regularizer f, and the densities pz and px0

Then, with probability one, the squared error converges to a deterministic limit, given by the unique solution to a system of four nonlinear equations in four unknowns:


The Equations

Let’s parse them,to get some insight …


The Explicit ones

Some of the problem parameters appear in the equations explicitly (the rest enter through expected Moreau envelopes).


The Loss and the Regularizer

The loss function and the regularizer appear through their Moreau envelope approximations.

In the traditional regime, the functions themselves appear instead of their Moreau envelopes.


The Distributions

The convolution of the noise pdf with a Gaussian is a completely new phenomenon compared to the traditional regime.
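As a numerical illustration (the Laplace noise density and the grid below are illustrative choices, not from the talk), such a convolution can be computed directly on a grid:

```python
import numpy as np

# common grid for both densities
t = np.linspace(-20.0, 20.0, 4001)
dt = t[1] - t[0]

laplace_pdf = 0.5 * np.exp(-np.abs(t))              # noise density p_z (Laplace)
gauss_pdf = np.exp(-t ** 2 / 2) / np.sqrt(2 * np.pi)  # standard Gaussian density

# (p_z convolved with the Gaussian), evaluated on the same grid
conv = np.convolve(laplace_pdf, gauss_pdf, mode="same") * dt
```

The result is again a density: it integrates to one, peaks at zero, and keeps tails heavier than the Gaussian's.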


The Expected Moreau Envelope
• The roles of the loss and the regularizer are summarized in their expected Moreau envelopes

• how they affect the error performance of the M-estimator
• (strictly) convex and continuously differentiable

even if the underlying function is non-differentiable!

• generalizes the "Gaussian width", "Gaussian distance squared", and "statistical dimension"

• the same holds for both the loss and the regularizer


Reminder: Moreau Envelopes
The Moreau-Yosida envelope of f, evaluated at x with parameter τ:

f_τ(x) = min_v { f(v) + (1/(2τ)) ||x − v||² }

• always underestimates f at x; the smaller the τ, the closer to f

• smooth approximation: always continuously differentiable in both x and τ (even if f is non-differentiable)
• jointly convex in x and τ

• the optimal v is unique (the proximal operator)

• everything extends to vector-valued function f
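A small numerical check of these properties for f(x) = |x|, whose Moreau envelope is the Huber function and whose proximal operator is soft-thresholding (standard facts, used here only as an illustration; the grid-based minimization is an assumed sketch):

```python
import numpy as np

def moreau_envelope(f, x, tau, grid):
    # f_tau(x) = min_v { f(v) + (x - v)^2 / (2 * tau) }, minimized over a grid
    vals = f(grid) + (x - grid) ** 2 / (2 * tau)
    i = np.argmin(vals)
    return vals[i], grid[i]  # envelope value, and the proximal point prox_{tau f}(x)

def huber(x, tau):
    # closed-form Moreau envelope of f = |.|
    return x ** 2 / (2 * tau) if abs(x) <= tau else abs(x) - tau / 2

grid = np.linspace(-5.0, 5.0, 100001)
val, prox = moreau_envelope(np.abs, 2.0, 1.0, grid)
```

The numerical envelope matches the Huber formula, the minimizer is the soft-thresholded point, and shrinking τ moves the envelope toward f itself.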


Examples


Set Indicator Function

Gaussian width
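The Gaussian width w(S) = E[ sup_{v in S} ⟨g, v⟩ ], with g standard normal, can be estimated by Monte Carlo; the sets below (unit L1 and L2 balls, where the supremum has a closed form) and the sample size are illustrative choices, not from the talk:

```python
import numpy as np

def gaussian_width_mc(sup_fn, n, n_samples=20000, seed=0):
    # Monte Carlo estimate of w(S) = E[ sup_{v in S} <g, v> ], g ~ N(0, I_n)
    rng = np.random.default_rng(seed)
    G = rng.standard_normal((n_samples, n))
    return float(np.mean([sup_fn(g) for g in G]))

n = 100
# unit L1 ball: sup_{||v||_1 <= 1} <g, v> = ||g||_inf, roughly sqrt(2 log n)
w_l1 = gaussian_width_mc(lambda g: np.abs(g).max(), n)
# unit L2 ball: sup_{||v||_2 <= 1} <g, v> = ||g||_2, roughly sqrt(n)
w_l2 = gaussian_width_mc(np.linalg.norm, n)
```

The gap between the two widths is the usual geometric explanation of why L1-structured sets need far fewer measurements than the full ball.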


Summarizing Key Features

• Squared error of general regularized M-estimators
• Minimal and generic regularity assumptions
– non-smooth losses, heavy tails, non-separable, ...
• Key role of expected Moreau envelopes
– strictly convex and smooth
– generalize known geometric summary parameters

• Observation: fast solution by simple iterative scheme!


Simulations

Optimal tuning?


Non-smooth losses


Non-smooth losses

Optimal loss?


Non-smooth losses

Consistent Estimators?


Heavy-tailed noise
• Huber loss function + i.i.d. Cauchy noise. Robustness?


Non-separable loss

Square-root LASSO
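A minimal sketch (illustrative code, not from the talk) contrasting the non-separable square-root-LASSO loss with the separable squared loss of the ordinary LASSO:

```python
import numpy as np

def sqrt_lasso_obj(A, y, x, lam):
    # square-root LASSO: ||y - A x||_2 + lam * ||x||_1
    # the loss ||r||_2 couples all residuals: it is NOT a sum of per-coordinate terms
    return np.linalg.norm(y - A @ x) + lam * np.abs(x).sum()

def lasso_obj(A, y, x, lam):
    # ordinary LASSO: ||y - A x||_2^2 + lam * ||x||_1
    # the squared loss IS separable: sum_i r_i^2
    r = y - A @ x
    return np.sum(r ** 2) + lam * np.abs(x).sum()
```

A known motivation for the square-root LASSO is that a good choice of λ does not require knowing the noise level, unlike the ordinary LASSO.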


Beyond Gaussian Designs

• The analysis framework directly applies to elliptically distributed designs
• For the LASSO, we have extended the ideas to IRO matrices

• Universality over i.i.d. entries (empirical observation): modified equations


Convex Gaussian Min-max Theorem

Apply the CGMT to the M-estimator:

(PO) the Primal Optimization problem

(AO) the Auxiliary Optimization problem

Theorem (CGMT) [TAH'15, TOH'15]


Proof Diagram

M-estimator (PO)
  | CGMT
  v
(AO)
  | duality
  v
(DO) deterministic min-max: optimization in 4 variables
  | first-order optimality conditions
  v
The Equations


Related Literature

• [El Karoui 2013, 2015]
– ridge regularization, smooth loss, no structured x0
– elliptical distributions
– i.i.d. entries beyond Gaussian

• [Donoho, Montanari 2013]
– no regularizer
– smooth + strongly convex loss, bounded noise


Conclusions
• Master Theorem for general M-estimators

– Minimal assumptions
– 4 nonlinear equations, unique solution, fast iterative solution (why?)
– Summary parameters: expected Moreau envelopes

• Opportunities, lots to be asked ...
– Optimal loss function? Optimal regularizer?
– When can we be consistent?
– Optimal tuning of the tuning parameter?

LASSO: Linear = Non-linear[TAH’15 NIPS]

• The CGMT framework is powerful
– non-linear measurements, y = g(A x0)

• Beyond squared-error analysis: apply the CGMT for a different set S ... [TAYH'15 ICASSP]

Thank You!