Sketched Learning from Random Features Moments
Nicolas Keriven
École Normale Supérieure (Paris)
CFM-ENS chair in Data Science
(thesis with Rémi Gribonval at Inria Rennes)
ISMP, July 6th 2018
Compressive learning
Nicolas Keriven
[Diagram: data → compression into a linear sketch → learning]
• Sketched learning: first compress the data into a linear sketch [Cormode 2011], then learn
• Hash tables, count sketches, histograms…
• Advantages: one-pass, streaming, distributed compression, data privacy…
• In this talk: unsupervised learning
1/15
How-to: build a sketch
Nicolas Keriven
What is a sketch?
Any linear sketch = empirical moments: z = (1/n) Σᵢ Φ(xᵢ)
What is contained in a sketch?
• Φ(x) = x: the mean
• Φ(x) = x^⊗p: a moment of order p
• Φ(x) = indicator functions: a histogram
• Proposed: kernel random features [Rahimi 2007] (random projection + non-linearity)
Questions:
• What information is preserved by the sketching?
• How to retrieve this information?
• What is a sufficient number of features?
Intuition: sketching as a linear embedding
- Assumption: the data are drawn i.i.d. from a distribution π
- Linear operator: A(π) = E_{x~π} Φ(x)
- "Noisy" linear measurement: ẑ = A(π) + e, with the noise e small for large n
Dimensionality-reducing, random, linear embedding: Compressive Sensing?
2/15
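A minimal illustration of such a sketch in Python, assuming plain random Fourier features Φ(x) = exp(iωᵀx) with Gaussian frequencies (the exact features in the talk differ, e.g. the "adjusted" variants below):

```python
import numpy as np

def random_fourier_sketch(X, m, sigma=1.0, seed=None):
    """Sketch a dataset into m empirical moments of random Fourier features.

    The sketch is z = (1/n) * sum_i Phi(x_i) with Phi(x) = exp(i * Omega @ x):
    one m-dimensional complex vector, whatever the number of samples n.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    # Frequencies drawn i.i.d. Gaussian: random features for a Gaussian
    # kernel of bandwidth sigma [Rahimi 2007].
    Omega = rng.normal(scale=1.0 / sigma, size=(m, d))
    return np.exp(1j * X @ Omega.T).mean(axis=0), Omega

# Linearity in the empirical distribution: sketches of equal-size chunks
# average into the sketch of the full dataset (one-pass, streaming,
# distributed compression).
X = np.random.default_rng(0).normal(size=(10_000, 2))
z1, _ = random_fourier_sketch(X[:5_000], m=100, seed=42)
z2, _ = random_fourier_sketch(X[5_000:], m=100, seed=42)
z, _ = random_fourier_sketch(X, m=100, seed=42)
assert np.allclose((z1 + z2) / 2, z)
```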
Example of applications [Keriven 2016, 2017]
Nicolas Keriven
Retrieving mixtures of Diracs from a sketch = k-means
Application: spectral clustering for MNIST classification
- Twice as fast as k-means
- 4 orders of magnitude more memory-efficient
Retrieving GMMs from a sketch
Application: speaker verification [Reynolds 2000]
Error rates:
• EM on 300 000 samples: 29.53
• 20 kB sketch computed on a 50 GB database: 28.96
3/15
In this talk
Nicolas Keriven
Q: Theoretical guarantees?
• Inspired by Compressive Sensing:
• 1: with the Restricted Isometry Property (RIP)
• 2: with dual certificates
4/15
Outline
Nicolas Keriven
Information-preservation guarantees: a RIP analysis
Joint work with R. Gribonval, G. Blanchard, Y. Traonmilin
Total variation regularization:a dual certificate analysis
Conclusion, outlooks
Recall: Linear inverse problem
Nicolas Keriven
True distribution: π
Sketch: ẑ = A(π) + e
• Estimation problem = linear inverse problem on measures
• Extremely ill-posed!
• Feasibility? (information preservation)
Best possible algorithm?
5/15
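In display form (a hedged transcription: the equations on the slide were images, and the notation follows [Gribonval et al. 2017]):

```latex
\hat z \;=\; \frac{1}{n}\sum_{i=1}^{n}\Phi(x_i)
       \;=\; \mathcal{A}(\pi) + e,
\qquad
\mathcal{A}(\pi) := \mathbb{E}_{x\sim\pi}\,\Phi(x),
```

so estimating π from ẑ is a linear inverse problem on the space of probability measures, with noise e = ẑ − A(π) that concentrates around 0 as n grows.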
Information preservation guarantees
Nicolas Keriven
Model set of "simple" distributions (e.g. GMMs)
Goal: prove the existence of a decoder robust to noise and stable to modeling error (an "instance-optimal" decoder)
Tools: the Lower Restricted Isometry Property (LRIP) and a Generalized Method of Moments decoder
New goal: find/construct models and operators that satisfy the LRIP (w.h.p.)
6/15
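The two ingredients in display form (a sketch of the definitions as stated in [Gribonval et al. 2017]; ‖·‖_L denotes the task-dependent learning metric and 𝔖 the model set):

```latex
% Lower RIP on the model set:
\forall \pi,\pi' \in \mathfrak{S}:\quad
\|\pi - \pi'\|_{\mathcal{L}} \;\le\; C\,\|\mathcal{A}(\pi)-\mathcal{A}(\pi')\|_2 .

% It implies instance optimality of the moment-matching decoder:
\Delta(\hat z) \in \operatorname*{arg\,min}_{\tau \in \mathfrak{S}}
   \|\mathcal{A}(\tau)-\hat z\|_2,
\qquad
\|\pi - \Delta(\mathcal{A}(\pi)+e)\|_{\mathcal{L}}
   \;\lesssim\; d(\pi,\mathfrak{S}) + \|e\|_2 .
```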
Proving the LRIP
Nicolas Keriven
Goal: LRIP
Construction of the sketch:
- Kernel mean embedding [Gretton 2006, Borgwardt 2006]
- Random features [Rahimi 2007]
Pointwise LRIP
Extension to the full LRIP: covering numbers (compactness) of the normalized secant set,
a subset of a unit ball (in infinite dimension) that only depends on the model set
7/15
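For reference, the normalized secant set of the model (standard definition, assumed here):

```latex
\mathcal{S}_{\mathcal{L}}(\mathfrak{S}) \;=\;
\left\{ \frac{\pi-\pi'}{\|\pi-\pi'\|_{\mathcal{L}}}
        \;:\; \pi,\pi' \in \mathfrak{S},\ \pi \neq \pi' \right\} .
```

The LRIP holds uniformly exactly when ‖Aμ‖₂ is bounded below on this set; a pointwise concentration bound plus finite covering numbers extends the bound to the whole set w.h.p.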
Main result [Keriven 2016]
Nicolas Keriven
Main hypothesis
The normalized secant set has finite covering numbers.
- Classic Compressive Sensing (finite dimension): known
- Here (infinite dimension): technical
Result
For m larger than a pointwise-concentration factor times the dimensionality of the model (measured by covering numbers), w.h.p. the sketch satisfies the LRIP and the decoder error is bounded by a modeling error term plus an empirical noise term.
8/15
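The shape of the statement, with constants and exponents deferred to [Keriven 2016] (an assumed paraphrase, since the displayed bound on the slide was an image):

```latex
m \;\gtrsim\; \log \mathcal{N}\big(\mathcal{S}_{\mathcal{L}}(\mathfrak{S}),\,\cdot\,\big)
\;\Longrightarrow\; \text{w.h.p.}\quad
\|\pi - \Delta(\hat z)\|_{\mathcal{L}}
\;\lesssim\;
\underbrace{d(\pi,\mathfrak{S})}_{\text{modeling error}}
\;+\;
\underbrace{\|\hat z - \mathcal{A}(\pi)\|_2}_{\text{empirical noise}} .
```

The modeling error term involves no assumption on the data; the empirical noise term decays as n grows.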
Application
Nicolas Keriven

k-means with mixtures of Diracs
- Hypotheses: separated centroids; bounded domain for the centroids (no assumption on the data)
- Sketch: adjusted random Fourier features (for technical reasons)
- Result: w.r.t. the usual k-means cost (SSE)
- Sketch size

GMM with known covariance
- Hypotheses: sufficiently separated means; bounded domain for the means
- Sketch: Fourier features
- Result: with respect to the log-likelihood
- Sketch size

Compared to the Generalized Method of Moments: different guarantees
9/15
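As a toy illustration of decoding a mixture of Diracs (not the CL-OMP algorithm of [Keriven 2016]: just naive non-convex moment matching on the random Fourier sketch above, with uniform weights assumed):

```python
import numpy as np
from scipy.optimize import minimize

def decode_diracs(z, Omega, k, d, restarts=10, seed=0):
    """Fit k equal-weight Diracs (centroids) to a sketch z by moment matching.

    Minimizes || (1/k) * sum_j exp(i * Omega @ c_j) - z ||^2 over the
    centroids c_1..c_k.  Non-convex, hence the random restarts.
    """
    rng = np.random.default_rng(seed)

    def objective(theta):
        C = theta.reshape(k, d)                        # candidate centroids
        model = np.exp(1j * C @ Omega.T).mean(axis=0)  # sketch of the mixture
        return np.sum(np.abs(model - z) ** 2)

    best = None
    for _ in range(restarts):
        res = minimize(objective, rng.normal(size=k * d), method="L-BFGS-B")
        if best is None or res.fun < best.fun:
            best = res
    return best.x.reshape(k, d)
```

With z and Omega produced by random_fourier_sketch on clustered data, the returned centroids play the role of the k-means solution; the decoder only ever touches the sketch, never the data.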
Outline
Nicolas Keriven
Information-preservation guarantees: a RIP analysis
Total variation regularization: a dual certificate analysis
Joint work with C. Poon, G. Peyré
Conclusion, outlooks
Total Variation regularization
Nicolas Keriven
Previously: RIP analysis
Minimization: moment matching
• Must know the number of components
• Non-convex!
Convex relaxation ("super-resolution"): the Beurling-LASSO (BLASSO) [De Castro 2015]
• Optimize over Radon measures
• Penalize the total variation norm (the "L1 norm" for measures)
Questions:
• Is the recovered measure sparse?
• Does it have the right number of components?
• Does it recover the true components?
11/15
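In display form (following the BLASSO of [De Castro 2015]; the regularization parameter λ and the notation are assumptions of this transcription):

```latex
\min_{\mu \in \mathcal{M}(\Theta)}\;
\frac{1}{2}\,\big\|\hat z - \mathcal{A}\mu\big\|_2^2
\;+\; \lambda\,|\mu|(\Theta),
\qquad
\mathcal{A}\mu := \int_{\Theta} \Phi(\theta)\,\mathrm{d}\mu(\theta),
```

where M(Θ) is the space of Radon measures and |μ|(Θ) the total variation norm, which promotes sparse (discrete) solutions just as the L1 norm does in finite dimension.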
Dual certificates
Nicolas Keriven
Dual certificate analysis: find a function η = A*p (p = Lagrange multiplier) such that:
• η saturates at the support points of the true measure
• |η| < 1 otherwise
Step 1: study the full kernel [Candes 2013], assuming sufficiently separated components
Step 2: bound the deviations
[Plots of the certificate for m = 10, 20, 50 features]
12/15
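The standard conditions, written out (an assumed transcription; the aᵢ are the signed amplitudes of the true k-sparse measure Σᵢ aᵢ δ_{θᵢ}):

```latex
\eta = \mathcal{A}^* p \ \ (p:\ \text{Lagrange multiplier}),
\qquad
\eta(\theta_i) = \operatorname{sign}(a_i)\ \ \forall i,
\qquad
|\eta(\theta)| < 1 \ \ \text{for all other } \theta .
```

The existence of such an η certifies that the true measure solves the BLASSO; with m random features, η deviates from the full-kernel certificate, and step 2 controls these deviations.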
Results for separated GMM
Nicolas Keriven
1: Ideal scaling in sparsity (in progress…)
• Not necessarily the right number of components, but:
• The mass of the solution concentrates around the true components
• (Weak) robustness to modeling error
• Proof: infinite-dimensional golfing scheme (new)
2: Minimal norm certificate [Duval, Peyré 2015] (in progress…)
• Assumption: data are actually drawn from a GMM…
• When n is high enough: sparse, with the right number of components
• Proof: adaptation of [Tang, Recht 2013]
13/15
Outline
Nicolas Keriven
Information-preservation guarantees: a RIP analysis
Total variation regularization:a dual certificate analysis
Conclusion, outlooks
Sketch learning
Nicolas Keriven
• Sketching: streaming, distributed learning
• Original view on data compression and generalized moments
• Combines random features and kernel mean embeddings with infinite-dimensional Compressive Sensing
14/15
Summary, outlooks
Nicolas Keriven
• RIP analysis
• Information preservation guarantees
• Fine control on noise, modeling error (instance-optimal decoder) and recovery metrics
• Necessary and sufficient conditions
• Dual certificate analysis
• Convex minimization
• In some cases, automatically recovers the right number of components
• Outlooks
• Algorithms for TV minimization
• Other features (not necessarily random…)
• Other "sketched" learning tasks
• Multilayer sketches?
15/15
Thank you!
Nicolas Keriven
• Gribonval, Blanchard, Keriven, Traonmilin. Compressive Statistical Learning with Random Feature Moments. 2017. arXiv:1706.07180
• Keriven. Sketching for Large-Scale Learning of Mixture Models. PhD thesis. tel-01620815
• Poon, Keriven, Peyré. A Dual Certificates Analysis of Compressive Off-the-Grid Recovery. 2018. arXiv:1802.08464
• Code, applications: nkeriven.github.io