
Compressed Sensing and Neural Networks

Jan Vybíral

(Charles University & Czech Technical University, Prague, Czech Republic)

NOMAD Summer, Berlin, September 25-29, 2017



Outline

Lasso & Compressed Sensing
- Least squares & Regularization
- Convexity, P vs. NP
- Sparsity & ℓ1-minimization
- Compressed Sensing

Neural Networks
- Introduction
- Notation
- Training the network
- Applications



Part I: Lasso & Compressed Sensing
- Least squares & Regularization
- Convexity, P vs. NP
- Sparsity & ℓ1-minimization
- Compressed Sensing



Least squares

Fitting a cloud of points by a linear hyperplane

Considered already by Gauss and Legendre around 1800

In 2D: a straight line fitted through a cloud of points (figure)



Objects (=points) described by Ω real numbers:

d_1 = (d_{1,1}, ..., d_{1,Ω}) ∈ R^Ω
...
d_N = (d_{N,1}, ..., d_{N,Ω}) ∈ R^Ω

N – the number of objects; D – the N × Ω matrix with rows d_1, ..., d_N

P = (P_1, ..., P_N) are the properties of interest

We look for a linear dependence P = f(d) with a linear f, i.e.

P_i = Σ_{j=1}^{Ω} c_j d_{i,j},   or   P = Dc



The solution is found by minimizing the least-square error:

c = arg min_{c ∈ R^Ω} Σ_{i=1}^{N} ( P_i − Σ_{j=1}^{Ω} c_j d_{i,j} )^2 = arg min_{c ∈ R^Ω} ‖P − Dc‖_2^2

- A closed-form formula exists
- The objective function is convex
- The resulting c generically has all coordinates occupied (non-zero)
- An absolute (intercept) term is incorporated by an additional column full of ones
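
A minimal numerical sketch of this fit (assuming NumPy; the data matrix D and the property vector P below are random placeholders, not data from the talk). The intercept is handled, as described above, by appending a column of ones:

```python
import numpy as np

rng = np.random.default_rng(0)
N, Omega = 50, 3                      # 50 objects, 3 features (illustrative sizes)
D = rng.normal(size=(N, Omega))       # data matrix with rows d_1, ..., d_N
P = rng.normal(size=N)                # properties of interest

# Append a column of ones so that an absolute (intercept) term is fitted as well.
D1 = np.hstack([D, np.ones((N, 1))])

# Closed-form least-squares solution of min_c ||P - D1 c||_2^2
c, residuals, rank, _ = np.linalg.lstsq(D1, P, rcond=None)
print("coefficients:", c[:-1], "intercept:", c[-1])
```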



Regularization

How can we include prior knowledge about c?

Say we prefer a linear fit with small coefficients. We simply weight the error of the fit against the size of the coefficients!

λ > 0 – the regularization parameter

c = arg min_{c ∈ R^Ω} ‖P − Dc‖_2^2 + λ‖c‖_2^2

- λ → 0: least squares
- λ → ∞: c = 0
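
The regularized problem also has a closed form, c = (Dᵀ D + λ I)⁻¹ Dᵀ P, obtained from the normal equations. A small sketch under the same placeholder setup as above (NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
N, Omega = 50, 3
D = rng.normal(size=(N, Omega))       # placeholder data
P = rng.normal(size=N)

def ridge(D, P, lam):
    """Minimizer of ||P - D c||_2^2 + lam * ||c||_2^2 via the normal equations."""
    return np.linalg.solve(D.T @ D + lam * np.eye(D.shape[1]), D.T @ P)

print(ridge(D, P, lam=1e-8))          # lam -> 0: essentially plain least squares
print(ridge(D, P, lam=1e6))           # large lam: coefficients pushed towards 0
```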


Tractability

Convexity

- The minimizer is unique
- A local minimum of a convex function is also a global one
- Many effective methods exist (convex optimization)

P vs. NP

- P problems: solvable in polynomial time (as a function of the size of the input)
- NP problems: a solution is verifiable in polynomial time; P ⊂ NP
- One-million-dollar problem: P = NP?
- Computational complexity


Sparsity

If Ω is large (especially Ω ≫ N), we are often interested in "selecting features", i.e. in c with many coordinates equal to zero.

‖c‖_0 := #{i : c_i ≠ 0} – the number of non-zero coordinates of c

Looking for a linear fit using only two features:

c = arg min_{c ∈ R^Ω, ‖c‖_0 ≤ 2} ‖P − Dc‖_2^2

Regularized version:

c = arg min_{c ∈ R^Ω} ‖P − Dc‖_2^2 + λ‖c‖_0

NP-hard!
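
To see where the combinatorial difficulty comes from, here is a brute-force sketch of the two-feature problem above: it enumerates all pairs of features and solves a small least-squares problem for each, which becomes infeasible once Ω or the sparsity level grows. Sizes and data are illustrative placeholders (NumPy assumed):

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(0)
N, Omega, s = 40, 10, 2
D = rng.normal(size=(N, Omega))       # placeholder data
P = rng.normal(size=N)

best_err, best_support, best_c = np.inf, None, None
for support in combinations(range(Omega), s):   # C(Omega, s) subsets: exponential in general
    cols = list(support)
    c_sub, *_ = np.linalg.lstsq(D[:, cols], P, rcond=None)
    err = np.sum((P - D[:, cols] @ c_sub) ** 2)
    if err < best_err:
        best_err, best_support, best_c = err, support, c_sub
print("best support:", best_support, "coefficients:", best_c, "error:", best_err)
```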


ℓ1-minimization

Other ways to measure the size of c: the ℓp-norms

‖c‖_p = ( Σ_{j=1}^{Ω} |c_j|^p )^{1/p}

- Unit balls of ℓ_p in R^2 (figure)
- p = ∞: ‖c‖_∞ = max_{j=1,...,Ω} |c_j|
- p ≥ 1 – convex problem
- p ≤ 1 – promotes sparsity


p ≤ 1 promotes sparsity:

Solution of S_p = arg min_{z ∈ R^2} ‖z‖_p   s.t.   Az = y,   for p = 1 and p = 2 (figure)


Take p = 1 (Lasso – Tibshirani, 1996):

c = arg min_{c ∈ R^Ω} ‖P − Dc‖_2^2 + λ‖c‖_1

- Chen, Donoho, Saunders: Basis pursuit (1998)
- λ → 0: least squares
- λ → ∞: c = 0
- In between: λ selects the sparsity
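
A small Lasso sketch using scikit-learn (an assumption of this transcript, not a tool mentioned in the talk). Note that scikit-learn's Lasso scales the data-fit term by 1/(2N), so its alpha corresponds to the λ above only up to that factor; the data are synthetic placeholders with two relevant features:

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(0)
N, Omega = 40, 20
D = rng.normal(size=(N, Omega))
c_true = np.zeros(Omega)
c_true[[2, 7]] = [1.5, -2.0]                       # only two features matter
P = D @ c_true + 0.01 * rng.normal(size=N)

for alpha in (1e-4, 1e-1, 1e2):                    # small, moderate, large regularization
    c_hat = Lasso(alpha=alpha, fit_intercept=False).fit(D, P).coef_
    # the printed support shrinks as alpha grows
    print(alpha, np.flatnonzero(np.abs(c_hat) > 1e-8))
```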


Effect of λ > 0 on the support of the coefficient vector ω (figure)


Compressed Sensing (aka Compressive Sensing, Compressive Sampling)

Theorem: Let D ∈ R^{N×Ω} have independent Gaussian entries. Let 0 < ε < 1, let s be a natural number, and let

N ≥ C ( s log(Ω) + log(1/ε) ),   C a universal constant.

If c ∈ R^Ω is s-sparse, P = Dc, and ĉ is the minimizer of

ĉ = arg min_{u ∈ R^Ω} ‖u‖_1   s.t.   P = Du,

then ĉ = c with probability at least 1 − ε.
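
The equality-constrained ℓ1 minimization in the theorem is a linear program: splitting u = u⁺ − u⁻ with u⁺, u⁻ ≥ 0 makes ‖u‖_1 a linear objective. A sketch assuming SciPy, with illustrative sizes chosen roughly in the regime N ≈ C s log Ω:

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
Omega, s, N = 200, 5, 60                      # ambient dimension, sparsity, measurements
D = rng.normal(size=(N, Omega))               # Gaussian measurement matrix
c = np.zeros(Omega)
c[rng.choice(Omega, s, replace=False)] = rng.normal(size=s)   # s-sparse vector
P = D @ c

# min sum(u_plus + u_minus)  s.t.  D (u_plus - u_minus) = P,  u_plus, u_minus >= 0
cost = np.ones(2 * Omega)
A_eq = np.hstack([D, -D])
res = linprog(cost, A_eq=A_eq, b_eq=P, bounds=[(0, None)] * (2 * Omega))
c_hat = res.x[:Omega] - res.x[Omega:]
print("maximal recovery error:", np.max(np.abs(c_hat - c)))
```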


- Candès, Romberg, Tao (2006); Donoho (2006)
- Extensive theory of recovery of sparse vectors from linear measurements
- Optimal conditions on the number of measurements (i.e. data points): N ≈ C s log Ω
- Only true if most of the features (i.e. the columns of D) are incoherent with the majority of the others (if two features are very similar, it is difficult to distinguish between them)
- H. Boche, R. Calderbank, G. Kutyniok, J. V., A Survey of Compressed Sensing, first chapter in Compressed Sensing and its Applications, Birkhäuser/Springer, 2015


Dictionaries

Real-life signals are (almost) never sparse in the canonical basis of R^Ω; more often they are sparse in some orthonormal basis, i.e.

x = Bc,

where c ∈ R^Ω is sparse and the columns (and rows) of B ∈ R^{Ω×Ω} are orthonormal vectors – wavelets, the Fourier basis, etc.

Compressed sensing then applies without any essential change! Just replace D with DB ... i.e. you rotate the problem ...


Even more often, the signal is represented in an overcomplete dictionary/lexicon:

x = Lc,

where c ∈ R^ℓ is sparse and L ∈ R^{Ω×ℓ} is the dictionary/lexicon – its columns form an overcomplete system (ℓ > Ω).

x is a sparse combination of non-orthogonal vectors – the columns of L.

Examples: unions of two or more orthonormal bases, each capturing different features


- Compressed sensing can also be adapted to this situation
- Optimization: x = arg min_{u ∈ R^Ω} ‖L*u‖_1   s.t.   P = Du
- We do not recover the (non-unique!) sparse coefficients c, but (an approximation of) the signal x
- The error bound involves L*x and is reasonably small, for example, when L*L is nearly diagonal, i.e. when not too many features in the dictionary are too correlated


ℓ1-based optimization

- ℓ1-SVM: Support vector machines are a standard tool for classification problems. An ℓ1 penalty term leads to sparse classifiers.
- Nuclear norm: Minimizing the nuclear norm (= the sum of the singular values) of a matrix leads to low-rank matrices.
- TV (= total variation) norm: Minimizing Σ_{i,j} |u_{i,j+1} − u_{i,j}| over images u gives images with edges and flat parts.
- L1: Minimizing the L1-norm (= the integral of the absolute value) of a function leads to functions with small support.
- TV-norm of f: Minimizing ∫ |∇f| leads to functions with jumps along curves.


Part II: Neural Networks
- Introduction
- Notation
- Training the network
- Applications


Neural Networks

W. McCulloch, W. Pitts (1943). Motivated by biological research on the human brain and neurons.

A neural network is a partially connected graph of nodes. The nodes represent neurons; oriented connections between the nodes represent the transfer of the outputs of some neurons to the inputs of other neurons.


- In the 1970's and 1980's a number of obstacles appeared – insufficient computer power to train large neural networks, theoretical problems with processing exclusive-or, etc.
- Support vector machines (and other simpler algorithms) took over the field of machine learning
- 2010's: algorithmic advances and higher computational power allowed training large neural networks to human (and superhuman) performance in pattern recognition
- Large neural networks (a.k.a. deep learning) are used successfully in many tasks


Neural Networks: Artificial Neuron

An artificial neuron gets activated if a linear combination of its inputs grows over a certain threshold:

- Inputs x = (x_1, ..., x_n) ∈ R^n
- Weights w = (w_1, ..., w_n) ∈ R^n
- Compare ⟨w, x⟩ with a threshold b ∈ R
- Plug the result into the "activation function" – a jump (or smoothed-jump) function σ

An artificial neuron is thus the function

x → σ(⟨x, w⟩ − b),

where σ : R → R might be σ(x) = sgn(x) or σ(x) = e^x/(1 + e^x), etc.
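
The same neuron written out as code, directly following the formula above and using the smoothed jump σ(x) = e^x/(1 + e^x) (NumPy assumed; the input, weights and threshold are placeholder values):

```python
import numpy as np

def sigma(t):
    """Smoothed jump ('logistic') activation: sigma(t) = e^t / (1 + e^t)."""
    return 1.0 / (1.0 + np.exp(-t))

def neuron(x, w, b):
    """Artificial neuron: x -> sigma(<x, w> - b)."""
    return sigma(np.dot(x, w) - b)

x = np.array([0.5, -1.0, 2.0])   # inputs
w = np.array([1.0, 0.3, -0.2])   # weights
b = 0.1                          # threshold
print(neuron(x, w, b))
```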


Neural Networks: Layers

An artificial neural network is a directed, acyclic graph of artificial neurons. The neurons are grouped into layers by their distance to the input.


- Input: x = (x_1, ..., x_n) ∈ R^n
- First layer of neurons: y_1 = σ(⟨x, w^1_1⟩ − b^1_1), ..., y_{n_1} = σ(⟨x, w^1_{n_1}⟩ − b^1_{n_1})
- The outputs y = (y_1, ..., y_{n_1}) become the inputs for the next layer, and so on; the last layer outputs y ∈ R
- Training the network: given inputs x^1, ..., x^N and outputs y^1, ..., y^N, optimize over the weights w and thresholds b
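
A minimal sketch of the forward pass through such a network with one hidden layer of n_1 neurons and a single output neuron (taking the last layer linear is an assumption made here for illustration; NumPy assumed, all sizes and parameters are random placeholders):

```python
import numpy as np

def sigma(t):
    return 1.0 / (1.0 + np.exp(-t))

def forward(x, W1, b1, w2, b2):
    """First layer: y_k = sigma(<x, w^1_k> - b^1_k); last layer: a single real output."""
    y = sigma(W1 @ x - b1)        # W1 holds the weight vectors w^1_k as rows
    return np.dot(w2, y) - b2     # linear output neuron

rng = np.random.default_rng(0)
n, n1 = 4, 6                      # input dimension, neurons in the first layer
W1, b1 = rng.normal(size=(n1, n)), rng.normal(size=n1)
w2, b2 = rng.normal(size=n1), 0.0
x = rng.normal(size=n)
print(forward(x, W1, b1, w2, b2))
```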


Neural Networks: Training

- The parameters p of the network are initialized (for example randomly) ⟹ N_p
- For a set of input/output pairs (x^i, y^i) we calculate the output of the neural network with the current parameters ⟹ z^i = N_p(x^i)
- In an optimal case, z^i = y^i for all inputs
- Update the parameters of the neural network to minimize/decrease the loss function, i.e. Σ_i |y^i − z^i|^2
- ... and repeat ...


- Non-convex minimization over a huge space!
- A huge number of local minimizers exist
- Initialization of the minimization algorithm is important
- Backpropagation algorithm: the error at the output is redistributed to the neurons of the last hidden layer, then to the previous one, etc.
- The error is distributed back through the network and used to update the parameters of each neuron by a gradient-descent method
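
A toy sketch of the whole training loop for a one-hidden-layer network: the gradient of the loss Σ_i |y^i − z^i|^2 is computed by backpropagation (the chain rule applied layer by layer) and the parameters are updated by gradient descent. The network size, the synthetic data, the learning rate and the number of iterations are all illustrative assumptions:

```python
import numpy as np

def sigma(t):
    return 1.0 / (1.0 + np.exp(-t))

rng = np.random.default_rng(0)
n, n1, N = 3, 8, 200
X = rng.normal(size=(N, n))                  # placeholder inputs x^1, ..., x^N
Y = np.sin(X[:, 0]) + 0.5 * X[:, 1]          # placeholder target outputs y^1, ..., y^N

W1, b1 = rng.normal(size=(n1, n)), np.zeros(n1)   # random initialization
w2, b2 = rng.normal(size=n1), 0.0
lr = 0.05                                    # gradient-descent step size

for it in range(500):
    # Forward pass
    H = sigma(X @ W1.T - b1)                 # hidden activations, shape (N, n1)
    Z = H @ w2 - b2                          # network outputs z^i
    # Backpropagation of the squared loss sum_i (z^i - y^i)^2
    dZ = 2 * (Z - Y)
    grad_w2, grad_b2 = H.T @ dZ, -np.sum(dZ)
    dH = np.outer(dZ, w2) * H * (1 - H)      # chain rule through sigma
    grad_W1, grad_b1 = dH.T @ X, -np.sum(dH, axis=0)
    # Gradient-descent update (averaged over the N examples)
    W1 -= lr / N * grad_W1; b1 -= lr / N * grad_b1
    w2 -= lr / N * grad_w2; b2 -= lr / N * grad_b2

Z = sigma(X @ W1.T - b1) @ w2 - b2
print("final mean squared loss:", np.mean((Z - Y) ** 2))
```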


- Backpropagation was discovered in the 1960's
- Applied to neural networks in the 1970's
- Theoretical progress in the 1980's and 1990's
- Profited from the increased computational power of the 2010's, which allowed applications to large data sets and to neural networks with tens or hundreds of layers
- Achieved human and super-human performance in pattern recognition and later in many other applications


Neural Networks: Deep learning

- Training of networks with a large number (~ 100) of layers
- Made possible by the use of GPUs (Nvidia), which accelerated deep learning by a factor of roughly 100
- The large number of parameters makes such networks sensitive to overfitting (= too exact an adaptation to the training data, which does not carry over to other data from the same area)
- Overfitting is reduced by regularization methods: ℓ2 (weight decay) or ℓ1 (sparsity) penalties on the weights
- Further tricks are used to accelerate the learning algorithm


Applications

- Pattern recognition
- Computer vision
- Speech recognition
- Social network filtering
- Recommendation systems
- Bioinformatics
- AlphaGo
- ...


Thank you for your attention!
