Support Recovery for Orthogonal Matching Pursuit: Upper and Lower Bounds
Raghav Somani, Chirag Gupta, Prateek Jain and Praneeth Netrapalli
September 24, 2018
Somani, Gupta, Jain and Netrapalli Support Recovery of OMP September 24, 2018 1 / 18
Sparse Regression

x̄ = argmin_{‖x‖_0 ≤ s*} f(x)    (1.1)

where x ∈ R^d and s* ≪ d. The ℓ_0 "norm" counts the number of non-zero elements.

Applications:
- Resource-constrained Machine Learning
- High-dimensional Statistics
- Bioinformatics
Sparse Linear Regression (SLR)

Sparse Linear Regression is a representative problem; results typically extend easily to the general case. With f(x) = ‖Ax − y‖_2^2, SLR's objective is to find

x̄ = argmin_{‖x‖_0 ≤ s*} ‖Ax − y‖_2^2    (2.1)

where A ∈ R^{n×d}, x ∈ R^d and y ∈ R^n. Unconditionally, the problem is NP-hard (by reduction from the exact cover by 3-sets problem).
Assumptions of interest

Despite being NP-hard, SLR is tractable under certain assumptions.

- Incoherence: if Σ = A^T A, then max_{i≠j} |Σ_{ij}| ≤ M. If M ≤ 1/(2s* − 1) and y = Ax̄, then x̄ is the unique sparsest solution, and OMP recovers x̄ in s* steps.
- Restricted Isometry Property (RIP): ‖A_S^T A_S − I‖_2 ≤ δ_{|S|} (with δ_s ≤ M(s − 1) ∀ s ≥ 2), which implies (1 − δ_s)‖v‖_2^2 ≤ ‖Av‖_2^2 ≤ (1 + δ_s)‖v‖_2^2 for all v s.t. ‖v‖_0 ≤ s.
- Null space property: ∀ S ⊆ [d] with |S| ≤ s, if v ∈ Null(A) \ {0} then ‖v_S‖_1 ≤ ‖v_{S^c}‖_1; equivalently, {v ∈ R^d | Av = 0} ∩ {v ∈ R^d | ‖v_{S^c}‖_1 ≤ ‖v_S‖_1} = {0}.
- Restricted Strong Convexity (RSC): ‖Ax − Az‖_2^2 ≥ ρ_s^− ‖x − z‖_2^2 ∀ x, z ∈ R^d s.t. ‖x − z‖_0 ≤ s.

Incoherence ⟹ RIP ⟹ Null space property ⟹ RSC. RSC is the weakest and the most popular assumption.
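As a concrete illustration (not from the slides), the incoherence parameter M can be computed directly for a given design matrix. The sketch below assumes columns are normalized to unit ℓ_2 norm, so that Σ = A^T A has unit diagonal; the helper name `mutual_incoherence` is ours:

```python
import numpy as np

def mutual_incoherence(A):
    """M = max_{i != j} |Sigma_ij| for Sigma = A^T A with unit-norm columns."""
    A = A / np.linalg.norm(A, axis=0)   # normalize each column of A
    Sigma = A.T @ A                     # Gram matrix
    np.fill_diagonal(Sigma, 0.0)        # exclude the i == j entries
    return float(np.abs(Sigma).max())

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
M = mutual_incoherence(A)
s_star = 3
# Sufficient condition for OMP recovery from the slide: M <= 1/(2 s* - 1)
print(M, M <= 1.0 / (2 * s_star - 1))
```

For an orthogonal design the off-diagonal Gram entries vanish, so M = 0 and the condition holds for any s*.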
Goals of SLR

SLR can be modelled as

y = Ax̄ + η    (2.2)

where η ∼ N(0, σ^2 I_{n×n}), supp(x̄) = S* and |S*| = s*. Equivalently,

y = A_{S*} x̄_{S*} + η    (2.3)

Models with deterministic conditions on η can also be analyzed.

Goals of SLR:
1. Bounding generalization error: upper bound G(x) := (1/n)‖A(x − x̄)‖_2^2, where the rows of A are i.i.d.
2. Support recovery: recover the true features of A, i.e., find an S ⊇ S*.
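Model (2.2) is straightforward to simulate; the following sketch (dimensions and seed are arbitrary choices, not from the slides) generates one SLR instance and checks the equivalent on-support form (2.3):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, s_star, sigma = 200, 50, 5, 0.1

A = rng.standard_normal((n, d))
x_bar = np.zeros(d)
S_star = rng.choice(d, size=s_star, replace=False)  # true support S*
x_bar[S_star] = rng.standard_normal(s_star)

eta = sigma * rng.standard_normal(n)                # eta ~ N(0, sigma^2 I)
y = A @ x_bar + eta                                 # model (2.2)

# Model (2.3) is the same observation written on the true support:
assert np.allclose(A[:, S_star] @ x_bar[S_star] + eta, y)
```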
Algorithms to solve SLR

The literature mainly studies three classes of algorithms:

- ℓ_1-minimization based (LASSO based), e.g. the Dantzig selector
- Non-convex penalty based, e.g. IHT, the SCAD penalty, the log-sum penalty
- Greedy methods, e.g. Orthogonal Matching Pursuit (OMP)

We study SLR under the RSC assumption for the OMP algorithm.
Orthogonal Matching Pursuit for SLR

Set the initial support set S_0 = ∅ and x_0 = 0, so the residual r_0 = y − Ax_0 = y. At the k-th iteration (k ≥ 1):

- From the left-over columns of A (those in A_{S_{k−1}^c}), find the column i_j with maximum absolute inner product with r_{k−1}:
  [ |⟨A_{i_1}, r_{k−1}⟩|  |⟨A_{i_2}, r_{k−1}⟩|  …  |⟨A_{i_j}, r_{k−1}⟩|  …  |⟨A_{i_{d−k+1}}, r_{k−1}⟩| ]
- Include i_j in the set: S_k = S_{k−1} ∪ {i_j}.
- Fully optimize on S_k: x_k = argmin_{supp(x) ⊆ S_k} ‖y − Ax‖_2^2 (simple least squares).
- Update the residual: r_k = y − Ax_k.
Orthogonal Matching Pursuit for SLR

Result: OMP sparse estimate x̂_s^{OMP} = x_s
S_0 = ∅, x_0 = 0, r_0 = y
for k = 1, 2, …, s do
    j ← argmax_{i ∉ S_{k−1}} |A_i^T r_{k−1}|    (greedy selection)
    S_k ← S_{k−1} ∪ {j}
    x_k ← argmin_{supp(x) ⊆ S_k} ‖Ax − y‖_2^2
    r_k ← y − Ax_k
end
Algorithm 1: Orthogonal Matching Pursuit (OMP) for SLR

Note that A_i^T r_{k−1} ∝ [∇f(x_{k−1})]_i for f(x) = ‖Ax − y‖_2^2.
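Algorithm 1 translates directly into NumPy; the sketch below is the textbook OMP loop (the helper name `omp` and the small orthonormal test case are illustrative, not from the slides):

```python
import numpy as np

def omp(A, y, s):
    """Orthogonal Matching Pursuit: greedily grow a support of size s."""
    d = A.shape[1]
    S = []                                  # S_0 = empty support
    x = np.zeros(d)
    r = y.astype(float).copy()              # r_0 = y
    for _ in range(s):
        corr = np.abs(A.T @ r)              # |A_i^T r_{k-1}| for every column
        corr[S] = -np.inf                   # never re-pick chosen indices
        j = int(np.argmax(corr))            # greedy selection
        S.append(j)
        x_S, *_ = np.linalg.lstsq(A[:, S], y, rcond=None)
        x = np.zeros(d)                     # fully re-optimize on S_k
        x[S] = x_S
        r = y - A @ x                       # update residual
    return x, set(S)

# Noiseless sanity check with orthonormal columns (recovery is exact here).
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((20, 10)))
x_bar = np.zeros(10)
x_bar[[1, 4, 7]] = [1.0, -2.0, 0.5]
x_hat, S = omp(Q, Q @ x_bar, 3)
```

With orthonormal columns, Q^T r equals the not-yet-recovered part of x̄ at every step, so the greedy rule picks true support indices in decreasing order of |x̄_i|.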
Orthogonal Matching Pursuit for general f(x)

Result: OMP sparse estimate x̂_s^{OMP} = x_s
S_0 = ∅, x_0 = 0
for k = 1, 2, …, s do
    j := argmax_{i ∉ S_{k−1}} |[∇f(x_{k−1})]_i|
    S_k := S_{k−1} ∪ {j}
    x_k := argmin_{supp(x) ⊆ S_k} f(x)
end
Algorithm 2: OMP for a general function f(x)
Key quantities

Restricted Smoothness (ρ^+) and Restricted Strong Convexity (ρ^−):

ρ_s^− ‖x − z‖_2^2 ≤ ‖Ax − Az‖_2^2 ≤ ρ_s^+ ‖x − z‖_2^2    (4.1)

∀ x, z ∈ R^d s.t. ‖x − z‖_0 ≤ s.

Restricted condition number:

κ̃_s = ρ_1^+ / ρ_s^−    (4.2)

We also define

κ_s = ρ_s^+ / ρ_s^− ≥ κ̃_s    (4.3)
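For small d, the quantities in (4.1)–(4.3) can be computed by brute force: ρ_s^− and ρ_s^+ are the extreme eigenvalues of A_S^T A_S over all supports of size s. This sketch is for intuition only (the search is exponential in s; function name ours):

```python
import numpy as np
from itertools import combinations

def restricted_eigs(A, s):
    """(rho^-_s, rho^+_s): extreme eigenvalues of A_S^T A_S over all |S| = s."""
    d = A.shape[1]
    lo, hi = np.inf, -np.inf
    for S in combinations(range(d), s):
        S = list(S)
        eigs = np.linalg.eigvalsh(A[:, S].T @ A[:, S])  # sorted ascending
        lo, hi = min(lo, eigs[0]), max(hi, eigs[-1])
    return lo, hi

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 8))
rho_minus_s, rho_plus_s = restricted_eigs(A, 3)
_, rho_plus_1 = restricted_eigs(A, 1)
kappa_s = rho_plus_s / rho_minus_s        # (4.3)
kappa_tilde_s = rho_plus_1 / rho_minus_s  # (4.2)
```

By eigenvalue interlacing, ρ_1^+ ≤ ρ_s^+, which is exactly why κ̃_s ≤ κ_s in (4.3).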
Lower bounds for fast rates

If x̂_{ℓ_0} is the best ℓ_0 estimate in the set of s*-sparse vectors, then one can show

sup_{‖x̄‖_0 ≤ s*} (1/n) E[‖A(x̂_{ℓ_0} − x̄)‖_2^2] ≲ σ^2 s* / n    (4.4)

This estimator is intractable, since computing x̂_{ℓ_0} involves searching all (d choose s*) subsets.

(Y. Zhang, Wainwright & Jordan '15) There exists A ∈ R^{n×d} s.t. any poly-time algorithm satisfies

sup_{‖x̄‖_0 ≤ s*} (1/n) E[‖A(x̂_{poly} − x̄)‖_2^2] ≳ σ^2 s* κ̃_{s*}^{1−δ} / n    ∀ δ > 0    (4.5)

Consequence: any estimator achieving the fast rate (4.4) must either not run in poly-time or must return an x̂_{poly} that is not s*-sparse.

(≲ and ≳ denote inequalities up to constant and poly-log d factors.)
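The intractable ℓ_0 estimator behind (4.4) can be written down explicitly for tiny problems. This sketch (function name ours) searches all (d choose s*) supports, which is exactly the exponential cost noted above:

```python
import numpy as np
from itertools import combinations

def l0_estimate(A, y, s_star):
    """Best s*-sparse least-squares fit via exhaustive support search."""
    d = A.shape[1]
    best_x, best_err = np.zeros(d), np.inf
    for S in combinations(range(d), s_star):
        S = list(S)
        x_S, *_ = np.linalg.lstsq(A[:, S], y, rcond=None)
        err = float(np.linalg.norm(A[:, S] @ x_S - y) ** 2)
        if err < best_err:
            best_x = np.zeros(d)
            best_x[S] = x_S
            best_err = err
    return best_x

# Noiseless toy instance: the true support attains (near-)zero residual.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 8))
x_bar = np.zeros(8)
x_bar[[2, 5]] = [1.0, -1.0]
x_hat = l0_estimate(A, A @ x_bar, 2)
```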
Upper bounds on generalization error

The tightest known upper bounds for poly-time algorithms such as IHT, OMP and Lasso were at least κ̃ times worse than the known lower bounds (Jain '14, T. Zhang '10, Y. Zhang '17).

(T. Zhang '10) If x̂_s is the output of OMP after s ≳ s* κ̃_{s+s*} log κ_{s+s*} iterations, then with high probability

(1/n) ‖A(x̂_s − x̄)‖_2^2 ≲ (1/n) σ^2 s* κ̃_{s+s*}^2 log κ_{s+s*}.    (4.6)

With a slight modification to T. Zhang's analysis we get:

Generalization error for OMP: if x̂_s is the output of OMP after s ≳ s* κ̃_{s+s*} log κ_{s+s*} iterations, then with high probability

(1/n) ‖A(x̂_s − x̄)‖_2^2 ≲ (1/n) σ^2 s* κ̃_{s+s*} log κ_{s+s*}.    (4.7)

This matches the fast-rate lower bound (4.5) up to log factors.
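The quantity being bounded in (4.6) and (4.7) is just G(x̂) = (1/n)‖A(x̂ − x̄)‖_2^2; a minimal, estimator-agnostic sketch of it (names ours):

```python
import numpy as np

def generalization_error(A, x_hat, x_bar):
    """G(x_hat) = (1/n) * ||A (x_hat - x_bar)||_2^2."""
    return float(np.linalg.norm(A @ (x_hat - x_bar)) ** 2) / A.shape[0]

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
x_bar = rng.standard_normal(10)
err = generalization_error(A, x_bar + 0.01, x_bar)  # perturbed estimate
```

Any OMP iterate x̂_s can be plugged in for `x_hat` to compare the empirical error against the σ^2 s* κ̃_{s+s*} log κ_{s+s*} / n rate.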
Support recovery upper bound

- Support recovery results are known for SCAD/MCP penalty based methods under bounded incoherence (Loh '14).
- For greedy algorithms like HTP and PHT, known support recovery results require |x̄_min| to have a poor dependence on κ̃ (Shen '17).
- If S is the support set of the s-th OMP iterate x̂_s and S* \ S ≠ ∅, then the objective decreases by a large additive amount, provided |x̄_min| is larger than the appropriate noise level.

Large decrease in objective: if x̂_s is the output of OMP after s ≳ s* κ̃_{s+s*} log κ_{s+s*} iterations s.t. S* \ S ≠ ∅, and |x̄_min| ≳ σγ √(ρ_1^+) / ρ_{s+s*}^−, then with high probability

‖Ax̂_s − y‖_2^2 − ‖Ax̂_{s+1} − y‖_2^2 ≳ σ^2    (4.8)

where ‖A_{S*\S}^T A_S (A_S^T A_S)^{−1}‖_∞ ≤ γ and S = supp(x̂_s). The quantity γ plays a role similar to the standard incoherence condition.
Support recovery upper bound

Since ‖Ax − y‖_2^2 ≥ 0 ∀ x ∈ R^d, the number of extra iterations cannot be too large.

Support recovery and infinity-norm bound: if x̂_s is the output of OMP after s ≳ s* κ̃_{s+s*} log κ_{s+s*} iterations, and |x̄_min| ≳ σγ √(ρ_1^+) / ρ_{s+s*}^−, then with high probability

1. S* ⊆ supp(x̂_s)
2. ‖x̂_s − x̄‖_∞ ≲ σ √(log s / ρ_s^−)

where ‖A_{S*\S}^T A_S (A_S^T A_S)^{−1}‖_∞ ≤ γ and S = supp(x̂_s).

The condition on |x̄_min| scales as 1/√n, since both ρ_{s+s*}^− and ρ_1^+ carry a factor of n. It is also better by at least √κ̃ than the conditions in other recent works, and γ is allowed to be very large.
Sparse Linear Regression (SLR) for OMP Lower bounds
Lower bound instance construction
(Y. Zhang '15)'s lower bounds were for algorithms that output $s^*$-sparse solutions, which does not apply to OMP when it is run for more than $s^*$ iterations.
We provide matching lower bounds for support recovery as well as generalization error for OMP.
The idea is to fool OMP into picking incorrect indices: a large support size $\implies$ large generalization error.
Construct an evenly distributed $\bar{x}$:

$\bar{x}_i = \begin{cases} \frac{1}{\sqrt{s^*}} & \text{if } 1 \leq i \leq s^* \\ 0 & \text{if } i > s^* \end{cases} \implies \operatorname{supp}(\bar{x}) = \{1, 2, \ldots, s^*\}$

Construct $M^{(\varepsilon)} \in \mathbb{R}^{n \times d}$ parameterized by $\varepsilon$:

$M^{(\varepsilon)}_{1:s^*}$ are $s^*$ random orthogonal column vectors s.t. $\left\| M^{(\varepsilon)}_i \right\|_2^2 = n \;\forall\, i \in [s^*]$.

$M^{(\varepsilon)}_i = \sqrt{1-\varepsilon} \left[ \frac{1}{\sqrt{s^*}} \sum_{j=1}^{s^*} M^{(\varepsilon)}_j \right] + \sqrt{\varepsilon}\, g_i \;\forall\, i \notin [s^*]$, where the $g_i$'s are orthogonal to each other and to $M^{(\varepsilon)}_{1:s^*}$, with $\|g_i\|_2^2 = n$.
Somani, Gupta, Jain and Netrapalli Support Recovery of OMP September 24, 2018 15 / 18
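One concrete way to realize this construction numerically (our own sketch, under the assumption that the random orthogonal columns are drawn via a QR factorization; `lower_bound_instance` is our own name):

```python
import numpy as np

def lower_bound_instance(n, d, s_star, eps, seed=None):
    """Build the hard design matrix M(eps) described above.

    Columns 1..s* are mutually orthogonal with squared norm n; every
    remaining column is sqrt(1-eps) times the normalized sum of the first
    s* columns plus sqrt(eps) times a fresh orthogonal direction g_i.
    """
    assert d <= n, "need n >= d orthogonal directions"
    rng = np.random.default_rng(seed)
    # d orthonormal directions in R^n, rescaled so each has squared norm n.
    Q, _ = np.linalg.qr(rng.standard_normal((n, d)))
    B = np.sqrt(n) * Q
    M = np.empty((n, d))
    M[:, :s_star] = B[:, :s_star]          # the true-support columns
    shared = M[:, :s_star].sum(axis=1) / np.sqrt(s_star)
    for i in range(s_star, d):
        g_i = B[:, i]                      # orthogonal to all columns used so far
        M[:, i] = np.sqrt(1 - eps) * shared + np.sqrt(eps) * g_i
    return M
```

Since `shared` and each $g_i$ are orthogonal with squared norm $n$, every column of the result again has squared norm $(1-\varepsilon)n + \varepsilon n = n$, as in the construction.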
Sparse Linear Regression (SLR) for OMP Lower bounds
Lower bound instance construction

[Figure: visualization of the instance in $d = 3$ with $s^* = 2$ and $\varepsilon = 0.25$, showing the orthogonal columns $M^{(0.25)}_1$ and $M^{(0.25)}_2$, their normalized sum $\frac{1}{\sqrt{2}} \sum_{i=1}^{2} M^{(0.25)}_i$, the orthogonal direction $g_3$, and the off-support column $M^{(0.25)}_3$, plotted against the axes $e_1, e_2, e_3$.]

Smaller $\varepsilon \implies$ more correlation. $\therefore\ \varepsilon$ is a proxy for $\tilde{\kappa}_s$.
Somani, Gupta, Jain and Netrapalli Support Recovery of OMP September 24, 2018 16 / 18
Sparse Linear Regression (SLR) for OMP Lower bounds
Lower bounds
Noiseless case
For $s^* \leq d \leq n$, $\exists\, \varepsilon > 0$ s.t. when OMP is executed on the SLR problem with $y = M^{(\varepsilon)} \bar{x}$ for $s \leq d - s^*$ iterations:

$\tilde{\kappa}_s(M^{(\varepsilon)}) \lesssim \frac{s}{s^*}$ and $\gamma \leq \sqrt{\frac{2}{3}}$

$S^* \cap \operatorname{supp}(\hat{x}_s) = \emptyset$

Noisy case
For $s^* \leq s \leq d^{1-\alpha}$ where $\alpha \in (0, 1)$, $\exists\, \varepsilon > 0$ s.t. when OMP is executed on the SLR problem with $y = M^{(\varepsilon)} \bar{x} + \eta$ where $\eta \sim \mathcal{N}(0, \sigma^2 I_{n \times n})$, then:

$\tilde{\kappa}_s(M^{(\varepsilon)}) \lesssim \frac{s}{s^*}$ and $\gamma \leq \frac{1}{2}$

with high probability, $\frac{1}{n} \|A\hat{x}_s - A\bar{x}\|_2^2 \gtrsim \frac{\sigma^2 \tilde{\kappa}_{s+s^*} s^*}{n}$

with high probability, $S^* \cap \operatorname{supp}(\hat{x}_s) = \emptyset$

$\implies s \gtrsim \tilde{\kappa}_s s^*$ iterations are indeed necessary.
Addition of noise can only help.

Somani, Gupta, Jain and Netrapalli Support Recovery of OMP September 24, 2018 17 / 18
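A quick numeric illustration of why OMP fails on this instance (our own sketch, using a QR realization of the construction described above; the dimensions are illustrative): for small $\varepsilon$, every off-support column has correlation about $\sqrt{1-\varepsilon}\,n$ with $y = M^{(\varepsilon)}\bar{x}$, while each true-support column has only $n/\sqrt{s^*}$, so the very first greedy pick already misses $S^*$.

```python
import numpy as np

# Sketch: for small eps, OMP's first greedy step picks an off-support column.
n, d, s_star, eps = 100, 20, 4, 0.05
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((n, d)))
B = np.sqrt(n) * Q                      # orthogonal columns, ||B_i||^2 = n
M = B.copy()
shared = B[:, :s_star].sum(axis=1) / np.sqrt(s_star)
for i in range(s_star, d):              # correlated off-support columns
    M[:, i] = np.sqrt(1 - eps) * shared + np.sqrt(eps) * B[:, i]

x_bar = np.zeros(d)
x_bar[:s_star] = 1 / np.sqrt(s_star)
y = M @ x_bar                           # noiseless: y equals `shared`

# True-support correlation is n/sqrt(s*) = 50; off-support is sqrt(0.95)*n ~ 97.
first_pick = int(np.argmax(np.abs(M.T @ y)))
print(first_pick >= s_star)             # True: the first pick misses S*
```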
Sparse Linear Regression (SLR) for OMP Simulations
Simulations
We perform simulations on the lower bound instance class, with $M^{(\varepsilon)} \in \mathbb{R}^{1000 \times 100}$ and $s^* = 10$.

(a) Varying condition number   (b) Varying noise variance

Figure: Number of iterations required for recovering the full support of $\bar{x}$, with respect to (a) the restricted condition number ($\tilde{\kappa}_{s+s^*}$) of the design matrix and (b) the variance of the noise ($\sigma^2$).
Somani, Gupta, Jain and Netrapalli Support Recovery of OMP September 24, 2018 18 / 18
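The experiment can be reproduced in miniature (our own sketch; dimensions are scaled down from the slide's $1000 \times 100$ for speed, and $\varepsilon$ is swept directly as the proxy for the restricted condition number):

```python
import numpy as np

def omp_iters_to_recover(M, y, true_support, max_iters):
    """Run OMP, returning the first iteration at which the full true
    support has been picked up (None if not within the budget)."""
    support, residual = [], y.copy()
    for t in range(1, max_iters + 1):
        c = np.abs(M.T @ residual)
        c[support] = -np.inf                 # never re-pick an index
        support.append(int(np.argmax(c)))
        coef, *_ = np.linalg.lstsq(M[:, support], y, rcond=None)
        residual = y - M[:, support] @ coef
        if true_support <= set(support):
            return t
    return None

n, d, s_star = 200, 40, 5
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((n, d)))
B = np.sqrt(n) * Q                           # orthogonal columns, ||B_i||^2 = n
x_bar = np.zeros(d)
x_bar[:s_star] = 1 / np.sqrt(s_star)

iters = {}
for eps in (0.9, 0.5, 0.2):                  # smaller eps => worse conditioning
    M = B.copy()
    shared = B[:, :s_star].sum(axis=1) / np.sqrt(s_star)
    for i in range(s_star, d):
        M[:, i] = np.sqrt(1 - eps) * shared + np.sqrt(eps) * B[:, i]
    y = M @ x_bar                            # noiseless observations
    iters[eps] = omp_iters_to_recover(M, y, set(range(s_star)), d)
```

Consistent with the figure, weakly correlated instances (large $\varepsilon$) recover the support in exactly $s^*$ iterations, while strongly correlated ones (small $\varepsilon$) need strictly more.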