Support Recovery for Orthogonal Matching Pursuit: Upper and Lower Bounds
Raghav Somani, Chirag Gupta, Prateek Jain and Praneeth Netrapalli
September 24, 2018
Somani, Gupta, Jain and Netrapalli Support Recovery of OMP September 24, 2018 1 / 18
Sparse Regression

x̄ = argmin_{‖x‖_0 ≤ s*} f(x)    (1.1)

where x ∈ R^d and s* ≪ d. The ℓ_0 "norm" counts the number of non-zero elements.

Applications:
- Resource-constrained Machine Learning
- High-dimensional Statistics
- Bioinformatics
Sparse Linear Regression (SLR)

Sparse Linear Regression is a representative problem; results typically extend easily to the general case. With f(x) = ‖Ax − y‖_2^2, SLR's objective is to find

x̄ = argmin_{‖x‖_0 ≤ s*} ‖Ax − y‖_2^2    (2.1)

where A ∈ R^{n×d}, x ∈ R^d and y ∈ R^n. Unconditionally, the problem is NP-hard (by reduction from the exact cover by 3-sets problem).
Assumptions of interest

Despite being NP-hard, SLR is tractable under certain assumptions.

- Incoherence: if Σ = A^T A, then max_{i≠j} |Σ_{ij}| ≤ M. If M ≤ 1/(2s* − 1) and y = Ax̄, then x̄ is the unique sparsest solution, and OMP recovers x̄ in s* steps.
- Restricted Isometry Property (RIP): ‖A_S^T A_S − I‖_2 ≤ δ_{|S|} (with δ_s ≤ M(s − 1) ∀ s ≥ 2), which implies (1 − δ_s)‖v‖_2^2 ≤ ‖Av‖_2^2 ≤ (1 + δ_s)‖v‖_2^2 for all v s.t. ‖v‖_0 ≤ s.
- Null space property: ∀ S ⊆ [d] with |S| ≤ s, if v ∈ Null(A) \ {0} then ‖v_S‖_1 ≤ ‖v_{S^c}‖_1; equivalently, {v ∈ R^d | Av = 0} ∩ {v ∈ R^d | ‖v_{S^c}‖_1 ≤ ‖v_S‖_1} = {0}.
- Restricted Strong Convexity (RSC): ‖Ax − Az‖_2^2 ≥ ρ_s^− ‖x − z‖_2^2 ∀ x, z ∈ R^d s.t. ‖x − z‖_0 ≤ s.

Incoherence ⟹ RIP ⟹ Null space property ⟹ RSC. RSC is the weakest and the most popular assumption.
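As a concrete illustration (not from the slides), the incoherence parameter M can be computed directly for a given design matrix. The sketch below assumes columns are normalized to unit ℓ_2 norm, so that Σ = A^T A has unit diagonal; the helper name `mutual_incoherence` is ours:

```python
import numpy as np

def mutual_incoherence(A):
    """M = max_{i != j} |Sigma_ij| for Sigma = A^T A with unit-norm columns."""
    A = A / np.linalg.norm(A, axis=0)   # normalize each column of A
    Sigma = A.T @ A                     # Gram matrix
    np.fill_diagonal(Sigma, 0.0)        # exclude the i == j entries
    return float(np.abs(Sigma).max())

rng = np.random.default_rng(0)
A = rng.standard_normal((100, 20))
M = mutual_incoherence(A)
s_star = 3
# Sufficient condition for OMP recovery from the slide: M <= 1/(2 s* - 1)
print(M, M <= 1.0 / (2 * s_star - 1))
```

For an orthogonal design the off-diagonal Gram entries vanish, so M = 0 and the condition holds for any s*.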
Goals of SLR

SLR can be modelled as

y = Ax̄ + η    (2.2)

where η ∼ N(0, σ^2 I_{n×n}), supp(x̄) = S* and |S*| = s*. Equivalently,

y = A_{S*} x̄_{S*} + η    (2.3)

Models with deterministic conditions on η can also be analyzed.

Goals of SLR:
1. Bounding generalization error: upper bound G(x) := (1/n)‖A(x − x̄)‖_2^2, where the rows of A are i.i.d.
2. Support recovery: recover the true features of A, i.e., find an S ⊇ S*.
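Model (2.2) is straightforward to simulate; the following sketch (dimensions and seed are arbitrary choices, not from the slides) generates one SLR instance and checks the equivalent on-support form (2.3):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, s_star, sigma = 200, 50, 5, 0.1

A = rng.standard_normal((n, d))
x_bar = np.zeros(d)
S_star = rng.choice(d, size=s_star, replace=False)  # true support S*
x_bar[S_star] = rng.standard_normal(s_star)

eta = sigma * rng.standard_normal(n)                # eta ~ N(0, sigma^2 I)
y = A @ x_bar + eta                                 # model (2.2)

# Model (2.3) is the same observation written on the true support:
assert np.allclose(A[:, S_star] @ x_bar[S_star] + eta, y)
```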
Algorithms to solve SLR

The literature mainly studies three classes of algorithms:

- ℓ_1-minimization based (LASSO based), e.g. the Dantzig selector
- Non-convex penalty based, e.g. IHT, the SCAD penalty, the log-sum penalty
- Greedy methods, e.g. Orthogonal Matching Pursuit (OMP)

We study SLR under the RSC assumption for the OMP algorithm.
Orthogonal Matching Pursuit for SLR

Set the initial support set S_0 = ∅ and x_0 = 0, so the residual r_0 = y − Ax_0 = y. At the k-th iteration (k ≥ 1):

- From the left-over columns of A (those in A_{S_{k−1}^c}), find the column i_j with maximum absolute inner product with r_{k−1}:
  [ |⟨A_{i_1}, r_{k−1}⟩|  |⟨A_{i_2}, r_{k−1}⟩|  …  |⟨A_{i_j}, r_{k−1}⟩|  …  |⟨A_{i_{d−k+1}}, r_{k−1}⟩| ]
- Include i_j in the set: S_k = S_{k−1} ∪ {i_j}.
- Fully optimize on S_k: x_k = argmin_{supp(x) ⊆ S_k} ‖y − Ax‖_2^2 (simple least squares).
- Update the residual: r_k = y − Ax_k.
Orthogonal Matching Pursuit for SLR

Result: OMP sparse estimate x̂_s^{OMP} = x_s
S_0 = ∅, x_0 = 0, r_0 = y
for k = 1, 2, …, s do
    j ← argmax_{i ∉ S_{k−1}} |A_i^T r_{k−1}|    (greedy selection)
    S_k ← S_{k−1} ∪ {j}
    x_k ← argmin_{supp(x) ⊆ S_k} ‖Ax − y‖_2^2
    r_k ← y − Ax_k
end
Algorithm 1: Orthogonal Matching Pursuit (OMP) for SLR

Note that A_i^T r_{k−1} ∝ [∇f(x_{k−1})]_i for f(x) = ‖Ax − y‖_2^2.
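Algorithm 1 translates directly into NumPy; the sketch below is the textbook OMP loop (the helper name `omp` and the small orthonormal test case are illustrative, not from the slides):

```python
import numpy as np

def omp(A, y, s):
    """Orthogonal Matching Pursuit: greedily grow a support of size s."""
    d = A.shape[1]
    S = []                                  # S_0 = empty support
    x = np.zeros(d)
    r = y.astype(float).copy()              # r_0 = y
    for _ in range(s):
        corr = np.abs(A.T @ r)              # |A_i^T r_{k-1}| for every column
        corr[S] = -np.inf                   # never re-pick chosen indices
        j = int(np.argmax(corr))            # greedy selection
        S.append(j)
        x_S, *_ = np.linalg.lstsq(A[:, S], y, rcond=None)
        x = np.zeros(d)                     # fully re-optimize on S_k
        x[S] = x_S
        r = y - A @ x                       # update residual
    return x, set(S)

# Noiseless sanity check with orthonormal columns (recovery is exact here).
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((20, 10)))
x_bar = np.zeros(10)
x_bar[[1, 4, 7]] = [1.0, -2.0, 0.5]
x_hat, S = omp(Q, Q @ x_bar, 3)
```

With orthonormal columns, Q^T r equals the not-yet-recovered part of x̄ at every step, so the greedy rule picks true support indices in decreasing order of |x̄_i|.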
Orthogonal Matching Pursuit for general f(x)

Result: OMP sparse estimate x̂_s^{OMP} = x_s
S_0 = ∅, x_0 = 0
for k = 1, 2, …, s do
    j := argmax_{i ∉ S_{k−1}} |[∇f(x_{k−1})]_i|
    S_k := S_{k−1} ∪ {j}
    x_k := argmin_{supp(x) ⊆ S_k} f(x)
end
Algorithm 2: OMP for a general function f(x)
Key quantities

Restricted Smoothness (ρ^+) and Restricted Strong Convexity (ρ^−):

ρ_s^− ‖x − z‖_2^2 ≤ ‖Ax − Az‖_2^2 ≤ ρ_s^+ ‖x − z‖_2^2    (4.1)

∀ x, z ∈ R^d s.t. ‖x − z‖_0 ≤ s.

Restricted condition number:

κ̃_s = ρ_1^+ / ρ_s^−    (4.2)

We also define

κ_s = ρ_s^+ / ρ_s^− ≥ κ̃_s    (4.3)
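For small d, the quantities in (4.1)–(4.3) can be computed by brute force: ρ_s^− and ρ_s^+ are the extreme eigenvalues of A_S^T A_S over all supports of size s. This sketch is for intuition only (the search is exponential in s; function name ours):

```python
import numpy as np
from itertools import combinations

def restricted_eigs(A, s):
    """(rho^-_s, rho^+_s): extreme eigenvalues of A_S^T A_S over all |S| = s."""
    d = A.shape[1]
    lo, hi = np.inf, -np.inf
    for S in combinations(range(d), s):
        S = list(S)
        eigs = np.linalg.eigvalsh(A[:, S].T @ A[:, S])  # sorted ascending
        lo, hi = min(lo, eigs[0]), max(hi, eigs[-1])
    return lo, hi

rng = np.random.default_rng(0)
A = rng.standard_normal((30, 8))
rho_minus_s, rho_plus_s = restricted_eigs(A, 3)
_, rho_plus_1 = restricted_eigs(A, 1)
kappa_s = rho_plus_s / rho_minus_s        # (4.3)
kappa_tilde_s = rho_plus_1 / rho_minus_s  # (4.2)
```

By eigenvalue interlacing, ρ_1^+ ≤ ρ_s^+, which is exactly why κ̃_s ≤ κ_s in (4.3).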
Lower bounds for fast rates

If x̂_{ℓ_0} is the best ℓ_0 estimate in the set of s*-sparse vectors, then one can show

sup_{‖x̄‖_0 ≤ s*} (1/n) E[‖A(x̂_{ℓ_0} − x̄)‖_2^2] ≲ σ^2 s* / n    (4.4)

This estimator is intractable, since computing x̂_{ℓ_0} involves searching all (d choose s*) subsets.

(Y. Zhang, Wainwright & Jordan '15) There exists A ∈ R^{n×d} s.t. any poly-time algorithm satisfies

sup_{‖x̄‖_0 ≤ s*} (1/n) E[‖A(x̂_{poly} − x̄)‖_2^2] ≳ σ^2 s* κ̃_{s*}^{1−δ} / n    ∀ δ > 0    (4.5)

Consequence: any estimator achieving the fast rate (4.4) must either not run in poly-time or must return an x̂_{poly} that is not s*-sparse.

(≲ and ≳ denote inequalities up to constant and poly-log d factors.)
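The intractable ℓ_0 estimator behind (4.4) can be written down explicitly for tiny problems. This sketch (function name ours) searches all (d choose s*) supports, which is exactly the exponential cost noted above:

```python
import numpy as np
from itertools import combinations

def l0_estimate(A, y, s_star):
    """Best s*-sparse least-squares fit via exhaustive support search."""
    d = A.shape[1]
    best_x, best_err = np.zeros(d), np.inf
    for S in combinations(range(d), s_star):
        S = list(S)
        x_S, *_ = np.linalg.lstsq(A[:, S], y, rcond=None)
        err = float(np.linalg.norm(A[:, S] @ x_S - y) ** 2)
        if err < best_err:
            best_x = np.zeros(d)
            best_x[S] = x_S
            best_err = err
    return best_x

# Noiseless toy instance: the true support attains (near-)zero residual.
rng = np.random.default_rng(0)
A = rng.standard_normal((30, 8))
x_bar = np.zeros(8)
x_bar[[2, 5]] = [1.0, -1.0]
x_hat = l0_estimate(A, A @ x_bar, 2)
```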
Upper bounds on generalization error

The tightest known upper bounds for poly-time algorithms such as IHT, OMP and Lasso were at least κ̃ times worse than the known lower bounds (Jain '14, T. Zhang '10, Y. Zhang '17).

(T. Zhang '10) If x̂_s is the output of OMP after s ≳ s* κ̃_{s+s*} log κ_{s+s*} iterations, then with high probability

(1/n) ‖A(x̂_s − x̄)‖_2^2 ≲ (1/n) σ^2 s* κ̃_{s+s*}^2 log κ_{s+s*}.    (4.6)

With a slight modification to T. Zhang's analysis we get:

Generalization error for OMP: if x̂_s is the output of OMP after s ≳ s* κ̃_{s+s*} log κ_{s+s*} iterations, then with high probability

(1/n) ‖A(x̂_s − x̄)‖_2^2 ≲ (1/n) σ^2 s* κ̃_{s+s*} log κ_{s+s*}.    (4.7)

This matches the fast-rate lower bound (4.5) up to log factors.
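The quantity being bounded in (4.6) and (4.7) is just G(x̂) = (1/n)‖A(x̂ − x̄)‖_2^2; a minimal, estimator-agnostic sketch of it (names ours):

```python
import numpy as np

def generalization_error(A, x_hat, x_bar):
    """G(x_hat) = (1/n) * ||A (x_hat - x_bar)||_2^2."""
    return float(np.linalg.norm(A @ (x_hat - x_bar)) ** 2) / A.shape[0]

rng = np.random.default_rng(0)
A = rng.standard_normal((50, 10))
x_bar = rng.standard_normal(10)
err = generalization_error(A, x_bar + 0.01, x_bar)  # perturbed estimate
```

Any OMP iterate x̂_s can be plugged in for `x_hat` to compare the empirical error against the σ^2 s* κ̃_{s+s*} log κ_{s+s*} / n rate.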
Support recovery upper bound

- Support recovery results are known for SCAD/MCP penalty based methods under bounded incoherence (Loh '14).
- For greedy algorithms like HTP and PHT, known support recovery results require |x̄_min| to have a poor dependence on κ̃ (Shen '17).
- If S is the support set of the s-th OMP iterate x̂_s and S* \ S ≠ ∅, then the objective decreases by a large additive amount, provided |x̄_min| is larger than the appropriate noise level.

Large decrease in objective: if x̂_s is the output of OMP after s ≳ s* κ̃_{s+s*} log κ_{s+s*} iterations s.t. S* \ S ≠ ∅, and |x̄_min| ≳ σγ √(ρ_1^+) / ρ_{s+s*}^−, then with high probability

‖Ax̂_s − y‖_2^2 − ‖Ax̂_{s+1} − y‖_2^2 ≳ σ^2    (4.8)

where ‖A_{S*\S}^T A_S (A_S^T A_S)^{−1}‖_∞ ≤ γ and S = supp(x̂_s). The quantity γ plays a role similar to the standard incoherence condition.
Support recovery upper bound

Since ‖Ax − y‖_2^2 ≥ 0 ∀ x ∈ R^d, the number of extra iterations cannot be too large.

Support recovery and infinity-norm bound: if x̂_s is the output of OMP after s ≳ s* κ̃_{s+s*} log κ_{s+s*} iterations, and |x̄_min| ≳ σγ √(ρ_1^+) / ρ_{s+s*}^−, then with high probability

1. S* ⊆ supp(x̂_s)
2. ‖x̂_s − x̄‖_∞ ≲ σ √(log s / ρ_s^−)

where ‖A_{S*\S}^T A_S (A_S^T A_S)^{−1}‖_∞ ≤ γ and S = supp(x̂_s).

The condition on |x̄_min| scales as 1/√n, since both ρ_{s+s*}^− and ρ_1^+ carry a factor of n. It is also better by at least √κ̃ than the conditions in other recent works, and γ is allowed to be very large.
Sparse Linear Regression (SLR) for OMP Lower bounds
Lower bound instance construction
(Y. Zhang '15)'s lower bounds were for algorithms that output $s^*$-sparse solutions, which does not apply to OMP when it is run for more than $s^*$ iterations.
We provide matching lower bounds for support recovery as well as generalization error for OMP.
The idea is to fool OMP into picking incorrect indices: a large support size $\implies$ large generalization error.
Construct an evenly distributed $\bar{x}$:

$\bar{x}_i = \begin{cases} \frac{1}{\sqrt{s^*}} & \text{if } 1 \leq i \leq s^* \\ 0 & \text{if } i > s^* \end{cases} \implies \operatorname{supp}(\bar{x}) = \{1, 2, \ldots, s^*\}$

Construct $M^{(\varepsilon)} \in \mathbb{R}^{n \times d}$ parameterized by $\varepsilon$:

$M^{(\varepsilon)}_{1:s^*}$ are $s^*$ random orthogonal column vectors s.t. $\left\| M^{(\varepsilon)}_i \right\|_2^2 = n \;\forall\, i \in [s^*]$.

$M^{(\varepsilon)}_i = \sqrt{1-\varepsilon} \left[ \frac{1}{\sqrt{s^*}} \sum_{j=1}^{s^*} M^{(\varepsilon)}_j \right] + \sqrt{\varepsilon}\, g_i \;\forall\, i \notin [s^*]$, where the $g_i$'s are orthogonal to each other and to $M^{(\varepsilon)}_{1:s^*}$, with $\|g_i\|_2^2 = n$.
Somani, Gupta, Jain and Netrapalli Support Recovery of OMP September 24, 2018 15 / 18
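One concrete way to realize this construction numerically (our own sketch, under the assumption that the random orthogonal columns are drawn via a QR factorization; `lower_bound_instance` is our own name):

```python
import numpy as np

def lower_bound_instance(n, d, s_star, eps, seed=None):
    """Build the hard design matrix M(eps) described above.

    Columns 1..s* are mutually orthogonal with squared norm n; every
    remaining column is sqrt(1-eps) times the normalized sum of the first
    s* columns plus sqrt(eps) times a fresh orthogonal direction g_i.
    """
    assert d <= n, "need n >= d orthogonal directions"
    rng = np.random.default_rng(seed)
    # d orthonormal directions in R^n, rescaled so each has squared norm n.
    Q, _ = np.linalg.qr(rng.standard_normal((n, d)))
    B = np.sqrt(n) * Q
    M = np.empty((n, d))
    M[:, :s_star] = B[:, :s_star]          # the true-support columns
    shared = M[:, :s_star].sum(axis=1) / np.sqrt(s_star)
    for i in range(s_star, d):
        g_i = B[:, i]                      # orthogonal to all columns used so far
        M[:, i] = np.sqrt(1 - eps) * shared + np.sqrt(eps) * g_i
    return M
```

Since `shared` and each $g_i$ are orthogonal with squared norm $n$, every column of the result again has squared norm $(1-\varepsilon)n + \varepsilon n = n$, as in the construction.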
Sparse Linear Regression (SLR) for OMP Lower bounds
Lower bound instance construction

[Figure: visualization of the instance in $d = 3$ with $s^* = 2$ and $\varepsilon = 0.25$, showing the orthogonal columns $M^{(0.25)}_1$ and $M^{(0.25)}_2$, their normalized sum $\frac{1}{\sqrt{2}} \sum_{i=1}^{2} M^{(0.25)}_i$, the orthogonal direction $g_3$, and the off-support column $M^{(0.25)}_3$, plotted against the axes $e_1, e_2, e_3$.]

Smaller $\varepsilon \implies$ more correlation. $\therefore\ \varepsilon$ is a proxy for $\tilde{\kappa}_s$.
Somani, Gupta, Jain and Netrapalli Support Recovery of OMP September 24, 2018 16 / 18
Sparse Linear Regression (SLR) for OMP Lower bounds
Lower bounds
Noiseless case
For $s^* \leq d \leq n$, $\exists\, \varepsilon > 0$ s.t. when OMP is executed on the SLR problem with $y = M^{(\varepsilon)} \bar{x}$ for $s \leq d - s^*$ iterations:

$\tilde{\kappa}_s(M^{(\varepsilon)}) \lesssim \frac{s}{s^*}$ and $\gamma \leq \sqrt{\frac{2}{3}}$

$S^* \cap \operatorname{supp}(\hat{x}_s) = \emptyset$

Noisy case
For $s^* \leq s \leq d^{1-\alpha}$ where $\alpha \in (0, 1)$, $\exists\, \varepsilon > 0$ s.t. when OMP is executed on the SLR problem with $y = M^{(\varepsilon)} \bar{x} + \eta$ where $\eta \sim \mathcal{N}(0, \sigma^2 I_{n \times n})$, then:

$\tilde{\kappa}_s(M^{(\varepsilon)}) \lesssim \frac{s}{s^*}$ and $\gamma \leq \frac{1}{2}$

with high probability, $\frac{1}{n} \|A\hat{x}_s - A\bar{x}\|_2^2 \gtrsim \frac{\sigma^2 \tilde{\kappa}_{s+s^*} s^*}{n}$

with high probability, $S^* \cap \operatorname{supp}(\hat{x}_s) = \emptyset$

$\implies s \gtrsim \tilde{\kappa}_s s^*$ iterations are indeed necessary.
Addition of noise can only help.

Somani, Gupta, Jain and Netrapalli Support Recovery of OMP September 24, 2018 17 / 18
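A quick numeric illustration of why OMP fails on this instance (our own sketch, using a QR realization of the construction described above; the dimensions are illustrative): for small $\varepsilon$, every off-support column has correlation about $\sqrt{1-\varepsilon}\,n$ with $y = M^{(\varepsilon)}\bar{x}$, while each true-support column has only $n/\sqrt{s^*}$, so the very first greedy pick already misses $S^*$.

```python
import numpy as np

# Sketch: for small eps, OMP's first greedy step picks an off-support column.
n, d, s_star, eps = 100, 20, 4, 0.05
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((n, d)))
B = np.sqrt(n) * Q                      # orthogonal columns, ||B_i||^2 = n
M = B.copy()
shared = B[:, :s_star].sum(axis=1) / np.sqrt(s_star)
for i in range(s_star, d):              # correlated off-support columns
    M[:, i] = np.sqrt(1 - eps) * shared + np.sqrt(eps) * B[:, i]

x_bar = np.zeros(d)
x_bar[:s_star] = 1 / np.sqrt(s_star)
y = M @ x_bar                           # noiseless: y equals `shared`

# True-support correlation is n/sqrt(s*) = 50; off-support is sqrt(0.95)*n ~ 97.
first_pick = int(np.argmax(np.abs(M.T @ y)))
print(first_pick >= s_star)             # True: the first pick misses S*
```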
Sparse Linear Regression (SLR) for OMP Simulations
Simulations
We perform simulations on the lower bound instance class, with $M^{(\varepsilon)} \in \mathbb{R}^{1000 \times 100}$ and $s^* = 10$.

(a) Varying condition number   (b) Varying noise variance

Figure: Number of iterations required for recovering the full support of $\bar{x}$, with respect to (a) the restricted condition number ($\tilde{\kappa}_{s+s^*}$) of the design matrix and (b) the variance of the noise ($\sigma^2$).
Somani, Gupta, Jain and Netrapalli Support Recovery of OMP September 24, 2018 18 / 18
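The experiment can be reproduced in miniature (our own sketch; dimensions are scaled down from the slide's $1000 \times 100$ for speed, and $\varepsilon$ is swept directly as the proxy for the restricted condition number):

```python
import numpy as np

def omp_iters_to_recover(M, y, true_support, max_iters):
    """Run OMP, returning the first iteration at which the full true
    support has been picked up (None if not within the budget)."""
    support, residual = [], y.copy()
    for t in range(1, max_iters + 1):
        c = np.abs(M.T @ residual)
        c[support] = -np.inf                 # never re-pick an index
        support.append(int(np.argmax(c)))
        coef, *_ = np.linalg.lstsq(M[:, support], y, rcond=None)
        residual = y - M[:, support] @ coef
        if true_support <= set(support):
            return t
    return None

n, d, s_star = 200, 40, 5
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((n, d)))
B = np.sqrt(n) * Q                           # orthogonal columns, ||B_i||^2 = n
x_bar = np.zeros(d)
x_bar[:s_star] = 1 / np.sqrt(s_star)

iters = {}
for eps in (0.9, 0.5, 0.2):                  # smaller eps => worse conditioning
    M = B.copy()
    shared = B[:, :s_star].sum(axis=1) / np.sqrt(s_star)
    for i in range(s_star, d):
        M[:, i] = np.sqrt(1 - eps) * shared + np.sqrt(eps) * B[:, i]
    y = M @ x_bar                            # noiseless observations
    iters[eps] = omp_iters_to_recover(M, y, set(range(s_star)), d)
```

Consistent with the figure, weakly correlated instances (large $\varepsilon$) recover the support in exactly $s^*$ iterations, while strongly correlated ones (small $\varepsilon$) need strictly more.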