Compressed sensing of streaming data
Nick Freris, Orhan Ocal, Martin Vetterli
École Polytechnique Fédérale de Lausanne
3 October 2013
51st Annual Allerton Conference
Outline
Background
Recursive Compressed Sensing
    Recursive sampling
    Recursive estimation
Analysis
Simulations
Compressed sensing
Sampling: y = Ax with m << n
support(x) := {i : x_i ≠ 0},  ‖x‖_0 := |support(x)|,  x is k-sparse ⇔ ‖x‖_0 ≤ k
Goal: recover the sparse vector x from the measurement y
Restricted Isometry Property (RIP):
(1 − δ_k)‖x‖_2^2 ≤ ‖Ax‖_2^2 ≤ (1 + δ_k)‖x‖_2^2,  for all k-sparse x
Random matrices: Gaussian, Bernoulli, etc.
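To make the setting concrete, here is a minimal NumPy sketch (the dimensions n = 512, m = 128, k = 10 are arbitrary examples, not values from the talk) that draws a Gaussian sensing matrix, measures a k-sparse vector, and checks how well the measurement preserves its norm:

```python
import numpy as np

rng = np.random.default_rng(0)
n, m, k = 512, 128, 10                      # example dimensions (m << n)

# Gaussian sensing matrix, scaled so that E[||Ax||_2^2] = ||x||_2^2
A = rng.standard_normal((m, n)) / np.sqrt(m)

# k-sparse signal: k nonzero entries at random positions
x = np.zeros(n)
x[rng.choice(n, size=k, replace=False)] = rng.standard_normal(k)

y = A @ x                                   # m compressive measurements
print(np.linalg.norm(A @ x) / np.linalg.norm(x))   # close to 1 when delta_k is small
```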
CS - Noiseless case
Given: y = Ax, with y ∈ R^m, A ∈ R^{m×n}, m << n
Goal: recover the sparse vector x
ℓ_0 minimization (P0):
  minimize ‖x‖_0  subject to Ax = y    (combinatorial, intractable)
⇔ Basis Pursuit (BP):
  minimize ‖x‖_1  subject to Ax = y    (linear program)
Theorem [1]: every k-sparse vector x is exactly recovered by (BP) if δ_{2k}(A) < √2 − 1.
[1] Candès and Wakin, "An Introduction to Compressive Sampling", 2008.
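As a reference point, (BP) can be posed as a linear program by splitting x into positive and negative parts. A minimal sketch using SciPy (assuming `scipy.optimize.linprog` is available, and reusing A, y from a setup like the one above):

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve min ||x||_1 s.t. Ax = y as an LP with x = u - v, u, v >= 0."""
    m, n = A.shape
    c = np.ones(2 * n)                      # objective: sum(u) + sum(v) = ||x||_1
    A_eq = np.hstack([A, -A])               # A(u - v) = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None), method="highs")
    u, v = res.x[:n], res.x[n:]
    return u - v
```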
CS - Noisy case
Setting: y = Ax + w, with x sparse
LASSO (constrained) (LC):
  minimize ‖x‖_1  subject to ‖Ax − y‖_2 ≤ σ
LASSO (unconstrained) (LU):
  minimize ‖Ax − y‖_2^2 + λ‖x‖_1
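One simple way to solve the unconstrained form (LU) is iterative soft-thresholding (ISTA). The sketch below is a minimal NumPy version; the iteration count and step-size rule are illustrative choices, not the solver used in the talk:

```python
import numpy as np

def soft_threshold(v, t):
    """Entrywise soft-thresholding, the proximal operator of t*||.||_1."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def lasso_ista(A, y, lam, x_init=None, n_iter=500):
    """Solve (LU): minimize ||Ax - y||_2^2 + lam*||x||_1 by ISTA."""
    n = A.shape[1]
    x = np.zeros(n) if x_init is None else x_init.copy()
    L = 2.0 * np.linalg.norm(A, 2) ** 2     # Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ x - y)      # gradient of the quadratic term
        x = soft_threshold(x - grad / L, lam / L)
    return x
```

The warm-start argument `x_init` is what the recursive estimation step later exploits.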
LASSO estimation
Theorem [2]: the solution x* of (LC) satisfies
‖x* − x‖_2 ≤ C_0 · ‖x − x_k‖_1/√k  (model mismatch)  +  C_1 · σ  (noise),
where x_k keeps the k highest-magnitude entries of x.
Assumptions: δ_{2k}(A) < √2 − 1 and ‖w‖_2 ≤ σ.
[2] Candès and Wakin, "An Introduction to Compressive Sampling", 2008.
Support estimation with LASSO
Theorem [3]: the LASSO estimate x̂ satisfies
  support(x̂) = support(x)
  sgn(x̂_i) = sgn(x_i) for every i
with probability ≥ 1 − O(1/(n√(log n))) − k/n^2.
Assumptions: AWGN, nonzero entries Ω(log n).
[3] Candès and Plan, "Near-ideal model selection by ℓ_1 minimization", 2009.
Problem formulation
Setup: a streaming signal x = (x_0, x_1, x_2, ..., x_{n−1}, x_n, ...), processed over sliding windows of length n:
  x^(0) = (x_0, x_1, ..., x_{n−1})
  x^(1) = (x_1, x_2, ..., x_n)
  ...
  x^(i) = (x_i, x_{i+1}, ..., x_{i+n−1})
Measurements: y^(i) = A^(i) x^(i) + w^(i)
- Encoding (recursive sampling): y^(i+1) ← f(y^(i), x_{i+n}, x_i)
- Decoding (recursive estimation): x̂^(i+1) ← g(x̂^(i), y^(i+1))
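The key structural fact is that consecutive windows overlap in n − 1 samples; a small NumPy sketch (toy stream and window length chosen only for illustration):

```python
import numpy as np

def window(x_stream, i, n):
    """The i-th length-n sliding window x^(i) = (x_i, ..., x_{i+n-1})."""
    return x_stream[i:i + n]

x_stream = np.arange(20.0)      # toy stream
n = 8
assert np.array_equal(window(x_stream, 1, n)[:-1], window(x_stream, 0, n)[1:])
# consecutive windows share n-1 samples, which is what the recursive
# encoder f(y^(i), x_{i+n}, x_i) and decoder g(x_hat^(i), y^(i+1)) exploit
```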
Recursive sampling
Take A^(i+1) = A^(i) Π, where Π cyclically shifts the columns:
A^(0) = [a_0  a_1  ···  a_{n−2}  a_{n−1}]  →  A^(1) = [a_1  a_2  ···  a_{n−1}  a_0]  →  ···
Lemma: if A^(0) satisfies the RIP, then A^(i) satisfies the RIP for all i.
Update rule: for measurements y^(i) = A^(i) x^(i) + w^(i),
y^(i+1) = y^(i) + (x_{i+n} − x_i) a_0^(i)  (rank-1 update)  +  v^(i+1),  where v^(i+1) = w^(i+1) − w^(i).
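A minimal sketch of the encoder side, assuming NumPy: the new measurement vector is obtained from the old one with a rank-1 update, and the sensing matrix is tracked by rotating its columns (the optional noise argument only illustrates the v^(i+1) = w^(i+1) − w^(i) term from the slide):

```python
import numpy as np

def recursive_sample(y_old, A_old, x_old_first, x_new, v=None):
    """One recursive-sampling step:
       y^(i+1) = y^(i) + (x_{i+n} - x_i) * a_0^(i) + v^(i+1),  A^(i+1) = A^(i) * Pi."""
    a0 = A_old[:, 0]                            # column that rotates to the end
    y_new = y_old + (x_new - x_old_first) * a0
    if v is not None:                           # optional noise increment w^(i+1) - w^(i)
        y_new = y_new + v
    A_new = np.roll(A_old, -1, axis=1)          # cyclic column shift A^(i) Pi
    return y_new, A_new
```

In the noiseless case this reproduces A^(i+1) x^(i+1) exactly, which is easy to check against a direct matrix-vector product on a toy example.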
Recursive estimation
Given an iterative solver for LASSO, the number of iterations to convergence, T, increases with ‖x_init − x*‖_2.
- Use the previous estimate as a warm start: x^(i+1)_init ← [x̂^(i)_1  x̂^(i)_2  ···  x̂^(i)_{n−1}  0]
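The warm start is just the previous estimate shifted by one sample with a zero appended; a minimal NumPy sketch (the solver call in the comment uses the ISTA sketch from earlier, with hypothetical variables A_next, y_next):

```python
import numpy as np

def warm_start(x_hat_prev):
    """Build x_init^(i+1) from x_hat^(i): drop the oldest entry, append a zero."""
    return np.append(x_hat_prev[1:], 0.0)

# Example usage with the ISTA sketch above (A_next, y_next are hypothetical):
# x_hat_next = lasso_ista(A_next, y_next, lam, x_init=warm_start(x_hat_prev))
```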
Algorithm
[Block diagram with stages: recursive sampling, recursive estimation, support detection, LSE on support set, averaging, and delay elements]
Figure: Architecture of RCS.
RCS Algorithm:
- Recursive sampling and estimation
- Support detection by LASSO
- Ordinary LSE on the estimated support
- Averaging of the least-squares estimates
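Since consecutive windows overlap, each stream sample receives up to n least-squares estimates, and the averaging stage combines them. A minimal sketch of that combination step (the list-of-arrays interface is an assumption for illustration, not the authors' implementation):

```python
import numpy as np

def average_window_estimates(window_estimates, n):
    """Average the per-window LSE estimates of each stream sample.
       window_estimates[i] is the length-n estimate of (x_i, ..., x_{i+n-1})."""
    num_windows = len(window_estimates)
    stream_len = num_windows + n - 1
    acc = np.zeros(stream_len)
    cnt = np.zeros(stream_len)
    for i, est in enumerate(window_estimates):   # window i covers samples i .. i+n-1
        acc[i:i + n] += est
        cnt[i:i + n] += 1
    return acc / cnt
```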
Support detection
Voting algorithm:
- Solve LASSO to get an estimate
- Add votes to the indices whose magnitude is ≥ ξ_1
- Run LSE on the indices whose cumulative votes are ≥ ξ_2
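A minimal sketch of the voting step for a single window, assuming NumPy (in the streaming setting the vote vector would also be shifted along with the window, which this sketch omits):

```python
import numpy as np

def vote_and_lse(A, y, x_hat, votes, xi1, xi2):
    """Update the vote counts from a LASSO estimate and run least squares
       on the indices whose cumulative votes reach xi2."""
    votes = votes + (np.abs(x_hat) >= xi1)        # one vote per large-magnitude index
    support = np.flatnonzero(votes >= xi2)        # detected support
    x_ls = np.zeros_like(x_hat)
    if support.size:
        x_ls[support] = np.linalg.lstsq(A[:, support], y, rcond=None)[0]
    return x_ls, votes
```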
Estimation error variance
Theorem (Normalized Mean Error):
E_x[ ‖x̂^(i) − x^(i)‖_2 / ‖x^(i)‖_2 ] ≤ P_n · c_1 · 1/√(n log n) + (1 − P_n) · c_2
with c_1, c_2 constants and P_n ≥ (1 − O(1/(n√(log n))) − k/n^2)^{2n−1}.
The bound goes to 0 as n → ∞ for k = O(n^{1−ε}).
Computational complexity
- Sampling with rank-1 update: O(m)
- Estimation:
  • Computations in a single solver iteration: at least O(mn) (A ∈ R^{m×n})
  • Number of iterations T: O(√m)†, giving O(n m^{3/2}) per window
  • Least squares on the estimated support: O(k^3)
T increases with ‖x_init − x*‖_2, and
‖x^(i)_init − x*^(i)‖_2 ≤ C_0 · ‖x^(i) − x^(i)_k‖_1/√k  (= 0 for x^(i) k-sparse)  +  C_1 σ̄  (noise)  +  |x^(i)_{n−1}|  (O(1)),
with σ̄^2 = σ^2(m + 2√(2m)).
† Recall m = O(k log(n/k)): for k = O(1) the solver cost O(n m^{3/2}) = O(n (log n)^{3/2}) dominates, while for k = O(n) the O(k^3) least-squares step dominates.

k        Computational complexity
O(1)     O(n (log n)^{3/2})
O(n)     O(n^3)
Runtime
[Plot: average time (s) to solve one window vs. window size, comparing the naive approach against RCS]
Figure: Average time required to solve one window.
k = 0.05n, m = 5k, w^(i) ~ N(0, σ^2 I), σ = 0.01
Support estimation
Support estimation accuracy. Define:
- true support := support(x)
- detected support := support(x̂)
Performance metrics:
- true positive rate (TPR) = |detected support ∩ true support| / |true support|
- false positive rate (FPR) = |detected support \ true support| / (n − |true support|)
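Both metrics are straightforward to compute from the index sets; a minimal NumPy sketch:

```python
import numpy as np

def tpr_fpr(x_true, x_detected, n):
    """True/false positive rates of a detected support against the true support."""
    true_supp = set(np.flatnonzero(x_true))
    det_supp = set(np.flatnonzero(x_detected))
    tpr = len(det_supp & true_supp) / len(true_supp)
    fpr = len(det_supp - true_supp) / (n - len(true_supp))
    return tpr, fpr
```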
Support estimation
[Plot: true positive rate and false positive rate vs. number of samples m]
Figure: Circle markers: true positive rate. Square markers: false positive rate.
n = 6000, σ = 0.1, min |x_i| ≥ 3.34, ξ_1 = 0.01, 0.10 and 1.00.
RCS error
[Plot: RCS normalized error vs. window length]
Figure: Normalized error Σ_{i=1}^{T}(x̂_i − x_i)^2 / Σ_{i=1}^{T} x_i^2 vs. window length.
AWGN σ = 0.1, 5% sparsity, A random Gaussian, m = 5k, T = 60,000.
Conclusion
Compressed sensing of streaming data:
- Encoding: recursive sampling with minimal computational overhead (rank-1 update)
- Decoding: recursive estimation
  • warm start for faster convergence
  • voting and averaging to reduce the reconstruction error variance
Thank you!