
Compressed sensing of streaming data

Nick Freris, Orhan Ocal, Martin Vetterli

École Polytechnique Fédérale de Lausanne

3 October 2013

51st Annual Allerton Conference

Outline

Background

Recursive Compressed Sensing (recursive sampling, recursive estimation)

Analysis

Simulations


Compressed sensing

Sampling: y = Ax, with y ∈ R^m, A ∈ R^{m×n}, m ≪ n

support(x) := {i : x_i ≠ 0}, ‖x‖_0 := |support(x)|; x is k-sparse ⇔ ‖x‖_0 ≤ k

Goal: recover the sparse vector x from the measurement y

Restricted Isometry Property (RIP):

(1 − δ_k) ‖x‖_2^2 ≤ ‖Ax‖_2^2 ≤ (1 + δ_k) ‖x‖_2^2, for all k-sparse x

Random matrices: Gaussian, Bernoulli, etc.
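As a quick illustration of this sampling model (not from the talk), the sketch below draws a random Gaussian sensing matrix, a k-sparse signal, and the compressed measurement; the function name and parameter values are mine.

```python
import numpy as np

rng = np.random.default_rng(0)

def gaussian_sensing(n, m, k):
    """Draw A with i.i.d. N(0, 1/m) entries (a standard choice that satisfies RIP
    with high probability when m = O(k log(n/k))) and a k-sparse signal x."""
    A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, n))
    x = np.zeros(n)
    support = rng.choice(n, size=k, replace=False)
    x[support] = rng.normal(0.0, 1.0, size=k)
    return A, x

A, x = gaussian_sensing(n=1000, m=200, k=20)
y = A @ x            # compressed measurement, m << n
```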


CS - Noiseless case

Given: y = Ax, with y ∈ R^m, A ∈ R^{m×n}, m ≪ n

Goal: recover the sparse vector x

ℓ0 minimization (combinatorial, intractable):

minimize ‖x‖_0 subject to Ax = y    (P0)

Basis pursuit (a linear program):

minimize ‖x‖_1 subject to Ax = y    (BP)

Theorem (Candès and Wakin, "An Introduction To Compressive Sampling", 2008). Every k-sparse vector x is exactly recovered by (BP) if δ_2k(A) < √2 − 1; under this condition (P0) and (BP) are equivalent.
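For concreteness, here is a minimal sketch (not the authors' code) of solving (BP) as a linear program: split x = u − v with u, v ≥ 0 so that ‖x‖_1 becomes a linear objective; basis_pursuit and its arguments are illustrative names.

```python
import numpy as np
from scipy.optimize import linprog

def basis_pursuit(A, y):
    """Solve (BP): minimize ||x||_1 subject to Ax = y, as a linear program."""
    m, n = A.shape
    c = np.ones(2 * n)            # objective: sum(u) + sum(v) = ||x||_1 at the optimum
    A_eq = np.hstack([A, -A])     # constraint: A u - A v = y
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
    z = res.x
    return z[:n] - z[n:]          # recover x = u - v
```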


CS - Noisy case

Setting: y = Ax + w, with x sparse

LASSO (constrained):

minimize ‖x‖_1 subject to ‖Ax − y‖_2 ≤ σ    (LC)

LASSO (unconstrained):

minimize ‖Ax − y‖_2^2 + λ‖x‖_1    (LU)
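A minimal solver for (LU) is iterative soft-thresholding (ISTA); this is a generic proximal-gradient sketch, not necessarily the solver used by the authors, and ista with its defaults is an illustrative name.

```python
import numpy as np

def soft_threshold(z, t):
    """Elementwise soft-thresholding: the proximal operator of t * ||.||_1."""
    return np.sign(z) * np.maximum(np.abs(z) - t, 0.0)

def ista(A, y, lam, x0=None, n_iter=500):
    """Minimize ||Ax - y||_2^2 + lam * ||x||_1 by proximal gradient descent (ISTA)."""
    n = A.shape[1]
    x = np.zeros(n) if x0 is None else np.asarray(x0, dtype=float).copy()
    step = 1.0 / (2.0 * np.linalg.norm(A, 2) ** 2)   # 1/L, L = Lipschitz constant of the gradient
    for _ in range(n_iter):
        grad = 2.0 * A.T @ (A @ x - y)               # gradient of the quadratic term
        x = soft_threshold(x - step * grad, step * lam)
    return x
```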


LASSO estimation

Theorem (Candès and Wakin, "An Introduction To Compressive Sampling", 2008). The solution x* of (LC) satisfies

‖x* − x‖_2 ≤ C_0 · ‖x − x_k‖_1 / √k  (model mismatch)  +  C_1 · σ  (noise),

where x_k keeps the k largest-magnitude entries of x.

Assumptions: δ_2k(A) < √2 − 1 and ‖w‖_2 ≤ σ.


Support estimation with LASSO

Theorem (Candès and Plan, "Near-ideal model selection by ℓ1 minimization", 2009). The LASSO estimate x̂ satisfies

support(x̂) = support(x) and sgn(x̂_i) = sgn(x_i) for every i

with probability ≥ 1 − O(1/(n√(log n))) − k/n².

Assumptions: AWGN, nonzero entries Ω(log n).

Outline

Background

Recursive Compressed Sensing (recursive sampling, recursive estimation)

Analysis

Simulations

Problem formulation

Setup: a streaming signal x = (x_0, x_1, x_2, ..., x_{n−1}, x_n, ...) is processed in sliding windows of length n:

x^(0) = [x_0, x_1, x_2, ..., x_{n−1}]
x^(1) = [x_1, x_2, ..., x_{n−1}, x_n]
...
x^(i) = [x_i, x_{i+1}, ..., x_{i+n−2}, x_{i+n−1}]

Measurements: y^(i) = A^(i) x^(i) + w^(i)

- Encoding (recursive sampling): y^(i+1) ← f(y^(i), x_{i+n}, x_i)
- Decoding (recursive estimation): x̂^(i+1) ← g(x̂^(i), y^(i+1))


Recursive sampling

Take A^(i+1) = A^(i) Π, where Π cyclically shifts the columns:

A^(0) = [a_0 a_1 · · · a_{n−2} a_{n−1}]  →  A^(1) = [a_1 a_2 · · · a_{n−1} a_0]  →  · · ·

Lemma. If A^(0) satisfies RIP, then A^(i) satisfies RIP for all i.

Update rule: for measurements y^(i) = A^(i) x^(i) + w^(i),

y^(i+1) = y^(i) + (x_{i+n} − x_i) a_0^(i)  (rank-1 update)  +  v^(i+1),  where v^(i+1) = w^(i+1) − w^(i).
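A sketch of both operations in code (the function names are mine, and the noise difference v^(i+1) is not simulated):

```python
import numpy as np

def rotate_columns(A):
    """A^(i+1) = A^(i) Pi: cyclically shift the columns left by one position."""
    return np.roll(A, -1, axis=1)

def recursive_sample(y_prev, a0, x_new, x_old):
    """Rank-1 measurement update: y^(i+1) = y^(i) + (x_{i+n} - x_i) * a0^(i),
    where a0 is the column of A^(i) multiplying the expiring sample x_i."""
    return y_prev + (x_new - x_old) * a0
```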


Recursive estimation

Given an iterative solver for LASSO, the number of iterations to convergence, T, increases with ‖x_init − x*‖_2.

- Use the previous estimate as a warm start (sketched below): x_init^(i+1) ← [x̂_1^(i), x̂_2^(i), ..., x̂_{n−1}^(i), 0]
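A one-line warm-start helper, assuming the ista() sketch from earlier (names are mine):

```python
import numpy as np

def warm_start(x_hat_prev):
    """Shift the previous window's estimate by one sample and append a zero:
    x_init^(i+1) = [xhat_1^(i), ..., xhat_{n-1}^(i), 0]."""
    return np.append(x_hat_prev[1:], 0.0)

# e.g., with the ista() sketch above:
# x_hat_next = ista(A_next, y_next, lam, x0=warm_start(x_hat_prev))
```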


Algorithm

Figure: Architecture of RCS (Recursive Sampling → Recursive Estimation → Support Detection → LSE on Support Set → Averaging, with delay feedback).

RCS algorithm:
- Recursive sampling and estimation
- Support detection by LASSO
- Ordinary LSE on the estimated support
- Averaging of the least squares estimates
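To make the pipeline concrete, here is a simplified single-window decoding step combining the sketches above; it uses a plain magnitude threshold for support detection (the cross-window voting, detailed on the next slide, and the averaging stage are omitted), and all names are mine.

```python
import numpy as np

def rcs_window_step(A, y, x_init, lam, xi1):
    """One simplified RCS decoding step: warm-started LASSO (via the ista() sketch),
    thresholded support detection, then least squares restricted to that support."""
    x_lasso = ista(A, y, lam, x0=x_init)               # recursive estimation
    support = np.flatnonzero(np.abs(x_lasso) >= xi1)   # single-window support guess
    x_ls = np.zeros(A.shape[1])
    if support.size:
        x_ls[support] = np.linalg.lstsq(A[:, support], y, rcond=None)[0]
    return x_ls, support
```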


Support detection

Voting algorithm:

- Solve LASSO to get an estimate
- Add votes to indices whose magnitude is ≥ ξ1
- Run LSE on indices whose cumulative votes are ≥ ξ2
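A small sketch of the vote bookkeeping (function names are mine):

```python
import numpy as np

def update_votes(votes, x_lasso, xi1):
    """Add one vote to every index whose LASSO magnitude is at least xi1."""
    votes[np.abs(x_lasso) >= xi1] += 1
    return votes

def voted_support(votes, xi2):
    """Indices whose cumulative vote count reaches xi2; LSE is then run on these."""
    return np.flatnonzero(votes >= xi2)
```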


Outline

Background

Recursive Compressed Sensing (recursive sampling, recursive estimation)

Analysis

Simulations

Estimation error variance

Theorem (Normalized Mean Error).

E_x[ ‖x̂^(i) − x^(i)‖_2 / ‖x^(i)‖_2 ] ≤ P_n · c_1 · 1/√(n log n) + (1 − P_n) · c_2,

where c_1, c_2 are constants and P_n ≥ (1 − O(1/(n√(log n))) − k/n²)^(2n−1).

The bound goes to 0 as n → ∞ for k = O(n^(1−ε)).


Computational complexity

- Sampling with rank-1 update: O(m)
- Estimation:
  • computations in a single iteration: at least O(mn) (A ∈ R^{m×n})
  • number of iterations T: O(√m)†, giving O(n m^{3/2}) overall
  • least squares on the support: O(k^3)

T increases with ‖x_init − x*‖_2, and

‖x_init^(i) − x*^(i)‖_2 ≤ C_0 ‖x^(i) − x_k^(i)‖_1 / √k  (= 0 for x^(i) k-sparse)  +  C_1 σ̄  (noise)  +  |x_{n−1}^(i)|  (O(1)),

with σ̄² = σ²(m + 2√(2m)) (recall m = O(k log(n/k))).

k      | Computational complexity
O(1)   | O(n (log n)^{3/2})
O(n)   | O(n^3)
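Putting the bounds above together, a back-of-the-envelope per-window cost (my arithmetic, using only the orders stated on this slide) is:

```latex
\underbrace{T \cdot O(mn)}_{=\,O(n m^{3/2})} + \underbrace{O(k^{3})}_{\text{LSE}}
=
\begin{cases}
O\!\left(n (\log n)^{3/2}\right), & k = O(1)\ \Rightarrow\ m = O(\log n),\\[4pt]
O(n^{3}), & k = \Theta(n)\ \Rightarrow\ m = \Theta(n),\ \text{LSE dominates}.
\end{cases}
```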


Outline

Background

Recursive Compressed Sensing (recursive sampling, recursive estimation)

Analysis

Simulations

Runtime

Figure: Average time required to solve one window vs. window size, comparing the naive approach and RCS.

k = 0.05n, m = 5k, w^(i) ∼ N(0, σ²I), σ = 0.01

Support estimation

Support estimation accuracy. Define:

- true support := support(x)
- detected support := support(x̂)

Performance metrics:

- true positive rate (TPR) = |detected support ∩ true support| / |true support|
- false positive rate (FPR) = |detected support \ true support| / (n − |true support|)
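These two rates are straightforward to compute; a small sketch (the function name and tolerance argument are mine):

```python
import numpy as np

def support_rates(x_true, x_hat, tol=0.0):
    """True and false positive rates of the detected support, as defined above."""
    n = x_true.size
    true_sup = set(np.flatnonzero(np.abs(x_true) > tol))
    det_sup = set(np.flatnonzero(np.abs(x_hat) > tol))
    tpr = len(det_sup & true_sup) / len(true_sup)
    fpr = len(det_sup - true_sup) / (n - len(true_sup))
    return tpr, fpr
```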

Support estimation

Figure: True positive rate (circle markers) and false positive rate (square markers) vs. number of samples m, for thresholds ξ1 = 0.01, 0.10, and 1.00.

n = 6000, σ = 0.1, min |x_i| ≥ 3.34.

RCS error

Figure: Normalized error ∑_{i=1}^T (x̂_i − x_i)² / ∑_{i=1}^T x_i² vs. window length.

AWGN with σ = 0.1, 5% sparsity, A random Gaussian, m = 5k, T = 60,000.

Conclusion

Compressed sensing of streaming data:

- Encoding: recursive sampling with minimal computational overhead (rank-1 update)
- Decoding: recursive estimation
  • warm start for faster convergence
  • voting and averaging for reduction of the reconstruction error variance

Thank you!
