Introduction to Sparsity in Signal Processing
Ivan Selesnick
Polytechnic Institute of New York University
Brooklyn, New York
2012
Under-determined linear equations
Consider an under-determined system of linear equations
y = Ax  (1)

A : M × N matrix, with M < N
y : length-M vector
x : length-N vector

y = [y(0), …, y(M − 1)]ᵀ,  x = [x(0), …, x(N − 1)]ᵀ
The system has more unknowns than equations.
The matrix A is wider than it is tall.
We assume that AA∗ is invertible; therefore the system of equations (1) has infinitely many solutions.
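A quick numeric illustration (a NumPy sketch with a made-up 2 × 4 matrix, not from the original notes): since A is wide, any null-space vector of A can be added to one solution to produce another.

```python
import numpy as np

# A hypothetical wide system: M = 2 equations, N = 4 unknowns.
A = np.array([[1.0, 2.0, 0.0, 1.0],
              [0.0, 1.0, 1.0, 3.0]])
y = np.array([3.0, 2.0])

# x = A*(AA*)^{-1} y is one exact solution (the minimum-norm one).
x1 = A.T @ np.linalg.solve(A @ A.T, y)

# Adding any null-space vector of A gives a different exact solution.
null_vec = np.linalg.svd(A)[2][-1]      # last right singular vector
x2 = x1 + null_vec

print(np.allclose(A @ x1, y), np.allclose(A @ x2, y))   # True True
```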
Norms
We will use the ℓ₂ and ℓ₁ norms:

‖x‖₂² := ∑_{n=0}^{N−1} |x(n)|²  (2)

‖x‖₁ := ∑_{n=0}^{N−1} |x(n)|.  (3)

‖x‖₂², i.e. the sum of squares, is referred to as the ‘energy’ of x.
Least squares
To solve y = Ax, it is common to minimize the energy of x.
arg min_x ‖x‖₂²  (4a)
such that y = Ax.  (4b)

Solution to (4):

x = A∗(AA∗)⁻¹ y.  (5)

When y is noisy, don’t solve y = Ax exactly. Instead, find an approximate solution:

arg min_x ‖y − Ax‖₂² + λ‖x‖₂².  (6)

Solution to (6):

x = (A∗A + λI)⁻¹A∗ y.  (7)

Large-scale systems ⟶ fast algorithms needed.
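The closed forms (5) and (7) can be checked numerically; the NumPy sketch below (with arbitrary small dimensions) also verifies the push-through identity (A∗A + λI)⁻¹A∗ = A∗(AA∗ + λI)⁻¹, which lets one invert an M × M matrix instead of an N × N one.

```python
import numpy as np

rng = np.random.default_rng(0)
M, N, lam = 5, 12, 0.1
A = rng.standard_normal((M, N))
y = rng.standard_normal(M)

# (5): minimum-energy exact solution of y = Ax.
x_exact = A.T @ np.linalg.solve(A @ A.T, y)
assert np.allclose(A @ x_exact, y)

# (7): regularized least square solution.
x_reg = np.linalg.solve(A.T @ A + lam * np.eye(N), A.T @ y)

# Same solution via the push-through identity:
# (A'A + lam I)^{-1} A' = A' (AA' + lam I)^{-1}.
x_reg2 = A.T @ np.linalg.solve(A @ A.T + lam * np.eye(M), y)
assert np.allclose(x_reg, x_reg2)
```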
Sparse solutions
Another approach to solve y = Ax:
arg min_x ‖x‖₁  (8a)
such that y = Ax  (8b)

Problem (8) is the basis pursuit (BP) problem.

When y is noisy, don’t solve y = Ax exactly. Instead, find an approximate solution:

arg min_x ‖y − Ax‖₂² + λ‖x‖₁.  (9)

Problem (9) is the basis pursuit denoising (BPD) problem.

The BP/BPD problems cannot be solved in explicit form, only by iterative numerical algorithms.
Least squares & BP/BPD
Least squares and BP/BPD solutions are quite different. Why?
To minimize ‖x‖₂², the largest values of x must be made small, as they count much more than the smallest values. ⇒ Least square solutions have many small values, as those are relatively unimportant. ⇒ Least square solutions are not sparse.
[Figure: the quadratic penalty x² versus the absolute-value penalty |x|.]
Therefore, when it is known/expected that x is sparse, use the ℓ₁ norm, not the ℓ₂ norm.
Algorithms for sparse solutions
Cost function:
• Non-differentiable
• Convex
• Large-scale

Algorithms:
• ‘Matrix-free’
• ISTA
• FISTA
• SALSA (ADMM)
• and more...
Parseval frames
If A satisfies

AA∗ = pI  (10)

then the columns of A are said to form a ‘Parseval frame’. A can be rectangular.

The least squares solution (5) becomes

x = A∗(AA∗)⁻¹ y = (1/p) A∗ y    (AA∗ = pI)  (11)

No matrix inversion is needed.

The least squares solution (7) becomes

x = (A∗A + λI)⁻¹A∗ y = (1/(λ + p)) A∗ y    (AA∗ = pI)  (12)

(Use the matrix inverse lemma.) No matrix inversion is needed.

If A satisfies (10), then it is very easy to find least squares solutions. Some algorithms for BP/BPD also become computationally easier.
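A small NumPy check (dimensions chosen arbitrarily) that for a Parseval frame the shortcuts (11) and (12) agree with the general formulas (5) and (7); the frame here is a partial inverse-DFT matrix, for which p = N.

```python
import numpy as np

M, N, lam, p = 4, 8, 0.3, 8
A = np.exp(2j * np.pi * np.outer(np.arange(M), np.arange(N)) / N)

# A satisfies (10): AA* = p I with p = N.
assert np.allclose(A @ A.conj().T, p * np.eye(M))

y = np.array([1.0, -2.0, 0.5, 3.0], dtype=complex)

# (11): x = A*(AA*)^{-1} y reduces to (1/p) A* y -- no inversion.
assert np.allclose(A.conj().T @ np.linalg.solve(A @ A.conj().T, y),
                   A.conj().T @ y / p)

# (12): (A*A + lam I)^{-1} A* y reduces to A* y / (lam + p).
assert np.allclose(np.linalg.solve(A.conj().T @ A + lam * np.eye(N),
                                   A.conj().T @ y),
                   A.conj().T @ y / (lam + p))
```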
Example: Sparse Fourier coefficients using BP
The Fourier transform tells how to write a signal as a sum of sinusoids. But it is not the only way.
Basis pursuit gives a sparse spectrum.
Suppose the M-point signal y(m) is written as

y(m) = ∑_{n=0}^{N−1} c(n) exp(j(2π/N)mn),  0 ≤ m ≤ M − 1  (13)

where c(n) is a length-N coefficient sequence, with M ≤ N. In matrix form,

y = Ac  (14)

A_{m,n} = exp(j(2π/N)mn),  0 ≤ m ≤ M − 1,  0 ≤ n ≤ N − 1  (15)

c : length-N

The coefficients c(n) are frequency-domain (Fourier) coefficients.
1. If N = M, then A is the inverse N-point DFT matrix.
2. If N > M, then A is the first M rows of the inverse N-point DFT matrix.
⇒ A or A∗ can be implemented efficiently using the FFT. For example, in Matlab, y = Ac is implemented as:
function y = A(c, M, N)
v = N * ifft(c);
y = v(1:M);
end
Similarly, A∗ y can be obtained by zero-padding and computing the DFT. In Matlab, c = A∗ y is implemented as:
function c = AT(y, M, N)
c = fft([y; zeros(N-M, 1)]);
end
⇒ Matrix-free algorithms.
3. Due to the orthogonality properties of complex sinusoids,

AA∗ = N I_M  (16)
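The same matrix-free operators can be written in NumPy (a translation of the Matlab functions above, using np.fft); checking against the explicit matrix A of (15) confirms them.

```python
import numpy as np

def A_op(c, M):
    # y = A c: inverse DFT scaled by N, keeping the first M samples.
    return (len(c) * np.fft.ifft(c))[:M]

def AH_op(y, N):
    # c = A* y: zero-pad y to length N, then take the DFT.
    return np.fft.fft(np.concatenate([y, np.zeros(N - len(y))]))

# Verify against the explicit partial inverse-DFT matrix A of (15).
M, N = 5, 12
A = np.exp(2j * np.pi * np.outer(np.arange(M), np.arange(N)) / N)
rng = np.random.default_rng(2)
c = rng.standard_normal(N) + 1j * rng.standard_normal(N)
y = rng.standard_normal(M) + 1j * rng.standard_normal(M)

assert np.allclose(A_op(c, M), A @ c)
assert np.allclose(AH_op(y, N), A.conj().T @ y)
```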
When N = M, the coefficients c satisfying y = Ac are uniquely determined.
When N > M, the coefficients c are not unique. Any vector c satisfying y = Ac can be considered a valid set of coefficients. To find a particular solution we can minimize either ‖c‖₂² or ‖c‖₁.
Least squares:

arg min_c ‖c‖₂²  (17a)
such that y = Ac  (17b)

Basis pursuit:

arg min_c ‖c‖₁  (18a)
such that y = Ac.  (18b)

The two solutions can be quite different...
[Figure: the signal (real and imaginary parts) versus time (samples).]
Least square solution:

c = A∗(AA∗)⁻¹ y = (1/N) A∗ y    (least square solution)

A∗ y is computed via:

1. zero-pad the length-M signal y to length N
2. compute its DFT

BP solution: computed using the algorithm SALSA.
[Figure: (A) Fourier coefficients (DFT); (B) Fourier coefficients (least square solution); (C) Fourier coefficients (basis pursuit solution); all versus frequency (index).]

The BP solution does not exhibit the leakage phenomenon.
[Figure: cost function history of the algorithm for the basis pursuit solution, versus iteration.]
Example: Denoising using BPD
Digital LTI filters are often used for noise reduction (denoising).
But if

• the noise and signal overlap in the frequency domain, or
• the respective frequency bands are unknown,

then it is difficult to use LTI filters.

However, if the signal has sparse (or relatively sparse) Fourier coefficients, then BPD can be used for noise reduction.
Noisy speech signal y(m):
y(m) = s(m) + w(m), 0 ≤ m ≤ M − 1, M = 500 (19)
s(m) : noise-free speech signal
w(m) : noise sequence.
[Figure: the noisy signal versus time (samples); (A) Fourier coefficients (FFT) of the noisy signal versus frequency (index).]
Assume the noise-free speech signal s(m) has a sparse set of Fourier coefficients:
y = Ac + w
y : noisy speech signal, length-M
A : M × N DFT matrix (15)
c : sparse Fourier coefficients, length-N
w : noise, length-M
As y is noisy, find c by solving the least square problem
arg min_c ‖y − Ac‖₂² + λ‖c‖₂²  (20)
or the basis pursuit denoising (BPD) problem
arg min_c ‖y − Ac‖₂² + λ‖c‖₁.  (21)
Once c is found, an estimate of the speech signal is given by ŝ = Ac.
Least square solution:

c = (A∗A + λI)⁻¹A∗ y = (1/(λ + N)) A∗ y    (AA∗ = N I)  (22)

using (12) and (16). ⇒ The least square estimate of the speech signal is

ŝ = Ac = (N/(λ + N)) y    (least square solution).

But this is only a scaled version of the noisy signal!

No real filtering is achieved.
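This ‘scaling only’ behaviour is easy to confirm numerically (a NumPy sketch with arbitrary small dimensions):

```python
import numpy as np

M, N, lam = 6, 12, 2.0
# Partial inverse-DFT matrix, so AA* = N I as in (16).
A = np.exp(2j * np.pi * np.outer(np.arange(M), np.arange(N)) / N)
y = np.random.default_rng(4).standard_normal(M).astype(complex)

# Least square coefficients of (20), then the resulting 'estimate' A c.
c = np.linalg.solve(A.conj().T @ A + lam * np.eye(N), A.conj().T @ y)
assert np.allclose(A @ c, (N / (lam + N)) * y)   # just a scaled copy of y
```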
BPD solution:
[Figure: (B) Fourier coefficients (BPD solution) versus frequency (index); the denoised signal versus time (samples).]

The BPD solution, obtained with the algorithm SALSA, achieves effective noise reduction, unlike least squares!
Example: Deconvolution using BPD
If the signal of interest x(m) is not only noisy but also distorted by an LTI system with impulse response h(m), then the available data y(m) is

y(m) = (h ∗ x)(m) + w(m)  (23)

where ‘∗’ denotes (linear) convolution and w(m) is additive noise. Given the observed data y, we aim to estimate the signal x. We will assume that the sequence h is known.
y = Hx + w  (24)

x : length N
h : length L
y : length M = N + L − 1

H =
[ h₀          ]
[ h₁ h₀       ]
[ h₂ h₁ h₀    ]
[    h₂ h₁ h₀ ]
[       h₂ h₁ ]
[          h₂ ]
  (25)

H is of size M × N with M > N (because M = N + L − 1).
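A small sketch (NumPy; the example sizes are my own) that builds H column by column, each column being h shifted down, and checks it against linear convolution:

```python
import numpy as np

def conv_matrix(h, N):
    # M x N convolution matrix as in (25), with M = N + L - 1.
    L = len(h)
    H = np.zeros((N + L - 1, N))
    for n in range(N):
        H[n:n + L, n] = h        # column n is h shifted down by n
    return H

h = np.array([1.0, 2.0, 3.0])          # length L = 3, as in (25)
x = np.array([4.0, 0.0, -1.0, 2.0])    # length N = 4
H = conv_matrix(h, len(x))

assert H.shape == (6, 4)               # M = N + L - 1 = 6
assert np.allclose(H @ x, np.convolve(h, x))
```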
A sparse signal is convolved with the 4-point moving average filter

h(n) = 1/4 for n = 0, 1, 2, 3;  h(n) = 0 otherwise.
[Figure: the sparse signal; the observed signal.]
Due to the noise w, solve the least square problem

arg min_x ‖y − Hx‖₂² + λ‖x‖₂²  (26)

or the basis pursuit denoising (BPD) problem

arg min_x ‖y − Hx‖₂² + λ‖x‖₁.  (27)

Least square solution:

x = (H∗H + λI)⁻¹H∗ y.  (28)
[Figure: deconvolution (least square solution); deconvolution (BPD solution).]
The BPD solution, obtained using SALSA, is more faithful to the original signal.
Example: Filling in missing samples using BP
Due to data transmission/acquisition errors, some signal samples may be lost. Fill in the missing values for error concealment.

Part of a signal or image may be intentionally deleted (image editing, etc.). Convincingly filling in the missing values according to the surrounding area is called inpainting.
[Figure: incomplete signal versus time (samples); 200 missing samples.]
The incomplete signal y:

y = Sx  (29)

x : length M
y : length K < M
S : ‘selection’ (or ‘sampling’) matrix of size K × M.

For example, if only the first, second, and last elements of a 5-point signal x are observed, then the matrix S is given by:

S =
[ 1 0 0 0 0 ]
[ 0 1 0 0 0 ]
[ 0 0 0 0 1 ].
  (30)

Problem: Given y and S, find x such that y = Sx.

⇒ Underdetermined system, infinitely many solutions.

The least square and BP solutions are very different...
Properties of S:
1. S Sᵗ = I  (31)

where I is a K × K identity matrix. For example, with S in (30),

S Sᵗ =
[ 1 0 0 ]
[ 0 1 0 ]
[ 0 0 1 ].

2. Sᵗ y sets the missing samples to zero. For example, with S in (30),

Sᵗ y = Sᵗ [y(0), y(1), y(2)]ᵀ = [y(0), y(1), 0, 0, y(2)]ᵀ.  (32)
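Both properties are one-liners to verify in NumPy (the 5-point example and kept indices follow (30)):

```python
import numpy as np

# S of (30): rows of the 5x5 identity for the observed samples 0, 1, 4.
S = np.eye(5)[[0, 1, 4]]

# Property 1, eq. (31): S S^t = I (3 x 3).
assert np.allclose(S @ S.T, np.eye(3))

# Property 2, eq. (32): S^t y zero-fills the missing positions.
y = np.array([10.0, 20.0, 30.0])
assert np.allclose(S.T @ y, [10.0, 20.0, 0.0, 0.0, 30.0])
```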
Suppose x has a sparse representation with respect to A,

x = Ac  (33)

c : sparse vector, length N, with M ≤ N
A : size M × N.

The incomplete signal y can then be written as

y = Sx    from (29)  (34a)
  = SAc    from (33).  (34b)

Therefore, if we can find a vector c satisfying

y = SAc  (35)

then we can create an estimate x̂ of x by setting

x̂ = Ac.  (36)

From (35), Sx̂ = y. That is, x̂ agrees with the known samples y.
Note that y is shorter than the coefficient vector c, so there are infinitely many solutions to (35).
Any vector c satisfying y = SAc can be considered a valid set of coefficients.
To find a particular solution, solve the least square problem

arg min_c ‖c‖₂²  (37a)
such that y = SAc  (37b)

or the basis pursuit problem

arg min_c ‖c‖₁  (38a)
such that y = SAc.  (38b)
The two solutions are very different...
Let us assume A satisfies

AA∗ = pI,  (39)

for some positive real number p.
The least square solution is

c = (SA)∗((SA)(SA)∗)⁻¹ y  (40)
  = A∗Sᵗ(S AA∗ Sᵗ)⁻¹ y  (41)
  = A∗Sᵗ(p S Sᵗ)⁻¹ y    using (10)  (42)
  = A∗Sᵗ(p I)⁻¹ y    using (31)  (43)
  = (1/p) A∗Sᵗ y  (44)

Hence, the least square estimate x̂ is given by

x̂ = Ac  (45)
  = (1/p) AA∗Sᵗ y    using (44)  (46)
  = Sᵗ y    using (10).  (47)

This estimate consists of setting all the missing values to zero! See (32).

No real estimation of the missing values has been achieved.

The least square solution is of no use here.
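The derivation (40)-(47) can be confirmed numerically; in the NumPy sketch below (sizes and kept indices are arbitrary), A is a partial inverse-DFT matrix with p = N:

```python
import numpy as np

M, N = 8, 16
A = np.exp(2j * np.pi * np.outer(np.arange(M), np.arange(N)) / N)  # AA* = N I
S = np.eye(M)[[0, 1, 2, 5, 6]]          # keep 5 of the 8 samples
y = (np.arange(5) + 1.0).astype(complex)

# Least square coefficients (40) for y = SAc ...
B = S @ A
c = B.conj().T @ np.linalg.solve(B @ B.conj().T, y)

# ... and the resulting estimate is just the zero-filled data, eq. (47).
assert np.allclose(A @ c, S.T @ y)
```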
Short segments of speech can be sparsely represented using the DFT; therefore we set A equal to the M × N DFT matrix (15) with N = 1024.

BP solution obtained using 100 iterations of SALSA:
[Figure: estimated coefficients versus frequency (DFT index); estimated signal versus time (samples).]

The missing samples have been filled in quite accurately.
Total Variation Denoising/Deconvolution
A signal is observed in additive noise,
y = x + w.
Total variation denoising is suitable when it is known that the signal of interest has a sparse derivative.
arg min_x ‖y − x‖₂² + λ ∑_{n=2}^{N} |x(n) − x(n−1)|.  (48)

The total variation of x is defined as

TV(x) := ∑_{n=2}^{N} |x(n) − x(n−1)| = ‖Dx‖₁

where

D =
[ −1  1         ]
[    −1  1      ]
[        ⋱  ⋱  ]
[         −1  1 ]
  (49)
Matrix D has size (N − 1)× N.
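The matrix D and the identity TV(x) = ‖Dx‖₁ can be sketched in NumPy (the test signal is my own):

```python
import numpy as np

N = 6
# (N-1) x N first-difference matrix D of (49).
D = np.eye(N - 1, N, k=1) - np.eye(N - 1, N)

x = np.array([0.0, 0.0, 3.0, 3.0, 1.0, 1.0])
assert np.allclose(D @ x, np.diff(x))
# TV(x) = sum of |x(n) - x(n-1)| = ||D x||_1.
assert np.isclose(np.sum(np.abs(D @ x)), np.sum(np.abs(np.diff(x))))
```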
[Figure: noise-free signal; noisy signal; L2-filtered signal (λ = 3.00); L2-filtered signal (λ = 8.00); TV-filtered signal (λ = 3.00).]

TV filtering preserves edges/jumps more accurately than least squares filtering.
Polynomial Smoothing of Time Series with Additive Step Discontinuities
[Figure: (a) noisy data; (b) calculated TV component; (c) TV-compensated data; (d) estimated signal.]
Observed noisy data: a low-frequency signal with step-wise discontinuities (steps/jumps).

Goal: simultaneous smoothing and change/jump detection.

Conventional low-pass filtering blurs discontinuities.
Variational Formulation: Given noisy data y, find
x (step signal)
p (local polynomial signal)
by minimization:
arg min_{a,x}  λ ∑_{n=2}^{N} |x(n) − x(n−1)| + ∑_{n=1}^{N} |y(n) − p(n) − x(n)|²

where

p(n) = a₁ n + ⋯ + a_d n^d.

The cost function is non-differentiable. We use ‘variable splitting’ and ADMM to derive an efficient iterative algorithm.
Nano-particle Detection: The Whispering Gallery Mode (WGM) biosensor,
designed to detect individual nano-particles in real time, is based on detecting discrete
changes in the resonance frequency of a microspherical cavity. The adsorption of a
nano-particle onto the surface of a custom designed microsphere produces a
theoretically predicted but minute change in the optical properties of the sphere
[Arnold, 2003].
[Figure: measured resonance wavelength shift (fm) versus time (seconds).]
Nano-particle Detection:
[Figure: measured wavelength (fm) versus time (seconds); estimated step signal x(t); diff(x(t)).]
Reference: Polynomial Smoothing of Time Series with Additive Step Discontinuities, I. Selesnick, S. Arnold, V. Dantham. Submitted to IEEE Trans. on Signal Processing.
Summary
1. Basis pursuit
2. Basis pursuit denoising
3. Sparse Fourier coefficients using BP
4. Denoising using BPD
5. Deconvolution using BPD
6. Filling in missing samples using BP
7. Total variation filtering
8. Simultaneous least-squares & sparsity