Presentation at the "Workshop on tomography reconstruction", December 11th, 2012, ENS Paris.
Compressive Sensing
Gabriel Peyré
www.numerical-tours.com
Overview
•Compressive Sensing Acquisition
•Theoretical Guarantees
•Fourier Domain Measurements
•Parameter Selection
Single Pixel Camera (Rice)

    y[i] = \langle f, \varphi_i \rangle

P measures \ll N micro-mirrors.

[Images: reconstructions of f for P/N = 1, P/N = 0.16, P/N = 0.02.]
CS Hardware Model

CS is about designing hardware: input signals f \in L^2(\mathbb{R}^2).
Physical hardware resolution limit: target resolution f \in \mathbb{R}^N.

    f \in L^2  →[micro-mirrors array resolution]→  f \in \mathbb{R}^N  →[CS hardware: operator K]→  y \in \mathbb{R}^P
Sparse CS Recovery

(Discretized) sampling acquisition:
    y = K f_0 + w = K \circ \Psi(x_0) + w = \Phi x_0 + w
with f_0 = \Psi x_0 \in \mathbb{R}^N sparse in the ortho-basis \Psi, x_0 \in \mathbb{R}^N.

K drawn from the Gaussian matrix ensemble, K_{i,j} \sim N(0, P^{-1/2}) i.i.d.
\Rightarrow \Phi = K\Psi also drawn from the Gaussian matrix ensemble.

Sparse recovery:
    \min_{\|\Phi x - y\| \le \|w\|} \|x\|_1

CS Simulation Example

Original f_0; \Psi = translation-invariant wavelet frame.
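The acquisition and recovery chain above can be sketched numerically. A minimal simulation (assuming SciPy is available, and taking \Psi = Id so that x_0 is sparse in the canonical basis rather than a wavelet frame): the noiseless constrained \ell^1 recovery is recast as a linear program with x = u - v, u, v \ge 0.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
N, P, k = 128, 64, 8              # ambient dimension, measurements, sparsity

# k-sparse signal x0 (Psi = Id here; the slides use a wavelet frame)
x0 = np.zeros(N)
x0[rng.choice(N, k, replace=False)] = rng.standard_normal(k)

# Gaussian measurement ensemble, Phi_ij ~ N(0, P^{-1/2}) i.i.d.
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
y = Phi @ x0                      # noiseless measurements (w = 0)

# Sparse recovery min ||x||_1 s.t. Phi x = y, written as an LP with x = u - v
res = linprog(c=np.ones(2 * N), A_eq=np.hstack([Phi, -Phi]), b_eq=y,
              bounds=(0, None), method="highs")
x_rec = res.x[:N] - res.x[N:]
err = np.linalg.norm(x_rec - x0) / np.linalg.norm(x0)
```

With k/P well below the phase-transition threshold, recovery is exact up to solver precision.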
Overview
•Compressive Sensing Acquisition
•Theoretical Guarantees
•Fourier Domain Measurements
•Parameter Selection
CS with RIP

Restricted isometry constants:
    \forall x, \|x\|_0 \le k: \quad (1 - \delta_k)\|x\|^2 \le \|\Phi x\|^2 \le (1 + \delta_k)\|x\|^2

\ell^1 recovery:
    x^\star \in \argmin_{\|\Phi x - y\| \le \varepsilon} \|x\|_1, \quad where \quad y = \Phi x_0 + w, \ \|w\| \le \varepsilon

Theorem [Candès 2009]: If \delta_{2k} \le \sqrt{2} - 1, then
    \|x_0 - x^\star\| \le \frac{C_0}{\sqrt{k}} \|x_0 - x_k\|_1 + C_1 \varepsilon,
where x_k is the best k-term approximation of x_0.
Singular Values Distributions

Eigenvalues of \Phi_I^* \Phi_I with |I| = k are essentially in [a, b], where
    a = (1 - \sqrt{\beta})^2 and b = (1 + \sqrt{\beta})^2, with \beta = k/P.

When k = \beta P \to +\infty, the eigenvalue distribution tends to [Marcenko-Pastur]
    f_\beta(\lambda) = \frac{1}{2\pi\beta\lambda} \sqrt{(b - \lambda)^+ (\lambda - a)^+}

Large deviation inequality [Ledoux].

[Figure: empirical eigenvalue histograms vs. f_\beta(\lambda), for P = 200 and k = 10, 30, 50.]
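A quick numerical check of the Marcenko-Pastur prediction (a sketch: the number of random submatrices and the ±0.2 edge margin are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(1)
P, k = 200, 30
beta = k / P
a = (1 - np.sqrt(beta)) ** 2        # lower Marcenko-Pastur edge
b = (1 + np.sqrt(beta)) ** 2        # upper Marcenko-Pastur edge

# Sample eigenvalues of Phi_I^* Phi_I over random Gaussian submatrices
eigs = []
for _ in range(200):
    PhiI = rng.standard_normal((P, k)) / np.sqrt(P)
    eigs.extend(np.linalg.eigvalsh(PhiI.T @ PhiI))
eigs = np.array(eigs)

# Eigenvalues are "essentially" in [a, b] (up to edge fluctuations)
frac_inside = np.mean((eigs > a - 0.2) & (eigs < b + 0.2))
```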
Theorem: If k \le \frac{C}{\log(N/P)} P, then \delta_{2k} \le \sqrt{2} - 1 with high probability.
Numerics with RIP

Stability constants of A (smallest/largest eigenvalues of A^*A):
    (1 - \delta_1(A))\|x\|^2 \le \|Ax\|^2 \le (1 + \delta_2(A))\|x\|^2

Upper/lower RIC:
    \delta_k^i = \max_{|I| = k} \delta_i(\Phi_I), \qquad \delta_k = \min(\delta_k^1, \delta_k^2)

Monte-Carlo estimation: \hat\delta_k \le \delta_k (here N = 4000, P = 1000).

[Plot: estimated \delta_{2k} as a function of k, compared with the \sqrt{2} - 1 threshold.]
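The Monte-Carlo estimation can be sketched as follows; since only a random subset of the \binom{N}{k} supports is explored, the result is a lower bound \hat\delta_k \le \delta_k (dimensions here are smaller than the N = 4000, P = 1000 of the slide):

```python
import numpy as np

rng = np.random.default_rng(2)
N, P, k = 400, 100, 10
Phi = rng.standard_normal((P, N)) / np.sqrt(P)

# Lower bound on delta_k: largest eigenvalue deviation from 1 over sampled supports
dev = 0.0
for _ in range(500):
    I = rng.choice(N, k, replace=False)
    e = np.linalg.eigvalsh(Phi[:, I].T @ Phi[:, I])
    dev = max(dev, 1 - e.min(), e.max() - 1)
# dev <= delta_k: only a random subset of the C(N, k) supports is explored
```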
Polytope Noiseless Recovery

Counting faces of random polytopes [Donoho]:
    All x_0 such that \|x_0\|_0 \le C_{all}(P/N)\,P are identifiable.
    Most x_0 such that \|x_0\|_0 \le C_{most}(P/N)\,P are identifiable.
    C_{all}(1/4) \approx 0.065, \qquad C_{most}(1/4) \approx 0.25

Compared with RIP: sharp constants, but no noise robustness.
→ Computation of "pathological" signals [Dossal, Peyré, Fadili, 2010].

[Plot: empirical probability of identifiability as a function of sparsity, "all" and "most" thresholds vs. the RIP bound.]
Overview
•Compressive Sensing Acquisition
•Theoretical Guarantees
•Fourier Domain Measurements
•Parameter Selection
Tomography and Fourier Measures

Partial Fourier measurements:
    Kf = (\hat f[\omega])_{\omega \in \Omega}, \quad \hat f = FFT2(f)

Fourier slice theorem (1D \leftrightarrow 2D Fourier):
    \hat p_\theta(\rho) = \hat f(\rho\cos\theta, \rho\sin\theta)

Equivalent to measuring the projections \{p_{\theta_k}(t)\}_{t \in \mathbb{R},\ 0 \le k < K}.
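A minimal sketch of the partial Fourier measurement operator and its zero-filling adjoint (using NumPy's unitary FFT normalization; \Omega drawn uniformly at random):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 64                                   # image is n x n
f = rng.standard_normal((n, n))

P = n * n // 4                           # keep a quarter of the frequencies
Omega = rng.choice(n * n, P, replace=False)

# K f = (f_hat[omega])_{omega in Omega}, with f_hat = FFT2(f)
fhat = np.fft.fft2(f, norm="ortho")
y = fhat.ravel()[Omega]

# Zero-filled adjoint K^* y: with the unitary FFT, K K^* = Id on the data
zhat = np.zeros(n * n, dtype=complex)
zhat[Omega] = y
f_adj = np.fft.ifft2(zhat.reshape(n, n), norm="ortho")   # complex in general
```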
Regularized Inversion

Noisy measurements: \forall \omega \in \Omega, \ y[\omega] = \hat f_0[\omega] + w[\omega].
Noise: w[\omega] \sim N(0, \sigma), white noise.

\ell^1 regularization:
    f^\star = \argmin_f \frac{1}{2} \sum_{\omega \in \Omega} |y[\omega] - \hat f[\omega]|^2 + \lambda \sum_m |\langle f, \psi_m \rangle|.
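A toy instance of this \ell^1 regularized inversion, solved by iterative soft thresholding (ISTA). Two simplifying assumptions, not in the slides: the sparsity basis \{\psi_m\} is taken to be the Diracs (pixel basis) and the data are noiseless.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 32
f0 = np.zeros((n, n))
f0.ravel()[rng.choice(n * n, 15, replace=False)] = 1.0   # sparse image (psi_m = Diracs)

mask = rng.random((n, n)) < 0.4          # observed frequency set Omega (~40%)
y = np.fft.fft2(f0, norm="ortho")[mask]  # y[omega] = f0_hat[omega] (noiseless, w = 0)

lam = 0.02
f = np.zeros((n, n))
for _ in range(300):                     # ISTA: gradient step on the data term...
    r = np.zeros((n, n), dtype=complex)
    r[mask] = np.fft.fft2(f, norm="ortho")[mask] - y
    f = f - np.real(np.fft.ifft2(r, norm="ortho"))       # masked unitary FFT is 1-Lipschitz
    f = np.sign(f) * np.maximum(np.abs(f) - lam, 0)      # ...then soft thresholding

err = np.linalg.norm(f - f0) / np.linalg.norm(f0)
```

The residual error reflects the soft-thresholding bias \propto \lambda, not a failure of the measurement scheme.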
MRI Imaging (from [Lustig et al.])

Fourier sub-sampling pattern: randomization.

MRI Reconstruction (from [Lustig et al.])

[Images: low resolution / high resolution / linear reconstruction / sparsity reconstruction.]
Structured Measurements

Gaussian matrices: intractable for large N.
Random partial orthogonal matrix: \{\varphi_\omega\}_\omega orthogonal basis.
Fast measurements (e.g. Fourier basis):
    Kf = (\langle \varphi_\omega, f \rangle)_{\omega \in \Omega}, where |\Omega| = P is drawn uniformly at random.

→ \Phi not universal: requires incoherence.
Mutual incoherence: \mu = \sqrt{N} \max_{\omega, m} |\langle \varphi_\omega, \psi_m \rangle| \in [1, \sqrt{N}]

Theorem [Rudelson, Vershynin, 2006]: with high probability on \Omega, for \Phi = K\Psi,
    if k \le \frac{C\,P}{\mu^2 \log(N)^4}, then \delta_{2k} \le \sqrt{2} - 1.
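The mutual incoherence \mu can be computed directly for the two extreme cases: Fourier measurements vs. the Dirac basis (\mu = 1, maximally incoherent) and a basis against itself (\mu = \sqrt{N}, maximally coherent). A small check:

```python
import numpy as np

N = 64
F = np.fft.fft(np.eye(N), norm="ortho")          # unitary Fourier basis (rows)

# mu = sqrt(N) max |<phi_w, psi_m>| with Psi = Id (Diracs): every entry has
# modulus 1/sqrt(N), so mu = 1 (maximally incoherent)
mu_dirac = np.sqrt(N) * np.max(np.abs(F))

# Psi = Phi (Fourier against itself): Gram matrix is Id, so mu = sqrt(N)
mu_self = np.sqrt(N) * np.max(np.abs(F @ F.conj().T))
```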
Overview
•Compressive Sensing Acquisition
•Theoretical Guarantees
•Fourier Domain Measurements
•Parameter Selection
Risk Minimization

Estimator: e.g. the \ell^1 estimator
    x_\lambda(y) \in \argmin_x \frac{1}{2}\|y - \Phi x\|^2 + \lambda \|x\|_1

Average risk: R(\lambda) = E_w(\|x_\lambda(y) - x_0\|^2).
Plug-in estimator: x_{\lambda^\star(y)}(y), where \lambda^\star(y) = \argmin_\lambda R(\lambda).

But: x_0 is not accessible → needs risk estimators;
     E_w is not accessible → use one observation.
Unbiased Risk Estimation for Sparse Analysis Regularization
Charles Deledalle¹, Samuel Vaiter¹, Gabriel Peyré¹, Jalal Fadili³ and Charles Dossal²
¹CEREMADE, Université Paris-Dauphine — ²IMB, Université Bordeaux 1 — ³GREYC, ENSICAEN

Problem statement

Consider the convex but non-smooth analysis sparsity regularization problem
    x^\star(y,\lambda) \in \argmin_{x \in \mathbb{R}^N} \frac{1}{2}\|y - \Phi x\|^2 + \lambda\|D^* x\|_1        (P_\lambda(y))
which aims at inverting
    y = \Phi x_0 + w
by promoting sparsity, where
  • x_0 \in \mathbb{R}^N is the unknown image of interest,
  • y \in \mathbb{R}^Q is the low-dimensional noisy observation of x_0,
  • \Phi \in \mathbb{R}^{Q \times N} is a linear operator that models the acquisition process,
  • w \sim N(0, \sigma^2 \mathrm{Id}_Q) is the noise component,
  • D \in \mathbb{R}^{N \times P} is an analysis dictionary, and
  • \lambda > 0 is a regularization parameter.
How to choose the value of the parameter \lambda?

Risk-based selection of \lambda

  • Risk associated to \lambda: measure of the expected quality of x^\star(y,\lambda) w.r.t. x_0,
        R(\lambda) = E_w \|x^\star(y,\lambda) - x_0\|^2.
  • The optimal (theoretical) \lambda minimizes the risk.
The risk is unknown since it depends on x_0. Can we estimate the risk solely from x^\star(y,\lambda)?

Risk estimation

  • Assume y \mapsto \Phi x^\star(y,\lambda) is weakly differentiable (a fortiori uniquely defined).

Prediction risk estimation via SURE
  • The Stein Unbiased Risk Estimator (SURE)
        \mathrm{SURE}(y,\lambda) = \|y - \Phi x^\star(y,\lambda)\|^2 - \sigma^2 Q + 2\sigma^2 \,\mathrm{tr}\Big(\frac{\partial \Phi x^\star(y,\lambda)}{\partial y}\Big)
    (the trace term is an estimator of the DOF) is an unbiased estimator of the prediction risk [Stein, 1981]:
        E_w(\mathrm{SURE}(y,\lambda)) = E_w(\|\Phi x_0 - \Phi x^\star(y,\lambda)\|^2).

Projection risk estimation via GSURE
  • Let \Pi = \Phi^*(\Phi\Phi^*)^+\Phi be the orthogonal projector on \ker(\Phi)^\perp = \mathrm{Im}(\Phi^*),
  • Denote x_{ML}(y) = \Phi^*(\Phi\Phi^*)^+ y,
  • The Generalized Stein Unbiased Risk Estimator (GSURE)
        \mathrm{GSURE}(y,\lambda) = \|x_{ML}(y) - \Pi x^\star(y,\lambda)\|^2 - \sigma^2 \mathrm{tr}((\Phi\Phi^*)^+) + 2\sigma^2 \,\mathrm{tr}\Big((\Phi\Phi^*)^+ \frac{\partial \Phi x^\star(y,\lambda)}{\partial y}\Big)
    is an unbiased estimator of the projection risk [Vaiter et al., 2012]:
        E_w(\mathrm{GSURE}(y,\lambda)) = E_w(\|\Pi x_0 - \Pi x^\star(y,\lambda)\|^2)
    (see also [Eldar, 2009; Pesquet et al., 2009; Vonesch et al., 2008] for similar results).

Illustration of risk estimation
(here, x^\star denotes x^\star(y,\lambda) for an arbitrary value of \lambda)
How to estimate the quantity \mathrm{tr}\big((\Phi\Phi^*)^+ \partial \Phi x^\star(y,\lambda)/\partial y\big)?

Main notations and assumptions
  • Let I = \mathrm{supp}(D^* x^\star(y,\lambda)) be the support of D^* x^\star(y,\lambda),
  • Let J = I^c be the co-support of D^* x^\star(y,\lambda),
  • Let D_I be the submatrix of D whose columns are indexed by I,
  • Let s_I = \mathrm{sign}(D^* x^\star(y,\lambda))_I be the subvector of D^* x^\star(y,\lambda) whose entries are indexed by I,
  • Let G_J = \ker D_J^* be the "cospace" associated to x^\star(y,\lambda),
  • To study the local behaviour of x^\star(y,\lambda), we impose \Phi to be "invertible" on G_J:
        G_J \cap \ker\Phi = \{0\},
  • This allows us to define the matrix
        A^{[J]} = U(U^* \Phi^* \Phi U)^{-1} U^*,
    where U is a matrix whose columns form a basis of G_J,
  • In this case, we obtain an implicit equation:
        x^\star(y,\lambda) solution of P_\lambda(y) \iff x^\star(y,\lambda) = \hat x(y,\lambda) = A^{[J]}\Phi^* y - \lambda A^{[J]} D_I s_I.
Is this relation true in a neighbourhood of (y,\lambda)?

Theorem (Local Parameterization)
  • Even if the solutions x^\star(y,\lambda) of P_\lambda(y) might not be unique, \Phi x^\star(y,\lambda) is uniquely defined.
  • If (y,\lambda) \notin H, then for (\bar y, \bar\lambda) close to (y,\lambda), \hat x(\bar y,\bar\lambda) is a solution of P_{\bar\lambda}(\bar y), where
        \hat x(\bar y,\bar\lambda) = A^{[J]}\Phi^* \bar y - \bar\lambda A^{[J]} D_I s_I.
  • Hence, this allows writing
        \frac{\partial \Phi x^\star(y,\lambda)}{\partial y} = \Phi A^{[J]} \Phi^*,
  • Moreover, the DOF can be estimated by
        \mathrm{tr}\Big(\frac{\partial \Phi x^\star(y,\lambda)}{\partial y}\Big) = \dim(G_J).
Can we compute this quantity efficiently?

[Figure: \ell^1 regularization path \lambda \mapsto x_\lambda in the (x_1, x_2) plane, from x_{\lambda_0} with \lambda_0 = 0 (a solution of P_0(y)) to x_{\lambda_k} = 0.]

Computation of GSURE
  • One has, for Z \sim N(0, \mathrm{Id}_P),
        \mathrm{tr}\Big((\Phi\Phi^*)^+ \frac{\partial \Phi x^\star(y,\lambda)}{\partial y}\Big) = E_Z(\langle \nu(Z), \Phi^*(\Phi\Phi^*)^+ Z \rangle)
    where, for any z \in \mathbb{R}^P, \nu = \nu(z) solves the linear system
        [ \Phi^*\Phi   D_J ] [ \nu       ]   [ \Phi^* z ]
        [ D_J^*        0   ] [ \tilde\nu ] = [    0     ].
  • In practice, by the law of large numbers, the expectation is replaced by an empirical mean.
  • The computation of \nu(z) is achieved by solving the linear system with a conjugate gradient solver.

Numerical example

Super-resolution using (anisotropic) total variation: (a) y; (b) x^\star(y,\lambda) at the optimal \lambda.
[Plot: quadratic loss (\times 10^6) vs. regularization parameter \lambda — projection risk, GSURE, true risk.]

Compressed sensing using multi-scale wavelet thresholding: (c) x_{ML}; (d) x^\star(y,\lambda) at the optimal \lambda.
[Plot: quadratic loss (\times 10^6) vs. regularization parameter \lambda — projection risk, GSURE, true risk.]

Perspectives: how to efficiently minimize \mathrm{GSURE}(y,\lambda) w.r.t. \lambda?

References

Eldar, Y. C. (2009). Generalized SURE for exponential families: Applications to regularization. IEEE Transactions on Signal Processing, 57(2):471–481.
Pesquet, J.-C., Benazza-Benyahia, A., and Chaux, C. (2009). A SURE approach for digital signal/image deconvolution problems. IEEE Transactions on Signal Processing, 57(12):4616–4632.
Stein, C. (1981). Estimation of the mean of a multivariate normal distribution. The Annals of Statistics, 9(6):1135–1151.
Vaiter, S., Deledalle, C., Peyré, G., Dossal, C., and Fadili, J. (2012). Local behavior of sparse analysis regularization: Applications to risk estimation. arXiv preprint arXiv:1204.3212.
Vonesch, C., Ramani, S., and Unser, M. (2008). Recursive risk estimation for non-linear image deconvolution with a wavelet-domain sparsity constraint. In ICIP, pages 665–668. IEEE.

http://www.ceremade.dauphine.fr/~deledall/ — [email protected]
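The key computational trick in the poster — replacing E_Z by an empirical mean to estimate a trace — is a Hutchinson-type estimator. A generic sketch, with a random symmetric matrix M standing in for the (inaccessible) Jacobian-type operator:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50
A = rng.standard_normal((n, n))
M = A @ A.T                     # stand-in for the matrix whose trace we want

# tr(M) = E_Z <Z, M Z> for Z ~ N(0, Id): replace the expectation by an
# empirical mean over random probe vectors (Hutchinson estimator)
Z = rng.standard_normal((n, 2000))
tr_est = np.mean(np.einsum("ij,ij->j", Z, M @ Z))
rel_err = abs(tr_est - np.trace(M)) / abs(np.trace(M))
```

In the poster's setting each product M Z only requires solving the stated linear system, e.g. by conjugate gradient, so M never needs to be formed.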
Prediction Risk Estimation

Prediction: \mu_\lambda(y) = \Phi x_\lambda(y)

Sensitivity analysis: if \mu_\lambda is weakly differentiable,
    \mu_\lambda(y + \delta) = \mu_\lambda(y) + \partial\mu_\lambda(y)\cdot\delta + O(\|\delta\|^2)
    \mathrm{df}_\lambda(y) = \mathrm{tr}(\partial\mu_\lambda(y)) = \mathrm{div}(\mu_\lambda)(y)

Stein Unbiased Risk Estimator:
    \mathrm{SURE}_\lambda(y) = \|y - \mu_\lambda(y)\|^2 - \sigma^2 P + 2\sigma^2 \mathrm{df}_\lambda(y)

Theorem [Stein, 1981]:
    E_w(\mathrm{SURE}_\lambda(y)) = E_w(\|\Phi x_0 - \mu_\lambda(y)\|^2)

Other estimators: GCV, BIC, AIC, ...
Generalized SURE: estimate E_w(\|P_{\ker(\Phi)^\perp}(x_0 - x_\lambda(y))\|^2).
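SURE is easiest to check in the denoising case \Phi = Id, where x_\lambda(y) is soft thresholding and \mathrm{df}_\lambda(y) is the number of surviving entries. A sketch (\sigma, \lambda, and the sparse ground truth are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(6)
P, sigma, lam = 100_000, 1.0, 1.5
x0 = np.where(rng.random(P) < 0.1, 5.0, 0.0)        # sparse ground truth
y = x0 + sigma * rng.standard_normal(P)             # y = x0 + w, Phi = Id

mu = np.sign(y) * np.maximum(np.abs(y) - lam, 0)    # x_lam(y): soft thresholding
df = np.count_nonzero(mu)                           # div(mu)(y) = number of kept entries
sure = np.sum((y - mu) ** 2) - sigma**2 * P + 2 * sigma**2 * df
risk = np.sum((mu - x0) ** 2)                       # needs x0: unavailable in practice
```

On a single draw, SURE already tracks the true squared error to within a few percent; unbiasedness holds in expectation over w.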
Computation for L1 Regularization

Sparse estimator:
    x_\lambda(y) \in \argmin_x \frac{1}{2}\|y - \Phi x\|^2 + \lambda\|x\|_1

Theorem [Dossal et al. 2011]: for all y, there exists a solution x^\star such that \Phi_I is injective, and
    \mathrm{df}_\lambda(y) = \mathrm{div}(\Phi x_\lambda)(y) = \|x^\star\|_0.

Example: \Psi = translation-invariant wavelets. [Images: \Phi^+ y vs. the sparse reconstruction.]
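The identity \mathrm{df}_\lambda(y) = \|x^\star\|_0 can be checked numerically: locally, \mu_\lambda(y) = \Phi_I(\Phi_I^T\Phi_I)^{-1}(\Phi_I^T y - \lambda s_I) is affine in y, and its Jacobian is the orthogonal projector on \mathrm{Im}(\Phi_I), whose trace is |I| when \Phi_I is injective. A sketch (the ISTA iteration count and support-detection threshold are ad hoc):

```python
import numpy as np

rng = np.random.default_rng(7)
N, P, lam = 40, 20, 0.2
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
x_true = np.zeros(N)
x_true[:4] = [1.0, -1.0, 1.0, 1.0]                   # sparse ground truth
y = Phi @ x_true + 0.05 * rng.standard_normal(P)

# Solve the lasso with ISTA, then read off the support I of x*
L = np.linalg.norm(Phi, 2) ** 2
x = np.zeros(N)
for _ in range(20000):
    g = x - Phi.T @ (Phi @ x - y) / L
    x = np.sign(g) * np.maximum(np.abs(g) - lam / L, 0)
I = np.abs(x) > 1e-8

# Jacobian of y -> Phi x*(y): the orthogonal projector on Im(Phi_I),
# so its trace equals |I| = ||x*||_0 when Phi_I is injective
J = Phi[:, I] @ np.linalg.pinv(Phi[:, I])
```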
Unbiased Risk Estimation for Sparse Analysis RegularizationCharles Deledalle1, Samuel Vaiter1, Gabriel Peyre1, Jalal Fadili3 and Charles Dossal2
1CEREMADE, Universite Paris–Dauphine — 2GREY’C, ENSICAEN — 3IMB, Universite Bordeaux I
Problem statement
Consider the convex but non-smooth Analysis Sparsity Regularization problem
x
?(y,�) 2 argminx2RN
1
2||y � �x||2 + �||D⇤
x||1 (P�
(y))
which aims at inverting
y = �x0 + w
by promoting sparsity and with
Ix0 2 RN the unknown image of interest,
Iy 2 RQ the low-dimensional noisy observation of x0,
I � 2 RQ⇥N a linear operator that models the acquisition process,
Iw ⇠ N (0, �2Id
Q
) the noise component,
ID 2 RN⇥P an analysis dictionary, and
I� > 0 a regularization parameter.
How to choose the value of the parameter �?
Risk-based selection of �
I Risk associated to �: measure of the expected quality of x?(y,�) wrt x0,
R(�) = Ew
||x?(y,�) � x0||2 .I The optimal (theoretical) � minimizes the risk.
The risk is unknown since it depends on x0.
Can we estimate the risk solely from x
?(y,�)?
Risk estimation
I Assume y 7! �x?(y,�) is weakly di↵erentiable (a fortiori uniquely defined).
Prediction risk estimation via SURE
I The Stein Unbiased Risk Estimator (SURE):
SURE(y,�) =||y � �x?(y,�)||2 � �
2Q + 2�2 tr
✓@�x?(y,�)
@y
◆
| {z }Estimator of the DOF
is an unbiased estimator of the prediction risk [Stein, 1981]:
Ew
(SURE(y,�)) = Ew
(||�x0 � �x?(y,�)||2) .
Projection risk estimation via GSURE
I Let ⇧ = �⇤(��⇤)+� be the orthogonal projector on ker(�)? = Im(�⇤),I Denote xML(y) = �⇤(��⇤)+y,I The Generalized Stein Unbiased Risk Estimator (GSURE):
GSURE(y,�) =||xML(y) � ⇧x?(y,�)||2 � �
2 tr((��⇤)+) + 2�2 tr
✓(��⇤)+@�x
?(y,�)
@y
◆
is an unbiased estimator of the projection risk [Vaiter et al., 2012]
Ew
(GSURE(y,�)) = Ew
(||⇧x0 � ⇧x?(y,�)||2)(see also [Eldar, 2009, Pesquet et al., 2009, Vonesch et al., 2008] for similar results).
Illustration of risk estimation
(here, x? denotes x?(y,�) for an arbitrary value of �)
How to estimate the quantity tr⇣(��⇤)+@x
?(y,�)@y
⌘?
Main notations and assumptions
I Let I = supp(D⇤x
?(y,�)) be the support of D⇤x
?(y,�),I Let J = I
c be the co-support of D⇤x
?(y,�),I Let D
I
be the submatrix of D whose columns are indexed by I ,
I Let sI
= sign(D⇤x
?(y,�))I
be the subvector of D⇤x
?(y,�) whose entries are indexed by I ,
I Let GJ
= KerD⇤J
be the “cospace” associated to x
?(y,�) ,I To study the local behaviour of x?(y,�), we impose � to be “invertible” on G
J
:
GJ
\ Ker� = {0},I It allows us to define the matrix
A
[J ] = U(U⇤�⇤�U)�1U
⇤,
where U is a matrix whose columns form a basis of GJ
,
I In this case, we obtain an implicit equation:
x
?(y,�) solution of P�
(y) , x
?(y,�) = x(y,�) , A
[J ]�⇤y � �A
[J ]D
I
s
I
.
Is this relation true in a neighbourhood of (y,�)?
Theorem (Local Parameterization)
▶ Even if the solutions x⋆(y,λ) of P_λ(y) might not be unique, Φx⋆(y,λ) is uniquely defined.
▶ If (y,λ) ∉ H, then for (ȳ, λ̄) close to (y,λ), x̂(ȳ, λ̄) is a solution of P_λ̄(ȳ), where

  x̂(ȳ, λ̄) = A[J]Φ*ȳ − λ̄ A[J] D_I s_I.

▶ Hence we can write

  ∂Φx⋆(y,λ) / ∂y = Φ A[J] Φ*,

▶ Moreover, the DOF can be estimated by

  tr( ∂Φx⋆(y,λ) / ∂y ) = dim(G_J).

Can we compute this quantity efficiently?
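The identity tr(∂Φx⋆(y,λ)/∂y) = dim(G_J) can be checked on a toy 1-D total-variation example, where G_J is spanned by the indicators of the constant pieces of the solution. The following sketch (the solution vector is hypothetical, standing in for x⋆(y,λ); this is not the poster's code) extracts I, J, s_I and dim(G_J):

```python
import numpy as np

N = 8
# 1-D TV analysis dictionary: D^* computes finite differences x[i+1] - x[i].
Dstar = np.diff(np.eye(N), axis=0)               # (N-1) x N
xs = np.array([2., 2., 2., 5., 5., 1., 1., 1.])  # stands in for x*(y, lam)

corr = Dstar @ xs
I = np.flatnonzero(np.abs(corr) > 1e-10)         # support of D^* x*
J = np.setdiff1d(np.arange(N - 1), I)            # co-support
s_I = np.sign(corr[I])                           # sign pattern on the support

# G_J = Ker D_J^*; its dimension is N - rank(D_J^*).
dim_GJ = N - np.linalg.matrix_rank(Dstar[J, :])
print(I, s_I, dim_GJ)  # jumps at positions 2 and 4; dim(G_J) = 3 constant pieces
```

For TV, dim(G_J) is simply the number of constant pieces of x⋆(y,λ), which is exactly the intuitive "degrees of freedom" of a piecewise-constant signal.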
(Figure: the path λ ↦ x_λ of solutions in coordinates (x1, x2), from x_{λ0} solving P_0(y) at λ0 = 0 down to x_{λk} = 0 for λk large.)
Computation of GSURE
▶ One has, for Z ∼ N(0, Id_Q),

  tr( (ΦΦ*)⁺ ∂Φx⋆(y,λ) / ∂y ) = E_Z( ⟨ν(Z), Φ*(ΦΦ*)⁺Z⟩ ),

where, for any z ∈ R^Q, ν = ν(z) solves the following linear system:

  [ Φ*Φ   D_J ] [ ν ]   [ Φ*z ]
  [ D_J*   0  ] [ ν̃ ] = [  0  ].

▶ In practice, by the law of large numbers, the expectation is replaced by an empirical mean over a few realizations of Z.
▶ The computation of ν(z) is achieved by solving this linear system with a conjugate gradient solver.
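A small dense sketch of this Monte Carlo estimator (Φ and the co-support J below are hypothetical; a direct solve replaces the conjugate gradient at this toy size, and the exact trace is recovered from a basis U of G_J for comparison):

```python
import numpy as np

rng = np.random.default_rng(1)
N, Q = 12, 8
Phi = rng.standard_normal((Q, N)) / np.sqrt(Q)   # hypothetical sensing operator
Dstar = np.diff(np.eye(N), axis=0)               # 1-D finite-difference analysis operator
pinv = np.linalg.pinv(Phi @ Phi.T)               # (Phi Phi^*)^+

J = np.array([0, 1, 2, 4, 5, 7, 8, 9, 10])       # hypothetical co-support of D^* x*(y, lam)
DJ = Dstar[J, :].T                                # D_J: columns of D indexed by J
k = DJ.shape[1]

def nu(z):
    # Solve [Phi^*Phi  D_J; D_J^*  0] [nu; nu_tilde] = [Phi^* z; 0]
    # (direct solve here; conjugate gradient in the large-scale setting).
    A = np.block([[Phi.T @ Phi, DJ], [DJ.T, np.zeros((k, k))]])
    rhs = np.concatenate([Phi.T @ z, np.zeros(k)])
    return np.linalg.solve(A, rhs)[:N]

# Monte Carlo estimate of tr((Phi Phi^*)^+ d(Phi x*)/dy) = E_Z <nu(Z), Phi^*(Phi Phi^*)^+ Z>.
vals = []
for _ in range(2000):
    z = rng.standard_normal(Q)
    vals.append(nu(z) @ (Phi.T @ (pinv @ z)))
est = np.mean(vals)

# Exact value for comparison, via an orthonormal basis U of G_J = Ker D_J^*.
U = np.linalg.svd(DJ.T)[2][k:, :].T
AJ = U @ np.linalg.inv(U.T @ Phi.T @ Phi @ U) @ U.T
exact = np.trace(pinv @ Phi @ AJ @ Phi.T)
print(est, exact)  # the empirical mean approaches the exact trace
# Consistently with the theorem, tr(Phi A[J] Phi^*) = dim(G_J) = N - k.
```

Note that ν(z) = A[J]Φ*z, so each inner product ⟨ν(Z), Φ*(ΦΦ*)⁺Z⟩ is the quadratic form Z^T(ΦΦ*)⁺ΦA[J]Φ*Z, whose expectation is the desired trace.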
Numerical example

Super-resolution using (anisotropic) Total-Variation
(a) y   (b) x⋆(y,λ) at the optimal λ
(Plot: quadratic loss as a function of the regularization parameter λ, comparing Projection Risk, GSURE and True Risk.)

Compressed sensing using multi-scale wavelet thresholding
(c) x_ML   (d) x⋆(y,λ) at the optimal λ
(Plot: quadratic loss as a function of the regularization parameter λ, comparing Projection Risk, GSURE and True Risk.)
Perspectives: how to efficiently minimize GSURE(y,λ) with respect to λ?
References

Eldar, Y. C. (2009). Generalized SURE for exponential families: Applications to regularization. IEEE Transactions on Signal Processing, 57(2):471–481.

Pesquet, J.-C., Benazza-Benyahia, A., and Chaux, C. (2009). A SURE approach for digital signal/image deconvolution problems. IEEE Transactions on Signal Processing, 57(12):4616–4632.

Stein, C. (1981). Estimation of the mean of a multivariate normal distribution. The Annals of Statistics, 9(6):1135–1151.

Vaiter, S., Deledalle, C., Peyré, G., Dossal, C., and Fadili, J. (2012). Local behavior of sparse analysis regularization: Applications to risk estimation. arXiv preprint arXiv:1204.3212.

Vonesch, C., Ramani, S., and Unser, M. (2008). Recursive risk estimation for non-linear image deconvolution with a wavelet-domain sparsity constraint. In ICIP, pages 665–668. IEEE.

http://www.ceremade.dauphine.fr/~deledall/  [email protected]
Unbiased Risk Estimation for Sparse Analysis Regularization
Charles Deledalle¹, Samuel Vaiter¹, Gabriel Peyré¹, Jalal Fadili² and Charles Dossal³
¹CEREMADE, Université Paris-Dauphine — ²GREYC, ENSICAEN — ³IMB, Université Bordeaux I

Problem statement

Consider the convex but non-smooth analysis sparsity regularization problem

  x⋆(y,λ) ∈ argmin_{x ∈ R^N}  (1/2)‖y − Φx‖² + λ‖D*x‖₁        (P_λ(y))

which aims at inverting

  y = Φx0 + w

by promoting sparsity, where
▶ x0 ∈ R^N is the unknown image of interest,
▶ y ∈ R^Q is the low-dimensional noisy observation of x0,
▶ Φ ∈ R^{Q×N} is a linear operator that models the acquisition process,
▶ w ∼ N(0, σ² Id_Q) is the noise component,
▶ D ∈ R^{N×P} is an analysis dictionary, and
▶ λ > 0 is a regularization parameter.

How to choose the value of the parameter λ?

Risk-based selection of λ
▶ Risk associated to λ: a measure of the expected quality of x⋆(y,λ) with respect to x0,

  R(λ) = E_w ‖x⋆(y,λ) − x0‖².

▶ The optimal (theoretical) λ minimizes the risk.
The risk is unknown since it depends on x0.
Can we estimate the risk solely from x⋆(y,λ)?
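The poster does not specify how P_λ(y) is solved; one standard choice for this convex non-smooth analysis problem is a primal-dual scheme of Condat-Vũ / Chambolle-Pock type, which only needs applications of Φ, D* and their adjoints. A dense illustrative sketch (the solver choice and step sizes are mine, not the authors'):

```python
import numpy as np

def solve_analysis_l1(y, Phi, Dstar, lam, n_iter=3000):
    # Primal-dual iterations for min_x 0.5*||y - Phi x||^2 + lam*||Dstar x||_1,
    # i.e. the problem P_lam(y); illustrative, not the authors' implementation.
    N = Phi.shape[1]
    L = np.linalg.norm(Phi.T @ Phi, 2)     # Lipschitz constant of the data-fit gradient
    nD = np.linalg.norm(Dstar, 2)
    sig = 1.0 / nD
    tau = 0.9 / (L / 2 + sig * nD**2)      # satisfies the Condat-Vu step-size condition
    x = np.zeros(N)
    u = np.zeros(Dstar.shape[0])           # dual variable associated with Dstar x
    for _ in range(n_iter):
        x_new = x - tau * (Phi.T @ (Phi @ x - y) + Dstar.T @ u)
        # prox of the conjugate of lam*||.||_1 = projection on the l_inf ball of radius lam
        u = np.clip(u + sig * Dstar @ (2 * x_new - x), -lam, lam)
        x = x_new
    return x
```

With Φ = D = Id this reduces to soft thresholding, which gives a quick correctness check; for the poster's examples Φ would be a sub-sampling or sensing operator and D* a gradient or wavelet analysis operator.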
(Plot: quadratic loss as a function of λ, with the optimal λ⋆ marked.)
Φ ∈ R^{P×N} is a realization of a random vector; P = N/4.
(Figure: observations y, anisotropic Total-Variation example.)
Extension to ℓ¹ analysis, TV. [Vaiter et al. 2012]
Φ: vertical sub-sampling.
Finite-differences gradient dictionary: D = [∂₁, ∂₂].
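For intuition, this anisotropic-TV dictionary D = [∂₁, ∂₂] on an n×n image can be assembled densely from 1-D difference matrices via Kronecker products (a small sketch for illustration; real implementations apply D* matrix-free):

```python
import numpy as np

def grad_dict_star(n):
    # D^* for D = [d1, d2]: horizontal and vertical finite differences,
    # acting on a row-major vectorized n x n image.
    d = np.diff(np.eye(n), axis=0)      # (n-1) x n 1-D difference matrix
    Dx = np.kron(np.eye(n), d)          # differences within each row
    Dy = np.kron(d, np.eye(n))          # differences across rows
    return np.vstack([Dx, Dy])          # shape (2n(n-1), n^2)

Dstar = grad_dict_star(4)
u = np.tile([0.0, 0.0, 1.0, 1.0], 4)    # image with one vertical edge per row
print(np.abs(Dstar @ u).sum())          # anisotropic TV = ||D^* u||_1 -> 4.0
```

The ℓ¹ norm of D*u counts the total jump across all edges, which is why P_λ(y) with this dictionary promotes piecewise-constant reconstructions.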
Conclusion
Sparsity: approximate signals with few atoms.
Compressed sensing ideas:
→ Randomized sensors + sparse recovery.
→ Number of measurements ≈ signal complexity.
→ CS is about designing new hardware.
The devil is in the constants:
→ Worst-case analysis is problematic.
→ Designing good signal models.