Compressed Sensing and Tomography

DESCRIPTION

Presentation at the "Workshop on tomography reconstruction", December 11th, 2012, ENS Paris.


1. Compressive Sensing. Gabriel Peyré, www.numerical-tours.com

2. Overview. Compressive Sensing Acquisition; Theoretical Guarantees; Fourier Domain Measurements; Parameters Selection.

3-5. Single Pixel Camera (Rice). Measurements y[i] = ⟨f, φ_i⟩: P measures of the scene f, acquired with N micro-mirrors. Reconstructions are shown for P/N = 1, P/N = 0.16 and P/N = 0.02.

6-8. CS Hardware Model. CS is about designing hardware: input signals f̃ ∈ L²(ℝ²); the physical hardware imposes a resolution limit, with target resolution f ∈ ℝ^N. Acquisition pipeline: f̃ ∈ L² → (micro-mirrors array) → f ∈ ℝ^N → (CS hardware, operator K) → y ∈ ℝ^P.

9-12. Sparse CS Recovery. f0 ∈ ℝ^N is sparse in an ortho-basis Ψ: f0 = Ψ x0 with x0 ∈ ℝ^N sparse. (Discretized) sampling acquisition: y = K f0 + w = K Ψ x0 + w = Φ x0 + w, with Φ = K Ψ. K is drawn from the Gaussian matrix ensemble, K_{i,j} ~ N(0, P^{-1/2}) i.i.d. Sparse recovery: min ||x||_1 subject to ||Φ x − y|| ≤ ||w||.

13. CS Simulation Example. Original f0; Ψ = translation-invariant wavelet frame.

14. Overview: Theoretical Guarantees.

15-16. CS with RIP. ℓ1 recovery: for y = Φ x0 + w, take x* ∈ argmin { ||x||_1 : ||Φ x − y|| ≤ ||w|| }. Restricted Isometry Constants: for all x with ||x||_0 ≤ k, (1 − δ_k) ||x||² ≤ ||Φ x||² ≤ (1 + δ_k) ||x||². Theorem [Candès 2009]: if δ_{2k} ≤ √2 − 1, then ||x0 − x*|| ≤ (C0/√k) ||x0 − x_k||_1 + C1 ||w||, where x_k is the best k-term approximation of x0.

17-18. Singular Values Distributions. The eigenvalues of Φ_I* Φ_I with |I| = k are essentially in [a, b], where a = (1 − √β)², b = (1 + √β)² and β = k/P. When k = βP → +∞, the eigenvalue distribution tends to the Marcenko-Pastur law f(λ) = √((b − λ)₊ (λ − a)₊) / (2πβλ). Large deviation inequality [Ledoux]. [Figure: empirical eigenvalue histograms against this density for P = 200 and k = 10, 30, 50.] Theorem: if k ≤ C P / log(N/P), then δ_{2k} ≤ √2 − 1 with high probability.

19-20. Numerics with RIP. Stability constants of a matrix A: (1 − δ_1(A)) ||α||² ≤ ||A α||² ≤ (1 + δ_2(A)) ||α||², given by the smallest/largest eigenvalues of A*A. Upper/lower restricted isometry constants: δ_k^i = max over |I| = k of δ_i(Φ_I), and δ_k = min(δ_k^1, δ_k^2). Monte-Carlo estimation δ̂_k ≤ δ_k (example: N = 4000, P = 1000).
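To make the acquisition and ℓ1 recovery model of slides 9-12 concrete, below is a minimal numerical sketch: a k-sparse x0, a Gaussian measurement matrix with entries of standard deviation 1/√P, and ℓ1 reconstruction. The solver (ISTA, iterative soft thresholding applied to the penalized lasso form), the sizes and the value of λ are illustrative assumptions of this sketch, not the method of the presentation, which states the constrained problem min ||x||_1 subject to ||Φx − y|| ≤ ||w||.

```python
# Minimal compressive sensing simulation: Gaussian measurements + l1 recovery.
# Illustrative sketch only: the slides state the constrained problem
#   min ||x||_1  subject to  ||Phi x - y|| <= ||w||,
# here we solve the closely related penalized (lasso) form with ISTA.
import numpy as np

rng = np.random.default_rng(0)

N, P, k = 400, 100, 10      # signal length, number of measurements, sparsity (illustrative)
sigma = 0.01                # noise level (illustrative)

# k-sparse vector x0 (the sparsifying basis Psi is taken to be the identity here)
x0 = np.zeros(N)
x0[rng.choice(N, k, replace=False)] = rng.standard_normal(k)

# Gaussian measurement matrix with i.i.d. entries of standard deviation 1/sqrt(P)
Phi = rng.standard_normal((P, N)) / np.sqrt(P)
y = Phi @ x0 + sigma * rng.standard_normal(P)

def ista(Phi, y, lam, n_iter=2000):
    """Iterative soft thresholding for min_x 0.5*||y - Phi x||^2 + lam*||x||_1."""
    L = np.linalg.norm(Phi, 2) ** 2            # Lipschitz constant of the data-fit gradient
    x = np.zeros(Phi.shape[1])
    for _ in range(n_iter):
        z = x - Phi.T @ (Phi @ x - y) / L      # gradient step on the quadratic term
        x = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)   # soft thresholding (prox of l1)
    return x

x_rec = ista(Phi, y, lam=0.01)
print("relative reconstruction error:", np.linalg.norm(x_rec - x0) / np.linalg.norm(x0))
```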
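Similarly, the Marcenko-Pastur edges and the Monte-Carlo estimation of restricted isometry constants from slides 17-20 can be checked with a few lines of code. The sketch below fixes a Gaussian Φ, samples random supports I of size k and records the extreme eigenvalues of Φ_I* Φ_I; the sparsity k and the number of trials are illustrative choices, and sampling supports only yields lower bounds δ̂_k ≤ δ_k on the true constants.

```python
# Monte-Carlo look at the spectrum of random k-column submatrices of a Gaussian Phi,
# in the spirit of the "Singular Values Distributions" / "Numerics with RIP" slides.
# Sampling random supports only lower-bounds the restricted isometry constants,
# since the maximum over all supports of size k cannot be enumerated.
import numpy as np

rng = np.random.default_rng(1)

N, P = 4000, 1000                 # sizes quoted on the slides
k, n_trials = 50, 200             # illustrative sparsity and number of random supports
beta = k / P
a, b = (1 - np.sqrt(beta)) ** 2, (1 + np.sqrt(beta)) ** 2   # Marcenko-Pastur edges

Phi = rng.standard_normal((P, N)) / np.sqrt(P)

lam_min, lam_max = np.inf, 0.0
for _ in range(n_trials):
    I = rng.choice(N, k, replace=False)                 # random support of size k
    eig = np.linalg.eigvalsh(Phi[:, I].T @ Phi[:, I])   # eigenvalues of Phi_I^* Phi_I
    lam_min, lam_max = min(lam_min, eig[0]), max(lam_max, eig[-1])

print(f"Marcenko-Pastur edges    : [{a:.3f}, {b:.3f}]")
print(f"observed extreme values  : [{lam_min:.3f}, {lam_max:.3f}]")
print(f"lower bounds on the RICs : delta_1 >= {1 - lam_min:.3f}, delta_2 >= {lam_max - 1:.3f}")
```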
21-22. Polytope Noiseless Recovery. Counting faces of random polytopes [Donoho]: all x0 such that ||x0||_0 ≤ C_all(P/N) P are identifiable; most x0 such that ||x0||_0 ≤ C_most(P/N) P are identifiable, with for instance C_all(1/4) ≈ 0.065 and C_most(1/4) ≈ 0.25. Sharp constants, but no noise robustness. Computation of pathological signals [Dossal, Peyré, Fadili, 2010]. [Figure: phase-transition curves for the RIP, "all" and "most" criteria.]

23. Overview: Fourier Domain Measurements.

24-25. Tomography and Fourier Measures. f̂ = FFT2(f). Fourier slice theorem: p̂_θ(ω) = f̂(ω cos θ, ω sin θ), i.e. the 1D Fourier transform of the projection p_θ is a slice of the 2D Fourier transform along the angle θ. Partial Fourier measurements: {p̂_{θ_k}(t)}_{0 ≤ k < K} (a sketch of the corresponding radial sampling mask follows the risk-estimation sketch below).

34-36. Risk Minimization. λ > 0 is a regularization parameter: how to choose its value? Estimator: e.g. x*(y, λ) ∈ argmin_x (1/2) ||y − Φ x||² + λ ||x||_1. Risk-based selection of λ: average over the noise w to measure the expected quality of x*(y, λ) with respect to x0, R(λ) = E_w(||x*(y, λ) − x0||²). The optimal (theoretical) λ*(y) = argmin_λ R(λ) minimizes the risk, giving the plug-in estimator x*(y, λ*(y)). But E_w is not accessible (only one observation is available), and the risk is unknown since it depends on x0 (risk estimators are needed). Can we estimate the risk solely from x*(y, λ)?

37-40. Prediction Risk Estimation. Prediction: μ_λ(y) = Φ x*(y, λ). Sensitivity analysis: if μ_λ is weakly differentiable, μ_λ(y + ε) = μ_λ(y) + ∂μ_λ(y) · ε + O(||ε||²). Stein Unbiased Risk Estimator: SURE_λ(y) = ||y − μ_λ(y)||² − P σ² + 2 σ² df_λ(y), where df_λ(y) = tr(∂μ_λ(y)) = div(μ_λ)(y). Theorem [Stein, 1981]: E_w(SURE_λ(y)) = E_w(||Φ x0 − μ_λ(y)||²). Other estimators: GCV, BIC, AIC, ...
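A minimal illustration of the SURE recipe above, in the simplest setting Φ = Id (denoising by soft thresholding), where the degrees of freedom df_λ(y) reduce to the number of coefficients surviving the threshold. The signal, noise level and λ grid are illustrative assumptions; the presentation applies the same formula to the general prediction μ_λ(y) = Φ x*(y, λ).

```python
# SURE-based choice of lambda in the simplest case: denoising (Phi = Id) by soft thresholding.
# Illustrative sketch; the slides use the same formula for the prediction mu(y) = Phi x*(y, lambda).
import numpy as np

rng = np.random.default_rng(2)

P, sigma = 1000, 0.5
x0 = np.zeros(P)
x0[rng.choice(P, 50, replace=False)] = 5 * rng.standard_normal(50)   # sparse ground truth
y = x0 + sigma * rng.standard_normal(P)

def soft(y, lam):
    return np.sign(y) * np.maximum(np.abs(y) - lam, 0.0)

def sure(y, lam, sigma):
    """SURE_lambda(y) = ||y - mu||^2 - P*sigma^2 + 2*sigma^2*df, with df = #{|y_i| > lambda}."""
    mu = soft(y, lam)
    df = np.count_nonzero(np.abs(y) > lam)
    return np.sum((y - mu) ** 2) - y.size * sigma ** 2 + 2 * sigma ** 2 * df

lams = np.linspace(0.05, 3.0, 60)
sure_vals = [sure(y, lam, sigma) for lam in lams]
err_vals = [np.sum((soft(y, lam) - x0) ** 2) for lam in lams]   # error for this realization

print(f"lambda selected by SURE : {lams[np.argmin(sure_vals)]:.2f}")
print(f"oracle lambda           : {lams[np.argmin(err_vals)]:.2f}")
```

The λ minimizing SURE is typically close to the λ minimizing the squared error computed with the (unavailable) x0, which is exactly the plug-in selection described above.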
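Returning to the Fourier-domain measurements of slides 24-25 (the radial sampling mask announced there), the sketch below shows which coefficients of the 2D FFT a set of radial slices selects, in the spirit of the Fourier slice theorem. The mask construction, grid size and number of angles are illustrative assumptions; an actual tomographic forward model would discretize the Radon transform rather than snap slices to the FFT grid.

```python
# Which 2D Fourier coefficients do K radial projections correspond to?
# (Fourier slice theorem: the 1D FFT of a projection at angle theta is the slice of
# the 2D spectrum along that angle.) Illustrative sketch: a binary radial mask snapped
# to the discrete FFT grid, not a faithful discretization of the Radon transform.
import numpy as np

def radial_mask(n, n_angles):
    """Boolean mask selecting n_angles lines through the center of an n x n (centered) grid."""
    mask = np.zeros((n, n), dtype=bool)
    c = n // 2
    t = np.linspace(-c, c, 2 * n)                        # oversampled positions along each line
    for theta in np.linspace(0.0, np.pi, n_angles, endpoint=False):
        i = np.clip(np.round(c + t * np.cos(theta)).astype(int), 0, n - 1)
        j = np.clip(np.round(c + t * np.sin(theta)).astype(int), 0, n - 1)
        mask[i, j] = True
    return mask

n, n_angles = 128, 16                                    # illustrative grid size and angle count
f = np.random.default_rng(3).standard_normal((n, n))     # placeholder image
mask = np.fft.ifftshift(radial_mask(n, n_angles))        # move the center to index (0, 0)
y = np.fft.fft2(f)[mask]                                 # partial Fourier measurements

print(f"kept {mask.sum()} of {n * n} Fourier coefficients "
      f"({100 * mask.mean():.1f}%) along {n_angles} radial lines")
```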
41. Generalized SURE. Estimate instead the projection risk E_w(||P_{ker(Φ)⊥}(x0 − x*(y))||²).

42-43. Computation for L1 Regularization. Sparse estimator: x*(y) ∈ argmin_x (1/2) ||y − Φ x||² + λ ||x||_1. Theorem [Dossal et al. 2011]: for all y there exists a solution x* such that Φ_I is injective (I being the support of x*) and df_λ(y) = div(Φ x*)(y) = ||x*||_0.

44. Numerical example: compressed sensing using multi-scale (translation-invariant) wavelet thresholding, with P = N/4 measurements of a realization of a random vector. [Figure: (a) observations y; (b) x*(y, λ) at the optimal λ; (c) x_ML; quadratic loss, projection risk, GSURE and true risk as functions of the regularization parameter λ.]

45. Anisotropic Total-Variation. Extension to ℓ1 analysis regularization and TV, with the finite-differences gradient D = [∂_1, ∂_2]. For any z ∈ ℝ^P, ν(z) solves a linear system involving D_J and Φ; it is computed with a conjugate gradient solver, and in practice (law of large numbers) the expectation is replaced by an empirical mean [Vaiter et al. 2012]. Numerical example: super-resolution (vertical sub-sampling) using anisotropic total variation. [Figure: (a) observations y; (b) x*(y, λ) at the optimal λ; quadratic loss, projection risk, GSURE and true risk versus λ.]

46-48. Conclusion. Sparsity: approximate signals with few atoms from a dictionary Ψ. Compressed sensing ideas: randomized sensors + sparse recovery; number of measurements proportional to the signal complexity; CS is about designing new hardware.