
Robust Uncertainty Principles: Exact Signal Reconstruction from Highly Incomplete Frequency Information

Emmanuel Candes, Justin Romberg, and Terence Tao

Applied and Computational Mathematics, Caltech, Pasadena, CA 91125
Department of Mathematics, University of California, Los Angeles, CA 90095

    June 2004; Revised August 2005

    Abstract

This paper considers the model problem of reconstructing an object from incomplete frequency samples. Consider a discrete-time signal $f \in \mathbb{C}^N$ and a randomly chosen set of frequencies $\Omega$. Is it possible to reconstruct $f$ from the partial knowledge of its Fourier coefficients on the set $\Omega$?

A typical result of this paper is as follows. Suppose that $f$ is a superposition of $|T|$ spikes, $f(t) = \sum_{\tau \in T} f(\tau)\,\delta(t-\tau)$, obeying
$$|T| \le C_M \cdot (\log N)^{-1} \cdot |\Omega|,$$
for some constant $C_M > 0$. We do not know the locations of the spikes nor their amplitudes. Then with probability at least $1 - O(N^{-M})$, $f$ can be reconstructed exactly as the solution to the $\ell_1$ minimization problem
$$\min_g \sum_{t=0}^{N-1} |g(t)|, \quad \text{s.t. } \hat g(\omega) = \hat f(\omega) \text{ for all } \omega \in \Omega.$$

In short, exact recovery may be obtained by solving a convex optimization problem. We give numerical values for $C_M$ which depend on the desired probability of success.

Our result may be interpreted as a novel kind of nonlinear sampling theorem. In effect, it says that any signal made out of $|T|$ spikes may be recovered by convex programming from almost every set of frequencies of size $O(|T|\log N)$. Moreover, this is nearly optimal in the sense that any method succeeding with probability $1 - O(N^{-M})$ would in general require a number of frequency samples at least proportional to $|T|\log N$. The methodology extends to a variety of other situations and higher dimensions.

For example, we show how one can reconstruct a piecewise constant (one- or two-dimensional) object from incomplete frequency samples, provided that the number of jumps (discontinuities) obeys the condition above, by minimizing other convex functionals such as the total variation of $f$.

Keywords. Random matrices, free probability, sparsity, trigonometric expansions, uncertainty principle, convex optimization, duality in optimization, total-variation minimization, image reconstruction, linear programming.


    Acknowledgments. E. C. is partially supported by a National Science Foundation grant DMS

    01-40698 (FRG) and by an Alfred P. Sloan Fellowship. J. R. is supported by National Science

    Foundation grants DMS 01-40698 and ITR ACI-0204932. T. T. is a Clay Prize Fellow and is

    supported in part by grants from the Packard Foundation. E. C. and T.T. thank the Institute for

    Pure and Applied Mathematics at UCLA for their warm hospitality. E. C. would like to thank

Amos Ron and David Donoho for stimulating conversations, and Po-Shen Loh for early numerical experiments on a related project. We would also like to thank Holger Rauhut for corrections on an

    earlier version and the anonymous referees for their comments and references.

    1 Introduction

In many applications of practical interest, we often wish to reconstruct an object (a discrete signal, a discrete image, etc.) from incomplete Fourier samples. In a discrete setting, we may pose the problem as follows. Let $\hat f$ be the Fourier transform of a discrete object $f(t)$, $t = (t_1,\ldots,t_d) \in \mathbb{Z}_N^d := \{0,1,\ldots,N-1\}^d$:
$$\hat f(\omega) = \sum_{t \in \mathbb{Z}_N^d} f(t)\, e^{-2\pi i(\omega_1 t_1 + \cdots + \omega_d t_d)/N}.$$
The problem is then to recover $f$ from partial frequency information, namely, from $\hat f(\omega)$, where $\omega = (\omega_1,\ldots,\omega_d)$ belongs to some set $\Omega$ of cardinality less than $N^d$, the size of the discrete object.

In this paper, we show that we can recover $f$ exactly from observations $\hat f|_\Omega$ on a small set of frequencies provided that $f$ is sparse. The recovery consists of solving a straightforward optimization problem that finds $f^\sharp$ of minimal complexity with $\hat f^\sharp(\omega) = \hat f(\omega)$ for all $\omega \in \Omega$.

    1.1 A puzzling numerical experiment

This idea is best motivated by an experiment with surprisingly positive results. Consider a simplified version of the classical tomography problem in medical imaging: we wish to reconstruct a 2D image $f(t_1,t_2)$ from samples $\hat f|_\Omega$ of its discrete Fourier transform on a star-shaped domain $\Omega$ [4]. Our choice of domain is not contrived; many real imaging devices collect high-resolution samples along radial lines at relatively few angles. Figure 1(b) illustrates a typical case where one gathers 512 samples along each of 22 radial lines.

Frequently discussed approaches in the medical imaging literature for reconstructing an object from polar frequency samples are the so-called filtered backprojection algorithms. In a nutshell, one assumes that the Fourier coefficients at all of the unobserved frequencies are zero (thus reconstructing the image of minimal energy under the observation constraints). This strategy does not perform very well, and could hardly be used for medical diagnostics [24]. The reconstructed image, shown in Figure 1(c), has severe nonlocal artifacts caused by the angular undersampling. A good reconstruction algorithm, it seems, would have to guess the values of the missing Fourier coefficients. In other words, one would need to interpolate $\hat f(\omega_1,\omega_2)$. This seems highly problematic, however; predictions of Fourier coefficients from their neighbors are very delicate, due to the global and highly oscillatory nature of the Fourier transform.
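To make the minimum energy strategy concrete, here is a minimal numerical sketch (ours, not the authors' code): it zero-fills the unobserved coefficients and inverts the DFT, with a crude rasterized stand-in for the 22 radial sampling lines.

```python
import numpy as np

def min_energy_recon(image, mask):
    """Zero-fill the unobserved Fourier coefficients and invert the 2D DFT.

    Among all images agreeing with the observed coefficients, this
    reconstruction has minimal L2 norm."""
    return np.real(np.fft.ifft2(np.fft.fft2(image) * mask))

# Crude stand-in for the star-shaped domain: 22 radial lines.
n = 256
mask = np.zeros((n, n), dtype=bool)
r = np.arange(-n // 2, n // 2)
for theta in np.linspace(0, np.pi, 22, endpoint=False):
    i = np.round(n // 2 + r * np.cos(theta)).astype(int) % n
    j = np.round(n // 2 + r * np.sin(theta)).astype(int) % n
    mask[i, j] = True
mask = np.fft.ifftshift(mask)  # place the DC frequency at index (0, 0)
```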


Figure 1: Example of a simple recovery problem. (a) The Logan-Shepp phantom test image. (b) Sampling domain $\Omega$ in the frequency plane; Fourier coefficients are sampled along 22 approximately radial lines. (c) Minimum energy reconstruction obtained by setting unobserved Fourier coefficients to zero. (d) Reconstruction obtained by minimizing the total variation, as in (1.1). The reconstruction is an exact replica of the image in (a).


Going back to the example in Figure 1, we can see the problem immediately. To recover frequency information near $(2\pi\omega_1/N, 2\pi\omega_2/N)$, where $2\pi\omega_1/N$ is near $\pm\pi$, we would need to interpolate $\hat f$ at the Nyquist rate $2\pi/N$. However, we only have samples at a rate of about $\pi/22$; the sampling rate is almost 50 times smaller than the Nyquist rate!

We propose instead a strategy based on convex optimization. Let $\|g\|_{TV}$ be the total-variation norm of a two-dimensional object $g$. For discrete data $g(t_1,t_2)$, $0 \le t_1, t_2 \le N-1$,
$$\|g\|_{TV} = \sum_{t_1,t_2} \sqrt{|D_1 g(t_1,t_2)|^2 + |D_2 g(t_1,t_2)|^2},$$
where $D_1$ is the finite difference $D_1 g = g(t_1,t_2) - g(t_1-1,t_2)$ and $D_2 g = g(t_1,t_2) - g(t_1,t_2-1)$. To recover $f$ from partial Fourier samples, we find a solution $f^\sharp$ to the optimization problem

$$\min \|g\|_{TV} \quad \text{subject to} \quad \hat g(\omega) = \hat f(\omega) \text{ for all } \omega \in \Omega. \qquad (1.1)$$
In a nutshell, given partial observations $\hat f|_\Omega$, we seek a solution $f^\sharp$ with minimum complexity, here the total variation (TV), whose visible coefficients match those of the unknown object $f$. Our hope here is to partially erase some of the artifacts that classical reconstruction methods exhibit (which tend to have large TV norm), while maintaining fidelity to the observed data via the constraints on the Fourier coefficients of the reconstruction.

When we use (1.1) for the recovery problem illustrated in Figure 1 (with the popular Logan-Shepp phantom as a test image), the results are surprising. The reconstruction is exact; that is, $f^\sharp = f$! This numerical result is also not special to this phantom. In fact, we performed a series of experiments of this type and obtained perfect reconstruction on many similar test phantoms.
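For readers who wish to reproduce a small-scale version of this experiment, the following sketch solves (1.1) with an off-the-shelf solver. It is an illustration only; it assumes the cvxpy and scipy packages, and the phantom is replaced by a small synthetic piecewise constant image.

```python
import numpy as np
import cvxpy as cp
from scipy.linalg import dft

n = 32
f = np.zeros((n, n))
f[8:20, 10:24] = 1.0          # a simple piecewise constant test image
f[12:16, 4:12] = 2.0

rng = np.random.default_rng(0)
omega = (rng.random((n, n)) < 0.4).astype(float)   # observed frequencies
f_hat = np.fft.fft2(f)

F = dft(n)                    # 1D DFT matrix; the 2D transform is F @ g @ F.T
g = cp.Variable((n, n))
G_hat = F @ g @ F.T
constraints = [cp.multiply(omega, G_hat) == omega * f_hat]
cp.Problem(cp.Minimize(cp.tv(g)), constraints).solve()
print(np.allclose(g.value, f, atol=1e-3))   # exact recovery, typically True
```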

    1.2 Main results

This paper is about a quantitative understanding of this very special phenomenon. For which classes of signals/images can we expect perfect reconstruction? What are the trade-offs between complexity and the number of samples? In order to answer these questions, we first develop a fundamental mathematical understanding of a special one-dimensional model problem. We then exhibit reconstruction strategies which are shown to exactly reconstruct certain unknown signals, and which can be extended for use in a variety of related and more sophisticated reconstruction applications.

For a signal $f \in \mathbb{C}^N$, we define the classical discrete Fourier transform $\mathcal{F}f = \hat f: \mathbb{C}^N \to \mathbb{C}^N$ by
$$\hat f(\omega) := \sum_{t=0}^{N-1} f(t)\, e^{-2\pi i \omega t/N}, \qquad \omega = 0,1,\ldots,N-1. \qquad (1.2)$$

If we are given the value of the Fourier coefficients $\hat f(\omega)$ for all frequencies $\omega \in \mathbb{Z}_N$, then one can obviously reconstruct $f$ exactly via the Fourier inversion formula
$$f(t) = \frac{1}{N}\sum_{\omega=0}^{N-1} \hat f(\omega)\, e^{2\pi i \omega t/N}.$$

Now suppose that we are only given the Fourier coefficients $\hat f|_\Omega$ sampled on some partial subset $\Omega \subsetneq \mathbb{Z}_N$ of all frequencies. Of course, this is not enough information to reconstruct


$f$ exactly in general; $f$ has $N$ degrees of freedom and we are only specifying $|\Omega| < N$ of those degrees (here and below, $|\Omega|$ denotes the cardinality of $\Omega$). Suppose, however, that we also specify that $f$ is supported on a small (but a priori unknown) subset $T$ of $\mathbb{Z}_N$; that is, we assume that $f$ can be written as a sparse superposition of spikes
$$f(t) = \sum_{\tau \in T} f(\tau)\,\delta(t-\tau), \qquad \delta(t) = 1_{\{t=0\}}.$$

In the case where $N$ is prime, the following theorem tells us that it is possible to recover $f$ exactly if $|T|$ is small enough.

Theorem 1.1 Suppose that the signal length $N$ is a prime integer. Let $\Omega$ be a subset of $\{0,\ldots,N-1\}$, and let $f$ be a vector supported on $T$ such that
$$|T| \le \tfrac{1}{2}|\Omega|. \qquad (1.3)$$
Then $f$ can be reconstructed uniquely from $\Omega$ and $\hat f|_\Omega$. Conversely, if $\Omega$ is not the set of all $N$ frequencies, then there exist distinct vectors $f, g$ such that $|\mathrm{supp}(f)|, |\mathrm{supp}(g)| \le \tfrac{1}{2}|\Omega| + 1$ and such that $\hat f|_\Omega = \hat g|_\Omega$.

Proof We will need the following lemma [29], from which we see that with knowledge of $T$, we can reconstruct $f$ uniquely (using linear algebra) from $\hat f|_\Omega$:

Lemma 1.2 ([29], Corollary 1.4) Let $N$ be a prime integer and let $T, \Omega$ be subsets of $\mathbb{Z}_N$. Put $\ell_2(T)$ (resp. $\ell_2(\Omega)$) to be the space of signals that are zero outside of $T$ (resp. $\Omega$). The restricted Fourier transform $\mathcal{F}_{T\to\Omega}: \ell_2(T) \to \ell_2(\Omega)$ is defined as
$$\mathcal{F}_{T\to\Omega} f := \hat f|_\Omega \quad \text{for all } f \in \ell_2(T).$$
If $|T| = |\Omega|$, then $\mathcal{F}_{T\to\Omega}$ is a bijection; as a consequence, we thus see that $\mathcal{F}_{T\to\Omega}$ is injective for $|T| \le |\Omega|$ and surjective for $|T| \ge |\Omega|$. Clearly, the same claims hold if the Fourier transform $\mathcal{F}$ is replaced by the inverse Fourier transform $\mathcal{F}^{-1}$.

To prove Theorem 1.1, assume that $|T| \le \frac{1}{2}|\Omega|$. Suppose for contradiction that there were two objects $f, g$ such that $\hat f|_\Omega = \hat g|_\Omega$ and $|\mathrm{supp}(f)|, |\mathrm{supp}(g)| \le \frac{1}{2}|\Omega|$. Then the Fourier transform of $f - g$ vanishes on $\Omega$, and $|\mathrm{supp}(f-g)| \le |\Omega|$. By Lemma 1.2 we see that $\mathcal{F}_{\mathrm{supp}(f-g)\to\Omega}$ is injective, and thus $f - g = 0$. The uniqueness claim follows.

We now examine the converse claim. Since $|\Omega| < N$, we can find disjoint subsets $T, S$ of $\mathbb{Z}_N$ such that $|T|, |S| \le \frac{1}{2}|\Omega| + 1$ and $|T| + |S| = |\Omega| + 1$. Let $\omega_0$ be some frequency which does not lie in $\Omega$. Applying Lemma 1.2, we have that $\mathcal{F}_{T\cup S\to\Omega\cup\{\omega_0\}}$ is a bijection, and thus we can find a vector $h$ supported on $T \cup S$ whose Fourier transform vanishes on $\Omega$ but is nonzero on $\omega_0$; in particular, $h$ is not identically zero. The claim now follows by taking $f := h|_T$ and $g := -h|_S$.
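The dichotomy between prime and composite $N$ in Lemma 1.2 is easy to observe numerically. The sketch below (an illustration under our own conventions, not part of the paper) computes the smallest singular value of $\mathcal{F}_{T\to\Omega}$: for prime $N$ every square minor is nonsingular, while for composite $N$ subgroup structure produces singular minors.

```python
import numpy as np
from itertools import combinations

def min_singular_value(N, T, Omega):
    """Smallest singular value of the restricted Fourier matrix F_{T->Omega}."""
    F = np.exp(-2j * np.pi * np.outer(Omega, T) / N)
    return np.linalg.svd(F, compute_uv=False).min()

# N prime: every square minor of the Fourier matrix is invertible.
N = 7
worst = min(min_singular_value(N, T, W)
            for T in combinations(range(N), 2)
            for W in combinations(range(N), 2))
print(worst > 1e-9)   # True

# N composite: the minor with T = {0, 4}, Omega = {1, 3} is singular.
print(min_singular_value(8, (0, 4), (1, 3)))   # numerically zero
```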

Note that if $N$ is not prime, the lemma (and hence the theorem) fails, essentially because of the presence of non-trivial subgroups of $\mathbb{Z}_N$ under addition modulo $N$; see Sections 1.3


and 1.4 for concrete counterexamples, and [7], [29] for further discussion. However, it is plausible to think that Lemma 1.2 continues to hold for non-prime $N$ if $T$ and $\Omega$ are assumed to be generic; in particular, if they are not subgroups of $\mathbb{Z}_N$, or cosets of subgroups. If $T$ and $\Omega$ are selected uniformly at random, then it is expected that the theorem holds with probability very close to one; one could presumably quantify this statement by adapting the arguments given above, but we will not do so here. We refer the reader to Section 1.7 for a rapid presentation of informal arguments pointing in this direction.

A refinement of the argument in Theorem 1.1 shows that for fixed subsets $T, S$ in the time domain and $\Omega$ in the frequency domain, the space of pairs of vectors $f, g$ supported on $T, S$ such that $\hat f|_\Omega = \hat g|_\Omega$ has dimension $|T\cup S| - |\Omega|$ when $|T\cup S| \ge |\Omega|$, and has dimension $|T\cap S|$ otherwise. In particular, if we let $\Sigma(N_t)$ denote those vectors whose support has size at most $N_t$, then the set of vectors in $\Sigma(N_t)$ which cannot be reconstructed uniquely in this class from the Fourier coefficients sampled at $\Omega$ is contained in a finite union of linear spaces of dimension at most $2N_t - |\Omega|$. Since $\Sigma(N_t)$ itself is a finite union of linear spaces of dimension $N_t$, we thus see that recovery of $f$ from $\hat f|_\Omega$ is in principle possible, generically, whenever $|\mathrm{supp}(f)| = N_t < |\Omega|$.

Searching over all sparse candidates is combinatorial and intractable, however, and so we consider instead the convex relaxation
$$(P_1)\qquad \min_g \|g\|_{\ell_1} := \sum_{t=0}^{N-1}|g(t)| \quad \text{subject to}\quad \hat g(\omega) = \hat f(\omega)\ \text{for all } \omega\in\Omega. \qquad (1.5)$$
The main result of this paper is that the solution to $(P_1)$ coincides with $f$ for an overwhelming percentage of choices of $T$ and $\Omega$ obeying $|T| \le \alpha\,|\Omega|/\log N$ ($\alpha > 0$ is a constant): in these cases, solving the convex problem $(P_1)$ recovers $f$ exactly.

To establish this upper bound, we will assume that the observed Fourier coefficients are randomly sampled. Given the number $N_\omega$ of samples to take in the Fourier domain, we


choose the subset $\Omega$ uniformly at random from all sets of this size; i.e., each of the $\binom{N}{N_\omega}$ possible subsets is equally likely. Our main theorem can now be stated as follows.

Theorem 1.3 Let $f \in \mathbb{C}^N$ be a discrete signal supported on an unknown set $T$, and choose $\Omega$ of size $|\Omega| = N_\omega$ uniformly at random. For a given accuracy parameter $M$, if
$$|T| \le C_M \cdot (\log N)^{-1} \cdot |\Omega|, \qquad (1.6)$$
then with probability at least $1 - O(N^{-M})$, the minimizer to the problem (1.5) is unique and is equal to $f$.

Notice that (1.6) essentially says that $|T|$ is of size $|\Omega|$, modulo a constant and a logarithmic factor. Our proof gives an explicit value of $C_M$, namely, $C_M \approx 1/[23(M+1)]$ (valid for $|\Omega| \le N/4$, $M \ge 2$ and $N \ge 20$, say), although we have not pursued the question of exactly what the optimal value might be.

In Section 5, we present numerical results which suggest that in practice, we can expect to recover most signals $f$ more than 50% of the time if the size of the support obeys $|T| \le |\Omega|/4$. By "most signals" we mean that we empirically study the success rate for randomly selected signals, and do not search for the worst-case signal $f$, the one which needs the most frequency samples. For $|T| \le |\Omega|/8$, the recovery rate is above 90%. Empirically, the constants $1/4$ and $1/8$ do not seem to vary for $N$ in the range of a few hundred to a few thousand.

1.3 For almost every $\Omega$

As the theorem allows, there exist sets $\Omega$ and functions $f$ for which the $\ell_1$-minimization procedure does not recover $f$ correctly, even if $|\mathrm{supp}(f)|$ is much smaller than $|\Omega|$. We sketch two counterexamples (a numerical check of the first appears after these examples):

A discrete Dirac comb. Suppose that $N$ is a perfect square and consider the picket-fence signal which consists of spikes of unit height with uniform spacing equal to $\sqrt{N}$. This signal is often used as an extremal point for uncertainty principles [7, 8], as one of its remarkable properties is its invariance under the Fourier transform. Hence suppose that $\Omega$ is the set of all frequencies but the multiples of $\sqrt{N}$, namely, $|\Omega| = N - \sqrt{N}$. Then $\hat f|_\Omega = 0$ and obviously the reconstruction is identically zero. Note that the problem here does not really have anything to do with $\ell_1$-minimization per se; $f$ cannot be reconstructed from its Fourier samples on $\Omega$, thereby showing that Theorem 1.1 does not work as is for arbitrary sample sizes.

Boxcar signals. The example above suggests that in some sense $|T|$ must not be greater than about $\sqrt{|\Omega|}$. In fact, there exist more extreme examples. Assume the sample size $N$ is large and consider for example the indicator function $f$ of the interval $T := \{t : -N^{0.01} < t - N/2 < N^{0.01}\}$, and let $\Omega$ be the set $\Omega := \{\omega : N/3 < \omega < 2N/3\}$. Let $h$ be a function whose Fourier transform $\hat h$ is a nonnegative bump function adapted to the interval $\{\omega : -N/6 < \omega < N/6\}$ which equals $1$ when $-N/12 < \omega < N/12$. Then $|h(t)|^2$ has Fourier transform vanishing on $\Omega$, and is rapidly decreasing away from $t = 0$, so that $|h(t)|^2$ is negligibly small for $t \notin T$; on the other hand, one easily computes that $|h(0)|^2 > cN^2$ for some absolute constant $c > 0$. Because of


this, the signal $f - \varepsilon|h|^2$ will have smaller $\ell_1$-norm than $f$ for $\varepsilon > 0$ sufficiently small (and $N$ sufficiently large), while still having the same Fourier coefficients as $f$ on $\Omega$. Thus in this case $f$ is not the minimizer to the problem $(P_1)$, despite the fact that the support of $f$ is much smaller than that of $\Omega$.
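The Dirac comb computation in the first example is immediate to verify; a small check under the DFT convention (1.2):

```python
import numpy as np

N = 64                       # a perfect square
s = int(np.sqrt(N))
comb = np.zeros(N)
comb[::s] = 1.0              # unit spikes spaced sqrt(N) apart

comb_hat = np.fft.fft(comb)
print(np.allclose(comb_hat, s * comb))      # invariance: comb_hat = sqrt(N)*comb

# Away from the multiples of sqrt(N), the observed coefficients all vanish:
omega = np.setdiff1d(np.arange(N), np.arange(0, N, s))
print(np.max(np.abs(comb_hat[omega])))      # 0.0
```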

The above counterexamples relied heavily on the special choice of $\Omega$ (and, to a lesser extent, of $\mathrm{supp}(f)$); in particular, they needed the fact that the complement of $\Omega$ contained a large interval (or, more generally, a long arithmetic progression). But for most sets $\Omega$, large arithmetic progressions in the complement do not exist, and the problem largely disappears. In short, Theorem 1.3 essentially says that for most sets $\Omega$ of size about $|T|\log N$, there is no loss of information.

    1.4 Optimality

Theorem 1.3 states that for any signal $f$ supported on an arbitrary set $T$ in the time domain, $(P_1)$ recovers $f$ exactly, with high probability, from a number of frequency samples that is within a constant of $M\,|T|\log N$. It is natural to wonder whether this is a fundamental limit. In other words, is there an algorithm that can recover an arbitrary signal from far fewer random observations, and with the same probability of success?

It is clear that the number of samples needs to be at least proportional to $|T|$; otherwise $\mathcal{F}_{T\to\Omega}$ will not be injective. We argue here that it must also be proportional to $M\log N$ to guarantee recovery of certain signals from the vast majority of sets $\Omega$ of a certain size.

Suppose $f$ is the Dirac comb signal discussed in the previous section. If we want to have a chance of recovering $f$, then at the very least the observation set $\Omega$ and the frequency support $W = \mathrm{supp}\,\hat f$ must overlap at one location; otherwise, all of the observations are zero, and nothing can be done. Choosing $\Omega$ uniformly at random, the probability that it includes none of the members of $W$ is
$$P(\Omega\cap W = \emptyset) = \binom{N-\sqrt{N}}{|\Omega|}\bigg/\binom{N}{|\Omega|} \ge \left(1 - \frac{2|\Omega|}{N}\right)^{\sqrt{N}},$$

where we have used the assumption that $|\Omega| > |T| = \sqrt{N}$. Then for $P(\Omega\cap W = \emptyset)$ to be smaller than $N^{-M}$, it must be true that
$$\sqrt{N}\,\log\left(1 - \frac{2|\Omega|}{N}\right) \le -M\log N,$$
and if we make the restriction that $|\Omega|$ cannot be as large as $N/2$, so that $\log(1 - \frac{2|\Omega|}{N})$ is on the order of $-\frac{2|\Omega|}{N}$, we have
$$|\Omega| \ge \mathrm{Const}\cdot M\,\sqrt{N}\,\log N.$$
For the Dirac comb, then, any algorithm must have $|\Omega| \ge \mathrm{Const}\cdot M\,|T|\log N$ observations for the identified probability of success.

Examples for larger supports $T$ exist as well. If $N$ is an even power of two, we can superimpose $2^m$ Dirac combs at dyadic shifts to construct signals with time-domain support $|T| = 2^m\sqrt{N}$ and frequency-domain support $|W| = 2^{-m}\sqrt{N}$ for $m = 0,\ldots,\log_2\sqrt{N}$. The same argument as above would then dictate that
$$|\Omega| \ge \mathrm{Const}\cdot M\,\frac{N}{|W|}\,\log N = \mathrm{Const}\cdot M\,|T|\log N.$$

    In short, Theorem 1.3 identifies a fundamental limit. No recovery can be successful for allsignals using significantly fewer observations.

    1.5 Extensions

As mentioned earlier, results for our model problem extend easily to higher dimensions and alternate recovery scenarios. To be concrete, consider the problem of recovering a one-dimensional piecewise constant signal via
$$\min_g \sum_{t\in\mathbb{Z}_N} |g(t) - g(t-1)| \quad \text{subject to}\quad \hat g|_\Omega = \hat f|_\Omega, \qquad (1.7)$$

where we adopt the convention that $g(-1) = g(N-1)$. In a nutshell, model (1.5) is obtained from (1.7) after differentiation. Indeed, let $\delta$ be the vector of first differences $\delta(t) = g(t) - g(t-1)$, and note that $\sum_t \delta(t) = 0$. Obviously,
$$\hat\delta(\omega) = (1 - e^{-2\pi i\omega/N})\,\hat g(\omega) \quad \text{for all } \omega \ne 0,$$
and, therefore, the problem is identical to
$$\min \|\delta\|_{\ell_1} \quad\text{subject to}\quad \hat\delta(\omega) = (1 - e^{-2\pi i\omega/N})\,\hat f(\omega) \ \text{ on } \Omega\setminus\{0\}, \qquad \hat\delta(0) = 0,$$
which is precisely what we have been studying.

Corollary 1.4 Put $T = \{t : f(t) \ne f(t-1)\}$. Under the assumptions of Theorem 1.3, the minimizer to the problem (1.7) is unique and is equal to $f$ with probability at least $1 - O(N^{-M})$, provided that $f$ be adjusted so that $\sum_t f(t) = \hat f(0)$.

We now explore versions of Theorem 1.3 in higher dimensions. To be concrete, consider the two-dimensional situation (statements in arbitrary dimensions are exactly of the same flavor):

Theorem 1.5 Put $N = n^2$. Let $f(t_1,t_2)$, $1 \le t_1, t_2 \le n$, be a discrete real-valued image, and let $\Omega$ of a certain size be chosen uniformly at random. Assume that for a given accuracy parameter $M$, $f$ is supported on $T$ obeying (1.6). Then with probability at least $1 - O(N^{-M})$, the minimizer to the problem (1.5) is unique and is equal to $f$.

We will not prove this result, as the strategy is exactly parallel to that of Theorem 1.3. Letting $D_1 f$ be the horizontal finite differences $D_1 f(t_1,t_2) = f(t_1,t_2) - f(t_1-1,t_2)$ and $D_2 f$ be the vertical analog, we have just seen that we can think about the data as the properly renormalized Fourier coefficients of $D_1 f$ and $D_2 f$. Now put $d = D_1 f + i D_2 f$, where $i^2 = -1$. Then the minimum total-variation problem may be expressed as
$$\min \|\delta\|_{\ell_1} \quad\text{subject to}\quad F_\Omega \delta = F_\Omega d, \qquad (1.8)$$


where $F_\Omega$ is a partial Fourier transform. One then obtains a statement for piecewise constant 2D functions which is similar to that for sparse 1D signals, provided that the support of $f$ be replaced by $\{(t_1,t_2) : |D_1 f(t_1,t_2)|^2 + |D_2 f(t_1,t_2)|^2 \ne 0\}$. We omit the details. The main point here is that there actually are a variety of results similar to Theorem 1.3. Theorem 1.5 serves as another recovery example, and provides a precise quantitative understanding of the surprising result discussed at the beginning of this paper.

To be complete, we would like to mention that for complex-valued signals, the minimum $\ell_1$ problem (1.5) and, therefore, the minimum TV problem (1.1) can be recast as special convex programs known as second order cone programs (SOCPs). For example, (1.8) is equivalent to
$$\min \sum_t u(t) \quad\text{subject to}\quad \sqrt{|\delta_1(t)|^2 + |\delta_2(t)|^2} \le u(t), \qquad (1.9)$$
$$F_\Omega(\delta_1 + i\delta_2) = F_\Omega d,$$
with variables $u$, $\delta_1$ and $\delta_2$ in $\mathbb{R}^N$ ($\delta_1$ and $\delta_2$ are the real and imaginary parts of $\delta$). If, in addition, $\delta$ is real-valued, then this is a linear program. Much progress has been made in the past decade on algorithms to solve both linear and second-order cone programs [25], and many off-the-shelf software packages exist for solving problems such as $(P_1)$ and (1.9).
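In a modern modeling language, the SOCP (1.9) is only a few lines. A minimal sketch, assuming the cvxpy package, with a random partial DFT matrix and placeholder data $d$:

```python
import numpy as np
import cvxpy as cp

N = 64
rng = np.random.default_rng(4)
Omega = rng.choice(N, size=32, replace=False)
F_Omega = np.exp(-2j * np.pi * np.outer(Omega, np.arange(N)) / N)
d = rng.standard_normal(N) + 1j * rng.standard_normal(N)   # placeholder data

u = cp.Variable(N)
d1, d2 = cp.Variable(N), cp.Variable(N)    # real and imaginary parts of delta
constraints = [
    cp.norm(cp.vstack([d1, d2]), 2, axis=0) <= u,          # second order cones
    F_Omega @ (d1 + 1j * d2) == F_Omega @ d,
]
cp.Problem(cp.Minimize(cp.sum(u)), constraints).solve()
```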

    1.6 Relationship to uncertainty principles

From a certain point of view, our results are connected to the so-called uncertainty principles [7, 8], which say that it is difficult to localize a signal $f \in \mathbb{C}^N$ both in time and frequency at the same time. Indeed, classical arguments show that $f$ is the unique minimizer of $(P_1)$ if and only if
$$\sum_{t\in\mathbb{Z}_N} |f(t) + h(t)| > \sum_{t\in\mathbb{Z}_N} |f(t)| \qquad \text{for all } h \ne 0 \text{ with } \hat h|_\Omega = 0.$$

Put $T = \mathrm{supp}(f)$ and apply the triangle inequality:
$$\sum_{\mathbb{Z}_N} |f(t)+h(t)| = \sum_{T} |f(t)+h(t)| + \sum_{T^c} |h(t)| \ge \sum_{T} \big(|f(t)| - |h(t)|\big) + \sum_{T^c} |h(t)|.$$

Hence, a sufficient condition to establish that $f$ is our unique solution would be to show that
$$\sum_{t\in T} |h(t)| < \sum_{t\in T^c} |h(t)| \qquad \text{for all } h \ne 0 \text{ with } \hat h|_\Omega = 0.$$
By a classical duality argument, such a condition is guaranteed by the existence of a trigonometric polynomial $P$ with frequency support in $\Omega$ obeying $P(t) = \mathrm{sgn}(f)(t)$ for $t \in T$ and $|P(t)| < 1$ elsewhere; this is the condition (2.1) referred to below. Now $|\Omega| \ge |T|$, and if $\mathcal{F}_{T\to\Omega}$ is injective (has full column rank), there are many trigonometric polynomials supported on $\Omega$ in the Fourier domain which satisfy (2.1). We choose, with the hope that its magnitude on $T^c$ is small, the one with minimum energy:

$$P := \mathcal{F}^*\,\mathcal{F}_{T\to\Omega}\,(\mathcal{F}_{T\to\Omega}^*\,\mathcal{F}_{T\to\Omega})^{-1}\,\mathrm{sgn}(f), \qquad (2.3)$$
where $\mathcal{F} = \mathcal{F}_{\mathbb{Z}_N\to\Omega}$ is the Fourier transform followed by a restriction to the set $\Omega$; the embedding operator $\iota: \ell_2(T) \to \ell_2(\mathbb{Z}_N)$ extends a vector on $T$ to a vector on $\mathbb{Z}_N$ by placing zeros outside of $T$; and $\iota^*$ is the dual restriction map $\iota^* f = f|_T$. It is easy to see that $\hat P$ is supported on $\Omega$, and noting that $\mathcal{F}\iota = \mathcal{F}_{T\to\Omega}$, $P$ also satisfies (2.1):
$$\iota^* P = \mathrm{sgn}(f).$$

Fixing $f$ and its support $T$, we will prove Theorem 1.3 by establishing that if the set $\Omega$ is chosen uniformly at random from all sets of size $N_\omega \ge C_1\,M\,|T|\log N$, then

1. Invertibility. The operator $\mathcal{F}_{T\to\Omega}$ is injective, meaning that $\mathcal{F}_{T\to\Omega}^*\mathcal{F}_{T\to\Omega}$ in (2.3) is invertible, with probability $1 - O(N^{-M})$.

2. Magnitude on $T^c$. The function $P$ in (2.3) obeys $|P(t)| < 1$ for all $t \in T^c$, again with probability $1 - O(N^{-M})$.
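The polynomial (2.3) is concrete enough to compute for a single random draw; the sketch below (ours, for illustration) builds $P$ and checks both properties.

```python
import numpy as np

N, rng = 257, np.random.default_rng(5)
T = rng.choice(N, size=8, replace=False)
Omega = rng.choice(N, size=100, replace=False)
sgn_f = np.sign(rng.standard_normal(T.size))          # sgn(f) on T

F_TO = np.exp(-2j * np.pi * np.outer(Omega, T) / N)   # F_{T -> Omega}
F_all = np.exp(-2j * np.pi * np.outer(Omega, np.arange(N)) / N)

# P = F* F_{T->Omega} (F_{T->Omega}* F_{T->Omega})^{-1} sgn(f)        (2.3)
P = F_all.conj().T @ (F_TO @ np.linalg.solve(F_TO.conj().T @ F_TO, sgn_f))

print(np.allclose(P[T], sgn_f))                       # P = sgn(f) on T
off = np.setdiff1d(np.arange(N), T)
print(np.abs(P[off]).max() < 1)                       # |P| < 1 off T, w.h.p.
```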


    2.3 The Bernoulli model

A set $\Omega'$ of Fourier coefficients is sampled using the Bernoulli model with parameter $0 < \tau < 1$ by setting $\Omega' := \{\omega : I_\omega = 1\}$, where $I_0,\ldots,I_{N-1}$ are independent random variables taking the value $1$ with probability $\tau$ and $0$ with probability $1-\tau$, so that $E|\Omega'| = \tau N$. Conditioned on its size, a Bernoulli set is uniformly distributed among sets of that size, and the failure probability under the uniform model is nonincreasing in the number of observed coefficients.


    With the above in mind, we continue

$$P(\mathrm{Failure}(\Omega')) = \sum_{k=1}^{N} P(\mathrm{Failure}(\Omega_k))\,P(|\Omega'| = k) \ge \sum_{k \le \tau N} P(\mathrm{Failure}(\Omega_k))\,P(|\Omega'| = k) \ge \frac{1}{2}\,P(\mathrm{Failure}(\Omega_{\tau N})),$$
where $\Omega_k$ denotes a set of size $k$ drawn from the uniform model, and where we used the monotonicity just mentioned together with $P(|\Omega'| \le \tau N) \ge 1/2$. Thus if we can bound the probability of failure for the Bernoulli model, we know that the failure rate for the uniform model will be no more than twice as large.

    3 Construction of the Dual Polynomial

The Bernoulli model holds throughout this section; we carefully examine the minimum energy dual polynomial $P$ defined in (2.3) and establish Theorem 2.2 and Lemma 2.3. The main arguments hinge on delicate moment bounds for random matrices, which are presented in Section 4. From now on, we will assume that $\tau N > M\log N$, since the claim is vacuous otherwise (as we will see, $C_M \lesssim 1/M$, and thus (1.6) would force $f \equiv 0$, at which point it is clear that the solution to $(P_1)$ equals $f = 0$).

We will find it convenient to rewrite (2.3) in terms of the auxiliary matrix
$$H f(t) := -\sum_{t'\in T:\ t'\ne t}\ \sum_{\omega\in\Omega} e^{2\pi i\frac{\omega(t-t')}{N}}\,f(t'), \qquad (3.1)$$
and to define $H_0 = \iota^* H \iota$.

To see the relevance of the operators $H$ and $H_0$, observe that
$$\frac{1}{|\Omega|}\,H = \iota - \frac{1}{|\Omega|}\,\mathcal{F}^*\mathcal{F}_{T\to\Omega}, \qquad I_T - \frac{1}{|\Omega|}\,H_0 = \frac{1}{|\Omega|}\,\mathcal{F}_{T\to\Omega}^*\mathcal{F}_{T\to\Omega},$$
where $I_T$ is the identity on $\ell_2(T)$ (note that $\iota^*\iota = I_T$). Then
$$P = \left(\iota - \frac{1}{|\Omega|}H\right)\left(I_T - \frac{1}{|\Omega|}H_0\right)^{-1}\mathrm{sgn}\,f.$$
The point here is to separate the constant diagonal of $\mathcal{F}_{T\to\Omega}^*\mathcal{F}_{T\to\Omega}$ (which is $|\Omega|$ everywhere) from the highly oscillatory off-diagonal. We will see that choosing $\Omega$ at random makes $H_0$ essentially a noise matrix, making $I_T - \frac{1}{|\Omega|}H_0$ well conditioned.
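This decomposition into a constant diagonal plus an oscillatory off-diagonal part can be verified directly for a fixed draw of $\Omega$ (illustration only):

```python
import numpy as np

N, rng = 128, np.random.default_rng(6)
T = rng.choice(N, size=6, replace=False)
Omega = rng.choice(N, size=40, replace=False)

F_TO = np.exp(-2j * np.pi * np.outer(Omega, T) / N)

# H0(t, t') = -sum_{omega in Omega} exp(2 pi i omega (t - t')/N) for t != t'
diff = T[:, None] - T[None, :]
H0 = -np.exp(2j * np.pi * np.outer(Omega, diff.ravel()) / N).sum(0).reshape(diff.shape)
np.fill_diagonal(H0, 0)

lhs = np.eye(T.size) - H0 / Omega.size
rhs = F_TO.conj().T @ F_TO / Omega.size
print(np.allclose(lhs, rhs))     # I_T - H0/|Omega| = F*F/|Omega| on l2(T)
```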

    3.1 Invertibility

We would like to establish the invertibility of the matrix $I_T - \frac{1}{|\Omega|}H_0$ with high probability. One way to proceed would be to show that the operator norm (i.e., the largest eigenvalue magnitude) of $H_0$


is less than $|\Omega|$. A straightforward way to do this is to bound the operator norm $\|H_0\|$ by the Frobenius norm $\|H_0\|_F$:
$$\|H_0\|^2 \le \|H_0\|_F^2 := \mathrm{Tr}(H_0 H_0^*) = \sum_{t_1,t_2} |(H_0)_{t_1,t_2}|^2, \qquad (3.2)$$
where $(H_0)_{t_1,t_2}$ is the matrix element at row $t_1$ and column $t_2$.

Using relatively simple statistical arguments, we can show that with high probability $|(H_0)_{t_1,t_2}|^2 \approx |\Omega|$. Applying (3.2) would then yield invertibility only when $|T| \lesssim \sqrt{|\Omega|}$. To show that $\|H_0\|$ is small for larger sets $T$ (recall that $|T| \asymp |\Omega|\,(\log N)^{-1}$ is the desired result), we use estimates of the Frobenius norm of a large power of $H_0$, taking advantage of cancellations arising from the randomness of the matrix coefficients of $H_0$.

Our argument relies on a key estimate, which we introduce now and shall discuss in greater detail in Section 3.2. Assume that $\tau \le 1/(1+e)$ and $n \le \tau N/[4|T|(1-\tau)]$. Then the $2n$-th moment of $H_0$ obeys
$$E(\mathrm{Tr}(H_0^{2n})) \le 2\left(\frac{4}{e(1-\tau)}\right)^n n^{n+1}\,(\tau N)^n\,|T|^{n+1}. \qquad (3.3)$$

Now this moment bound gives an estimate for the operator norm of $H_0$. To see this, note that since $H_0$ is self-adjoint,
$$\|H_0\|^{2n} = \|H_0^n\|^2 \le \|H_0^n\|_F^2 = \mathrm{Tr}(H_0^{2n}).$$
Letting $\alpha$ be a positive number $0 < \alpha < 1$, one converts this moment bound into a tail bound for $\|H_0\|$ via Markov's inequality; this yields the estimate (3.4) invoked below.


Proof The first part of the theorem follows from (3.4). For the second part, we begin by observing that a typical application of the large deviation theorem gives
$$P(|\Omega| < (1-\epsilon)\,\tau N) \le \exp(-\epsilon^2\tau N/2), \qquad 0 < \epsilon < 1,$$
since $|\Omega|$ is a sum of $N$ independent Bernoulli variables with mean $\tau$.


3.3 Magnitude of the polynomial on the complement of $T$

In the remainder of Section 3, we argue that $\max_{t\notin T} |P(t)| < 1$ with high probability, and prove Lemma 2.3. We first develop an expression for $P(t)$ by making use of the algebraic identity
$$(1-M)^{-1} = (1-M^n)^{-1}(1 + M + \cdots + M^{n-1}).$$
Indeed, we can write
$$\left(I_T - \frac{1}{|\Omega|^n}\,H_0^n\right)^{-1} = I_T + R, \qquad \text{where } R = \sum_{p=1}^{\infty} \frac{1}{|\Omega|^{pn}}\,H_0^{pn},$$
so that the inverse is given by the truncated Neumann series
$$\left(I_T - \frac{1}{|\Omega|}H_0\right)^{-1} = (I_T + R)\sum_{m=0}^{n-1} \frac{1}{|\Omega|^m}\,H_0^m. \qquad (3.10)$$

The point is that the remainder term $R$ is quite small in the Frobenius norm: suppose that $\|H_0\|_F \le \alpha|\Omega|$; then
$$\|R\|_F \le \frac{\alpha^n}{1-\alpha^n}.$$
In particular, the matrix coefficients of $R$ are all individually less than $\alpha^n/(1-\alpha^n)$. Introduce the $\infty$-norm of a matrix as $\|M\|_\infty = \sup_{\|x\|_\infty \le 1}\|Mx\|_\infty$, which is also given by
$$\|M\|_\infty = \sup_i \sum_j |M(i,j)|.$$

It follows from the Cauchy-Schwarz inequality that
$$\|M\|_\infty^2 \le \sup_i\ \#\mathrm{col}(M)\sum_j |M(i,j)|^2 \le \#\mathrm{col}(M)\,\|M\|_F^2,$$
where by $\#\mathrm{col}(M)$ we mean the number of columns of $M$. This observation gives the crude estimate

    where by # col(M) we mean the number of columns ofM. This observation gives the crudeestimate

    R |T|1/2 n

    1 n . (3.11)As we shall soon see, the bound (3.11) allows us to effectively neglect the R term in thisformula; the only remaining difficulty will be to establish good bounds on the truncatedNeumann series 1||H

    n1m=0

    1||m H

    m0 .

    3.4 Estimating the truncated Neumann series

From (2.3) we observe that on the complement of $T$,
$$P = -\frac{1}{|\Omega|}\,H\left(I_T - \frac{1}{|\Omega|}H_0\right)^{-1}\mathrm{sgn}(f),$$
since the term involving $\iota$ in (2.3) vanishes outside of $T$. Applying (3.10), we may rewrite $P$ as
$$P(t) = P_0(t) + P_1(t), \qquad t\in T^c,$$


where
$$P_0 = -S_n\,\mathrm{sgn}(f), \qquad P_1 = -\frac{1}{|\Omega|}\,H R\,(I + S_{n-1})\,\mathrm{sgn}(f),$$
and
$$S_n = \sum_{m=1}^{n} \frac{1}{|\Omega|^m}\,H^m.$$

Let $a_0, a_1 > 0$ be two numbers with $a_0 + a_1 = 1$. Then
$$P\left(\sup_{t\in T^c} |P(t)| > 1\right) \le P(\|P_0\|_\infty > a_0) + P(\|P_1\|_\infty > a_1),$$
and the idea is to bound each term individually. Put $Q_0 = S_{n-1}\,\mathrm{sgn}(f)$, so that $P_1 = -\frac{1}{|\Omega|}HR\,(\mathrm{sgn}(f) + Q_0)$. With these notations, observe that
$$\|P_1\|_\infty \le \frac{1}{|\Omega|}\,\|HR\|_\infty\,(1 + \|Q_0\|_\infty).$$
Hence, bounds on the magnitude of $P_1$ will follow from bounds on $\|HR\|_\infty$ together with bounds on the magnitude of $Q_0$. It will be sufficient to derive bounds on the magnitude of $Q_0$, which will follow from those on $P_0$ since $Q_0$ is nearly equal to $P_0$ (they differ by only one very small term).

Fix $t\in T^c$ and write $P_0(t)$ as
$$P_0(t) = -\sum_{m=1}^{n} \frac{1}{|\Omega|^m}\,X_m(t), \qquad X_m = H^m\,\mathrm{sgn}(f).$$
The idea is to use moment estimates to control the size of each term $X_m(t)$.

Lemma 3.4 Set $n = km$. Then $E|X_m(t_0)|^{2k}$ obeys the same estimate as that in Theorem 3.3 (up to a multiplicative factor $|T|^{-1}$), namely,
$$E|X_m(t_0)|^{2k} \le \frac{1}{|T|}\,B_n, \qquad (3.12)$$
where $B_n$ is the right-hand side of (3.9). In particular, following (3.3),
$$E|X_m(t_0)|^{2k} \le 2\,e^{-n}\left(\frac{4}{1-\tau}\right)^n n^{n+1}\,|T|^n\,(\tau N)^n, \qquad (3.13)$$
provided that $n \le \tau N/[4|T|(1-\tau)]$.

The proof of these moment estimates mimics that of Theorem 3.3 and may be found in the Appendix.

Lemma 3.5 Fix $a_0 = .91$. Suppose that $|T|$ obeys (3.5) and let $B_M$ be the event on which $|\Omega| < (1-\epsilon_M)\,\tau N$, with $\epsilon_M$ as in (3.8). For each $t \in \mathbb{Z}_N$, there is an event $A_t$ with the property
$$P(A_t) > 1 - \gamma_n, \qquad \gamma_n = 2\,(1-\epsilon_M)^{-2n}\,n^2\,e^{-n}\,\alpha^{2n}\,(0.42)^{-2n},$$
and
$$|P_0(t)| < .91, \quad |Q_0(t)| < .91 \quad \text{on } A_t\cap B_M^c.$$


As a consequence,
$$P\big(\sup_t |P_0(t)| > a_0\big) \le N^{-M} + N\gamma_n,$$
and similarly for $Q_0$.

Proof We suppose that $n$ is of the form $n = 2^J - 1$ (this property is not crucial and only simplifies our exposition). For each $m$ and $k$ such that $km \le n$, it follows from (3.5) and (3.13), together with some simple calculations, that
$$E|X_m(t)|^{2k} \le 2n\,e^{-n}\,\alpha^{2n}\,(\tau N)^{2n}. \qquad (3.14)$$
We will develop a bound on the event $B_M^c$, where $|\Omega| \ge (1-\epsilon_M)\,\tau N$. On this event,
$$|P_0(t)| \le \sum_{m=1}^{n} Y_m, \qquad Y_m = \frac{1}{(1-\epsilon_M)^m\,(\tau N)^m}\,|X_m(t)|.$$

Fix $\delta_j > 0$, $0 \le j < J$, such that $\sum_{j=0}^{J-1} 2^j\delta_j \le a_0$. Obviously,
$$P\left(\sum_{m=1}^{n} Y_m > a_0\right) \le \sum_{j=0}^{J-1}\ \sum_{m=2^j}^{2^{j+1}-1} P(Y_m > \delta_j) \le \sum_{j=0}^{J-1}\ \sum_{m=2^j}^{2^{j+1}-1} \delta_j^{-2K_j}\,E|Y_m|^{2K_j},$$
where $K_j = 2^{J-j}$. Observe that for each $m$ with $2^j \le m < 2^{j+1}$, the product $mK_j$ is comparable to $n$, so that (3.14) applies to the moments $E|Y_m|^{2K_j}$; collecting the resulting bounds with a suitable choice of the $\delta_j$'s gives
$$P\left(\sum_m Y_m > .91\right) \le 2\,(1-\epsilon_M)^{-2n}\,n^2\,e^{-n}\,\alpha^{2n}\,(0.42)^{-2n}. \qquad (3.15)$$
The claim for $Q_0$ is identical and the lemma follows.

Lemma 3.6 Fix $a_1 = .09$. Suppose that the pair $(\alpha, n)$ obeys $|T|^{3/2}\,\frac{\alpha^n}{1-\alpha^n} \le a_1/2$. Then
$$\|P_1\|_\infty \le a_1$$
on the event $A\cap\{\|H_0\|_F \le \alpha|\Omega|\}$, for some $A$ obeying $P(A) \ge 1 - O(N^{-M})$.

Proof As we observed before, (1) $\|P_1\|_\infty \le \frac{1}{|\Omega|}\|HR\|_\infty(1 + \|Q_0\|_\infty)$, and (2) $Q_0$ obeys the bound stated in Lemma 3.5. Consider then the event $\{\|Q_0\|_\infty \le 1\}$. On this event, $\|P_1\|_\infty \le a_1$ if $\frac{1}{|\Omega|}\|HR\|_\infty \le a_1/2$. The matrix $H$ obeys $\frac{1}{|\Omega|}\|H\|_\infty \le |T|$ since $H$ has


$|T|$ columns and each matrix element is bounded by $|\Omega|$ (note that far better bounds are possible). It then follows from (3.11) that
$$\frac{1}{|\Omega|}\,\|H R\|_\infty \le |T|^{3/2}\,\frac{\alpha^n}{1-\alpha^n},$$
with probability at least $1 - O(N^{-M})$. We then simply need to choose $\alpha$ and $n$ such that the right-hand side is less than $a_1/2$.

    3.5 Proof of Lemma 2.3

We have now assembled all the intermediate results needed to prove Lemma 2.3 (and hence our main theorem). Indeed, we have proved that $|P(t)| < 1$ for all $t\in T^c$ (again with high probability), provided that $\alpha$ and $n$ are selected appropriately, as we now explain.

Fix $M > 0$. We choose $\alpha = .42\,(1-\epsilon_M)$, where $\epsilon_M$ is taken as in (3.8), and $n$ to be the nearest integer to $(M+1)\log N$.

1. With this special choice, $\gamma_n = 2[(M+1)\log N]^2\,N^{-(M+1)}$ and, therefore, Lemma 3.5 implies that both $\|P_0\|_\infty$ and $\|Q_0\|_\infty$ are bounded by $.91$ outside of $T$, with probability at least $1 - [1 + 2((M+1)\log N)^2]\,N^{-M}$.

2. Lemma 3.6 assures us that it is sufficient to have $N^{3/2}\,\alpha^n/(1-\alpha^n) \le .045$ in order to have $|P_1(t)| < .09$ on $T^c$. Because $\log(.42) \approx -.87$ and $\log(.045) \approx -3.10$, this condition is approximately equivalent to
$$(1.5 - .87\,(M+1))\,\log N \le -3.10.$$
Take $M \ge 2$, for example; then the above inequality is satisfied as soon as $N \ge 17$.

To conclude, Lemma 2.3 holds with probability exceeding $1 - O([(M+1)\log N]^2\,N^{-M})$ if $T$ obeys

$$|T| \le C_M\,\frac{\tau N}{\log N}, \qquad C_M = \frac{(.42)^2(1-\tau)}{4\,(M+1)}\,(1 + o(1)).$$
In other words, we may take $C_M$ in Theorem 1.3 to be of the form
$$C_M = \frac{1}{22.6\,(M+1)}\,(1 + o(1)). \qquad (3.16)$$
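For a sense of scale, the provable constant (3.16) can be tabulated directly. The helper below is illustrative only (it ignores the $o(1)$ term and uses natural logarithms); it shows that the theoretical constants are far more conservative than the empirical thresholds $|\Omega|/4$ and $|\Omega|/8$ reported in Section 1.2.

```python
import math

def max_support(N, M, n_omega):
    """Largest |T| permitted by (1.6) with C_M = 1/(22.6 (M + 1))."""
    C_M = 1.0 / (22.6 * (M + 1))
    return int(C_M * n_omega / math.log(N))

for N in (512, 2048, 8192):
    # Prints very small values: the provable constants are pessimistic.
    print(N, max_support(N, M=2, n_omega=N // 4))
```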

    4 Moments of Random Matrices

This section is devoted entirely to proving Theorem 3.3, and it may be best to first sketch how this is done. We begin in Section 4.1 by giving a preliminary expansion of the quantity $E(\mathrm{Tr}(H_0^{2n}))$. However, this expansion is not easily manipulated, and needs to be rearranged using the inclusion-exclusion formula, which we do in Section 4.2, together with some elements of combinatorics (the Stirling number identities), which we give in Section 4.3. This allows us to establish a second, more usable, expansion for $E(\mathrm{Tr}(H_0^{2n}))$ in Section 4.4. The proof


of the theorem then proceeds by bounding the individual terms in this second expansion. There are two quantities in the expansion that need to be estimated: a purely combinatorial quantity $Q(n,k)$ and a power series $F_n(\tau)$. Both estimates are combined in Section 4.5 to prove the theorem.

Before we begin, we wish to note that the study of the eigenvalues of operators like $H_0$ has a bit of historical precedent in the information theory community. Note that $I_T - H_0$ is essentially the composition of three projection operators: one that time-limits a function to $T$, followed by a band-limiting to $\Omega$, followed by a final restriction to $T$. The distribution of the eigenvalues of such operators was studied by Landau and others [20-22] while developing the prolate spheroidal wave functions that are now commonly used in signal processing and communications. This distribution was inferred by examining the trace of large powers of this operator (see [22] in particular), much as we will do here.

4.1 A First Formula for the Expected Value of the Trace of $H_0^{2n}$

Recall that $H_0(t,t')$, $t, t' \in T$, is the $|T|\times|T|$ matrix whose entries are defined by
$$H_0(t,t') = \begin{cases} 0, & t = t',\\ -c(t-t'), & t \ne t',\end{cases} \qquad c(u) = \sum_{\omega\in\Omega} e^{2\pi i\frac{\omega u}{N}}. \qquad (4.1)$$

A diagonal element of the $2n$-th power of $H_0$ may be expressed as
$$H_0^{2n}(t_1,t_1) = \sum_{t_2,\ldots,t_{2n}:\ t_j\ne t_{j+1}} c(t_1-t_2)\cdots c(t_{2n}-t_1),$$
where we adopt the convention that $t_{2n+1} = t_1$ whenever convenient, and therefore
$$E(\mathrm{Tr}(H_0^{2n})) = \sum_{t_1,\ldots,t_{2n}:\ t_j\ne t_{j+1}} E\left(\sum_{\omega_1,\ldots,\omega_{2n}\in\Omega} e^{2\pi i\frac{1}{N}\sum_{j=1}^{2n}\omega_j(t_j-t_{j+1})}\right).$$
Using (2.5) and linearity of expectation, we can write this as

    j=1j(tjtj+1) .Using (2.5) and linearity of expectation, we can write this as

    t1,...,t2n: tj=tj+1

    01,...,2nN1

    e2iN

    P2nj=1j(tjtj+1) E

    2nj=1

    I{j}

    .

The idea is to use the independence of the $I_{\{\omega_j\}}$'s to simplify this expression substantially; however, one has to be careful with the fact that some of the $\omega_j$'s may be the same, at which point one loses independence of those indicator variables. These difficulties require a certain amount of notation. We let $\mathbb{Z}_N = \{0,1,\ldots,N-1\}$ be the set of all frequencies as before, and let $A$ be the finite set $A := \{1,\ldots,2n\}$. For all $\omega := (\omega_1,\ldots,\omega_{2n})$, we define the equivalence relation $\sim_\omega$ on $A$ by saying that $j \sim_\omega j'$ if and only if $\omega_j = \omega_{j'}$. We let $\mathcal{P}(A)$ be the set of all equivalence relations on $A$. Note that there is a partial ordering on the equivalence relations, as one can say that $\sim_1 \le \sim_2$ if $\sim_1$ is coarser than $\sim_2$, i.e. $a \sim_2 b$ implies $a \sim_1 b$ for all $a, b \in A$. Thus, the coarsest element in $\mathcal{P}(A)$ is the trivial equivalence relation in which all elements of $A$ are equivalent (just one equivalence class), while the finest element is the equality relation $=$, i.e. each element of $A$ belongs to a distinct class ($|A|$ equivalence classes).


For each equivalence relation $\sim$ in $\mathcal{P}(A)$, we can then define the sets $\Omega(\sim) \subset \mathbb{Z}_N^{2n}$ by
$$\Omega(\sim) := \{\omega\in\mathbb{Z}_N^{2n} : \sim_\omega = \sim\}$$
and the sets $\Omega_{\le}(\sim) \subset \mathbb{Z}_N^{2n}$ by
$$\Omega_{\le}(\sim) := \bigcup_{\sim'\le\sim} \Omega(\sim') = \{\omega\in\mathbb{Z}_N^{2n} : \sim_\omega\ \le\ \sim\}.$$
Thus the sets $\{\Omega(\sim) : \sim\in\mathcal{P}(A)\}$ form a partition of $\mathbb{Z}_N^{2n}$. The sets $\Omega_{\le}(\sim)$ can also be defined as
$$\Omega_{\le}(\sim) := \{\omega\in\mathbb{Z}_N^{2n} : \omega_a = \omega_b \text{ whenever } a\sim b\}.$$
For comparison, the sets $\Omega(\sim)$ can be defined as
$$\Omega(\sim) := \{\omega\in\mathbb{Z}_N^{2n} : \omega_a = \omega_b \text{ whenever } a\sim b, \text{ and } \omega_a\ne\omega_b \text{ whenever } a\not\sim b\}.$$
We give an example: suppose $n = 2$ and fix $\sim$ such that $1\sim 4$ and $2\sim 3$ (exactly 2 equivalence classes); then $\Omega(\sim) := \{\omega\in\mathbb{Z}_N^4 : \omega_1=\omega_4,\ \omega_2=\omega_3, \text{ and } \omega_1\ne\omega_2\}$, while $\Omega_{\le}(\sim) := \{\omega\in\mathbb{Z}_N^4 : \omega_1=\omega_4,\ \omega_2=\omega_3\}$.

Now, let us return to the computation of the expected value. Because the random variables $I_k$ in (2.4) are independent and all have the same distribution, the quantity $E[\prod_{j=1}^{2n} I_{\{\omega_j\}}]$ depends only on the equivalence relation $\sim_\omega$ and not on the value of $\omega$ itself. Indeed, we have
$$E\left(\prod_{j=1}^{2n} I_{\{\omega_j\}}\right) = \tau^{|A/\sim_\omega|},$$
where $A/\sim$ denotes the set of equivalence classes of $\sim$. Thus we can rewrite the preceding expression as
$$E(\mathrm{Tr}(H_0^{2n})) = \sum_{t_1,\ldots,t_{2n}:\ t_j\ne t_{j+1}}\ \sum_{\sim\in\mathcal{P}(A)} \tau^{|A/\sim|} \sum_{\omega\in\Omega(\sim)} e^{2\pi i\frac{1}{N}\sum_{j=1}^{2n}\omega_j(t_j-t_{j+1})}, \qquad (4.2)$$

where $\sim$ ranges over all equivalence relations. We would like to pause here and consider (4.2). Take $n = 1$, for example. There are only two equivalence relations on $\{1,2\}$ and, therefore, the right-hand side is equal to
$$\sum_{t_1,t_2:\ t_1\ne t_2}\left(\tau \sum_{(\omega_1,\omega_2)\in\mathbb{Z}_N^2:\ \omega_1=\omega_2} e^{2\pi i\frac{\omega_1(t_1-t_2)+\omega_2(t_2-t_1)}{N}} + \tau^2 \sum_{(\omega_1,\omega_2)\in\mathbb{Z}_N^2:\ \omega_1\ne\omega_2} e^{2\pi i\frac{\omega_1(t_1-t_2)+\omega_2(t_2-t_1)}{N}}\right).$$

Our goal is to rewrite the expression inside the brackets so that the exclusion $\omega_1\ne\omega_2$ no longer appears, i.e. we would like to rewrite the sum over $\omega\in\mathbb{Z}_N^2 : \omega_1\ne\omega_2$ in terms of sums over $\omega\in\mathbb{Z}_N^2 : \omega_1=\omega_2$ and over $\omega\in\mathbb{Z}_N^2$. In this special case, this is quite easy, as
$$\sum_{\omega\in\mathbb{Z}_N^2:\ \omega_1\ne\omega_2} = \sum_{\omega\in\mathbb{Z}_N^2}\ -\ \sum_{\omega\in\mathbb{Z}_N^2:\ \omega_1=\omega_2}.$$
The motivation is as follows: removing the exclusion allows us to rewrite sums as products, e.g.
$$\sum_{\omega\in\mathbb{Z}_N^2} e^{2\pi i\frac{\omega_1(t_1-t_2)+\omega_2(t_2-t_1)}{N}} = \sum_{\omega_1} e^{2\pi i\frac{\omega_1(t_1-t_2)}{N}}\ \sum_{\omega_2} e^{2\pi i\frac{\omega_2(t_2-t_1)}{N}};$$


and each factor is equal to either $N$ or $0$ depending on whether $t_1 = t_2$ or not. The next section generalizes these ideas and develops an identity which allows us to rewrite sums over $\Omega(\sim)$ in terms of sums over $\Omega_{\le}(\sim)$.

    4.2 Inclusion-Exclusion formulae

Lemma 4.1 (Inclusion-exclusion principle for equivalence classes) Let $A$ and $G$ be nonempty finite sets. For any equivalence relation $\sim\in\mathcal{P}(A)$, with $\Omega(\sim)$ and $\Omega_{\le}(\sim)$ defined as subsets of $G^{|A|}$ in exact analogy to before, we have
$$\sum_{\omega\in\Omega(\sim)} f(\omega) = \sum_{\sim_1\le\sim} (-1)^{|A/\sim|-|A/\sim_1|} \left(\prod_{A'\in A/\sim_1} (|A'/\sim|-1)!\right) \sum_{\omega\in\Omega_{\le}(\sim_1)} f(\omega). \qquad (4.3)$$

Thus, for instance, if $A = \{1,2,3\}$ and $\sim$ is the equality relation, i.e. $j\sim k$ if and only if $j = k$, this identity is saying that
$$\sum_{\omega_1,\omega_2,\omega_3\in G\ \mathrm{distinct}} = \sum_{\omega_1,\omega_2,\omega_3\in G} - \sum_{\omega_1,\omega_2,\omega_3\in G:\ \omega_1=\omega_2} - \sum_{\omega_1,\omega_2,\omega_3\in G:\ \omega_2=\omega_3} - \sum_{\omega_1,\omega_2,\omega_3\in G:\ \omega_3=\omega_1} + 2\sum_{\omega_1,\omega_2,\omega_3\in G:\ \omega_1=\omega_2=\omega_3},$$
where we have omitted the summands $f(\omega_1,\omega_2,\omega_3)$ for brevity.
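This three-element instance of the identity can be checked by brute force (illustration only):

```python
from itertools import product

G = range(4)
f = lambda a, b, c: (a + 1) * (b + 2) * (c + 3)    # an arbitrary test function

lhs = sum(f(*w) for w in product(G, repeat=3) if len(set(w)) == 3)
rhs = (sum(f(*w) for w in product(G, repeat=3))
       - sum(f(a, a, c) for a, c in product(G, repeat=2))    # w1 = w2
       - sum(f(a, b, b) for a, b in product(G, repeat=2))    # w2 = w3
       - sum(f(a, b, a) for a, b in product(G, repeat=2))    # w3 = w1
       + 2 * sum(f(a, a, a) for a in G))                     # w1 = w2 = w3
print(lhs == rhs)   # True
```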

Proof By passing from $A$ to the quotient space $A/\sim$ if necessary, we may assume that $\sim$ is the equality relation $=$. Now relabeling $A$ as $\{1,\ldots,n\}$, $\sim_1$ as $\sim$, and $A'$ as $A$, it suffices to show that
$$\sum_{\omega\in G^n:\ \omega_1,\ldots,\omega_n\ \mathrm{distinct}} f(\omega) = \sum_{\sim\in\mathcal{P}(\{1,\ldots,n\})} (-1)^{n-|\{1,\ldots,n\}/\sim|} \left(\prod_{A\in\{1,\ldots,n\}/\sim} (|A|-1)!\right) \sum_{\omega\in\Omega_{\le}(\sim)} f(\omega). \qquad (4.4)$$

We prove this by induction on $n$. When $n = 1$ both sides are equal to $\sum_{\omega\in G} f(\omega)$. Now suppose inductively that $n > 1$ and the claim has already been proven for $n-1$. We observe that the left-hand side of (4.4) can be rewritten as
$$\sum_{\tilde\omega\in G^{n-1}:\ \omega_1,\ldots,\omega_{n-1}\ \mathrm{distinct}} \left(\sum_{\omega_n\in G} f(\tilde\omega,\omega_n) - \sum_{j=1}^{n-1} f(\tilde\omega,\omega_j)\right),$$
where $\tilde\omega := (\omega_1,\ldots,\omega_{n-1})$. Applying the inductive hypothesis, this can be written as

$$\sum_{\sim\in\mathcal{P}(\{1,\ldots,n-1\})} (-1)^{n-1-|\{1,\ldots,n-1\}/\sim|} \left(\prod_{A\in\{1,\ldots,n-1\}/\sim} (|A|-1)!\right) \sum_{\tilde\omega\in\Omega_{\le}(\sim)} \left(\sum_{\omega_n\in G} f(\tilde\omega,\omega_n) - \sum_{1\le j\le n-1} f(\tilde\omega,\omega_j)\right). \qquad (4.5)$$


Now we work on the right-hand side of (4.4). If $\sim$ is an equivalence relation on $\{1,\ldots,n\}$, let $\tilde\sim$ be the restriction of $\sim$ to $\{1,\ldots,n-1\}$. Observe that $\sim$ can be formed from $\tilde\sim$ either by adjoining the singleton set $\{n\}$ as a new equivalence class (in which case we write $\sim = \{\tilde\sim,\{n\}\}$), or by choosing a $j\in\{1,\ldots,n-1\}$ and declaring $n$ to be equivalent to $j$ (in which case we write $\sim = \{\tilde\sim,\{n\}\}/(j=n)$). Note that the latter construction can recover the same equivalence relation $\sim$ in multiple ways if the equivalence class $[j]_{\tilde\sim}$ of $j$ in $\tilde\sim$ has size larger than 1; however, we can resolve this by weighting each $j$ by $\frac{1}{|[j]_{\tilde\sim}|}$. Thus we have the identity
$$\sum_{\sim\in\mathcal{P}(\{1,\ldots,n\})} F(\sim) = \sum_{\tilde\sim\in\mathcal{P}(\{1,\ldots,n-1\})} F(\{\tilde\sim,\{n\}\}) + \sum_{\tilde\sim\in\mathcal{P}(\{1,\ldots,n-1\})}\ \sum_{j=1}^{n-1} \frac{1}{|[j]_{\tilde\sim}|}\,F(\{\tilde\sim,\{n\}\}/(j=n))$$
for any complex-valued function $F$ on $\mathcal{P}(\{1,\ldots,n\})$. Applying this to the right-hand side of (4.4), we see that we may rewrite this expression as the sum of

    for any complex-valued function F onP({1, . . . , n}). Applying this to the right-hand sideof (4.4), we see that we may rewrite this expression as the sum of

    P({1,...,n1})

    (1)n(|{1,...,n1}/|+1) A{1,...,n1}/

    (|A| 1)!

    ()f(, n)

    and P({1,...,n1})

    (1)n|{1,...,n1}/|n1j=1

    T(j)

    ()f(, j),

    where we adopt the convention = (1, . . . , n1). But observe that

    T(j) :=

    1

    |[j] | A{1,...,n}/({,{n}}/(j=n))(|A| 1)! = A{1,...,n1}/(|A| 1)!and thus the right-hand side of (4.4) matches (4.5) as desired.

    4.3 Stirling Numbers

    As emphasized earlier, our goal is to use our inclusion-exclusion formula to rewrite the sum(4.2) as a sum over (). In order to do this, it is best to introduce another element ofcombinatorics, which will prove to be very useful.

For any $n, k \ge 0$, we define the Stirling number of the second kind $S(n,k)$ to be the number of equivalence relations on a set of $n$ elements which have exactly $k$ equivalence classes, thus
$$S(n,k) := \#\{\sim\in\mathcal{P}(A) : |A/\sim| = k\}.$$
Thus, for instance, $S(0,0) = S(1,1) = S(2,1) = S(2,2) = 1$, $S(3,2) = 3$, and so forth. We observe the basic recurrence
$$S(n+1,k) = S(n,k-1) + k\,S(n,k) \quad \text{for all } k, n \ge 0. \qquad (4.6)$$


This simply reflects the fact that if $a$ is an element of $A$ and $\sim$ is an equivalence relation on $A$ with $k$ equivalence classes, then either $a$ is not equivalent to any other element of $A$ (in which case $\sim$ restricts to an equivalence relation with $k-1$ classes on $A\setminus\{a\}$), or $a$ is equivalent to one of the $k$ equivalence classes of $A\setminus\{a\}$.
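The recurrence (4.6) translates directly into a short dynamic program (illustration only):

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    """S(n, k) via the recurrence S(n+1, k) = S(n, k-1) + k S(n, k)."""
    if n == 0:
        return 1 if k == 0 else 0
    if k == 0:
        return 0
    return stirling2(n - 1, k - 1) + k * stirling2(n - 1, k)

print(stirling2(3, 2))                           # 3, as noted above
print(sum(stirling2(5, k) for k in range(6)))    # Bell number B_5 = 52
```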

We now need an identity for the Stirling numbers. With $\theta := \tau/(1-\tau)$, one can show by induction from (4.6) that
$$F_n(\tau) := \sum_{k=1}^{n} S(n,k)\,(-1)^{n-k}(k-1)!\,\tau^k = \sum_{k=1}^{\infty}(-1)^{n+k}\,k^{n-1}\,\theta^k. \qquad (4.8)$$

Lemma 4.2 For any $n \ge 1$ and $0 < \tau < 1$, the series $F_n(\tau)$ obeys the bound (4.9) below.

  • 8/10/2019 Exact Recovery

    28/41

Proof Elementary calculus shows that for $x > 0$, the function $g(x) = x^{n-1}\theta^x$ is increasing for $x < x_*$ and decreasing for $x > x_*$, where $x_* := (n-1)/\log\frac{1}{\theta}$. If $\theta \le e^{1-n}$, then $x_* \le 1$, and so the alternating series $F_n(\tau) = \sum_{k=1}^{\infty}(-1)^{n+k}g(k)$ has magnitude at most $g(1) = \theta$. Otherwise the series has magnitude at most
$$g(x_*) = \exp\left((n-1)\left(\log(n-1) - \log\log\tfrac{1}{\theta} - 1\right)\right),$$
and the claim follows.

Roughly speaking, this means that $F_n(\tau)$ behaves like $\theta$ for $n = O(\log[1/\theta])$ and behaves like $(n/\log[1/\theta])^n$ for $n \gg \log[1/\theta]$. In the sequel, it will be convenient to express this bound as
$$|F_n(\tau)| \le G(n),$$
where
$$G(n) = \begin{cases} \theta, & \log\frac{1}{\theta} \ge n-1,\\[2pt] \exp\left((n-1)\left(\log(n-1) - \log\log\frac{1}{\theta} - 1\right)\right), & \log\frac{1}{\theta} < n-1. \end{cases} \qquad (4.9)$$
Note that we voluntarily exchanged the function arguments to reflect the idea that we shall view $G$ as a function of $n$, while $\tau$ will serve as a parameter.

4.4 A Second Formula for the Expected Value of the Trace of $H_0^{2n}$

Let us return to (4.2). The inner sum of (4.2) can be rewritten as
$$\sum_{\sim\in\mathcal{P}(A)} \tau^{|A/\sim|} \sum_{\omega\in\Omega(\sim)} f(\omega)$$
with $f(\omega) := e^{2\pi i\frac{1}{N}\sum_{1\le j\le 2n}\omega_j(t_j-t_{j+1})}$. We prove the following useful identity:

Lemma 4.4
$$\sum_{\sim\in\mathcal{P}(A)} \tau^{|A/\sim|} \sum_{\omega\in\Omega(\sim)} f(\omega) = \sum_{\sim_1\in\mathcal{P}(A)} \left(\sum_{\omega\in\Omega_{\le}(\sim_1)} f(\omega)\right) \prod_{A'\in A/\sim_1} F_{|A'|}(\tau). \qquad (4.10)$$

Proof Applying (4.3) and rearranging, we may rewrite the left-hand side of (4.10) as
$$\sum_{\sim_1\in\mathcal{P}(A)} T(\sim_1) \sum_{\omega\in\Omega_{\le}(\sim_1)} f(\omega),$$
where
$$T(\sim_1) = \sum_{\sim\in\mathcal{P}(A):\ \sim_1\le\sim} \tau^{|A/\sim|}\,(-1)^{|A/\sim|-|A/\sim_1|} \prod_{A'\in A/\sim_1} (|A'/\sim|-1)!.$$

Splitting $A$ into the equivalence classes $A'$ of $A/\sim_1$, we observe that
$$T(\sim_1) = \prod_{A'\in A/\sim_1}\ \sum_{\sim'\in\mathcal{P}(A')} \tau^{|A'/\sim'|}\,(-1)^{|A'/\sim'|-|A'|}\,(|A'/\sim'|-1)!;$$


splitting based on the number of equivalence classes $|A'/\sim'|$, we can write this as
$$\prod_{A'\in A/\sim_1}\ \sum_{k=1}^{|A'|} S(|A'|,k)\,\tau^k\,(-1)^{|A'|-k}\,(k-1)! = \prod_{A'\in A/\sim_1} F_{|A'|}(\tau)$$
by (4.8). Gathering all this together, we have proven the identity (4.10).

We specialize (4.10) to the function $f(\omega) := e^{2\pi i\frac{1}{N}\sum_{1\le j\le 2n}\omega_j(t_j-t_{j+1})}$ and obtain
$$E[\mathrm{Tr}(H_0^{2n})] = \sum_{\sim\in\mathcal{P}(A)}\ \sum_{t_1,\ldots,t_{2n}\in T:\ t_j\ne t_{j+1}} \left(\sum_{\omega\in\Omega_{\le}(\sim)} e^{2\pi i\frac{1}{N}\sum_{j=1}^{2n}\omega_j(t_j-t_{j+1})}\right) \prod_{A'\in A/\sim} F_{|A'|}(\tau). \qquad (4.11)$$
We now compute
$$I(\sim) = \sum_{\omega\in\Omega_{\le}(\sim)} e^{2\pi i\frac{1}{N}\sum_{1\le j\le 2n}\omega_j(t_j-t_{j+1})}.$$

For every equivalence class $A'\in A/\sim$, let $t_{A'}$ denote the expression $t_{A'} := \sum_{a\in A'}(t_a - t_{a+1})$, and let $\omega_{A'}$ denote the expression $\omega_{A'} := \omega_a$ for any $a\in A'$ (these are all equal since $\omega\in\Omega_{\le}(\sim)$). Then
$$I(\sim) = \sum_{(\omega_{A'})_{A'\in A/\sim}\in\mathbb{Z}_N^{|A/\sim|}} e^{2\pi i\frac{1}{N}\sum_{A'\in A/\sim}\omega_{A'}t_{A'}} = \prod_{A'\in A/\sim}\ \sum_{\omega_{A'}\in\mathbb{Z}_N} e^{2\pi i\frac{1}{N}\omega_{A'}t_{A'}}.$$
We now see the importance of (4.11), as each inner sum equals $|\mathbb{Z}_N| = N$ when $t_{A'} = 0$ and vanishes otherwise. Hence, we have proved the following:

Lemma 4.5 For every equivalence class $A'\in A/\sim$, let $t_{A'} := \sum_{a\in A'}(t_a-t_{a+1})$. Then
$$E[\mathrm{Tr}(H_0^{2n})] = \sum_{\sim\in\mathcal{P}(A)}\ \sum_{\substack{t\in T^{2n}:\ t_j\ne t_{j+1}\\ \text{and } t_{A'}=0\ \text{for all } A'\in A/\sim}} N^{|A/\sim|} \prod_{A'\in A/\sim} F_{|A'|}(\tau). \qquad (4.12)$$

This formula will serve as a basis for all of our estimates. In particular, because of the constraint $t_j\ne t_{j+1}$, we see that the summand vanishes if $A/\sim$ contains any singleton equivalence classes. This means, in passing, that the only equivalence relations which contribute to the sum obey $|A/\sim| \le n$.

    4.5 Proof of Theorem 3.3

Let $\sim$ be an equivalence relation which does not contain any singleton class. Then the following inequality holds:
$$\#\{t\in T^{2n} : t_{A'} = 0\ \text{for all } A'\in A/\sim\} \le |T|^{2n-|A/\sim|+1}.$$
To see why this is true, observe that as linear combinations of $t_1,\ldots,t_{2n}$, the expressions $t_j - t_{j+1}$ are all linearly independent of each other except for the constraint $\sum_{j=1}^{2n}(t_j - t_{j+1}) = 0$. Thus we have $|A/\sim| - 1$ independent constraints in the above sum, and so the number of $t$'s obeying the constraints is bounded by $|T|^{2n-|A/\sim|+1}$.


It then follows from (4.12) and from the bound (4.9) on the individual terms $F_{|A'|}(\tau)$ that
$$E(\mathrm{Tr}(H_0^{2n})) \le \sum_{k=1}^{n} N^k\,|T|^{2n-k+1} \sum_{\sim\in\mathcal{P}(A,k)}\ \prod_{A'\in A/\sim} G(|A'|), \qquad (4.13)$$
where $\mathcal{P}(A,k)$ denotes all the equivalence relations on $A$ with $k$ equivalence classes and with no singletons. In other words, the expected value of the trace obeys
$$E(\mathrm{Tr}(H_0^{2n})) \le \sum_{k=1}^{n} N^k\,|T|^{2n-k+1}\,Q(n,k),$$
where
$$Q(n,k) := \sum_{\sim\in\mathcal{P}(A,k)}\ \prod_{A'\in A/\sim} G(|A'|). \qquad (4.14)$$

The idea is to estimate the quantity $Q(n,k)$ by obtaining a recursive inequality. Before we do this, however, observe that for $\tau \le 1/(1+e)$,
$$G(n+1) \le n\,G(n)$$
for all $n \ge 1$. To see this, we use the fact that $\log G$ is convex, and hence
$$\log G(n+1) \le \log G(n) + \frac{d}{dn}\log G(n+1).$$
The claim follows by a routine computation which shows that $\frac{d}{dn}\log G(n+1) \le \log n$ whenever $\log\log\frac{1}{\theta} \ge 0$. We now claim the recursive inequality

$$Q(n,k) \le (n-1)\,Q(n-1,k) + (n-1)\,G(2)\,Q(n-2,k-1), \qquad (4.15)$$
which is valid for all $n \ge 2$, $k \ge 1$. To see why this holds, suppose that $a$ is an element of $A$ and $\sim$ is in $\mathcal{P}(n,k)$. Then either (1) $a$ belongs to an equivalence class that has only one other element of $A$ (for which there are $n-1$ choices), and on taking that class out one obtains the $(n-1)G(2)Q(n-2,k-1)$ term, or (2) $a$ belongs to an equivalence class with more than two elements, in which case removing $a$ from $A$ gives rise to an equivalence relation in $\mathcal{P}(A\setminus\{a\},k)$. To control this contribution, let $\sim'$ be an element of $\mathcal{P}(A\setminus\{a\},k)$ and let $A_1,\ldots,A_k$ be the corresponding equivalence classes. The element $a$ is attached to one of the classes $A_i$, and causes $G(|A_i|)$ to increase by at most a factor of $|A_i|$. Therefore, this term's contribution is less than
$$\sum_{\sim'\in\mathcal{P}(A\setminus\{a\},k)}\ \sum_{i=1}^{k} |A_i| \prod_{A'\in A/\sim'} G(|A'|).$$
But clearly $\sum_{i=1}^{k}|A_i| = n-1$, and so this expression simplifies to $(n-1)Q(n-1,k)$. From the recursive inequality, one obtains by induction that
$$Q(n,k) \le G(2)^k\,(2n)^{n-k}. \qquad (4.16)$$


The claim is indeed valid for all $Q(1,k)$'s and $Q(2,k)$'s. Then, if one assumes that the claim is established for all pairs $(m,k)$ with $m < n$, the inequality (4.15) shows the property for $m = n$. We omit the details.

The bound (4.16) then automatically yields a bound on the trace,
$$E(\mathrm{Tr}(H_0^{2n})) \le \sum_{k=1}^{n} N^k\,|T|^{2n-k+1}\,G(2)^k\,(4n)^{2n-k}.$$

With $\gamma = N G(2)/(4n|T|)$, the right-hand side can be rewritten as $|T|^{2n+1}(4n)^{2n}\sum_{k=1}^{n}\gamma^k$ and, since $\sum_{k\le n}\gamma^k \le n\max(\gamma,\gamma^n)$, we established that
$$E(\mathrm{Tr}(H_0^{2n})) \le \begin{cases} n\,N^n\,|T|^{n+1}\,G(2)^n\,(4n)^n, & n \le \frac{N G(2)}{4|T|},\\[2pt] n\,N\,|T|^{2n}\,G(2)\,(4n)^{2n-1}, & \text{otherwise}. \end{cases} \qquad (4.17)$$
We recall that $G(2) = \tau/(1-\tau)$, and thus (4.17) is nearly the content of Theorem 3.3, except for the loss of the factor $e^{-n}$ in the case where $n$ is not too large.

To recover this additional factor, we begin by observing that (4.15) gives
$$Q(2k,k) \le (2k-1)\,G(2)\,Q(2(k-1),k-1),$$
since $Q(n,k) = 0$ for $n < 2k$. It follows that
$$Q(2k,k) \le (2k-1)(2k-3)\cdots 3\cdot G(2)^k = \frac{(2k-1)!}{2^{k-1}(k-1)!}\,G(2)^k,$$
and a simple induction shows that
$$Q(n,k) \le (n-1)(n-2)\cdots 2k\cdot 2^{n-2k}\,Q(2k,k) \le \frac{(n-1)!}{(k-1)!}\,2^{n-2k+1}\,G(2)^k, \qquad (4.18)$$
which is slightly better than (4.16). In short,
$$E(\mathrm{Tr}(H_0^{2n})) \le \sum_{k=1}^{n} B(2n,k), \qquad B(2n,k) := \frac{(2n-1)!}{(k-1)!}\,N^k\,|T|^{2n-k+1}\,2^{2n-2k+1}\,G(2)^k.$$

One computes
$$\frac{B(2n,k)}{B(2n,k-1)} = \frac{N G(2)}{4|T|(k-1)}$$
and, therefore, for a fixed $n$ obeying $n \le N G(2)/[4|T|]$, $B(2n,k)$ is nondecreasing with $k$. Whence,
$$E(\mathrm{Tr}(H_0^{2n})) \le n\,B(2n,n) = n\,\frac{(2n)!}{n!}\,G(2)^n\,|T|^{n+1}\,N^n. \qquad (4.19)$$
The ratio $(2n)!/n!$ can be simplified using the classical Stirling approximation
$$\sqrt{2\pi}\,n^{n+1/2}\,e^{-n+1/(12n+1)} < n! < \sqrt{2\pi}\,n^{n+1/2}\,e^{-n+1/(12n)}.$$