
Suprema of Chaos Processes and the Restricted Isometry Property

Felix Krahmer∗, Shahar Mendelson†, and Holger Rauhut‡

June 30, 2012; revised June 20, 2013

∗Institute for Numerical and Applied Mathematics, University of Göttingen, Lotzestraße 16-18, 37083 Göttingen, Germany, [email protected]
†Department of Mathematics, Technion, Haifa, 32000, Israel, [email protected]
‡Hausdorff Center for Mathematics and Institute for Numerical Simulation, University of Bonn, Endenicher Allee 60, 53115 Bonn, Germany, [email protected]

Abstract

We present a new bound for suprema of a special type of chaos processes indexed by a set of matrices, which is based on a chaining method. As applications we show significantly improved estimates for the restricted isometry constants of partial random circulant matrices and time-frequency structured random matrices. In both cases the required condition on the number m of rows in terms of the sparsity s and the vector length n is $m \gtrsim s \log^2 s \log^2 n$.

Key words. Compressive sensing, restricted isometry property, structured random matrices, chaos processes, $\gamma_2$-functionals, generic chaining, partial random circulant matrices, random Gabor synthesis matrices.

1 Introduction and Main Results

1.1 Compressive Sensing

Compressive sensing [7, 13] is a method aimed at recovering sparse vectors from highly incomplete information using efficient algorithms, see [18, 37] for expository articles. This discovery has recently triggered various applications in signal and image processing.

To formulate the procedure, a vector $x \in \mathbb{C}^n$ is called s-sparse if $\|x\|_0 := |\{\ell : x_\ell \neq 0\}| \leq s$. Given a matrix $\Phi \in \mathbb{C}^{m\times n}$, called the measurement matrix, the task is to reconstruct x from the linear measurements
\[ y = \Phi x. \]
We are interested in the case $m \ll n$, so that this system is under-determined, and thus, without additional information it is impossible to reconstruct x. On the other hand, if it is known a priori that x is s-sparse then the situation changes. Although the naive approach for reconstruction, namely, $\ell_0$-minimization,
\[ \min \|z\|_0 \quad\text{subject to}\quad \Phi z = y \]



is NP-hard in general, there are several tractable alternatives. A widely applied method is $\ell_1$-minimization [11, 13, 7]

\[ \min_z \|z\|_1 \quad\text{subject to}\quad \Phi z = y, \]
(where $\|z\|_p$ denotes the usual $\ell_p$-norm) which is a convex optimization problem and may be solved efficiently.
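The following minimal sketch (not part of the paper) illustrates how the $\ell_1$-minimization problem can be solved in practice by recasting it as a linear program; the Gaussian measurement matrix, the parameter values, and the use of numpy/scipy are illustrative assumptions.

```python
# Minimal sketch (illustrative): l1-minimization (basis pursuit) as a linear program.
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(0)
n, m, s = 128, 60, 5

# s-sparse ground truth and measurements y = Phi x (Gaussian Phi only for illustration)
x = np.zeros(n)
support = rng.choice(n, size=s, replace=False)
x[support] = rng.standard_normal(s)
Phi = rng.standard_normal((m, n)) / np.sqrt(m)
y = Phi @ x

# min ||z||_1 s.t. Phi z = y, via the split z = u - v with u, v >= 0:
# minimize sum(u) + sum(v) subject to [Phi, -Phi][u; v] = y.
c = np.ones(2 * n)
A_eq = np.hstack([Phi, -Phi])
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
z = res.x[:n] - res.x[n:]

print("recovery error:", np.linalg.norm(z - x))  # small when m is large enough
```

With m comfortably above the thresholds discussed below, the minimizer typically coincides with the original sparse vector up to numerical precision.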

The restricted isometry property streamlines the analysis of recovery algorithms. For a matrix $\Phi \in \mathbb{C}^{m\times n}$ and $s < n$, the restricted isometry constant $\delta_s$ is defined as the smallest number such that
\[ (1-\delta_s)\|x\|_2^2 \leq \|\Phi x\|_2^2 \leq (1+\delta_s)\|x\|_2^2 \quad\text{for all } s\text{-sparse } x. \]
One may show that under conditions of the form $\delta_{\kappa s} \leq \delta_*$ for some $\delta_* < 1$ and some appropriate small integer $\kappa$, a variety of recovery algorithms reconstruct every s-sparse x from $y = \Phi x$. Among these are $\ell_1$-minimization as mentioned above [8, 19, 31], orthogonal matching pursuit [48], CoSaMP [45, 19], iterative hard thresholding [4] and hard thresholding pursuit [20].
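Since the restricted isometry constant is a maximum over all s-element supports, it can be computed by brute force in small dimensions. The sketch below (illustrative only; the Bernoulli matrix and parameter values are assumptions) does exactly that, using that $\delta_s = \max_{|S|=s}\|\Phi_S^*\Phi_S - \mathrm{Id}\|_{2\to 2}$.

```python
# Minimal sketch (illustrative): brute-force computation of delta_s for a small Bernoulli matrix.
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
m, n, s = 40, 16, 3
Phi = rng.choice([-1.0, 1.0], size=(m, n)) / np.sqrt(m)

delta_s = 0.0
for S in combinations(range(n), s):
    idx = list(S)
    G = Phi[:, idx].T @ Phi[:, idx]               # s x s Gram matrix of the submatrix
    eig = np.linalg.eigvalsh(G)
    delta_s = max(delta_s, abs(eig - 1.0).max())  # deviation of ||Phi_S x||^2 from ||x||^2

print("delta_s =", delta_s)
```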

Remarkably, all optimal measurement matrices known so far are random matrices. For example, a Bernoulli random matrix $\Phi \in \mathbb{R}^{m\times n}$ has entries $\Phi_{jk} = \varepsilon_{jk}/\sqrt{m}$, where the $\varepsilon_{jk}$ are independent, symmetric $\{-1,1\}$-valued random variables. Its restricted isometry constant satisfies $\delta_s \leq \delta$ with probability at least $1-\eta$ provided that
\[ m \geq C\delta^{-2}\big(s\ln(en/s) + \ln(\eta^{-1})\big), \]

where C is an absolute constant [9, 30, 3], see also Appendix A.3.
In practice, structure is an additional requirement on the measurement matrix $\Phi$. Indeed, certain applications impose constraints on the matrix, and recovery algorithms can be accelerated when fast matrix-vector multiplication routines are available for $\Phi$. Unfortunately, a Bernoulli random matrix does not possess any structure. This motivates the study of random matrices with more structure. Also, structured random matrix constructions usually involve a reduced degree of randomness. For example, partial random Fourier matrices $\Phi \in \mathbb{C}^{m\times n}$ arise as random row submatrices of the discrete Fourier matrix, and their restricted isometry constants satisfy $\delta_s \leq \delta$ with high probability provided that
\[ m \geq C\delta^{-2}s\log^3 s\log n, \]
see [9, 41].
This article provides a similar estimate for two further types of structured random matrices, namely partial random circulant matrices and time-frequency structured random matrices. The key proof ingredients will be new estimates for suprema of chaos processes of a certain type.

1.2 Partial random circulant matrices

Circulant matrices are connected to circular convolution, defined for two vectors $x, z \in \mathbb{C}^n$ by
\[ (z * x)_j := \sum_{k=1}^n z_{j\ominus k}\,x_k, \qquad j = 1,\dots,n, \]
where $j\ominus k = j - k \mod n$ denotes cyclic subtraction. The circulant matrix $H = H_z \in \mathbb{C}^{n\times n}$ associated with z is given by $Hx = z * x$ and has entries $H_{jk} = z_{j\ominus k}$.

We are interested in sparse recovery from subsampled convolutions with a random vector. Formally, let $\Omega \subset \{1,\dots,n\}$ be an arbitrary (fixed) set of cardinality m, and denote by $R_\Omega : \mathbb{C}^n \to \mathbb{C}^m$ the operator that restricts a vector $x \in \mathbb{C}^n$ to its entries in $\Omega$. Let $\varepsilon = (\varepsilon_i)_{i=1}^n$ be a Rademacher vector of length n, i.e., a random vector with independent entries distributed according to $\mathbb{P}(\varepsilon_i = \pm 1) = \frac{1}{2}$. Then the associated partial random circulant matrix is given by $\Phi = m^{-1/2}R_\Omega H_\varepsilon \in \mathbb{R}^{m\times n}$ and acts on vectors $x \in \mathbb{C}^n$ via
\[ \Phi x = \frac{1}{\sqrt{m}}R_\Omega(\varepsilon * x). \]
In other words, $\Phi$ is a circulant matrix generated by a Rademacher vector, where the rows outside $\Omega$ are removed.
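As a concrete illustration of this construction (a sketch under assumptions, not the paper's code), the following snippet realizes $\Phi$ both through the FFT-based circular convolution and through the explicit circulant matrix $H_\varepsilon$, using 0-based indexing and an arbitrary structured sampling set $\Omega$.

```python
# Minimal sketch (illustrative): partial random circulant matrix via the FFT,
# checked against the explicit circulant matrix H with H_{jk} = eps_{(j-k) mod n}.
import numpy as np

rng = np.random.default_rng(2)
n, m = 64, 16
Omega = np.arange(0, n, n // m)[:m]           # structured sampling set {0, 4, 8, ...}
eps = rng.choice([-1.0, 1.0], size=n)         # Rademacher generator

def Phi_apply(x):
    """Phi x = m^{-1/2} R_Omega (eps * x), computed with the convolution theorem."""
    conv = np.fft.ifft(np.fft.fft(eps) * np.fft.fft(x)).real
    return conv[Omega] / np.sqrt(m)

# Explicit circulant matrix for comparison
j, k = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
H = eps[(j - k) % n]
Phi = H[Omega, :] / np.sqrt(m)

x = rng.standard_normal(n)
print(np.allclose(Phi_apply(x), Phi @ x))     # True: both realize the same matrix
```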

Our first main result establishes the restricted isometry property of $\Phi$ in a near-optimal parameter regime:

Theorem 1.1 Let $\Phi \in \mathbb{R}^{m\times n}$ be a draw of a partial random circulant matrix generated by a Rademacher vector $\varepsilon$. If
\[ m \geq c\delta^{-2}s(\log^2 s)(\log^2 n), \tag{1.1} \]
then with probability at least $1 - n^{-(\log n)(\log^2 s)}$, the restricted isometry constant of $\Phi$ satisfies $\delta_s \leq \delta$. The constant c > 0 is universal.

In Section 4, we will prove a more general version of this theorem, just requiring that the generating random variable is mean-zero, variance one, and subgaussian. These results improve the best previously known estimates for a partial random circulant matrix [38], namely that $m \geq C_\delta(s\log n)^{3/2}$ is a sufficient condition for achieving $\delta_s \leq \delta$ with high probability (see also [22] for an earlier work on this problem). In particular, Theorem 1.1 removes the exponent 3/2 of the sparsity s, which was already conjectured in [38] to be an artefact of the proof.

A related non-uniform recovery result is contained in [36, 37] where one considers the probability that a fixed s-sparse vector is reconstructed via $\ell_1$-minimization using a draw of a partial random circulant matrix. The condition derived there is $m \geq Cs\log^2 n$, which is slightly better than (1.1). However, the statement of Theorem 1.1 is considerably stronger because the restricted isometry property implies uniform and stable recovery of all s-sparse vectors via $\ell_1$-minimization and other recovery methods for a single matrix $\Phi$.

Note that in [39], the restricted isometry property has been established for partial random circulant matrices with random sampling sets and random generators under the condition $m \geq Cs\log^6 n$. In contrast, our result holds for an arbitrary fixed selection of a set $\Omega \subset \{1,\dots,n\}$, which is important in applications since in many practical problems, it is natural or desired to consider structured sampling sets such as $\Omega = \{L, 2L, 3L, \dots, mL\}$ for some $L \in \mathbb{N}$; these sets are clearly far from being random.

Potential applications of compressive sensing with subsampled random convolutions include system identification, radar and cameras with coded aperture. We refer to [22, 39, 38] for a discussion of these applications.

Combining our result with the work [26] on the relation between the restricted isometry property and the Johnson-Lindenstrauss lemma we also obtain an improved estimate for Johnson-Lindenstrauss embeddings arising from partial random circulant matrices, see also [24, 47] for earlier work in this direction.

Theorem 1.2 Fix $\eta, \delta \in (0,1)$, and consider a finite set $E \subset \mathbb{R}^n$ of cardinality $|E| = p$. Choose
\[ m \geq C_1\delta^{-2}\log(C_2 p)\big(\log\log(C_2 p)\big)^2(\log n)^2, \]
where the constants $C_1, C_2$ depend only on $\eta$. Let $\Phi \in \mathbb{C}^{m\times n}$ be a partial circulant matrix generated by a Rademacher vector $\varepsilon$. Furthermore, let $\varepsilon' \in \mathbb{R}^n$ be a Rademacher vector independent of $\varepsilon$ and set $D_{\varepsilon'}$ to be the diagonal matrix with diagonal $\varepsilon'$. Then with probability exceeding $1-\eta$, for every $x \in E$,
\[ (1-\delta)\|x\|_2^2 \leq \|\Phi D_{\varepsilon'}x\|_2^2 \leq (1+\delta)\|x\|_2^2. \]


1.3 Time-frequency structured random matrices

The translation and modulation operators on $\mathbb{C}^m$ are defined by $(Th)_j = h_{j\ominus 1}$ and $(Mh)_j = e^{2\pi i j/m}h_j = \omega^j h_j$, where $\omega = e^{2\pi i/m}$ and $\ominus$ again denotes cyclic subtraction, this time modulo m. Observe that
\[ (T^k h)_j = h_{j\ominus k} \quad\text{and}\quad (M^\ell h)_j = e^{2\pi i\ell j/m}h_j = \omega^{\ell j}h_j. \tag{1.2} \]
The time-frequency shifts are given by
\[ \pi(\lambda) = M^\ell T^k, \qquad \lambda = (k,\ell) \in \mathbb{Z}_m^2 = \{0,\dots,m-1\}^2. \]

For $h \in \mathbb{C}^m\setminus\{0\}$ the system $\{\pi(\lambda)h : \lambda \in \mathbb{Z}_m^2\}$ is called a Gabor system [16, 21, 25], and the $m\times m^2$ matrix $\Psi_h$ whose columns are the vectors $\pi(\lambda)h$ for $\lambda \in \mathbb{Z}_m^2$ is called a Gabor synthesis matrix,
\[ \Psi_h = \big[\pi(\lambda)h\big]_{\lambda\in\mathbb{Z}_m^2} \in \mathbb{C}^{m\times m^2}. \]
Note that here the signal length n is coupled to the embedding dimension m via $n = m^2$ (so that $\log n = 2\log m$ below).
Our second main result establishes the restricted isometry property for Gabor synthesis matrices generated by a random vector. The following formulation again focuses on normalized Rademacher vectors, postponing a more general version of our results until Section 5.

Theorem 1.3 Let $\varepsilon$ be a Rademacher vector and consider the Gabor synthesis matrix $\Psi_h \in \mathbb{C}^{m\times m^2}$ generated by $h = \frac{1}{\sqrt{m}}\varepsilon$. If
\[ m \geq c\delta^{-2}s(\log s)^2(\log m)^2, \tag{1.3} \]
then with probability at least $1 - m^{-(\log m)\cdot(\log^2 s)}$, the restricted isometry constant of $\Psi_h$ satisfies $\delta_s \leq \delta$.

Again, Theorem 1.3 improves the best previously known estimate from [34], in which the sufficient condition $m \geq Cs^{3/2}\log^3 m$ was derived. In particular, it implies the first uniform sparse recovery result with a linear scaling of the number of samples m in the sparsity s (up to log-factors).

A non-uniform recovery result for Gabor synthesis matrices with Steinhaus generator (see Section 2 for the definition) appears in [32], where it was shown that a fixed s-sparse vector is recovered from its image under a random draw of the $m\times m^2$ Gabor synthesis matrix via $\ell_1$-minimization with high probability provided that $m \geq Cs\log m$. Again, the conclusion of Theorem 1.3 is stronger than this previous result in the sense that it implies uniform and stable s-sparse recovery. Further related material may be found in [33, 2].

Applications of random Gabor synthesis matrices include operator identification (channel estimation in wireless communications), radar and sonar [2, 23, 33].
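To make the time-frequency structure of this section concrete, here is a minimal numerical sketch (illustrative assumptions: small m, numpy, 0-based indexing) that builds the shifts $\pi(\lambda) = M^\ell T^k$, assembles $\Psi_h$ for a normalized Rademacher generator, and checks that $\{m^{-1/2}\pi(\lambda)\}$ is orthonormal with respect to the Frobenius inner product, a fact used again in Section 5.

```python
# Minimal sketch (illustrative): Gabor synthesis matrix from cyclic translations and modulations.
import numpy as np

m = 6
T = np.roll(np.eye(m), 1, axis=0)                   # (T h)_j = h_{j-1 mod m}
M = np.diag(np.exp(2j * np.pi * np.arange(m) / m))  # (M h)_j = omega^j h_j

shifts = [np.linalg.matrix_power(M, l) @ np.linalg.matrix_power(T, k)
          for k in range(m) for l in range(m)]       # pi(lambda) = M^l T^k

rng = np.random.default_rng(3)
h = rng.choice([-1.0, 1.0], size=m) / np.sqrt(m)     # normalized Rademacher generator
Psi = np.column_stack([P @ h for P in shifts])       # m x m^2 Gabor synthesis matrix

# Frobenius orthonormality of the normalized time-frequency shifts
G = np.array([[np.trace(P.conj().T @ Q) / m for Q in shifts] for P in shifts])
print(np.allclose(G, np.eye(m * m)))                 # True
print(Psi.shape)                                     # (6, 36)
```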

1.4 Suprema of chaos processes

Both for partial random circulant matrices and for time-frequency structured random matrices generated by Rademacher vectors, the restricted isometry constants $\delta_s$ can be written as a random variable X of the form
\[ X = \sup_{A\in\mathcal{A}}\big|\|A\varepsilon\|_2^2 - \mathbb{E}\|A\varepsilon\|_2^2\big|, \tag{1.4} \]
where $\mathcal{A}$ is a set of matrices and $\varepsilon$ is a Rademacher vector. Due to the identity (1.7) below, X is the supremum of a chaos process.
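As a quick numerical sanity check of this identity (spelled out in (1.7) below), the following sketch compares the two sides for a real matrix A and a Rademacher vector; the random test data are purely illustrative.

```python
# Minimal sketch (illustrative): ||A eps||^2 - E||A eps||^2 equals the off-diagonal chaos sum.
import numpy as np

rng = np.random.default_rng(4)
m, n = 7, 10
A = rng.standard_normal((m, n))
eps = rng.choice([-1.0, 1.0], size=n)

lhs = np.linalg.norm(A @ eps) ** 2 - np.trace(A.T @ A)  # E||A eps||^2 = ||A||_F^2
B = A.T @ A
rhs = eps @ B @ eps - np.trace(B)                       # sum over j != k of eps_j eps_k B_{jk}
print(np.isclose(lhs, rhs))                             # True
```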


Our third main result, the main ingredient of the proofs of Theorems 1.1 and 1.3 but also of independent interest, provides expectation and deviation bounds for random variables X of this form in terms of two types of complexity parameters of the set of matrices $\mathcal{A}$. The first one, denoted by $d_F(\mathcal{A})$ and $d_{2\to 2}(\mathcal{A})$, is the radius of $\mathcal{A}$ in the Frobenius norm $\|A\|_F = \sqrt{\mathrm{tr}(A^*A)}$ and the operator norm $\|A\|_{2\to 2} = \sup_{\|x\|_2\leq 1}\|Ax\|_2$, respectively. That is, $d_F(\mathcal{A}) = \sup_{A\in\mathcal{A}}\|A\|_F$ and $d_{2\to 2}(\mathcal{A}) = \sup_{A\in\mathcal{A}}\|A\|_{2\to 2}$. For the second one, Talagrand's functional $\gamma_2(\mathcal{A},\|\cdot\|_{2\to 2})$, we refer to Definition 2.1 below for a precise description. With these notions, our result reads as follows.

Theorem 1.4 Let $\mathcal{A} \subset \mathbb{C}^{m\times n}$ be a symmetric set of matrices, $\mathcal{A} = -\mathcal{A}$. Let $\varepsilon$ be a Rademacher vector of length n. Then
\[ \mathbb{E}\sup_{A\in\mathcal{A}}\big|\|A\varepsilon\|_2^2 - \mathbb{E}\|A\varepsilon\|_2^2\big| \leq C_1\big(d_F(\mathcal{A})\gamma_2(\mathcal{A},\|\cdot\|_{2\to 2}) + \gamma_2(\mathcal{A},\|\cdot\|_{2\to 2})^2\big) =: C_1 E. \tag{1.5} \]
Furthermore, for t > 0,
\[ \mathbb{P}\Big(\sup_{A\in\mathcal{A}}\big|\|A\varepsilon\|_2^2 - \mathbb{E}\|A\varepsilon\|_2^2\big| \geq C_2 E + t\Big) \leq 2\exp\Big(-C_3\min\Big\{\frac{t^2}{V^2},\frac{t}{U}\Big\}\Big), \tag{1.6} \]
where
\[ V = d_{2\to 2}(\mathcal{A})\big(\gamma_2(\mathcal{A},\|\cdot\|_{2\to 2}) + d_F(\mathcal{A})\big) \quad\text{and}\quad U = d_{2\to 2}^2(\mathcal{A}). \]
The constants $C_1, C_2, C_3 > 0$ are universal.

The symmetry assumption $\mathcal{A} = -\mathcal{A}$ was made for the sake of simplicity. The more general Theorem 3.1 below does not use this assumption but requires an additional term on the right hand side of the estimate. Furthermore, Theorem 3.1 will actually be stated under more general conditions on the generating random vector.

Let us relate our new bound to previous estimates. By expanding the $\ell_2$-norms we can rewrite X in (1.4) as
\[ X = \sup_{A\in\mathcal{A}}\Big|\sum_{j\neq k}\varepsilon_j\varepsilon_k(A^*A)_{j,k}\Big|, \tag{1.7} \]
which is a homogeneous chaos process of order 2 indexed by the positive semidefinite matrices $A^*A$. Talagrand [44] considers general homogeneous chaos processes of the form
\[ Y = \sup_{B\in\mathcal{B}}\Big|\sum_{j\neq k}\varepsilon_j\varepsilon_k B_{j,k}\Big|, \]
where $\mathcal{B} \subset \mathbb{C}^{n\times n}$ is a set of (not necessarily positive semidefinite) matrices. He derives the bound
\[ \mathbb{E}Y \leq C_1\gamma_2(\mathcal{B},\|\cdot\|_F) + C_2\gamma_1(\mathcal{B},\|\cdot\|_{2\to 2}) \tag{1.8} \]
(see Section 2 for the definition of the $\gamma_\alpha$-functional). This estimate was an essential component in the proofs of the previous bounds for the restricted isometry constants of partial random circulant matrices [38] and of random Gabor synthesis matrices [34]. In fact, the appearance of the $\gamma_1$-functional leads to the non-optimal exponent 3/2 in the sparsity s in the estimate of the required number m of samples. In contrast, as our bound for the chaos at hand does not involve the $\gamma_1$-functional but only the $\gamma_2$-functional, this issue does not arise here.

Remark. The benchmark problems of estimating the singular values and the restricted isometry constant of a Bernoulli matrix (with independent ±1 entries) can also be recast as a supremum of chaos processes of the form (1.4). The bounds resulting from Theorem 1.4 are then optimal up to a constant factor. We refer to Appendix A.3 for this derivation. Again, we are not aware of a way to deduce such bounds from (1.8).


Acknowledgements

H.R. and F.K. acknowledge support by the Hausdorff Center for Mathematics. H.R. is funded by the European Research Council through the Starting Grant StG 258926. Parts of this research were developed during a stay of H.R. and F.K. at the Institute for Mathematics and Its Applications, University of Minnesota. Both are grateful for the support and the stimulating research environment. S.M. acknowledges the support of the Centre for Mathematics and its Applications, The Australian National University, Canberra, Australia. Additional support to S.M. was given by an Australian Research Council Discovery grant DP0986563, the European Community's Seventh Framework Programme (FP7/2007-2013) under ERC grant agreement 203134, and the Israel Science Foundation grant 900/10.

2 Preliminaries

2.1 Chaining

The following definition is due to Talagrand [44] and forms the core of the generic chaining methodology.

Definition 2.1 For a metric space (T, d), an admissible sequence of T is a collection of subsets of T, $\{T_r : r \geq 0\}$, such that for every $r \geq 1$, $|T_r| \leq 2^{2^r}$ and $|T_0| = 1$. For $\beta \geq 1$, define the $\gamma_\beta$ functional by
\[ \gamma_\beta(T, d) = \inf\sup_{t\in T}\sum_{r=0}^\infty 2^{r/\beta}d(t, T_r), \]
where the infimum is taken with respect to all admissible sequences of T.

Recall that for a metric space (T, d) and u > 0, the covering number N(T, d, u) is the minimal number of open balls of radius u in (T, d) needed to cover T. The $\gamma_\alpha$-functionals can be bounded in terms of such covering numbers by the well-known Dudley integral (see, e.g., [44]). A more specific formulation for the $\gamma_2$-functional of a set of matrices $\mathcal{A}$ endowed with the operator norm, the scenario which we will focus on in this article, is
\[ \gamma_2(\mathcal{A},\|\cdot\|_{2\to 2}) \leq c\int_0^{d_{2\to 2}(\mathcal{A})}\sqrt{\log N(\mathcal{A},\|\cdot\|_{2\to 2},u)}\,du. \tag{2.1} \]
This type of entropy integral was introduced by Dudley [15] to bound the supremum of Gaussian processes, and was extended by Pisier [35] as a way of bounding processes that satisfy different decay properties.

When considered for a set $T \subset L_2$, $\gamma_2$ has close connections with properties of the canonical Gaussian process indexed by T; we refer the reader to [14, 44] for detailed expositions on these connections. One can show that under mild measurability assumptions, if $\{G_t : t \in T\}$ is a centered Gaussian process indexed by a set T, then
\[ c_1\gamma_2(T, d) \leq \mathbb{E}\sup_{t\in T}G_t \leq c_2\gamma_2(T, d), \tag{2.2} \]
where $c_1$ and $c_2$ are absolute constants, and for every $s, t \in T$, $d^2(s,t) = \mathbb{E}|G_s - G_t|^2$. The upper bound is due to Fernique [17] and the lower bound is Talagrand's majorizing measures theorem [42, 44].


2.2 Subgaussian random vectors

In this section, we will discuss different classes of random vectors that are needed in the formulation of the main results in a more general framework. We refer to [46] for further background material. In the following definition, $S^{n-1}$ denotes the unit sphere in $\mathbb{R}^n$ (resp. in $\mathbb{C}^n$).

Definition 2.2 A mean-zero random vector X on $\mathbb{C}^n$ is called isotropic if for every $\theta \in S^{n-1}$, $\mathbb{E}|\langle X,\theta\rangle|^2 = 1$. A random vector X is called L-subgaussian if it is isotropic and $\mathbb{P}(|\langle X,\theta\rangle| \geq t) \leq 2\exp(-t^2/(2L^2))$ for every $\theta \in S^{n-1}$ and any t > 0.

It is well known that, up to an absolute constant, the tail estimates in the definition of a subgaussian random vector are equivalent to the moment characterization
\[ \sup_{\theta\in S^{n-1}}\big(\mathbb{E}|\langle X,\theta\rangle|^p\big)^{1/p} \leq \sqrt{p}\,L. \tag{2.3} \]

Assume that a random vector $\xi$ has independent coordinates $\xi_i$, each of which is an L-subgaussian random variable of mean zero and variance one. One may verify by a direct computation that $\xi$ is L-subgaussian. Rademacher vectors, standard Gaussian vectors (that is, random vectors with independent normally distributed entries of mean zero and variance one), as well as Steinhaus vectors (that is, random vectors with independent entries that are uniformly distributed on $\{z \in \mathbb{C} : |z| = 1\}$), are examples of isotropic, L-subgaussian random vectors for an absolute constant L.
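The moment characterization (2.3) is easy to probe empirically. The following Monte Carlo sketch (sample sizes and the fixed direction $\theta$ are illustrative assumptions) estimates $(\mathbb{E}|\langle X,\theta\rangle|^p)^{1/p}/\sqrt{p}$ for the three examples just mentioned and shows that the ratio stays bounded.

```python
# Minimal Monte Carlo sketch (illustrative) of the moment bound (2.3).
import numpy as np

rng = np.random.default_rng(5)
n, trials = 50, 50_000
theta = rng.standard_normal(n)
theta /= np.linalg.norm(theta)

samples = {
    "Rademacher": rng.choice([-1.0, 1.0], size=(trials, n)),
    "Gaussian":   rng.standard_normal((trials, n)),
    "Steinhaus":  np.exp(2j * np.pi * rng.random((trials, n))),
}

for name, X in samples.items():
    inner = np.abs(X @ theta)
    for p in (2, 4, 8):
        moment = np.mean(inner ** p) ** (1 / p)
        print(f"{name:10s} p={p}: (E|<X,theta>|^p)^(1/p) / sqrt(p) = {moment / np.sqrt(p):.2f}")
```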

Our derivation of the probability bound (1.6) requires the following well-known estimate relating strong and weak moments. For convenience, a proof based on chaining and the majorizing measures theorem is provided in the appendix. Note, however, that the next theorem will not be required for the estimate (1.5) of the expectation, and in the Rademacher case different techniques may be applied, see also Remark 3.8.

Theorem 2.3 Let $x_1,\dots,x_n \in \mathbb{C}^N$ and $T \subset \mathbb{C}^N$. If $\xi$ is an isotropic, L-subgaussian random vector and $Y = \sum_{j=1}^n\xi_j x_j$, then for every $p \geq 1$,
\[ \Big(\mathbb{E}\sup_{t\in T}|\langle t, Y\rangle|^p\Big)^{1/p} \leq c\Big(\mathbb{E}\sup_{t\in T}|\langle t, G\rangle| + \sup_{t\in T}\big(\mathbb{E}|\langle t, Y\rangle|^p\big)^{1/p}\Big), \tag{2.4} \]
where c is a constant which depends only on L and $G = \sum_{j=1}^n g_j x_j$ for $g_1,\dots,g_n$ independent standard normal random variables.

Note that if $\|\cdot\|$ is some norm on $\mathbb{C}^N$ and $B^*$ is the unit ball in the dual norm of $\|\cdot\|$, then the above theorem implies that
\[ \big(\mathbb{E}\|Y\|^p\big)^{1/p} \leq c\Big(\mathbb{E}\|G\| + \sup_{t\in B^*}\big(\mathbb{E}|\langle t, Y\rangle|^p\big)^{1/p}\Big). \]

In the remainder of this article, we will state and prove generalizations of our main results Theorem 1.1, Theorem 1.3, and Theorem 1.4 to arbitrary isotropic vectors whose coordinates are independent L-subgaussian random variables. Since a Rademacher vector has all these properties, the above formulations of our results will directly follow.

2.3 Further probabilistic tools

The following decoupling inequality is a slight variation of a result found for instance in [12], see also [6, 37].


Theorem 2.4 Let $\xi = (\xi_1,\dots,\xi_n)$ be a sequence of independent, centered random variables, and let F be a convex function. If $\mathcal{B}$ is a collection of matrices and $\xi'$ is an independent copy of $\xi$, then
\[ \mathbb{E}\sup_{B\in\mathcal{B}}F\Big(\sum_{\substack{j,k=1\\ j\neq k}}^{n}\xi_j\xi_k B_{j,k}\Big) \leq \mathbb{E}\sup_{B\in\mathcal{B}}F\Big(4\sum_{j,k=1}^{n}\xi_j\xi'_k B_{j,k}\Big). \tag{2.5} \]

In order to handle a certain diagonal term in the general subgaussian case, we also require a slightly stronger decoupling inequality which is valid in the Gaussian case and follows from specifying results from [1, Section 2] to an order 2 Gaussian chaos.

Theorem 2.5 There exists an absolute constant C such that the following holds for all $p \geq 1$. Let $g = (g_1,\dots,g_n)$ be a sequence of independent standard normal random variables. If $\mathcal{B}$ is a collection of Hermitian matrices and $g'$ is an independent copy of g, then
\[ \mathbb{E}\sup_{B\in\mathcal{B}}\Big|\sum_{\substack{j,k=1\\ j\neq k}}^{n}g_jg_kB_{j,k} + \sum_{j=1}^{n}(g_j^2-1)B_{j,j}\Big|^p \leq C^p\,\mathbb{E}\sup_{B\in\mathcal{B}}\Big|\sum_{j,k=1}^{n}g_jg'_kB_{j,k}\Big|^p. \tag{2.6} \]

Since some steps in our estimates are formulated in terms of moments, the transition to a tail bound can be established by the following standard estimate, which is a straightforward consequence of Markov's inequality.

Proposition 2.6 Suppose Z is a random variable satisfying
\[ \big(\mathbb{E}|Z|^p\big)^{1/p} \leq \alpha + \beta\sqrt{p} + \gamma p \qquad\text{for all } p \geq p_0 \]
for some $\alpha,\beta,\gamma,p_0 > 0$. Then, for $u \geq p_0$,
\[ \mathbb{P}\big(|Z| \geq e(\alpha + \beta\sqrt{u} + \gamma u)\big) \leq e^{-u}. \]

2.4 Notation

Absolute constants will be denoted by $c_1, c_2,\dots$; their value may change from line to line. We write $A \lesssim B$ if there is an absolute constant $c_1$ for which $A \leq c_1B$. $A \sim B$ means that $c_1A \leq B \leq c_2A$ for absolute constants $c_1$ and $c_2$. If the constants depend on some parameter r we will write $A \lesssim_r B$ or $A \sim_r B$.

The $L_p$-norm of a random variable, or its p-th moment, is given by $\|X\|_{L_p} = (\mathbb{E}|X|^p)^{1/p}$. For a random variable X independent from all other random variables which appear, we denote the expectation and probability conditional on all variables except X by $\mathbb{E}_X$ and $\mathbb{P}_X$, respectively. The canonical unit vectors in $\mathbb{C}^n$ are denoted by $e_j$ and $B_2^n$ is the unit $\ell_2$-ball in $\mathbb{C}^n$.

Finally, we introduce shorthand notations for some quantities that we will study. To that end, let $\mathcal{A}$ be a set of matrices on $\mathbb{R}^n$ or on $\mathbb{C}^n$ and let $\xi = (\xi_i)_{i=1}^n$ be a random vector. For a given matrix A, denote its j-th column by $A_j$ and set
\[ N_{\mathcal{A}}(\xi) := \sup_{A\in\mathcal{A}}\|A\xi\|_2, \qquad B_{\mathcal{A}}(\xi) := \sup_{A\in\mathcal{A}}\Big|\sum_{\substack{j,k=1\\ j\neq k}}^{n}\xi_j\xi_k\langle A_j,A_k\rangle\Big|, \]
\[ D_{\mathcal{A}}(\xi) := \sup_{A\in\mathcal{A}}\Big|\sum_{j=1}^{n}\big(|\xi_j|^2 - \mathbb{E}|\xi_j|^2\big)\|A_j\|_2^2\Big|, \qquad C_{\mathcal{A}}(\xi) := \sup_{A\in\mathcal{A}}\big|\|A\xi\|_2^2 - \mathbb{E}\|A\xi\|_2^2\big|. \]
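For small examples these quantities are straightforward to evaluate directly. The sketch below (illustrative random data, real matrices) computes $N_{\mathcal{A}}$, $B_{\mathcal{A}}$, $D_{\mathcal{A}}$, $C_{\mathcal{A}}$ for a finite family and confirms the pointwise bound $C_{\mathcal{A}}(\xi) \leq B_{\mathcal{A}}(\xi) + D_{\mathcal{A}}(\xi)$ that is used later in the proof of Theorem 3.5.

```python
# Minimal sketch (illustrative): the quantities N_A, B_A, D_A, C_A for a small random family.
import numpy as np

rng = np.random.default_rng(6)
m, n, num_matrices = 5, 8, 20
family = [rng.standard_normal((m, n)) for _ in range(num_matrices)]
xi = rng.standard_normal(n)                      # mean-zero, variance-one entries

def quantities(A_list, xi):
    N = max(np.linalg.norm(A @ xi) for A in A_list)
    B = max(abs(xi @ (A.T @ A) @ xi - np.sum(np.diag(A.T @ A) * xi**2)) for A in A_list)
    D = max(abs(np.sum((xi**2 - 1.0) * np.sum(A**2, axis=0))) for A in A_list)
    C = max(abs(np.linalg.norm(A @ xi)**2 - np.linalg.norm(A, "fro")**2) for A in A_list)
    return N, B, D, C

N, B, D, C = quantities(family, xi)
print(f"N_A={N:.3f}  B_A={B:.3f}  D_A={D:.3f}  C_A={C:.3f}  (C_A <= B_A + D_A: {C <= B + D + 1e-9})")
```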


3 Chaos processes

We are now well-equipped to prove the following generalized version of Theorem 1.4.

Theorem 3.1 Let $\mathcal{A}$ be a set of matrices, and let $\xi$ be a random vector whose entries $\xi_j$ are independent, mean-zero, variance 1, and L-subgaussian random variables. Set
\[ E = \gamma_2(\mathcal{A},\|\cdot\|_{2\to 2})\big(\gamma_2(\mathcal{A},\|\cdot\|_{2\to 2}) + d_F(\mathcal{A})\big) + d_F(\mathcal{A})d_{2\to 2}(\mathcal{A}), \]
\[ V = d_{2\to 2}(\mathcal{A})\big(\gamma_2(\mathcal{A},\|\cdot\|_{2\to 2}) + d_F(\mathcal{A})\big), \quad\text{and}\quad U = d_{2\to 2}^2(\mathcal{A}). \]
Then, for t > 0,
\[ \mathbb{P}\big(C_{\mathcal{A}}(\xi) \geq c_1 E + t\big) \leq 2\exp\Big(-c_2\min\Big\{\frac{t^2}{V^2},\frac{t}{U}\Big\}\Big). \]
The constants $c_1, c_2$ depend only on L.

The proof is based on estimating the moments of the random variables $N_{\mathcal{A}}$ and $C_{\mathcal{A}}$, followed by applying Proposition 2.6. The first step is a bound on the moments of a decoupled version of $N_{\mathcal{A}}$.

Lemma 3.2 Let $\mathcal{A}$ be a set of matrices, let $\xi = (\xi_i)_{i=1}^n$ be an L-subgaussian random vector, and let $\xi'$ be an independent copy of $\xi$. Then for every $p \geq 1$,
\[ \Big\|\sup_{A\in\mathcal{A}}\langle A\xi,A\xi'\rangle\Big\|_{L_p} \lesssim_L \gamma_2(\mathcal{A},\|\cdot\|_{2\to 2})\,\|N_{\mathcal{A}}(\xi)\|_{L_p} + \sup_{A\in\mathcal{A}}\|\langle A\xi,A\xi'\rangle\|_{L_p}. \]

Proof. The proof is based on a chaining argument. Since the space involved is finite dimensional, one may assume without loss of generality that $\mathcal{A}$ is finite. Let $(T_r)$ be an admissible sequence of $\mathcal{A}$ for which the minimum in the definition of $\gamma_2(\mathcal{A},\|\cdot\|_{2\to 2})$ is attained. Let $\pi_rA = \mathrm{argmin}_{B\in T_r}\|B - A\|_{2\to 2}$ and set $\Delta_rA = \pi_rA - \pi_{r-1}A$. Since $\mathcal{A}$ is finite, there is some $r_0$ for which $|\mathcal{A}| \leq 2^{2^{r_0}}$. Given $p \geq 1$, let $\ell$ be the largest integer for which $2^\ell \leq p$, and we may assume that $\ell < r_0$ as the modifications needed when $\ell \geq r_0$ are minimal.

Note that for every $A \in \mathcal{A}$,
\[ \big|\langle A\xi,A\xi'\rangle - \langle(\pi_\ell A)\xi,(\pi_\ell A)\xi'\rangle\big| \leq \sum_{r=\ell}^{r_0-1}\big|\langle(\Delta_{r+1}A)\xi,(\pi_{r+1}A)\xi'\rangle\big| + \sum_{r=\ell}^{r_0-1}\big|\langle(\pi_rA)\xi,(\Delta_{r+1}A)\xi'\rangle\big| =: S_1 + S_2. \tag{3.1} \]

Furthermore, conditionally on $\xi'$,
\[ \langle(\Delta_{r+1}A)\xi,(\pi_{r+1}A)\xi'\rangle = \langle\xi,(\Delta_{r+1}A)^*(\pi_{r+1}A)\xi'\rangle \]
is a subgaussian random variable, as for every u > 0,
\[ \mathbb{P}_\xi\Big(|\langle\xi,(\Delta_{r+1}A)^*(\pi_{r+1}A)\xi'\rangle| \geq uL\|(\Delta_{r+1}A)^*(\pi_{r+1}A)\xi'\|_2\Big) \leq 2\exp(-u^2/2). \tag{3.2} \]

Recall that $|\{\pi_rA : A\in\mathcal{A}\}| = |T_r| \leq 2^{2^r}$, so there are at most $2^{2^{r+2}}$ possible values that $(\Delta_{r+1}A)^*(\pi_{r+1}A)$ can assume in (3.2). Therefore, via a union bound over all these choices and on all the levels $\ell < r \leq r_0$ (cf. (A.2) below), it is evident that there are constants $c_1, c_2 > 0$ for which if $t \geq c_1$ then with $\xi$-probability at least $1 - 2\exp(-c_2 2^\ell t^2)$ one has for every $\ell < r \leq r_0$ and every $A \in \mathcal{A}$
\[ |\langle(\Delta_{r+1}A)\xi,(\pi_{r+1}A)\xi'\rangle| \leq t2^{r/2}\|(\Delta_{r+1}A)^*(\pi_{r+1}A)\xi'\|_2. \tag{3.3} \]


Let $E_t(\xi')$ be the event for which (3.3) holds for all the possible choices of r and A as above. Since
\[ \|(\Delta_{r+1}A)^*(\pi_{r+1}A)\xi'\|_2 \leq \|\Delta_{r+1}A\|_{2\to 2}\sup_{A\in\mathcal{A}}\|A\xi'\|_2 = \|\Delta_{r+1}A\|_{2\to 2}N_{\mathcal{A}}(\xi'), \]
one has on $E_t(\xi')$
\[ S_1 = \sum_{r=\ell}^{r_0-1}\big|\langle(\Delta_{r+1}A)\xi,(\pi_{r+1}A)\xi'\rangle\big| \leq \sum_{r=\ell}^{r_0-1}t2^{r/2}\|\Delta_{r+1}A\|_{2\to 2}N_{\mathcal{A}}(\xi') \leq t\gamma_2(\mathcal{A},\|\cdot\|_{2\to 2})N_{\mathcal{A}}(\xi'). \]

We will now estimate
\[ \|S_1\|_{L_p}^p = \mathbb{E}_\xi\mathbb{E}_{\xi'}S_1^p = \mathbb{E}_{\xi'}\int_0^\infty pt^{p-1}\mathbb{P}_\xi(S_1 > t\mid\xi')\,dt. \tag{3.4} \]
Setting $W(\xi') = \gamma_2(\mathcal{A},\|\cdot\|_{2\to 2})N_{\mathcal{A}}(\xi')$, observe that
\[ \int_0^\infty pt^{p-1}\mathbb{P}_\xi(S_1 > t\mid\xi')\,dt \leq c_3^pW(\xi')^p + \int_{c_3W(\xi')}^\infty pt^{p-1}\mathbb{P}_\xi(S_1 > t\mid\xi')\,dt \]
\[ \leq c_3^pW(\xi')^p + W(\xi')^p\int_{c_3}^\infty pu^{p-1}\mathbb{P}_\xi(S_1 > uW(\xi')\mid\xi')\,du \leq c_4^pW(\xi')^p, \]
where $c_3 \geq c_1$ and $c_4$ are constants that depend only on L. Indeed, for $u \geq c_1$,
\[ \mathbb{P}_\xi(S_1 > uW(\xi')\mid\xi') \leq \mathbb{P}_\xi(E_u^c(\xi')\mid\xi') \leq 2\exp(-c_2u^22^\ell) \leq 2\exp(-c_2u^2p/2). \]

Repeating this argument for $S_2 = \sum_{r=\ell}^{r_0-1}|\langle(\pi_rA)\xi,(\Delta_{r+1}A)\xi'\rangle|$, it follows that
\[ \|S_1 + S_2\|_{L_p} \leq c_5(L)\gamma_2(\mathcal{A},\|\cdot\|_{2\to 2})\|N_{\mathcal{A}}(\xi)\|_{L_p}. \tag{3.5} \]
Finally, since $|\{\pi_\ell A : A\in\mathcal{A}\}| \leq 2^{2^\ell} \leq \exp(p)$, we conclude
\[ \mathbb{E}\sup_{A\in\mathcal{A}}|\langle(\pi_\ell A)\xi,(\pi_\ell A)\xi'\rangle|^p \leq \sum_{A\in T_\ell}\mathbb{E}|\langle A\xi,A\xi'\rangle|^p \leq \exp(p)\sup_{A\in\mathcal{A}}\mathbb{E}|\langle A\xi,A\xi'\rangle|^p, \]
and thus
\[ \Big\|\sup_{A\in\mathcal{A}}|\langle(\pi_\ell A)\xi,(\pi_\ell A)\xi'\rangle|\Big\|_{L_p} \leq e\sup_{A\in\mathcal{A}}\|\langle A\xi,A\xi'\rangle\|_{L_p}. \tag{3.6} \]
Combining (3.1), (3.5), and (3.6) via the triangle inequality completes the proof.

With these preliminary results at hand, we can now proceed to establish moment bounds for the quantities in question. To illustrate the main idea, we start by considering the first moment in the Rademacher case.

Corollary 3.3 Let $\mathcal{A} \subset \mathbb{C}^{m\times n}$ be a set of matrices and let $\varepsilon$ be a Rademacher vector of length n. Then
\[ \mathbb{E}\sup_{A\in\mathcal{A}}\big|\|A\varepsilon\|_2^2 - \mathbb{E}\|A\varepsilon\|_2^2\big| \leq C_1\big(\gamma_2(\mathcal{A},\|\cdot\|_{2\to 2})^2 + d_F(\mathcal{A})\gamma_2(\mathcal{A},\|\cdot\|_{2\to 2}) + d_F(\mathcal{A})d_{2\to 2}(\mathcal{A})\big). \]


Proof. By the decoupling inequality of Theorem 2.4, we have
\[ \mathbb{E}C_{\mathcal{A}}(\varepsilon) = \mathbb{E}\sup_{A\in\mathcal{A}}\big|\|A\varepsilon\|_2^2 - \mathbb{E}\|A\varepsilon\|_2^2\big| = \mathbb{E}\sup_{A\in\mathcal{A}}\Big|\sum_{j\neq k}\varepsilon_j\varepsilon_k(A^*A)_{j,k}\Big| \]
\[ \leq 4\,\mathbb{E}\sup_{A\in\mathcal{A}}\Big|\sum_{j,k}\varepsilon_j\varepsilon'_k(A^*A)_{j,k}\Big| \lesssim \gamma_2(\mathcal{A},\|\cdot\|_{2\to 2})\,\mathbb{E}N_{\mathcal{A}}(\varepsilon) + \sup_{A\in\mathcal{A}}\mathbb{E}|\langle A\varepsilon,A\varepsilon'\rangle|. \tag{3.7} \]
Since
\[ \mathbb{E}|\langle A\varepsilon,A\varepsilon'\rangle| \leq \sqrt{\mathbb{E}|\langle A\varepsilon,A\varepsilon'\rangle|^2} = \sqrt{\mathbb{E}\|A^*A\varepsilon\|_2^2} = \|A^*A\|_F \leq \|A\|_{2\to 2}\|A\|_F, \]
we have
\[ \sup_{A\in\mathcal{A}}\mathbb{E}|\langle A\varepsilon,A\varepsilon'\rangle| \leq d_{2\to 2}(\mathcal{A})d_F(\mathcal{A}) \leq d_F(\mathcal{A})^2. \tag{3.8} \]
We conclude that
\[ (\mathbb{E}N_{\mathcal{A}}(\varepsilon))^2 \leq \mathbb{E}N_{\mathcal{A}}(\varepsilon)^2 \leq \mathbb{E}C_{\mathcal{A}}(\varepsilon) + d_F(\mathcal{A})^2 \lesssim \gamma_2(\mathcal{A},\|\cdot\|_{2\to 2})\,\mathbb{E}N_{\mathcal{A}}(\varepsilon) + d_F(\mathcal{A})^2, \]
so that
\[ \mathbb{E}N_{\mathcal{A}}(\varepsilon) \lesssim \gamma_2(\mathcal{A},\|\cdot\|_{2\to 2}) + d_F(\mathcal{A}). \]
Plugging into (3.7) and using the first inequality in (3.8) yields the claim.

Remark 3.4 Noting that the symmetry assumption $\mathcal{A} = -\mathcal{A}$ ensures that $d_{2\to 2}(\mathcal{A}) \leq \gamma_2(\mathcal{A},\|\cdot\|_{2\to 2})$, Corollary 3.3 directly implies the expectation bound given in Theorem 1.4. The tail bound, and hence the remaining part of the proof of Theorem 1.4, can be deduced using the concentration inequality in [5, Theorem 17], see also [43]. Note that this proof does not use Theorem 2.3 above and hence does not rely on the majorizing measures theorem. The proof, however, is specific to the case of Rademacher random vectors.

Theorem 3.5 Let $L \geq 1$ and $\xi = (\xi_j)_{j=1}^n$, where the $\xi_j$, $j = 1,\dots,n$, are independent mean-zero, variance one, L-subgaussian random variables, and let $\mathcal{A}$ be a class of matrices. Then for every $p \geq 1$,

(a) $\|N_{\mathcal{A}}(\xi)\|_{L_p} \lesssim_L \gamma_2(\mathcal{A},\|\cdot\|_{2\to 2}) + d_F(\mathcal{A}) + \sqrt{p}\,d_{2\to 2}(\mathcal{A})$,

(b) $\|C_{\mathcal{A}}(\xi)\|_{L_p} \lesssim_L \gamma_2(\mathcal{A},\|\cdot\|_{2\to 2})\big(\gamma_2(\mathcal{A},\|\cdot\|_{2\to 2}) + d_F(\mathcal{A})\big) + \sqrt{p}\,d_{2\to 2}(\mathcal{A})\big(\gamma_2(\mathcal{A},\|\cdot\|_{2\to 2}) + d_F(\mathcal{A})\big) + p\,d_{2\to 2}^2(\mathcal{A})$.

Two important steps of the proof are established by the following lemmas. The lemmas use the same notation as Theorem 3.5. The first lemma bounds a decoupled variant of the above quantities.

Lemma 3.6 If $\xi'$ is an independent copy of $\xi$, then
\[ \sup_{A\in\mathcal{A}}\|\langle A\xi,A\xi'\rangle\|_{L_p} \lesssim_L \sqrt{p}\,d_F(\mathcal{A})d_{2\to 2}(\mathcal{A}) + p\,d_{2\to 2}^2(\mathcal{A}). \tag{3.9} \]


Proof. Fix $A \in \mathcal{A}$ and set $S = \{A^*Ax : x \in B_2^n\}$. Since the random vector $\xi$ is L-subgaussian, the random variable $\langle\xi,A^*A\xi'\rangle$ is subgaussian conditionally on $\xi'$. Therefore, by (2.3),
\[ \|\langle A\xi,A\xi'\rangle\|_{L_p} = \Big(\mathbb{E}_{\xi'}\big(\big(\mathbb{E}_\xi|\langle\xi,A^*A\xi'\rangle|^p\big)^{1/p}\big)^p\Big)^{1/p} \lesssim \big(\mathbb{E}_{\xi'}(L\sqrt{p})^p\|A^*A\xi'\|_2^p\big)^{1/p} = L\sqrt{p}\Big(\mathbb{E}_{\xi'}\sup_{y\in S}|\langle y,\xi'\rangle|^p\Big)^{1/p}. \]
We will estimate this quantity using Theorem 2.3. We will bound the two terms on the right hand side of (2.4) separately. For the first term, let g denote a standard Gaussian vector and estimate
\[ \mathbb{E}\sup_{y\in S}|\langle y,g\rangle| = \mathbb{E}\|A^*Ag\|_2 \leq \big(\mathbb{E}\|A^*Ag\|_2^2\big)^{1/2} = \|A^*A\|_F \leq \|A\|_F\|A\|_{2\to 2}. \]
For the second term, applying (2.3) again yields
\[ \sup_{y\in S}\big(\mathbb{E}|\langle y,\xi'\rangle|^p\big)^{1/p} = \sup_{z\in B_2^n}\big(\mathbb{E}|\langle A^*Az,\xi'\rangle|^p\big)^{1/p} \lesssim L\sup_{z\in B_2^n}\sqrt{p}\,\|A^*Az\|_2 = L\sqrt{p}\,\|A\|_{2\to 2}^2. \]
Hence the lemma follows by applying Theorem 2.3 and taking the supremum over $A \in \mathcal{A}$.

The second lemma concerns first moments for the special case of a Gaussian vector.

Lemma 3.7 Let g denote a standard Gaussian vector. Then
\[ \mathbb{E}N_{\mathcal{A}}(g) \lesssim_L \gamma_2(\mathcal{A},\|\cdot\|_{2\to 2}) + d_F(\mathcal{A}). \]

Proof. Observe that by Theorem 2.5 and Lemma 3.2,
\[ \|C_{\mathcal{A}}(g)\|_{L_p} = \Big\|\sup_{A\in\mathcal{A}}\Big|\sum_{\substack{j,k\\ j\neq k}}g_jg_k\langle A_j,A_k\rangle + \sum_j(g_j^2-1)\|A_j\|_2^2\Big|\Big\|_{L_p} \lesssim \Big\|\sup_{A\in\mathcal{A}}\Big|\sum_{j,k}g_jg'_k\langle A_j,A_k\rangle\Big|\Big\|_{L_p} \]
\[ = \Big\|\sup_{A\in\mathcal{A}}\big|\langle Ag,Ag'\rangle\big|\Big\|_{L_p} \lesssim_L \gamma_2(\mathcal{A},\|\cdot\|_{2\to 2})\,\|N_{\mathcal{A}}(g)\|_{L_p} + \sup_{A\in\mathcal{A}}\big\|\langle Ag,Ag'\rangle\big\|_{L_p}. \tag{3.10} \]
Combining Lemma 3.6 with (3.10), it follows that
\[ \|C_{\mathcal{A}}(g)\|_{L_p} \lesssim_L \gamma_2(\mathcal{A},\|\cdot\|_{2\to 2})\,\|N_{\mathcal{A}}(g)\|_{L_p} + \sqrt{p}\,d_F(\mathcal{A})d_{2\to 2}(\mathcal{A}) + p\,d_{2\to 2}^2(\mathcal{A}). \tag{3.11} \]
Specifying p = 1 and using that $d_F(\mathcal{A}) \geq d_{2\to 2}(\mathcal{A})$ as well as $\mathbb{E}\|Ag\|_2^2 = \|A\|_F^2$, we conclude that
\[ \mathbb{E}N_{\mathcal{A}}^2(g) \leq \mathbb{E}C_{\mathcal{A}}(g) + d_F^2(\mathcal{A}) \lesssim_L \gamma_2(\mathcal{A},\|\cdot\|_{2\to 2})\,\mathbb{E}N_{\mathcal{A}}(g) + d_F^2(\mathcal{A}). \]
Therefore,
\[ \mathbb{E}N_{\mathcal{A}}(g) \leq (\mathbb{E}N_{\mathcal{A}}^2(g))^{1/2} \lesssim_L \gamma_2(\mathcal{A},\|\cdot\|_{2\to 2}) + d_F(\mathcal{A}), \]
as desired.


Proof of Theorem 3.5. We apply Theorem 2.3 with the set $S = \{A^*x : x \in B_2^n,\ A \in \mathcal{A}\}$. Since $\xi$ is L-subgaussian we obtain
\[ \|N_{\mathcal{A}}(\xi)\|_{L_p} = \Big(\mathbb{E}\sup_{A\in\mathcal{A},x\in B_2^n}|\langle A\xi,x\rangle|^p\Big)^{1/p} = \Big(\mathbb{E}\sup_{u\in S}|\langle\xi,u\rangle|^p\Big)^{1/p} \]
\[ \lesssim_L \mathbb{E}N_{\mathcal{A}}(g) + \sup_{u\in S}\big(\mathbb{E}|\langle\xi,u\rangle|^p\big)^{1/p} \lesssim_L \mathbb{E}N_{\mathcal{A}}(g) + \sqrt{p}\sup_{A\in\mathcal{A},x\in B_2^n}\|A^*x\|_2 \]
\[ \lesssim_L \mathbb{E}N_{\mathcal{A}}(g) + \sqrt{p}\,d_{2\to 2}(\mathcal{A}) \lesssim_L \gamma_2(\mathcal{A},\|\cdot\|_{2\to 2}) + d_F(\mathcal{A}) + \sqrt{p}\,d_{2\to 2}(\mathcal{A}), \]
which proves (a).

For (b), observe that as the $\xi_j$ are unit variance, we have $\mathbb{E}\|A\xi\|_2^2 = \|A\|_F^2 = \sum_{j=1}^n\|A_j\|_2^2$ and, consequently, $C_{\mathcal{A}}(\xi)$ can be split up into the diagonal and the off-diagonal contributions as follows:
\[ C_{\mathcal{A}}(\xi) = \sup_{A\in\mathcal{A}}\Big|\sum_{\substack{j,k=1\\ j\neq k}}^{n}\xi_j\xi_k\langle A_j,A_k\rangle + \sum_{j=1}^n(|\xi_j|^2-1)\|A_j\|_2^2\Big| \leq B_{\mathcal{A}}(\xi) + D_{\mathcal{A}}(\xi). \]
Hence it suffices to estimate the moments of $B_{\mathcal{A}}(\xi)$ and $D_{\mathcal{A}}(\xi)$; one concludes using the triangle inequality.

For the off-diagonal term, we use Theorem 2.4 and Lemma 3.2 to bound
\[ \|B_{\mathcal{A}}(\xi)\|_{L_p} \leq 4\Big\|\sup_{A\in\mathcal{A}}\Big|\sum_{j,k}\xi_j\xi'_k\langle A_j,A_k\rangle\Big|\Big\|_{L_p} = 4\Big\|\sup_{A\in\mathcal{A}}\big|\langle A\xi,A\xi'\rangle\big|\Big\|_{L_p} \]
\[ \lesssim_L \gamma_2(\mathcal{A},\|\cdot\|_{2\to 2})\|N_{\mathcal{A}}(\xi)\|_{L_p} + \sup_{A\in\mathcal{A}}\|\langle A\xi,A\xi'\rangle\|_{L_p}. \]
Combining this estimate with Lemma 3.6 and part (a), we obtain that
\[ \|B_{\mathcal{A}}(\xi)\|_{L_p} \lesssim_L \gamma_2(\mathcal{A},\|\cdot\|_{2\to 2})\big(\gamma_2(\mathcal{A},\|\cdot\|_{2\to 2}) + d_F(\mathcal{A}) + \sqrt{p}\,d_{2\to 2}(\mathcal{A})\big) + \sqrt{p}\,d_F(\mathcal{A})d_{2\to 2}(\mathcal{A}) + p\,d_{2\to 2}^2(\mathcal{A}). \tag{3.12} \]

For the diagonal term, observe that by a standard symmetrization argument (see, e.g., [27, Lemma 6.3]),
\[ \|D_{\mathcal{A}}(\xi)\|_{L_p} = \Big\|\sup_{A\in\mathcal{A}}\Big|\sum_{j=1}^n(|\xi_j|^2 - \mathbb{E}|\xi_j|^2)\|A_j\|_2^2\Big|\Big\|_{L_p} \leq 2\Big\|\sup_{A\in\mathcal{A}}\Big|\sum_j\varepsilon_j|\xi_j|^2\|A_j\|_2^2\Big|\Big\|_{L_p}, \]
where $\varepsilon = (\varepsilon_1,\dots,\varepsilon_n)$ is a Rademacher vector independent of $\xi$. Furthermore, let $g = (g_1,\dots,g_n)$ be a sequence of independent standard normal variables. Then, as $\xi_j$ is L-subgaussian, there is an absolute constant c for which $\mathbb{P}(|\xi_j|^2 \geq tL^2) \leq c\,\mathbb{P}(g_j^2 \geq t)$ for every t > 0. Moreover, $\varepsilon_j|\xi_j|^2$ and $\varepsilon_jg_j^2$ are symmetric, so by the contraction principle (see [27, Lemma 4.6]), a rescaling argument, and de-symmetrization [27, Lemma 6.3],
\[ \|D_{\mathcal{A}}(\xi)\|_{L_p} \lesssim_L \Big\|\sup_{A\in\mathcal{A}}\Big|\sum_j\varepsilon_jg_j^2\|A_j\|_2^2\Big|\Big\|_{L_p} \leq 2\Big\|\sup_{A\in\mathcal{A}}\Big|\sum_j(|g_j|^2-1)\|A_j\|_2^2\Big|\Big\|_{L_p} + \Big\|\sup_{A\in\mathcal{A}}\Big|\sum_j\varepsilon_j\|A_j\|_2^2\Big|\Big\|_{L_p} \]
\[ = 2\|D_{\mathcal{A}}(g)\|_{L_p} + \Big\|\sup_{A\in\mathcal{A}}\Big|\sum_j\varepsilon_j\|A_j\|_2^2\Big|\Big\|_{L_p}. \]

Now observe that $D_{\mathcal{A}}(g) \leq C_{\mathcal{A}}(g) + B_{\mathcal{A}}(g)$, and thus, by (3.11) and (3.12),
\[ \|D_{\mathcal{A}}(g)\|_{L_p} \leq \|C_{\mathcal{A}}(g)\|_{L_p} + \|B_{\mathcal{A}}(g)\|_{L_p} \lesssim \gamma_2(\mathcal{A},\|\cdot\|_{2\to 2})\big(\gamma_2(\mathcal{A},\|\cdot\|_{2\to 2}) + d_F(\mathcal{A})\big) + \sqrt{p}\,d_{2\to 2}(\mathcal{A})\big(d_F(\mathcal{A}) + \gamma_2(\mathcal{A},\|\cdot\|_{2\to 2})\big) + p\,d_{2\to 2}^2(\mathcal{A}). \]

Finally, note that $A \mapsto \sum_j\varepsilon_j\|A_j\|_2^2$ is a subgaussian process relative to the metric
\[ d(A,B) = \Big(\sum_{j=1}^n\big(\|A_j\|_2^2 - \|B_j\|_2^2\big)^2\Big)^{1/2} \leq \Big(\sum_{j=1}^n\|A_j - B_j\|_2^2\cdot\big(\|A_j\|_2 + \|B_j\|_2\big)^2\Big)^{1/2} \leq 2d_F(\mathcal{A})\|A - B\|_{2\to 2}. \tag{3.13} \]
Therefore, by Theorem 2.3 and a standard chaining argument,
\[ \Big\|\sup_{A\in\mathcal{A}}\Big|\sum_j\varepsilon_j\|A_j\|_2^2\Big|\Big\|_{L_p} \lesssim d_F(\mathcal{A})\gamma_2(\mathcal{A},\|\cdot\|_{2\to 2}) + \sqrt{p}\,d_F(\mathcal{A})d_{2\to 2}(\mathcal{A}). \]
Here the second term corresponds to the second term in (2.4), which is bounded using (2.3) and (3.13). This shows that
\[ \|D_{\mathcal{A}}(\xi)\|_{L_p} \lesssim_L \gamma_2(\mathcal{A},\|\cdot\|_{2\to 2})\big(\gamma_2(\mathcal{A},\|\cdot\|_{2\to 2}) + d_F(\mathcal{A})\big) + \sqrt{p}\,d_{2\to 2}(\mathcal{A})\big(\gamma_2(\mathcal{A},\|\cdot\|_{2\to 2}) + d_F(\mathcal{A})\big) + p\,d_{2\to 2}^2(\mathcal{A}), \]
which, together with (3.12), proves (b).

Remark 3.8 (a) Observe that Theorem 3.1 can be deduced from Theorem 3.5 using Proposition 2.6.

(b) In the Rademacher case, one has $D_{\mathcal{A}} \equiv 0$, so the contraction principle and the more sophisticated decoupling inequality for Gaussian random variables are not needed in the proof.

(c) Note that the assumption that $\xi$ has independent coordinates has only been used in the decoupling steps of the proof of Theorem 3.5.


4 The Restricted Isometry Property of Partial Random Circulant Matrices

In this section we study the restricted isometry constants of a partial random circulant matrix $\Phi \in \mathbb{R}^{m\times n}$ generated by a random vector $\xi = (\xi_i)_{i=1}^n$, where the $\xi_i$'s are independent mean-zero, L-subgaussian random variables of variance one. Arguably the most important case is when $\xi = \varepsilon$ is a Rademacher vector, as introduced in Section 1.2.
Throughout this section and following the notation of the introduction, let $V_xz = \frac{1}{\sqrt{m}}P_\Omega(x*z)$, where the projection operator $P_\Omega : \mathbb{C}^n \to \mathbb{C}^n$ is given by $P_\Omega = R_\Omega^*R_\Omega$, that is, $(P_\Omega x)_\ell = x_\ell$ for $\ell \in \Omega$ and $(P_\Omega x)_\ell = 0$ for $\ell \notin \Omega$.
Setting $D_{s,n} = \{x \in \mathbb{C}^n : \|x\|_2 \leq 1, \|x\|_0 \leq s\}$, the restricted isometry constant of $\Phi$ is
\[ \delta_s = \sup_{x\in D_{s,n}}\Big|\frac{1}{m}\|R_\Omega(\xi*x)\|_2^2 - \|x\|_2^2\Big| = \sup_{x\in D_{s,n}}\Big|\frac{1}{m}\|P_\Omega(x*\xi)\|_2^2 - \|x\|_2^2\Big| = \sup_{x\in D_{s,n}}\big|\|V_x\xi\|_2^2 - \|x\|_2^2\big|. \]

Since $|\Omega| = m$, it follows that
\[ \mathbb{E}\|V_x\xi\|_2^2 = \frac{1}{m}\sum_{\ell\in\Omega}\mathbb{E}\sum_{k,j=1}^n\xi_j\xi_kx_{\ell\ominus j}x_{\ell\ominus k} = \frac{1}{m}\sum_{\ell\in\Omega}\sum_{k=1}^n|x_{\ell\ominus k}|^2 = \|x\|_2^2 \]
and hence
\[ \delta_s = \sup_{x\in D_{s,n}}\big|\|V_x\xi\|_2^2 - \mathbb{E}\|V_x\xi\|_2^2\big|, \]
which shows that $\delta_s$ is the process $C_{\mathcal{A}}$ studied in the previous section for $\mathcal{A} = \{V_x : x \in D_{s,n}\}$; thus the tail decay can be analyzed using Theorem 3.1.

Theorem 4.1 Let $\xi = (\xi_j)_{j=1}^n$ be a random vector with independent, mean-zero, variance one, L-subgaussian entries. If, for $s \leq n$ and $\eta,\delta \in (0,1)$,
\[ m \geq c\delta^{-2}s\,\max\{(\log s)^2(\log n)^2,\ \log(\eta^{-1})\}, \tag{4.1} \]
then with probability at least $1-\eta$, the restricted isometry constant of the partial random circulant matrix $\Phi \in \mathbb{R}^{m\times n}$ generated by $\xi$ satisfies $\delta_s \leq \delta$. The constant c > 0 depends only on L.

The proof of Theorem 4.1 requires a Fourier domain description of $\Phi$. Let F be the unnormalized Fourier transform with entries $F_{jk} = e^{2\pi ijk/n}$. By the convolution theorem, for every $1 \leq j \leq n$, $F(x*y)_j = (Fx)_j\cdot(Fy)_j$. Therefore,
\[ V_x\xi = \frac{1}{\sqrt{m}}P_\Omega F^{-1}XF\xi, \]
where $X = \mathrm{diag}(Fx)$ is the diagonal matrix whose diagonal is the Fourier transform Fx. In short,
\[ V_x = \frac{1}{\sqrt{m}}\tilde{P}_\Omega XF, \]
where $\tilde{P}_\Omega = P_\Omega F^{-1}$.

Proof of Theorem 4.1. In light of Theorem 3.5 and Theorem 3.1, it suffices to control the parameters $d_{2\to 2}(\mathcal{A})$, $d_F(\mathcal{A})$, and $\gamma_2(\mathcal{A},\|\cdot\|_{2\to 2})$ for the set $\mathcal{A} = \{V_x : x \in D_{s,n}\}$.


Since the matrices $V_x$ consist of shifted copies of x in all of their m nonzero rows, the $\ell_2$-norm of each nonzero row is $m^{-1/2}\|x\|_2$; thus $\|V_x\|_F = \|x\|_2 \leq 1$ for all $x \in D_{s,n}$ and
\[ d_F(\mathcal{A}) = 1. \]
Also, observe that for every $x \in D_{s,n}$ with the associated diagonal matrix X,
\[ \|V_x\|_{2\to 2} = \frac{1}{\sqrt{m}}\|\tilde{P}_\Omega XF\|_{2\to 2} \leq \sqrt{\frac{n}{m}}\,\|P_\Omega F^{-1}\|_{2\to 2}\|X\|_{2\to 2} \leq \frac{1}{\sqrt{m}}\|X\|_{2\to 2} = \frac{1}{\sqrt{m}}\|Fx\|_\infty. \tag{4.2} \]
Setting $\|x\|_{\widehat{\infty}} := \|Fx\|_\infty$ it is evident that $\|Fx\|_\infty \leq \|x\|_1 \leq \sqrt{s}\|x\|_2 \leq \sqrt{s}$ for every $x \in D_{s,n}$, and thus
\[ d_{2\to 2}(\mathcal{A}) \leq \sqrt{s/m}. \]

Next, to estimate the $\gamma_2$ functional, recall from (2.1) that
\[ \gamma_2(\mathcal{A},\|\cdot\|_{2\to 2}) \lesssim \int_0^{d_{2\to 2}(\mathcal{A})}\log^{1/2}N(\mathcal{A},\|\cdot\|_{2\to 2},u)\,du. \]
By (4.2),
\[ \|V_x - V_y\|_{2\to 2} = \|V_{x-y}\|_{2\to 2} \leq m^{-1/2}\|x-y\|_{\widehat{\infty}}, \]
and hence for every u > 0, $N(\mathcal{A},\|\cdot\|_{2\to 2},u) \leq N(D_{s,n},m^{-1/2}\|\cdot\|_{\widehat{\infty}},u)$.
To estimate this covering number, let us recall the following simple modification of the Maurey Lemma, essentially due to Carl [10] (see also [41] or [37, Lemma 8.3] for refined estimates specific to $D_{s,n}$). For convenience, a proof is provided in the appendix. Below, for a set U in a vector space, conv(U) denotes its convex hull.

Lemma 4.2 There exists an absolute constant c for which the following holds. Let X be a normed space, consider a finite set $U \subset X$ of cardinality N, and assume that for every $L \in \mathbb{N}$ and $(u_1,\dots,u_L) \in U^L$, $\mathbb{E}_\varepsilon\|\sum_{j=1}^L\varepsilon_ju_j\|_X \leq A\sqrt{L}$, where $(\varepsilon_j)_{j=1}^L$ denotes a Rademacher vector. Then for every u > 0,
\[ \log N(\mathrm{conv}(U),\|\cdot\|_X,u) \leq c(A/u)^2\log N. \]
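Before applying the lemma, here is a small numerical illustration (with hypothetical parameters) of the empirical-method argument behind it, proved in Appendix A.2: a point of conv(U) is approximated by an average of L random extreme points, with sup-norm error of order $A/\sqrt{L}$; the set $U = \{\pm e_j\}$ in $(\mathbb{R}^n,\|\cdot\|_\infty)$ is an assumption made for the sketch.

```python
# Minimal sketch (illustrative) of the empirical method behind Lemma 4.2.
import numpy as np

rng = np.random.default_rng(7)
n = 200
U = np.vstack([np.eye(n), -np.eye(n)])          # U = {+-e_j}, conv(U) = unit l1-ball

# A point of conv(U): random convex weights over the 2n extreme points
theta = rng.random(2 * n)
theta /= theta.sum()
x = theta @ U

for L in (10, 100, 1000, 10000):
    picks = rng.choice(2 * n, size=L, p=theta)  # Z_1, ..., Z_L i.i.d. with E Z = x
    y = U[picks].mean(axis=0)
    print(f"L={L:6d}   sup-norm error = {np.max(np.abs(x - y)):.4f}   1/sqrt(L) = {1/np.sqrt(L):.4f}")
```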

We set $\Delta = \sqrt{s/m}$ and apply the lemma for the set
\[ U = \{\pm\sqrt{2}e_1,\dots,\pm\sqrt{2}e_n,\pm\sqrt{2}ie_1,\dots,\pm\sqrt{2}ie_n\}, \]
where the $e_i$ are the standard basis vectors. Noting that $B_1^n \subset \mathrm{conv}(U)$ and that we may choose $A = c\sqrt{\log n}$, we obtain in this case
\[ \log N(D_{s,n},m^{-1/2}\|\cdot\|_{\widehat{\infty}},u) \leq \log N(s^{1/2}B_1^n,m^{-1/2}\|\cdot\|_{\widehat{\infty}},u) \lesssim \Big(\frac{\Delta}{u}\Big)^2\log^2(n). \]
Since $D_{s,n}$ is the union of s-dimensional Euclidean balls, a standard volumetric argument (see, e.g., [41] or [37, Chapter 8.4]) yields
\[ \log N(D_{s,n},m^{-1/2}\|\cdot\|_{\widehat{\infty}},u) \lesssim s\log\big(en/(su)\big) \]
(which is stronger than the bound above for $u \lesssim 1/\sqrt{m}$).


Combining the two covering number estimates, a straightforward computation of the entropy integral (see also [41] or [37, eq. (8.15)]) reveals that
\[ \gamma_2(\mathcal{A},\|\cdot\|_{2\to 2}) \lesssim \sqrt{\frac{s}{m}}(\log s)(\log n), \]
which implies that $\gamma_2(\mathcal{A},\|\cdot\|_{2\to 2}) \lesssim \delta$ for the given choice of m. Now, by choosing the constant c in (4.1) appropriately (depending only on L), one obtains
\[ E \leq \frac{\delta}{2c_1}, \]
where E and $c_1$ are chosen as in Theorem 3.1. Then Theorem 3.1 yields
\[ \mathbb{P}(\delta_s \geq \delta) \leq \mathbb{P}(\delta_s \geq c_1E + \delta/2) \leq \exp(-c_2(m/s)\delta^2) \leq \eta, \]
which, after possibly increasing the value of c enough to compensate $c_2$, completes the proof.
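The entropy integral in this proof can also be evaluated numerically. The sketch below is a rough numerical illustration with assumed parameter values; the two covering bounds are taken in the simplified forms displayed above. It integrates the square root of the smaller bound up to $\sqrt{s/m}$ and compares the result with the claimed scale $\sqrt{s/m}(\log s)(\log n)$; the two agree up to a constant factor.

```python
# Minimal numerical sketch (illustrative) of the Dudley entropy integral in the proof of Theorem 4.1.
import numpy as np
from scipy.integrate import quad

def dudley_bound(s, n, m):
    d = np.sqrt(s / m)                                  # upper limit d_{2->2}(A) <= sqrt(s/m)
    def sqrt_log_N(u):
        volumetric = s * np.log(np.e * n / (s * u))     # volumetric bound, relevant for small u
        maurey = (s / (m * u**2)) * np.log(n) ** 2      # Maurey-type bound (Delta/u)^2 log^2 n
        return np.sqrt(max(min(volumetric, maurey), 0.0))
    val, _ = quad(sqrt_log_N, 0.0, d, limit=200)
    return val

s, n, m = 20, 2**16, 4000
print("entropy integral      :", dudley_bound(s, n, m))
print("sqrt(s/m) log s log n :", np.sqrt(s / m) * np.log(s) * np.log(n))
```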

5 Time-Frequency Structured Random Matrices

In this section, we will treat the restricted isometry property of random Gabor synthesis matrices, as described in Section 1.3.

Theorem 5.1 Let $\xi = (\xi_j)_{j=1}^m$ be a random vector with independent mean-zero, variance one, L-subgaussian entries. Let $\Psi_h \in \mathbb{C}^{m\times m^2}$ be the random Gabor synthesis matrix generated by $h = \frac{1}{\sqrt{m}}\xi$. If, for $s \in \mathbb{N}$ and $\delta,\eta \in (0,1)$,
\[ m \geq c\delta^{-2}s\,\max\{(\log s)^2(\log m)^2,\ \log(\eta^{-1})\}, \]
then with probability at least $1-\eta$ the restricted isometry constant of $\Psi_h$ satisfies $\delta_s \leq \delta$. The constant c > 0 depends only on L.

Before presenting the proof we will need several observations. First, note that for $x \in \mathbb{C}^{m^2}$, $\Psi_hx = V_x\xi$, where the $m\times m$ matrix $V_x$ is given by
\[ V_x = \frac{1}{\sqrt{m}}\sum_{\lambda\in\mathbb{Z}_m^2}x_\lambda\,\pi(\lambda). \]
It is straightforward to check that $\{\frac{1}{\sqrt{m}}\pi(\lambda) : \lambda \in \mathbb{Z}_m^2\}$ is an orthonormal system in the space of complex $m\times m$ matrices endowed with the Frobenius norm. Therefore,
\[ \mathbb{E}\|V_x\xi\|_2^2 = \|V_x\|_F^2 = \Big\|m^{-1/2}\sum_{\lambda\in\mathbb{Z}_m^2}x_\lambda\pi(\lambda)\Big\|_F^2 = \|x\|_2^2. \]
Hence, if $D_{s,n} = \{x \in \mathbb{C}^{m^2} : \|x\|_2 \leq 1, \|x\|_0 \leq s\}$ and $\mathcal{A} = \{V_x : x \in D_{s,n}\}$, the restricted isometry constant is
\[ \delta_s = \sup_{x\in D_{s,n}}\big|\|\Psi_hx\|_2^2 - \|x\|_2^2\big| = \sup_{V_x\in\mathcal{A}}\big|\|V_x\xi\|_2^2 - \mathbb{E}\|V_x\xi\|_2^2\big|. \]
Thus Theorem 3.1 applies again and it suffices to estimate the associated Dudley integral. Note that, as $\pi(\lambda)$ is unitary, one has for $x \in D_{s,n}$
\[ \|V_x\|_{2\to 2} \leq \frac{1}{\sqrt{m}}\sum_{\lambda\in\mathbb{Z}_m^2}|x_\lambda|\,\|\pi(\lambda)\|_{2\to 2} \leq \|x\|_1/\sqrt{m} \leq \sqrt{s/m}\,\|x\|_2, \tag{5.1} \]
so the upper limit in the Dudley integral is $d_{2\to 2}(\mathcal{A}) \leq \sqrt{s/m}$.


Lemma 5.2 There exists an absolute constant c such that for every $0 < u \leq \sqrt{s/m}$,
\[ \log N(\mathcal{A},\|\cdot\|_{2\to 2},u) \leq cs\big(\log(em^2/s) + \log(3\sqrt{s/m}/u)\big), \]
and
\[ \log N(\mathcal{A},\|\cdot\|_{2\to 2},u) \leq \frac{cs\log^2m}{mu^2}. \]

Proof. Define the norm $\|\cdot\|$ on $\mathbb{C}^{m^2}$ by $\|x\| = \|V_x\|_{2\to 2}$, fix $S \subset \mathbb{Z}_m^2$ of cardinality s and put $B_S = \{x \in \mathbb{C}^{m^2} : \|x\|_2 \leq 1, \mathrm{supp}(x) \subset S\}$. Then, by (5.1) and a volumetric estimate,
\[ N(B_S,\|\cdot\|,u) \leq N(B_S,\sqrt{s/m}\,\|\cdot\|_2,u) \leq \Big(1 + \frac{2\sqrt{s/m}}{u}\Big)^{2s} \leq \Big(\frac{3\sqrt{s/m}}{u}\Big)^{2s}, \]
where the last step uses that $u \leq \sqrt{s/m}$. Since there are at most $\binom{m^2}{s} \leq (em^2/s)^s$ such subsets S of $\mathbb{Z}_m^2$, the first part of the claim follows.

To prove the second part, note that $D_{s,n} \subset \sqrt{2s}\,\mathrm{conv}\big(\{e_\lambda,ie_\lambda,-e_\lambda,-ie_\lambda\}_{\lambda\in\mathbb{Z}_m^2}\big)$. Consider $(u_j)_{j=1}^L$ selected from the extreme points (with possible repetitions). Then, by the non-commutative Khintchine inequality, due to Lust-Piquard [28, 29, 40],
\[ \mathbb{E}_\varepsilon\Big\|\sum_{j=1}^L\varepsilon_jV_{u_j}\Big\|_{2\to 2} \lesssim \sqrt{\log m}\,\max\Big\{\Big\|\sum_{j=1}^LV_{u_j}V_{u_j}^*\Big\|_{2\to 2},\Big\|\sum_{j=1}^LV_{u_j}^*V_{u_j}\Big\|_{2\to 2}\Big\}^{1/2}. \]
Recall that for every such $u_j$, $V_{u_j} = \alpha\pi(\lambda)$ with $|\alpha| = \sqrt{2s/m}$. Therefore, $V_{u_j}^*V_{u_j} = V_{u_j}V_{u_j}^* = (2s/m)I$ and thus
\[ \mathbb{E}_\varepsilon\Big\|\sum_{j=1}^L\varepsilon_jV_{u_j}\Big\|_{2\to 2} \lesssim \sqrt{\log m}\,\sqrt{s/m}\,\sqrt{L}. \]
Applying Lemma 4.2 for $A \sim \sqrt{s/m}\sqrt{\log m}$, it follows that
\[ \log N(D_{s,n},\|\cdot\|,u) \lesssim (A/u)^2\log(m^2) \lesssim \frac{s\log^2m}{mu^2}. \]

Proof of Theorem 5.1. The proof follows an identical path to that of Theorem 4.1. First, as was noted above, $d_F(\mathcal{A}) \leq 1$ and $d_{2\to 2}(\mathcal{A}) \leq \sqrt{s/m}$. Also, using the bound (2.1) by the Dudley type integral and by a direct application of Lemma 5.2,
\[ \gamma_2(\mathcal{A},\|\cdot\|_{2\to 2}) \lesssim \int_0^{d_{2\to 2}(\mathcal{A})}\sqrt{\log N(\mathcal{A},\|\cdot\|,u)}\,du \lesssim \sqrt{s/m}\,(\log s)(\log m). \]
Here we used the first bound of Lemma 5.2 for $u \lesssim m^{-1/2}$ and the second bound for $u \gtrsim m^{-1/2}$. The claim is now a direct application of Theorem 3.1.

Remark 5.3 The only properties of the system $\{\pi(\lambda) : \lambda \in \mathbb{Z}_m^2\}$ that have been used in the proof are the facts that all $\pi(\lambda)$ are unitary and that $\{m^{-1/2}\pi(\lambda) : \lambda \in \mathbb{Z}_m^2\}$ is an orthonormal system with respect to the Frobenius inner product. Therefore, Theorem 5.1 also holds true for general systems of operators with these two properties.


A Appendix

A.1 Proof of Theorem 2.3

Without loss of generality assume that T is finite. Fix an optimal admissible sequence $(T_r)$ of T. For $t \in T$, let $\pi_r(t) \in T_r$ be an element in $T_r$ with the smallest $\ell_2$-distance to t, and choose $\ell$ for which $2^{\ell-1} \leq 2p \leq 2^\ell$. Since one may assume that $\pi_r(t) = t$ for a sufficiently large r, one has
\[ \sup_{t\in T}|\langle t,Y\rangle| \leq \sup_{t\in T}|\langle\pi_\ell(t),Y\rangle| + \sup_{t\in T}\sum_{r=\ell}^\infty|\langle\pi_{r+1}(t)-\pi_r(t),Y\rangle|. \tag{A.1} \]

The p-th moment of the first term satisfies
\[ \Big(\mathbb{E}\sup_{t\in T}|\langle\pi_\ell(t),Y\rangle|^p\Big)^{1/p} \leq \Big(\mathbb{E}\sum_{t\in T_\ell}|\langle t,Y\rangle|^p\Big)^{1/p} \leq |T_\ell|^{1/p}\sup_{t\in T_\ell}\big(\mathbb{E}|\langle t,Y\rangle|^p\big)^{1/p} \]
\[ \leq \big(2^{2^\ell}\big)^{1/p}\sup_{t\in T}\big(\mathbb{E}|\langle t,Y\rangle|^p\big)^{1/p} \leq 16\sup_{t\in T}\big(\mathbb{E}|\langle t,Y\rangle|^p\big)^{1/p}, \]
where the last inequality follows from the choice of p. Since $\xi$ is an L-subgaussian vector, one obtains for the second term in (A.1)

\[ \mathbb{P}\Big(\sup_{t\in T}\sum_{r=\ell}^\infty|\langle\pi_{r+1}(t)-\pi_r(t),Y\rangle| \geq uL\sum_{r=\ell}^\infty 2^{r/2}\big\|(\langle\pi_{r+1}(t)-\pi_r(t),x_j\rangle)_{j=1}^n\big\|_2\Big) \]
\[ \leq \sum_{r=\ell}^\infty\sum_{t\in T_{r+1}}\sum_{t'\in T_r}\mathbb{P}\Big(\Big|\sum_{j=1}^n\xi_j\langle t-t',x_j\rangle\Big| \geq uL2^{r/2}\big\|(\langle t-t',x_j\rangle)_{j=1}^n\big\|_2\Big) \]
\[ \leq \sum_{r=\ell}^\infty 2^{2^{r+1}}\cdot 2^{2^r}\exp(-2^ru^2/2) \leq 2\exp(-2^\ell u^2/4) \leq 2\exp(-pu^2/2), \tag{A.2} \]
when $u \geq c$ for an appropriate choice of c (independent of $\ell$).

Therefore, by integration,
\[ \Big(\mathbb{E}\sup_{t\in T}\Big(\sum_{r=\ell}^\infty|\langle\pi_{r+1}(t)-\pi_r(t),Y\rangle|\Big)^p\Big)^{1/p} \lesssim_L \sup_{t\in T}\sum_{r=\ell}^\infty 2^{r/2}\big\|(\langle\pi_{r+1}(t)-\pi_r(t),x_j\rangle)_{j=1}^n\big\|_2 \lesssim_L \gamma_2(T',\|\cdot\|_2), \]
where $T' = \{(\langle t,x_j\rangle)_{j=1}^n : t \in T\}$. By the majorizing measures theorem,
\[ \gamma_2(T',\|\cdot\|_2) \lesssim \mathbb{E}\sup_{z\in T'}\Big|\sum_{j=1}^nz_jg_j\Big| = \mathbb{E}\sup_{t\in T}\Big|\sum_{j=1}^ng_j\langle t,x_j\rangle\Big| = \mathbb{E}\sup_{t\in T}|\langle t,G\rangle|, \]
which yields the claim.

A.2 Proof of Lemma 4.2

If $x \in \mathrm{conv}(U)$ then $x = \sum_{j=1}^N\theta_ju_j$ with $\theta_j \geq 0$, $\sum_{j=1}^N\theta_j = 1$. Let $Z \in X$ be a random vector which takes the value $u_j$ with probability $\theta_j$ for $j = 1,\dots,N$ and thus satisfies $\mathbb{E}Z = x$. Let L be a number to be determined later, let $Z_1,\dots,Z_L$ be independent copies of Z, and put
\[ Y = \frac{1}{L}\sum_{\ell=1}^LZ_\ell. \]


If $(\varepsilon_j)_{j=1}^L$ is a Rademacher sequence independent of $(Z_j)$ then by a standard symmetrization argument (see, e.g., [27]) and because $(Z_j)_{j=1}^L$ ranges over $U^L$,
\[ \mathbb{E}\|x - Y\|_X = \frac{1}{L}\mathbb{E}\Big\|\sum_{j=1}^L(x - Z_j)\Big\|_X \leq \frac{2}{L}\mathbb{E}\Big\|\sum_{j=1}^L\varepsilon_jZ_j\Big\|_X \leq 2A/\sqrt{L}. \]
Thus, for $L \sim (A/u)^2$ there exists a realization $y = \frac{1}{L}\sum_{\ell=1}^Lz_\ell$ of Y, for some $z_\ell \in U$, for which
\[ \|x - y\|_X \leq u. \]
As this argument applies for any $x \in \mathrm{conv}(U)$, any such x can be approximated by some y of this form. Since y can assume at most $N^L$ different values, this yields
\[ \log N(\mathrm{conv}(U),\|\cdot\|_X,u) \leq L\log N \leq c(A/u)^2\log N, \]
as claimed.

A.3 Restricted Isometry Property of Subgaussian Random Matrices

We finally show that our bound for chaos processes yields an alternative proof of the well-known fact that subgaussian matrices (including Bernoulli and Gaussian matrices) satisfy the restricted isometry property.

An $m\times n$ subgaussian matrix $\Phi$ takes the form
\[ \Phi_{jk} = \frac{1}{\sqrt{m}}\xi_{jk}, \qquad j = 1,\dots,m,\ k = 1,\dots,n, \]
where the $\xi_{jk}$ are independent, mean-zero, variance one, L-subgaussian random variables.

Theorem A.1 Let $\delta,\varepsilon \in (0,1)$. A random draw of an $m\times n$ subgaussian random matrix satisfies $\delta_s \leq \delta$ with probability at least $1-\varepsilon$ provided
\[ m \geq C\delta^{-2}\max\{s\log(en/s),\log(\varepsilon^{-1})\}. \tag{A.3} \]
The constant C > 0 depends only on the subgaussian parameter L.

Proof. We write
\[ \Phi x = V_x\xi, \]
where $\xi$ is an L-subgaussian random vector of length mn and $V_x$ is the $m\times nm$ block-diagonal matrix
\[ V_x = \frac{1}{\sqrt{m}}\begin{pmatrix} x^T & 0 & \cdots & 0\\ 0 & x^T & \cdots & 0\\ \vdots & & \ddots & \vdots\\ 0 & \cdots & 0 & x^T\end{pmatrix}. \]
Clearly, $\|V_x\|_F = \|x\|_2$ so that $\mathcal{A} := \{V_x : x \in D_{s,n}\}$ has radius
\[ d_F(\mathcal{A}) = 1. \]
Moreover, since the operator norm of a block-diagonal matrix is the maximum of the operator norms of the diagonal blocks and the operator norm of a vector is its $\ell_2$-norm we have
\[ \|V_x\|_{2\to 2} = \frac{1}{\sqrt{m}}\|x\|_2. \]


Hence, $d_{2\to 2}(\mathcal{A}) = \frac{1}{\sqrt{m}}$. The bound of the $\gamma_2$-functional via the Dudley type integral yields
\[ \gamma_2(\mathcal{A},\|\cdot\|_{2\to 2}) \leq C\int_0^{1/\sqrt{m}}\sqrt{\log N(D_{s,n},\|\cdot\|_2/\sqrt{m},u)}\,du = C\frac{1}{\sqrt{m}}\int_0^1\sqrt{\log N(D_{s,n},\|\cdot\|_2,u)}\,du. \]
The volumetric argument yields
\[ N(D_{s,n},\|\cdot\|_2,u) \leq \binom{n}{s}(1 + 2/u)^s \leq (en/s)^s(1 + 2/u)^s, \]
so that
\[ \gamma_2(\mathcal{A},\|\cdot\|_{2\to 2}) \leq C\sqrt{\frac{s}{m}}\Big(\sqrt{\log(en/s)} + \int_0^1\sqrt{\log(1 + 2/u)}\,du\Big) \leq C'\sqrt{\frac{s\log(en/s)}{m}}. \]
The claim follows then from Theorem 1.4.
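The block-diagonal reformulation at the heart of this proof is easy to verify numerically; the following sketch (illustrative dimensions, Gaussian entries) checks both $\Phi x = V_x\xi$ and $\|V_x\|_F = \|x\|_2$.

```python
# Minimal sketch (illustrative): Phi x = V_x xi with V_x the m x (nm) block-diagonal matrix.
import numpy as np
from scipy.linalg import block_diag

rng = np.random.default_rng(8)
m, n = 5, 12
xi = rng.standard_normal(m * n)                         # subgaussian entries, stacked row-wise
Phi = xi.reshape(m, n) / np.sqrt(m)                     # Phi_{jk} = xi_{jk} / sqrt(m)
x = rng.standard_normal(n)

V_x = block_diag(*[x.reshape(1, n)] * m) / np.sqrt(m)   # m x (n*m) block-diagonal matrix
print(np.allclose(Phi @ x, V_x @ xi))                                   # True
print(np.isclose(np.linalg.norm(V_x, "fro"), np.linalg.norm(x)))        # True
```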

References

[1] M. Arcones and E. Giné. On decoupling, series expansions, and tail behavior of chaos processes. J. Theoret. Probab., 6(1):101–122, 1993.

[2] W. Bajwa, J. Haupt, A. M. Sayeed, and R. Nowak. Compressed channel sensing: A new approach to estimating sparse multipath channels. Proc. IEEE, 98(6):1058–1076, June 2010.

[3] R. G. Baraniuk, M. Davenport, R. A. DeVore, and M. Wakin. A simple proof of the Restricted Isometry Property for random matrices. Constr. Approx., 28(3):253–263, 2008.

[4] T. Blumensath and M. Davies. Iterative hard thresholding for compressed sensing. Appl. Comput. Harmon. Anal., 27(3):265–274, 2009.

[5] S. Boucheron, G. Lugosi, and P. Massart. Concentration inequalities using the entropy method. Ann. Probab., 31(3):1583–1614, 2003.

[6] J. Bourgain and L. Tzafriri. Invertibility of 'large' submatrices with applications to the geometry of Banach spaces and harmonic analysis. Israel J. Math., 57(2):137–224, 1987.

[7] E. J. Candès, J. Romberg, and T. Tao. Robust uncertainty principles: exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory, 52(2):489–509, 2006.

[8] E. J. Candès, J. Romberg, and T. Tao. Stable signal recovery from incomplete and inaccurate measurements. Comm. Pure Appl. Math., 59(8):1207–1223, 2006.

[9] E. J. Candès and T. Tao. Near optimal signal recovery from random projections: universal encoding strategies? IEEE Trans. Inform. Theory, 52(12):5406–5425, 2006.

[10] B. Carl. Inequalities of Bernstein-Jackson-type and the degree of compactness of operators in Banach spaces. Ann. Inst. Fourier (Grenoble), 35(3):79–118, 1985.

[11] S. S. Chen, D. L. Donoho, and M. A. Saunders. Atomic decomposition by Basis Pursuit. SIAM J. Sci. Comput., 20(1):33–61, 1999.

[12] V. de la Peña and E. Giné. Decoupling. From Dependence to Independence. Probability and its Applications (New York). Springer-Verlag, New York, 1999.

[13] D. L. Donoho. Compressed sensing. IEEE Trans. Inform. Theory, 52(4):1289–1306, 2006.

[14] R. Dudley. Uniform central limit theorems, volume 63 of Cambridge Studies in Advanced Mathematics. Cambridge University Press, 1999.

[15] R. M. Dudley. The sizes of compact subsets of Hilbert space and continuity of Gaussian processes. J. Functional Analysis, 1:290–330, 1967.

[16] H. G. Feichtinger, F. Luef, and T. Werther. A guided tour from linear algebra to the foundations of Gabor analysis. In Gabor and Wavelet Frames, volume 10 of Lect. Notes Ser. Inst. Math. Sci. Natl. Univ. Singap., pages 1–49. World Sci. Publ., Hackensack, 2007.

[17] X. Fernique. Régularité des trajectoires des fonctions aléatoires gaussiennes. In École d'Été de Probabilités de Saint-Flour, IV-1974, pages 1–96. Lecture Notes in Math., Vol. 480. Springer, 1975.

[18] M. Fornasier and H. Rauhut. Compressive sensing. In O. Scherzer, editor, Handbook of Mathematical Methods in Imaging, pages 187–228. Springer, 2011.

[19] S. Foucart. Sparse recovery algorithms: sufficient conditions in terms of restricted isometry constants. In Proceedings of the 13th International Conference on Approximation Theory, 2010.

[20] S. Foucart. Hard thresholding pursuit: an algorithm for compressive sensing. SIAM J. Numer. Anal., 49(6):2543–2563, 2011.

[21] K. Gröchenig. Foundations of Time-Frequency Analysis. Appl. Numer. Harmon. Anal. Birkhäuser Boston, 2001.

[22] J. Haupt, W. Bajwa, G. Raz, and R. D. Nowak. Toeplitz compressed sensing matrices with applications to sparse channel estimation. IEEE Trans. Inform. Theory, 56(11):5862–5875, 2010.

[23] M. Herman and T. Strohmer. High-resolution radar via compressed sensing. IEEE Trans. Signal Process., 57(6):2275–2284, 2009.

[24] A. Hinrichs and J. Vybíral. Johnson-Lindenstrauss lemma for circulant matrices. Random Struct. Algorithms, 39(3):391–398, 2011.

[25] F. Krahmer, G. E. Pfander, and P. Rashkov. Uncertainty principles for time-frequency representations on finite abelian groups. Appl. Comput. Harmon. Anal., 25(2):209–225, 2008.

[26] F. Krahmer and R. Ward. New and improved Johnson-Lindenstrauss embeddings via the restricted isometry property. SIAM J. Math. Anal., 43(3):1269–1281, 2011.

[27] M. Ledoux and M. Talagrand. Probability in Banach Spaces. Springer-Verlag, Berlin, Heidelberg, New York, 1991.

[28] F. Lust-Piquard. Inégalités de Khintchine dans $C_p$ (1 < p < ∞). C. R. Acad. Sci. Paris Sér. I Math., 303:289–292, 1986.

[29] F. Lust-Piquard and G. Pisier. Noncommutative Khintchine and Paley inequalities. Ark. Mat., 29(2):241–260, 1991.

[30] S. Mendelson, A. Pajor, and N. Tomczak-Jaegermann. Reconstruction and subgaussian operators in asymptotic geometric analysis. Geom. Funct. Anal., 17(4):1248–1282, 2007.

[31] Q. Mo and S. Li. New bounds on the restricted isometry constant $\delta_{2k}$. Appl. Comput. Harmon. Anal., 31(3):460–468, 2011.

[32] G. E. Pfander and H. Rauhut. Sparsity in time-frequency representations. J. Fourier Anal. Appl., 16(2):233–260, 2010.

[33] G. E. Pfander, H. Rauhut, and J. Tanner. Identification of matrices having a sparse representation. IEEE Trans. Signal Process., 56(11):5376–5388, 2008.

[34] G. E. Pfander, H. Rauhut, and J. A. Tropp. The restricted isometry property for time-frequency structured random matrices. Probab. Theory Relat. Fields, to appear.

[35] G. Pisier. Some applications of the metric entropy condition to harmonic analysis. In Banach spaces, harmonic analysis, and probability theory (Storrs, Conn., 1980/1981), volume 995 of Lecture Notes in Math., pages 123–154. Springer, 1983.

[36] H. Rauhut. Circulant and Toeplitz matrices in compressed sensing. In Proc. SPARS'09, Saint-Malo, France, 2009.

[37] H. Rauhut. Compressive sensing and structured random matrices. In M. Fornasier, editor, Theoretical Foundations and Numerical Methods for Sparse Recovery, volume 9 of Radon Series Comp. Appl. Math., pages 1–92. deGruyter, 2010.

[38] H. Rauhut, J. K. Romberg, and J. A. Tropp. Restricted isometries for partial random circulant matrices. Appl. Comput. Harmon. Anal., 32(2):242–254, 2012.

[39] J. K. Romberg. Compressive sensing by random convolution. SIAM J. Imaging Sci., 2(4):1098–1128, 2009.

[40] M. Rudelson. Random vectors in the isotropic position. J. Funct. Anal., 164(1):60–72, 1999.

[41] M. Rudelson and R. Vershynin. On sparse reconstruction from Fourier and Gaussian measurements. Comm. Pure Appl. Math., 61:1025–1045, 2008.

[42] M. Talagrand. Regularity of Gaussian processes. Acta Math., 159(1-2):99–149, 1987.

[43] M. Talagrand. New concentration inequalities in product spaces. Invent. Math., 126(3):505–563, 1996.

[44] M. Talagrand. The Generic Chaining. Springer Monographs in Mathematics. Springer-Verlag, 2005.

[45] J. Tropp and D. Needell. CoSaMP: Iterative signal recovery from incomplete and inaccurate samples. Appl. Comput. Harmon. Anal., 26(3):301–321, 2008.

[46] R. Vershynin. Introduction to the non-asymptotic analysis of random matrices. In Y. Eldar and G. Kutyniok, editors, Compressed Sensing: Theory and Applications, pages xii+544. Cambridge Univ Press, Cambridge, 2012.

[47] J. Vybíral. A variant of the Johnson-Lindenstrauss lemma for circulant matrices. J. Funct. Anal., 260(4):1096–1105, 2011.

[48] T. Zhang. Sparse recovery with orthogonal matching pursuit under RIP. IEEE Trans. Inform. Theory, 57(9):6215–6221, 2011.