
Random matrices with independent rows or columns



Page 1: Random matrices with independent rows or columns

Random matrices with independent rows or columns

Nicole Tomczak-Jaegermann

Phenomena in High Dimensions in geometric analysis, random matrices, and computational geometry

Roscoff, June 25–29, 2012


Page 2: Random matrices with independent rows or columns

Project on random matrices with independent rows or columns; norms and condition numbers of their submatrices.

Involved in this project: Alain Pajor and various subsets of:

Radosław Adamczak, Olivier Guédon, Rafał Latała, Alexander Litvak, Krzysztof Oleszkiewicz, Nicole Tomczak-Jaegermann


Page 3: Random matrices with independent rows or columns

Basic definitions and Notation

Let X = (X(1), . . . , X(N)) be a random vector in R^N with full-dimensional support. We say that the distribution of X is

logarithmically concave, if X has a density of the form e^{−h(x)} with h : R^N → (−∞,∞] convex (one of the equivalent definitions, due to C. Borell);

isotropic, if E X(i) = 0 and E X(i)X(j) = δ_{i,j}.

For x ∈ R^N we put |x| = ‖x‖₂ = ( ∑_{i=1}^N x_i² )^{1/2}.

P_I x - the canonical projection of x onto {y ∈ R^N : supp(y) ⊂ I}, for I ⊂ {1, . . . , N}.

For integers k ≤ ℓ we use the shorthand notation [k, ℓ] = {k, . . . , ℓ}.


Page 4: Random matrices with independent rows or columns

Examples

1. Let K ⊂ R^n be a convex body (= compact convex, with non-empty interior; symmetric means −K = K), and let X be a random vector uniformly distributed in K. Then the corresponding probability measure on R^n,

μ_K(A) = |K ∩ A| / |K|,

is log-concave (by Brunn-Minkowski). Moreover, for every convex body K there exists an affine map T such that μ_{TK} is isotropic.

2. The Gaussian vector G = (g₁, . . . , gₙ), where the g_i's are independent with N(0,1) distribution, is isotropic and log-concave.

3. Similarly, the vector X = (ξ₁, . . . , ξₙ), where the ξ_i's are independent with symmetric exponential distribution (i.e., with density f(t) = (1/√2) exp(−√2 |t|), for t ∈ R), is isotropic and log-concave.
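A quick numerical sanity check of Examples 2 and 3 (a sketch in NumPy; the Laplace scale 1/√2 below matches the density f(t) = (1/√2) exp(−√2|t|) above, which has mean 0 and variance 1):

```python
import numpy as np

rng = np.random.default_rng(0)
n, samples = 5, 200_000

# Example 2: standard Gaussian vector G = (g_1, ..., g_n).
G = rng.standard_normal((samples, n))

# Example 3: independent coordinates with density (1/sqrt(2)) exp(-sqrt(2)|t|),
# i.e. a Laplace distribution with scale 1/sqrt(2); its variance is 2*scale^2 = 1.
X = rng.laplace(scale=1 / np.sqrt(2), size=(samples, n))

for name, V in [("Gaussian", G), ("Exponential", X)]:
    mean = V.mean(axis=0)            # should be ~0:   E X(i) = 0
    cov = V.T @ V / samples          # should be ~Id:  E X(i)X(j) = delta_{i,j}
    print(name, np.abs(mean).max(), np.abs(cov - np.eye(n)).max())
```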


Page 5: Random matrices with independent rows or columns

Random Matrices

Let n, N ≥ 1 be integers (a priori, with no relation between them), fixed throughout. Our interest is in the behaviour of invariants as functions of n and N.

Random matrix: A is an n×N matrix, defined either by a sequence of rows or by a sequence of columns, which will be independent random vectors.

[Schematic: A written either as a stack of independent row vectors or as a list of independent column vectors.]

This differs from classical RMT, where the entries are independent and one studies the limiting behaviour of invariants as the size → ∞.


Page 6: Random matrices with independent rows or columns

Random Matrices, Norms of submatrices, A_{k,m}

Let k ≤ n and m ≤ N be integers.

A_{k,m} = the maximal operator norm over submatrices of A with k rows and m columns.

Here “operator norm” means the norm of A : R^m → R^k with the Euclidean norms.

Example: Let X₁, . . . , Xₙ ∈ R^N be independent random vectors, and let A be the n×N random matrix with rows X₁, . . . , Xₙ. It acts as an operator

A : R^N → R^n, Ax = ( ⟨X_j, x⟩ )_{j=1}^n ∈ R^n, for x ∈ R^N.

A_{k,m} = sup_{J⊂[1,n], |J|=k} sup_{x∈U_m} ( ∑_{j∈J} |⟨X_j, x⟩|² )^{1/2},

where U_m = {x ∈ S^{N−1} : |supp x| ≤ m}.
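For tiny sizes, A_{k,m} can be computed by brute force directly from the definition: it is the largest spectral norm over all submatrices with k rows and m columns (the supremum over U_m restricted to a fixed support I is the norm of the corresponding k×m submatrix). A sketch, with Laplace entries standing in for an isotropic log-concave distribution and arbitrary small sizes:

```python
import numpy as np
from itertools import combinations

def A_km(A, k, m):
    """Brute-force A_{k,m}: max spectral norm over k x m submatrices.
    Feasible only for tiny n, N (exponentially many subsets)."""
    n, N = A.shape
    best = 0.0
    for J in combinations(range(n), k):        # choices of k rows
        for I in combinations(range(N), m):    # supports of m-sparse x
            best = max(best, np.linalg.norm(A[np.ix_(J, I)], 2))
    return best

rng = np.random.default_rng(1)
A = rng.laplace(scale=1 / np.sqrt(2), size=(6, 8))  # isotropic log-concave entries
print(A_km(A, k=3, m=2))
```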


Page 7: Random matrices with independent rows or columns

A_{k,m} – examples: independent columns

I: A is an n×N matrix defined by independent isotropic log-concave columns

[Schematic: A as a list of independent column vectors.]

The main application is to the approximation of a covariance matrix by empirical covariance matrices. There A_{n,m} (i.e., k = n) was sufficient; it corresponds to submatrices consisting of full columns, thus preserving the structure of the matrix.


Page 8: Random matrices with independent rows or columns

Approximation of a covariance matrix

Let X ∈ R^n be isotropic and log-concave, and let (X_i)_{i≤N} be independent copies of X. By isotropicity, E X ⊗ X = Id.

By the law of large numbers, the empirical covariance matrix converges to Id:

(1/N) ∑_{i=1}^N X_i ⊗ X_i → Id as N → ∞.

Kannan-Lovász-Simonovits asked (around 1995), motivated by a problem of complexity in computing volume in high dimension: under the above assumptions, estimate the size N for which, given ε ∈ (0,1),

‖ (1/N) ∑_{i=1}^N X_i ⊗ X_i − Id ‖ ≤ ε

holds with high probability.

This is a typical “translation” of a limit law into a quantitative statement in the non-limit theory.


Page 9: Random matrices with independent rows or columns

KLS question

KLS showed that for any ε, δ ∈ (0,1) (under a finite third moment assumption), N ≥ (C/εδ) n² gives the required approximation, with probability 1 − δ.

Bourgain (1996): for any ε, δ ∈ (0,1), there exists C(ε,δ) > 0 such that N = C(ε,δ) n log³ n gives the approximation with probability 1 − δ.

Rudelson: using the non-commutative Khinchine inequalities of Pisier and Lust-Piquard/Pisier, and the majorizing measure approach of Talagrand.

Several other authors improved the powers of the logarithm, from the late 1990s to 2010.

ALPT: N proportional to n is sufficient (JAMS 2010), improved in CRAS 2011. Let X ∈ R^n be isotropic log-concave and X₁, . . . , X_N independent copies of X. Then

P( ‖ (1/N) ∑_{i=1}^N X_i ⊗ X_i − Id ‖ ≤ C √(n/N) ) ≥ 1 − e^{−c√n}.

So letting ε = C√(n/N) we get N = Cn/ε².
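The √(n/N) rate is easy to observe numerically; a sketch for Gaussian X (one isotropic log-concave example; the sample sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50
for N in [200, 800, 3200, 12800]:
    X = rng.standard_normal((N, n))          # N independent copies of X in R^n
    emp = X.T @ X / N                        # (1/N) sum_i X_i (x) X_i
    err = np.linalg.norm(emp - np.eye(n), 2) # operator-norm distance to Id
    print(N, round(err, 3), round(np.sqrt(n / N), 3))  # err tracks C*sqrt(n/N)
```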


Page 10: Random matrices with independent rows or columns

Extremal s-numbers of matrices with independent rows

As a corollary of the ALPT result we get a quantitative version of the Bai-Yin theorem for matrices of a fixed size: if Γ is the N×n matrix with rows X₁, . . . , X_N as above, then with probability at least 1 − e^{−c√n} all its s-numbers lie in the interval [√N − C√n, √N + C√n], i.e.,

√N − C√n ≤ s_min(Γ) ≤ s_max(Γ) ≤ √N + C√n.
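This too is visible in a direct simulation; a sketch comparing the extreme singular values against √N ± C√n, with Laplace rows as the log-concave test case:

```python
import numpy as np

rng = np.random.default_rng(3)
n, N = 100, 10_000
A = rng.laplace(scale=1 / np.sqrt(2), size=(n, N))  # independent isotropic log-concave rows
s = np.linalg.svd(A, compute_uv=False)
print("s_min, s_max:        ", s[-1], s[0])
print("sqrt(N) -/+ sqrt(n): ", np.sqrt(N) - np.sqrt(n), np.sqrt(N) + np.sqrt(n))
```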


Page 11: Random matrices with independent rows or columns

A_{k,m} – examples: independent rows

II: A is an n×N matrix, defined by independent (isotropic log-concave) rows

[Schematic: A as a stack of independent row vectors.]

A_{k,m} is harder to tackle if m < N, because the row structure of A is destroyed.

A_{k,m} = sup_{J⊂[1,n], |J|=k} sup_{x∈U_m} ( ∑_{j∈J} |⟨X_j, x⟩|² )^{1/2}.

This is applicable to the study of reconstruction problems, in particular RIP, and to uniform versions of some geometric questions on large deviation estimates.


Page 12: Random matrices with independent rows or columns

Large Deviation for A_{k,m}

Intuition: let A be a random matrix with independent isotropic log-concave rows. Then for a submatrix A_{J,I} with k rows and m columns,

( E ‖A_{J,I}‖² )^{1/2} ≥ √(max{k, m}).

One of the main results of ALLPT is a large deviation theorem for A_{k,m}.

Let n, N ≥ 1, k ≤ n and m ≤ N, and let A be an n×N matrix with independent isotropic log-concave rows. For t ≥ 1 we have

P( A_{k,m} ≥ C t λ ) ≤ exp( −tλ / √log(3m) ),

where

λ = √(log log(3m)) · √m · log( e max{N,n} / m ) + √k · log( en/k )

and C is a universal constant.

The bound is essentially optimal, up to the √(log log m) factor.


Page 13: Random matrices with independent rows or columns

Paouris’ large deviation theorem

Paouris' large deviation theorem (2005): there exists c > 0 such that if X is an isotropic log-concave random vector in R^N, then for all t ≥ 1,

P( |X| ≥ c t √N ) ≤ exp( −t √N ).

There are equivalent formulations, for p-th moments, etc.

Weak parameter. For a vector X in R^N we define

σ_X(p) := sup_{t∈S^{N−1}} ( E |⟨t, X⟩|^p )^{1/p}, p ≥ 1.

Examples:

For isotropic log-concave vectors X, σ_X(p) ≤ p/√2.

For subgaussian vectors X, σ_X(p) ≤ C√p.
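Both growth rates can be seen by Monte Carlo. For i.i.d. coordinates a coordinate direction is extremal up to constants, so this sketch only evaluates the moments along e₁ rather than the full supremum over S^{N−1}:

```python
import numpy as np

rng = np.random.default_rng(4)
samples = 1_000_000
g = rng.standard_normal(samples)                       # subgaussian coordinate <e_1, G>
xi = rng.laplace(scale=1 / np.sqrt(2), size=samples)   # log-concave coordinate <e_1, X>

for p in [2, 4, 8]:
    mg = (np.abs(g) ** p).mean() ** (1 / p)    # grows like C*sqrt(p)
    mx = (np.abs(xi) ** p).mean() ** (1 / p)   # grows like C*p
    print(p, mg / np.sqrt(p), mx / p)          # both ratios stay bounded
```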


Page 14: Random matrices with independent rows or columns

Paouris’ theorem with weak parameter

For any log-concave random vector X,

( E |X|^p )^{1/p} ≤ C ( E |X| + σ_X(p) ) for p ≥ 2,

and if X is isotropic,

P( |X| ≥ t ) ≤ exp( −σ_X^{-1}( t/C ) ) for t ≥ C ( E |X|² )^{1/2}.


Page 15: Random matrices with independent rows or columns

Uniform large deviation theorem

Uniform Paouris-type theorem [ALLPT]: for 1 ≤ m ≤ N and an isotropic log-concave vector X in R^N we have, for t ≥ 1,

P( sup_{I⊂[1,N], |I|=m} |P_I X| ≥ c t √m log(eN/m) ) ≤ exp( −σ_X^{-1}( (t√m / √log(em)) · log(eN/m) ) ).

If X is isotropic log-concave in R^N, then so is P_I X, for every I ⊂ [1,N]. However, the probability is too high to beat the complexity of the family of subsets (which is (N choose m)), so a direct union bound argument cannot be used.

The trade-off is an extra logarithm in the threshold; the proof is based on new non-trivial estimates for order statistics.


Page 16: Random matrices with independent rows or columns

Order Statistics

For an N-dimensional random vector X we denote by X*₁ ≥ X*₂ ≥ . . . ≥ X*_N the nonincreasing rearrangement of |X(1)|, . . . , |X(N)|.

In particular, X*₁ = max{|X(1)|, . . . , |X(N)|} and X*_N = min{|X(1)|, . . . , |X(N)|}. The random variables X*_k, 1 ≤ k ≤ N, are called the order statistics of X.

Problem: find an upper bound for P( X*_k ≥ t ).
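In code, order statistics are just the sorted absolute values, and the tail P(X*_k ≥ t) can be estimated empirically; a sketch:

```python
import numpy as np

rng = np.random.default_rng(5)
N, trials = 100, 20_000
X = rng.laplace(scale=1 / np.sqrt(2), size=(trials, N))  # isotropic log-concave samples

Xstar = np.sort(np.abs(X), axis=1)[:, ::-1]  # per row: X*_1 >= X*_2 >= ... >= X*_N

k, t = 5, 2.0
print("P(X*_k >= t) ~", (Xstar[:, k - 1] >= t).mean())
```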


Page 17: Random matrices with independent rows or columns

Order Statistics for isotropic log-concave vectors

Let X be an N-dimensional isotropic log-concave vector. Then

P( X*_k ≥ t ) ≤ exp( −σ_X^{-1}( t√k / C ) ) for t ≥ C log( eN/k ).

The weak parameter is needed for a better control of the probability for random vectors which are sums of independent random vectors, in terms of the sequences of coefficients in these sums. Latała (2010) proved a version without the weak parameter; ALLPT (2012) proved the present version.

The approach is based on a suitable estimate of the moments of the process N_X(t),

N_X(t) := ∑_{i=1}^N 1_{{X(i) ≥ t}}, t > 0.

That is, N_X(t) is the number of coordinates of X larger than or equal to t.


Page 18: Random matrices with independent rows or columns

Estimate for NX

For any isotropic log-concave vector X and p ≥ 1 we have

E ( t² N_X(t) )^p ≤ ( C σ_X(p) )^{2p} for t ≥ C log( N t² / σ_X²(p) ).

To get the estimate for order statistics we observe that X*_k ≥ t implies N_X(t) ≥ k/2 or N_{−X}(t) ≥ k/2, and the vector −X is also isotropic and log-concave. The estimate for N_X and Chebyshev's inequality give

P( X*_k ≥ t ) ≤ (2/k)^p ( E N_X(t)^p + E N_{−X}(t)^p ) ≤ 2 ( C σ_X(p) / (t√k) )^{2p},

provided that t ≥ C log( N t² / p² ). We take p = σ_X^{-1}( t√k / (eC) ) and notice that the restriction on t follows from the assumption that t ≥ C log( eN/k ).
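Written out, the choice of p simply makes the base of the 2p-th power equal to e^{−1}; a sketch of the step under the conventions above (σ_X^{-1} nondecreasing, C′ a new constant):

```latex
% choose p so that C\sigma_X(p)/(t\sqrt{k}) = e^{-1}, i.e. p = \sigma_X^{-1}(t\sqrt{k}/(eC)):
\[
  2\Bigl(\tfrac{C\sigma_X(p)}{t\sqrt{k}}\Bigr)^{2p}
  = 2e^{-2p}
  = 2\exp\Bigl(-2\,\sigma_X^{-1}\bigl(t\sqrt{k}/(eC)\bigr)\Bigr)
  \le \exp\Bigl(-\sigma_X^{-1}\bigl(t\sqrt{k}/C'\bigr)\Bigr),
\]
% the factor 2 being absorbed since t\sqrt{k} is bounded below on the range t >= C log(eN/k).
```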


Page 19: Random matrices with independent rows or columns

Estimate for NX(t)

The proof of the estimate for N_X(t) is based on two ideas:

the restriction of a log-concave vector X to a convex set is log-concave;

Paouris' large deviation theorem.


Page 20: Random matrices with independent rows or columns

Uniform Paouris-type estimate

For any m ≤ N and any isotropic log-concave vector X in R^N we have, for t ≥ 1,

P( sup_{I⊂[1,N], |I|=m} |P_I X| ≥ c t √m log(eN/m) ) ≤ exp( −σ_X^{-1}( (t√m / √log(em)) · log(eN/m) ) ).

Idea of the proof. The first step is easy:

sup_{I⊂[1,N], |I|=m} |P_I X| = ( ∑_{k=1}^m (X*_k)² )^{1/2} ≤ 2 ( ∑_{i=0}^{s−1} 2^i (X*_{2^i})² )^{1/2},

where s = ⌈log₂ m⌉.
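Both the identity and the dyadic bound are easy to check numerically; a sketch:

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(6)
N, m = 12, 5
x = rng.laplace(scale=1 / np.sqrt(2), size=N)

# sup over |I| = m of |P_I x|, by brute force ...
sup_proj = max(np.linalg.norm(x[list(I)]) for I in combinations(range(N), m))

# ... equals the Euclidean norm of the m largest |coordinates|:
xstar = np.sort(np.abs(x))[::-1]
top_m = np.linalg.norm(xstar[:m])

# dyadic upper bound with s = ceil(log2 m); xstar[2**i - 1] is X*_{2^i}
s = int(np.ceil(np.log2(m)))
dyadic = 2 * np.sqrt(sum(2**i * xstar[2**i - 1] ** 2 for i in range(s)))

print(sup_proj, top_m, dyadic)  # sup_proj == top_m <= dyadic
```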


Page 21: Random matrices with independent rows or columns

Applications – reconstruction, compressed sensing

Let n, N ≥ 1, let T ⊂ R^N, and let Γ be an n×N matrix.

Consider any vector x ∈ T. Assuming that Γx is known, the problem is to reconstruct x with a fast algorithm.

Hypotheses on T and on Γ. The common hypothesis is that T = U_m.

The Restricted Isometry Property (RIP) of order m: for all m-sparse vectors x,

(1 − δ) |x| ≤ |Γx| ≤ (1 + δ) |x|.

The RIP parameter:

δ_m = δ_m(Γ) = sup_{x∈U_m} | |Γx|² − E |Γx|² |.

Introduced by E. Candès, J. Romberg and T. Tao around 2006. If δ_{2m} is appropriately small, then every m-sparse vector x can be reconstructed from Γx by the ℓ₁-minimization method.
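For small sizes δ_m(Γ/√n) can again be computed by brute force: on a fixed support I the supremum over unit vectors is governed by the extreme singular values of the corresponding column submatrix. A sketch (for isotropic rows, E|Γx|²/n = |x|² = 1 on U_m):

```python
import numpy as np
from itertools import combinations

def delta_m(Gamma, m):
    """Brute-force RIP parameter of Gamma/sqrt(n):
    sup over m-sparse unit x of | |Gamma x|^2 / n - 1 |."""
    n, N = Gamma.shape
    worst = 0.0
    for I in combinations(range(N), m):
        s = np.linalg.svd(Gamma[:, list(I)] / np.sqrt(n), compute_uv=False)
        worst = max(worst, abs(s[0] ** 2 - 1), abs(s[-1] ** 2 - 1))
    return worst

rng = np.random.default_rng(7)
n, N, m = 60, 12, 3
Gamma = rng.laplace(scale=1 / np.sqrt(2), size=(n, N))  # independent isotropic log-concave rows
print(delta_m(Gamma, m))
```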


Page 22: Random matrices with independent rows or columns

More notation

We want upper estimates for

δ_m = δ_m(Γ) = sup_{x∈U_m} | |Γx|² − E |Γx|² |,

and more generally, for any T ⊂ S^{N−1},

δ_T(Γ) = sup_{x∈T} | |Γx|² − E |Γx|² |.

Let X₁, . . . , Xₙ ∈ R^N be independent and let Γ be the n×N matrix with rows X_i. (In reconstruction problems we look for vectors given by their measurements.)

Let 1 ≤ k ≤ n and define the parameter Γ_k(T) by

Γ_k(T)² = sup_{y∈T} sup_{I⊂{1,...,n}, |I|=k} ∑_{i∈I} |⟨X_i, y⟩|².

We write Γ_{k,m} = Γ_k(U_m); it agrees with the definition of A_{k,m} introduced earlier.


Page 23: Random matrices with independent rows or columns

Fundamental Lemma:

[ALPT, CRAS], [ALLPT]: Let X₁, . . . , Xₙ ∈ R^N be independent isotropic random vectors and let T ⊂ S^{N−1} be finite. Let 0 < θ < 1 and B ≥ 1. Then with probability at least 1 − |T| exp( −3θ²n / 8B² ),

δ_T( Γ/√n ) = sup_{y∈T} | (1/n) ∑_{i=1}^n ( |⟨X_i, y⟩|² − E |⟨X_i, y⟩|² ) |

≤ θ + (1/n) ( sup_{y∈T} ∑_{i=1}^n |⟨X_i, y⟩|² 1_{{|⟨X_i,y⟩| > B}} + sup_{y∈T} E ∑_{i=1}^n |⟨X_i, y⟩|² 1_{{|⟨X_i,y⟩| > B}} )

≤ θ + (1/n) ( Γ_k(T)² + E Γ_k(T)² ),

where k ≤ n is the largest integer satisfying k ≤ ( Γ_k(T)/B )².


Page 24: Random matrices with independent rows or columns

Corollary for RIP:

Let X_i, Γ, 0 < θ < 1 and B ≥ 1 be as before. Assume that m ≤ N satisfies

m log( 11eN/m ) ≤ 3θ²n / 16B².

Then with probability at least 1 − exp( −3θ²n / 16B² ) one has

δ_m( Γ/√n ) = sup_{y∈U_m} | (1/n) ∑_{i=1}^n ( |⟨X_i, y⟩|² − E |⟨X_i, y⟩|² ) | ≤ 2θ + (2/n) ( Γ²_{k,m} + E Γ²_{k,m} ),

where k ≤ n is the largest integer satisfying k ≤ ( Γ_{k,m}/B )².


Page 25: Random matrices with independent rows or columns

RIP Theorem for matrices with independent rows:

Let n, N ≥ 1 and 0 < θ < 1. Let Γ be an n×N matrix whose rows are independent isotropic log-concave random vectors X_i, i ≤ n. There exists an absolute constant c > 0 such that if m ≤ N satisfies

m log log(3m) ( log( 3 max{N,n} / m ) )² ≤ c ( θ / log(3/θ) )² n,

then

δ_m( Γ/√n ) ≤ θ

with high probability.

This is optimal up to the log log factor. For unconditional distributions we know that this factor can be removed; we conjecture that in general it can be removed as well.


Page 26: Random matrices with independent rows or columns

Return to Large Deviation for A_{k,m}

Recall the result for A_{k,m}. For n ≤ N, k ≤ n, m ≤ N,

A_{k,m} = sup_{J⊂[1,n], |J|=k} sup_{x∈U_m} ( ∑_{j∈J} |⟨X_j, x⟩|² )^{1/2}.

Then for t ≥ 1 we have

P( A_{k,m} ≥ C t λ ) ≤ exp( −tλ / √log(3m) ),

where

λ = √(log log(3m)) · √m · log( eN/m ) + √k · log( en/k ).


Page 27: Random matrices with independent rows or columns

A_{k,m}, idea of proof

To bound A_{k,m} one has to prove uniformity with respect to two families of a different character: one being the family of subsets {J ⊂ [1,n] : |J| = k}, the other the set of sparse vectors U_m ⊂ R^N.

Let X₁, . . . , Xₙ be independent isotropic N-dimensional log-concave vectors, and let x = (x_i) ∈ R^n satisfy some structural assumptions, like sparsity. We consider Y = ∑_{i=1}^n x_i X_i.

By duality we need to estimate, for every t > 0, the probability

P( sup_{J⊂[1,N], |J|=m} | P_J( ∑_{i=1}^n x_i X_i ) | ≥ t ) = P( sup_{J⊂[1,N], |J|=m} |P_J Y| ≥ t ),

with bounds depending on the norms |x| and ‖x‖_∞. The complexity of these families is too high for a union bound argument, so we need to come up with some chaining.


Page 28: Random matrices with independent rows or columns

A_{k,m}, continuation

This leads us to distinguish two cases, depending on the relation between k and k′, where

k′ = inf{ ℓ ≥ 1 : m log(eN/m) ≤ ℓ log(en/ℓ) }.

Step 1: the case k ≥ k′, which we reduce to the case k ≤ k′.
Step 2: the case k ≤ k′.

To build intuition one may take k′ ∼ k.
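Since ℓ log(en/ℓ) is nondecreasing for ℓ ≤ n, the threshold k′ can be computed by a direct scan; a sketch:

```python
import numpy as np

def k_prime(m, n, N):
    """k' = inf{ l >= 1 : m log(eN/m) <= l log(en/l) }."""
    target = m * np.log(np.e * N / m)
    for l in range(1, n + 1):   # l*log(en/l) increases in l on [1, n]
        if l * np.log(np.e * n / l) >= target:
            return l
    return None  # no such l <= n

print(k_prime(m=20, n=1000, N=100_000))
```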


Page 29: Random matrices with independent rows or columns

A_{k,m}, Step 1

Step 1. We take only the family of k-sparse vectors, but do not need projections:

{ sup_{x∈U_k} | ∑_{i=1}^n x_i X_i | ≥ t }.

Assume first that x is a “flat” vector: x_i = ±a or 0, with a = k^{−1/2} and k = |supp(x)|; that is, |x| = 1 and ‖x‖_∞ = k^{−1/2}. A direct argument shows that the estimate holds for such vectors.

In general we may assume that 0 < |x₁| ≤ |x₂| ≤ . . . ≤ |x_k| and x_j = 0 for j > k; such a vector can be built out of a number of “flat” vectors. The first natural attempt is to consider each flat piece separately and then add the results together. This works, but may produce an extra logarithmic factor.


Page 30: Random matrices with independent rows or columns

A_{k,m}, Step 1, chaining

Chaining: let k₁ ∼ k/2, k₂ ∼ k/4, . . . , k_s ∼ k/2^s ∼ k′, so that the k_j sum (almost) to k.

Given x ∈ U_k, let x₁ be the restriction of x to the k₁ smallest coordinates, x₂ the restriction of x to the next k₂ smallest coordinates, and so on. This way,

x = ∑_{i=1}^s x_i,

where the x_i's have mutually disjoint supports, each of cardinality ≤ k_i, and the coordinates of x_i are smaller in magnitude than those of x_j whenever i < j. We use Paouris-type estimates for each x_i. This is similar to ALPT (JAMS).
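The decomposition itself is straightforward: sort the support by coordinate magnitude and peel off blocks of (roughly) sizes k/2, k/4, . . . A sketch (the exact block sizes are one concrete choice; the slides fix them only up to constants):

```python
import numpy as np

def dyadic_blocks(x):
    """Split a k-sparse vector x into pieces x_1, ..., x_s with mutually
    disjoint supports of sizes ~k/2, ~k/4, ..., taking the smallest
    |coordinates| first, so that x == sum of the pieces."""
    supp = np.flatnonzero(x)
    order = supp[np.argsort(np.abs(x[supp]))]   # smallest |x_i| first
    pieces, start = [], 0
    size = (len(supp) + 1) // 2
    while start < len(order):
        idx = order[start:start + max(size, 1)]
        piece = np.zeros_like(x)
        piece[idx] = x[idx]
        pieces.append(piece)
        start += len(idx)
        size //= 2
    return pieces

x = np.zeros(32)
x[:16] = np.linspace(0.05, 1.0, 16)
x /= np.linalg.norm(x)                           # x in U_16
pieces = dyadic_blocks(x)
print([int((p != 0).sum()) for p in pieces])     # block sizes ~ 8, 4, 2, 1, 1
print(np.allclose(x, sum(pieces)))               # the pieces reassemble x
```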


Page 31: Random matrices with independent rows or columns

A_{k,m}, Step 2

Step 2. Another chaining argument, more delicate in the definitions of the ε-nets. We use the uniform estimate for projections of sums, which in general is weaker than in Step 1. It is at this step that we lose the log log m factor.


Page 32: Random matrices with independent rows or columns

Congratulations Alain!
