Largest Eigenvalues and Eigenvectors in … Eigenvalues and Eigenvectors in Multivariate Analysis Iain Johnstone, Statistics, Stanford [email protected] NESS, April 17, 2010 NESS 4/10

Largest Eigenvalues and Eigenvectors in MultivariateAnalysis

Iain Johnstone, Statistics, Stanford

[email protected]

NESS, April 17, 2010

NESS 4/10 – p.1

Outline

Fisher Hotelling Wigner Tracy Widom

NESS 4/10 – p.2

Outline

Fisher Hotelling Wigner Tracy Widom

I. Setting : Single & Double Wishart, examples

II. Largest Eigenvalue: Limiting Law ( H0, some HA)

III. Largest Eigenvalue: Concentration

IV. Largest Eigenvector(s)

Thanks to: Peter Forrester, Zongming Ma, Debashis Paul, Zongming Ma, P atrick

Perry, Morteza Shahram, NSF, NIH

NESS 4/10 – p.2

Eigenvalues: Theme

N Many problems of classical multivariate statistics involv e

eigenvalues x1 > x2 > . . . > xp

N Asymptotics in p⇒ useful information, [even p small]

N Illustrate for null distributions for x1

NESS 4/10 – p.3

Eigenvalues: Theme

N Many problems of classical multivariate statistics involv e

eigenvalues x1 > x2 > . . . > xp

N Asymptotics in p⇒ useful information, [ espec. p small]

N Illustrate for null distributions for x1

Example: (multiple) Regression.

y = X β + ǫn×p n×q q×p n×p

ǫ ∼ N(0, In ⊗ Σp).

Tests of H0 : β = 0 use roots of det[A− xi(A+B)] = 0.

A = SSH = βTXTXβ “Hypothesis”

B = SSE = yT (I − PX)y “Error”

NESS 4/10 – p.4

Example: PCA & population structure from genetic data

Patterson et. al. (2006), Price et. al. (2006)

n = #individuals , p = #SNPs/markers

yij = (normalized) allele count ,

case i = 1, . . . , n, marker j = 1, . . . , p.

A = n× sample covariance matrix of yij

N Eigenvalues x1 > x2 > . . . > xmin(n,p)

N How many xi are significant?

N Under H0, distribution of x1 if A ∼ Wp(n, I)?

NESS 4/10 – p.5

Wishart Distribution

Y = (Yij) n× p

Rows yi = (Yij)indep∼ Np(µ,Σ)

Definition: (unnormalized) sample covariance

A =

n∑

i=1

(yi − y)T (yi − y) ∼ Wp(n− 1,Σ)

p variables, n− 1 degrees of freedom.

[ If µ = 0, A = Y TY ∼ Wp(n,Σ) ]

NESS 4/10 – p.6

Wishart Eigenvalues

2 draws of eigenvalues from W15(60, I)

– Spreading of sample eigenvalues from 4 to [1,9].

2 draws of 15 independent U(1, 9) variates – very different!

wresvec

wreplic

1

2

2 4 6 8

order statsorder stats

1

2

wishart-eigenvalueswishart-eigenvalues

NESS 4/10 – p.7

The Quarter Circle Law

Description of spreading phenomenon:

Empirical distribution function: for eigenvalues {ℓi}pj=1

Gp(t) = p−1#{ℓj ≤ t} → G(t) = g(t)dt.

Marcenko-Pastur, (67) For A ∼ Wp(n, I) p/n → γ

For Σ = I ,

gMP (t) =

√

(b+ − t)(t− b−)

2πγt,

b± = (1±√γ)2.

0 0.5 1 1.5 2 2.5 3 3.5 40

0.5

1

1.5

n=4pn=p

NESS 4/10 – p.8

Double Wishart Setting

A ∼ Wp(n1, I)

B ∼ Wp(n2, I)

2 independent Wisharts, p ≤ n1, n2

“null hypothesis” setting

Common feature: roots := (xi)pi=1 of generalized eigenproblem:

det[A− xi(A+B)] = 0, [Double]

det[A− xiI] = 0. [Single]

Single Wishart

N Principal Component analysis

N Factor analysis

N Multidimensional scaling

Double Wishart

N Canonical correlation analysis

N Multivariate Analysis of Variance

(MANOVA)

N Multivarate regression analysis

N Discriminant analysis

N Tests of equality of covariance matrices

NESS 4/10 – p.9

Joint density of eigenvalues

Single Wishart: det[A− xiI] = 0.

Double Wishart: det[A− xi(A+B)] = 0.

In each case, ( Fisher, Girshick, Hsu, Mood, Roy, (1939) ):

f(x1, . . . , xp) = c∏

i

w1/2(xi)∏

i<j

(xi−xj) x1 ≥ . . . ≥ xp

Single Wishart: w(x) = xn−pe−x, (Laguerre )

Double Wishart: w(x) = xp−q−1(1− x)n−p−q−1. (Jacobi )

[Gaussian: w(x) = e−x2/2. (Hermite )]

NESS 4/10 – p.10

Joint density of eigenvalues [R,C]

Single Wishart: det[A− xiI] = 0.

Double Wishart: det[A− xi(A+B)] = 0.

In each case, ( Fisher, Girshick, Hsu, Mood, Roy, (1939) ):

f(x1, . . . , xp) = c∏

i

wβ/2(xi)∏

i<j

(xi−xj)β x1 ≥ . . . ≥ xp

β = 1(R),= 2(C)

Single Wishart: w(x) = xn−pe−x, (Laguerre )

Double Wishart: w(x) = xp−q−1(1− x)n−p−q−1. (Jacobi )

[Gaussian: w(x) = e−x2/2. (Hermite )]

NESS 4/10 – p.11

Focus on Largest Eigenvalue(s)

N target of methods, e.g. PCA, CCA

N Roy’s Union-Intersection approach

–leads to largest root tests

N conservative tests via sequential approach

–having rejected xk, test xk+1 using largest root

N smallest eigenvalues

N RMT brings new tools, results

NESS 4/10 – p.12

Outline

I. Setting

1. Single & Double Wishart, examples

II. Largest Eigenvalue: Limiting Law ( H0)

1. Gaussian

2. Laguerre/Single W.

3. Jacobi/Double W.


1. Single W.

2. Double W.


NESS 4/10 – p.13

Gaussian eigenvalues vs. order statistics

Differing scales for five largest of:

a) eigenvalues from GUE(200), b) 200 N(0, 1) variates.

Largest: 19.5 ± 0.3 2.6 ± 0.4

resvec

replic

1

2

3

4

5

6

18.0 18.5 19.0 19.5 20.0

eigenvalueseigenvalues

2.0 2.2 2.4 2.6 2.8

order statsorder stats

NESS 4/10 – p.14

Symmetric Gaussian matrices

A – symmetric p× p matrix drawn from ( β = 1 for GOE)

f(A) = c exp{−β2

tr(ATA)}.

[for GUE: complex Hermitian]. Largest eigenvalue:

θp = λmax(A)

Centering and scaling constants:

µp =√

2p, σp = 2−1/2p−1/6 ratio p−2/3!

Tracy-Widom limit: TW (1994, 1996) For β = 1, 2, 4 (R,C,Q),

θp − µp

σp

D⇒ Fβ.

NESS 4/10 – p.15

Tracy-Widom distributions (1994,96)

-4 -2 0 2 4

q′′ = sq + 2q3 (Painlev e II)

q(s) ∼ Ai(s) as s → ∞

F2(s) = e−∫∞

s (x−s)2q(x)dx

F1(s)2 = F2(s)e

−∫∞

s q(x)dx.

Right tail decay: 1− Fβ(s) ≈ e−(2β/3)s3/2 .

Rate of convergence: First order p−1/3, Choup (2006,07)

“Second-order”: J + Ma (2008) For β = 1, 2, & µp(β=1)=

√2p−1,

|P{λmax(A) ≤ µp + σps} − Fβ(s)| ≤ Cp−2/3e−s/2.

NESS 4/10 – p.16

Approximations at p = 2

−8 −6 −4 −2 0 2 4 60

0.05

0.1

0.15

0.2

0.25

0.3

0.35Density Plot: n =2

x

f(x)

TW1empirical

−6 −4 −2 0 2 4 6

−6

−4

−2

0

2

4

6

Probability Plot: n =2

TW percentiles

simula

ted pe

rcenti

les

– Better approximation in right tail : location of turning po int of

Airy function

– additional O(p−4/3) mean correction included

NESS 4/10 – p.17

Approximations at p = 10

−8 −6 −4 −2 0 2 4 60

0.05

0.1

0.15

0.2

0.25

0.3

0.35Density Plot: n =10

x

f(x)

TW1empirical

−6 −4 −2 0 2 4 6

−6

−4

−2

0

2

4

6

Probability Plot: n =10

TW percentiles

simula

ted pe

rcenti

les

NESS 4/10 – p.18

Outline

I. Setting



1. Gaussian


3. Jacobi/Double W.


1. Single W.

2. Double W.


NESS 4/10 – p.19

Wishart: Tracy Widom Limits

Theorem For {real, complex }, {single, double } Wishart matrices,

if p/n → γ, [or (p/n1, p/n2) → (γ1, γ2),] then

P{nx1 ≤ µnp + σnps|H0} → Fβ(s)

– “universality” in RMT (Deift, 06 ICM)

Single Wishart: β = 2 (Johansson, 00) , β = 1 (J, 01)

“Microarray case” n, p → ∞, n/p → 0,∞. (El Karoui, 03)

Non Gaussian: (Soshnikov, 02, 06; S. + Fyodorov, 06, P eche, 07)

N symm subGaussian: ⇒ TW limit; heavy tails ⇒ Poisson

NESS 4/10 – p.20

Second order accuracy

For {real, complex }, {single, double } Wishart matrices, if

p/n → γ, [or (p/n1, p/n2) → (γ1, γ2),] then

|P{nx1 ≤ µnp + σnps|H0} − Fβ(s)| ≤ Ce−csp−2/3.

Single Wishart Complex: El Karoui (2006) .

Real: Ma (2008)

µnp =(√

n−12+√

p−12

)2

σnp =(√

n−12+√

p−12

)

(

1√n−1

2

+1√p−1

2

)1/3

.

NESS 4/10 – p.21

Beyond the “Null Hypothesis”

Classical RMT ensembles (e.g. Wp(n, I))

↔ “null hypothesis”, symmetry, no structure.

For Wp(n,Σ): For what conditions on Σ does

P{x1 ≤ µnp(Σ) + σnp(Σ)s} → Fβ(s) ??

Some answers:

N sufficiently many ℓk(Σ) accumulate near ℓ1(Σ) (data in C)

El Karoui, 2007

N small number of (not too!) isolated ℓi(Σ)Baik-Ben Arous-P eche, 2005, Baik-Silverstein 2006, Paul 2007

Harding (2008, Economics Letters)

NESS 4/10 – p.22

Finite rank perturbations: heuristics

“Spiked” model: Σ = diag (ℓ1, . . . , ℓM , 1, . . . , 1)

ℓ1 ≥ ℓ2 ≥ · · · ≥ ℓM ≥ 1, M fixed as p ր ∞, p/n → γ.

Σ = Ifluctuation

Tracy-Widom

Marcenko-Pastur density for bulk

NESS 4/10 – p.23

Finite rank perturbations: heuristics

“Spiked” model: Σ = diag (ℓ1, . . . , ℓM , 1, . . . , 1)

ℓ1 ≥ ℓ2 ≥ · · · ≥ ℓM ≥ 1, M fixed as p ր ∞, p/n → γ.

Σ = Ifluctuation

Tracy-Widom


ℓ1 ≫ 1

M = 1 .near Gaussian

fluctuation


NESS 4/10 – p.23

Finite rank model: phase transition

Σ = diag (ℓ1, . . . , ℓM , 1, . . . , 1) p/n → γ.

Interior point transition at ℓ1 = 1 +√γ:

Baik-Ben Arous-Peche, Paul,

Baik-Silverstein

.Tracy-Widom

fluctuation

Critical point:

NESS 4/10 – p.24


Σ = diag (ℓ1, . . . , ℓM , 1, . . . , 1) p/n → γ.



Baik-Silverstein

.Tracy-Widom

fluctuation

Critical point:

NESS 4/10 – p.24


Σ = diag (ℓ1, . . . , ℓM , 1, . . . , 1) p/n → γ.



Baik-Silverstein

.Tracy-Widom

fluctuation

Critical point:

NESS 4/10 – p.24


Σ = diag (ℓ1, . . . , ℓM , 1, . . . , 1) p/n → γ.



Baik-Silverstein

Gaussian

Critical point:

.

NESS 4/10 – p.24


Σ = diag (ℓ1, . . . , ℓM , 1, . . . , 1) p/n → γ.



Baik-Silverstein

Gaussian

Critical point:

.

NESS 4/10 – p.24


Σ = diag (ℓ1, . . . , ℓM , 1, . . . , 1) p/n → γ.



Baik-Silverstein

Critical point:

fluctuationTracy-Widom

.

Gaussian

Critical point:

.

NESS 4/10 – p.25

Recent example: economics

How many factors are present in security returns? Use PCA??

S.J. Brown (1989) simulations, calibrated to NYSE data

4 factor model* → Σ = diag ( ℓ1, . . . , ℓ4, σ2e , . . . , σ

2e)

ℓ1 > ℓ2 = ℓ3 = ℓ4 > σ2e

Goal: Use PCA to estimate ℓ1, . . . , ℓ4.

Empirical puzzle (Brown, 1989):

many sample eigenvalues swamp ℓ2, ℓ3, ℓ4.

Rit = Σ4

k=1bikfkt + eit; i = 1, . . . , p securities ; t = 1, . . . , T times .(*)

bik ∼ N(β, σ2

b ); fkt ∼ N(0, σ2

f ); eit ∼ N(0, σ2

e) all independent

NESS 4/10 – p.26

Recent example: economics

How many factors are present in security returns? Use PCA??

S.J. Brown (1989) simulations, calibrated to NYSE data

4 factor model* → Σ = diag ( ℓ1, . . . , ℓ4, σ2e , . . . , σ

2e)

ℓ1 > ℓ2 = ℓ3 = ℓ4 > σ2e

Goal: Use PCA to estimate ℓ1, . . . , ℓ4.

Empirical puzzle (Brown, 1989):

many sample eigenvalues swamp ℓ2, ℓ3, ℓ4.

Explanation (Harding, 2007):

ℓ2, ℓ3, ℓ4 are below the 1 +√γ phase transition.

Rit = Σ4

k=1bikfkt + eit; i = 1, . . . , p securities ; t = 1, . . . , T times .(*)

bik ∼ N(β, σ2

b ); fkt ∼ N(0, σ2

f ); eit ∼ N(0, σ2

e) all independent

NESS 4/10 – p.26

Population values

50 100 150 2000

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

p = # securities

eige

nval

ues

← Population eigenvalue 1

↓ Population eigenvalues 2−4 ↓ noise σ

e2

NESS 4/10 – p.27

Brown(1989) plot

p = # securities

eige

nval

ues

← Av. Sample eigenvalues 2−9

Av. top eigenvalue ↓

50 100 150 2000

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

NESS 4/10 – p.28

Marcenko-Pastur & phase transition

p = # securities

eige

nval

ues

↓ Marcenko−Pastur limit

↓ Phase transition

50 100 150 2000

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

Source: Harding(2007).

NESS 4/10 – p.29

Outline

I. Setting



1. Gaussian


3. Jacobi/Double W.


1. Single W.

2. Double W.


NESS 4/10 – p.30

Roy’s Greatest Root Test: Setting

Example: (multiple) Regression.

y = X β + ǫn×p n×q q×p n×p

ǫ ∼ N(0, In ⊗ Σp).

Tests of H0 : β = 0 use roots of det[A− xi(A+B)] = 0.

A = SSH = βTXTXβ “Hypothesis”

B = SSE = yT (I − PX)y “Error”

Roy’s greatest root test (union-intersection test):

Reject H0 if x1 > x1,α.

NESS 4/10 – p.31

Roy’s Greatest Root Test: Package Output

SAS: (From Gledhill et. al.: NOAA Fisheries Reef Fish Video Surve ys.)

!"#$%&'()&*+$,-."/-",%&0,",-0,-10&"23&4&"55/67-8",-620&96/&,:%&;<=>?@;&

5/61%3+/%&/"2&A-,:&"$$&8-216+2,&"23&:"#-,",&3",")!

"#$!!!!%#&!!!!'#()*$!

"+,+-.+-/! 0,123 4!0,123 '25!64 637!64 89!:!4!

;-1<.=!>,5?@,! B)C('BD'DE ')'C CB FCF)(G B)HIHC&

8-11,-=.!A9,/3! B)CB''CCIB ')'C CB C'C B)HIFH&

BC+311-7DE>,F13G!A9,/3! B)CDBIECB' ')'C CB IBC)(H B)HICC&

HCG=.!I93,+3.+!HCC+! B)HEGCD(HB H)GE 'B 'BI B)BBFG&

'JAKL!4!"+,+-.+-/!MC9!HCG=.!I93,+3.+!HCC+!-.!,7!2NN39!?C27@*!

R: (car, heplots packages, Fox et. al.)Multivariate Tests:

Df test stat approx F num Df den Df Pr(>F)

Pillai 5.0000 0.417938 1.845226 15.0000 171.0000 0.0320861 *

Wilks 5.0000 0.623582 1.893613 15.0000 152.2322 0.0276949 *

Hotelling-Lawley 5.0000 0.538651 1.927175 15.0000 161.0000 0.0239619 *

Roy 5.0000 0.384649 4.384997 5.0000 57.0000 0.0019053 **

---

Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1

NESS 4/10 – p.32

Tracy-Widom approximation for logit

Let W = logit (x1) = log(x1/(1− x1)).

Result: [J,08] As p ∝ n1, n2 → ∞, (and with O(p−2/3) error):

W − µp

σp

D⇒ W∞ ∼ F1.

In other words:x1 ≈

eµ+σW∞

1 + eµ+σW∞

.

Percentiles fα:

f.90 = 0.4501

f.95 = 0.9793

f.99 = 2.0234-4 -2 2 4

0.05

0.1

0.15

0.2

0.25

0.3

NESS 4/10 – p.33

Centering and Scaling Constants

N set N = n1 + n2−1 (n1, n2) = (error,hypoth) d.f.

N define γ, φ from

sin2(γ/2) = (p−1

2)/N, sin2(φ/2) = (n1−1

2)/N

N Then µ and σ are given by

µp = 2 log tan(φ+ γ

2

)

, σ3p =

16

N 2

1

sin2(φ+ γ) sinφ sin γ.

N (given a table of F1), straightforward to code:

papptw , qapptw , rapptw.

NESS 4/10 – p.34

Accuracy of approximate αth percentile

Approximate percentile using Tracy-Widom percentile fα:

xα = xTWα (p, n1, n2) = eµp+fασp/(1 + eµp+fασp).

William Chen’s tables: (2002-04) ⇒ can compare

N ‘Exact’ xα(p, n1, n2) with

N T-W approx xTWα (p, n1, n2).

Relative error: r = (xTWα /xα)− 1.

Tracy-Widom based on p → ∞, but try p = 2 as n1, n2 vary

NESS 4/10 – p.35

p = 2 at 95th percentile

0 5 10 150

0.5

1

m

θα

n = 2

n = 5n = 10

n = 20

n = 40

n = 100n = 500

TWExact

0.010.01

0.01

0.02

0.02

0.02

0.02

0.030.03

0.04

0.05

0.060.070.

08

m

log 10

n

Contours of relative error r = (θαTW/θα)−1

0 5 10 150

1

2

3

NESS 4/10 – p.36

p = 4 at 90th percentile

0 5 10 150

0.5

1

m

θα

n = 2

n = 5n = 10

n = 20

n = 40

n = 100n = 500

TWExact

0.0020.0040.004

0.004

0.0060.006

0.006

0.008

0.008

0.008

0.008

0.01

0.01

0.01

0.012 0.012

0.012

0.0140.0160.018

m

log 10

n

90th %tile, S=4, Contours of relative error r = (θαTW/θα)−1

0 5 10 150

1

2

3

NESS 4/10 – p.37

Summary, for Double Wishart

T-W approximation to null distribution of Roy’s largest roo t:

N Conventional percentiles: generally accurate to < 10%

relative error

N [Rough p−value assessments: qualitatively o.k. over many

orders of magnitude]

N Similar O(N−2/3) approximations for Wishart and Gaussian.

Propose: Tracy-Widom approximation replace F− lower bound in

default package printouts

NESS 4/10 – p.38

Software: RMTstat

In development, for R, MATLAB(Z. Ma, P. Perry, M. Shahram, IMJ)

[Preliminary Rversion at CRAN]

N Tracy-Widom distribution: ptw, qtw, dtw, rtw

(Prahofer-Spohn tables + interpolation)

N TW approximation to extreme eigenvalues

N null distributions

N ’spiked’ models

N common tests based on largest root

NESS 4/10 – p.39

Outline

I. Setting



1. Gaussian


3. Jacobi/Double W.


1. Single W.

2. Double W.


NESS 4/10 – p.40

Wishart: Concentration for x1

A ∼ Wp(n, I) ⇔ A = nXTX, X ∼ N(0, In ⊗ Ip)/√n

x1(A) = nσ21(X) (singular value σ1)

Large Deviations: (e.g. Davidson-Szarek, 2001)

P{σ1(X) > Eσ1(X) + t} ≤ e−nt2/2,

Eσ1(X) ≤ 1 +√

p/n.

N Concentration of measure for Lipschitz X → σ1(X)

N not sharp in random matrix context; OK for t > 1

N numerous statistics uses, e.g. sparse regression,

Candes-Tao, Donoho

NESS 4/10 – p.41

‘Small’ vs. ‘Large’ Deviations

Small Deviations: 0 < t ≤ 1

P{σ1(XTX) ≥ (1 +

√

p/n)2(1 + t)} ≤ C1e−nt3/2/C .

Ledoux, Johansson, Feldheim-Sodin

–Appropriate to t3/2 tails of Tracy-Widom.

Statistical application:

Kritchman-Nadler, 2009 X1, . . . , Xn ∼ CNp(0,Σ) (array sensing)

Σ = σ2I + diag (λ1, . . . , λq, 0, . . . , 0)

Determine q via MDL and modified AIC.

NESS 4/10 – p.42

Double Wishart

Motivations:

N analog of Wishart deviation results (large & small)

N arises in Candes-Tao sparse linear regression, Y = Aβ + ǫ,

restricted orthogonality constants θS,S′ ,

〈ATv,AT ′v′〉 ≤ θS,S′‖v‖‖v′‖

Use CCA form: X ∼ N(0, In ⊗ Ip) ⊥⊥ Y ∼ N(0, In ⊗ Iq)

Rp,q;n(X,Y ) = max{Corr (Xu, Y v) : ‖u‖ = ‖v‖ = 1}= σ1(U

TXUY )

[via SVD: X = UXDXVTX , Y = UYDY V

TY ]

NESS 4/10 – p.43

Double Wishart bounds

Large deviations: F = σ1(UTXUY ) = R(X,Y ) is Lipschitz on

Stiefel manifold Vn,p × Vn,q ∋ (UX , UY ).

Use concentration for (Stiefel) manifolds, ( Ledoux, 1992 ):

P{σ1 ≥ Eσ1 + t} ≤ e−(n−2)t2/8.

Small deviations. (data in C)

x1 = σ21 = largest (canonical corr.) 2. ( Ledoux, 2004):

P{x1 ≥ Ex1(1 + t)} ≤ C1e−nt3/2/C , 0 < t ≤ 1.

[“ultraspherical” case α = β; ↔ n = 2q.]

NESS 4/10 – p.44

Outline

I. Setting



1. Gaussian


3. Jacobi/Double W.


1. Single W.

2. Double W.

IV. Largest Eigenvector(s), briefly

NESS 4/10 – p.45

Estimation of Eigenvectors

S ∼ Wp(n,Σ), Σ = σ2I +∑M

ν=1 λνθνθTν

Estimation and inference for θν??

Classical: p fixed, n large:√n(θν − θν) → Np(0,Γν)

BUT: inconsistency when p/n → γ > 0:

Reimann, v.d.Broeck, Bex, Hoyle, Rattray; Paul, Baik, Silv erstein

〈θν , θν〉 →

0 λν ∈ [0,√γ]

1−γ/λ2ν

1+γ/λνλν >

√γ

,

NESS 4/10 – p.46

Some motivating models

Xi =M∑

ν=1

θνfνi + σZi, i = 1, . . . , n

1. Economics: Xi = vector of stocks (indices) at time i

θν = factor loadings, fνi factors, Zi idiosyncratic terms.

2. ECG: Xi = ith heartbeat ( p samples per cycle)

θν = may be sparse in wavelet basis.

3. Microarrays: Xi = expression of p genes in ith patient.

θν = may be sparse : few genes involved in each factor.

4. Sensors: Xi = observations at sensors

θν = cols. of steering matrix, fνi signals

NESS 4/10 – p.47

Eigenvectors: Elements of an Estimation Theory

N Assume ∃ a basis with sparse representation:

θ ∈ Θq(C) : e.g. |θν,(µ)| ≤ C|µ|−1/q q < 2

=⇒ near sharp upper & lower bounds for minimax risk :

infθ

supθν∈Θq(C)

E‖θν − θν‖2. (Paul, Ma)

N Goal: Approximate by “signal in Gaussian noise” model

LEMMA (M = 1) Let C = 〈θ, θ〉 Then

θ = Cθ + SU (Paul)

N U is uniform on ” Sp−2” ⇒ nearly Gaussian .

N move from eigenvectors to sparse mean estimation.

NESS 4/10 – p.48

Closing Remark

-4 -2 0 2 4

p = # securities

eige

nval

ues

← Av. Sample eigenvalues 2−9

Av. top eigenvalue ↓

50 100 150 2000

0.01

0.02

0.03

0.04

0.05

0.06

0.07

0.08

N Asymptotics in p ∝ n–or (n1, n2)–and tools from RMT, yield

approximations, methods, insight

even for p quite small.

THANK YOU!

NESS 4/10 – p.49

References

THANK YOU!

IMJ, “High Dimensional Statistical Inference and Random Ma trices”, Proc ICM

2006, Vol 1, 307–333.

IMJ, “Multivariate Analysis and Jacobi Ensembles: Largest eigenvalue,

Tracy-Widom Limits and Rates of Convergence”, Ann. Statist. , 2008

IMJ, “Approximate null distribution of the largest root in m ultivariate analysis”

Ann. Applied Stat. to appear.

D. Paul and IMJ, “Sparse principal component analysis for hi gh dimensional

data”, in preparation.

Webpage: www-stat.stanford.edu/ ˜ imj

NESS 4/10 – p.50

www-stat.stanford.edu/~imj

Documents

Largest Eigenvalues and Eigenvectors in … Eigenvalues and Eigenvectors in Multivariate Analysis Iain Johnstone, Statistics, Stanford [email protected] NESS, April 17, 2010 NESS 4/10