Upload
nguyendan
View
279
Download
2
Embed Size (px)
Citation preview
Largest Eigenvalues and Eigenvectors in MultivariateAnalysis
Iain Johnstone, Statistics, Stanford
NESS, April 17, 2010
NESS 4/10 – p.1
Outline
Fisher Hotelling Wigner Tracy Widom
NESS 4/10 – p.2
Outline
Fisher Hotelling Wigner Tracy Widom
I. Setting : Single & Double Wishart, examples
II. Largest Eigenvalue: Limiting Law ( H0, some HA)
III. Largest Eigenvalue: Concentration
IV. Largest Eigenvector(s)
Thanks to: Peter Forrester, Zongming Ma, Debashis Paul, Zongming Ma, P atrick
Perry, Morteza Shahram, NSF, NIH
NESS 4/10 – p.2
Eigenvalues: Theme
N Many problems of classical multivariate statistics involv e
eigenvalues x1 > x2 > . . . > xp
N Asymptotics in p⇒ useful information, [even p small]
N Illustrate for null distributions for x1
NESS 4/10 – p.3
Eigenvalues: Theme
N Many problems of classical multivariate statistics involv e
eigenvalues x1 > x2 > . . . > xp
N Asymptotics in p⇒ useful information, [ espec. p small]
N Illustrate for null distributions for x1
Example: (multiple) Regression.
y = X β + ǫn×p n×q q×p n×p
ǫ ∼ N(0, In ⊗ Σp).
Tests of H0 : β = 0 use roots of det[A− xi(A+B)] = 0.
A = SSH = βTXTXβ “Hypothesis”
B = SSE = yT (I − PX)y “Error”
NESS 4/10 – p.4
Example: PCA & population structure from genetic data
Patterson et. al. (2006), Price et. al. (2006)
n = #individuals , p = #SNPs/markers
yij = (normalized) allele count ,
case i = 1, . . . , n, marker j = 1, . . . , p.
A = n× sample covariance matrix of yij
N Eigenvalues x1 > x2 > . . . > xmin(n,p)
N How many xi are significant?
N Under H0, distribution of x1 if A ∼ Wp(n, I)?
NESS 4/10 – p.5
Wishart Distribution
Y = (Yij) n× p
Rows yi = (Yij)indep∼ Np(µ,Σ)
Definition: (unnormalized) sample covariance
A =
n∑
i=1
(yi − y)T (yi − y) ∼ Wp(n− 1,Σ)
p variables, n− 1 degrees of freedom.
[ If µ = 0, A = Y TY ∼ Wp(n,Σ) ]
NESS 4/10 – p.6
Wishart Eigenvalues
2 draws of eigenvalues from W15(60, I)
– Spreading of sample eigenvalues from 4 to [1,9].
2 draws of 15 independent U(1, 9) variates – very different!
wresvec
wreplic
1
2
2 4 6 8
order statsorder stats
1
2
wishart-eigenvalueswishart-eigenvalues
NESS 4/10 – p.7
The Quarter Circle Law
Description of spreading phenomenon:
Empirical distribution function: for eigenvalues {ℓi}pj=1
Gp(t) = p−1#{ℓj ≤ t} → G(t) = g(t)dt.
Marcenko-Pastur, (67) For A ∼ Wp(n, I) p/n → γ
For Σ = I ,
gMP (t) =
√
(b+ − t)(t− b−)
2πγt,
b± = (1±√γ)2.
0 0.5 1 1.5 2 2.5 3 3.5 40
0.5
1
1.5
n=4pn=p
NESS 4/10 – p.8
Double Wishart Setting
A ∼ Wp(n1, I)
B ∼ Wp(n2, I)
2 independent Wisharts, p ≤ n1, n2
“null hypothesis” setting
Common feature: roots := (xi)pi=1 of generalized eigenproblem:
det[A− xi(A+B)] = 0, [Double]
det[A− xiI] = 0. [Single]
Single Wishart
N Principal Component analysis
N Factor analysis
N Multidimensional scaling
Double Wishart
N Canonical correlation analysis
N Multivariate Analysis of Variance
(MANOVA)
N Multivarate regression analysis
N Discriminant analysis
N Tests of equality of covariance matrices
NESS 4/10 – p.9
Joint density of eigenvalues
Single Wishart: det[A− xiI] = 0.
Double Wishart: det[A− xi(A+B)] = 0.
In each case, ( Fisher, Girshick, Hsu, Mood, Roy, (1939) ):
f(x1, . . . , xp) = c∏
i
w1/2(xi)∏
i<j
(xi−xj) x1 ≥ . . . ≥ xp
Single Wishart: w(x) = xn−pe−x, (Laguerre )
Double Wishart: w(x) = xp−q−1(1− x)n−p−q−1. (Jacobi )
[Gaussian: w(x) = e−x2/2. (Hermite )]
NESS 4/10 – p.10
Joint density of eigenvalues [R,C]
Single Wishart: det[A− xiI] = 0.
Double Wishart: det[A− xi(A+B)] = 0.
In each case, ( Fisher, Girshick, Hsu, Mood, Roy, (1939) ):
f(x1, . . . , xp) = c∏
i
wβ/2(xi)∏
i<j
(xi−xj)β x1 ≥ . . . ≥ xp
β = 1(R),= 2(C)
Single Wishart: w(x) = xn−pe−x, (Laguerre )
Double Wishart: w(x) = xp−q−1(1− x)n−p−q−1. (Jacobi )
[Gaussian: w(x) = e−x2/2. (Hermite )]
NESS 4/10 – p.11
Focus on Largest Eigenvalue(s)
N target of methods, e.g. PCA, CCA
N Roy’s Union-Intersection approach
–leads to largest root tests
N conservative tests via sequential approach
–having rejected xk, test xk+1 using largest root
N smallest eigenvalues
N RMT brings new tools, results
NESS 4/10 – p.12
Outline
I. Setting
1. Single & Double Wishart, examples
II. Largest Eigenvalue: Limiting Law ( H0)
1. Gaussian
2. Laguerre/Single W.
3. Jacobi/Double W.
III. Largest Eigenvalue: Concentration
1. Single W.
2. Double W.
IV. Largest Eigenvector(s)
NESS 4/10 – p.13
Gaussian eigenvalues vs. order statistics
Differing scales for five largest of:
a) eigenvalues from GUE(200), b) 200 N(0, 1) variates.
Largest: 19.5 ± 0.3 2.6 ± 0.4
resvec
replic
1
2
3
4
5
6
18.0 18.5 19.0 19.5 20.0
eigenvalueseigenvalues
2.0 2.2 2.4 2.6 2.8
order statsorder stats
NESS 4/10 – p.14
Symmetric Gaussian matrices
A – symmetric p× p matrix drawn from ( β = 1 for GOE)
f(A) = c exp{−β2
tr(ATA)}.
[for GUE: complex Hermitian]. Largest eigenvalue:
θp = λmax(A)
Centering and scaling constants:
µp =√
2p, σp = 2−1/2p−1/6 ratio p−2/3!
Tracy-Widom limit: TW (1994, 1996) For β = 1, 2, 4 (R,C,Q),
θp − µp
σp
D⇒ Fβ.
NESS 4/10 – p.15
Tracy-Widom distributions (1994,96)
-4 -2 0 2 4
q′′ = sq + 2q3 (Painlev e II)
q(s) ∼ Ai(s) as s → ∞
F2(s) = e−∫∞
s (x−s)2q(x)dx
F1(s)2 = F2(s)e
−∫∞
s q(x)dx.
Right tail decay: 1− Fβ(s) ≈ e−(2β/3)s3/2 .
Rate of convergence: First order p−1/3, Choup (2006,07)
“Second-order”: J + Ma (2008) For β = 1, 2, & µp(β=1)=
√2p−1,
|P{λmax(A) ≤ µp + σps} − Fβ(s)| ≤ Cp−2/3e−s/2.
NESS 4/10 – p.16
Approximations at p = 2
−8 −6 −4 −2 0 2 4 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35Density Plot: n =2
x
f(x)
TW1empirical
−6 −4 −2 0 2 4 6
−6
−4
−2
0
2
4
6
Probability Plot: n =2
TW percentiles
simula
ted pe
rcenti
les
– Better approximation in right tail : location of turning po int of
Airy function
– additional O(p−4/3) mean correction included
NESS 4/10 – p.17
Approximations at p = 10
−8 −6 −4 −2 0 2 4 60
0.05
0.1
0.15
0.2
0.25
0.3
0.35Density Plot: n =10
x
f(x)
TW1empirical
−6 −4 −2 0 2 4 6
−6
−4
−2
0
2
4
6
Probability Plot: n =10
TW percentiles
simula
ted pe
rcenti
les
NESS 4/10 – p.18
Outline
I. Setting
1. Single & Double Wishart, examples
II. Largest Eigenvalue: Limiting Law ( H0)
1. Gaussian
2. Laguerre/Single W.
3. Jacobi/Double W.
III. Largest Eigenvalue: Concentration
1. Single W.
2. Double W.
IV. Largest Eigenvector(s)
NESS 4/10 – p.19
Wishart: Tracy Widom Limits
Theorem For {real, complex }, {single, double } Wishart matrices,
if p/n → γ, [or (p/n1, p/n2) → (γ1, γ2),] then
P{nx1 ≤ µnp + σnps|H0} → Fβ(s)
– “universality” in RMT (Deift, 06 ICM)
Single Wishart: β = 2 (Johansson, 00) , β = 1 (J, 01)
“Microarray case” n, p → ∞, n/p → 0,∞. (El Karoui, 03)
Non Gaussian: (Soshnikov, 02, 06; S. + Fyodorov, 06, P eche, 07)
N symm subGaussian: ⇒ TW limit; heavy tails ⇒ Poisson
NESS 4/10 – p.20
Second order accuracy
For {real, complex }, {single, double } Wishart matrices, if
p/n → γ, [or (p/n1, p/n2) → (γ1, γ2),] then
|P{nx1 ≤ µnp + σnps|H0} − Fβ(s)| ≤ Ce−csp−2/3.
Single Wishart Complex: El Karoui (2006) .
Real: Ma (2008)
µnp =(√
n−12+√
p−12
)2
σnp =(√
n−12+√
p−12
)
(
1√n−1
2
+1√p−1
2
)1/3
.
NESS 4/10 – p.21
Beyond the “Null Hypothesis”
Classical RMT ensembles (e.g. Wp(n, I))
↔ “null hypothesis”, symmetry, no structure.
For Wp(n,Σ): For what conditions on Σ does
P{x1 ≤ µnp(Σ) + σnp(Σ)s} → Fβ(s) ??
Some answers:
N sufficiently many ℓk(Σ) accumulate near ℓ1(Σ) (data in C)
El Karoui, 2007
N small number of (not too!) isolated ℓi(Σ)Baik-Ben Arous-P eche, 2005, Baik-Silverstein 2006, Paul 2007
Harding (2008, Economics Letters)
NESS 4/10 – p.22
Finite rank perturbations: heuristics
“Spiked” model: Σ = diag (ℓ1, . . . , ℓM , 1, . . . , 1)
ℓ1 ≥ ℓ2 ≥ · · · ≥ ℓM ≥ 1, M fixed as p ր ∞, p/n → γ.
Σ = Ifluctuation
Tracy-Widom
Marcenko-Pastur density for bulk
NESS 4/10 – p.23
Finite rank perturbations: heuristics
“Spiked” model: Σ = diag (ℓ1, . . . , ℓM , 1, . . . , 1)
ℓ1 ≥ ℓ2 ≥ · · · ≥ ℓM ≥ 1, M fixed as p ր ∞, p/n → γ.
Σ = Ifluctuation
Tracy-Widom
Marcenko-Pastur density for bulk
ℓ1 ≫ 1
M = 1 .near Gaussian
fluctuation
Marcenko-Pastur density for bulk
NESS 4/10 – p.23
Finite rank model: phase transition
Σ = diag (ℓ1, . . . , ℓM , 1, . . . , 1) p/n → γ.
Interior point transition at ℓ1 = 1 +√γ:
Baik-Ben Arous-Peche, Paul,
Baik-Silverstein
.Tracy-Widom
fluctuation
Critical point:
NESS 4/10 – p.24
Finite rank model: phase transition
Σ = diag (ℓ1, . . . , ℓM , 1, . . . , 1) p/n → γ.
Interior point transition at ℓ1 = 1 +√γ:
Baik-Ben Arous-Peche, Paul,
Baik-Silverstein
.Tracy-Widom
fluctuation
Critical point:
NESS 4/10 – p.24
Finite rank model: phase transition
Σ = diag (ℓ1, . . . , ℓM , 1, . . . , 1) p/n → γ.
Interior point transition at ℓ1 = 1 +√γ:
Baik-Ben Arous-Peche, Paul,
Baik-Silverstein
.Tracy-Widom
fluctuation
Critical point:
NESS 4/10 – p.24
Finite rank model: phase transition
Σ = diag (ℓ1, . . . , ℓM , 1, . . . , 1) p/n → γ.
Interior point transition at ℓ1 = 1 +√γ:
Baik-Ben Arous-Peche, Paul,
Baik-Silverstein
Gaussian
Critical point:
.
NESS 4/10 – p.24
Finite rank model: phase transition
Σ = diag (ℓ1, . . . , ℓM , 1, . . . , 1) p/n → γ.
Interior point transition at ℓ1 = 1 +√γ:
Baik-Ben Arous-Peche, Paul,
Baik-Silverstein
Gaussian
Critical point:
.
NESS 4/10 – p.24
Finite rank model: phase transition
Σ = diag (ℓ1, . . . , ℓM , 1, . . . , 1) p/n → γ.
Interior point transition at ℓ1 = 1 +√γ:
Baik-Ben Arous-Peche, Paul,
Baik-Silverstein
Critical point:
fluctuationTracy-Widom
.
Gaussian
Critical point:
.
NESS 4/10 – p.25
Recent example: economics
How many factors are present in security returns? Use PCA??
S.J. Brown (1989) simulations, calibrated to NYSE data
4 factor model* → Σ = diag ( ℓ1, . . . , ℓ4, σ2e , . . . , σ
2e)
ℓ1 > ℓ2 = ℓ3 = ℓ4 > σ2e
Goal: Use PCA to estimate ℓ1, . . . , ℓ4.
Empirical puzzle (Brown, 1989):
many sample eigenvalues swamp ℓ2, ℓ3, ℓ4.
Rit = Σ4
k=1bikfkt + eit; i = 1, . . . , p securities ; t = 1, . . . , T times .(*)
bik ∼ N(β, σ2
b ); fkt ∼ N(0, σ2
f ); eit ∼ N(0, σ2
e) all independent
NESS 4/10 – p.26
Recent example: economics
How many factors are present in security returns? Use PCA??
S.J. Brown (1989) simulations, calibrated to NYSE data
4 factor model* → Σ = diag ( ℓ1, . . . , ℓ4, σ2e , . . . , σ
2e)
ℓ1 > ℓ2 = ℓ3 = ℓ4 > σ2e
Goal: Use PCA to estimate ℓ1, . . . , ℓ4.
Empirical puzzle (Brown, 1989):
many sample eigenvalues swamp ℓ2, ℓ3, ℓ4.
Explanation (Harding, 2007):
ℓ2, ℓ3, ℓ4 are below the 1 +√γ phase transition.
Rit = Σ4
k=1bikfkt + eit; i = 1, . . . , p securities ; t = 1, . . . , T times .(*)
bik ∼ N(β, σ2
b ); fkt ∼ N(0, σ2
f ); eit ∼ N(0, σ2
e) all independent
NESS 4/10 – p.26
Population values
50 100 150 2000
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
p = # securities
eige
nval
ues
← Population eigenvalue 1
↓ Population eigenvalues 2−4 ↓ noise σ
e2
NESS 4/10 – p.27
Brown(1989) plot
p = # securities
eige
nval
ues
← Av. Sample eigenvalues 2−9
Av. top eigenvalue ↓
50 100 150 2000
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
NESS 4/10 – p.28
Marcenko-Pastur & phase transition
p = # securities
eige
nval
ues
↓ Marcenko−Pastur limit
↓ Phase transition
50 100 150 2000
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
Source: Harding(2007).
NESS 4/10 – p.29
Outline
I. Setting
1. Single & Double Wishart, examples
II. Largest Eigenvalue: Limiting Law ( H0)
1. Gaussian
2. Laguerre/Single W.
3. Jacobi/Double W.
III. Largest Eigenvalue: Concentration
1. Single W.
2. Double W.
IV. Largest Eigenvector(s)
NESS 4/10 – p.30
Roy’s Greatest Root Test: Setting
Example: (multiple) Regression.
y = X β + ǫn×p n×q q×p n×p
ǫ ∼ N(0, In ⊗ Σp).
Tests of H0 : β = 0 use roots of det[A− xi(A+B)] = 0.
A = SSH = βTXTXβ “Hypothesis”
B = SSE = yT (I − PX)y “Error”
Roy’s greatest root test (union-intersection test):
Reject H0 if x1 > x1,α.
NESS 4/10 – p.31
Roy’s Greatest Root Test: Package Output
SAS: (From Gledhill et. al.: NOAA Fisheries Reef Fish Video Surve ys.)
!"#$%&'()&*+$,-."/-",%&0,",-0,-10&"23&4&"55/67-8",-620&96/&,:%&;<=>?@;&
5/61%3+/%&/"2&A-,:&"$$&8-216+2,&"23&:"#-,",&3",")!
"#$!!!!%#&!!!!'#()*$!
"+,+-.+-/! 0,123 4!0,123 '25!64 637!64 89!:!4!
;-1<.=!>,5?@,! B)C('BD'DE ')'C CB FCF)(G B)HIHC&
8-11,-=.!A9,/3! B)CB''CCIB ')'C CB C'C B)HIFH&
BC+311-7DE>,F13G!A9,/3! B)CDBIECB' ')'C CB IBC)(H B)HICC&
HCG=.!I93,+3.+!HCC+! B)HEGCD(HB H)GE 'B 'BI B)BBFG&
'JAKL!4!"+,+-.+-/!MC9!HCG=.!I93,+3.+!HCC+!-.!,7!2NN39!?C27@*!
R: (car, heplots packages, Fox et. al.)Multivariate Tests:
Df test stat approx F num Df den Df Pr(>F)
Pillai 5.0000 0.417938 1.845226 15.0000 171.0000 0.0320861 *
Wilks 5.0000 0.623582 1.893613 15.0000 152.2322 0.0276949 *
Hotelling-Lawley 5.0000 0.538651 1.927175 15.0000 161.0000 0.0239619 *
Roy 5.0000 0.384649 4.384997 5.0000 57.0000 0.0019053 **
---
Signif. codes: 0 ’***’ 0.001 ’**’ 0.01 ’*’ 0.05 ’.’ 0.1 ’ ’ 1
NESS 4/10 – p.32
Tracy-Widom approximation for logit
Let W = logit (x1) = log(x1/(1− x1)).
Result: [J,08] As p ∝ n1, n2 → ∞, (and with O(p−2/3) error):
W − µp
σp
D⇒ W∞ ∼ F1.
In other words:x1 ≈
eµ+σW∞
1 + eµ+σW∞
.
Percentiles fα:
f.90 = 0.4501
f.95 = 0.9793
f.99 = 2.0234-4 -2 2 4
0.05
0.1
0.15
0.2
0.25
0.3
NESS 4/10 – p.33
Centering and Scaling Constants
N set N = n1 + n2−1 (n1, n2) = (error,hypoth) d.f.
N define γ, φ from
sin2(γ/2) = (p−1
2)/N, sin2(φ/2) = (n1−1
2)/N
N Then µ and σ are given by
µp = 2 log tan(φ+ γ
2
)
, σ3p =
16
N 2
1
sin2(φ+ γ) sinφ sin γ.
N (given a table of F1), straightforward to code:
papptw , qapptw , rapptw.
NESS 4/10 – p.34
Accuracy of approximate αth percentile
Approximate percentile using Tracy-Widom percentile fα:
xα = xTWα (p, n1, n2) = eµp+fασp/(1 + eµp+fασp).
William Chen’s tables: (2002-04) ⇒ can compare
N ‘Exact’ xα(p, n1, n2) with
N T-W approx xTWα (p, n1, n2).
Relative error: r = (xTWα /xα)− 1.
Tracy-Widom based on p → ∞, but try p = 2 as n1, n2 vary
NESS 4/10 – p.35
p = 2 at 95th percentile
0 5 10 150
0.5
1
m
θα
n = 2
n = 5n = 10
n = 20
n = 40
n = 100n = 500
TWExact
0.010.01
0.01
0.02
0.02
0.02
0.02
0.030.03
0.04
0.05
0.060.070.
08
m
log 10
n
Contours of relative error r = (θαTW/θα)−1
0 5 10 150
1
2
3
NESS 4/10 – p.36
p = 4 at 90th percentile
0 5 10 150
0.5
1
m
θα
n = 2
n = 5n = 10
n = 20
n = 40
n = 100n = 500
TWExact
0.0020.0040.004
0.004
0.0060.006
0.006
0.008
0.008
0.008
0.008
0.01
0.01
0.01
0.012 0.012
0.012
0.0140.0160.018
m
log 10
n
90th %tile, S=4, Contours of relative error r = (θαTW/θα)−1
0 5 10 150
1
2
3
NESS 4/10 – p.37
Summary, for Double Wishart
T-W approximation to null distribution of Roy’s largest roo t:
N Conventional percentiles: generally accurate to < 10%
relative error
N [Rough p−value assessments: qualitatively o.k. over many
orders of magnitude]
N Similar O(N−2/3) approximations for Wishart and Gaussian.
Propose: Tracy-Widom approximation replace F− lower bound in
default package printouts
NESS 4/10 – p.38
Software: RMTstat
In development, for R, MATLAB(Z. Ma, P. Perry, M. Shahram, IMJ)
[Preliminary Rversion at CRAN]
N Tracy-Widom distribution: ptw, qtw, dtw, rtw
(Prahofer-Spohn tables + interpolation)
N TW approximation to extreme eigenvalues
N null distributions
N ’spiked’ models
N common tests based on largest root
NESS 4/10 – p.39
Outline
I. Setting
1. Single & Double Wishart, examples
II. Largest Eigenvalue: Limiting Law ( H0)
1. Gaussian
2. Laguerre/Single W.
3. Jacobi/Double W.
III. Largest Eigenvalue: Concentration
1. Single W.
2. Double W.
IV. Largest Eigenvector(s)
NESS 4/10 – p.40
Wishart: Concentration for x1
A ∼ Wp(n, I) ⇔ A = nXTX, X ∼ N(0, In ⊗ Ip)/√n
x1(A) = nσ21(X) (singular value σ1)
Large Deviations: (e.g. Davidson-Szarek, 2001)
P{σ1(X) > Eσ1(X) + t} ≤ e−nt2/2,
Eσ1(X) ≤ 1 +√
p/n.
N Concentration of measure for Lipschitz X → σ1(X)
N not sharp in random matrix context; OK for t > 1
N numerous statistics uses, e.g. sparse regression,
Candes-Tao, Donoho
NESS 4/10 – p.41
‘Small’ vs. ‘Large’ Deviations
Small Deviations: 0 < t ≤ 1
P{σ1(XTX) ≥ (1 +
√
p/n)2(1 + t)} ≤ C1e−nt3/2/C .
Ledoux, Johansson, Feldheim-Sodin
–Appropriate to t3/2 tails of Tracy-Widom.
Statistical application:
Kritchman-Nadler, 2009 X1, . . . , Xn ∼ CNp(0,Σ) (array sensing)
Σ = σ2I + diag (λ1, . . . , λq, 0, . . . , 0)
Determine q via MDL and modified AIC.
NESS 4/10 – p.42
Double Wishart
Motivations:
N analog of Wishart deviation results (large & small)
N arises in Candes-Tao sparse linear regression, Y = Aβ + ǫ,
restricted orthogonality constants θS,S′ ,
〈ATv,AT ′v′〉 ≤ θS,S′‖v‖‖v′‖
Use CCA form: X ∼ N(0, In ⊗ Ip) ⊥⊥ Y ∼ N(0, In ⊗ Iq)
Rp,q;n(X,Y ) = max{Corr (Xu, Y v) : ‖u‖ = ‖v‖ = 1}= σ1(U
TXUY )
[via SVD: X = UXDXVTX , Y = UYDY V
TY ]
NESS 4/10 – p.43
Double Wishart bounds
Large deviations: F = σ1(UTXUY ) = R(X,Y ) is Lipschitz on
Stiefel manifold Vn,p × Vn,q ∋ (UX , UY ).
Use concentration for (Stiefel) manifolds, ( Ledoux, 1992 ):
P{σ1 ≥ Eσ1 + t} ≤ e−(n−2)t2/8.
Small deviations. (data in C)
x1 = σ21 = largest (canonical corr.) 2. ( Ledoux, 2004):
P{x1 ≥ Ex1(1 + t)} ≤ C1e−nt3/2/C , 0 < t ≤ 1.
[“ultraspherical” case α = β; ↔ n = 2q.]
NESS 4/10 – p.44
Outline
I. Setting
1. Single & Double Wishart, examples
II. Largest Eigenvalue: Limiting Law ( H0)
1. Gaussian
2. Laguerre/Single W.
3. Jacobi/Double W.
III. Largest Eigenvalue: Concentration
1. Single W.
2. Double W.
IV. Largest Eigenvector(s), briefly
NESS 4/10 – p.45
Estimation of Eigenvectors
S ∼ Wp(n,Σ), Σ = σ2I +∑M
ν=1 λνθνθTν
Estimation and inference for θν??
Classical: p fixed, n large:√n(θν − θν) → Np(0,Γν)
BUT: inconsistency when p/n → γ > 0:
Reimann, v.d.Broeck, Bex, Hoyle, Rattray; Paul, Baik, Silv erstein
〈θν , θν〉 →
0 λν ∈ [0,√γ]
1−γ/λ2ν
1+γ/λνλν >
√γ
,
NESS 4/10 – p.46
Some motivating models
Xi =M∑
ν=1
θνfνi + σZi, i = 1, . . . , n
1. Economics: Xi = vector of stocks (indices) at time i
θν = factor loadings, fνi factors, Zi idiosyncratic terms.
2. ECG: Xi = ith heartbeat ( p samples per cycle)
θν = may be sparse in wavelet basis.
3. Microarrays: Xi = expression of p genes in ith patient.
θν = may be sparse : few genes involved in each factor.
4. Sensors: Xi = observations at sensors
θν = cols. of steering matrix, fνi signals
NESS 4/10 – p.47
Eigenvectors: Elements of an Estimation Theory
N Assume ∃ a basis with sparse representation:
θ ∈ Θq(C) : e.g. |θν,(µ)| ≤ C|µ|−1/q q < 2
=⇒ near sharp upper & lower bounds for minimax risk :
infθ
supθν∈Θq(C)
E‖θν − θν‖2. (Paul, Ma)
N Goal: Approximate by “signal in Gaussian noise” model
LEMMA (M = 1) Let C = 〈θ, θ〉 Then
θ = Cθ + SU (Paul)
N U is uniform on ” Sp−2” ⇒ nearly Gaussian .
N move from eigenvectors to sparse mean estimation.
NESS 4/10 – p.48
Closing Remark
-4 -2 0 2 4
p = # securities
eige
nval
ues
← Av. Sample eigenvalues 2−9
Av. top eigenvalue ↓
50 100 150 2000
0.01
0.02
0.03
0.04
0.05
0.06
0.07
0.08
N Asymptotics in p ∝ n–or (n1, n2)–and tools from RMT, yield
approximations, methods, insight
even for p quite small.
THANK YOU!
NESS 4/10 – p.49
References
THANK YOU!
IMJ, “High Dimensional Statistical Inference and Random Ma trices”, Proc ICM
2006, Vol 1, 307–333.
IMJ, “Multivariate Analysis and Jacobi Ensembles: Largest eigenvalue,
Tracy-Widom Limits and Rates of Convergence”, Ann. Statist. , 2008
IMJ, “Approximate null distribution of the largest root in m ultivariate analysis”
Ann. Applied Stat. to appear.
D. Paul and IMJ, “Sparse principal component analysis for hi gh dimensional
data”, in preparation.
Webpage: www-stat.stanford.edu/ ˜ imj
NESS 4/10 – p.50