A Tutorial on
Multivariate
Statistical Analysis
Craig A. Tracy
UC Davis
SAMSI
September 2006
ELEMENTARY STATISTICS
Collection of (real-valued) data from a sequence of experiments
X_1, X_2, . . . , X_n
One might assume the underlying law is N(µ, σ²) with unknown mean µ and variance σ². We want to estimate µ and σ² from the data.
Sample Mean & Sample Variance:

\bar{X} = \frac{1}{n}\sum_j X_j, \qquad S = \frac{1}{n-1}\sum_j (X_j - \bar{X})^2

These estimators are "unbiased":

E(\bar{X}) = µ, \qquad E(S) = σ²
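As a quick numerical illustration (my own sketch, not from the slides; the values of µ, σ and n are arbitrary), the sample mean and sample variance can be computed with NumPy; ddof=1 gives the 1/(n−1) factor:

    import numpy as np

    rng = np.random.default_rng(0)
    mu, sigma, n = 2.0, 3.0, 50

    X = rng.normal(mu, sigma, size=n)   # X_1, ..., X_n ~ N(mu, sigma^2)
    Xbar = X.mean()                     # sample mean
    S = X.var(ddof=1)                   # sample variance with the 1/(n-1) factor

    # Unbiasedness check: averages over many independent samples
    # should be close to mu and sigma^2.
    samples = rng.normal(mu, sigma, size=(100_000, n))
    print(samples.mean(axis=1).mean())         # ~ mu
    print(samples.var(axis=1, ddof=1).mean())  # ~ sigma^2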
Theorem: If X_1, X_2, . . . are independent N(µ, σ²) random variables, then \bar{X} and S are independent, \bar{X} is N(µ, σ²/n), and (n−1)S/σ² is χ²(n−1).
Recall that χ²(d) denotes the chi-squared distribution with d degrees of freedom. Its density is

f_{χ²}(x) = \frac{1}{2^{d/2}\,Γ(d/2)}\, x^{d/2-1} e^{-x/2}, \qquad x ≥ 0,

where

Γ(z) = \int_0^∞ t^{z-1} e^{-t}\, dt, \qquad ℜ(z) > 0.
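A minimal Monte Carlo check of the theorem (my own sketch, assuming SciPy is available): simulate many samples, form (n−1)S/σ², and compare against χ²(n−1) with a Kolmogorov–Smirnov test.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    mu, sigma, n, reps = 0.0, 2.0, 10, 20_000

    X = rng.normal(mu, sigma, size=(reps, n))
    S = X.var(axis=1, ddof=1)
    T = (n - 1) * S / sigma**2            # should be chi-squared with n-1 dof

    # A large p-value is consistent with the chi^2(n-1) law
    print(stats.kstest(T, "chi2", args=(n - 1,)))

    # Independence of Xbar and S in the Gaussian case: correlation ~ 0
    print(np.corrcoef(X.mean(axis=1), S)[0, 1])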
MULTIVARIATE GENERALIZATIONS
From the classic textbook of Anderson [1]:
Multivariate statistical analysis is concerned with data that
consists of sets of measurements on a number of individuals
or objects. The sample data may be heights and weights of
some individuals drawn randomly from a population of
school children in a given city, or the statistical treatment
may be made on a collection of measurements, such as
lengths and widths of petals and lengths and widths of
sepals of iris plants taken from two species, or one may
study the scores on batteries of mental tests administered
to a number of students.
p = # of sets of measurements on a given individual,
n = # of observations = sample size
Remarks:
• In the above examples one can assume that p ≪ n, since typically many observations are taken.
• Today it is common for p ≫ 1, so n/p is no longer necessarily large.
Vehicle Sound Signature Recognition: Vehicle noise is a
stochastic signal. The power spectrum is discretized to a
vector of length p = 1200 with n ≈ 1200 samples from the
same kind of vehicle.
Astrophysics: The Sloan Digital Sky Survey typically has many observations (say of quasar spectra), with the spectrum of each quasar binned, resulting in a large p.
Financial data: S&P 500 stocks observed over monthly
intervals for twenty years.
GAUSSIAN DATA MATRICES
The data are now n independent column vectors of length p

\vec{x}_1, \vec{x}_2, . . . , \vec{x}_n

from which we construct the n × p data matrix

X = \begin{pmatrix} \vec{x}_1^T \\ \vec{x}_2^T \\ \vdots \\ \vec{x}_n^T \end{pmatrix}
The Gaussian assumption is that

\vec{x}_j ∼ N_p(µ, Σ)

Many applications assume the mean has already been subtracted out of the data, i.e. µ = 0.
Multivariate Gaussian Distribution
If x and y are vectors, the matrix x ⊗ y is defined by

(x ⊗ y)_{jk} = x_j y_k

If µ = E(x) is the mean of the random vector x, then the covariance matrix of x is the p × p matrix

Σ = E[(x − µ) ⊗ (x − µ)]
Σ is a symmetric, non-negative definite matrix. If Σ > 0 (positive definite) and X ∼ N_p(µ, Σ), then the density function of X is

f_X(x) = (2π)^{-p/2} (\det Σ)^{-1/2} \exp\!\left[ -\tfrac{1}{2}\,\big(x − µ,\; Σ^{-1}(x − µ)\big) \right], \qquad x ∈ R^p
Sample mean:

\bar{x} = \frac{1}{n}\sum_j \vec{x}_j, \qquad E(\bar{x}) = µ
Sample covariance matrix:

S = \frac{1}{n-1} \sum_{j=1}^{n} (\vec{x}_j − \bar{x}) ⊗ (\vec{x}_j − \bar{x})
For µ = 0 the sample covariance matrix can be written simply as

\frac{1}{n-1}\, X^T X
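A small NumPy sketch (my own illustration, with an assumed population Σ) of building the data matrix and the sample covariance; np.cov with rowvar=False uses the same 1/(n−1) normalization:

    import numpy as np

    rng = np.random.default_rng(2)
    p, n = 3, 500
    mu = np.zeros(p)
    Sigma = np.array([[2.0, 0.5, 0.0],
                      [0.5, 1.0, 0.3],
                      [0.0, 0.3, 1.5]])   # assumed population covariance

    X = rng.multivariate_normal(mu, Sigma, size=n)   # n x p data matrix
    xbar = X.mean(axis=0)                            # sample mean vector

    Xc = X - xbar                                    # centered data
    S = Xc.T @ Xc / (n - 1)                          # sample covariance matrix

    print(np.allclose(S, np.cov(X, rowvar=False)))   # True
    print(S)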
Some Notation: If X is an n × p data matrix formed from the n independent column vectors x_j, cov(x_j) = Σ, we can form one column vector vec(X) of length pn:

vec(X) = \begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{pmatrix}
The covariance of vec(X) is the np × np matrix

I_n ⊗ Σ = \begin{pmatrix} Σ & 0 & \cdots & 0 \\ 0 & Σ & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & Σ \end{pmatrix}

In this case we say the data matrix X constructed from n independent \vec{x}_j ∼ N_p(µ, Σ) has distribution

N_p(M, I_n ⊗ Σ)

where M = E(X) = 1 ⊗ µ and 1 is the column vector of all 1's.
WISHART DISTRIBUTION
Definition: If A = X^T X, where the n × p matrix X is N_p(0, I_n ⊗ Σ), Σ > 0, then A is said to have the Wishart distribution with n degrees of freedom and covariance matrix Σ. We will say A is W_p(n, Σ).
Remarks:
• The Wishart distribution is the multivariate generalization of
the chi-squared distribution.
• A ∼ W_p(n, Σ) is positive definite with probability one if and only if n ≥ p.
• The sample covariance matrix

S = \frac{1}{n-1}\, A

is W_p\big(n−1, \tfrac{1}{n-1}\,Σ\big).
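A hedged simulation sketch (my own, with an assumed Σ): a Wishart matrix can be drawn directly as X^T X, or via scipy.stats.wishart, which parametrizes the distribution by df and scale.

    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    p, n = 3, 20
    Sigma = np.array([[2.0, 0.5, 0.0],
                      [0.5, 1.0, 0.3],
                      [0.0, 0.3, 1.5]])   # assumed Sigma > 0

    # A = X^T X with rows of X drawn from N_p(0, Sigma)  =>  A ~ W_p(n, Sigma)
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
    A = X.T @ X

    # Equivalent draw using SciPy's Wishart distribution
    A2 = stats.wishart(df=n, scale=Sigma).rvs(random_state=3)

    # Both draws have expectation n * Sigma
    print(A)
    print(A2)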
WISHART DENSITY FUNCTION, n ≥ p
Let S_p denote the space of p × p positive definite (symmetric) matrices. If A = (a_{jk}) ∈ S_p, let

(dA) = volume element of A = \bigwedge_{j ≤ k} da_{jk}
The multivariate gamma function is

Γ_p(a) = \int_{S_p} e^{-\mathrm{tr}(A)}\, (\det A)^{a-(p+1)/2}\, (dA), \qquad ℜ(a) > (p−1)/2.
Theorem: If A is W_p(n, Σ) with n ≥ p, then the density function of A is

\frac{1}{2^{np/2}\, Γ_p(n/2)\, (\det Σ)^{n/2}}\; e^{-\frac{1}{2}\mathrm{tr}(Σ^{-1}A)}\, (\det A)^{(n-p-1)/2}
Sketch of Proof:
• The density function for X is the multivariate Gaussian (including the volume element (dX))

(2π)^{-np/2}\, (\det Σ)^{-n/2}\, e^{-\frac{1}{2}\mathrm{tr}(Σ^{-1} X^T X)}\, (dX)
• Recall the QR factorization [8]: Let X denote an n × p matrix, n ≥ p, with full column rank. Then there exist a unique n × p matrix Q with Q^T Q = I_p and a unique p × p upper triangular matrix R with positive diagonal elements such that X = QR. Note A = X^T X = R^T R. (A numerical check of this step follows the proof outline below.)
• A Jacobian calculation [1, 13]: If A = X^T X, then

(dX) = 2^{-p}\, (\det A)^{(n-p-1)/2}\, (dA)\, (Q^T dQ)
where

(Q^T dQ) = \bigwedge_{j=1}^{p} \bigwedge_{k=j+1}^{n} q_k^T\, dq_j

and Q = (q_1, . . . , q_p) is the column representation of Q.
• Thus the joint distribution of A and Q is

(2π)^{-np/2}\, (\det Σ)^{-n/2}\, e^{-\frac{1}{2}\mathrm{tr}(Σ^{-1}A)} \times 2^{-p}\, (\det A)^{(n-p-1)/2}\, (dA)\, (Q^T dQ)
• Now integrate over all Q. Use the fact that

\int_{V_{n,p}} (Q^T dQ) = \frac{2^p\, π^{np/2}}{Γ_p(n/2)}

where V_{n,p} is the set of real n × p matrices Q satisfying Q^T Q = I_p. (When n = p this is the orthogonal group.)
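A quick numerical check of the QR step above (my own sketch): for a full-column-rank X, the thin QR factorization gives A = X^T X = R^T R.

    import numpy as np

    rng = np.random.default_rng(4)
    n, p = 8, 3
    X = rng.normal(size=(n, p))             # a generic X has full column rank

    Q, R = np.linalg.qr(X, mode="reduced")  # Q is n x p, R is p x p upper triangular
    A = X.T @ X

    print(np.allclose(Q.T @ Q, np.eye(p)))  # Q has orthonormal columns
    print(np.allclose(A, R.T @ R))          # A = R^T R
    # (np.linalg.qr does not force positive diagonal entries in R;
    #  flipping signs of rows of R and columns of Q recovers the unique factorization.)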
Remarks regarding the Wishart density function
• The case p = 2 was obtained by R. A. Fisher in 1915.
• The general p case was obtained by J. Wishart in 1928 by geometrical arguments.
• The proof outlined above came later. (See [1, 13] for a complete proof.)
• When Q is a p × p orthogonal matrix,

\frac{Γ_p(p/2)}{2^p\, π^{p^2/2}}\, (Q^T dQ)

is normalized Haar measure for the orthogonal group O(p). We denote this Haar measure by (dQ).
• Siegel proved (see, e.g., [13])

Γ_p(a) = π^{p(p-1)/4} \prod_{j=1}^{p} Γ\!\left( a − \tfrac{1}{2}(j−1) \right)
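A small check of Siegel's product formula (my own sketch): SciPy's multigammaln computes log Γ_p(a), which can be compared with the product of ordinary gamma functions.

    import numpy as np
    from scipy.special import multigammaln, gammaln

    def log_gamma_p(a, p):
        # log of Siegel's product formula for the multivariate gamma function
        return (p * (p - 1) / 4) * np.log(np.pi) + sum(
            gammaln(a - 0.5 * j) for j in range(p)
        )

    a, p = 7.3, 4
    print(multigammaln(a, p), log_gamma_p(a, p))   # the two values should agree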
EIGENVALUES OF A WISHART MATRIX
Theorem: If A is W_p(n, Σ) with n ≥ p, the joint density function for the eigenvalues ℓ_1, . . . , ℓ_p of A is

\frac{π^{p^2/2}\, 2^{-np/2}\, (\det Σ)^{-n/2}}{Γ_p(p/2)\, Γ_p(n/2)} \prod_{j=1}^{p} ℓ_j^{(n-p-1)/2} \prod_{j<k} |ℓ_j − ℓ_k| \;\times \int_{O(p)} e^{-\frac{1}{2}\mathrm{tr}(Σ^{-1} Q L Q^T)}\, (dQ), \qquad (ℓ_1 > \cdots > ℓ_p)

where L = \mathrm{diag}(ℓ_1, . . . , ℓ_p) and (dQ) is normalized Haar measure.
Note that Δ(ℓ) := \prod_{j<k} (ℓ_j − ℓ_k) is the Vandermonde determinant.
Corollary: If A is W_p(n, I_p), then the integral over the orthogonal group in the previous theorem is

e^{-\frac{1}{2}\sum_j ℓ_j}.
Proof: Recall that the Wishart density function (times the volume element) is

\frac{1}{2^{np/2}\, Γ_p(n/2)\, (\det Σ)^{n/2}}\; e^{-\frac{1}{2}\mathrm{tr}(Σ^{-1}A)}\, (\det A)^{(n-p-1)/2}\, (dA)
The idea is to diagonalize A by an orthogonal transformation and
then integrate over the orthogonal group thereby giving the density
function for the eigenvalues of A.
Let ℓ_1 > · · · > ℓ_p be the ordered eigenvalues of A and write

A = Q L Q^T, \qquad L = \mathrm{diag}(ℓ_1, . . . , ℓ_p), \qquad Q ∈ O(p)

The jth column of Q is a normalized eigenvector of A. The transformation is not 1–1 since Q = [±q_1, . . . , ±q_p] works for each fixed A. The transformation is made 1–1 by requiring that the 1st element of each q_j is nonnegative. This restricts Q (as A varies) to a 2^{-p} part of O(p). We compensate for this at the end.
We need an expression for the volume element (dA) in terms of Q and L. First we compute the differential of A:

dA = dQ\, L\, Q^T + Q\, dL\, Q^T + Q\, L\, dQ^T

Q^T dA\, Q = Q^T dQ\, L + dL + L\, dQ^T Q = −dQ^T Q\, L + L\, dQ^T Q + dL = [L,\, dQ^T Q] + dL

(We used the fact that Q^T Q = I implies Q^T dQ = −dQ^T Q.)
We now use the following fact (see, e.g., page 58 in [13]): If
X = BY BT where X and Y are p× p symmetric matrices, B is a
nonsingular p × p matrix, then (dX) = (\det B)^{p+1}(dY). In our case
Q is orthogonal so the volume element (dA) equals the volume
element (QT dAQ). The volume element is the exterior product of
the diagonal elements of QT dAQ times the exterior product of the
elements above the diagonal.
Since L is diagonal, the commutator [L, dQ^T Q] has zero diagonal elements. Thus the exterior product of the diagonal elements of Q^T dA\, Q is \bigwedge_j dℓ_j.

The exterior product of the elements coming from the commutator is

\prod_{j<k} (ℓ_j − ℓ_k) \bigwedge_{j<k} q_k^T\, dq_j

and so

(dA) = \bigwedge_{j<k} q_k^T\, dq_j \;\, Δ(ℓ) \bigwedge_j dℓ_j = \frac{2^p\, π^{p^2/2}}{Γ_p(p/2)}\, (dQ)\;\, Δ(ℓ) \bigwedge_j dℓ_j

The theorem now follows once we integrate over all of O(p) and divide the result by 2^p.
• One is interested in limit laws as n, p → ∞. For Σ = I_p, Johnstone [11] proved, using RMT methods, that with the centering and scaling constants

µ_{np} = (\sqrt{n−1} + \sqrt{p}\,)^2, \qquad σ_{np} = (\sqrt{n−1} + \sqrt{p}\,) \left( \frac{1}{\sqrt{n−1}} + \frac{1}{\sqrt{p}} \right)^{1/3}

the rescaled largest eigenvalue

\frac{ℓ_1 − µ_{np}}{σ_{np}}

converges in distribution as n, p → ∞, n/p → γ < ∞, to the GOE largest eigenvalue distribution [15]. (A small simulation sketch follows these remarks.)
• El Karoui [6] has extended the result to γ ≤ ∞. The case
p≫ n appears, for example, in microarray data.
• Soshnikov [14] has lifted the Gaussian assumption under the additional restriction n − p = O(p^{1/3}).
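A minimal simulation sketch (my own, not from the slides) of Johnstone's rescaling for white Wishart matrices; the rescaled largest eigenvalues should approach the GOE Tracy–Widom distribution, whose mean is approximately −1.21.

    import numpy as np

    rng = np.random.default_rng(5)
    n, p, reps = 400, 100, 200

    mu_np = (np.sqrt(n - 1) + np.sqrt(p))**2
    sigma_np = (np.sqrt(n - 1) + np.sqrt(p)) * (
        1 / np.sqrt(n - 1) + 1 / np.sqrt(p)
    ) ** (1 / 3)

    rescaled = []
    for _ in range(reps):
        X = rng.normal(size=(n, p))               # Sigma = I_p, mean 0
        ell1 = np.linalg.eigvalsh(X.T @ X)[-1]    # largest eigenvalue of A = X^T X
        rescaled.append((ell1 - mu_np) / sigma_np)

    # Sample mean of the rescaled values; compare with the GOE Tracy-Widom mean
    print(np.mean(rescaled))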
• For Σ ≠ I_p, the difficulty in establishing limit theorems comes from the integral

\int_{O(p)} e^{-\frac{1}{2}\mathrm{tr}(Σ^{-1} Q Λ Q^T)}\, (dQ)
Using zonal polynomials infinite series expansions have been
derived for this integral, but these expansions are difficult to
analyze. See Muirhead [13].
• For complex Gaussian data matrices X, similar density formulas are known for the eigenvalues of X^* X. Limit theorems for Σ ≠ I_p are known since the analogous group integral, now over the unitary group, is known explicitly: the Harish-Chandra–Itzykson–Zuber (HCIZ) integral (see, e.g., [17]). See the work of Baik, Ben Arous and Peche [2, 3] and El Karoui [7].
PRINCIPAL COMPONENT ANALYSIS (PCA),
H. Hotelling, 1933
Population Principal Components: Let x be a p × 1 random vector with E(x) = µ and cov(x) = Σ > 0. Let λ_1 ≥ λ_2 ≥ · · · ≥ λ_p denote the eigenvalues of Σ and H an orthogonal matrix diagonalizing Σ: H^T Σ H = Λ = diag(λ_1, . . . , λ_p). We write H in column vector form

H = [h_1, . . . , h_p]

so that h_j is the p × 1 eigenvector of Σ corresponding to eigenvalue λ_j. Define the p × 1 vector

u = H^T x = (u_1, . . . , u_p)^T

Then

cov(u) = E\big( (H^T x − H^T µ) ⊗ (H^T x − H^T µ) \big) = H^T E\big( (x − µ) ⊗ (x − µ) \big) H = H^T Σ H = Λ

so the u_j are uncorrelated.
Definition: uj is called the jth principal component of x. Note
var(uj) = λj .
Statistical interpretation: The claim is that u_1 is the linear combination of the components of x that has maximum variance.
Proof: For simplicity of notation set µ = 0. Let b denote any p × 1 vector with b^T b = 1 and form b^T x. Then

var(b^T x) = E(b^T x \cdot b^T x) = E\big(b^T x\, (b^T x)^T\big) = b^T E(x x^T)\, b = b^T Σ\, b.

We want to maximize the right-hand side subject to the constraint b^T b = 1. By the method of Lagrange multipliers we maximize

b^T Σ\, b − λ (b^T b − 1)

Since Σ is symmetric, the vector of partial derivatives is

2Σ b − 2λ b

Thus b must be an eigenvector of Σ with eigenvalue λ.
The largest variance corresponds to choosing the largest eigenvalue. The general result is that u_r has maximum variance among all normalized linear combinations uncorrelated with u_1, . . . , u_{r−1}.
Sample principal components: Let S denote the sample covariance matrix of the data matrix X and let Q = [q_1, . . . , q_p] be a p × p orthogonal matrix diagonalizing S:

Q^T S\, Q = diag(ℓ_1, . . . , ℓ_p)

The ℓ_j are the sample variances that are estimates for the λ_j. The vectors q_j are sample estimates for the vectors h_j.
If x is the random vector and u = H^T x is the vector of (population) principal components, then û = Q^T x is the vector of sample principal components.
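A NumPy sketch of sample principal components (my own illustration, with an assumed diagonal Σ): diagonalize the sample covariance matrix and project the centered data onto the eigenvectors.

    import numpy as np

    rng = np.random.default_rng(6)
    p, n = 4, 300
    Sigma = np.diag([6.0, 2.0, 1.0, 0.5])        # assumed population covariance
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

    S = np.cov(X, rowvar=False)                  # sample covariance matrix
    ell, Q = np.linalg.eigh(S)                   # eigenvalues returned ascending
    order = np.argsort(ell)[::-1]
    ell, Q = ell[order], Q[:, order]             # sort so ell_1 >= ... >= ell_p

    U = (X - X.mean(axis=0)) @ Q                 # sample principal components
    print(ell)                                   # sample variances of the components
    print(np.allclose(np.cov(U, rowvar=False), np.diag(ell), atol=1e-8))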
SCREE PLOTS
In applications: How many of the ℓj ’s are significant?
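A scree plot displays the ordered sample eigenvalues ℓ_1 ≥ ℓ_2 ≥ · · · against their index; one looks for an "elbow" beyond which the eigenvalues are negligible. A minimal matplotlib sketch (my own, with simulated data):

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(7)
    p, n = 10, 200
    # assumed covariance: two large eigenvalues, the rest small
    Sigma = np.diag([8.0, 5.0] + [0.5] * (p - 2))
    X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)

    ell = np.sort(np.linalg.eigvalsh(np.cov(X, rowvar=False)))[::-1]

    plt.plot(range(1, p + 1), ell, marker="o")
    plt.xlabel("component index j")
    plt.ylabel("sample eigenvalue $\\ell_j$")
    plt.title("Scree plot")
    plt.show()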
CANONICAL CORRELATION ANALYSIS (CCA)
H. Hotelling, 1936
Suppose a large data set is naturally decomposed into two groups.
For example, p× 1 random vectors ~x1,. . . , ~xn make up one set and
q × 1 random vectors ~y1,. . . , ~ym the other. We are interested in the
correlations between these two data sets. For example, in medicine
we might have n measurements of age, height, and weight (p = 3)
and m measurements of systolic and diastolic blood pressures
(q = 2). We are interested in what combination of the components
of x is most correlated with a combination of the components of y.
Population Canonical Correlations: Let x and y be two random vectors of size p × 1 and q × 1, respectively. We assume E(x) = E(y) = 0 and p ≤ q. Form the (p + q) × 1 vector

\begin{pmatrix} x \\ y \end{pmatrix}
and its (p + q) × (p + q) covariance matrix

Σ = \begin{pmatrix} Σ_{11} & Σ_{12} \\ Σ_{21} & Σ_{22} \end{pmatrix}.
Let

u := α^T x ∈ R, \qquad v := γ^T y ∈ R

where α and γ are vectors to be determined. We want to maximize the correlation

corr(u, v) = \frac{cov(u, v)}{\sqrt{var(u)\, var(v)}}

The correlation does not change under scale transformations u → cu, etc., so we can maximize this correlation subject to the constraints

E(u^2) = E(α^T x \cdot α^T x) = α^T Σ_{11} α = 1    (1)
E(v^2) = γ^T Σ_{22} γ = 1    (2)
Under these constraints

corr(u, v) = E(α^T x \cdot γ^T y) = α^T Σ_{12} γ.
Let

ψ = α^T Σ_{12} γ − \tfrac{1}{2} ρ\, (α^T Σ_{11} α − 1) − \tfrac{1}{2} λ\, (γ^T Σ_{22} γ − 1)

where ρ and λ are Lagrange multipliers. Set the vector of partial derivatives to zero:

\frac{∂ψ}{∂α} = Σ_{12} γ − ρ Σ_{11} α = 0    (3)

\frac{∂ψ}{∂γ} = Σ_{12}^T α − λ Σ_{22} γ = 0    (4)
If we left multiply (3) by α^T and (4) by γ^T and use the normalization conditions (1) and (2), we conclude λ = ρ. Thus (3) and (4) become

\begin{pmatrix} −ρ Σ_{11} & Σ_{12} \\ Σ_{21} & −ρ Σ_{22} \end{pmatrix} \begin{pmatrix} α \\ γ \end{pmatrix} = 0    (5)
with corr(u, v) = ρ, and ρ ≥ 0 is a solution to

\det \begin{pmatrix} −ρ Σ_{11} & Σ_{12} \\ Σ_{21} & −ρ Σ_{22} \end{pmatrix} = 0
This is a polynomial in ρ of degree p + q. Let ρ_1 denote the maximum root and α_1 and γ_1 the corresponding solutions of (5).
Definition: u1 and v1 are called the first canonical variables and
their correlation ρ1 = corr(u1, v1) is called the first canonical
correlation coefficient.
More generally, the rth pair of canonical variables is the pair of linear combinations u_r = (α^{(r)})^T x and v_r = (γ^{(r)})^T y, each of unit variance and uncorrelated with the first r − 1 pairs of canonical variables and having maximum correlation. The correlation corr(u_r, v_r) is the rth canonical correlation coefficient.
REDUCTION TO AN EIGENVALUE PROBLEM
Since

\begin{pmatrix} −ρ Σ_{11} & Σ_{12} \\ Σ_{21} & −ρ Σ_{22} \end{pmatrix} = \begin{pmatrix} Σ_{11} & 0 \\ 0 & I \end{pmatrix} \begin{pmatrix} −ρ I & Σ_{11}^{-1} Σ_{12} Σ_{22}^{-1} \\ Σ_{21} & −ρ I \end{pmatrix} \begin{pmatrix} I & 0 \\ 0 & Σ_{22} \end{pmatrix}

the determinantal equation becomes

\det\!\left( ρ^2 I − Σ_{21} Σ_{11}^{-1} Σ_{12} Σ_{22}^{-1} \right) = 0
The nonzero roots ρ_1, . . . , ρ_k, where k = rank(Σ_{12}), are called the population canonical correlation coefficients.

In applications Σ is not known. One uses the sample covariance matrix S to obtain sample canonical correlation coefficients.
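A NumPy sketch (my own, with hypothetical data) of computing sample canonical correlations from the partitioned sample covariance matrix, following the determinantal equation above:

    import numpy as np

    rng = np.random.default_rng(8)
    n, p, q = 200, 2, 3

    # hypothetical data: y shares a common signal with x
    x = rng.normal(size=(n, p))
    y = 0.8 * x @ rng.normal(size=(p, q)) + rng.normal(size=(n, q))

    Z = np.hstack([x, y])
    S = np.cov(Z, rowvar=False)
    S11, S12 = S[:p, :p], S[:p, p:]
    S21, S22 = S[p:, :p], S[p:, p:]

    # rho^2 are the nonzero eigenvalues of S21 S11^{-1} S12 S22^{-1}
    M = S21 @ np.linalg.inv(S11) @ S12 @ np.linalg.inv(S22)
    rho2 = np.sort(np.linalg.eigvals(M).real)[::-1]
    rho = np.sqrt(np.clip(rho2[:p], 0, 1))   # sample canonical correlations
    print(rho)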
AN EXAMPLE
The first application of Hotelling's canonical correlations was by F. Waugh [16] in 1942. He begins his paper with:
Professor Hotelling’s paper, “Relations between Two Sets
of Variates,” should be widely known and his method used
by practical statisticians. Yet, few practical statisticians
seem to know of the paper, and perhaps those few are
inclined to regard it as a mathematical curiosity rather
than an important and useful method for analyzing
concrete problems. This may be due to . . .
The Practical Problem: Relation of wheat characteristics to
flour characteristics. Guiding principle
The grade of the raw material should give a good
indication of the probable grade of the finished product.
The data: 138 samples of Canadian Hard Red Spring wheat and the flour made from each of these samples.*
wheat quality                       flour quality
x1 = kernel texture                 y1 = wheat per barrel of flour
x2 = test weight                    y2 = ash in flour
x3 = damaged kernels                y3 = crude protein in flour
x4 = crude protein in wheat         y4 = gluten quality index
x5 = foreign material
u_1 = α_1^T x = index of wheat quality, \qquad v_1 = γ_1^T y = index of flour quality

corr(u_1, v_1) = 0.909
* In this example p > q. The data are normalized to mean 0 and variance 1.
DISTRIBUTION OF SAMPLE CANONICAL
CORRELATION COEFFICIENTS
Given a sample of n observations on

\begin{pmatrix} x \\ y \end{pmatrix}

drawn from N_{p+q}(µ, Σ), let A be the (unnormalized) sample covariance matrix. Then A is W_{p+q}(n, Σ). We have [13]:
Theorem (Constantine, 1963): Let A have the W_{p+q}(n, Σ) distribution where p ≤ q, n ≥ p + q, and Σ and A are partitioned as above. Then the joint probability density function of r_1^2, . . . , r_p^2, the eigenvalues of A_{11}^{-1} A_{12} A_{22}^{-1} A_{21} (let ξ_j := r_j^2), is

c_{p,q,n} \prod_{j=1}^{p} (1 − ρ_j^2)^{n/2} \prod_{j=1}^{p} \left[ ξ_j^{(q-p-1)/2} (1 − ξ_j)^{(n-p-q-1)/2} \right] \cdot |Δ(ξ)| \cdot F(ξ)

where
• c_{p,q,n} is a normalization constant.
• Δ(ξ) is the Vandermonde determinant \prod_{1 ≤ j < k ≤ p} (ξ_j − ξ_k).
• F(ξ) is a two-matrix hypergeometric function which can be expressed as an infinite series involving zonal polynomials. See Theorem 11.3.2 and Definition 7.3.2 in Muirhead [13].
Null Distribution: For Σ_{12} = 0 (x and y are independent), the above joint density for the sample canonical correlation coefficients reduces to

c_{p,q,n} \prod_{j=1}^{p} \left[ ξ_j^{(q-p-1)/2} (1 − ξ_j)^{(n-p-q-1)/2} \right] \cdot |Δ(ξ)|
In this case the distribution of the largest sample canonical
correlation coefficient r1 can be used for testing the null hypothesis:
H : Σ12 = 0. We reject H for large values of r1.
ZONAL POLYNOMIALS
Zonal polynomials naturally arise in multivariate analysis when considering group integrals such as

\int_{O(m)} e^{-\mathrm{tr}(X H Y H^T)}\, (dH)
where X and Y are m×m symmetric, positive definite matrices
and (dH) is normalized Haar measure. Zonal polynomials can be
defined either through the representation theory of GL(m,R) [12]
or as eigenfunctions of certain Laplacians [5, 13].
Let Sm denote the space of m×m symmetric, positive definite
matrices. Zonal polynomials, Cλ(X), X ∈ Sm are certain
homogeneous polynomials in the eigenvalues of X that are indexed
by partitions λ.
To give the precise definition we need some preliminary definitions.
Definition: A partition λ of n is a sequence λ = (λ_1, λ_2, . . . , λ_ℓ) where the λ_j ≥ 0 are weakly decreasing and \sum_j λ_j = n. We denote this by λ ⊢ n.
For example, λ = (5, 3, 3, 1) is a partition of 12. The number of
nonzero parts of λ is called the length of λ, denoted ℓ(λ).
If λ and µ are two partitions of n, we say λ < µ (lexicographic order) if, for some index i, λ_j = µ_j for j < i and λ_i < µ_i. For example

(1, 1, 1, 1) < (2, 1, 1) < (2, 2) < (3, 1) < (4)
If λ ⊢ n with ℓ(λ) = m, we define the monomial

x^λ = x_1^{λ_1} x_2^{λ_2} \cdots x_m^{λ_m}

and say x^µ is of higher weight than x^λ if µ > λ.
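A tiny Python sketch (my own) that generates all partitions of n in weakly decreasing form and sorts them lexicographically, reproducing the ordering shown above for n = 4:

    def partitions(n, max_part=None):
        # all weakly decreasing sequences of positive integers summing to n
        if max_part is None:
            max_part = n
        if n == 0:
            return [()]
        result = []
        for first in range(min(n, max_part), 0, -1):
            for rest in partitions(n - first, first):
                result.append((first,) + rest)
        return result

    # lexicographic order on partitions is ordinary tuple comparison
    for lam in sorted(partitions(4)):
        print(lam)
    # prints (1, 1, 1, 1), (2, 1, 1), (2, 2), (3, 1), (4) in that order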
Metrics and Laplacians
Let X ∈ S_m and let dX = (dx_{jk}) be the matrix of differentials of X. We define a metric on S_m by

(ds)^2 = \mathrm{tr}\!\left( X^{-1} dX \cdot X^{-1} dX \right)

A simple computation shows this metric is invariant under X → L X L^T, L ∈ GL(m, R).
Let n = m(m + 1)/2 = number of independent elements of X. We denote by vec(X) ∈ R^n the column vector representation of X, e.g.

X = \begin{pmatrix} x_{11} & x_{12} \\ x_{12} & x_{22} \end{pmatrix}, \qquad x := vec(X) = \begin{pmatrix} x_{11} \\ x_{12} \\ x_{22} \end{pmatrix}
The metric (ds)^2 is a quadratic differential in dx.
For example,

(ds)^2 = \begin{pmatrix} dx_{11} & dx_{12} & dx_{22} \end{pmatrix} \cdot G(x) \cdot \begin{pmatrix} dx_{11} \\ dx_{12} \\ dx_{22} \end{pmatrix}

where

G(x) = (g_{ij}) = \frac{1}{(x_{11} x_{22} − x_{12}^2)^2} \begin{pmatrix} x_{22}^2 & −2 x_{12} x_{22} & x_{12}^2 \\ −2 x_{12} x_{22} & 2 x_{11} x_{22} + 2 x_{12}^2 & −2 x_{11} x_{12} \\ x_{12}^2 & −2 x_{11} x_{12} & x_{11}^2 \end{pmatrix}
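The 2 × 2 matrix G(x) above can also be derived symbolically; a SymPy sketch (my own check) expands tr(X^{-1}dX · X^{-1}dX) and reads off the coefficients of the differentials:

    import sympy as sp

    x11, x12, x22 = sp.symbols("x11 x12 x22")
    d11, d12, d22 = sp.symbols("d11 d12 d22")   # stand-ins for dx11, dx12, dx22

    X = sp.Matrix([[x11, x12], [x12, x22]])
    dX = sp.Matrix([[d11, d12], [d12, d22]])

    ds2 = sp.expand((X.inv() * dX * X.inv() * dX).trace())

    dx = [d11, d12, d22]
    # for a symmetric quadratic form, g_ij = (1/2) * second derivative of ds2
    G = sp.Matrix(3, 3, lambda i, j: sp.simplify(
        sp.Rational(1, 2) * sp.diff(ds2, dx[i], dx[j])))
    # multiplying by det(X)^2 should reproduce the matrix displayed above
    sp.pprint(sp.simplify(G * (x11 * x22 - x12**2) ** 2))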
Labeling x = vec(X) with a single index, the metric takes the standard form

(ds)^2 = \sum_{i,j} dx_i\, g_{ij}\, dx_j
and the Laplacian associated to this metric is

Δ_X = (\det G)^{-1/2} \sum_{j=1}^{n} \frac{∂}{∂x_j} \left[ (\det G)^{1/2} \sum_{i=1}^{n} g^{ij} \frac{∂}{∂x_i} \right]

where G^{-1} = (g^{ij}).
More succinctly, if

∇_X = \begin{pmatrix} \frac{∂}{∂x_1} \\ \vdots \\ \frac{∂}{∂x_n} \end{pmatrix},

then

Δ_X = (\det G)^{-1/2} \left( ∇_X,\; (\det G)^{1/2}\, G^{-1} ∇_X \right)

where (·, ·) is the standard inner product on R^n.
One can show that Δ_X is an invariant differential operator:

Δ_{L X L^T} = Δ_X, \qquad L ∈ GL(m, R)

We now diagonalize X:

X = H Y H^T, \qquad Y = \mathrm{diag}(y_1, . . . , y_m), \qquad H ∈ O(m).
The Laplacian Δ_X is now expressed in terms of a radial part and an angular part. The radial part of Δ_X is the differential operator

\sum_{j=1}^{m} y_j^2 \frac{∂^2}{∂y_j^2} + \sum_{j=1}^{m} \sum_{\substack{k=1 \\ k ≠ j}}^{m} \frac{y_j^2}{y_j − y_k} \frac{∂}{∂y_j} + \sum_{j} y_j \frac{∂}{∂y_j}
We now let ∆X denote only the radial part.
We can now define zonal polynomials!
Definition: Let X ∈ S_m have eigenvalues x_1, . . . , x_m and let λ = (λ_1, . . . , λ_m) ⊢ k be a partition of k into not more than m parts. Then C_λ(X) is the symmetric, homogeneous polynomial of degree k in the x_j such that
1. The term of highest weight in C_λ(X) is x^λ;
2. C_λ(X) is an eigenfunction of the Laplacian Δ_X;
3. (\mathrm{tr}(X))^k = (x_1 + · · · + x_m)^k = \sum_{λ ⊢ k,\, ℓ(λ) ≤ m} C_λ(X).
Remarks:
• One must show there is a unique polynomial satisfying these requirements.
• The eigenvalue in (2) equals α_λ := \sum_j λ_j(λ_j − j) + k(m + 1)/2.
• The program MOPS [5] computes zonal polynomials.
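As a small hand check of property 3 (my own sketch): for k = 2 the two zonal polynomials are, in standard tables such as Muirhead's, C_(2)(X) = ((tr X)² + 2 tr(X²))/3 and C_(1,1)(X) = 2((tr X)² − tr(X²))/3; I am quoting these formulas from memory, so treat them as an assumption. Their sum is (tr X)².

    import numpy as np

    def C_2(X):
        # degree-2 zonal polynomial for the partition (2); formula assumed
        # from standard tables (e.g. Muirhead)
        return ((np.trace(X))**2 + 2 * np.trace(X @ X)) / 3

    def C_11(X):
        # degree-2 zonal polynomial for the partition (1,1); formula assumed
        return 2 * ((np.trace(X))**2 - np.trace(X @ X)) / 3

    rng = np.random.default_rng(9)
    B = rng.normal(size=(3, 3))
    X = B @ B.T + 3 * np.eye(3)        # a random positive definite matrix

    print(np.isclose(C_2(X) + C_11(X), np.trace(X)**2))   # True: property 3 for k = 2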
The following theorem is at the core of why zonal polynomials
appear in multivariate statistical analysis.
Theorem: If X, Y ∈ S_p, then

\int_{O(p)} C_λ(X H Y H^T)\, (dH) = \frac{C_λ(X)\, C_λ(Y)}{C_λ(I_p)}    (6)

where (dH) is normalized Haar measure.
Proof: Let f_λ(Y) denote the left-hand side of (6) and let Q ∈ O(p). Then f_λ(Q Y Q^T) = f_λ(Y) (let H → HQ in the integral and use the invariance of the measure). Thus f_λ is a symmetric function of the eigenvalues of Y. Since C_λ is homogeneous of degree |λ|, so is f_λ. Now apply the
Laplacian to f_λ:

Δ_Y f_λ(Y) = \int_{O(p)} Δ_Y C_λ(X H Y H^T)\, (dH)
           = \int Δ_Y C_λ(X^{1/2} H Y H^T X^{1/2})\, (dH)
           = \int Δ_Y C_λ(L Y L^T)\, (dH) \qquad (L = X^{1/2} H)
           = \int Δ_{L Y L^T} C_λ(L Y L^T)\, (dH) \qquad (invariance of Δ_Y)
           = α_λ \int C_λ(L Y L^T)\, (dH)
           = α_λ f_λ(Y)

By the definition of C_λ(Y) we have f_λ(Y) = d_λ C_λ(Y) for some constant d_λ. Since f_λ(I_p) = C_λ(X), we find d_λ and the theorem follows.
Using Zonal Polynomials to Evaluate Group Integrals

\int_{O(p)} e^{-ρ\,\mathrm{tr}(X H Y H^T)}\, (dH)
  = \sum_{k=0}^{∞} \frac{(−ρ)^k}{k!} \int_{O(p)} \left( \mathrm{tr}(X H Y H^T) \right)^k (dH)
  = \sum_{k=0}^{∞} \frac{(−ρ)^k}{k!} \sum_{λ ⊢ k,\, ℓ(λ) ≤ p} \int_{O(p)} C_λ(X H Y H^T)\, (dH)
  = \sum_{k=0}^{∞} \frac{(−ρ)^k}{k!} \sum_{λ ⊢ k,\, ℓ(λ) ≤ p} \frac{C_λ(X)\, C_λ(Y)}{C_λ(I_p)}
References
[1] T. W. Anderson, An Introduction to Multivariate Statistical
Analysis, third edition, John Wiley & Sons, Inc., 2003.
[2] J. Baik, G. Ben Arous, and S. Peche, Phase transition of the largest
eigenvalue for nonnull complex sample covariance matrices, Ann.
Probab. 33 (2005), 1643–1697.
[3] J. Baik, Painleve formulas of the limiting distributions for nonnull
complex sample covariance matrices, Duke Math. J. 133 (2006),
205–235.
[4] M. Dieng and C. A. Tracy, Application of random matrix theory to
multivariate statistics, arXiv: math.PR/0603543.
[5] I. Dumitriu, A. Edelman and G. Shuman, MOPS: Multivariate
Orthogonal Polynomials (symbolically), arXiv:math-ph/0409066.
[6] N. El Karoui, On the largest eigenvalue of Wishart matrices with identity covariance when n, p and p/n tend to infinity, arXiv:math.ST/0309355.
[7] N. El Karoui, Tracy-Widom limit for the largest eigenvalue of a
large class of complex Wishart matrices, arXiv: math.PR/0503109.
[8] G. H. Golub and C. F. Van Loan, Matrix Computations, second
edition, Johns Hopkins University Press, 1989.
[9] H. Hotelling, Analysis of a complex of statistical variables into
principal components, J. of Educational Psychology 24 (1933),
417–441, 498–520.
[10] H. Hotelling, Relations between two sets of variates, Biometrika 28
(1936), 321–377.
[11] I. M. Johnstone, On the distribution of the largest eigenvalue in
principal component analysis, Ann. Stat. 29 (2001), 295–327.
[12] I. G. Macdonald, Symmetric Functions and Hall Polynomials, 2nd
ed., Oxford University Press, 1995.
[13] R. J. Muirhead, Aspects of Multivariate Statistical Theory, John Wiley & Sons, Inc., 1982.
[14] A. Soshnikov, A note on universality of the distribution of the
largest eigenvalue in certain sample covariance matrices, J.
Statistical Physics 108 (2002), 1033–1056.
[15] C. A. Tracy and H. Widom, On orthogonal and symplectic matrix
ensembles, Commun. Math. Phys. 177 (1996), 727–754.
[16] F. V. Waugh, Regression between sets of variables, Econometrica
10 (1942), 290–310.
[17] P. Zinn-Justin and J.-B. Zuber, On some integrals over the U(N)
unitary group and their large N limit, J. Phys. A: Math. Gen. 36
(2003), 3173–3193.