The Wishart and Inverse-Wishart Distributions
Pankaj Das
M.Sc. (Agricultural Statistics)
Roll no. – 20394
Introduction
Mathematical background
Wishart distribution
Inverse-Wishart distribution
Relationship of Wishart and Inverse-Wishart distribution
Conclusion
In the modern era of science and information technology, there has been a huge influx of high-dimensional data from fields such as genomics, the environmental sciences, finance and the social sciences.
Making sense of the many complex relationships and multivariate dependencies present in such data, formulating correct models, and developing inferential procedures are major challenges in modern-day statistics.
In parametric models the covariance or correlation matrix (or its
inverse) is the fundamental object that quantifies relationships
between random variables.
Estimating the covariance matrix in a sparse way is crucial in high-dimensional problems and enables the detection of the most important relationships.
Covariance matrices provide the simplest measure of dependency, and therefore much attention has been placed on modelling covariance matrices; the chosen model has a significant impact on statistical inference.
In short, the correlation matrix plays a vital role in multivariate statistics: the correlation/covariance matrix is directly involved in a variety of statistical models, so its estimation is important.
The Wishart and Inverse-Wishart distributions are used in the estimation of covariance matrices in multivariate statistics.
Suppose $X^{(\alpha)} = (x_{1\alpha}, \ldots, x_{p\alpha})^T \sim N_p(\mu, \Sigma)$, $\alpha = 1, \ldots, n$, are independent draws from a $p$-variate normal distribution with mean vector $\mu$ and covariance matrix $\Sigma$. Then the joint density function of $X^{(1)}, \ldots, X^{(n)}$ is given by

$$f(x^{(1)},\ldots,x^{(n)}\mid\mu,\Sigma)
=\prod_{\alpha=1}^{n}\frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}
\exp\Big(-\tfrac{1}{2}(x^{(\alpha)}-\mu)'\Sigma^{-1}(x^{(\alpha)}-\mu)\Big)
=\frac{1}{(2\pi)^{np/2}|\Sigma|^{n/2}}
\exp\Big(-\tfrac{1}{2}\sum_{\alpha=1}^{n}(x^{(\alpha)}-\mu)'\Sigma^{-1}(x^{(\alpha)}-\mu)\Big) \qquad\ldots(1)$$
$X_{p\times n}$ is called the data matrix; its $\alpha$-th column is the observation $x^{(\alpha)}$:

$$X_{p\times n}=\begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1n}\\
x_{21} & x_{22} & \cdots & x_{2n}\\
\vdots & & & \vdots\\
x_{p1} & x_{p2} & \cdots & x_{pn}
\end{pmatrix}$$

The $p\times 1$ mean vector is

$$\bar{x}=\frac{1}{n}\sum_{\alpha=1}^{n}x^{(\alpha)}
=\frac{1}{n}\big(x^{(1)}+x^{(2)}+\cdots+x^{(n)}\big)
=(\bar{x}_1,\bar{x}_2,\ldots,\bar{x}_p)'$$
The variance-covariance (dispersion) matrix, which is positive semi-definite, has the estimator

$$\hat{\Sigma}=\frac{1}{n}\sum_{\alpha=1}^{n}(x^{(\alpha)}-\bar{x})(x^{(\alpha)}-\bar{x})'$$

The sample covariance matrix is

$$S=\frac{1}{n-1}\sum_{j=1}^{n}(x_j-\bar{x})(x_j-\bar{x})'
=\begin{pmatrix}
s_{11} & s_{12} & \cdots & s_{1p}\\
s_{12} & s_{22} & \cdots & s_{2p}\\
\vdots & & & \vdots\\
s_{1p} & s_{2p} & \cdots & s_{pp}
\end{pmatrix}$$

where

$$s_{ik}=\frac{1}{n-1}\sum_{j=1}^{n}(x_{ij}-\bar{x}_i)(x_{kj}-\bar{x}_k)$$
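As a quick numerical illustration (a Python sketch; the slides' own analysis used R), the element-wise formula for $s_{ik}$ agrees with NumPy's built-in `np.cov`, which also divides by n - 1:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n = 3, 50
X = rng.standard_normal((p, n))          # data matrix, variables in rows

xbar = X.mean(axis=1, keepdims=True)
Xc = X - xbar
S_manual = (Xc @ Xc.T) / (n - 1)         # s_ik = sum_j (x_ij - xbar_i)(x_kj - xbar_k)/(n-1)
S_numpy = np.cov(X)                      # np.cov treats rows as variables by default
```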
The sampling distribution of the sample covariance matrix $S$ follows a distribution known as the Wishart distribution, named in honor of John Wishart, who first formulated it in 1928. With $\hat{\Sigma}=\frac{n-1}{n}S$,

$$(n-1)S=\sum_{j=1}^{n}(x_j-\bar{x})(x_j-\bar{x})' \;\sim\; W_p(n-1,\Sigma)$$
The Wishart distribution is a multivariate extension of the Gamma distribution and can be viewed as a multivariate generalization of the χ2 distribution: the χ2 distribution describes the sum of squares of n draws from a univariate normal distribution, whereas the Wishart distribution represents the sum of squares (and cross-products) of n draws from a multivariate normal distribution.
It is a family of probability distributions defined over symmetric,
nonnegative-definite matrix-valued random variables.
The Wishart distribution arises as the distribution of the sample
covariance matrix for a sample from a multivariate normal
distribution.
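This can be illustrated with a small simulation (a hedged Python sketch, not from the original slides): averaging the scatter matrix $(n-1)S$ over many samples should approach the mean of the $W_p(n-1,\Sigma)$ distribution, namely $(n-1)\Sigma$:

```python
import numpy as np

rng = np.random.default_rng(0)
p, n, reps = 3, 10, 20000
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.0, 0.2],
                  [0.3, 0.2, 1.5]])
mu = np.zeros(p)

# reps samples of size n from N_p(mu, Sigma), shape (reps, n, p)
X = rng.multivariate_normal(mu, Sigma, size=(reps, n))
Xc = X - X.mean(axis=1, keepdims=True)          # center each sample at its mean
scatter = np.einsum('rni,rnj->rij', Xc, Xc)     # (n-1)S for each sample
mean_scatter = scatter.mean(axis=0)             # ~ E[W_p(n-1, Sigma)] = (n-1)*Sigma
```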
Let $z_1, z_2, \ldots, z_k$ be $k$ independent random $p$-vectors, each having a $p$-variate normal distribution with mean vector $0_{p\times 1}$ and covariance matrix $\Sigma_{p\times p}$. Let

$$U_{p\times p}=z_1 z_1'+z_2 z_2'+\cdots+z_k z_k'$$

Then $U$ is said to have the $p$-variate Wishart distribution with $k$ degrees of freedom and covariance matrix $\Sigma$, written $U\sim W_p(k,\Sigma)$.
The density of the $p$-variate Wishart distribution is

$$f(u)=\frac{|u|^{(k-p-1)/2}\exp\big(-\tfrac{1}{2}\operatorname{tr}(\Sigma^{-1}u)\big)}{2^{kp/2}\,|\Sigma|^{k/2}\,\Gamma_p(k/2)} \qquad\ldots(2)$$

where $\Gamma_p(\cdot)$ is the multivariate gamma function, i.e.

$$\Gamma_p(k/2)=\pi^{p(p-1)/4}\prod_{j=1}^{p}\Gamma\Big(\frac{k+1-j}{2}\Big) \qquad\ldots(3)$$

This is known as the central Wishart distribution.
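As a sanity check (an illustrative Python sketch; it assumes SciPy's `scipy.stats.wishart`, which uses the same parameterization), equation (2) can be implemented directly and compared against a library density:

```python
import numpy as np
from scipy.stats import wishart
from scipy.special import multigammaln

# Density of W_p(k, Sigma) at u, implementing equation (2) in log space for stability
def wishart_pdf(u, k, Sigma):
    p = Sigma.shape[0]
    _, logdet_u = np.linalg.slogdet(u)
    _, logdet_S = np.linalg.slogdet(Sigma)
    log_num = 0.5 * (k - p - 1) * logdet_u - 0.5 * np.trace(np.linalg.solve(Sigma, u))
    log_den = 0.5 * k * p * np.log(2) + 0.5 * k * logdet_S + multigammaln(k / 2, p)
    return np.exp(log_num - log_den)

Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
u = np.array([[5.0, 1.0], [1.0, 4.0]])
ours = wishart_pdf(u, k=6, Sigma=Sigma)
ref = wishart(df=6, scale=Sigma).pdf(u)
```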
Let $x_\alpha \sim N_p(\mu_\alpha, \Sigma)$, $\alpha = 1, 2, \ldots, k$, be independent and set $X=(x_1,\ldots,x_k)$. Then

$$\sum_{\alpha=1}^{k}x_\alpha x_\alpha' \sim W_p(k,\Sigma,M)$$

where $M$ is the $p\times k$ matrix with columns $\mu_1,\ldots,\mu_k$. This is called the non-central Wishart distribution.
When $M=0$, $W_p$ is called the central Wishart distribution, and we write $W_p(k,\Sigma)$.
It can easily be checked that when $p=1$ and $\Sigma=1$, the Wishart distribution becomes the chi-square distribution with $k$ degrees of freedom. Note that we must have $k>p-1$ for $U$ to be invertible (with probability one); if $k>p-1$ does not hold, $U$ is a singular matrix and its distribution is called the singular Wishart distribution.
The expected value of $S\sim W_p(k,\Sigma)$ is

$$E(S)=k\Sigma \qquad\ldots(4)$$

so the expected value of a Wishart distribution depends on the number of draws one makes from the multivariate normal distribution. In comparison, the expected value of a $\chi^2(k)$ distribution is $k$, so the only differences between a Wishart expectation and a $\chi^2(k)$ expectation are the underlying dimensionality of the data and a scale component.
Let $S \sim W_p(k,\Sigma)$. We can find the individual variances of the elements of $S$; for instance, the variance of the $ij$-th element of $S$ is

$$Var(S_{ij})=k\big(\sigma_{ij}^2+\sigma_{ii}\sigma_{jj}\big) \qquad\ldots(5)$$

where $\sigma_{ij}$ is the $ij$-th element of the matrix $\Sigma$ and can be thought of as the population covariance between variable $i$ and variable $j$. When $p=1$, the only element of the variance/covariance matrix is $\sigma_{11}=1$, and we get $Var(S)=k(1+1\times 1)=2k$, the familiar variance of a $\chi^2(k)$ variable.
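Equation (5) can be verified by Monte-Carlo simulation (an illustrative Python sketch with arbitrarily chosen $k$ and $\Sigma$):

```python
import numpy as np

# Monte-Carlo check of Var(S_ij) = k * (sigma_ij^2 + sigma_ii * sigma_jj), eq. (5)
rng = np.random.default_rng(1)
k, reps = 8, 40000
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
C = np.linalg.cholesky(Sigma)            # CC' = Sigma

Z = rng.standard_normal((reps, k, 2))
X = Z @ C.T                              # each row ~ N_2(0, Sigma)
S = np.einsum('rki,rkj->rij', X, X)      # one W_2(k, Sigma) draw per replication

empirical = S[:, 0, 1].var()
theoretical = k * (Sigma[0, 1] ** 2 + Sigma[0, 0] * Sigma[1, 1])
```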
Equation (5) gives a set of variances rather than depicting the full variance/covariance matrix, because every observation of the Wishart distribution is itself a matrix. Therefore, describing all combinations of variances and covariances of $S$ requires either a higher-order array or a Kronecker operation to represent that higher-order array as a matrix.
Eaton (2007) presents an approach for deriving the covariance matrix of $S$. Write each draw as $x_i = Cz_i$, where $CC'$ is the Cholesky decomposition of the (square, symmetric) matrix $\Sigma$ and $z_i \sim N_p(0, I_p)$, so that $E(z_i z_i')=I_p$. Then the covariance matrix of $S$ can be represented as

$$Cov(S)=Cov\Big(\sum_{i=1}^{k}x_i x_i'\Big)=k\,Cov(x_i x_i')=k\,Cov(Cz_i z_i'C') \qquad\ldots(6)$$
Applying the vector (vec) operator to $S$, which forms a long vector by stacking the columns of $S$, makes $Cov[vec(S)]$ a matrix rather than an array, so we have

$$Cov[vec(S)]=k\,Cov[vec(Czz'C')]=k\,Cov[(C\otimes C)\,vec(zz')] \quad\text{(by the vec-to-Kronecker property)}$$
$$=k\,(C\otimes C)\,Cov[vec(zz')]\,(C\otimes C)'=k\,(C\otimes C)\,Cov(z\otimes z)\,(C\otimes C)' \quad\text{(by vec and Kronecker properties)}$$
To determine $Cov[vec(S)]$ (via $Cov(z\otimes z)$), one only needs to know:
(1) the variance of $z_n^2$ (where $z_n$ is any element of $z$);
(2) the variance of $z_n z_o$ (where $z_n, z_o$ are any two distinct elements of $z$);
(3) the covariance between $z_n z_o$ and $z_o z_n$;
(4) the covariance between $z_n^2$ and $z_o^2$; and
(5) the covariance between $z_i z_j$ and $z_n z_o$ (where at most two of $i, j, n, o$ are the same).
1. $z_n$ is standard normally distributed, so $z_n^2$ follows a $\chi^2(1)$ distribution with variance $2(1)=2$. Therefore $Var(z_n^2)=2$ for all $n$.
2. $z_n$ and $z_o$ are uncorrelated standard normal random variables, which implies that they are also independent. Therefore, due to independence,
$$Var(z_n z_o)=E(z_n^2 z_o^2)-[E(z_n z_o)]^2=E(z_n^2)E(z_o^2)-[E(z_n)E(z_o)]^2=1\cdot 1-0=1$$
(here $z_n^2$ and $z_o^2$ both follow a $\chi^2(1)$ distribution).
3. $z_n z_o$ and $z_o z_n$ are products of uncorrelated standard normal random variables, so
$$Cov(z_n z_o, z_o z_n)=E(z_n z_o z_o z_n)-E(z_n z_o)E(z_o z_n)=E(z_n^2 z_o^2)-0=1$$
4. $$Cov(z_n^2, z_o^2)=E(z_n^2 z_o^2)-E(z_n^2)E(z_o^2)=1-1\cdot 1=0$$
5. $$Cov(z_i z_j, z_n z_o)=0 \quad\text{(when at most two of } i, j, n, o \text{ are the same)}$$
Therefore, the $[p(n-1)+n,\,p(n-1)+n]$ diagonal elements of $Cov(z\otimes z)$ will all be 2, because $Var(z_n^2)=2$, and the remaining diagonal elements will all be 1, because $Var(z_n z_o)=1$ for all $n\neq o$. The off-diagonal elements must be 0 except for those representing the covariance between $z_i z_j$ and $z_j z_i$, which will be 1. Ultimately, $Cov(z\otimes z)$ can be written as

$$Cov(z\otimes z)=I_p\otimes I_p+M_p$$

where $M_p$ is a $p^2\times p^2$ matrix of 1s and 0s (the commutation matrix).
Therefore,

$$Cov[vec(S)]=k\,(C\otimes C)\,Cov(z\otimes z)\,(C\otimes C)'
=k\,(C\otimes C)\,(I_p\otimes I_p+M_p)\,(C\otimes C)'$$
$$=k\big[(C\otimes C)(C'\otimes C')+(C\otimes C)M_p(C'\otimes C')\big]
=k\big[(\Sigma\otimes\Sigma)+(C\otimes C)M_p(C'\otimes C')\big] \qquad\ldots(7)$$
We can check the derivation by simulating draws from a Wishart distribution and comparing the empirical covariance matrix of the simulated draws with the theoretical covariance matrix calculated using equation (7). The analysis was done in the R console. The results show that the number of replications must be very large for the empirical covariance matrix to be close to the theoretical covariance matrix.
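Such a check can be sketched as follows (in Python rather than the R used in the original analysis; the commutation matrix here plays the role of $M_p$):

```python
import numpy as np

rng = np.random.default_rng(2)
p, k, reps = 2, 5, 200000
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
C = np.linalg.cholesky(Sigma)            # CC' = Sigma

# M_p: the p^2 x p^2 commutation matrix, M_p @ vec(A) = vec(A')
Mp = np.zeros((p * p, p * p))
for i in range(p):
    for j in range(p):
        Mp[i + j * p, j + i * p] = 1.0

# Theoretical Cov[vec(S)] from equation (7)
CkC = np.kron(C, C)
theory = k * (np.kron(Sigma, Sigma) + CkC @ Mp @ CkC.T)

# Empirical Cov[vec(S)] from simulated Wishart draws
Z = rng.standard_normal((reps, k, p))
X = Z @ C.T                              # rows ~ N_p(0, Sigma)
S = np.einsum('rki,rkj->rij', X, X)      # one W_p(k, Sigma) draw per replication
vecS = S.transpose(0, 2, 1).reshape(reps, p * p)   # vec: stack columns of each S
empirical = np.cov(vecS, rowvar=False)
```

Even with 200,000 replications the empirical entries only match to about a percent, consistent with the observation above that very many replications are needed.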
Some Theorems on Wishart Distribution
The Wishart distribution has great importance in the estimation of covariance matrices in multivariate statistics.
The Wishart distribution is frequently used as the prior on the
precision matrix parameter of a multivariate normal distribution.
The Wishart distribution arises as a model for random variation and a description of uncertainty about variance and precision matrices. It is of particular interest in sampling and inference on covariance and association structure in multivariate normal models, and in a range of extensions in regression.
The Inverse-Wishart distribution is the multivariate extension of the
Inverse-Gamma distribution.
Even though the Wishart distribution generates sums-of-squares matrices, one can think of the Inverse-Wishart distribution as generating random covariance matrices. However, those covariance matrices are the inverses of the covariance matrices generated under the Wishart distribution.
Let $T \sim InvWish_p(m, \Psi)$, where $\Psi$ denotes a $p\times p$ positive definite scale matrix, $m$ denotes the degrees of freedom, and $p$ indicates the dimension of $T$ (i.e. $T\in\mathbb{R}^{p\times p}$). Then $T$ is positive definite with probability density function

$$f(T)=\frac{|\Psi|^{m/2}}{2^{mp/2}\,\Gamma_p(m/2)\,|T|^{(m+p+1)/2}}\exp\big(-\tfrac{1}{2}\operatorname{tr}(\Psi T^{-1})\big) \qquad\ldots(8)$$
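Equation (8) can be checked against SciPy's `scipy.stats.invwishart`, which uses the same parameterization (an illustrative Python sketch):

```python
import numpy as np
from scipy.stats import invwishart
from scipy.special import multigammaln

# Direct implementation of the Inverse-Wishart density in equation (8)
def invwishart_pdf(T, m, Psi):
    p = Psi.shape[0]
    _, logdet_T = np.linalg.slogdet(T)
    _, logdet_P = np.linalg.slogdet(Psi)
    log_num = 0.5 * m * logdet_P - 0.5 * np.trace(Psi @ np.linalg.inv(T))
    log_den = (0.5 * m * p * np.log(2) + multigammaln(m / 2, p)
               + 0.5 * (m + p + 1) * logdet_T)
    return np.exp(log_num - log_den)

Psi = np.array([[2.0, 0.3], [0.3, 1.0]])
T = np.array([[0.8, 0.1], [0.1, 0.5]])
ours = invwishart_pdf(T, m=7, Psi=Psi)
ref = invwishart(df=7, scale=Psi).pdf(T)
```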
The expected value of $T$ is

$$E(T)=\frac{\Psi}{m-p-1} \qquad\ldots(9)$$

With respect to the $\chi^2$ distribution, the only differences between the Inverse-Wishart expectation and the inverse-$\chi^2$ expectation are the dimensionality of the data and a scale component. The Inverse-Wishart distribution has a finite expectation only when $m > p+1$.
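Equation (9) can be verified by Monte-Carlo simulation (an illustrative Python sketch using `scipy.stats.invwishart`):

```python
import numpy as np
from scipy.stats import invwishart

# Monte-Carlo check of E(T) = Psi / (m - p - 1), equation (9)
rng = np.random.default_rng(3)
p, m, reps = 2, 10, 20000
Psi = np.array([[2.0, 0.3],
                [0.3, 1.0]])

draws = invwishart(df=m, scale=Psi).rvs(size=reps, random_state=rng)  # (reps, p, p)
empirical_mean = draws.mean(axis=0)
theoretical_mean = Psi / (m - p - 1)
```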
The variance of the $ij$-th element of $T$ is

$$Var(T_{ij})=\frac{(m-p+1)\psi_{ij}^2+(m-p-1)\psi_{ii}\psi_{jj}}{(m-p)(m-p-1)^2(m-p-3)}$$

where $\psi_{ij}$ is the $ij$-th element of the matrix $\Psi$. If $p=1$, the only element of the variance/covariance matrix is $\psi_{11}=1$, and the variance expression reduces to

$$Var(X)=\frac{m\cdot 1+(m-2)\cdot 1\cdot 1}{(m-1)(m-2)^2(m-4)}=\frac{2(m-1)}{(m-1)(m-2)^2(m-4)}=\frac{2}{(m-2)^2(m-4)}$$

which is the same as the variance of an Inv-$\chi^2(m)$ variable.
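The variance formula can likewise be checked by simulation (an illustrative Python sketch; $m$ is chosen large enough that the required moments exist):

```python
import numpy as np
from scipy.stats import invwishart

# Monte-Carlo check of the Var(T_ij) formula above
rng = np.random.default_rng(4)
p, m, reps = 2, 14, 40000
Psi = np.array([[2.0, 0.3],
                [0.3, 1.0]])

draws = invwishart(df=m, scale=Psi).rvs(size=reps, random_state=rng)
i, j = 0, 1
empirical = draws[:, i, j].var()
num = (m - p + 1) * Psi[i, j] ** 2 + (m - p - 1) * Psi[i, i] * Psi[j, j]
den = (m - p) * (m - p - 1) ** 2 * (m - p - 3)
theoretical = num / den
```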
The Wishart distribution is related to the normal, chi-square and Gamma distributions, and the Inverse-Wishart distribution is related to those distributions in a similar way. Let

$$S \sim W_p(k,\Sigma)$$

Then

$$S^{-1} \sim InvWish_p(m,\Sigma^{-1})$$

where $m=k$ is the degrees of freedom.
The Inverse-Wishart distribution is frequently used as the prior on the variance/covariance matrix parameter ($\Sigma$) of a multivariate normal distribution. Note that the Inverse-Gamma distribution is the conjugate prior for the variance parameter of a univariate normal distribution, and the Inverse-Wishart distribution (as its multivariate generalization) extends conjugacy to the multivariate normal distribution.
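The conjugate update can be sketched as follows (a minimal Python sketch under standard assumptions: the mean $\mu$ is known, the prior is $\Sigma \sim InvWish_p(m, \Psi)$, and the posterior is then $InvWish_p(m+n,\ \Psi+\sum_i (x_i-\mu)(x_i-\mu)')$; the function name is illustrative):

```python
import numpy as np

# Conjugate Inverse-Wishart update for the covariance of a multivariate normal
# with known mean mu (standard textbook result; minimal sketch).
def invwishart_posterior(m, Psi, X, mu):
    """Return (df, scale) of the posterior Inverse-Wishart given data X (n x p)."""
    Xc = X - mu
    return m + X.shape[0], Psi + Xc.T @ Xc

rng = np.random.default_rng(5)
mu = np.zeros(2)
X = rng.multivariate_normal(mu, np.eye(2), size=30)
m_post, Psi_post = invwishart_posterior(m=5, Psi=np.eye(2), X=X, mu=mu)
```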
The Wishart and Inverse-Wishart distributions are important distributions with good and useful statistical properties. These distributions play an important role in estimating parameters in multivariate studies.
The Wishart distribution helps to develop a framework for Bayesian inference for Gaussian covariance graph models. Hence this work completes the powerful theory that has been developed in the mathematical statistics literature for decomposable models; these models help to draw valid inferences.
The generalized Wishart process (GWP) can be used to model time-varying covariance matrices $\Sigma(t)$. In the future, the GWP could be applied to study how $\Sigma$ depends on covariates such as interest rates.
References
Anderson, T. W. (2003). An Introduction to Multivariate Statistical Analysis. Wiley India (P) Ltd., New Delhi.
Chatfield, C. and Collins, A. J. (1980). Introduction to Multivariate Analysis. Chapman and Hall/CRC, London.
Chib, S. and Greenberg, E. (1998). Bayesian analysis of multivariate probit models. Biometrika, 85, 347-361.
Cook, R. D. (2011). On the mean and variance of the generalized inverse of a singular Wishart matrix. Electronic Journal of Statistics, 5, 146-158.
Eaton, M. L. (2007). Multivariate Statistics: A Vector Space Approach. Wiley India (P) Ltd., New Delhi.
Nydick, S. W. (2012). The Wishart and inverse Wishart distributions. International Journal of Electronics and Communication, 22, 119-139.
Pourahmadi, M., Daniels, M. J. and Park, T. (2006). Simultaneous modelling of the Cholesky decomposition of several covariance matrices. Journal of Multivariate Analysis, 97, 125-135.
Rao, C. R. (1965). Linear Statistical Inference and Its Applications. Wiley India (P) Ltd., New Delhi.
Cholesky decomposition
The Cholesky decomposition of a Hermitian positive-definite matrix $A$ is a decomposition into the product of a lower triangular matrix and its conjugate transpose:

$$A=LL^{*}$$

where $L$ is a lower triangular matrix with real and positive diagonal entries, and $L^{*}$ denotes the conjugate transpose of $L$.
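In NumPy this is available directly (a minimal sketch for a real symmetric positive-definite matrix, where the conjugate transpose reduces to the ordinary transpose):

```python
import numpy as np

# Cholesky decomposition with NumPy: A = L L^T for a real symmetric
# positive-definite A (L lower triangular with positive diagonal)
A = np.array([[4.0, 2.0, 0.6],
              [2.0, 2.0, 0.5],
              [0.6, 0.5, 1.0]])
L = np.linalg.cholesky(A)
```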