Maximum Likelihood Estimation: the Multivariate Normal Distribution


The Method of Maximum Likelihood

Suppose that the data $x_1, \dots, x_n$ has joint density function

$$f(x_1, \dots, x_n; \theta_1, \dots, \theta_p)$$

where $(\theta_1, \dots, \theta_p)$ are unknown parameters assumed to lie in $\Omega$ (a subset of $p$-dimensional space). We want to estimate the parameters $\theta_1, \dots, \theta_p$.

Definition: The Likelihood Function

Suppose that the data $x_1, \dots, x_n$ has joint density function $f(x_1, \dots, x_n; \theta_1, \dots, \theta_p)$. Then, given the data, the likelihood function is defined to be

$$L(\theta_1, \dots, \theta_p) = f(x_1, \dots, x_n; \theta_1, \dots, \theta_p).$$

Note: the domain of $L(\theta_1, \dots, \theta_p)$ is the set $\Omega$.

Definition: Maximum Likelihood Estimators

Suppose that the data $x_1, \dots, x_n$ has joint density function $f(x_1, \dots, x_n; \theta_1, \dots, \theta_p)$, so that the likelihood function is

$$L(\theta_1, \dots, \theta_p) = f(x_1, \dots, x_n; \theta_1, \dots, \theta_p).$$

The maximum likelihood estimators of the parameters $\theta_1, \dots, \theta_p$ are the values $\hat\theta_1, \dots, \hat\theta_p$ that maximize $L$, i.e. such that

$$L(\hat\theta_1, \dots, \hat\theta_p) = \max_{\theta_1, \dots, \theta_p} L(\theta_1, \dots, \theta_p).$$

Note: maximizing $L(\theta_1, \dots, \theta_p)$ is equivalent to maximizing the log-likelihood function

$$l(\theta_1, \dots, \theta_p) = \ln L(\theta_1, \dots, \theta_p).$$
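As a concrete illustration of this note, the following minimal Python sketch (assuming numpy and scipy; the simulated data and starting values are arbitrary) finds the MLEs of a univariate normal's mean and standard deviation by numerically maximizing the log-likelihood, then compares them with the closed-form answers.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=200)   # simulated data

def neg_log_likelihood(theta):
    mu, log_sigma = theta                      # optimize log(sigma) so sigma > 0
    return -np.sum(norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

res = minimize(neg_log_likelihood, x0=np.array([0.0, 0.0]))
mu_hat, sigma_hat = res.x[0], np.exp(res.x[1])

# Closed-form MLEs for the normal: sample mean and the biased (1/n) SD
print(mu_hat, x.mean())          # should agree
print(sigma_hat, x.std(ddof=0))  # should agree
```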

The Multivariate Normal Distribution

Maximum Likelihood Estimation

Let $\vec x_1, \vec x_2, \dots, \vec x_n$ denote a sample (independent) from the $p$-variate normal distribution with mean vector $\vec\mu$ and covariance matrix $\Sigma$.

Note: the $p \times n$ matrix

$$X = [\vec x_1, \vec x_2, \dots, \vec x_n] = \begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & & \vdots \\ x_{p1} & x_{p2} & \cdots & x_{pn} \end{bmatrix}$$

is called the data matrix.

The $np \times 1$ vector

$$\vec x = \begin{bmatrix} \vec x_1 \\ \vec x_2 \\ \vdots \\ \vec x_n \end{bmatrix} = \begin{bmatrix} x_{11} \\ \vdots \\ x_{p1} \\ \vdots \\ x_{1n} \\ \vdots \\ x_{pn} \end{bmatrix}$$

is called the data vector.

The Sample Mean Vector

The $p \times 1$ vector

$$\bar{\vec x} = \frac{1}{n}(\vec x_1 + \vec x_2 + \cdots + \vec x_n) = \begin{bmatrix} \bar x_1 \\ \bar x_2 \\ \vdots \\ \bar x_p \end{bmatrix}$$

is called the sample mean vector. Note that its $i$-th component is

$$\bar x_i = \frac{1}{n}(x_{i1} + x_{i2} + \cdots + x_{in}) = \frac{1}{n}\sum_{j=1}^n x_{ij}.$$

Also, in terms of the data matrix,

$$\bar{\vec x} = \frac{1}{n}\begin{bmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & & \vdots \\ x_{p1} & x_{p2} & \cdots & x_{pn} \end{bmatrix}\begin{bmatrix} 1 \\ 1 \\ \vdots \\ 1 \end{bmatrix} = \frac{1}{n}\,X\vec 1.$$

In terms of the data vector,

$$\bar{\vec x} = \frac{1}{n}\,[\,I_p, I_p, \dots, I_p\,]\begin{bmatrix} \vec x_1 \\ \vec x_2 \\ \vdots \\ \vec x_n \end{bmatrix} = A\vec x, \qquad \text{where } A = \frac{1}{n}\,[\,I_p, I_p, \dots, I_p\,] \text{ is } p \times np.$$
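A quick numerical check of these three representations (a minimal numpy sketch; the random data matrix is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 3, 10
X = rng.normal(size=(p, n))            # data matrix, columns are observations

xbar_direct = X.mean(axis=1)           # componentwise means
xbar_matrix = (X @ np.ones(n)) / n     # (1/n) X 1
A = np.hstack([np.eye(p)] * n) / n     # A = (1/n)[I, ..., I], p x np
xbar_vector = A @ X.T.ravel()          # A applied to the stacked data vector

assert np.allclose(xbar_direct, xbar_matrix)
assert np.allclose(xbar_direct, xbar_vector)
```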

Graphical representation of the sample mean vector

[Figure: the data vectors $\vec x_1, \vec x_2, \dots, \vec x_n$ plotted as points in $p$-dimensional space, with $\bar{\vec x}$ at their center.]

The sample mean vector is the centroid of the data vectors.

The Sample Covariance Matrix

The sample covariance matrix is

$$S = \begin{bmatrix} s_{11} & s_{12} & \cdots & s_{1p} \\ s_{12} & s_{22} & \cdots & s_{2p} \\ \vdots & \vdots & & \vdots \\ s_{1p} & s_{2p} & \cdots & s_{pp} \end{bmatrix}, \qquad \text{where } s_{ik} = \frac{1}{n-1}\sum_{j=1}^n (x_{ij} - \bar x_i)(x_{kj} - \bar x_k).$$

There are several equivalent ways of representing the sample covariance matrix:

$$S = \frac{1}{n-1}\sum_{j=1}^n (\vec x_j - \bar{\vec x})(\vec x_j - \bar{\vec x})'$$

$$= \frac{1}{n-1}\,[\vec x_1 - \bar{\vec x}, \dots, \vec x_n - \bar{\vec x}]\,[\vec x_1 - \bar{\vec x}, \dots, \vec x_n - \bar{\vec x}]'$$

$$= \frac{1}{n-1}\left(X - \bar{\vec x}\,[1, \dots, 1]\right)\left(X - \bar{\vec x}\,[1, \dots, 1]\right)' = \frac{1}{n-1}\left(X - \frac{1}{n}XJ\right)\left(X - \frac{1}{n}XJ\right)'$$

$$= \frac{1}{n-1}\,X\left(I - \frac{1}{n}J\right)\left(I - \frac{1}{n}J\right)'X',$$

where $J = \vec 1\,\vec 1' =$ the $n \times n$ matrix of 1's. Since $I - \frac{1}{n}J$ is symmetric and idempotent,

$$\left(I - \frac{1}{n}J\right)\left(I - \frac{1}{n}J\right)' = I - \frac{1}{n}J,$$

hence

$$S = \frac{1}{n-1}\,X\left(I - \frac{1}{n}J\right)X'.$$
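The centering-matrix identity can be verified numerically (a minimal numpy sketch; np.cov uses the same $\frac{1}{n-1}$ convention):

```python
import numpy as np

rng = np.random.default_rng(2)
p, n = 3, 50
X = rng.normal(size=(p, n))          # data matrix, columns are observations

J = np.ones((n, n))                  # n x n matrix of 1's
C = np.eye(n) - J / n                # centering matrix I - (1/n)J
S = X @ C @ X.T / (n - 1)            # S = X (I - J/n) X' / (n-1)

assert np.allclose(S, np.cov(X))     # same 1/(n-1) convention as np.cov
```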

Maximum Likelihood Estimation

Multivariate Normal Distribution

Let $\vec x_1, \vec x_2, \dots, \vec x_n$ denote a sample (independent) from the $p$-variate normal distribution with mean vector $\vec\mu$ and covariance matrix $\Sigma$.

Then the joint density function of $\vec x_1, \vec x_2, \dots, \vec x_n$ is:

$$f(\vec x_1, \dots, \vec x_n; \vec\mu, \Sigma) = \prod_{i=1}^n \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(\vec x_i - \vec\mu)'\Sigma^{-1}(\vec x_i - \vec\mu)}$$

$$= \frac{1}{(2\pi)^{np/2}|\Sigma|^{n/2}}\, e^{-\frac{1}{2}\sum_{i=1}^n (\vec x_i - \vec\mu)'\Sigma^{-1}(\vec x_i - \vec\mu)}.$$

The likelihood function is:

$$L(\vec\mu, \Sigma) = \frac{1}{(2\pi)^{np/2}|\Sigma|^{n/2}}\, e^{-\frac{1}{2}\sum_{i=1}^n (\vec x_i - \vec\mu)'\Sigma^{-1}(\vec x_i - \vec\mu)}$$

and the log-likelihood function is:

$$l(\vec\mu, \Sigma) = \ln L(\vec\mu, \Sigma) = -\frac{np}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma| - \frac{1}{2}\sum_{i=1}^n (\vec x_i - \vec\mu)'\Sigma^{-1}(\vec x_i - \vec\mu).$$

To find the maximum likelihood estimators of $\vec\mu$ and $\Sigma$, we need to find $\hat{\vec\mu}$ and $\hat\Sigma$ to maximize

$$L(\vec\mu, \Sigma) = \frac{1}{(2\pi)^{np/2}|\Sigma|^{n/2}}\, e^{-\frac{1}{2}\sum_{i=1}^n (\vec x_i - \vec\mu)'\Sigma^{-1}(\vec x_i - \vec\mu)}$$

or, equivalently, to maximize

$$l(\vec\mu, \Sigma) = -\frac{np}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma| - \frac{1}{2}\sum_{i=1}^n (\vec x_i - \vec\mu)'\Sigma^{-1}(\vec x_i - \vec\mu).$$

Note: using the vector derivative $\frac{d}{d\vec\mu}\,(\vec x_i - \vec\mu)'\Sigma^{-1}(\vec x_i - \vec\mu) = -2\,\Sigma^{-1}(\vec x_i - \vec\mu)$, we have

$$\frac{dl(\vec\mu, \Sigma)}{d\vec\mu} = -\frac{1}{2}\,\frac{d}{d\vec\mu}\sum_{i=1}^n (\vec x_i - \vec\mu)'\Sigma^{-1}(\vec x_i - \vec\mu) = \sum_{i=1}^n \Sigma^{-1}(\vec x_i - \vec\mu) = \Sigma^{-1}\!\left(\sum_{i=1}^n \vec x_i - n\vec\mu\right).$$

Setting this equal to $\vec 0$,

$$\Sigma^{-1}\!\left(\sum_{i=1}^n \vec x_i - n\vec\mu\right) = \vec 0,$$

hence

$$\hat{\vec\mu} = \frac{1}{n}\sum_{i=1}^n \vec x_i = \bar{\vec x}.$$

Now

$$l(\vec\mu, \Sigma) = -\frac{np}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma| - \frac{1}{2}\sum_{i=1}^n (\vec x_i - \vec\mu)'\Sigma^{-1}(\vec x_i - \vec\mu)$$

$$= -\frac{np}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma| - \frac{1}{2}\sum_{i=1}^n \operatorname{tr}\!\left[(\vec x_i - \vec\mu)'\Sigma^{-1}(\vec x_i - \vec\mu)\right]$$

(a $1 \times 1$ matrix equals its trace)

$$= -\frac{np}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma| - \frac{1}{2}\sum_{i=1}^n \operatorname{tr}\!\left[\Sigma^{-1}(\vec x_i - \vec\mu)(\vec x_i - \vec\mu)'\right] \qquad \text{using } \operatorname{tr}(AB) = \operatorname{tr}(BA)$$

$$= -\frac{np}{2}\ln(2\pi) - \frac{n}{2}\ln|\Sigma| - \frac{1}{2}\operatorname{tr}\!\left[\Sigma^{-1}\sum_{i=1}^n (\vec x_i - \vec\mu)(\vec x_i - \vec\mu)'\right].$$

Now differentiate $l(\vec\mu, \Sigma)$ with respect to $\Sigma$, using the matrix derivatives $\frac{d}{d\Sigma}\ln|\Sigma| = \Sigma^{-1}$ and $\frac{d}{d\Sigma}\operatorname{tr}(\Sigma^{-1}A) = -\Sigma^{-1}A\,\Sigma^{-1}$:

$$\frac{dl(\vec\mu, \Sigma)}{d\Sigma} = -\frac{n}{2}\,\Sigma^{-1} + \frac{1}{2}\,\Sigma^{-1}\!\left[\sum_{i=1}^n (\vec x_i - \vec\mu)(\vec x_i - \vec\mu)'\right]\Sigma^{-1} = 0_{p \times p}.$$

Solving, and substituting $\hat{\vec\mu} = \bar{\vec x}$ for $\vec\mu$:

$$\hat\Sigma = \frac{1}{n}\sum_{i=1}^n (\vec x_i - \hat{\vec\mu})(\vec x_i - \hat{\vec\mu})' = \frac{1}{n}\sum_{i=1}^n (\vec x_i - \bar{\vec x})(\vec x_i - \bar{\vec x})' = \frac{n-1}{n}\,S.$$

Summary:

The maximum likelihood estimators of $\vec\mu$ and $\Sigma$ are

$$\hat{\vec\mu} = \frac{1}{n}\sum_{i=1}^n \vec x_i = \bar{\vec x} \qquad \text{and} \qquad \hat\Sigma = \frac{1}{n}\sum_{i=1}^n (\vec x_i - \bar{\vec x})(\vec x_i - \bar{\vec x})' = \frac{n-1}{n}\,S.$$
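A minimal numerical illustration of this summary (assuming numpy and scipy; the data are simulated from an arbitrary $N_2(\vec\mu, \Sigma)$): the closed-form MLEs are computed directly, and the log-likelihood at the MLE is checked against perturbed parameter values.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(3)
mu_true = np.array([1.0, -2.0])
Sigma_true = np.array([[2.0, 0.5], [0.5, 1.0]])
X = rng.multivariate_normal(mu_true, Sigma_true, size=500)  # rows are observations

# Closed-form MLEs: sample mean and (n-1)/n * S
mu_hat = X.mean(axis=0)
Sigma_hat = np.cov(X.T, bias=True)    # bias=True gives the 1/n (ML) estimate

def log_lik(mu, Sigma):
    return multivariate_normal(mu, Sigma).logpdf(X).sum()

# The MLE should beat any perturbed parameter values
ll_hat = log_lik(mu_hat, Sigma_hat)
assert ll_hat >= log_lik(mu_hat + 0.1, Sigma_hat)
assert ll_hat >= log_lik(mu_hat, 1.1 * Sigma_hat)
```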

Sampling Distribution of the MLEs

Note:

$$\hat{\vec\mu} = \bar{\vec x} = \frac{1}{n}\sum_{i=1}^n \vec x_i = \frac{1}{n}\,[\,I, \dots, I\,]\begin{bmatrix} \vec x_1 \\ \vdots \\ \vec x_n \end{bmatrix} = A\vec x.$$

The joint density function of $\vec x_1, \vec x_2, \dots, \vec x_n$ is:

$$f(\vec x_1, \dots, \vec x_n; \vec\mu, \Sigma) = \prod_{i=1}^n \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}}\, e^{-\frac{1}{2}(\vec x_i - \vec\mu)'\Sigma^{-1}(\vec x_i - \vec\mu)} = \frac{1}{(2\pi)^{np/2}|\Sigma|^{n/2}}\, e^{-\frac{1}{2}\sum_{i=1}^n (\vec x_i - \vec\mu)'\Sigma^{-1}(\vec x_i - \vec\mu)}.$$

This distribution is $np$-variate normal with mean vector and covariance matrix

$$\vec\mu^* = \begin{bmatrix} \vec\mu \\ \vdots \\ \vec\mu \end{bmatrix}, \qquad \Sigma^* = \begin{bmatrix} \Sigma & & 0 \\ & \ddots & \\ 0 & & \Sigma \end{bmatrix}.$$

Thus the distribution of $\bar{\vec x} = A\vec x$ is $p$-variate normal with mean vector

$$A\vec\mu^* = \frac{1}{n}\,[\,I, \dots, I\,]\begin{bmatrix} \vec\mu \\ \vdots \\ \vec\mu \end{bmatrix} = \frac{1}{n}\,n\vec\mu = \vec\mu$$

and covariance matrix

$$A\Sigma^*A' = \frac{1}{n}\,[\,I, \dots, I\,]\begin{bmatrix} \Sigma & & 0 \\ & \ddots & \\ 0 & & \Sigma \end{bmatrix}\frac{1}{n}\begin{bmatrix} I \\ \vdots \\ I \end{bmatrix} = \frac{1}{n^2}\,n\Sigma = \frac{1}{n}\,\Sigma.$$

Summary

The sampling distribution of $\bar{\vec x}$ is $p$-variate normal with mean vector $\vec\mu$ and covariance matrix $\frac{1}{n}\Sigma$, i.e. $\bar{\vec x} \sim N_p\!\left(\vec\mu, \frac{1}{n}\Sigma\right)$.

The sampling distributions of the sample covariance matrix $S$ and of $\hat\Sigma = \frac{n-1}{n}\,S$ are described by the Wishart distribution, introduced next.
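A Monte Carlo sketch of the first result (arbitrary choices of $\vec\mu$, $\Sigma$, and $n$): the empirical covariance of simulated sample mean vectors should be close to $\frac{1}{n}\Sigma$.

```python
import numpy as np

rng = np.random.default_rng(4)
mu = np.array([0.0, 1.0])
Sigma = np.array([[1.0, 0.3], [0.3, 2.0]])
n, reps = 20, 20000

# Draw `reps` samples of size n and record each sample mean vector
xbars = rng.multivariate_normal(mu, Sigma, size=(reps, n)).mean(axis=1)

print(xbars.mean(axis=0))   # ~ mu
print(np.cov(xbars.T) * n)  # ~ Sigma, i.e. Cov(xbar) ~ Sigma / n
```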

The Wishart Distribution

A multivariate generalization of the $\chi^2$ distribution.

Let $\vec z_1, \vec z_2, \dots, \vec z_k$ be $k$ independent random $p$-vectors, each having a $p$-variate normal distribution with mean vector $\vec 0$ and covariance matrix $\Sigma$ ($p \times p$). Let

$$U = \vec z_1\vec z_1' + \vec z_2\vec z_2' + \cdots + \vec z_k\vec z_k' \qquad (p \times p).$$

Then $U$ is said to have the $p$-variate Wishart distribution with $k$ degrees of freedom:

$$U \sim W_p(k, \Sigma).$$

Definition: the density of the $p$-variate Wishart distribution

Suppose $U \sim W_p(k, \Sigma)$. Then the density of $U$ is:

$$f_U(u) = \frac{|u|^{(k-p-1)/2}\,\exp\!\left[-\tfrac{1}{2}\operatorname{tr}\!\left(\Sigma^{-1}u\right)\right]}{2^{kp/2}\,|\Sigma|^{k/2}\,\Gamma_p\!\left(\tfrac{k}{2}\right)}$$

where $\Gamma_p(\cdot)$ is the multivariate gamma function, i.e.

$$\Gamma_p\!\left(\tfrac{k}{2}\right) = \pi^{p(p-1)/4}\prod_{j=1}^p \Gamma\!\left(\tfrac{k+1-j}{2}\right).$$

It can easily be checked that when $p = 1$ and $\Sigma = 1$, the Wishart distribution becomes the $\chi^2$ distribution with $k$ degrees of freedom: $U = \sum_{i=1}^k z_i^2 \sim \chi^2_k$.
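A quick simulation of this $p = 1$, $\Sigma = 1$ case (a minimal sketch using numpy and scipy):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(5)
k, reps = 7, 100000

# W_1(k, 1): sum of squares of k independent standard normals
u = (rng.normal(size=(reps, k)) ** 2).sum(axis=1)

# Compare simulated moments with chi-square(k) theory
print(u.mean(), chi2.mean(k))  # both ~ k
print(u.var(), chi2.var(k))    # both ~ 2k
```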

Theorem

Suppose $U \sim W_p(k, \Sigma)$ and let $C$ denote a $q \times p$ matrix of rank $q \le p$. Then

$$V = CUC' \sim W_q(k, C\Sigma C').$$

Corollary 1: $v = \vec a'U\vec a \sim W_1(k, \vec a'\Sigma\vec a) = \sigma_a^2\,\chi^2_k$, where $\sigma_a^2 = \vec a'\Sigma\vec a$.

Corollary 2: If $u_{ii}$ is the $i$-th diagonal element of $U$, then $u_{ii} \sim \sigma_{ii}\,\chi^2_k$.

Proof: set $\vec a = \vec e_i = [0, \dots, 0, 1, 0, \dots, 0]'$ (1 in the $i$-th position).

Theorem

Suppose $U_1 \sim W_p(k_1, \Sigma)$ and $U_2 \sim W_p(k_2, \Sigma)$ are independent; then

$$V = U_1 + U_2 \sim W_p(k_1 + k_2, \Sigma).$$

Theorem

Suppose $U_1 \sim W_p(k_1, \Sigma)$ and $U_2$ are independent, and $V = U_1 + U_2 \sim W_p(k, \Sigma)$ with $k > k_1$; then

$$U_2 \sim W_p(k - k_1, \Sigma).$$

Theorem

Let $\vec x_1, \vec x_2, \dots, \vec x_n$ be a sample from $N_p(\vec\mu, \Sigma)$; then

$$U = \sum_{i=1}^n (\vec x_i - \vec\mu)(\vec x_i - \vec\mu)' \sim W_p(n, \Sigma).$$

Theorem

Let $\vec x_1, \vec x_2, \dots, \vec x_n$ be a sample from $N_p(\vec\mu, \Sigma)$; then

$$U = \sum_{i=1}^n (\vec x_i - \bar{\vec x})(\vec x_i - \bar{\vec x})' = (n-1)S \sim W_p(n-1, \Sigma).$$

Proof

Write $\vec x_i - \vec\mu = (\vec x_i - \bar{\vec x}) + (\bar{\vec x} - \vec\mu)$. Then

$$\sum_{i=1}^n (\vec x_i - \vec\mu)(\vec x_i - \vec\mu)' = \sum_{i=1}^n (\vec x_i - \bar{\vec x})(\vec x_i - \bar{\vec x})' + n(\bar{\vec x} - \vec\mu)(\bar{\vec x} - \vec\mu)'$$

since the cross terms vanish ($\sum_{i=1}^n (\vec x_i - \bar{\vec x}) = \vec 0$). That is,

$$U = \sum_{i=1}^n (\vec x_i - \bar{\vec x})(\vec x_i - \bar{\vec x})' = \sum_{i=1}^n (\vec x_i - \vec\mu)(\vec x_i - \vec\mu)' - n(\bar{\vec x} - \vec\mu)(\bar{\vec x} - \vec\mu)',$$

etc.

Theorem

Let $\vec x_1, \vec x_2, \dots, \vec x_n$ be a sample from $N_p(\vec\mu, \Sigma)$; then $\bar{\vec x}$ is independent of

$$U = \sum_{i=1}^n (\vec x_i - \bar{\vec x})(\vec x_i - \bar{\vec x})' = (n-1)S.$$

Proof

Let

$$H = \begin{bmatrix} \tfrac{1}{\sqrt n} & \tfrac{1}{\sqrt n} & \cdots & \tfrac{1}{\sqrt n} \\ h_{21} & h_{22} & \cdots & h_{2n} \\ \vdots & \vdots & & \vdots \\ h_{n1} & h_{n2} & \cdots & h_{nn} \end{bmatrix}$$

be orthogonal, so that $H'H = HH' = I$.

Let

$$H^* = H \otimes I = \begin{bmatrix} \tfrac{1}{\sqrt n}I & \tfrac{1}{\sqrt n}I & \cdots & \tfrac{1}{\sqrt n}I \\ h_{21}I & h_{22}I & \cdots & h_{2n}I \\ \vdots & \vdots & & \vdots \\ h_{n1}I & h_{n2}I & \cdots & h_{nn}I \end{bmatrix} \qquad (np \times np).$$

Note: $H^*$, the Kronecker product of $H$ and $I$, is also orthogonal. In general, for $A$ ($m \times n$) and $B$, the Kronecker product is

$$A \otimes B = \begin{bmatrix} a_{11}B & \cdots & a_{1n}B \\ \vdots & & \vdots \\ a_{m1}B & \cdots & a_{mn}B \end{bmatrix}.$$

Properties of the Kronecker product:

1. $(A \otimes B)(C \otimes D) = AC \otimes BD$
2. $(A \otimes B)' = A' \otimes B'$
3. $(A \otimes B)^{-1} = A^{-1} \otimes B^{-1}$
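These three properties, and the orthogonality of $H^* = H \otimes I$, can be checked numerically with np.kron (a minimal sketch; the random matrices are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(6)
A, B = rng.normal(size=(2, 2)), rng.normal(size=(3, 3))
C, D = rng.normal(size=(2, 2)), rng.normal(size=(3, 3))

# Property 1: (A ⊗ B)(C ⊗ D) = AC ⊗ BD
assert np.allclose(np.kron(A, B) @ np.kron(C, D), np.kron(A @ C, B @ D))
# Property 2: (A ⊗ B)' = A' ⊗ B'
assert np.allclose(np.kron(A, B).T, np.kron(A.T, B.T))
# Property 3: (A ⊗ B)^{-1} = A^{-1} ⊗ B^{-1}
assert np.allclose(np.linalg.inv(np.kron(A, B)),
                   np.kron(np.linalg.inv(A), np.linalg.inv(B)))

# An orthogonal H gives an orthogonal H* = H ⊗ I
H, _ = np.linalg.qr(rng.normal(size=(4, 4)))
Hstar = np.kron(H, np.eye(3))
assert np.allclose(Hstar @ Hstar.T, np.eye(12))
```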

Now let

$$\vec u = \begin{bmatrix} \vec u_1 \\ \vec u_2 \\ \vdots \\ \vec u_n \end{bmatrix} = H^*\vec x = \begin{bmatrix} \tfrac{1}{\sqrt n}I & \tfrac{1}{\sqrt n}I & \cdots & \tfrac{1}{\sqrt n}I \\ h_{21}I & h_{22}I & \cdots & h_{2n}I \\ \vdots & \vdots & & \vdots \\ h_{n1}I & h_{n2}I & \cdots & h_{nn}I \end{bmatrix}\begin{bmatrix} \vec x_1 \\ \vec x_2 \\ \vdots \\ \vec x_n \end{bmatrix}$$

so that

$$\vec u_1 = \frac{1}{\sqrt n}\sum_{i=1}^n \vec x_i = \sqrt n\,\bar{\vec x} \qquad \text{and} \qquad \vec u_i = \sum_{j=1}^n h_{ij}\vec x_j \quad \text{for } i = 2, 3, \dots, n.$$

Note: since $H^*$ is orthogonal,

$$\sum_{i=1}^n \vec u_i\vec u_i' = \sum_{i=1}^n \vec x_i\vec x_i',$$

so

$$\sum_{i=2}^n \vec u_i\vec u_i' = \sum_{i=1}^n \vec u_i\vec u_i' - \vec u_1\vec u_1' = \sum_{i=1}^n \vec x_i\vec x_i' - n\bar{\vec x}\bar{\vec x}' = (n-1)S.$$
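A numerical check of this identity, using scipy.linalg.helmert to build an orthogonal $H$ whose first row is $(\tfrac{1}{\sqrt n}, \dots, \tfrac{1}{\sqrt n})$ (a minimal sketch with arbitrary data):

```python
import numpy as np
from scipy.linalg import helmert

rng = np.random.default_rng(7)
p, n = 3, 6
X = rng.normal(size=(p, n))            # columns are observations x_1, ..., x_n
xbar = X.mean(axis=1, keepdims=True)

H = helmert(n, full=True)              # orthogonal, first row = 1/sqrt(n)
assert np.allclose(H @ H.T, np.eye(n))

U = X @ H.T                            # column i is u_i = sum_j h_ij x_j
assert np.allclose(U[:, :1], np.sqrt(n) * xbar)          # u_1 = sqrt(n) xbar
S = np.cov(X)
assert np.allclose(U[:, 1:] @ U[:, 1:].T, (n - 1) * S)   # sum_{i>=2} u_i u_i' = (n-1)S
```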

The distribution of the data vector $\vec x$ is $np$-variate normal with mean vector and covariance matrix

$$\vec\mu^* = \begin{bmatrix} \vec\mu \\ \vdots \\ \vec\mu \end{bmatrix} = \vec 1 \otimes \vec\mu, \qquad \Sigma^* = \begin{bmatrix} \Sigma & & 0 \\ & \ddots & \\ 0 & & \Sigma \end{bmatrix} = I \otimes \Sigma.$$

Thus the joint distribution of $\vec u = H^*\vec x = (H \otimes I)\vec x$ is $np$-variate normal with mean vector

$$\vec\mu_u^* = (H \otimes I)\vec\mu^* = (H \otimes I)(\vec 1 \otimes \vec\mu) = H\vec 1 \otimes \vec\mu = \begin{bmatrix} \sqrt n\,\vec\mu \\ \vec 0 \\ \vdots \\ \vec 0 \end{bmatrix}$$

(the rows of $H$ below the first are orthogonal to $\vec 1$) and covariance matrix

$$\Sigma_u^* = (H \otimes I)(I \otimes \Sigma)(H \otimes I)' = HH' \otimes \Sigma = I \otimes \Sigma = \begin{bmatrix} \Sigma & & 0 \\ & \ddots & \\ 0 & & \Sigma \end{bmatrix}.$$

Hence $\vec u_1, \vec u_2, \dots, \vec u_n$ are independent, so $\bar{\vec x} = \frac{1}{\sqrt n}\vec u_1$ is independent of $(n-1)S = \sum_{i=2}^n \vec u_i\vec u_i'$. Moreover, $\vec u_2, \dots, \vec u_n$ each have the $N_p(\vec 0, \Sigma)$ distribution, so

$$U = \sum_{i=1}^n (\vec x_i - \bar{\vec x})(\vec x_i - \bar{\vec x})' = \sum_{i=2}^n \vec u_i\vec u_i' = (n-1)S \sim W_p(n-1, \Sigma).$$

Summary: Sampling distributions of the MLEs for the multivariate normal distribution

Let $\vec x_1, \vec x_2, \dots, \vec x_n$ be a sample from $N_p(\vec\mu, \Sigma)$; then

$$\bar{\vec x} \sim N_p\!\left(\vec\mu, \tfrac{1}{n}\Sigma\right)$$

and

$$U = \sum_{i=1}^n (\vec x_i - \bar{\vec x})(\vec x_i - \bar{\vec x})' = (n-1)S = n\hat\Sigma \sim W_p(n-1, \Sigma),$$

with $\bar{\vec x}$ and $S$ independent.

Also, $u_{ii} = (n-1)s_{ii} \sim \sigma_{ii}\,\chi^2_{n-1}$.
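A Monte Carlo sketch of the last fact, $(n-1)s_{ii}/\sigma_{ii} \sim \chi^2_{n-1}$ (arbitrary choices of $\Sigma$ and $n$):

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(8)
mu = np.zeros(2)
Sigma = np.array([[2.0, 0.6], [0.6, 1.0]])
n, reps = 10, 50000

# For each replicate, compute (n-1) * s_11 / sigma_11
samples = rng.multivariate_normal(mu, Sigma, size=(reps, n))
s11 = samples[:, :, 0].var(axis=1, ddof=1)
stat = (n - 1) * s11 / Sigma[0, 0]

print(stat.mean(), chi2.mean(n - 1))  # both ~ n-1
print(stat.var(), chi2.var(n - 1))    # both ~ 2(n-1)
```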
