
The Multivariate Gaussian

Jian Zhao

Outline

What is the multivariate Gaussian?
Parameterizations
Mathematical preparation
Joint distributions, marginalization and conditioning
Maximum likelihood estimation

What is the multivariate Gaussian?

p(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} (x - \mu)^T \Sigma^{-1} (x - \mu) \right\}

where x is an n×1 vector and Σ is an n×n symmetric, positive definite matrix. For example, in the case n = 2,

x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{21} & \sigma_2^2 \end{pmatrix}
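As a quick sanity check, here is a minimal Python sketch (the helper name gaussian_pdf and the example values of μ and Σ are arbitrary choices for illustration) that evaluates this formula directly and compares it against scipy:

```python
import numpy as np
from scipy.stats import multivariate_normal

def gaussian_pdf(x, mu, Sigma):
    """Evaluate the multivariate Gaussian density at x from the formula above."""
    n = len(mu)
    diff = x - mu
    quad = diff @ np.linalg.solve(Sigma, diff)      # (x - mu)^T Sigma^{-1} (x - mu)
    norm = (2 * np.pi) ** (n / 2) * np.sqrt(np.linalg.det(Sigma))
    return np.exp(-0.5 * quad) / norm

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
x = np.array([0.5, -1.0])

print(gaussian_pdf(x, mu, Sigma))
print(multivariate_normal(mu, Sigma).pdf(x))        # the two values agree
```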

Geometrical interpretation

Excluding the normalization factor and the exponential function, we look at the quadratic form

(x - \mu)^T \Sigma^{-1} (x - \mu)

If we set it to a constant value c and consider the case n = 2 (writing a_{ij} for the entries of \Sigma^{-1} and taking \mu = 0 for simplicity), we obtain

a_{11} x_1^2 + 2 a_{12} x_1 x_2 + a_{22} x_2^2 = c

Geometrical interpretation (continued)

This is an ellipse in the coordinates x_1 and x_2. Thus we can easily imagine that as n increases, the ellipse becomes a higher-dimensional ellipsoid.

Geometrical interpretation (continued)

Thus we get a Gaussian bump in n + 1 dimensions, where the level surfaces of the bump are ellipsoids oriented along the eigenvectors of Σ.
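This alignment can be checked numerically. Below is a minimal sketch (the example Σ is an arbitrary choice): for a unit eigenvector v of Σ with eigenvalue λ, we have v^T Σ^{-1} v = 1/λ, so the point √(cλ) v lies exactly on the level surface, i.e. the semi-axes of the ellipsoid point along the eigenvectors with lengths proportional to √λ.

```python
import numpy as np

Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
eigvals, eigvecs = np.linalg.eigh(Sigma)    # unit-norm eigenvectors in columns

c = 1.0
for lam, v in zip(eigvals, eigvecs.T):
    x = np.sqrt(c * lam) * v                # step along an eigenvector
    # x^T Sigma^{-1} x evaluates to c, so x sits on the level surface
    print(x @ np.linalg.solve(Sigma, x))
```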

Parameterization

The moment parameterization uses the mean and the covariance:

\mu = E(x), \qquad \Sigma = E\left[ (x - \mu)(x - \mu)^T \right]

Another parameterization, the canonical one, puts the density into exponential-family form:

p(x \mid \eta, \Lambda) = \exp\left\{ a + \eta^T x - \frac{1}{2} x^T \Lambda x \right\}, \qquad a = -\frac{1}{2} \left( n \log(2\pi) - \log|\Lambda| + \eta^T \Lambda^{-1} \eta \right)

where \Lambda = \Sigma^{-1} and \eta = \Sigma^{-1} \mu.
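The two parameterizations describe the same density, which can be verified numerically. A minimal sketch (arbitrary example parameters):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])

Lam = np.linalg.inv(Sigma)                  # Lambda = Sigma^{-1}
eta = Lam @ mu                              # eta = Sigma^{-1} mu
n = len(mu)
a = -0.5 * (n * np.log(2 * np.pi) - np.log(np.linalg.det(Lam))
            + eta @ np.linalg.solve(Lam, eta))

x = np.array([0.5, -1.0])
canonical = np.exp(a + eta @ x - 0.5 * x @ Lam @ x)
moment = multivariate_normal(mu, Sigma).pdf(x)
print(canonical, moment)                    # the two values agree
```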

Mathematical preparation

In order to derive the marginalization and conditioning of the partitioned multivariate Gaussian distribution, we need the theory of block diagonalizing a partitioned matrix.

In order to carry out maximum likelihood estimation, we need some facts about the trace of a matrix.

Partitioned matrices

Consider a general partitioned matrix

M = \begin{pmatrix} E & F \\ G & H \end{pmatrix}

To zero out the upper-right-hand and lower-left-hand corners of M, we can premultiply and postmultiply by triangular matrices of the following form:

\begin{pmatrix} I & -F H^{-1} \\ 0 & I \end{pmatrix} \begin{pmatrix} E & F \\ G & H \end{pmatrix} \begin{pmatrix} I & 0 \\ -H^{-1} G & I \end{pmatrix} = \begin{pmatrix} E - F H^{-1} G & 0 \\ 0 & H \end{pmatrix}

Partitioned matrices (continued)

Define the Schur complement of the matrix M with respect to H, denoted M/H, as the term

M/H = E - F H^{-1} G

Since (XYZ)^{-1} = Z^{-1} Y^{-1} X^{-1}, inverting the block diagonalization above gives

\begin{pmatrix} E & F \\ G & H \end{pmatrix}^{-1} = \begin{pmatrix} I & 0 \\ -H^{-1} G & I \end{pmatrix} \begin{pmatrix} (M/H)^{-1} & 0 \\ 0 & H^{-1} \end{pmatrix} \begin{pmatrix} I & -F H^{-1} \\ 0 & I \end{pmatrix}

= \begin{pmatrix} (M/H)^{-1} & -(M/H)^{-1} F H^{-1} \\ -H^{-1} G (M/H)^{-1} & H^{-1} + H^{-1} G (M/H)^{-1} F H^{-1} \end{pmatrix}
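A minimal numeric sketch checking this block-inverse formula against a direct inverse (random blocks shifted to be well conditioned; the seed and block sizes are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
p, q = 2, 3
M = rng.standard_normal((p + q, p + q)) + 5 * np.eye(p + q)
E, F = M[:p, :p], M[:p, p:]
G, H = M[p:, :p], M[p:, p:]

Hinv = np.linalg.inv(H)
Sinv = np.linalg.inv(E - F @ Hinv @ G)      # (M/H)^{-1}

Minv_blocks = np.block([
    [Sinv,             -Sinv @ F @ Hinv],
    [-Hinv @ G @ Sinv, Hinv + Hinv @ G @ Sinv @ F @ Hinv],
])
print(np.allclose(Minv_blocks, np.linalg.inv(M)))   # True
```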

Partitioned matrices (continued)

Note that we could alternatively have decomposed the matrix M in terms of E and M/E = H - G E^{-1} F, yielding

\begin{pmatrix} I & 0 \\ -G E^{-1} & I \end{pmatrix} \begin{pmatrix} E & F \\ G & H \end{pmatrix} \begin{pmatrix} I & -E^{-1} F \\ 0 & I \end{pmatrix} = \begin{pmatrix} E & 0 \\ 0 & H - G E^{-1} F \end{pmatrix}

and hence the following expression for the inverse:

\begin{pmatrix} E & F \\ G & H \end{pmatrix}^{-1} = \begin{pmatrix} I & -E^{-1} F \\ 0 & I \end{pmatrix} \begin{pmatrix} E^{-1} & 0 \\ 0 & (M/E)^{-1} \end{pmatrix} \begin{pmatrix} I & 0 \\ -G E^{-1} & I \end{pmatrix}

= \begin{pmatrix} E^{-1} + E^{-1} F (M/E)^{-1} G E^{-1} & -E^{-1} F (M/E)^{-1} \\ -(M/E)^{-1} G E^{-1} & (M/E)^{-1} \end{pmatrix}

Partitioned matrices (continued)

Thus, equating the corresponding blocks of the two expressions for M^{-1}, we get the matrix inversion lemma:

(E - F H^{-1} G)^{-1} = E^{-1} + E^{-1} F (H - G E^{-1} F)^{-1} G E^{-1}

(E - F H^{-1} G)^{-1} F H^{-1} = E^{-1} F (H - G E^{-1} F)^{-1}

At the same time we get the determinant identity

|M| = |M/H| \, |H|
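Both identities are easy to check numerically. A short sketch (again with arbitrary, well-conditioned random blocks):

```python
import numpy as np

rng = np.random.default_rng(1)
p, q = 3, 2
E = rng.standard_normal((p, p)) + 5 * np.eye(p)
H = rng.standard_normal((q, q)) + 5 * np.eye(q)
F = rng.standard_normal((p, q))
G = rng.standard_normal((q, p))

Einv, Hinv = np.linalg.inv(E), np.linalg.inv(H)
lhs = np.linalg.inv(E - F @ Hinv @ G)
rhs = Einv + Einv @ F @ np.linalg.inv(H - G @ Einv @ F) @ G @ Einv
print(np.allclose(lhs, rhs))                # matrix inversion lemma holds

M = np.block([[E, F], [G, H]])
print(np.isclose(np.linalg.det(M),
                 np.linalg.det(E - F @ Hinv @ G) * np.linalg.det(H)))
```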

Theory of traces

Define

tr[A] = \sum_i a_{ii}

It has the following properties:

tr[ABC] = tr[CAB] = tr[BCA]

x^T A x = tr[x^T A x] = tr[x x^T A]

Theory of traces (continued)

\frac{\partial}{\partial a_{ij}} tr[AB] = \frac{\partial}{\partial a_{ij}} \sum_k \sum_l a_{kl} b_{lk} = b_{ji}

so

\frac{\partial}{\partial A} tr[BA] = B^T

and therefore

\frac{\partial}{\partial A} x^T A x = \frac{\partial}{\partial A} tr[x x^T A] = (x x^T)^T = x x^T
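A minimal finite-difference sketch confirming ∂ tr[BA]/∂A = B^T (random matrices with an arbitrary seed; since the trace is linear in A, central differences recover the gradient essentially exactly):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3
A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
eps = 1e-6

grad = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        dA = np.zeros((n, n))
        dA[i, j] = eps
        grad[i, j] = (np.trace(B @ (A + dA)) - np.trace(B @ (A - dA))) / (2 * eps)

print(np.allclose(grad, B.T))               # True: the gradient is B^T
```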

Theory of traces (continued)

We want to show that

\frac{\partial}{\partial A} \log|A| = (A^{-1})^T

Since

\frac{\partial}{\partial a_{ij}} \log|A| = \frac{1}{|A|} \frac{\partial |A|}{\partial a_{ij}},

this is equivalent to proving

\frac{\partial |A|}{\partial a_{ij}} = |A| \, (A^{-1})_{ji}.

Noting the cofactor expansion of the determinant along row i,

|A| = \sum_j (-1)^{i+j} a_{ij} M_{ij},

where M_{ij} is the (i, j) minor, we get \frac{\partial |A|}{\partial a_{ij}} = (-1)^{i+j} M_{ij}, the (i, j) cofactor. Recalling that A^{-1} is the transposed cofactor matrix divided by |A|, this cofactor equals |A| (A^{-1})_{ji}, which completes the proof.
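The same finite-difference sketch works for the log-determinant gradient (A is shifted by a multiple of the identity, an arbitrary choice that keeps the determinant positive):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 3
A = rng.standard_normal((n, n)) + 5 * np.eye(n)   # keep |A| > 0
eps = 1e-6

grad = np.zeros((n, n))
for i in range(n):
    for j in range(n):
        dA = np.zeros((n, n))
        dA[i, j] = eps
        grad[i, j] = (np.log(np.linalg.det(A + dA))
                      - np.log(np.linalg.det(A - dA))) / (2 * eps)

print(np.allclose(grad, np.linalg.inv(A).T))      # True: the gradient is A^{-T}
```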

Joint distributions, marginalization and conditioning

We partition the n×1 vector x into a p×1 block x_1 and a q×1 block x_2, where n = p + q, and partition \mu and \Sigma conformably:

x = \begin{pmatrix} x_1 \\ x_2 \end{pmatrix}, \qquad \mu = \begin{pmatrix} \mu_1 \\ \mu_2 \end{pmatrix}, \qquad \Sigma = \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}

p(x \mid \mu, \Sigma) = \frac{1}{(2\pi)^{(p+q)/2} |\Sigma|^{1/2}} \exp\left\{ -\frac{1}{2} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix}^T \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}^{-1} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix} \right\}

Marginalization and conditioning

Applying the block diagonalization of \Sigma^{-1} to the exponent:

\exp\left\{ -\frac{1}{2} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix}^T \begin{pmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{pmatrix}^{-1} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix} \right\}

= \exp\left\{ -\frac{1}{2} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix}^T \begin{pmatrix} I & 0 \\ -\Sigma_{22}^{-1} \Sigma_{21} & I \end{pmatrix} \begin{pmatrix} (\Sigma/\Sigma_{22})^{-1} & 0 \\ 0 & \Sigma_{22}^{-1} \end{pmatrix} \begin{pmatrix} I & -\Sigma_{12} \Sigma_{22}^{-1} \\ 0 & I \end{pmatrix} \begin{pmatrix} x_1 - \mu_1 \\ x_2 - \mu_2 \end{pmatrix} \right\}

= \exp\left\{ -\frac{1}{2} \left( x_1 - \mu_1 - \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2) \right)^T (\Sigma/\Sigma_{22})^{-1} \left( x_1 - \mu_1 - \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2) \right) \right\} \cdot \exp\left\{ -\frac{1}{2} (x_2 - \mu_2)^T \Sigma_{22}^{-1} (x_2 - \mu_2) \right\}

Normalization factor

Using |\Sigma| = |\Sigma/\Sigma_{22}| \, |\Sigma_{22}|, the normalization factor splits the same way:

\frac{1}{(2\pi)^{(p+q)/2} |\Sigma|^{1/2}} = \frac{1}{(2\pi)^{(p+q)/2} \left( |\Sigma/\Sigma_{22}| \, |\Sigma_{22}| \right)^{1/2}} = \frac{1}{(2\pi)^{p/2} |\Sigma/\Sigma_{22}|^{1/2}} \cdot \frac{1}{(2\pi)^{q/2} |\Sigma_{22}|^{1/2}}

Marginalization and conditioning

Thus

p(x_2) = \frac{1}{(2\pi)^{q/2} |\Sigma_{22}|^{1/2}} \exp\left\{ -\frac{1}{2} (x_2 - \mu_2)^T \Sigma_{22}^{-1} (x_2 - \mu_2) \right\}

p(x_1 \mid x_2) = \frac{1}{(2\pi)^{p/2} |\Sigma/\Sigma_{22}|^{1/2}} \exp\left\{ -\frac{1}{2} \left( x_1 - \mu_1 - \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2) \right)^T (\Sigma/\Sigma_{22})^{-1} \left( x_1 - \mu_1 - \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2) \right) \right\}
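These two densities factor the joint, p(x) = p(x_1 | x_2) p(x_2), which can be verified numerically. A minimal sketch (an arbitrary 3-dimensional example with p = 1 and q = 2):

```python
import numpy as np
from scipy.stats import multivariate_normal

mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.5, 0.2],
                  [0.3, 0.2, 1.0]])
p = 1                                        # x1: first coordinate; x2: the rest

S11, S12 = Sigma[:p, :p], Sigma[:p, p:]
S21, S22 = Sigma[p:, :p], Sigma[p:, p:]

x = np.array([0.7, 0.2, -0.4])
x1, x2 = x[:p], x[p:]

# Conditional p(x1 | x2): shifted mean and Schur complement covariance
mu_c = mu[:p] + S12 @ np.linalg.solve(S22, x2 - mu[p:])
Sigma_c = S11 - S12 @ np.linalg.solve(S22, S21)

joint = multivariate_normal(mu, Sigma).pdf(x)
factored = (multivariate_normal(mu_c, Sigma_c).pdf(x1)
            * multivariate_normal(mu[p:], S22).pdf(x2))
print(joint, factored)                       # the two values agree
```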

Marginalization & Conditioning

In the moment parameterization:

Marginalization:

\mu_2^m = \mu_2, \qquad \Sigma_2^m = \Sigma_{22}

Conditioning:

\mu_{1|2}^c = \mu_1 + \Sigma_{12} \Sigma_{22}^{-1} (x_2 - \mu_2), \qquad \Sigma_{1|2}^c = \Sigma_{11} - \Sigma_{12} \Sigma_{22}^{-1} \Sigma_{21}

In another form, using the canonical parameterization:

Marginalization:

\eta_2^m = \eta_2 - \Lambda_{21} \Lambda_{11}^{-1} \eta_1, \qquad \Lambda_2^m = \Lambda_{22} - \Lambda_{21} \Lambda_{11}^{-1} \Lambda_{12}

Conditioning:

\eta_{1|2}^c = \eta_1 - \Lambda_{12} x_2, \qquad \Lambda_{1|2}^c = \Lambda_{11}
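A sketch cross-checking the two forms of the conditioning rule: converting the canonical-form answer back to moment form should reproduce the moment-form answer (the example parameters are arbitrary):

```python
import numpy as np

mu = np.array([0.0, 1.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.5, 0.2],
                  [0.3, 0.2, 1.0]])
p = 1
x2 = np.array([0.2, -0.4])

# Moment form
S12, S21, S22 = Sigma[:p, p:], Sigma[p:, :p], Sigma[p:, p:]
mu_c = mu[:p] + S12 @ np.linalg.solve(S22, x2 - mu[p:])
Sigma_c = Sigma[:p, :p] - S12 @ np.linalg.solve(S22, S21)

# Canonical form: eta_c = eta_1 - Lam_12 x_2, Lam_c = Lam_11
Lam = np.linalg.inv(Sigma)
eta = Lam @ mu
eta_c = eta[:p] - Lam[:p, p:] @ x2
Lam_c = Lam[:p, :p]

# Convert back to moment form and compare
print(np.allclose(np.linalg.inv(Lam_c), Sigma_c))        # True
print(np.allclose(np.linalg.solve(Lam_c, eta_c), mu_c))  # True
```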

Maximum likelihood estimation

Given an i.i.d. data set D = {x_1, ..., x_N}, the log likelihood is, up to an additive constant,

l(\mu, \Sigma \mid D) = -\frac{N}{2} \log|\Sigma| - \frac{1}{2} \sum_{i=1}^{N} (x_i - \mu)^T \Sigma^{-1} (x_i - \mu)

Calculating μ

Taking the derivative with respect to \mu:

\frac{\partial l}{\partial \mu} = \sum_{i=1}^{N} \Sigma^{-1} (x_i - \mu)

Setting it to zero:

\hat{\mu}_{ML} = \frac{1}{N} \sum_{i=1}^{N} x_i
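A minimal sketch on simulated data (arbitrary true parameters and seed) checking that this gradient vanishes at the sample mean:

```python
import numpy as np

rng = np.random.default_rng(4)
mu_true = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.5],
                  [0.5, 1.0]])
X = rng.multivariate_normal(mu_true, Sigma, size=500)    # N x n data matrix

mu_hat = X.mean(axis=0)                                  # sample mean
# sum_i Sigma^{-1} (x_i - mu_hat): the gradient evaluated at mu_hat
grad = np.linalg.solve(Sigma, (X - mu_hat).sum(axis=0))
print(np.allclose(grad, 0.0))                            # True at the MLE
```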

Estimate Σ

We need to take the derivative with respect to Σ. First rewrite the log likelihood according to the properties of traces, using \log|\Sigma^{-1}| = -\log|\Sigma|:

l(\Sigma \mid D) = -\frac{N}{2} \log|\Sigma| - \frac{1}{2} \sum_n (x_n - \mu)^T \Sigma^{-1} (x_n - \mu)

= \frac{N}{2} \log|\Sigma^{-1}| - \frac{1}{2} \sum_n tr\left[ (x_n - \mu)^T \Sigma^{-1} (x_n - \mu) \right]

= \frac{N}{2} \log|\Sigma^{-1}| - \frac{1}{2} \sum_n tr\left[ (x_n - \mu)(x_n - \mu)^T \Sigma^{-1} \right]

Differentiating with respect to \Sigma^{-1}, using \partial \log|A| / \partial A = (A^{-1})^T and \partial \, tr[BA] / \partial A = B^T (both B and \Sigma are symmetric here):

\frac{\partial l}{\partial \Sigma^{-1}} = \frac{N}{2} \Sigma - \frac{1}{2} \sum_n (x_n - \mu)(x_n - \mu)^T

Estimate Σ (continued)

Setting this to zero, the maximum likelihood estimator is

\hat{\Sigma}_{ML} = \frac{1}{N} \sum_n (x_n - \hat{\mu}_{ML})(x_n - \hat{\mu}_{ML})^T

The maximum likelihood estimators of the canonical parameters are

\hat{\Lambda}_{ML} = \hat{\Sigma}_{ML}^{-1}, \qquad \hat{\eta}_{ML} = \hat{\Sigma}_{ML}^{-1} \hat{\mu}_{ML}
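Putting the estimators together, a minimal sketch on simulated data (arbitrary true parameters and seed; with enough samples the estimates approach the truth):

```python
import numpy as np

rng = np.random.default_rng(5)
mu_true = np.array([1.0, -2.0])
Sigma_true = np.array([[2.0, 0.5],
                       [0.5, 1.0]])
X = rng.multivariate_normal(mu_true, Sigma_true, size=10_000)

N = len(X)
mu_hat = X.mean(axis=0)
diff = X - mu_hat
Sigma_hat = diff.T @ diff / N           # (1/N) sum_n (x_n - mu)(x_n - mu)^T

Lam_hat = np.linalg.inv(Sigma_hat)      # canonical parameter estimates
eta_hat = Lam_hat @ mu_hat

print(mu_hat)                           # close to mu_true
print(Sigma_hat)                        # close to Sigma_true
```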
