
The Multivariate Normal Distribution

Paul Johnson

June 10, 2013

1 Introduction

A one dimensional Normal variable should be very familiar to students who have completed one course in statistics.

The multivariate Normal is a generalization of the Normal which describes the probability of observing n-tuples.

The multivariate Normal distribution looks just like you might expect from your experience with the univariate Normal: symmetric, single-peaked, basically pleasant in every way!

The vectors described by the multivariate Normal distribution can have as many elements as we choose to assign. Let’s work with an example in which each randomly selected case has two attributes: a two dimensional model. A point drawn from a two dimensional Normal would be a pair of real-valued numbers (you can call it a vector or a point if you like).

Given any vector, such as

(0.2432, 0.1211)

or

(−2.2432, 0.1211)

the multivariate Normal density function gives back an indicator of how likely we are to draw that particular pair of values.

The Multivariate Normal (MVN) distribution has two parameters, but each of these two parameters contains more than one piece of information. One parameter, µ, describes the “location” or “center point” of the distribution, while the other parameter determines the dispersion, the width of the spread from side to side in the observed points.

Figure 1 illustrates the density function of a Normal distribution on two dimensions.

2 Mathematical Description

In the multivariate Normal density, there are two parameters, µ and Σ. The first is an n × 1 column vector of “location parameters” and the second is an n × n “dispersion matrix” Σ.


Figure 1: Multivariate Normal Distribution

[Perspective plot of the bivariate density: y1 and y2 run from −3 to 3, probability from 0.00 to 0.25.]


f(y_i \mid \mu, \Sigma) = f\!\left( (y_{i1}, y_{i2}, \ldots, y_{in})' \;\middle|\; \mu = (\mu_1, \ldots, \mu_n)',\; \Sigma = \begin{bmatrix} \sigma_1^2 & \cdots & \sigma_{1n} \\ \vdots & \ddots & \vdots \\ \sigma_{n1} & \cdots & \sigma_n^2 \end{bmatrix} \right) = \frac{1}{\sqrt{(2\pi)^n |\Sigma|}} \exp\!\left( -\tfrac{1}{2} (y_i - \mu)' \Sigma^{-1} (y_i - \mu) \right) \quad (1)

The symbol |Σ| refers to the determinant of the matrix Σ. The determinant is a single real number. The symbol Σ−1 is the inverse of Σ, a matrix for which ΣΣ−1 = I. This equation assumes that Σ can be inverted; a square matrix has an inverse if and only if its determinant is not 0.
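As a quick numerical check of equation (1), here is a sketch in Python/NumPy (the parameter values are illustrative, and the handout itself is not tied to any particular software). It evaluates the MVN density directly and compares it against the product of two univariate Normal densities, which is what the MVN reduces to when Σ is diagonal.

```python
import numpy as np

# Illustrative values: standard bivariate case, mu = (0, 0), Sigma = identity.
mu = np.array([0.0, 0.0])
Sigma = np.eye(2)
y = np.array([0.2432, 0.1211])

# Equation (1): normalizing constant and quadratic form in the exponent.
n = len(mu)
diff = y - mu
dens = np.exp(-0.5 * diff @ np.linalg.inv(Sigma) @ diff) \
    / np.sqrt((2 * np.pi) ** n * np.linalg.det(Sigma))

# With a diagonal Sigma, the MVN factors into univariate Normal densities.
def univ(v, m, s2):
    return np.exp(-(v - m) ** 2 / (2 * s2)) / np.sqrt(2 * np.pi * s2)

check = univ(y[0], 0.0, 1.0) * univ(y[1], 0.0, 1.0)
print(dens, check)  # the two values agree
```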

The matrix Σ must be positive semi-definite in order to assure that the most likely point is µ = (µ1, µ2, . . . , µn) and that, as yi moves away from µ in any direction, the probability of observing yi declines.

The denominator in the formula for the MVN is a normalizing constant, one which assuresus that the distribution integrates to 1.0.

In the last section of this writeup, I have included a little bit to help with the matrix algebra and the importance of a “weighted cross product” like (yi − µ)′Σ−1(yi − µ).

If there is only one dimension, this gives the same value as the ordinary Normal distribution, which one will recall is

f(y_i \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\!\left[ -\frac{(y_i - \mu)^2}{2\sigma^2} \right]

In the one dimensional case, Σ = σ², meaning |Σ| = σ² and Σ⁻¹ = 1/σ².

In the example with two dimensions, the location parameter has two components

µ= (µ1,µ2)

and (again for the two dimensional case) the scale parameter is

\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix}

For the 2×2 case, the determinant and inverse are known:

|\Sigma| = \sigma_1^2 \sigma_2^2 - \sigma_{12}^2 \quad\text{and}\quad \Sigma^{-1} = \begin{bmatrix} \frac{\sigma_2^2}{\sigma_1^2\sigma_2^2 - \sigma_{12}^2} & \frac{-\sigma_{12}}{\sigma_1^2\sigma_2^2 - \sigma_{12}^2} \\ \frac{-\sigma_{12}}{\sigma_1^2\sigma_2^2 - \sigma_{12}^2} & \frac{\sigma_1^2}{\sigma_1^2\sigma_2^2 - \sigma_{12}^2} \end{bmatrix}

so you can actually calculate an explicit value “by hand” if you have enough patience.
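The “by hand” determinant and inverse can be checked against a numerical linear algebra routine. A sketch in Python/NumPy, with made-up parameter values:

```python
import numpy as np

# Hypothetical parameter values, chosen only for illustration.
s1sq, s2sq, s12 = 1.0, 2.0, 0.5
Sigma = np.array([[s1sq, s12],
                  [s12, s2sq]])

det = s1sq * s2sq - s12 ** 2           # the "by hand" determinant
inv = np.array([[s2sq, -s12],
                [-s12, s1sq]]) / det   # the "by hand" inverse

print(np.isclose(det, np.linalg.det(Sigma)))   # True
print(np.allclose(inv, np.linalg.inv(Sigma)))  # True
```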


f(y_i = (y_{i1}, y_{i2})' \mid \mu, \Sigma) = \frac{1}{\sqrt{(2\pi)^2 (\sigma_1^2\sigma_2^2 - \sigma_{12}^2)}} \exp\!\left( -\tfrac{1}{2} [y_{i1} - \mu_1,\; y_{i2} - \mu_2] \begin{bmatrix} \frac{\sigma_2^2}{\sigma_1^2\sigma_2^2 - \sigma_{12}^2} & \frac{-\sigma_{12}}{\sigma_1^2\sigma_2^2 - \sigma_{12}^2} \\ \frac{-\sigma_{12}}{\sigma_1^2\sigma_2^2 - \sigma_{12}^2} & \frac{\sigma_1^2}{\sigma_1^2\sigma_2^2 - \sigma_{12}^2} \end{bmatrix} \begin{bmatrix} y_{i1} - \mu_1 \\ y_{i2} - \mu_2 \end{bmatrix} \right)

At this time, I’m unwilling to write out any more of that. But I encourage you to do it.

3 The Expected Value and Variance

Recall that expected value is defined as a “probability weighted sum of possible outcomes”.

E(y) = µ

Var(y) = Σ

Because we are considering a multivariate distribution here, it is probably lost in the shuffle that the expected value is calculated by integrating over all dimensions.

If we had a discrete distribution, say p(x1, x2), then the expected value would be calculated by looping over all values of x1 and x2 and calculating a probability weighted sum

E(x_1) = \sum_{i=1}^{n} \sum_{j=1}^{m} p(x_{1i}, x_{2j})\, x_{1i}

With a probability model defined on continuous variables, the calculation of the expected value requires the use of integrals, but except for the change of notation (from \sum to \int), there is no major conceptual change.

E(y_1) = \int_{-\infty}^{+\infty} \int_{-\infty}^{+\infty} f(y_1, y_2)\, y_1 \, dy_1 \, dy_2

Because the covariance between two variables is the same whether we calculate σ12 = Covariance(y1, y2) or σ21 = Covariance(y2, y1), the matrix Σ has to be the same on either side of the diagonal.

As you can see, the matrix is symmetrical, so that the bottom-left element is the same as the top-right. The matrix is the same as its transpose.

\Sigma = \begin{bmatrix} \sigma_1^2 & \sigma_{12} \\ \sigma_{12} & \sigma_2^2 \end{bmatrix} = \Sigma'


4 Illustrations

4.1 Contour Plots

The parameters that were used to create Figure 1 were µ = (0, 0) and Σ = [1 0; 0 1], the identity matrix. That is the bivariate equivalent of the “standardized normal distribution.” A contour plot offers one way to view this distribution. The equally-probable points are connected by lines. The idea will be familiar to economists as “indifference curves” and to geographers as “topographical maps”. Consider the contours in Figure 2.

4.2 Change the Variance

If we change one of the diagonal elements in Σ, the effect is to “spread out” the observations on that dimension. See Figure 3.

4.3 The effect of the Covariance parameter

Until this point, we have been considering distributions in which the two variables y1 and y2 were statistically independent, in the sense that the expected value of y1 was not influenced by y2. Figures 4 and 5 show what happens when the covariance is raised to 0.8: the observations concentrate along the diagonal.
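A sampling sketch (Python/NumPy; the sample size and seed are arbitrary) makes the contrast concrete: with Σ = I the empirical correlation of the draws is near 0, while with the covariance of 0.8 used in Figure 4 it is near 0.8.

```python
import numpy as np

rng = np.random.default_rng(0)
mu = [0.0, 0.0]

# Independent case (Sigma = identity) versus covariance 0.8 (Figure 4 setting).
indep = rng.multivariate_normal(mu, [[1.0, 0.0], [0.0, 1.0]], size=100_000)
corr = rng.multivariate_normal(mu, [[1.0, 0.8], [0.8, 1.0]], size=100_000)

print(np.corrcoef(indep[:, 0], indep[:, 1])[0, 1])  # near 0
print(np.corrcoef(corr[:, 0], corr[:, 1])[0, 1])    # near 0.8
```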

5 “Cross-sections”

A conditional density gives the probability of the outcome y = (y1, y2) when one of its components is taken as given. For example,

f((y1,y2)|y2 = 0.75)

represents the probability of observing y1 when it is known that y2 = 0.75.

The Multivariate Normal distribution has the interesting property that the conditional distributions are basically Normal (Normal up to a constant of proportionality). Figure 6 illustrates the MVN when y2 = 0.75. The “total area” under that curve will not equal 1.0, but if we normalize it by finding the area (call that A), then

\frac{f(y_1, y_2 = 0.75)}{A}

is a valid probability density function.
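The normalization step can be sketched numerically (Python; the grid choices are ours). For the standardized MVN, slicing at y2 = 0.75 and dividing by the area A recovers the standard univariate Normal:

```python
import numpy as np

# Cross-section of the standard bivariate Normal at y2 = 0.75.
y1 = np.linspace(-8.0, 8.0, 1601)
dy = y1[1] - y1[0]
slice_f = np.exp(-0.5 * (y1 ** 2 + 0.75 ** 2)) / (2 * np.pi)

A = slice_f.sum() * dy   # area under the cross-section: less than 1
cond = slice_f / A       # normalized cross-section: a valid density in y1

# Because the components are independent here, the result is again
# the standard univariate Normal density.
univ = np.exp(-0.5 * y1 ** 2) / np.sqrt(2 * np.pi)
print(A < 1.0, np.allclose(cond, univ))  # True True
```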

6 Appendix: Matrix Algebra

If you have never studied matrix algebra, I would urge you to do a little study right now. Find any book or one of my handouts.


Figure 2: The View from Above Corresponding to Figure 1

# image(x1, x2, probX, col = terrain.colors(100), axes = FALSE)
contour(x1, x2, probX)


Figure 3: MV Normal(µ = (0, 0), Σ = [1 0; 0 3])

[Perspective plot: with the larger variance, y2 runs from −5 to 5 while y1 stays between −2 and 2; probability from 0.00 to 0.25.]


Figure 4: MV Normal(µ = (0, 0), Σ = [1 .8; .8 1])

[Perspective plot: y1 and y2 run from −2 to 2, probability from 0.00 to 0.25.]


Figure 5: The View from Above

# image(x1, x2, probX, col = terrain.colors(100), axes = FALSE)
contour(x1, x2, probX)


Figure 6: Cross section of Standard MVN for y2 = 0.75


Your objective is to understand enough about multiplication of matrices and vectors so that you can see the meaning of an expression like the following.

x′Mx

x is an n × 1 column vector and M is an n × n matrix. The result is a single number, and if you write it out term-for-term, you find this is a “weighted sum of x’s squares and cross-products.”

(x_1, x_2, \ldots, x_n) \begin{bmatrix} m_{11} & \cdots & m_{1n} \\ \vdots & \ddots & \vdots \\ m_{n1} & \cdots & m_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = (x_1, x_2, \ldots, x_n) \begin{bmatrix} m_{11}x_1 + m_{12}x_2 + \ldots + m_{1n}x_n \\ m_{21}x_1 + m_{22}x_2 + \ldots + m_{2n}x_n \\ \vdots \\ m_{n1}x_1 + m_{n2}x_2 + \ldots + m_{nn}x_n \end{bmatrix}

= m_{11}x_1x_1 + m_{12}x_2x_1 + \ldots + m_{1n}x_nx_1 + m_{21}x_1x_2 + m_{22}x_2x_2 + \ldots + m_{2n}x_nx_2 + \ldots + m_{n1}x_1x_n + m_{n2}x_2x_n + \ldots + m_{nn}x_nx_n

= \sum_{i=1}^{n} \sum_{j=1}^{n} m_{ij} x_i x_j

If M is diagonal, meaning

M = \begin{bmatrix} m_{11} & 0 & 0 & 0 \\ 0 & m_{22} & 0 & 0 \\ 0 & 0 & \ddots & 0 \\ 0 & 0 & 0 & m_{nn} \end{bmatrix} \quad (2)

then there are no “cross products” and the result of x′Mx is simply a sum of weighted x-squared terms.

x'Mx = \sum_{i=1}^{n} m_{ii} x_i^2 \quad (3)
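A sketch verifying that the matrix product x′Mx equals the double sum, and that for a diagonal M it collapses to the weighted sum of squares in equation (3); the values are randomly generated:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
x = rng.normal(size=n)
M = rng.normal(size=(n, n))

# x'Mx as a matrix product...
quad = x @ M @ x
# ...and as the double sum over m_ij * x_i * x_j.
double_sum = sum(M[i, j] * x[i] * x[j] for i in range(n) for j in range(n))
print(np.isclose(quad, double_sum))  # True

# Diagonal M: only the weighted x-squared terms survive.
D = np.diag([2.0, 3.0, 4.0, 5.0])
print(np.isclose(x @ D @ x, sum(D[i, i] * x[i] ** 2 for i in range(n))))  # True
```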

A matrix is positive semi-definite if, for all x,

x'Mx \geq 0 \quad (4)


If M is positive semi-definite, so is M−1.

In the Normal distribution’s density formula, we have:

\exp\!\left( -\tfrac{1}{2} (y_i - \mu)' \Sigma^{-1} (y_i - \mu) \right) \quad (5)

In that formula, the role of x is played by (yi − µ) and the role of M is played by Σ−1. The Normal distribution requires that Σ is positive semi-definite.

(y_i - \mu)' \Sigma^{-1} (y_i - \mu) \geq 0 \quad (6)

so the exponent -\tfrac{1}{2}(y_i - \mu)' \Sigma^{-1} (y_i - \mu) is never positive, and the density is largest at y_i = \mu.
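A numerical sketch of the positive semi-definiteness requirement, using the covariance matrix from Figure 4: every eigenvalue is positive, the quadratic form is never negative, and so the exponent in equation (5) is never positive.

```python
import numpy as np

Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])  # the Figure 4 covariance matrix

# Positive definite: all eigenvalues of the symmetric matrix are > 0.
print(np.all(np.linalg.eigvalsh(Sigma) > 0))  # True

# Evaluate the quadratic form x' Sigma^{-1} x at many random points x.
rng = np.random.default_rng(2)
Sinv = np.linalg.inv(Sigma)
quads = [v @ Sinv @ v for v in rng.normal(size=(1000, 2))]
print(min(quads) >= 0)                    # True: the form is never negative
print(max(-0.5 * q for q in quads) <= 0)  # True: the exponent never exceeds 0
```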

Tidbit: the eigenvectors (e1, e2, . . . , en) of Σ are known as its “principal components”. The principal components give the axes of symmetry of the MVN. If the ith eigenvalue (the one associated with the ith eigenvector) is λi, the length of the ellipsoid on the ith axis is proportional to √λi (B. Walsh and M. Lynch, Appendix 2, The Multivariate Normal, 2002, http://nitro.biosci.arizona.edu/zbook/volume_2/chapters/vol2_A2.html).
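A short sketch of that tidbit, again using the Figure 4 covariance: for Σ = [1 .8; .8 1] the eigenvalues are 1 ± 0.8, i.e. 0.2 and 1.8, and the long axis of the elliptical contours lies along the (1, 1) diagonal.

```python
import numpy as np

Sigma = np.array([[1.0, 0.8], [0.8, 1.0]])

# Eigen-decomposition: eigenvectors are the principal components,
# the axes of symmetry of the MVN density.
lam, vecs = np.linalg.eigh(Sigma)  # eigenvalues in ascending order
print(lam)           # approximately [0.2, 1.8]
print(np.sqrt(lam))  # axis lengths are proportional to sqrt(lambda_i)

# The eigenvector for the larger eigenvalue points along (1, 1)/sqrt(2)
# (up to sign), which is why Figure 4 stretches along the diagonal.
print(np.allclose(np.abs(vecs[:, 1]), np.ones(2) / np.sqrt(2)))  # True
```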
