
Chapter 4. Multivariate Distributions

4.1 Bivariate Distributions.

For a pair of r.v.s (X, Y), the joint CDF is defined as

F_{X,Y}(x, y) = P(X ≤ x, Y ≤ y).

Obviously, the marginal distributions may be obtained easily from the joint distribution:

F_X(x) = P(X ≤ x) = P(X ≤ x, Y < ∞) = F_{X,Y}(x, ∞),

and F_Y(y) = F_{X,Y}(∞, y).

Covariance and correlation of X and Y:

Cov(X, Y) = E{(X − EX)(Y − EY)} = E(XY) − (EX)(EY),

Corr(X, Y) = Cov(X, Y) / √(Var(X) Var(Y)).

Discrete bivariate distributions

If X takes discrete values x_1, · · · , x_m and Y takes discrete values y_1, · · · , y_n, their joint probability function may be presented in a table:

X \ Y    y_1     y_2     · · ·   y_n
x_1      p_11    p_12    · · ·   p_1n    p_1·
x_2      p_21    p_22    · · ·   p_2n    p_2·
· · ·
x_m      p_m1    p_m2    · · ·   p_mn    p_m·
         p_·1    p_·2    · · ·   p_·n

where p_ij = P(X = x_i, Y = y_j), and

p_i· = P(X = x_i) = Σ_{j=1}^{n} P(X = x_i, Y = y_j) = Σ_j p_ij,

p_·j = P(Y = y_j) = Σ_{i=1}^{m} P(X = x_i, Y = y_j) = Σ_i p_ij.

In general, p_ij ≠ p_i· × p_·j. However, if p_ij = p_i· × p_·j for all i and j, X and Y are independent, i.e.

P(X = x_i, Y = y_j) = P(X = x_i) × P(Y = y_j)  for all i, j.

For independent X and Y, Cov(X, Y) = 0.

Example 1. Flip a fair coin two times. Let X = 1 if H occurs in the first flip, and 0 if T occurs in the first flip. Let Y = 1 if the outcomes in the two flips are the same, and 0 if the two outcomes are different. The joint probability function is

X \ Y    1       0
1        1/4     1/4     1/2
0        1/4     1/4     1/2
         1/2     1/2

It is easy to see that X and Y are independent, which is a bit counter-intuitive.
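We can confirm this quickly in R by computing the marginals from the joint probability table and comparing with the product p_i· × p_·j; the matrix p below is just the table above, and the snippet is a minimal illustrative sketch:

# joint probability table of Example 1: rows index X (1, 0), columns index Y (1, 0)
p <- matrix(c(1/4, 1/4,
              1/4, 1/4), byrow=TRUE, nrow=2)
pX <- rowSums(p)                       # marginal of X: 1/2, 1/2
pY <- colSums(p)                       # marginal of Y: 1/2, 1/2
all(abs(p - outer(pX, pY)) < 1e-12)    # TRUE: p_ij = p_i. x p_.j, so X and Y are independent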

Continuous bivariate distribution

If the CDF F_{X,Y} can be written as

F_{X,Y}(x, y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f_{X,Y}(u, v) du dv   for any x and y,

where f_{X,Y} ≥ 0, then (X, Y) has a continuous joint distribution, and f_{X,Y}(x, y) is the joint PDF.

As F_{X,Y}(∞, ∞) = 1, it holds that

∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(u, v) du dv = 1.

In fact, any non-negative function satisfying this condition is a PDF. Furthermore, for any subset A in R²,

P{(X, Y) ∈ A} = ∫_A f_{X,Y}(x, y) dx dy.

Also

Cov(X, Y) = ∫∫ (x − EX)(y − EY) f_{X,Y}(x, y) dx dy = ∫∫ xy f_{X,Y}(x, y) dx dy − (EX)(EY).

Note that

F_X(x) = F_{X,Y}(x, ∞) = ∫_{−∞}^{∞} ∫_{−∞}^{x} f_{X,Y}(u, v) du dv = ∫_{−∞}^{x} { ∫_{−∞}^{∞} f_{X,Y}(u, v) dv } du,

hence the marginal PDF of X can be derived from the joint PDF as follows:

f_X(x) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dy.

Similarly, f_Y(y) = ∫_{−∞}^{∞} f_{X,Y}(x, y) dx.

Note. Unlike in the discrete case, it is not always easy to work out marginal PDFs from joint PDFs, especially when the PDFs are discontinuous.

When f_{X,Y}(x, y) = f_X(x) f_Y(y) for any x and y, X and Y are independent, as then

P(X ≤ x, Y ≤ y) = ∫_{−∞}^{y} ∫_{−∞}^{x} f_{X,Y}(u, v) du dv = ∫_{−∞}^{y} ∫_{−∞}^{x} f_X(u) f_Y(v) du dv

= ∫_{−∞}^{x} f_X(u) du ∫_{−∞}^{y} f_Y(v) dv = P(X ≤ x) P(Y ≤ y),

and also Cov(X, Y) = 0.

Example 2. Uniform distribution on the unit square – U[0, 1]².

f(x, y) =  1   if 0 ≤ x ≤ 1, 0 ≤ y ≤ 1,
           0   otherwise.

This is a well-defined PDF, as f ≥ 0 and ∫∫ f(x, y) dx dy = 1. It is easy to see that X and Y are independent. Let us calculate some probabilities.

P(X < 1/2, Y < 1/2) = F(1/2, 1/2) = ∫_{−∞}^{1/2} ∫_{−∞}^{1/2} f_{X,Y}(x, y) dx dy = ∫_0^{1/2} ∫_0^{1/2} dx dy = 1/4.

P(X + Y < 1) = ∫_{x+y<1} f_{X,Y}(x, y) dx dy = ∫_{x>0, y>0, x+y<1} dx dy = ∫_0^1 dy ∫_0^{1−y} dx = ∫_0^1 (1 − y) dy = 1/2.
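Both probabilities are easy to check by simulation. Below is a minimal R sketch (the sample size 100000 is an arbitrary choice):

# Monte Carlo check for Example 2: X and Y independent U(0, 1)
set.seed(1)
x <- runif(100000); y <- runif(100000)
mean(x < 1/2 & y < 1/2)   # close to 1/4
mean(x + y < 1)           # close to 1/2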

Example 3. Let (X, Y) have the joint PDF

f(x, y) =  x² + xy/3   if 0 ≤ x ≤ 1, 0 ≤ y ≤ 2,
           0           otherwise.

Calculate P(0 < X < 1/2, 1/4 < Y < 3) and P(X < Y). Are X and Y independent of each other?

P(0 < X < 1/2, 1/4 < Y < 3) = P(0 < X < 1/2, 1/4 < Y < 2) = ∫_{1/4}^{2} dy ∫_0^{1/2} (x² + xy/3) dx

= ∫_{1/4}^{2} (1 + y)/24 dy = [ y/24 + y²/48 ]_{1/4}^{2} ≈ 0.155.

P(X < Y) = ∫_0^1 dx ∫_x^2 (x² + xy/3) dy = ∫_0^1 ( (2/3)x + 2x² − (7/6)x³ ) dx = 17/24 ≈ 0.708.

f_X(x) = ∫_0^2 (x² + xy/3) dy = 2x² + 2x/3,  0 ≤ x ≤ 1,

f_Y(y) = ∫_0^1 (x² + xy/3) dx = 1/3 + y/6,  0 ≤ y ≤ 2.

Both f_X(x) and f_Y(y) are well-defined PDFs.

But f(x, y) ≠ f_X(x) f_Y(y), hence X and Y are not independent.
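These calculations can be verified numerically in R, for instance with nested calls to integrate; the sketch below re-computes the two probabilities above:

# numerical check for Example 3
f <- function(x, y) x^2 + x*y/3
# P(0 < X < 1/2, 1/4 < Y < 2)
inner1 <- function(y) sapply(y, function(yy) integrate(function(x) f(x, yy), 0, 1/2)$value)
integrate(inner1, 1/4, 2)$value     # about 0.155
# P(X < Y): integrate over 0 <= x <= 1 and x < y <= 2
inner2 <- function(x) sapply(x, function(xx) integrate(function(y) f(xx, y), xx, 2)$value)
integrate(inner2, 0, 1)$value       # about 0.708 = 17/24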

4.2 Conditional Distributions

If X and Y are not independent, knowing X should be helpful in determining Y, as X may carry some information on Y. Therefore it makes sense to define the distribution of Y given, say, X = x. This is the concept of conditional distributions.

If both X and Y are discrete, the conditional probability function is simply a special case of conditional probabilities:

P(Y = y | X = x) = P(Y = y, X = x)/P(X = x).

However this definition does not extend to continuous r.v.s, as then P(X = x) = 0.

Definition (Conditional PDF). For continuous r.v.s X and Y, the conditional PDF of Y given X = x is

f_{Y|X}(·|x) = f_{X,Y}(x, ·)/f_X(x).

Remark. (i) As a function of y, f_{Y|X}(y|x) is a PDF:

P(Y ∈ A | X = x) = ∫_A f_{Y|X}(y|x) dy,

while x is treated as a constant (i.e. not a variable).

(ii) E(Y | X = x) = ∫ y f_{Y|X}(y|x) dy is a function of x, and

Var(Y | X = x) = ∫ {y − E(Y | X = x)}² f_{Y|X}(y|x) dy.

(iii) If X and Y are independent, f_{Y|X}(y|x) = f_Y(y).

(iv) f_{X,Y}(x, y) = f_X(x) f_{Y|X}(y|x) = f_{X|Y}(x|y) f_Y(y), which offers alternative ways to determine the joint PDF.

(v) E{E(Y|X)} = E(Y). This in fact holds for any r.v.s X and Y. We give a proof here for continuous r.v.s only:

E{E(Y|X)} = ∫ { ∫ y f_{Y|X}(y|x) dy } f_X(x) dx = ∫∫ y f_{X,Y}(x, y) dx dy

= ∫ y { ∫ f_{X,Y}(x, y) dx } dy = ∫ y f_Y(y) dy = EY.
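Property (v) is easy to see in action by simulation. The pair below (X ∼ N(0, 1) with Y | X = x ∼ N(x, 1)) is a hypothetical example chosen only for this illustration:

# Monte Carlo illustration of E{E(Y|X)} = EY
set.seed(1)
x <- rnorm(100000)              # X ~ N(0, 1)
y <- rnorm(100000, mean = x)    # Y | X = x ~ N(x, 1), so E(Y|X) = X
mean(x)                         # estimates E{E(Y|X)} = EX = 0
mean(y)                         # estimates EY = 0; the two agree up to simulation error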

Example 4. Let f_{X,Y}(x, y) = e^{−y} for 0 < x < y < ∞, and 0 otherwise. Find f_{Y|X}(y|x), f_{X|Y}(x|y) and Cov(X, Y).

We need to find f_X(x) and f_Y(y) first:

f_X(x) = ∫ f_{X,Y}(x, y) dy = ∫_x^∞ e^{−y} dy = e^{−x},  x ∈ (0, ∞),

f_Y(y) = ∫ f_{X,Y}(x, y) dx = ∫_0^y e^{−y} dx = y e^{−y},  y ∈ (0, ∞).

Hence

f_{Y|X}(y|x) = f_{X,Y}(x, y)/f_X(x) = e^{−(y−x)},  y ∈ (x, ∞),

f_{X|Y}(x|y) = f_{X,Y}(x, y)/f_Y(y) = 1/y,  x ∈ (0, y).

Note that given Y = y, X ∼ U(0, y), i.e. the uniform distribution on (0, y).

To find Cov(X, Y), we compute EX, EY and E(XY) first.

EX = ∫ x f_X(x) dx = ∫_0^∞ x e^{−x} dx = −e^{−x}(1 + x) |_0^∞ = 1,

EY = ∫ y f_Y(y) dy = ∫_0^∞ y² e^{−y} dy = −y² e^{−y} |_0^∞ + 2 ∫_0^∞ y e^{−y} dy = 2,

E(XY) = ∫∫ xy f_{X,Y}(x, y) dx dy = ∫_0^∞ dy ∫_0^y xy e^{−y} dx = (1/2) ∫_0^∞ y³ e^{−y} dy

= −(1/2) y³ e^{−y} |_0^∞ + (3/2) ∫_0^∞ y² e^{−y} dy = 3.

Hence Cov(X, Y) = E(XY) − (EX)(EY) = 3 − 2 = 1.
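Since f_Y(y) = y e^{−y} is the Gamma(2, 1) density and X | Y = y ∼ U(0, y), the result is easy to check by simulation using property (iv) above; a minimal R sketch:

# Monte Carlo check for Example 4, using f_{X,Y}(x, y) = f_Y(y) f_{X|Y}(x|y)
set.seed(1)
y <- rgamma(100000, shape = 2, rate = 1)   # f_Y(y) = y * exp(-y)
x <- runif(100000, min = 0, max = y)       # X | Y = y ~ U(0, y)
mean(x)                                    # close to EX = 1
mean(y)                                    # close to EY = 2
cov(x, y)                                  # close to Cov(X, Y) = 1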

4.3 Multivariate Distributions

Let X = (X_1, · · · , X_n)′ be a random vector consisting of n r.v.s. The joint CDF is defined as

F(x_1, · · · , x_n) ≡ F_{X_1,···,X_n}(x_1, · · · , x_n) = P(X_1 ≤ x_1, · · · , X_n ≤ x_n).

If X is continuous, its PDF f satisfies

F(x_1, · · · , x_n) = ∫_{−∞}^{x_n} · · · ∫_{−∞}^{x_1} f(u_1, · · · , u_n) du_1 · · · du_n.

In general, the PDF admits the factorisation

f(x_1, · · · , x_n) = f(x_1) f(x_2 | x_1) f(x_3 | x_1, x_2) · · · f(x_n | x_1, · · · , x_{n−1}),

where f(x_j | x_1, · · · , x_{j−1}) denotes the conditional PDF of X_j given X_1 = x_1, · · · , X_{j−1} = x_{j−1}.

In particular, when X_1, · · · , X_n are independent,

f_{X_1,···,X_n}(x_1, · · · , x_n) = f_{X_1}(x_1) · · · f_{X_n}(x_n).

IID Samples. If X_1, · · · , X_n are independent and each has the same CDF F, we say that X_1, · · · , X_n are IID (independent and identically distributed) and write

X_1, · · · , X_n ∼_iid F.

We also call X_1, · · · , X_n a sample or a random sample.

Two important multivariate distributions

Multinomial Distribution Multinomial(n, p_1, · · · , p_k): an extension of Bin(n, p).

Suppose we throw a k-sided die n times and record X_i as the number of throws that end with the i-th side, i = 1, · · · , k. Then

(X_1, · · · , X_k) ∼ Multinomial(n, p_1, · · · , p_k),

where p_i is the probability that the i-th side occurs in one throw. Obviously p_i ≥ 0 and Σ_i p_i = 1.

We may immediately make the following observations from the above definition.

(i) X_1 + · · · + X_k ≡ n, therefore X_1, · · · , X_k are not independent.

(ii) X_i ∼ Bin(n, p_i), hence EX_i = np_i and Var(X_i) = np_i(1 − p_i).

The joint probability function of Multinomial(n, p_1, · · · , p_k): for any j_1, · · · , j_k ≥ 0 with j_1 + · · · + j_k = n,

P(X_1 = j_1, · · · , X_k = j_k) = n!/(j_1! · · · j_k!) × p_1^{j_1} · · · p_k^{j_k}.
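In R this joint probability function is available as dmultinom, which we can compare against the formula; the numbers below (n = 4, k = 3, p = (0.2, 0.3, 0.5)) are an arbitrary illustration:

# compare the multinomial formula with R's dmultinom
p <- c(0.2, 0.3, 0.5)
j <- c(2, 1, 1)                                 # j_1 + j_2 + j_3 = n = 4
dmultinom(j, size = 4, prob = p)                # built-in joint probability
factorial(4)/prod(factorial(j)) * prod(p^j)     # same value from the formula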

Multivariate Normal Distribution N(µ, Σ): a k-variate r.v. X = (X_1, · · · , X_k)′ is normal with mean µ and covariance matrix Σ if its PDF is

f(x) = (2π)^{−k/2} |Σ|^{−1/2} exp{ −(1/2) (x − µ)′ Σ^{−1} (x − µ) },  x ∈ R^k,

where µ is a k-vector, and Σ is a k × k positive-definite matrix.

Some properties of N(µ, Σ): let µ = (µ_1, · · · , µ_k)′ and Σ ≡ (σ_ij). Then

(i) EX = µ, and the covariance matrix

Cov(X) = E{(X − µ)(X − µ)′} = Σ,

with σ_ij = Cov(X_i, X_j) = E{(X_i − µ_i)(X_j − µ_j)}.

(ii) When σ_ij = 0 for all i ≠ j, i.e. the components of X are uncorrelated, Σ = diag(σ_11, · · · , σ_kk) and |Σ| = ∏_i σ_ii. Hence the PDF admits the simple form

f(x) = ∏_{i=1}^{k} (2πσ_ii)^{−1/2} exp{ −(x_i − µ_i)²/(2σ_ii) }.

Thus X_1, · · · , X_k are independent when σ_ij = 0 for all i ≠ j (a quick numerical check of this factorisation appears after this list).

(iii) Let Y = AX + b, where A is a constant matrix and b is a constant vector. Then Y ∼ N(Aµ + b, AΣA′).

(iv) X_i ∼ N(µ_i, σ_ii). For any constant k-vector a, a′X is a scalar r.v. and a′X ∼ N(a′µ, a′Σa).

(v) Standard Normal Distribution: N(0, I_k), where I_k is the k × k identity matrix.
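The factorisation in (ii) can be checked numerically. The sketch below uses an arbitrary 2-dimensional example with a diagonal Σ:

# with a diagonal Sigma, the joint normal PDF equals the product of univariate normal PDFs
mu <- c(0, 0)
s  <- c(2, 0.5)                      # the diagonal entries sigma_11, sigma_22
Sigma <- diag(s)
x <- c(0.3, -1.2)
joint <- exp(-0.5 * t(x - mu) %*% solve(Sigma) %*% (x - mu)) / ((2*pi)^(2/2) * sqrt(det(Sigma)))
joint                                     # joint PDF evaluated at x
prod(dnorm(x, mean = mu, sd = sqrt(s)))   # same value from the product of univariate PDFs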

Example 5. Let X_1, X_2, X_3 be jointly normal with common mean 0, variance 1 and

Corr(X_i, X_j) = 0.5,  1 ≤ i ≠ j ≤ 3.

Find the probability P(|X_1| + |X_2| + |X_3| ≤ 2).

It is difficult to calculate this probability by integration of the joint PDF. We provide an estimate by simulation. We solve a general problem first.

Let X ∼ N(µ, Σ), where X has p components. For any set A ⊂ R^p, we may estimate the probability P(X ∈ A) by the relative frequency

#{1 ≤ i ≤ n : X_i ∈ A}/n,

where n is a large integer, and X_1, · · · , X_n are n vectors generated independently from N(µ, Σ).

Note

X = µ + Σ^{1/2} Z,

where Z ∼ N(0, I_p) is standard normal, Σ^{1/2} ≥ 0 and Σ^{1/2} Σ^{1/2} = Σ. We generate Z by rnorm(p), and apply the above linear transformation to obtain X.

Σ^{1/2} may be obtained by an eigenanalysis of Σ using the R function eigen. Since Σ ≥ 0, it holds that

Σ = ΓΛΓ′,

where Γ is an orthogonal matrix (i.e. Γ′Γ = I_p) and Λ = diag(λ_1, · · · , λ_p) is a diagonal matrix. Then

Σ^{1/2} = ΓΛ^{1/2}Γ′,  where Λ^{1/2} = diag(√λ_1, · · · , √λ_p).
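For instance, with the Σ used below one can verify directly that this construction gives a matrix square root:

# check that Sigma^{1/2} built from the eigen-decomposition satisfies P %*% P = Sigma
Sigma <- matrix(c(1,0.5,0.5, 0.5,1,0.5, 0.5,0.5,1), byrow=TRUE, ncol=3)
e <- eigen(Sigma, symmetric=TRUE)
P <- e$vectors %*% diag(sqrt(e$values)) %*% t(e$vectors)
max(abs(P %*% P - Sigma))     # essentially zero (up to rounding error)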

The R function rMNorm below generates random vectors from N(µ, Σ).

rMNorm <- function(n, p, mu, Sigma) {
  # generate n p-vectors from N(mu, Sigma)
  # mu is the p-vector of means; Sigma >= 0 is a p x p matrix
  t <- eigen(Sigma, symmetric=T)            # eigenanalysis of Sigma
  ev <- sqrt(t$values)                      # square roots of the eigenvalues
  G <- as.matrix(t$vectors)                 # line up the eigenvectors in a matrix G
  D <- G*0; for(i in 1:p) D[i,i] <- ev[i]   # D is the diagonal matrix Lambda^{1/2}
  P <- G %*% D %*% t(G)                     # P = G D G' is the required transformation matrix
  Z <- matrix(rnorm(n*p), byrow=T, ncol=p)  # Z is an n x p matrix with N(0,1) entries
  Z <- Z %*% P                              # now each row of Z is N(0, Sigma)
  X <- matrix(rep(mu, n), byrow=T, ncol=p) + Z
  X                                         # each row of X is N(mu, Sigma); return X
}

This function is saved in the file ‘rMNorm.r’. We may use it to perform the required task:

source("rMNorm.r")

mu <- c(0, 0, 0)
Sigma <- matrix(c(1,0.5,0.5, 0.5,1,0.5, 0.5,0.5,1), byrow=T, ncol=3)
X <- rMNorm(20000, 3, mu, Sigma)
dim(X)                                    # check the size of X
t <- abs(X[,1]) + abs(X[,2]) + abs(X[,3])
cat("Estimated probability:", length(t[t<=2])/20000, "\n")

It returned the value:

Estimated probability: 0.446

I repeated it a few more times and obtained the estimates 0.439, 0.445, 0.441, etc.
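As a cross-check, the same probability can be estimated with the mvrnorm function from the MASS package (assuming MASS is installed), using the mu and Sigma defined above; the estimates should agree up to simulation error:

# cross-check using MASS::mvrnorm instead of rMNorm
library(MASS)
X2 <- mvrnorm(20000, mu, Sigma)
mean(abs(X2[,1]) + abs(X2[,2]) + abs(X2[,3]) <= 2)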

4.4 Transformations of random variables

Let a random vector X have PDF f_X. We are interested in the distribution of a scalar function of X, say, Y = r(X). We introduce a general procedure first.

Three steps to find the PDF of Y = r(X):

(i) For each y, find the set A_y = {x : r(x) ≤ y}.

(ii) Find the CDF F_Y(y) = P(Y ≤ y) = P{r(X) ≤ y} = ∫_{A_y} f_X(x) dx.

(iii) f_Y(y) = (d/dy) F_Y(y).

Example 6. Let X ∼ f_X(x) (X is a scalar). Find the PDF of Y = e^X.

A_y = {x : e^x ≤ y} = {x : x ≤ log y}. Hence

F_Y(y) = P(Y ≤ y) = P{e^X ≤ y} = P(X ≤ log y) = F_X(log y).

Hence

f_Y(y) = (d/dy) F_X(log y) = f_X(log y) · d(log y)/dy = y^{−1} f_X(log y).

Note that y = e^x and log y = x, so dy/dx = e^x = y. The above result can be written as

f_Y(y) = f_X(x) / (dy/dx),  or  f_Y(y) dy = f_X(x) dx.
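For a concrete check, take X ∼ N(0, 1); then Y = e^X is lognormal, and the formula y^{−1} f_X(log y) coincides with R's dlnorm (a minimal sketch):

# check the change-of-variable formula for X ~ N(0,1), so Y = exp(X) is lognormal
y <- seq(0.1, 5, by = 0.1)
max(abs(dnorm(log(y))/y - dlnorm(y)))   # essentially zero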

For a 1-1 transformation Y = r(X) (i.e. the inverse function X = r^{−1}(Y) is uniquely defined), it holds that

f_Y(y) = f_X(x)/|r′(x)| = f_X(x) |dx/dy|.

Note. You should replace all x in the above by x = r^{−1}(y).

Example 7. Let X ∼ U(−1, 3). Find the PDF of Y = X². Now this is not a 1-1 transformation. We have to use the general 3-step procedure.

Note that Y takes values in (0, 9). Consider two cases:

(i) For y ∈ (0, 1), A_y = (−√y, √y), F_Y(y) = ∫_{A_y} f_X(x) dx = 0.5√y. Hence f_Y(y) = F′_Y(y) = 0.25/√y.

(ii) For y ∈ [1, 9), A_y = (−1, √y), F_Y(y) = ∫_{A_y} f_X(x) dx = 0.25(√y + 1). Hence f_Y(y) = F′_Y(y) = 0.125/√y.

Collectively we have

f_Y(y) =  0.25/√y     0 < y < 1,
          0.125/√y    1 ≤ y < 9,
          0           otherwise.
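A quick simulation check of this result in R, comparing the implied CDF values at y = 1 and y = 4 (two arbitrarily chosen points):

# Monte Carlo check for Example 7: Y = X^2 with X ~ U(-1, 3)
set.seed(1)
x <- runif(100000, -1, 3)
y <- x^2
mean(y < 1)    # close to F_Y(1) = 0.5*sqrt(1) = 0.5
mean(y < 4)    # close to F_Y(4) = 0.25*(sqrt(4) + 1) = 0.75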