- Home
- Documents
*Convergence Concepts Continued - UMN Convergence Concepts Continued 8.1 Multivariate Convergence...*

Click here to load reader

View

1Download

0

Embed Size (px)

Chapter 8

Convergence Concepts Continued

8.1 Multivariate Convergence Concepts

When we covered convergence concepts before (Chapter 6 of these notes), we only did the scalar case because of the semester system. Logically, this chapter goes with Chapter 6, but if we had done it then, students transferring into this section this semester would be lost because the material isn’t in Lindgren. Then we only covered convergence in probability and in distribution of scalar random variables. Now we want to cover the same ground but this time for random vectors. It will also be a good review.

8.1.1 Convergence in Probability to a Constant

Recall that convergence in probability to a constant has a definition (Def- inition 6.1.2 in Chapter 6 of these notes), but we never used the definition. Instead we obtained all of our convergence in probability results, either directly or indirectly from the law of large numbers (LLN).

Now we want to discuss convergence of random vectors, which we can also call multivariate convergence in probability to a constant. It turns out, that the multivariate concept is a trivial generalization of the univariate concept.

Definition 8.1.1 (Convergence in Probability to a Constant). A sequence of random vectors

Xn = (Xn1, . . . , Xnm), n = 1, 2, . . .

converges in probability to a constant vector

a = (a1, . . . , am)

written Xn

P−→ a, as n → ∞

211

212 Stat 5101 (Geyer) Course Notes

if the corresponding components converge in probability, that is, if

Xni P−→ ai, as n → ∞

for each i.

The reader should be warned that this isn’t the usual definition, but it is equivalent to the usual definition (we have defined the usual concept but not in the usual way).

8.1.2 The Law of Large Numbers

The componentwise nature of convergence in probability to a constant makes the multivariate law of large numbers a trivial extension of the univariate law (Theorem 6.3 in Chapter 6 of these notes).

Theorem 8.1 (Multivariate Law of Large Numbers). If X1, X2, . . . is a sequence of independent, identically distributed random vectors having mean vector µ, and

Xn = 1 n

n∑ i=1

Xi

is the sample mean for sample size n, then

Xn P−→ µ, as n → ∞. (8.1)

The only requirement is that the mean µ exist. No other property of the distribution of the Xi matters.

We will use the abbreviation LLN for either theorem. The multivariate theorem is only interesting for giving us a notational shorthand that allows us to write the law of large numbers for all the components at once. It has no mathematical content over and above the univariate LLN’s for each component. Convergence in probability (to a constant) of random vectors says no more than the statement that each component converges. In the case of the LLN, each statement about a component is just the univariate LLN.

8.1.3 Convergence in Distribution

Convergence in distribution is different. Example 8.1.1 below will show that, unlike convergence in probability to a constant, convergence in distribution for random vectors is not just convergence in distribution of each component.

Univariate convergence in distribution has a definition (Theorem 6.1.1 of these notes), but the definition was not used except in Problem 7-7 in Chap- ter 7 of these notes, which is an odd counterexample to the usual behavior of statistical estimators.

Instead we obtained all of our convergence in distribution results, either directly or indirectly, from the central limit theorem (CLT), which is Theo- rem 6.2 of Chapter 6 of these notes. Multivariate convergence in distribution

8.1. MULTIVARIATE CONVERGENCE CONCEPTS 213

has a definition is much the same, we will obtain almost all such results directly or indirectly from the (multivariate) CLT. Hence here we define multivariate convergence in distribution in terms of univariate convergence in distribution.

Definition 8.1.2 (Convergence in Distribution). A sequence of random vectors X1, X2, . . . converges in distribution to a random vector X, written

Xn D−→ X, as n → ∞.

if

t′Xn D−→ t′X, for all constant vectors t. (8.2)

Again, the reader should be warned that this isn’t the usual definition, but it is equivalent to the usual definition (we have defined the usual concept but not in the usual way, the equivalence of our definition and the usual definition is called the Cramér-Wold Theorem).

This shows us in what sense the notion of multivariate convergence in dis- tribution is determined by the univariate notion. The multivariate convergence in distribution Xn

D−→ X happens if and only if the univariate convergence in distribution t′Xn

D−→ t′X happens for every constant vector t. The following example shows that convergence in distribution of each component of a random vector is not enough to imply convergence of the vector itself.

Example 8.1.1. Let Xn = (Un, Vn) be defined as follows. Define Un to be standard normal for all n. Then trivially Un

D−→ N (0, 1). Define Vn = (−1)nUn for all n. Then Vn is also standard normal for all n, and hence trivially Vn

D−→ N (0, 1). Thus both components of Xn converge in distribution. But if t′ =

( 1 1

) , then

t′Xn = Un + Vn =

{ 2Un, n even 0, n odd

This clearly does not converge in distribution, since the even terms all have one distribution, N (0, 4) (and hence trivially converge to that distribution), and the odd terms all have another, the distribution concentrated at zero (and hence trivially converge to that distribution).

Thus, unlike convergence in probability to a constant, multivariate conver- gence in distribution entails more than univariate convergence of each com- ponent. Another way to say the same thing is that marginal convergence in distribution does not imply joint convergence in distribution. Of course, the converse does hold: joint convergence in distribution does imply marginal con- vergence in distribution (just take a vector t in the definition having all but one component equal to zero).

214 Stat 5101 (Geyer) Course Notes

8.1.4 The Central Limit Theorem

We can now derive the multivariate central limit theorem from the univariate theorem (Theorem 6.2 of Chapter 6 of these notes).

Theorem 8.2 (Multivariate Central Limit Theorem). If X1, X2, . . . is a sequence of independent, identically distributed random vectors having mean vector µ and variance matrix M and

Xn = 1 n

n∑ i=1

Xi

is the sample mean for sample size n, then √

n ( Xn − µ

) D−→ N (0,M). (8.3) The only requirement is that second moments (the elements of M) exist (this

implies first moments also exist by Theorem 2.44 of Chapter 2 of these notes). No other property of the distribution of the Xi matters.

We often write the univariate CLT as

Xn ≈ N (

µ, σ2

n

) (8.4)

and the multivariate CLT as

Xn ≈ N (

µ, M n

) (8.5)

These are simpler to interpret (though less precise and harder to use theoret- ically). We often say that the right hand side of one of these equations is the asymptotic distribution of the left hand side.

Derivation of the multivariate CLT from the univariate. For each constant vec- tor t, the scalar random variables t′(Xn − µ) have mean 0 and variance t′Mt and hence obey the univariate CLT

√ nt′

( Xn − µ

) D−→ N (0, t′Mt). The right hand side is the distribution of t′Z where Z ∼ N (0,M), hence (8.3) follows by our definition of multivariate convergence in distribution.

Example 8.1.2. (This continues Example 5.1.1.) Let X1, X2, . . . be a sequence of i. i. d. random variables, and define random vectors

Zi = (

Xi X2i

) Then Z1, Z2, . . . is a sequence of i. i. d. random vectors having mean vector µ given by (5.6) and variance matrix M given by (5.7), and the CLT applies

√ n

( Zn − µ

) D−→ N (0,M).

8.1. MULTIVARIATE CONVERGENCE CONCEPTS 215

8.1.5 Slutsky and Related Theorems

As with univariate convergence in distribution, we are forced to state a num- ber of theorems about multivariate convergence in distribution without proof. The proofs are just too hard for this course.

Theorem 8.3. If a is a constant, then Xn D−→ a if and only if Xn P−→ a.

Thus, as was true in the univariate case, convergence in probability to a constant and convergence in distribution to a constant are equivalent concepts. We could dispense with one, but tradition and usage do not allow it. We must be able to recognize both in order to read the literature.

Theorem 8.4 (Slutsky). If g(x,y) is a function jointly continuous at every point of the form (x,a) for some fixed a, and if Xn

D−→ X and Yn P−→ a, then

g(Xn,Yn) D−→ g(X,a).

The function g here can be either scalar or vector valued. The continuity hypothesis means that g(xn,yn) → g(x,a) for all nonrandom sequences xn → x and yn → a.

Sometimes Slutsky’s theorem is used in a rather trivial way with the sequence “converging in probability” being nonrandom. This uses the following lemma.

Lemma 8.5. If an → a considered as a nonrandom sequence, then an P−→ a considered as a sequence of constant random vectors.

This is an obvious consequence of the definition of convergence in probability (Definition 6.1.2 in Chapter 6 of th