18.175: Lecture 7
Zero-one laws and maximal inequalities
Scott Sheffield
MIT
18.175 Lecture 7
Outline
Borel-Cantelli applications
Strong law of large numbers
Kolmogorov zero-one law and three-series theorem
Borel-Cantelli lemmas

- First Borel-Cantelli lemma: If $\sum_{n=1}^\infty P(A_n) < \infty$ then $P(A_n \text{ i.o.}) = 0$.
- Second Borel-Cantelli lemma: If the $A_n$ are independent, then $\sum_{n=1}^\infty P(A_n) = \infty$ implies $P(A_n \text{ i.o.}) = 1$.
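A quick numerical illustration (not from the lecture; the probabilities $p_n$ are assumptions chosen for the demo): simulate independent events $A_n$ with summable versus non-summable $p_n$ and count how many occur. With $p_n = 1/n^2$ only a handful of events should ever occur; with $p_n = 1/n$ occurrences keep accumulating, roughly like $\log N$.

```python
import random

# Sketch only: independent events A_n with two choices of p_n = P(A_n).
# Summable case (1/n^2): first lemma predicts finitely many occurrences.
# Divergent case (1/n): second lemma predicts A_n occurs infinitely often.
random.seed(0)
N = 100_000

def count_occurrences(p):
    """Count how many of A_1, ..., A_N occur, with P(A_n) = p(n)."""
    return sum(random.random() < p(n) for n in range(1, N + 1))

finite_case = count_occurrences(lambda n: 1.0 / n**2)   # sum p_n ~ pi^2/6
infinite_case = count_occurrences(lambda n: 1.0 / n)    # sum p_n ~ log N

print(finite_case, infinite_case)
```

A finite simulation cannot certify "infinitely often", of course; it only shows the expected counts behaving as the two lemmas suggest.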
Convergence in probability and subsequential a.s. convergence

- Theorem: $X_n \to X$ in probability if and only if for every subsequence of the $X_n$ there is a further subsequence converging a.s. to $X$.
- Main idea of proof: Consider the event $E_n$ that $X_n$ and $X$ differ by at least $\epsilon$. Do the $E_n$ occur i.o.? Use Borel-Cantelli.
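A standard example behind this theorem (my illustration, not from the slide): independent $X_n \sim \text{Bernoulli}(1/n)$ converge to 0 in probability but not a.s., since $\sum P(X_n = 1)$ diverges. Along the subsequence $n_k = 2^k$ the probabilities become summable, so the first Borel-Cantelli lemma gives a.s. convergence along that subsequence. The two tail sums can be checked numerically:

```python
# Sketch: X_n ~ Bernoulli(1/n), independent. Then X_n -> 0 in probability,
# but sum P(X_n = 1) = sum 1/n diverges, so by the second Borel-Cantelli
# lemma X_n = 1 infinitely often (no a.s. convergence). Along n_k = 2^k,
# sum P(X_{n_k} = 1) = sum 2^{-k} < infinity, so the first lemma applies.
N = 1_000_000
full_sum = sum(1.0 / n for n in range(1, N + 1))   # grows like log N
sub_sum = sum(1.0 / 2**k for k in range(1, 21))    # bounded by 1

print(full_sum, sub_sum)
```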
Pairwise independence example

- Theorem: Suppose $A_1, A_2, \ldots$ are pairwise independent and $\sum P(A_n) = \infty$, and write $S_n = \sum_{i=1}^n 1_{A_i}$. Then the ratio $S_n / E S_n$ tends a.s. to 1.
- Main idea of proof: First, pairwise independence implies that variances add. Conclude (by checking term by term) that $\mathrm{Var}\, S_n \le E S_n$. Then Chebyshev implies
  \[ P(|S_n - E S_n| > \delta E S_n) \le \mathrm{Var}(S_n)/(\delta E S_n)^2 \to 0, \]
  which gives us convergence in probability.
- Second, take a smart subsequence. Let $n_k = \inf\{n : E S_n \ge k^2\}$. Use Borel-Cantelli to get a.s. convergence along this subsequence. Check that convergence along this subsequence deterministically implies the non-subsequential convergence.
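A simulation sketch of the conclusion (fully independent events are a special case of pairwise independent ones; the choice $P(A_n) = 1/n$ is an assumption for the demo): the ratio $S_n / E S_n$ should be near 1 for large $n$.

```python
import random

# Sketch: A_n independent with P(A_n) = 1/n, so sum P(A_n) diverges.
# The theorem predicts S_n / E[S_n] -> 1 a.s.; we look at one large n.
random.seed(1)
n = 200_000
S = sum(random.random() < 1.0 / k for k in range(1, n + 1))
ES = sum(1.0 / k for k in range(1, n + 1))   # harmonic sum, about log n
ratio = S / ES

print(S, ES, ratio)
```

Note the convergence here is slow: $E S_n \approx \log n$, so even $n = 200000$ gives a mean of only about 12 events, and the ratio still fluctuates noticeably.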
General strong law of large numbers

- Theorem (strong law): If $X_1, X_2, \ldots$ are i.i.d. real-valued random variables with expectation $m$ and $A_n := n^{-1} \sum_{i=1}^n X_i$ are the empirical means, then $\lim_{n\to\infty} A_n = m$ almost surely.
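An empirical-mean sketch (illustrative only; the uniform distribution and sample size are assumptions for the demo): averages of i.i.d. Uniform(0,1) samples, which have mean $1/2$, settle near $1/2$.

```python
import random

# Sketch: running empirical mean A_n of i.i.d. Uniform(0,1) samples.
# The strong law says A_n -> 1/2 a.s.; one long run illustrates this.
random.seed(2)
n = 100_000
total = 0.0
for _ in range(n):
    total += random.random()
A_n = total / n

print(A_n)
```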
Proof of strong law assuming $E[X^4] < \infty$

- Assume $K := E[X^4] < \infty$. Not necessary, but it simplifies the proof.
- Note: $\mathrm{Var}[X^2] = E[X^4] - E[X^2]^2 \ge 0$, so $E[X^2]^2 \le K$.
- The strong law holds for i.i.d. copies of $X$ if and only if it holds for i.i.d. copies of $X - \mu$ where $\mu$ is a constant.
- So we may as well assume $E[X] = 0$.
- Key to the proof is to bound the fourth moments of $A_n$.
- $E[A_n^4] = n^{-4} E[S_n^4] = n^{-4} E[(X_1 + X_2 + \ldots + X_n)^4]$.
- Expand $(X_1 + \ldots + X_n)^4$. Five kinds of terms (indices distinct): $X_i X_j X_k X_l$ and $X_i X_j X_k^2$ and $X_i X_j^3$ and $X_i^2 X_j^2$ and $X_i^4$.
- The first three kinds all have expectation zero. There are $\binom{n}{2}$ pairs of the fourth type (each appearing with multinomial coefficient $6$) and $n$ terms of the last type, each with expectation at most $K$. So $E[A_n^4] \le n^{-4}\bigl(6\binom{n}{2} + n\bigr)K$.
- Thus $E\bigl[\sum_{n=1}^\infty A_n^4\bigr] = \sum_{n=1}^\infty E[A_n^4] < \infty$. So $\sum_{n=1}^\infty A_n^4 < \infty$ (and hence $A_n \to 0$) with probability 1.
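The term count can be checked exactly in a special case (my illustration, not from the slide): for i.i.d. Rademacher signs $X_i = \pm 1$ we have $K = E[X^4] = 1$, the cross terms vanish, and only the $X_i^2 X_j^2$ and $X_i^4$ terms survive, so $E[S_n^4] = 6\binom{n}{2} + n = 3n^2 - 2n$ exactly. Brute force over all sign vectors confirms this:

```python
from itertools import product

# Sketch: for X_i = +-1 i.i.d. fair signs, E[S_n^4] should equal
# 6*C(n,2) + n = 3n^2 - 2n exactly (the bound on the slide with K = 1).
# Verify by averaging S^4 over all 2^n sign vectors.
def exact_fourth_moment(n):
    total = sum(sum(signs) ** 4 for signs in product((-1, 1), repeat=n))
    return total / 2**n

results = {n: exact_fourth_moment(n) for n in range(1, 9)}
predicted = {n: 3 * n**2 - 2 * n for n in range(1, 9)}

print(results, predicted)
```

For instance $n = 3$: the eight sign vectors give $S \in \{\pm 3\}$ twice and $S \in \{\pm 1\}$ six times, so $E[S^4] = (2 \cdot 81 + 6 \cdot 1)/8 = 21 = 3 \cdot 9 - 6$.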
General proof of strong law

- Suppose the $X_k$ are i.i.d. with finite mean $\mu$. Let $Y_k = X_k 1_{|X_k| \le k}$ and write $T_n = Y_1 + \ldots + Y_n$. Claim: $X_k = Y_k$ all but finitely often a.s., so it suffices to show $T_n/n \to \mu$. (Borel-Cantelli; the expectation of a positive r.v. is the area between the cdf and the line $y = 1$.)
- Claim: $\sum_{k=1}^\infty \mathrm{Var}(Y_k)/k^2 \le 4 E|X_1| < \infty$. How to prove it?
- Observe: $\mathrm{Var}(Y_k) \le E(Y_k^2) = \int_0^\infty 2y\, P(|Y_k| > y)\,dy \le \int_0^k 2y\, P(|X_1| > y)\,dy$. Use Fubini (interchange sum and integral, since everything is positive):
  \[ \sum_{k=1}^\infty E(Y_k^2)/k^2 \le \sum_{k=1}^\infty k^{-2} \int_0^\infty 1_{(y<k)}\, 2y\, P(|X_1| > y)\,dy = \int_0^\infty \Bigl( \sum_{k=1}^\infty k^{-2} 1_{(y<k)} \Bigr) 2y\, P(|X_1| > y)\,dy. \]
  Since $E|X_1| = \int_0^\infty P(|X_1| > y)\,dy$, complete the proof of the claim by showing that if $y \ge 0$ then $2y \sum_{k>y} k^{-2} \le 4$.
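The final elementary inequality, $2y \sum_{k>y} k^{-2} \le 4$ for $y \ge 0$, can be spot-checked numerically (a sketch only; the cutoff and the grid of $y$ values are assumptions of the demo, with the infinite tail replaced by the integral bound $\sum_{k>K} k^{-2} \le 1/K$):

```python
# Sketch: numerically bound 2y * sum_{k>y} k^{-2} for several y >= 0.
# The sup is approached as y -> 1 from below, where the value is
# 2 * zeta(2) = pi^2/3 ~ 3.29, still under 4.
K = 200_000  # finite cutoff; the remainder sum_{k>K} k^{-2} is <= 1/K

def tail_bound(y):
    """Upper bound on 2y * sum_{k>y} k^{-2}."""
    start = int(y) + 1
    partial = sum(1.0 / k**2 for k in range(start, K + 1))
    return 2 * y * (partial + 1.0 / K)  # add the tail remainder bound

worst = max(tail_bound(y)
            for y in (0.0, 0.5, 0.99, 1.0, 1.99, 2.0, 5.0, 10.0, 100.0))

print(worst)
```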
General proof of strong law

- Claim: $\sum_{k=1}^\infty \mathrm{Var}(Y_k)/k^2 \le 4 E|X_1| < \infty$. How to use it?
- Consider the subsequence $k(n) = [\alpha^n]$ for arbitrary $\alpha > 1$. Using Chebyshev, if $\epsilon > 0$ then
  \[ \sum_{n=1}^\infty P(|T_{k(n)} - E T_{k(n)}| > \epsilon k(n)) \le \epsilon^{-2} \sum_{n=1}^\infty \mathrm{Var}(T_{k(n)})/k(n)^2 = \epsilon^{-2} \sum_{n=1}^\infty k(n)^{-2} \sum_{m=1}^{k(n)} \mathrm{Var}(Y_m) = \epsilon^{-2} \sum_{m=1}^\infty \mathrm{Var}(Y_m) \sum_{n : k(n) \ge m} k(n)^{-2}. \]
- Sum the series: $\sum_{n : \alpha^n \ge m} [\alpha^n]^{-2} \le 4 \sum_{n : \alpha^n \ge m} \alpha^{-2n} \le 4(1 - \alpha^{-2})^{-1} m^{-2}$.
- Combine the computations (observe that the right-hand side below is finite):
  \[ \sum_{n=1}^\infty P(|T_{k(n)} - E T_{k(n)}| > \epsilon k(n)) \le 4(1 - \alpha^{-2})^{-1} \epsilon^{-2} \sum_{m=1}^\infty E(Y_m^2)\, m^{-2}. \]
- Since $\epsilon$ is arbitrary, we get $(T_{k(n)} - E T_{k(n)})/k(n) \to 0$ a.s.
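The series bound in the middle step can also be sanity-checked numerically (a sketch; the grid of $\alpha$ and $m$ values and the truncation at $n \le 200$ are assumptions of the demo, which only makes the finite left-hand side smaller):

```python
import math

# Sketch: check sum over {n : alpha^n >= m} of [alpha^n]^{-2}
# against the bound 4 (1 - alpha^{-2})^{-1} m^{-2}, where [.] is the
# integer part. The key fact is [x] >= x/2 for x >= 1.
def lhs(alpha, m, n_max=200):
    return sum(math.floor(alpha**n) ** -2
               for n in range(1, n_max + 1) if alpha**n >= m)

def rhs(alpha, m):
    return 4.0 / (1.0 - alpha**-2) / m**2

ok = all(lhs(a, m) <= rhs(a, m)
         for a in (1.5, 2.0, 3.0)
         for m in (1, 2, 5, 10, 100))

print(ok)
```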
General proof of strong law

- Conclude by taking $\alpha \to 1$. This finishes the case where $X_1$ is a.s. positive.
- Can extend to the case where $X_1$ is a.s. positive with infinite mean.
- In general, consider $X_1^+$ and $X_1^-$ separately; it is enough if one of them has finite mean.
Kolmogorov zero-one law

- Consider a sequence of random variables $X_n$ on some probability space. Write $\mathcal{F}_n' = \sigma(X_n, X_{n+1}, \ldots)$ and $\mathcal{T} = \cap_n \mathcal{F}_n'$.
- $\mathcal{T}$ is called the tail $\sigma$-algebra. It contains the information you can observe by looking only at stuff arbitrarily far into the future. Intuitively, membership in a tail event doesn't change when finitely many of the $X_n$ are changed.
- The event that the $X_n$ converge to a limit is an example of a tail event. Other examples?
- Theorem: If $X_1, X_2, \ldots$ are independent and $A \in \mathcal{T}$ then $P(A) \in \{0, 1\}$.
Kolmogorov zero-one law proof idea

- Theorem: If $X_1, X_2, \ldots$ are independent and $A \in \mathcal{T}$ then $P(A) \in \{0, 1\}$.
- Main idea of proof: The statement is equivalent to saying that $A$ is independent of itself, i.e., $P(A) = P(A \cap A) = P(A)^2$. How do we prove that?
- Recall the theorem that if the $\mathcal{A}_i$ are independent $\pi$-systems, then the $\sigma(\mathcal{A}_i)$ are independent.
- Deduce that $\sigma(X_1, X_2, \ldots, X_n)$ and $\sigma(X_{n+1}, X_{n+2}, \ldots)$ are independent. Then deduce that $\sigma(X_1, X_2, \ldots)$ and $\mathcal{T}$ are independent, using the fact that $\cup_k \sigma(X_1, \ldots, X_k)$ and $\mathcal{T}$ are $\pi$-systems.
Kolmogorov maximal inequality

- Theorem: Suppose the $X_i$ are independent with mean zero and finite variances, and $S_n = \sum_{i=1}^n X_i$. Then
  \[ P\Bigl( \max_{1 \le k \le n} |S_k| \ge x \Bigr) \le x^{-2} \mathrm{Var}(S_n) = x^{-2} E|S_n|^2. \]
- Main idea of proof: Consider the first time the maximum is exceeded. Bound from below the expected square sum on that event.
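A Monte Carlo sketch of the inequality (illustrative only; the Rademacher steps and the parameters $n$, $x$, and trial count are assumptions of the demo): for $\pm 1$ steps, $\mathrm{Var}(S_n) = n$, so the bound is $n/x^2$, and the simulated frequency of $\max_k |S_k| \ge x$ should fall below it.

```python
import random

# Sketch: random walk with fair +-1 steps (mean 0, Var(S_n) = n).
# Estimate P(max_{k <= n} |S_k| >= x) and compare with the Kolmogorov
# bound Var(S_n)/x^2 = n/x^2.
random.seed(3)
n, x, trials = 100, 25.0, 20_000
hits = 0
for _ in range(trials):
    s, peak = 0, 0
    for _ in range(n):
        s += random.choice((-1, 1))
        peak = max(peak, abs(s))
    if peak >= x:
        hits += 1
estimate = hits / trials
bound = n / x**2   # = 0.16 here

print(estimate, bound)
```

As with Chebyshev itself, the bound is far from tight at this $x$ (the true probability is roughly an order of magnitude smaller), but it holds uniformly over the whole path, not just at the endpoint.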
Kolmogorov three-series theorem

- Theorem: Let $X_1, X_2, \ldots$ be independent and fix $A > 0$. Write $Y_i = X_i 1_{(|X_i| \le A)}$. Then $\sum X_i$ converges a.s. if and only if the following are all true:
  - $\sum_{n=1}^\infty P(|X_n| > A) < \infty$
  - $\sum_{n=1}^\infty E Y_n$ converges
  - $\sum_{n=1}^\infty \mathrm{Var}(Y_n) < \infty$
- Main ideas behind the proof: The Kolmogorov zero-one law implies that $\sum X_i$ converges with probability $p \in \{0, 1\}$. We just have to show that $p = 1$ when all hypotheses are satisfied (sufficiency of the conditions) and $p = 0$ if any one of them fails (necessity).
- To prove sufficiency, apply Borel-Cantelli to see that the probability that $X_n \ne Y_n$ i.o. is zero. Subtract the means from the $Y_n$ to reduce to the case that each $Y_n$ has mean zero. Apply the Kolmogorov maximal inequality.
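A classic application sketch (my example, not from the slide): for independent fair signs $\epsilon_n = \pm 1$, the random series $\sum \epsilon_n n^{-p}$ converges a.s. if and only if $p > 1/2$. Taking $A = 1$, the first two conditions hold trivially ($P(|X_n| > 1) = 0$ and $E Y_n = 0$), so the test reduces to whether $\sum \mathrm{Var}(Y_n) = \sum n^{-2p}$ is finite, i.e. whether $2p > 1$:

```python
# Sketch: the three-series test for X_n = eps_n * n^{-p} with A = 1
# reduces to finiteness of sum n^{-2p}. Compare partial sums for
# p = 0.6 (2p > 1, bounded by zeta(1.2)) and p = 0.4 (2p < 1, divergent).
def var_partial_sum(p, N=100_000):
    return sum(n ** (-2.0 * p) for n in range(1, N + 1))

converging = var_partial_sum(0.6)   # stays below zeta(1.2) ~ 5.6
diverging = var_partial_sum(0.4)    # grows like N^{0.2} / 0.2

print(converging, diverging)
```

A partial sum cannot prove divergence, of course; the point is just that the two regimes the theorem distinguishes are already visible numerically.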