An Introduction to Probability Theory: Outline
1. Definitions: sample space and (measurable) random variables
2. σ-algebras
3. Expectation (integration)
4. Conditional expectation
5. Useful inequalities
6. Independent random variables
7. The central limit theorem
8. Laws of large numbers: Borel-Cantelli lemma
9. Uniform integrability
10. Kolmogorov’s extension theorem for consistent finite-dimensional distributions
© March 7, 2015 George Kesidis
Sample space and events
• Consider a random experiment resulting in an outcome (or “sample”), ω.
• E.g., the experiment is a pair of dice thrown onto a table and the outcome is the exact orientation of the dice and their position on the table when they stop moving.
• The space of all outcomes, Ω, is called the sample space, i.e., ω ∈ Ω.
• An event is merely a subset of Ω, e.g., “the sum of the dots on the upward-facing surfaces of the dice is 7”.
• We say that an event A ⊂ Ω has occurred if the outcome ω of the random experiment belongs to A, i.e., ω ∈ A, so
– events A and B occurred if ω ∈ A ∩ B, and
– events A or B occurred if ω ∈ A ∪ B.
• A sample space Ω is an abstract, unordered set in general.
• Let F be the set of events, i.e., A ∈ F ⇒ A ⊂ Ω.
Probability on a sample space
• A probability measure P maps each event A ⊂ Ω to a real number between zero and one inclusive, i.e., P(A) ∈ [0, 1].
• A probability measure has certain properties:
1. P(Ω) = 1, and
2. P(A) = 1 − P(Aᶜ) for all events A, where Aᶜ = {ω ∈ Ω | ω ∉ A} is the complement of A.
• Moreover, if the events {A_i}_{i=1}^n are disjoint (i.e., A_i ∩ A_j = ∅ for all i ≠ j), then
P(⋃_{i=1}^n A_i) = Σ_{i=1}^n P(A_i),
i.e., P is finitely additive.
• Formally, a probability measure is defined to be countably additive:
3. For any disjoint {A_i}_{i=1}^∞,
P(⋃_{i=1}^∞ A_i) = Σ_{i=1}^∞ P(A_i).
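Finite additivity can be checked concretely on the two-dice sample space from the previous slide; the following Python sketch (the dice model and names are illustrative, not from the slides) verifies it by exact counting with rationals.

```python
import itertools
from fractions import Fraction

# Two fair dice: 36 equally likely ordered outcomes (illustrative model).
OMEGA = list(itertools.product(range(1, 7), repeat=2))

def prob(event):
    # P(A) = |A|/|Omega| under the uniform probability measure.
    return Fraction(len(event), len(OMEGA))

# Disjoint events A_s = {sum of the dice equals s}, s = 2, ..., 12.
events = {s: {w for w in OMEGA if sum(w) == s} for s in range(2, 13)}

# Finite additivity: P(union of the disjoint A_s) = sum of the P(A_s).
union = set().union(*events.values())
assert prob(union) == sum(prob(A) for A in events.values()) == 1
```

Exact `Fraction` arithmetic avoids any floating-point slack in the additivity check.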
Probability measures on σ-algebras
• On large sample spaces Ω (e.g., Ω = ℝ), a formal probability measure may be impossible to construct if all subsets of Ω are defined as events, i.e., if F = 2^Ω (the power set of Ω), cf. Carathéodory’s extension theorem.
• So, the set of events F is restricted to a σ-algebra (or σ-field) of subsets of Ω formally satisfying the following properties:
1. Ω ∈ F (possesses the intersection identity)
2. if A ∈ F then Aᶜ ∈ F (closed under complementation)
3. if A_1, A_2, A_3, ... ∈ F, then ⋂_{n=1}^∞ A_n ∈ F (closed under countable intersections)
• The probability measure P is defined only on the σ-algebra F ⊂ 2^Ω,
P : F → [0, 1].
• We have thus identified a fundamental probability (measure) space: (Ω, F, P).
• Note: Equivalently, by De Morgan’s theorem, one can use ∅ ∈ F (union identity) and closure under countable unions instead of conditions 1 and 3 above.
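For a finite Ω, where countable intersections reduce to finite ones, the three axioms above can be checked mechanically; this Python sketch (the function name and example are hypothetical) does so for the collection {∅, B, Bᶜ, Ω}.

```python
def is_sigma_algebra(F, omega):
    """Check the sigma-algebra axioms for a collection F of frozensets
    over a finite omega (countable intersections reduce to finite ones)."""
    omega = frozenset(omega)
    if omega not in F:                        # axiom 1: contains Omega
        return False
    if any(omega - A not in F for A in F):    # axiom 2: closed under complement
        return False
    return all(A & B in F for A in F for B in F)  # axiom 3: intersections

omega = {1, 2, 3, 4}
B = frozenset({1, 2})
F = {frozenset(), B, frozenset(omega - B), frozenset(omega)}
assert is_sigma_algebra(F, omega)
# Dropping B's complement violates axiom 2:
assert not is_sigma_algebra({frozenset(), B, frozenset(omega)}, omega)
```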
Conditioned events
• The probability that A occurred conditioned on (or “given that”) another event B occurred is P(A|B) := P(A ∩ B)/P(B), where P(B) > 0 is assumed.
• A group of events A_1, A_2, ..., A_n are said to be mutually independent if
P(⋂_{i∈I} A_i) = ∏_{i∈I} P(A_i) ∀I ⊂ {1, 2, ..., n}.
• Note: if events A and B are independent and P(B) > 0, then P(A|B) = P(A), i.e., knowledge that the event B has occurred has no bearing on the probability that the event A has occurred as well.
• Given that B has occurred with P(B) > 0:
– The set of events F_B := {A ∩ B | A ∈ F} is itself a σ-algebra, and
– P(·|B), also a probability measure for (Ω, F) and (B, F_B), addresses the residual uncertainty in the random experiment given that the event B has occurred.
– On (Ω, F), P(A) = 0 ⇒ P(A|B) = 0 ∀A ∈ F, i.e., P(·|B) is absolutely continuous w.r.t. P.
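As an illustration of P(A|B) := P(A ∩ B)/P(B) (the particular dice events chosen here are illustrative), one can verify by exact counting that the events “sum is 7” and “first die shows 3” are independent:

```python
import itertools
from fractions import Fraction

OMEGA = list(itertools.product(range(1, 7), repeat=2))  # two fair dice

def prob(event):
    return Fraction(len(event), len(OMEGA))

def cond_prob(A, B):
    # P(A|B) = P(A & B)/P(B), assuming P(B) > 0.
    return prob(A & B) / prob(B)

A = {w for w in OMEGA if sum(w) == 7}   # "sum of the dots is 7"
B = {w for w in OMEGA if w[0] == 3}     # "first die shows 3"

# Here A and B happen to be independent, so P(A|B) = P(A).
assert cond_prob(A, B) == prob(A) == Fraction(1, 6)
assert prob(A & B) == prob(A) * prob(B)
```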
Random variables
• A random variable X is a real-valued function with domain Ω, X : Ω → ℝ.
• So, X(ω) is a real number representing some feature of the outcome ω.
• E.g., in a dice-throwing experiment, X(ω) could be defined as just the sum of the dots on the upward-facing surfaces of outcome ω (which is the configuration of the dice on the table when they stop moving).
• For random variables, we are typically interested in the probability of the event that X takes values in a contiguous interval B of the real line (including singleton points), or some union of such intervals, i.e.,
P(X ∈ B) := P({ω ∈ Ω | X(ω) ∈ B}) =: P(X⁻¹(B)).
• To ensure that the fundamental probability space (Ω, F, P) is capable of evaluating the probabilities of such events, we formally define random variables as being measurable.
• To explain measurability, we need to first define the Borel σ-algebra of subsets of ℝ that is generated by the contiguous intervals of ℝ.
The Borel σ-algebra on ℝ
• Consider contiguous intervals of the real line, e.g.,
[x, ∞) = {z ∈ ℝ | z ≥ x} or
(x, y] = {z ∈ ℝ | x < z ≤ y}, etc.
• Define σ(A) as the smallest σ-algebra containing all elements of A, i.e., generated by A.
• The Borel σ-algebra is
B := σ({[x, ∞) | x ∈ ℝ}).
• Note that the singleton sets {x} ∈ B ∀x ∈ ℝ and that, e.g.,
B = σ({(−∞, x] | x ∈ ℝ})
= σ({[x, y) | x ≤ y, x, y ∈ ℚ}), etc.
To see the first equality, note that [x, ∞)ᶜ = (−∞, x) and that we can define a monotonically decreasing sequence x_n converging to x so that ⋂_{n=1}^∞ (−∞, x_n) = (−∞, x].
• The Vitali subset of ℝ is not in the Borel σ-algebra, i.e., B ≠ 2^ℝ.
• Indeed, the cardinality of B is only that of ℝ.
Measurability of random variables
• Formally, random variables are defined to be measurable with respect to (Ω, F), i.e.,
X⁻¹(B) ∈ F ∀B ∈ B,
so that P(X ∈ B) is well-defined ∀B ∈ B.
• A random variable X induces a probability measure P_X on (ℝ, B) (the distribution of X),
P_X(B) := P(X ∈ B),
so that (ℝ, B, P_X) is also a probability space.
• Note: If a function g : ℝ → ℝ is (ℝ, B)-measurable (i.e., g⁻¹(B) ∈ B ∀B ∈ B) and X is a random variable, then g(X) is a random variable too.
• Note: The cumulative distribution function (CDF) of X is just F_X(x) := P_X((−∞, x]) = P(X ≤ x).
Measurable compositions of random variables
If Y, X and X_1, X_2, X_3, ... are all extended random variables, then the following are also random variables:
• min{X, Y}, max{X, Y}, XY, and 1{X ≠ 0}/X with the convention 0/0 := 1.
• αX + βY ∀α, β ∈ ℝ.
• sup_{n≥1} X_n, inf_{n≥1} X_n, lim sup_{n→∞} X_n, lim inf_{n→∞} X_n.
σ-algebra generated by a random variable
• Define σ(X) := σ({X⁻¹(B) | B ∈ B}), i.e., the smallest σ-algebra of events for which the random variable X is measurable.
• Note: One can directly show that
σ(X) = σ({X⁻¹([x, ∞)) | x ∈ ℝ}),
i.e., considering only a “generating” subset of B.
• σ(X) captures the “information” gained by the knowledge of X(ω) about the outcomes ω.
• E.g., if X is constant, then σ(X) = {∅, Ω}.
• E.g., if X = 1_B for an event B ∈ F,
– where the indicator function 1_B(ω) = 1 if ω ∈ B and 1_B(ω) = 0 otherwise (one may also write 1{B} := 1_B),
– i.e., X is a Bernoulli distributed random variable,
– then σ(X) = {∅, B, Bᶜ, Ω}.
– If the scalars a ≠ b, then Y := a·1_B + b·1_{Bᶜ} also indicates whether B or Bᶜ has occurred,
– i.e., σ(X) = σ(Y) in this case.
• Doob’s Theorem: If Y is σ(X)-measurable (so that σ(Y) ⊂ σ(X)), then ∃ Borel measurable g such that Y = g(X) a.s.
• If g is one-to-one, then σ(Y) = σ(X).
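On a finite Ω, σ(X) can be computed exactly as the collection of all unions of the level sets X⁻¹({v}); the sketch below (helper names are invented for illustration) confirms σ(X) = {∅, B, Bᶜ, Ω} for an indicator X = 1_B, and that σ(X) = σ(Y) for Y = a·1_B + b·1_{Bᶜ} with a ≠ b.

```python
def sigma_of(X, omega):
    """sigma(X) on a finite omega: all unions of the level sets
    X^{-1}({v}), which are the atoms of the generated sigma-algebra."""
    levels = {}
    for w in omega:
        levels.setdefault(X(w), set()).add(w)
    cells = [frozenset(s) for s in levels.values()]
    algebra = set()
    for bits in range(2 ** len(cells)):       # every union of atoms
        u = frozenset().union(*(c for i, c in enumerate(cells) if bits >> i & 1))
        algebra.add(u)
    return algebra

omega = {1, 2, 3, 4}
B = {1, 2}
X = lambda w: 1 if w in B else 0              # X = indicator of B
Y = lambda w: 5 if w in B else -3             # Y = a*1_B + b*1_{B^c}, a != b
assert sigma_of(X, omega) == sigma_of(Y, omega)   # sigma(X) = sigma(Y)
assert len(sigma_of(X, omega)) == 4               # {0, B, B^c, Omega}
```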
Independent random variables
• The random variables X_1, X_2, ..., X_n are said to be mutually independent (or just “independent”) if and only if
P(⋂_{i=1}^n {X_i ∈ B_i}) = ∏_{i=1}^n P(X_i ∈ B_i) ∀B_1, ..., B_n ∈ B.
• Clearly mutual independence implies that the joint CDF
F_{X_1,X_2,...,X_n}(x_1, x_2, ..., x_n) := P(⋂_{i=1}^n {X_i ≤ x_i})
satisfies
F_{X_1,X_2,...,X_n}(x_1, x_2, ..., x_n) = ∏_{i=1}^n F_{X_i}(x_i) ∀x_1, ..., x_n ∈ ℝ.
• To prove the converse statement, we will now discuss monotone class theorems.
• Note: The marginal F_{X_1}(x_1) = F_{X_1,...,X_n}(x_1, ∞, ∞, ..., ∞).
Monotone class theorems
• C " 2! is a %-class over ! if A,B ! C % A #B ! C.
• C " 2! is a &-class over ! when:
(i) ! ! C;
(ii) if A,B ! C and A " B % B\A := B #Ac ! C; and
(iii) if A1, A2, ... ! C is monotonically increasing(An " An+1 'n), then $*
n=1An ! C.
• Proposition: If C is a &-class over !, then
(a) A ! C % Ac ! C (by (i) and (ii), !\A / Ac ! C).
(b) A,B ! C and A # B = ) (i.e., they’re disjoint), thenA$B ! C (A " Bc % (Bc\A)c / B$A ! C by (ii) and (a)).
(c) if C is also a %-class, then C is a !-algebra.
The proof of (c) is left as an exercise.
• Because of the conditions on (ii) or (b), a &-class seems lessinclusive than a !-algebra.
! March 7, 2015 George Kesidis
12
Dynkin’s theorem
If D is a π-class, C a λ-class and D ⊂ C, then σ(D) ⊂ C.
Proof:
• Define G as the smallest λ-class such that D ⊂ G; thus D ⊂ G ⊂ C.
• We now prove G is also a π-class; the theorem then follows by the previous proposition (c).
• To this end, define H := {A ⊂ Ω | A ∩ D ∈ G ∀D ∈ D}:
– Since D ⊂ G and D is a π-class, D ⊂ H.
– Check that H is a λ-class ⇒ (minimal) G ⊂ H.
– Thus, A ∈ G (⇒ A ∈ H) and D ∈ D ⇒ A ∩ D ∈ G.
• Now define F := {B ⊂ Ω | B ∩ A ∈ G ∀A ∈ G}:
– By the previous step, D ⊂ F.
– Check that F is a λ-class ⇒ (minimal) G ⊂ F.
– Thus, G is a π-class.
Note: So, a λ-class can be much larger than a π-class.
Classical monotone class theorem: if D is an algebra, C a monotone class (contains all limits of its monotone sequences) and D ⊂ C, then σ(D) ⊂ C.
Independence in probability space (Ω, F, P)
• Lemma: If D, C ⊂ F are independent classes of events and D is a π-class, then σ(D) and C are independent.
Proof:
– Take an arbitrary B ∈ C and define D_B = {A ∈ σ(D) | P(A ∩ B) = P(A)P(B)}.
– D ⊂ D_B.
– Check that D_B is a λ-class.
– Apply Dynkin’s theorem.
• Theorem: If the joint CDF F_{X_1,...,X_n} ≡ ∏_{i=1}^n F_{X_i} (i.e., the LHS and RHS are equal at all points in ℝⁿ), then the n random variables X_1, ..., X_n are independent.
Proof:
– Define D_k = {{X_k ≤ x} | −∞ ≤ x ≤ ∞}.
– Note: x = −∞ ⇒ ∅ ∈ D_k.
– Check that D_k is a π-class ∀k.
– Finally, use the lemma to obtain independence of the σ(D_k) = σ(X_k).
Conditional Independence
• Events A and C are said to be independent given B if
P(A | B,C) = P(A | B).
• Note that this implies P(C | B,A) = P(C | B).
• This is a natural extension of the unqualified notion of independent events, i.e., events A and C are (unconditionally) independent if P(A | C) = P(A).
• Similarly, random variables X and Y are conditionally independent given Z if
P(X ∈ A | Z ∈ B, Y ∈ C) = P(X ∈ A | Z ∈ B)
for all Borel A, B, C ⊂ ℝ, cf. Markov processes.
Expectation
• The expectation EX of a random variable X is simply its average or mean value, which can be expressed as the Riemann-Stieltjes integral
EX = ∫_{−∞}^{∞} x dF_X(x);
recall that the CDF F_X is nondecreasing on ℝ.
• In the special case of a differentiable F_X with probability density function (PDF) f_X = F_X′, i.e., X is continuously distributed, we can use the Riemann integral
EX = ∫_{−∞}^{∞} x f_X(x) dx.
• In the case of a discretely distributed random variable X with countable state space R_X,
– F_X′(x) = Σ_{ℓ∈R_X} p_X(ℓ) δ(x − ℓ), where
– δ is the Dirac unit impulse and the probability mass function (PMF) p_X(ℓ) := P(X = ℓ) > 0 for all ℓ ∈ R_X, so that
EX = Σ_{ℓ∈R_X} ℓ·p_X(ℓ).
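For a discretely distributed X, the sum EX = Σ ℓ·p_X(ℓ) can be computed exactly; the following sketch (the two-dice example is illustrative) does so for the sum of two fair dice.

```python
import itertools
from fractions import Fraction

# PMF of the sum of two fair dice, then EX = sum over l of l * p_X(l).
OMEGA = list(itertools.product(range(1, 7), repeat=2))
pmf = {}
for w in OMEGA:
    pmf[sum(w)] = pmf.get(sum(w), Fraction(0)) + Fraction(1, 36)

EX = sum(l * p for l, p in pmf.items())
assert EX == 7                   # the mean, by symmetry about 7
assert sum(pmf.values()) == 1    # the PMF sums to one
```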
Lebesgue integration
• Formally, the Lebesgue integral is used to define expectation:
EX = ∫_Ω X(ω) dP(ω) = ∫_Ω X dP.
• Recall that Ω is generally abstract and unordered.
• If X is simple (discretely distributed with |R_X| = M < ∞),
– define the state space R_X = {ℓ_1, ℓ_2, ..., ℓ_M} and
– the events A_i := {X = ℓ_i} := {ω ∈ Ω | X(ω) = ℓ_i}, which a.s. partition Ω,
– so that the (well-defined) Lebesgue integral is
∫_Ω X(ω) dP(ω) = Σ_{i=1}^M ℓ_i P(A_i),
i.e., P(A_i) = p_X(ℓ_i) for all i.
Lebesgue integration (cont)
• To develop the general Lebesgue integral, we need to consider “extended” random variables X : Ω → ℝ̄, where
– the extended reals ℝ̄ := ℝ ∪ {±∞}, and
– measurability involves Borel sets that include ±∞.
• For any sequence X_1, X_2, X_3, ... of random variables, the following are extended random variables:
– lim_{n→∞} X_n (assuming the limit in n of X_n(ω) exists ∀ω ∈ Ω) and
– sup_{n≥1} X_n.
Approximating random variables
• Proposition: For any non-negative extended random variable X, there is a sequence of simple random variables X_n such that
(a) P(0 ≤ X_n ≤ X_{n+1} ≤ X) = 1 for all n (i.e., monotonicity), and
(b) X_n → X almost surely (a.s.), i.e., almost-sure convergence:
P(lim_{n→∞} X_n = X) = 1.
• Proof:
1. Define the nth partition of ℝ₊ (i.e., of the y-axis, unlike Riemannian integration) into a finite collection of contiguous intervals
{[b_n^k, b_n^{k+1})}_{k=0}^{K_n},
where ∀n: b_n^0 = 0 and b_n^{K_n+1} = ∞ (the last interval includes ∞).
2. ∀k, define X_n(ω) = b_n^k ∀ω ∈ X⁻¹([b_n^k, b_n^{k+1})), i.e., X_n ≤ X a.s.
3. The (n+1)st partition is finer than the nth (i.e., X_n ≤ X_{n+1}), in such a way that lim_{n→∞} K_n ↑ ∞ to achieve (b).
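One standard concrete choice of partitions is the dyadic one, X_n = min(⌊2ⁿX⌋/2ⁿ, n); the sketch below illustrates that special case (not necessarily the exact partition intended on the slide), checked at a single outcome.

```python
import math

def simple_approx(x, n):
    """n-th dyadic simple-function approximation of a value x = X(omega) >= 0:
    partition the y-axis into intervals [k/2^n, (k+1)/2^n) up to level n,
    and map x to the left endpoint of its interval (capped at n)."""
    return min(math.floor(x * 2 ** n) / 2 ** n, n)

x = math.pi  # one fixed outcome's value X(omega)
vals = [simple_approx(x, n) for n in range(1, 30)]
# (a) monotone: X_n <= X_{n+1} <= X;  (b) X_n -> X.
assert all(a <= b <= x for a, b in zip(vals, vals[1:]))
assert abs(vals[-1] - x) < 1e-8
```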
Construction of the Lebesgue integral
• So, for a non-negative extended random variable, the Lebesgue integral is defined as
∫_Ω X dP = lim_{n→∞} ∫_Ω X_n dP,
i.e., EX = lim_{n→∞} EX_n.
• For a signed extended random variable:
1. Note that X = X⁺ − X⁻, where the non-negative extended random variables
X⁺ := max{0, X} and X⁻ := max{0, −X}.
2. If EX⁺ < ∞ or EX⁻ < ∞, then define the Lebesgue integral
EX = EX⁺ − EX⁻;
otherwise the Lebesgue integral EX is not defined. Note: |X| = X⁺ + X⁻.
• E.g., 1_{ℚᶜ} is Lebesgue but not Riemann integrable:
∫_0^1 1_{ℚᶜ}(x) dx = 1·P(ℚᶜ ∩ [0, 1]) + 0·P(ℚ ∩ [0, 1]) = 1,
where here P is Lebesgue measure on Ω = [0, 1], so that P(ℚ ∩ [0, 1]) = 0 as the rationals are countable.
Integration theorems (1 variable integrand)
Consider a sequence X_1, X_2, ... of random variables:
• Bounded convergence theorem: if sup_n |X_n| ≤ K < ∞ a.s. (where K is constant) and X = lim_{n→∞} X_n a.s., then EX = lim_{n→∞} EX_n and E|X| ≤ K.
• Proof: Define A_n = {|X − X_n| > ε} for arbitrary positive ε ≪ 1 and note
|EX_n − EX| ≤ E|X_n − X| = E|X_n − X|1_{A_n} + E|X_n − X|1_{A_nᶜ} ≤ 2K·P(A_n) + ε.
• Now note that
(lim inf_{n→∞} X_n)(ω) := lim_{n→∞} inf_{k≥n} X_k(ω)
always exists (though possibly not finite) since Y_n := inf_{k≥n} X_k is a.s. monotonically nondecreasing in n.
• Fatou’s lemma (for non-negative X_n): lim inf_{n→∞} EX_n ≥ E(lim inf_{n→∞} X_n).
• Proof:
– Let X := lim inf_{n→∞} X_n.
– For any K > 0, invoke the bounded convergence theorem on min{Y_n, K} ↑ min{X, K}.
– Approximating with simple RVs and using monotonicity, lim_{K→∞} E min{X, K} = EX.
Integration theorems (cont)
Let X be the extended RV such that X_n → X a.s.
• Monotone convergence theorem: if the non-negative X_n ↑ X a.s., then
lim_{n→∞} EX_n ↑ EX.
• Lebesgue’s dominated convergence theorem: if there exists a random variable Y such that |X_n| ≤ |Y| a.s. ∀n and E|Y| < ∞, then
lim_{n→∞} E|X − X_n| = 0.
• Corollary (Scheffé):
lim_{n→∞} E|X_n − X| = 0 ⇔ lim_{n→∞} E|X_n| = E|X|.
Conditional expected value and event-conditional distributions
Consider a random variable X and an event A such that P(A) > 0.
• The conditional expected value of X given A, denoted µ(X|A), is
µ(X|A) = ∫_{−∞}^{∞} x dF_{X|A}(x), where F_{X|A}(z) := P(X ≤ z | A).
• For a discretely distributed random variable X with R_X = {a_j}_{j=1}^∞,
µ(X|A) = Σ_{j=1}^∞ a_j P(X = a_j | A).
• The conditional PMF of X given A is p_{X|A}(a_j) = P(X = a_j | A) for all j.
• The event-conditional PDF of a continuously distributed X is
f_{X|A}(x) := (d/dx) F_{X|A}(x) ⇒ µ(X|A) = ∫_{−∞}^{∞} x f_{X|A}(x) dx.
Conditional expectation
• Consider now two discretely distributed random variables X and Y.
• The conditional expectation of X given the random variable Y, denoted E(X|Y), is a random variable itself.
• Indeed, suppose {b_j}_{j=1}^∞ = R_Y and, for all samples
ω_j ∈ {ω ∈ Ω | Y(ω) = b_j} =: B_j,
define
E(X|Y)(ω_j) := µ(X|B_j) := µ(X|Y = b_j).
• That is, E(X|Y) maps all samples in the event B_j to the conditional expected value µ(X|B_j), i.e., E(X|Y) is “smoother” (less uncertain) than X.
• Therefore, the random variable E(X|Y) is a.s. a function of Y, i.e., E(X|Y) is σ(Y)-measurable.
• So, E(X|Y) = E(X|Z) a.s. whenever σ(Z) = σ(Y), allowing for differences involving P-null events.
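On a finite sample space, E(X|Y) can be built exactly as described above, as a function constant on each event {Y = b}; this sketch (a hypothetical two-dice model with X the sum and Y the first die) also checks the averaging property E(E(X|Y)) = EX.

```python
import itertools
from fractions import Fraction

# Finite model: two fair dice; X = sum of dots, Y = first die.
OMEGA = list(itertools.product(range(1, 7), repeat=2))
P = Fraction(1, 36)
X = lambda w: sum(w)
Y = lambda w: w[0]

def cond_exp(X, Y):
    """E(X|Y) as a function on Omega: constant on each event {Y = b}."""
    mu = {}
    for b in {Y(w) for w in OMEGA}:
        B = [w for w in OMEGA if Y(w) == b]
        mu[b] = Fraction(sum(X(w) for w in B), len(B))  # uniform measure
    return lambda w: mu[Y(w)]

EXgY = cond_exp(X, Y)
# E(X|Y)(w) = w[0] + 7/2, and E(E(X|Y)) = EX = 7.
assert EXgY((3, 6)) == Fraction(13, 2)
assert sum(EXgY(w) * P for w in OMEGA) == sum(X(w) * P for w in OMEGA) == 7
```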
Conditional densities
• Now consider two random variables X and Y which are continuously distributed with joint PDF
f_{X,Y} = ∂²F_{X,Y}/(∂x ∂y).
• For f_Y(y) > 0, we can define the conditional density:
f_{X|Y}(x|y) := f_{X,Y}(x, y)/f_Y(y)
for all x ∈ ℝ.
• Note that f_{X|Y}(·|y) is itself a PDF and
µ(X|Y = y) = ∫_{−∞}^{∞} x f_{X|Y}(x|y) dx, even though P(Y = y) = 0.
Conditional expectation and MSE
• In general, E(X|Y) is the function of Y which minimizes the mean-square error (MSE),
E[(X & h(Y ))2],
among all (measurable) functions h.
• So, E(X|Y ) is the best approximation of X given Y .
• In particular, E(X|Y ) and X have the same expectation,
E(E(X|Y )) = EX.
• Note: if X and Y are independent,
E(X|Y ) = EX a.s.
Some useful inequalities
• If the events A_1 ⊂ A_2, then P(A_1) ≤ P(A_2) = P(A_1) + P(A_2\A_1).
• For any group of events A_1, A_2, ..., A_n, Boole’s inequality holds:
P(⋃_{i=1}^n A_i) ≤ Σ_{i=1}^n P(A_i).
• Note that when the A_i are disjoint, equality holds simply by the additivity property of a probability measure P; recall the inclusion-exclusion set identities.
• If two random variables X and Y are such that X ≥ Y a.s., then EX ≥ EY.
• Recall Fatou’s lemma.
Markov’s Inequality
• Consider a random variable X with E|X| < ∞ and a real number x > 0.
• Since |X| ≥ |X|·1{|X| ≥ x} ≥ x·1{|X| ≥ x} a.s., we arrive at Markov’s inequality:
E|X| ≥ E[x·1{|X| ≥ x}] = x·E1{|X| ≥ x} = x·P(|X| ≥ x).
• An alternative explanation for continuously distributed random variables X (with PDF f) is
E|X| = ∫_{−∞}^{∞} |z| f(z) dz
≥ ∫_{−∞}^{−x} (−z) f(z) dz + ∫_x^{∞} z f(z) dz
≥ ∫_{−∞}^{−x} x f(z) dz + ∫_x^{∞} x f(z) dz
= x·P(|X| ≥ x).
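Markov’s inequality can be sanity-checked on simulated data; in fact E|X| ≥ x·P(|X| ≥ x) holds deterministically for the empirical distribution of any sample, as the sketch below shows (the exponential example and parameters are illustrative).

```python
import random

random.seed(1)
# Empirical check of Markov's inequality, E|X| >= x P(|X| >= x),
# for X exponential with mean 1 (so E|X| = 1) and x = 2.
N = 100_000
samples = [random.expovariate(1.0) for _ in range(N)]
x = 2.0
emp_mean = sum(samples) / N
emp_tail = sum(1 for s in samples if s >= x) / N
assert emp_mean >= x * emp_tail   # Markov holds on the sample itself
# Exact tail P(X >= 2) = e^{-2} ~ 0.135, well under the bound E|X|/x = 0.5.
assert emp_tail < 0.5
```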
Chebyshev and Cramér’s Inequalities
• Take x = ε², where ε > 0, and argue Markov’s inequality with (X − EX)² in place of |X| to get Chebyshev’s inequality:
var(X) := E[(X − EX)²] ≥ ε² P(|X − EX| ≥ ε),
i.e.,
P(|X − EX| ≥ ε) ≤ ε⁻² var(X).
• Noting that, for all θ > 0, {X ≥ x} = {e^{θX} ≥ e^{θx}} and arguing as for Markov’s inequality gives the Chernoff (or Cramér) inequality:
Ee^{θX} ≥ e^{θx} P(X ≥ x)
⇒ P(X ≥ x) ≤ exp(−[xθ − log Ee^{θX}]) ∀θ > 0
⇒ P(X ≥ x) ≤ exp(−max_{θ>0} [xθ − log Ee^{θX}]),
where we have simply sharpened the inequality by taking the maximum over the free parameter θ > 0.
• Note the Legendre transform of the log moment-generating function of X in the Chernoff bound.
Inequalities of Minkowski, Hölder, and Cauchy-Schwarz-Bunyakovsky
• Minkowski’s inequality: if E|X|^q, E|Y|^q < ∞ for q ≥ 1, then
(E|X + Y|^q)^{1/q} ≤ (E|X|^q)^{1/q} + (E|Y|^q)^{1/q},
i.e., the triangle inequality in the L^q space of random variables.
• Hölder’s inequality: if E|X|^r, E|Y|^q < ∞ for r > 1 and q⁻¹ := 1 − r⁻¹, then
E|XY| ≤ (E|X|^r)^{1/r} (E|Y|^q)^{1/q}.
• CBS inequality (r = q = 2): if EX², EY² < ∞, then
E|XY| ≤ √(E(X²)) √(E(Y²)).
• The CBS inequality is strict unless X = cY a.s. for some constant c or Y = 0 a.s.
• CBS is an immediate consequence of the fact that, whenever X ≠ 0 a.s. and Y ≠ 0 a.s.,
E[(X/√(E(X²)) − Y/√(E(Y²)))²] ≥ 0.
Jensen’s Inequality
• Note that if we take Y = 1 a.s., the Cauchy-Schwarz-Bunyakovsky inequality simply states that var(X) ≥ 0, i.e.,
E(X²) − (EX)² ≥ 0.
• This is also an immediate consequence of Jensen’s inequality.
• A real-valued function g on ℝ is said to be convex if
g(px + (1 − p)y) ≤ p·g(x) + (1 − p)·g(y)
for any x, y ∈ ℝ and any real fraction p ∈ [0, 1].
• If the inequality is reversed, g is said to be concave.
• For any convex function g and random variable X, we have Jensen’s inequality:
g(EX) ≤ E(g(X)).
Inequalities: Conditioned versions
• These inequalities and integration theorems have straightforward “conditional” extensions.
• E.g., the conditional Jensen’s inequality: if g is convex and G ⊂ F is a σ-algebra,
E(g(X) | G) ≥ g(E(X | G)).
• Applying conditional Jensen’s with g(x) = |x|^q for real q ≥ 1, and using the linearity of conditional expectation, we get the following result.
• If X and X_1, X_2, X_3, ... are random variables such that, for real q ≥ 1, E|X|^q < ∞ and E|X_n|^q < ∞ (i.e., X, X_n ∈ L^q) ∀n, then:
(a) ‖E(X | G)‖_q ≤ ‖X‖_q := (E|X|^q)^{1/q}, and, therefore,
(b) if lim_{n→∞} ‖X − X_n‖_q = 0, i.e., convergence in L^q, then
lim_{n→∞} ‖E(X | G) − E(X_n | G)‖_q = 0.
Sums of independent random variables
• Consider two independent random variables X_1 and X_2 with PDFs f_1 and f_2 respectively; so, f_{X_1,X_2} = f_1 f_2.
• The CDF of the sum is
F(z) = P(X_1 + X_2 ≤ z) = ∫_{−∞}^{∞} ∫_{−∞}^{z−x_1} f_1(x_1) f_2(x_2) dx_2 dx_1.
• Exchanging the first integral on the RHS with a derivative w.r.t. z gives the PDF of X_1 + X_2:
f(z) = (d/dz) F(z) = ∫_{−∞}^{∞} f_1(x_1) f_2(z − x_1) dx_1 for all z ∈ ℝ.
• Thus, f is the convolution of f_1 and f_2, which is denoted f = f_1 ∗ f_2.
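The discrete analogue of f = f_1 ∗ f_2 replaces the integral by a sum over PMFs; this illustrative sketch (names are hypothetical) convolves two fair-die PMFs to recover the PMF of the sum.

```python
from fractions import Fraction

# Discrete analogue of the convolution f = f1 * f2:
# PMF of X1 + X2 for two independent fair dice.
die = {k: Fraction(1, 6) for k in range(1, 7)}

def convolve(p1, p2):
    out = {}
    for a, pa in p1.items():
        for b, pb in p2.items():
            out[a + b] = out.get(a + b, Fraction(0)) + pa * pb
    return out

psum = convolve(die, die)
assert psum[7] == Fraction(1, 6)                 # 6 of the 36 outcomes
assert psum[2] == psum[12] == Fraction(1, 36)
assert sum(psum.values()) == 1
```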
Sums of independent random variables
• In this context, moment generating functions can be used to simplify calculations.
• Let the MGF (bilateral Laplace transform) of X_i be
m_i(θ) = Ee^{θX_i} = ∫_{−∞}^{∞} f_i(x) e^{θx} dx.
• The MGF of X_1 + X_2 is, by independence,
m(θ) = Ee^{θ(X_1+X_2)} = Ee^{θX_1} Ee^{θX_2} = m_1(θ) m_2(θ).
• So, convolution of PDFs corresponds to simple multiplication of MGFs (and to addition of independent random variables).
Example: exponential and gamma distributions
Consider independent random variables that are all exponentially distributed with mean 1/λ.
• The PDF of X_1 + X_2 is f, where f(z) = 0 for z < 0 and, for z ≥ 0,
f(z) = ∫_0^z f_1(x_1) f_2(z − x_1) dx_1 = λ² z e^{−λz},
i.e., the (n, λ) gamma distribution with n = 2 (a.k.a. the Erlang distribution when n ∈ ℤ₊).
• So, the MGF of X_1 + X_2 is
m(θ) = (λ/(λ − θ))²,
which is consistent with the PDF just computed.
• There is a 1-to-1 relationship between PDFs and MGFs of nonnegative random variables (unilateral Laplace transform).
• So, for a sum of n such random variables,
m(θ) = (λ/(λ − θ))ⁿ ⇔ f_n(z) = λⁿ z^{n−1} e^{−λz}/(n − 1)! ∀z ≥ 0.
• Note: The construction of continuous-time Markov chains is based on the memoryless property that is unique to the exponential distribution.
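The Erlang result can be checked by simulation: the empirical mean and variance of sums of n i.i.d. exponential(λ) samples should approach the (n, λ)-gamma moments n/λ and n/λ². The parameters and tolerances below are illustrative.

```python
import random

random.seed(7)
# Sum of n i.i.d. exponential(lam) RVs is (n, lam)-gamma (Erlang):
# mean n/lam and variance n/lam^2.
lam, n, trials = 2.0, 3, 200_000
sums = [sum(random.expovariate(lam) for _ in range(n)) for _ in range(trials)]
m = sum(sums) / trials
v = sum((s - m) ** 2 for s in sums) / trials
assert abs(m - n / lam) < 0.02        # mean ~ 1.5
assert abs(v - n / lam ** 2) < 0.02   # variance ~ 0.75
```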
The Gaussian distribution
Assume X_i is Gaussian (normally) distributed with mean µ_i and variance σ_i², i.e., X_i ∼ N(µ_i, σ_i²).
• If the X_i are independent RVs, the MGF of X_1 + X_2 is
m(θ) = exp(µ_1θ + ½σ_1²θ²) × exp(µ_2θ + ½σ_2²θ²)
= exp((µ_1 + µ_2)θ + ½(σ_1² + σ_2²)θ²),
which we also recognize as a Gaussian MGF.
• Even if dependent (but jointly Gaussian), α_1X_1 + α_2X_2, for scalars α_i, is Gaussian distributed with mean α_1µ_1 + α_2µ_2 and variance α_1²σ_1² + α_2²σ_2² + 2α_1α_2 cov(X_1, X_2), where the covariance cov(X_1, X_2) := EX_1X_2 − EX_1EX_2.
• X = (X_1, X_2, ..., X_n) are jointly Gaussian if
f_X(x) = [(2π)ⁿ det(C)]^{−1/2} exp(−½ (x − EX)ᵀ C⁻¹ (x − EX)),
where the (symmetric) covariance matrix is C = E(X − EX)(X − EX)ᵀ.
• Note: E(X_1 | X_2) = EX_1 + (X_2 − EX_2) cov(X_1, X_2)/var(X_2) is a.s. linear in X_2, and
• if X_1, X_2 are jointly Gaussian and uncorrelated (diagonal covariance matrix), then X_1, X_2 are independent (the converse is always true).
de Moivre’s formula
There is a constant κ > 0 such that n! eⁿ ∼ κ nⁿ √n.
Proof:
• Define B(n) = n! eⁿ/(nⁿ√n).
• log B(n) = 1 + Σ_{j=2}^n [log B(j) − log B(j−1)], where 1 = log B(1).
• By Taylor’s theorem,
log(1 − x) = −x − x²/2 − x³/3 + o(x³), where lim_{y→0} o(y)/y = 0.
• So,
log B(j) − log B(j−1) = 1 + (j − ½) log(1 − 1/j) = −1/(12j²) + o(1/j²).
• Since j⁻² is summable, log B(n) converges, i.e., B(n) converges to a finite κ.
de Moivre-Laplace Central Limit Theorem (CLT)
• Consider a sequence of independent and identically distributed (i.i.d.) Bernoulli random variables X_1, X_2, ..., where
p := P(X_i = 1) and q := 1 − p = P(X_i = 0).
• Define the sum S_n = X_1 + X_2 + ... + X_n.
• S_n is binomially distributed with parameters (n, p), i.e.,
P(S_n = k) = (n choose k) p^k q^{n−k}
for all k ∈ {0, 1, 2, ..., n}.
• ES_n = np and variance var(S_n) = ES_n² − (ES_n)² = npq.
• Thus Y_n := (S_n − np)/√(npq) is centered (EY_n = 0) and has unit variance var(Y_n) = 1 for all n.
• Theorem (de Moivre-Laplace CLT): If the X_i are i.i.d. Bernoulli random variables, then Y_n defined above converges in distribution to a standard normal (Gaussian), i.e.,
lim_{n→∞} P(Y_n > y) = Φ̄(y) := ∫_y^∞ (1/√(2π)) e^{−x²/2} dx.
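A quick empirical check of the theorem (the parameters are illustrative): for moderately large n, the tail P(Y_n > 1) should be close to Φ̄(1) ≈ 0.159.

```python
import random
from statistics import NormalDist

random.seed(3)
# Y_n = (S_n - np)/sqrt(npq) for Bernoulli(p) sums should be ~ N(0,1):
# compare the empirical tail P(Y_n > 1) with Phi-bar(1).
p, n, trials = 0.3, 400, 20_000
q = 1 - p
tail = 0
for _ in range(trials):
    s = sum(1 for _ in range(n) if random.random() < p)   # S_n ~ Bin(n, p)
    y = (s - n * p) / (n * p * q) ** 0.5
    tail += y > 1
phi_bar_1 = 1 - NormalDist().cdf(1)   # upper standard-normal tail at 1
assert abs(tail / trials - phi_bar_1) < 0.03
```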
de Moivre-Laplace CLT: Proof
P(a < Y_n := (S_n − np)/√(npq) ≤ b)
= Σ_{np+a√(npq) < k ≤ np+b√(npq)} (n choose k) p^k q^{n−k}
= Σ_{a√(npq) < k′ ≤ b√(npq)} (n choose k′+np) p^{k′+np} q^{nq−k′},
where the sums are over integers k, k′. Using de Moivre’s formula to uniformly approximate (n choose k′+np) over k′ as n → ∞:
P(a < Y_n ≤ b)
∼ (1/(κ√(npq))) Σ_{a√(npq) < k′ ≤ b√(npq)} (1 + k′/(np))^{−k′−np} (1 − k′/(nq))^{−nq+k′}
∼ (1/(κ√(npq))) Σ_{a√(npq) < k′ ≤ b√(npq)} exp(−(k′)²/(2npq))
→_{n→∞} ∫_a^b (e^{−x²/2}/κ) dx
= (√(2π)/κ)(Φ̄(a) − Φ̄(b)),
where the second step uses log(1 − x) = −x − x²/2 + o(x²) and the third is a Riemann integral (with grid spacing 1/√(npq)).
Taking −a, b → ∞ gives Wallis’ identity: κ = √(2π).
Stirling’s formula
• de Moivre’s formula with Wallis’ identity gives Stirling’s formula:
n! eⁿ ∼ nⁿ √(2πn).
• In the following, we prove a more general sequential CLT.
An i.i.d. CLT
Theorem: If X_1, X_2, ... are i.i.d. with E|X_1| < ∞ and 0 < σ² := var(X_1) < ∞, then
Y_n := (S_n − nµ)/(σ√n) →_d N(0, 1), where µ := EX_1,
i.e., Y_n converges in distribution to a standard normal.
Proof:
• Taylor’s theorem: e^{ix} = 1 + ix − ½x² + R(x), where |R(x)| ≤ |x|³.
• For |x| > 4, |R(x)| ≤ |e^{ix}| + 1 + |x| + ½x² ≤ x²,
⇒ |R(x)| ≤ min{|x|³, x²}.
• So, if X ∼ (X_1 − EX_1)/σ, then the characteristic function of Y_n is Ee^{itY_n} = (Ee^{itX/√n})ⁿ
⇒ Ee^{itY_n} = [1 + iE(tX/√n) − E(tX)²/(2n) + ER(tX/√n)]ⁿ
= [1 − t²/(2n) + ER(tX/√n)]ⁿ,
where, by the dominated convergence theorem,
|ER(tX/√n)| ≤ n⁻¹ E min{|tX|³/√n, (tX)²} = n⁻¹ o(1).
⇒ lim_{n→∞} Ee^{itY_n} = lim_{n→∞} (1 − t²/(2n) + o(1)/n)ⁿ = e^{−t²/2}.
Trotter’s proof of the i.i.d. CLT: Preliminaries
• Let C be the set of bounded uniformly continuous functions on ℝ, i.e., if f ∈ C then ∀ε > 0, ∃δ > 0 such that: ∀x, y ∈ ℝ, |x − y| < δ ⇒ |f(x) − f(y)| < ε.
• A transformation (function, operator) T : C → C is said to be linear if T(af + bg) = aTf + bTg ∀f, g ∈ C and ∀a, b ∈ ℝ. Note that ∀x ∈ ℝ, (aTf + bTg)(x) := a(Tf)(x) + b(Tg)(x).
• Define the supremum norm ‖f‖ := sup_{x∈ℝ} |f(x)|.
• T is said to be a contraction operator if ‖Tf‖ ≤ ‖f‖ ∀f ∈ C.
• For a random variable X, define T_X : C → C as
(T_X f)(y) := Ef(X + y) = ∫_{−∞}^{∞} f(x + y) dF_X(x), y ∈ ℝ.
• Note: f ∈ C ⇒ T_X f ∈ C, T_X is a linear contraction, and (T_X f)(0) = Ef(X).
• Note: T_{X_1}T_{X_2} = T_{X_2}T_{X_1} (commutation); furthermore, if X_1, X_2 are independent then (as with characteristic functions)
T_{X_1+X_2} = T_{X_1}T_{X_2} = T_{X_2}T_{X_1},
cf. Fubini’s theorem.
• Define C² = {f ∈ C | f′, f″ ∈ C}.
Trotter’s proof of the i.i.d. CLT: Preliminaries (cont)
Lemma 1: If lim_{n→∞} Ef(X_n) = Ef(X) ∀f ∈ C², then X_1, X_2, ... converges in distribution to X.
Note: The hypothesis is satisfied if ‖T_{X_n}f − T_Xf‖ → 0.
Proof of Lemma 1:
• Consider any y at which F_X is continuous.
• Fix ε > 0 arbitrarily and take δ > 0 small enough so that F_X(y + δ) − F_X(y − δ) < ε.
• Define f, g ∈ C² such that
(i) f(x) = 1 for x ≤ y − δ,
(ii) g(x) = 1 for x ≤ y,
(iii) f(x) = 0 for x ≥ y, and
(iv) g(x) = 0 for x ≥ y + δ;
so that 0 ≤ f ≤ g ≤ 1 in particular.
• So, since 1{X ≤ y − δ} ≤ f(X) ≤ 1{X ≤ y} etc.,
F_X(y − δ) ≤ Ef(X) = lim_{n→∞} Ef(X_n) ≤ lim inf_{n→∞} F_{X_n}(y)
≤ lim sup_{n→∞} F_{X_n}(y) ≤ lim_{n→∞} Eg(X_n) = Eg(X) ≤ F_X(y + δ),
where the equalities are by hypothesis.
• Since this holds ∀ε > 0, lim_{n→∞} F_{X_n}(y) = F_X(y).
Trotter’s proof of the i.i.d. CLT: Preliminaries (cont)
Lemma 2: If A, B : C → C are linear contraction operators that commute, then ‖Aⁿf − Bⁿf‖ ≤ n‖Af − Bf‖ ∀n ∈ ℤ₊, f ∈ C.
Proof:
• Factor
Aⁿf − Bⁿf = Σ_{i=0}^{n−1} A^{n−i−1}(A − B)Bⁱf = Σ_{i=0}^{n−1} A^{n−i−1}Bⁱ(A − B)f,
where the second equality is by commutativity.
• Now take the norm of both sides, use the triangle inequality, and finally repeatedly use the contraction hypotheses.
Trotter’s proof of the i.i.d. CLT: Preliminaries (cont)
Lemma 3: If EX = 0 and EX² = 1, then ∀f ∈ C² and ∀ε > 0: ∃N < ∞ such that ‖T_{n^{−1/2}X}f − f − (1/(2n))f″‖ ≤ ε/n ∀n ≥ N.
Proof:
• Fix y ∈ ℝ.
• By Taylor’s theorem, ∃z(x) ∈ [y, y + x] such that
f(y + x) = f(y) + xf′(y) + ½x²f″(y) + ½x²[f″(z(x)) − f″(y)].
• By uniform continuity of f″ (f ∈ C²), ∀ε > 0, ∃δ > 0 such that |z(x) − y| < δ ⇒ |f″(z(x)) − f″(y)| < ε. Thus,
(T_{n^{−1/2}X}f)(y)
= ∫ f(y + n^{−1/2}x) dF_X(x)
= f(y) ∫ dF_X(x) + (1/√n) f′(y) ∫ x dF_X(x) + (1/(2n)) f″(y) ∫ x² dF_X(x)
+ (1/(2n)) ∫ [f″(z(n^{−1/2}x)) − f″(y)] x² dF_X(x)
= f(y) + (1/(2n)) f″(y)
+ (1/(2n)) (∫_{|x|<δ√n} + ∫_{|x|≥δ√n}) [f″(z(n^{−1/2}x)) − f″(y)] x² dF_X(x),
using EX = 0 and EX² = 1.
Trotter’s proof of the i.i.d. CLT: Lemma 3’s proof (cont)
• Now, |x| < δ√n ⇒ |z(n^{−1/2}x) − y| ≤ |n^{−1/2}x| < δ.
• Thus,
|(1/(2n)) ∫_{|x|<δ√n} [f″(z(n^{−1/2}x)) − f″(y)] x² dF_X(x)|
≤ (1/(2n)) ∫_{|x|<δ√n} ε x² dF_X(x) ≤ ε/(2n).
• Since |f″(z) − f″(y)| ≤ 2‖f″‖ < ∞ and EX² < ∞,
(1/(2n)) |∫_{|x|≥δ√n} [f″(z(n^{−1/2}x)) − f″(y)] x² dF_X(x)|
≤ (1/n) ‖f″‖ ∫_{|x|≥δ√n} x² dF_X(x) ≤ ε/(2n) ∀ sufficiently large n.
• Finally, substitute the last two estimates into the expression for (T_{n^{−1/2}X}f)(y) of the previous slide.
Trotter’s proof of the i.i.d. CLT
Theorem: If X_1, X_2, ... are i.i.d. with E|X_1| < ∞ and 0 < EX_1² < ∞, then (S_n − nµ)/√n := (X_1 + ... + X_n − nµ)/√n →_d N(0, σ²), where µ := EX_1 and σ² := var(X_1).
Proof:
• Let Y ∼ N(0, σ²).
• By Lemma 1, the theorem follows if lim_{n→∞} ‖T_{(S_n−nµ)/√n}f − T_Yf‖ = 0.
• W.l.o.g., µ = 0 and σ² = 1, i.e., Y ∼ N(0, 1), if we restate the theorem in terms of (S_n − nµ)/(σ√n).
• Since Y ∼ N(0, 1), T_Y = Tⁿ_{n^{−1/2}Y} (a sum of n independent N(0, 1/n) random variables is N(0, 1)) and, by integration by parts (cf. Lemma 3 applied to the N(0, 1) distribution),
‖T_{n^{−1/2}Y}f − f − (1/(2n))f″‖ ≤ ε/n for all sufficiently large n.
• Since the X_i are i.i.d., T_{n^{−1/2}S_n} = Tⁿ_{n^{−1/2}X_1}.
• By Lemma 2,
‖T_{n^{−1/2}S_n}f − T_Yf‖ = ‖Tⁿ_{n^{−1/2}X_1}f − Tⁿ_{n^{−1/2}Y}f‖ ≤ n‖T_{n^{−1/2}X_1}f − T_{n^{−1/2}Y}f‖.
• Applying Lemma 3 (to both X_1 and Y), we get
‖T_{n^{−1/2}X_1}f − T_{n^{−1/2}Y}f‖ ≤ 2ε/n
for all sufficiently large n, so that ‖T_{n^{−1/2}S_n}f − T_Yf‖ ≤ 2ε.
Lindeberg’s CLT for independent random variables
• Consider an independent sequence of random variables with EX_i = 0 ∀i w.l.o.g.
• Let σ_i² = EX_i²; i.e., the X_i are not necessarily identically distributed.
• The CLT can be generalized to a sequence of random variables assuming only their mutual independence, under Lindeberg’s condition:
lim_{n→∞} s_n⁻² Σ_{i=1}^n ∫_{|x|≥δs_n} x² dF_{X_i}(x) = 0 ∀δ > 0,
where s_n := √(Σ_{i=1}^n σ_i²) (recall the last inequality of the proof of Lemma 3).
• A simple proof of Lindeberg’s CLT follows that of Trotter’s for the i.i.d. case [Trotter ’59].
• Feller proved that Lindeberg’s condition is necessary.
Modes of convergence
• A CLT involves convergence in distribution.
• This is the weakest class of convergence results, in which no limiting random variable need exist.
• F_{Y_n}(y) → F_Y(y) for all y that are points of continuity of F_Y means that Y_n → Y in distribution.
• In the following, we will see that convergence: in distribution ⇐ in probability ⇐ (a.s. or in L²).
Weak law of large numbers (WLLN): assumptions
• Assume the random variables X_1, X_2, X_3, ... are i.i.d.
• Also suppose that the common distribution has finite variance, i.e., σ² := var(X) := E(X − EX)² < ∞, where X ∼ X_i ∀i.
• Finally, suppose that the mean exists and is finite, i.e., µ := EX < ∞.
• Recall the sum S_n := X_1 + X_2 + ··· + X_n for n ≥ 1, ES_n = nµ and var(S_n) = nσ².
• The quantity S_n/n is called the empirical mean of X after n samples and is an unbiased estimate of µ, i.e.,
E(S_n/n) = µ.
A WLLN: Statement and Proof
Theorem: If X_1, X_2, ... are i.i.d. with var(X_1) < ∞, then
lim_{n→∞} P(|S_n/n − µ| ≥ ε) = 0 ∀ε > 0.
• By Chebyshev’s inequality,
P(|S_n/n − µ| ≥ ε) ≤ var(S_n/n)/ε² = σ²/(nε²).
• Note: So, S_n/n is said to be a weakly consistent estimator of µ.
• Consequently, L² convergence, E(Y_n − Y)² → 0, implies weak convergence (convergence in probability), P(|Y_n − Y| > ε) → 0 ∀ε > 0.
• Example: Discretely distributed Y_n → 0 in probability but not in L² when:
P(Y_n = −n) = P(Y_n = n) = p_n = (1 − P(Y_n = 0))/2
such that p_n → 0 as n → ∞ but n²p_n ↛ 0, e.g., p_n = 1/n for n > 1, so that EY_n² = 2n ↛ 0.
• Example: Y_n → c (a constant) in probability ⇔ Y_n → c in distribution.
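The Chebyshev bound used in the proof can be checked by simulation (the uniform(0,1) example and tolerances are illustrative): the empirical exceedance frequency should sit below σ²/(nε²).

```python
import random

random.seed(5)
# Chebyshev bound from the WLLN proof: P(|S_n/n - mu| >= eps) <= sigma^2/(n eps^2),
# checked empirically for uniform(0,1) samples (mu = 1/2, sigma^2 = 1/12).
n, eps, trials = 500, 0.05, 5_000
bound = (1 / 12) / (n * eps ** 2)   # ~ 0.067
bad = 0
for _ in range(trials):
    mean_n = sum(random.random() for _ in range(n)) / n
    bad += abs(mean_n - 0.5) >= eps
assert bad / trials <= bound
```

The true exceedance probability here is far smaller than the bound, which illustrates how loose Chebyshev can be.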
Strong Law of Large Numbers (SLLN)
• Again, a sequence of random variables X_1, X_2, ... is said to converge almost surely (a.s.) to a random variable X if
P(lim_{n→∞} X_n ≠ X) = 0.
• Kolmogorov's strong LLN: if X_1, X_2, ... are i.i.d. and E|X_1| < ∞, then
P(lim_{n→∞} S_n/n = µ) = 1, i.e., S_n/n → µ := EX_1 a.s.
• Formally, the limit states that, a.s., ∀r ∈ Z+ ∃n such that ∀k ≥ n: |S_k/k − µ| < 1/r; i.e.,
P(⋂_{r=1}^∞ ⋃_{n=1}^∞ ⋂_{k=n}^∞ {|S_k/k − µ| < 1/r}) = 1.
• But P(⋂_{r=1}^∞ B_r) = 1 ⇔ P(B_r) = 1 ∀r, so S_n/n → µ a.s. if and only if
P(⋃_{n=1}^∞ ⋂_{k=n}^∞ {|S_k/k − µ| < 1/r}) = 1 ∀r.
SLLN statement (cont)
• Now fix r ∈ Z+ arbitrarily and let
A_k^c := {|S_k/k − µ| < r⁻¹}.
• The event
⋃_{n=1}^∞ ⋂_{k=n}^∞ A_k^c
is denoted {A_n^c almost always (a.a.)}, i.e.,
ω ∈ {A_n^c a.a.} ⇔ ∃n*(ω) such that ω ∈ A_n^c ∀n ≥ n*(ω).
• Note: P(⋂_{k=n}^∞ A_k^c) ↑ P(A_n^c a.a.) as n → ∞.
• Note: one can equivalently express the above in terms of A_n, where the event
⋂_{n=1}^∞ ⋃_{k=n}^∞ A_k
is denoted {A_n infinitely often (i.o.)} = {A_n^c a.a.}^c, i.e.,
ω ∈ {A_n i.o.} ⇔ ∀n ∃m ≥ n such that ω ∈ A_m.
SLLN and Borel-Cantelli Lemmas
So, S_n/n → µ a.s. ⇔ P(A_n^c a.a.) = 1 ⇔ P(A_n i.o.) = 0 for all r ∈ Z+, where A_n = {|S_n/n − µ| ≥ r⁻¹}.
• First BC Lemma: Generally, for events A_1, A_2, ...,
∑_{n=1}^∞ P(A_n) < ∞ ⇒ P(A_n i.o.) = 0.
• Proof:
P(A_n i.o.) ≤ P(⋃_{m=k}^∞ A_m) ∀k
 ≤ P(A_k) + P(A_{k+1}) + ··· → 0 as k → ∞ (the tail of a convergent series).
• Second BC Lemma: For independent events A_1, A_2, ...,
∑_{n=1}^∞ P(A_n) = ∞ ⇒ P(A_n i.o.) = 1.
• Proof: lim_{m→∞} ∑_{k=n}^m P(A_k) = ∞ implies
0 ← exp(−∑_{k=n}^m P(A_k)) = ∏_{k=n}^m exp(−P(A_k)) ≥ ∏_{k=n}^m (1 − P(A_k)) = P(⋂_{k=n}^m A_k^c) as m → ∞,
where the last equality is by independence and exp(−x) ≥ 1 − x was used; hence P(⋂_{k=n}^∞ A_k^c) = 0 ∀n, i.e., P(A_n i.o.) = 1.
• Note: P(A_n i.o.) ∈ {0, 1} by the "zero-one law".
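Both lemmas can be illustrated numerically (a sketch assuming independent events with P(A_n) = 1/n² versus P(A_n) = 1/n; the seed and horizon are arbitrary, so the counts are typical rather than guaranteed):

```python
import numpy as np

rng = np.random.default_rng(1)
N = 200_000
n = np.arange(1, N + 1)

# First BC lemma: sum 1/n^2 < inf, so a.s. only finitely many A_n occur.
occ_summable = rng.random(N) < 1.0 / n**2
# Second BC lemma: sum 1/n = inf with independent A_n, so A_n i.o. a.s.
occ_divergent = rng.random(N) < 1.0 / n

last_summable = np.flatnonzero(occ_summable)[-1]
last_divergent = np.flatnonzero(occ_divergent)[-1]
print(occ_summable.sum(), last_summable)    # typically few occurrences, all early
print(occ_divergent.sum(), last_divergent)  # occurrences keep appearing late
```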
Kolmogorov’s maximal inequality
If X_1, X_2, ... are independent and EX_n² < ∞ ∀n, then
P(max_{1≤k≤n} |S_k − ES_k| ≥ λ) ≤ var(S_n)/λ² ∀n ≥ 1, λ > 0.
Proof:
• W.l.o.g. assume EX_n = 0 ∀n ⇒ ES_n = 0 ∀n. Fix λ > 0.
• Define the disjoint events B_k = {|S_i| < λ ∀i < k, |S_k| ≥ λ}.
ES_n² ≥ ∑_{k=1}^n E(S_n²·1_{B_k})
 ≥ ∑_{k=1}^n E((2(S_n − S_k)S_k + S_k²)·1_{B_k})
 = ∑_{k=1}^n [2E(S_n − S_k)·E(S_k·1_{B_k}) + E(S_k²·1_{B_k})]
 = ∑_{k=1}^n E(S_k²·1_{B_k}),
where the second-to-last equality is by independence of S_n − S_k and S_k·1_{B_k} (and E(S_n − S_k) = 0).
• Thus, ES_n² ≥ λ²·∑_{k=1}^n P(B_k) = λ²·P(⋃_{k=1}^n B_k).
• Note how this generalizes Chebyshev’s inequality.
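A Monte Carlo check of the inequality (a sketch, not from the slides: ±1 steps so var(S_n) = n; the trial count, seed, and threshold λ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, trials, lam = 50, 20_000, 10.0

X = rng.choice([-1.0, 1.0], size=(trials, n))  # centered i.i.d. steps
S = np.cumsum(X, axis=1)                       # partial sums S_1, ..., S_n

lhs = np.mean(np.abs(S).max(axis=1) >= lam)    # estimate of P(max_k |S_k| >= lam)
rhs = n / lam**2                               # var(S_n) / lam^2 = 0.5
print(lhs, "<=", rhs)
```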
SLLN: proof of bounded second moment case
Assume EX_1² < ∞.
• Again, assume centered X_n w.l.o.g., and apply the maximal inequality with λ = 2^n ε together with the First BC Lemma to get
P({max_{1≤k≤2^n} |S_k| ≤ 2^n ε} a.a. in n) = 1 ∀ε > 0.
• Now, ∀m such that 2^{n−1} < m ≤ 2^n,
max_{1≤k≤2^n} |S_k| ≤ 2^n ε implies |S_m| ≤ 2^n ε ≤ 2mε.
• This leads to P({|S_m|/m ≤ 2ε} a.a. in m) = 1 ∀ε > 0, i.e., S_m/m → 0 a.s.
• Note: Kolmogorov's SLLN only requires E|X_1| < ∞, i.e., a bounded first moment.
Weak and strong LLNs
• SLLN ⇒ WLLN since P(A_n i.o.) = 0 ⇒ P(A_n) → 0.
• Example of a persistently shrinking pulse on Ω = [0,1] with P Lebesgue measure:
– ∀m ∈ Z+ and k ∈ {1, 2, ..., m}, define
Y_{k+m(m−1)/2}(ω) := 1{(k−1)/m < ω ≤ k/m}.
– The random variables Y_n converge weakly but not strongly to zero because P({Y_n > ε} i.o.) = 1 for all ε = r⁻¹ > 0.
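The pulses can be enumerated directly (a sketch; `pulse` is a hypothetical helper, and ω = 0.37 is an arbitrary sample point):

```python
def pulse(n, omega):
    """Y_n(omega): indicator of the n-th shrinking pulse on (0, 1].

    Row m holds the intervals ((k-1)/m, k/m], k = 1..m,
    at the indices n = k + m(m-1)/2.
    """
    m = 1
    while n > m * (m + 1) // 2:   # locate the row m containing index n
        m += 1
    k = n - m * (m - 1) // 2
    return float((k - 1) / m < omega <= k / m)

omega = 0.37
Y = [pulse(n, omega) for n in range(1, 1 + 30 * 31 // 2)]  # rows m = 1..30
# P(Y_n = 1) = 1/m -> 0, yet each row has exactly one pulse covering omega,
# so Y_n(omega) = 1 infinitely often: no a.s. convergence.
print(sum(Y))  # one hit per row m = 1..30
```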
• Theorem: If X_1, X_2, ... converges to X in probability, then there is a subsequence X_{n_1}, X_{n_2}, ... that converges to X a.s.
Uniform integrability: motivation
• The persistently shrinking pulse example also showed that convergence in L^q, for q ≥ 1, does not generally imply convergence a.s.
• Example (Dirac/Heaviside impulse):
– Ω = [0,1] and P is Lebesgue measure.
– X_n(ω) := n·1{ω ∈ [0, 1/n]} ∀n ≥ 1.
– Clearly, X_n → 0 a.s.
– But EX_n = 1 ∀n ≥ 1.
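A numeric check of the impulse example (a sketch approximating Lebesgue measure by uniform samples; the seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(3)
omega = rng.random(100_000)           # uniform samples standing in for [0, 1]

def X(n, w):
    return n * (w <= 1.0 / n)         # X_n(omega) = n * 1{omega in [0, 1/n]}

for n in (10, 100, 1000):
    vals = X(n, omega)
    # The mean stays pinned near 1 while P(X_n > 0) = 1/n -> 0.
    print(n, vals.mean(), np.mean(vals > 0))
```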
• So, convergence a.s. (i.e., "pointwise") does not generally imply convergence in L^q either.
• Under what conditions does a.s. convergence imply convergence in L^q for q ≥ 1?
Uniform integrability: preliminaries
Consider the probability space (Ω, F, P).
• Theorem: If E|X| < ∞, then ∀ε ∈ (0,∞) ∃c(ε) ∈ [0,∞) such that
E(|X|·1{|X| ≥ c}) < ε ∀c ∈ [c(ε),∞).
Proof:
– Lebesgue integrals are uniformly continuous, i.e., ∃δ(ε) ∈ (0,∞) such that E(|X|·1_A) < ε ∀A ∈ F with P(A) < δ(ε) (exercise: prove by contradiction using the assumption that E|X| < ∞).
– By Markov's inequality, P(|X| ≥ c) ≤ c⁻¹E|X| ∀c ∈ (0,∞).
– Since E|X| < ∞, ∃c(ε) ∈ (0,∞) such that P(|X| ≥ c) < δ(ε) ∀c ∈ [c(ε),∞).
– Finally, take A = {|X| ≥ c} for c ∈ [c(ε),∞).
Uniform integrability: definition and sufficient conditions
• A collection of random variables C on (Ω,F,P) is uniformly integrable if ∀ε ∈ (0,∞) ∃c(ε) ∈ [0,∞) such that
sup_{X∈C} E(|X|·1{|X| ≥ c}) < ε ∀c ∈ [c(ε),∞).
• If |X| ≤ Y a.s. ∀X ∈ C with EY < ∞, then C is uniformly integrable (note that E(|X|·1{|X| > Y}) = 0 and recall Lebesgue's dominated convergence theorem); the converse is not true.
• For uniformly integrable C, if c ∈ [c(ε),∞) then
E|X| = E(|X|·1{|X| < c}) + E(|X|·1{|X| ≥ c}) < c + ε ∀X ∈ C,
i.e., C is uniformly L¹ bounded; the converse is not true.
• If C = {X_0, X_1, ...} is such that 0 ≤ X_n ≤ X_{n+1} ∀n ∈ Z+ (i.e., an increasing sequence) and C is L¹ bounded, then, by the monotone convergence theorem, C is uniformly integrable.
• Theorem: If X is a random variable on (Ω,F,P) with E|X| < ∞ and {G_α, α ∈ Λ} is a collection of sub-σ-algebras of F, then {E(X | G_α), α ∈ Λ} is uniformly integrable. Proof: exercise.
Uniform integrability: main theorem prelim
• Define the ramp ψ_c(x) = x·1{|x| < c} + c·1{x ≥ c} − c·1{x ≤ −c}.
• Theorem: If C is uniformly integrable, then ∀ε ∈ (0,∞) ∃c(ε) ∈ [0,∞) such that
E|X − ψ_c(X)| < ε ∀X ∈ C, c ∈ [c(ε),∞).
• Proof: Arbitrarily fix c ∈ (0,∞) and note that
|x − ψ_c(x)| = (x − c)⁺ + (x + c)⁻ ∀x ∈ R
⇒ E(X − c)⁺ = E((X − c)·1{X ≥ c}) ≤ E(|X|·1{|X| ≥ c})
and E(X + c)⁻ = −E((X + c)·1{X ≤ −c}) ≤ E(|X|·1{|X| ≥ c}).
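The ramp is just clipping to [−c, c], so the truncation error E|X − ψ_c(X)| is easy to estimate (a sketch, not from the slides, assuming an Exp(1) sample, for which E(X − c)⁺ = exp(−c); the seed and sample size are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_exponential(200_000)   # E|X| < inf: {X} is uniformly integrable

def ramp(x, c):
    # psi_c(x) = x 1{|x|<c} + c 1{x>=c} - c 1{x<=-c}, i.e., clip to [-c, c]
    return np.clip(x, -c, c)

errs = [np.abs(X - ramp(X, c)).mean() for c in (1.0, 2.0, 5.0, 10.0)]
print(errs)   # for Exp(1) this estimates exp(-c): the error decays toward 0
```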
Swapping limits & expectation (integration)
Theorem: If (a) lim_{n→∞} X_n = X a.s. and (b) {X_0, X_1, ...} are uniformly integrable, then
E|X| < ∞ and lim_{n→∞} E|X_n − X| = 0.
Proof:
• By (b) and the previous "uniform L¹ boundedness" result, ∃B < ∞ such that E|X_n| ≤ B ∀n ∈ Z+.
• So, by (a) and Fatou's lemma,
E|X| = E(liminf_{n→∞} |X_n|) ≤ liminf_{n→∞} E|X_n| ≤ B.
• To get L¹ convergence of X_n to X, arbitrarily fix ε ∈ (0,∞). By the previous "ramp" result and (b), ∃c(ε) ∈ (0,∞) such that, ∀c ∈ (c(ε),∞) and n ∈ Z+,
E|X_n − ψ_c(X_n)| < ε/3 and E|X − ψ_c(X)| < ε/3.
• Fix c ∈ (c(ε),∞). Since ψ_c is continuous and bounded in magnitude by c, by the dominated convergence theorem, ∃N(ε) ∈ Z+ such that
E|ψ_c(X_n) − ψ_c(X)| < ε/3 ∀n ≥ N(ε).
• By these three "ε/3" inequalities and the triangle inequality, ∀n ≥ N(ε),
E|X − X_n| ≤ E|X_n − ψ_c(X_n)| + E|X − ψ_c(X)| + E|ψ_c(X_n) − ψ_c(X)| < ε.
Completeness of L^q
• Definition: L^q(Ω,F,P), or just L^q, is the set of random variables X on (Ω,F,P) such that ||X||_q := (E|X|^q)^{1/q} < ∞.
• Definition: X_1, X_2, ... is said to be a Cauchy sequence in L^q if ∀ε > 0 ∃N_ε such that ||X_n − X_m||_q < ε whenever n, m > N_ε.
• Definition: A set is said to be complete if all of its Cauchy sequences converge to an element inside it.
• For q ≥ 1, ||·||_q is a norm by Minkowski's inequality, ||X + Y||_q ≤ ||X||_q + ||Y||_q, so that L^q is a (complete) Banach space.
• To show that L^q is complete:
– Consider a Cauchy sequence X_1, X_2, ... in L^q.
– Let N*(ε) := inf{N_δ | δ ≤ ε} and define X* := X_{N*(ε)}; so, by Minkowski's inequality, ∀n ≥ N*,
||X_n||_q ≤ ||X*||_q + ||X_n − X*||_q ≤ ||X*||_q + ε,
i.e., the sequence {X_n, n ≥ N*} is bounded in L^q.
– Then, one can show that a subsequence X_{n_k} converges a.s. to a (measurable) random variable X (use the completeness of R and argue by contradiction).
– So, by Fatou's lemma, ||X||_q^q < ∞, i.e., X ∈ L^q.
– Finally, use Fatou's lemma on ||X − X_n||_q^q to establish L^q-convergence to X.
Caratheodory’s extension theorem
• Theorem: If A is an algebra on Ω and P is countably additive on A, then there exists P̄ on σ(A) such that P = P̄ on A.
• In addition, if ∃ Ω_1 ⊆ Ω_2 ⊆ Ω_3 ⊆ ... ∈ A such that Ω_n ↑ Ω, then the extension P̄ is unique.
• On R, Caratheodory's theorem extends a countably additive probability measure on the algebra A containing all intervals and their finite unions to a σ-field that is a strict subset of 2^R but contains B = σ(A).
Product probability spaces
• Consider two probability spaces (Ω_i, F_i, P_i), i = 1, 2.
• Define the product sample space
Ω := Ω_1 × Ω_2 := {(ω_1, ω_2) | ω_i ∈ Ω_i, i = 1, 2}.
• Note that A_0 := {A_1 × A_2 | A_i ∈ F_i, i = 1, 2} is closed under intersections but not under finite unions, e.g., one cannot express
(A_1 × A_2) ∪ (B_1 × B_2) as C_1 × C_2
where A_i, B_i, C_i ∈ F_i.
• So, add all finite disjoint unions of elements of A_0 to A_0 and call the result A, an algebra.
• Denote F_1 × F_2 := σ(A).
Product probability space extension
Theorem: There exists a unique P such that
(a) P(A_1 × A_2) = P_1(A_1)·P_2(A_2) ∀A_1 × A_2 ∈ A_0, extending P uniquely to A via finite unions, and
(b) (Ω = Ω_1 × Ω_2, F = σ(A), P) is a probability space.
Proof:
• Sections of measurable sets are measurable, where the section of A ⊆ Ω along ω_2 is
A_{ω_2} := {ω_1 ∈ Ω_1 | (ω_1, ω_2) ∈ A} for ω_2 ∈ Ω_2,
because ∀ω_2 ∈ Ω_2, M := {A ∈ F | A_{ω_2} ∈ F_1} is a monotone class ⇒ M = F.
• A = A_1 × A_2 ∈ A_0 ⇒ A_{ω_2} = A_1 if ω_2 ∈ A_2, otherwise A_{ω_2} = ∅; thus
P_1(A_{ω_2}) = P_1(A_1)·1_{A_2}(ω_2) ⇒ P(A) = ∫ P_1(A_{ω_2}) dP_2(ω_2),
where P_1(A_{ω_2}) is a (Ω_2, F_2, P_2) random variable.
• Such disintegration extends to A ∈ A by finite additivity, and P is countably additive on A by dominated convergence.
• So, P (and the disintegration) extends uniquely to F by Caratheodory's theorem.
Fubini-Tonelli Theorem
• Theorem: If
(i) X : Ω = Ω_1 × Ω_2 → R is F = F_1 × F_2 measurable and
(ii) ω_{3−i} ↦ ∫ X(ω) dP_i(ω_i) is a.s. finite and F_{3−i}-measurable ∀i ∈ {1,2},
then
EX := ∫ X dP = ∫ (∫ X dP_1) dP_2 = ∫ (∫ X dP_2) dP_1.
Proof:
– By disintegration, ∫ X dP_i is F_{3−i}-measurable and the theorem holds for X = 1_A, A ∈ F.
– Extend to simple functions and take limits via dominated convergence to prove the case where E|X| < ∞.
• Considering hypothesis (ii), recall how absolute summability of a sequence implies its (unique) summability in any order.
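On finite product spaces the theorem reduces to swapping the order of a double sum, which can be checked directly (a sketch with arbitrary made-up P1, P2, and X, not from the slides):

```python
import numpy as np

p1 = np.array([0.2, 0.5, 0.3])            # P1 on Omega1 = {0, 1, 2}
p2 = np.array([0.6, 0.4])                 # P2 on Omega2 = {0, 1}
X = np.array([[1.0, -2.0],
              [0.5,  3.0],
              [-1.0, 0.0]])               # X(omega1, omega2)

inner_over_2 = X @ p2                     # omega1 -> integral of X dP2
inner_over_1 = p1 @ X                     # omega2 -> integral of X dP1
lhs = p1 @ inner_over_2                   # iterated: dP2 then dP1
rhs = inner_over_1 @ p2                   # iterated: dP1 then dP2
ex = float(np.sum(np.outer(p1, p2) * X))  # direct integral w.r.t. P1 x P2
print(lhs, rhs, ex)                       # all three agree
```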
Consistency of Probability Measures
• Consider the product space (R^{Z+}, B^{Z+}) =: (R^∞, B^∞), where Z+ := {0, 1, 2, 3, ...}.
• Again, the underlying probability space is (Ω, F, P).
• A cylinder event A ∈ B^∞ is of the form
A = A_0 × A_1 × A_2 × ...
where all but a finite number of A_i = R, i.e., there is a finite index set I_A ⊆ Z+ such that A_i = R ∀i ∉ I_A.
• A family of probability measures {P^n}_{n∈Z+}, P^n on (R^n, B^n), is said to be consistent if
P^n(A_0 × A_1 × ... × A_{n−1}) = P^{n+1}(A_0 × A_1 × ... × A_{n−1} × R)
for all cylinder sets A_0 × A_1 × ... × A_{n−1}.
Kolmogorov’s Extension Theorem
Theorem: For each consistent family of probability measures P^n on (R^n, B^n), there exists a unique consistent P^∞ on (R^∞, B^∞).
• Clearly, we require that
P^∞(A) = P^n(A_0 × A_1 × ... × A_{n−1})
for all cylinder sets A = A_0 × A_1 × ... and all n ∈ Z+ large enough that A_i = R ∀i ≥ n.
• Since P^∞ is specified for all cylinder sets, P^∞ is unique on the algebra generated by them and, by the monotone class theorem, unique on B^∞ too.
• For existence:
– Let A be the set of finite unions of cylinder sets, including ∅, so that σ(A) = B^∞.
– Show P^∞ is countably additive on A and apply Caratheodory's extension theorem.
Consistency and FDDs
• Consider a discrete-time/parameter stochastic process
X := {X_t | t ∈ Z+},
where each X_t is itself a random variable.
• Let F_{t_1,t_2,...,t_n} be the joint CDF of X_{t_1}, X_{t_2}, ..., X_{t_n} for some finite n and distinct t_k ∈ Z+ for all k ∈ {1, 2, ..., n}, i.e.,
F_{t_1,...,t_n}(x_1, ..., x_n) = P(X_{t_1} ≤ x_1, ..., X_{t_n} ≤ x_n),
where P is the underlying probability measure.
• This is called an n-dimensional distribution of X.
KET and Consistent FDDs
• A family of such joint CDFs is called a set of finite-dimensional distributions (FDDs).
• The FDDs are consistent if one can marginalize (reduce the dimension of) one and obtain another, e.g.,
F_{t_1,t_4}(x_1, x_4) := F_{t_1,t_2,t_3,t_4}(x_1, ∞, ∞, x_4).
• Then, using KET, one can prove ∃! a discrete-time stochastic process X on R^∞ (with distribution P^∞), i.e.,
dP^n := dF_{0,1,...,n−1} and dP^∞ := dF_{Z+}.
• Samples ω of the underlying probability space are actually sample paths of the stochastic process, i.e., t ↦ X_t(ω).
• KET can be extended to continuous-time stochastic processes, i.e., sample paths X_·(ω) ∈ R^{R+} instead of R^{Z+}.
• In the following, we will focus on the underlying probability space Ω and the σ-algebras σ(X_s | s ≤ t) for t ∈ R+ = [0,∞), i.e., on continuous time...
Uncountable products
Suppose I is an uncountably infinite index set and, ∀t ∈ I, (Ω, F_t) is a sample space with a σ-algebra of events.
• Theorem: If A ∈ σ(F_t, t ∈ I) =: G_I, then there is some countable J ⊆ I (depending on A) such that A ∈ G_J.
Proof:
– Define H := {A ∈ G_I | ∃ countable J ⊆ I s.t. A ∈ G_J}.
– Clearly, Ω ∈ H and H is closed under complements and countable unions, so that H is a σ-algebra.
– Finally, since F_t ⊆ H ∀t ∈ I, H = G_I.
• Corollary: If Y : Ω → R̄ is G_I-measurable, then ∃ a countable J ⊆ I such that Y is G_J-measurable.
Proof:
– Use the previous theorem if Y = 1_A; this easily extends to simple (discretely distributed) Y.
– Extend to any (measurable) random variable Y by approximating with simple functions.
• For the special case where G_{[0,t]} = σ(X_s | s ≤ t), i.e., F_t = σ(X_t) for random variables X_t: if Y is G_{[0,t]}-measurable, then ∃ countable {t_0, t_1, ...} ⊆ [0, t] and a B^{Z+}-measurable mapping φ such that
Y = φ(X_{t_0}, X_{t_1}, ...) a.s.
Recall Doob’s theorem.