Mathematics of Probability · Stochastic processes. 2. Probabilities. I. Title. QA274.S854 2013 519.2—dc23 2013011622 Copying and reprinting. Individual readers of this publication,

American Mathematical Society

Daniel W. Stroock

Mathematics ofProbability

��in Mathematics

��

Mathematics of Probability

Daniel W. Stroock

American Mathematical SocietyProvidence, Rhode Island

Graduate Studies in Mathematics

Volume 149

https://doi.org/10.1090//gsm/149

EDITORIAL COMMITTEE

David Cox (Chair)Daniel S. FreedRafe Mazzeo

Gigliola Staffilani

2010 Mathematics Subject Classification. Primary 60A99, 60J10, 60J99, 60G42, 60G44.

For additional information and updates on this book, visitwww.ams.org/bookpages/gsm-149

Library of Congress Cataloging-in-Publication Data

Stroock, Daniel W.Mathematics of probability / Daniel W. Stroock.

pages cm. — (Graduate studies in mathematics ; volume 149)Includes bibliographical references and index.ISBN 978-1-4704-0907-4 (alk. paper)1. Stochastic processes. 2. Probabilities. I. Title.

QA274.S854 2013519.2—dc23

2013011622

Copying and reprinting. Individual readers of this publication, and nonprofit librariesacting for them, are permitted to make fair use of the material, such as to copy a chapter for usein teaching or research. Permission is granted to quote brief passages from this publication inreviews, provided the customary acknowledgment of the source is given.

Republication, systematic copying, or multiple reproduction of any material in this publicationis permitted only under license from the American Mathematical Society. Requests for suchpermission should be addressed to the Acquisitions Department, American Mathematical Society,201 Charles Street, Providence, Rhode Island 02904-2294 USA. Requests can also be made bye-mail to [email protected].

c© 2013 by the author.Printed in the United States of America.

©∞ The paper used in this book is acid-free and falls within the guidelinesestablished to ensure permanence and durability.

Visit the AMS home page at http://www.ams.org/

10 9 8 7 6 5 4 3 2 1 18 17 16 15 14 13

Contents

Preface ix

Chapter 1. Some Background and Preliminaries 1

§1.1. The Language of Probability Theory 21.1.1. Sample Spaces and Events 31.1.2. Probability Measures 4

Exercises for § 1.1 6

§1.2. Finite and Countable Sample Spaces 71.2.1. Probability Theory on a Countable Space 71.2.2. Uniform Probabilities and Coin Tossing 101.2.3. Tournaments 131.2.4. Symmetric Random Walk 151.2.5. DeMoivre’s Central Limit Theorem 171.2.6. Independent Events 201.2.7. The Arc Sine Law 241.2.8. Conditional Probability 27


§1.3. Some Non-Uniform Probability Measures 321.3.1. Random Variables and Their Distributions 321.3.2. Biased Coins 331.3.3. Recurrence and Transience of Random Walks 36


§1.4. Expectation Values 401.4.1. Some Elementary Examples 451.4.2. Independence and Moment Generating Functions 47

iii

iv Contents

1.4.3. Basic Convergence Results 49


Comments on Chapter 1 52

Chapter 2. Probability Theory on Uncountable Sample Spaces 55

§2.1. A Little Measure Theory 562.1.1. Sigma Algebras, Measurable Functions, and Measures 562.1.2. Π- and Λ-Systems 58


§2.2. A Construction of Pp on {0, 1}Z+59

2.2.1. The Metric Space {0, 1}Z+59

2.2.2. The Construction 61


§2.3. Other Probability Measures 652.3.1. The Uniform Probability Measure on [0, 1] 662.3.2. Lebesgue Measure on R 682.3.3. Distribution Functions and Probability Measures 70


§2.4. Lebesgue Integration 712.4.1. Integration of Functions 722.4.2. Some Properties of the Lebesgue Integral 772.4.3. Basic Convergence Theorems 802.4.4. Inequalities 842.4.5. Fubini’s Theorem 88


§2.5. Lebesgue Measure on RN 952.5.1. Polar Coordinates 982.5.2. Gaussian Computations and Stirling’s Formula 99



Chapter 3. Some Applications to Probability Theory 105

§3.1. Independence and Conditioning 1053.1.1. Independent σ-Algebras 1053.1.2. Independent Random Variables 1073.1.3. Conditioning 1093.1.4. Some Properties of Conditional Expectations 113


§3.2. Distributions that Admit a Density 1173.2.1. Densities 117

Contents v

3.2.2. Densities and Conditioning 119


§3.3. Summing Independent Random Variables 1213.3.1. Convolution of Distributions 1213.3.2. Some Important Examples 1223.3.3. Kolmogorov’s Inequality and the Strong Law 124



Chapter 4. The Central Limit Theorem and Gaussian Distributions 135

§4.1. The Central Limit Theorem 1354.1.1. Lindeberg’s Theorem 137


§4.2. Families of Normal Random Variables 1434.2.1. Multidimensional Gaussian Distributions 1434.2.2. Standard Normal Random Variables 1444.2.3. More General Normal Random Variables 1464.2.4. A Concentration Property of Gaussian Distributions 1474.2.5. Linear Transformations of Normal Random Variables 1504.2.6. Gaussian Families 152



Chapter 5. Discrete Parameter Stochastic Processes 159

§5.1. Random Walks Revisited 1595.1.1. Immediate Rewards 1595.1.2. Computations via Conditioning 162


§5.2. Processes with the Markov Property 1685.2.1. Sequences of Dependent Random Variables 1685.2.2. Markov Chains 1715.2.3. Long-Time Behavior 1715.2.4. An Extension 174


§5.3. Markov Chains on a Countable State Space 1795.3.1. The Markov Property 1815.3.2. Return Times and the Renewal Equation 1825.3.3. A Little Ergodic Theory 185



vi Contents

Chapter 6. Some Continuous-Time Processes 193

§6.1. Transition Probability Functions and Markov Processes 1936.1.1. Transition Probability Functions 194


§6.2. Markov Chains Run with a Poisson Clock 1966.2.1. The Simple Poisson Process 1976.2.2. A Generalization 1996.2.3. Stationary Measures 200


§6.3. Brownian Motion 2046.3.1. Some Preliminaries 2056.3.2. Levy’s Construction 2066.3.3. Some Elementary Properties of Brownian Motion 2096.3.4. Path Properties 2166.3.5. The Ornstein–Uhlenbeck Process 219



Chapter 7. Martingales 225

§7.1. Discrete Parameter Martingales 2257.1.1. Doob’s Inequality 226


§7.2. The Martingale Convergence Theorem 2337.2.1. The Convergence Theorem 2347.2.2. Application to the Radon–Nikodym Theorem 237


§7.3. Stopping Times 2427.3.1. Stopping Time Theorems 2427.3.2. Reversed Martingales 2477.3.3. Exchangeable Sequences 249


§7.4. Continuous Parameter Martingales 2547.4.1. Progressively Measurable Functions 2547.4.2. Martingales and Submartingales 2557.4.3. Stopping Times Again 2577.4.4. Continuous Martingales and Brownian Motion 2597.4.5. Brownian Motion and Differential Equations 266



Contents vii

Notation 275

Bibliography 279

Index 281

Preface

There are a myriad of books about probability theory already on the mar-ket. Nonetheless, a few years ago Sergei Gelfand asked if I would write aprobability theory book for advanced undergraduate and beginning gradu-ate students who are interested in mathematics. He had in mind an updatedversion of the first volume of William Feller’s renowned An Introduction toProbability Theory and Its Applications [3]. Had I been capable of doingso, I would have loved to oblige, but, unfortunately, I am neither the math-ematician that Feller was nor have I his vast reservoir of experience withapplications. Thus, shortly after I started the project, I realized that Iwould not be able to produce the book that Gelfand wanted. In addition,I learned that there already exists a superb replacement for Feller’s book.Namely, for those who enjoy combinatorics and want to see how probabilitytheory can be used to obtain combinatorial results, it is hard to imaginea better book than N. Alon and J. H. Spencer’s The Probabilistic Method[1]. For these reasons, I have written instead a book that is a much moreconventional introduction to the ideas and techniques of modern probabil-ity theory. I have already authored such a book, Probability Theory, AnAnalytic View [9], but that book makes demands on the reader that thisone does not. In particular, that book assumes a solid grounding in analy-sis, especially Lebesgue’s theory of integration. In the hope that it will beappropriate for students who lack that background, I have made this onemuch more self-contained and developed the measure theory that it uses.

Chapter 1 contains my attempt to explain the basic concepts in proba-bility theory unencumbered by measure-theoretic technicalities. After intro-ducing the terminology, I devote the rest of the chapter to probability theoryon finite and countable sample spaces. In large part because I am such a

ix

x Preface

poor combinatorialist myself, I have emphasized computations that do notrequire a mastery of counting techniques. Most of the examples involveBernoulli trials. I have not shied away from making the same computationsseveral times, each time employing a different line of reasoning. My hope isthat in this way I will have made it clear to the reader why concepts likeindependence and conditioning have been developed.

Many of the results in Chapter 1 are begging for the existence of aprobability measure on an uncountable sample space. For example, whendiscussing random walks in Chapter 1, only computations involving a finitenumber of steps can be discussed. Thus, answers to questions about recur-rence were deficient. Using this deficiency as motivation, in Chapter 2 Ifirst introduce the fundamental ideas of measure theory and then construct

the Bernoulli measures on {0, 1}Z+. Once I have the Bernoulli measures, I

obtain Lebesgue measure as the image of the symmetric Bernoulli measureand spend some time discussing its translation invariance properties. Theremainder of Chapter 2 gives a brief introduction to Lebesgue’s theory ofintegration.

With the tools developed in Chapter 2 at hand, Chapter 3 explains howKolmogorov fashioned those tools into what has become the standard mathe-matical model of probability theory. Specifically, Kolmogorov’s formulationsof independence and conditioning are given, and the chapter concludes withhis strong law of large numbers.

Chapter 4 is devoted to Gaussian distributions and normal random vari-ables. It begins with Lindeberg’s derivation of the central limit theoremand then moves on to explain some of the transformation properties ofmulti-dimensional normal random variables. The final topic here is centeredGaussian families.

In the first section of Chapter 5 I revisit the topic that I used to motivatethe contents of Chapter 2. That is, I do several computations of quantitiesthat require the Bernoulli measures constructed in § 2.2. I then turn to asomewhat cursory treatment of Markov chains, concluding with a discussionof their ergodic properties when the state space is finite or countable.

Chapter 6 begins with Markov processes that are the continuous pa-rameter analog of Markov chains. Here I also introduce transition probabil-ity functions and discuss some properties of general continuous parameterMarkov processes. The second part of this chapter contains Levy’s construc-tion of Brownian motion and proves a few of its elementary path properties.The chapter concludes with a brief discussion of the Ornstein–Uhlenbeckprocess.

Martingale theory is the subject of Chapter 7. The first three sectionsgive the discrete parameter theory, and the continuous parameter theory

Preface xi

is given in the final section. In both settings, I try to emphasize the con-tributions that martingale theory can make to topics treated earlier. Inparticular, in the final section, I discuss the relationship between continuousmartingales and Brownian motion and give some examples that indicate howmartingales provide a bridge between differential equations and probabilitytheory.

In conclusion, it is clear that I have not written the book that Gelfandasked for, but, if I had written that book, it undoubtedly would have lookedpale by comparison to Feller’s. Nonetheless, I hope that, for those who readit, the book that I have written will have some value. I will be posting anerrata file on www.ams.org/bookpages/gsm-149. I expect that this file willgrow over time.

Daniel W. Stroock Nederland, CO

Notation

General

Notation Description See

a ∧ b & a ∨ b The minimum and the maximum of a and b

a+ & a− The non-negative part, a ∨ 0, and non-positivepart, −(a ∧ 0), of a ∈ R

δi,j Kronecker delta: δi,j = 0 if i = j and δi,j = 1 ifi = j

(x, y)RN The inner product of x, y ∈ RN

f � S The restriction of the function f to the set S

‖ · ‖u The uniform (supremum) norm

Γ(t) Euler’s Gamma function (2.5.6)

�t� max{n ∈ Z : n ≤ t}

$t% min{n ∈ Z : n ≥ t}

Sets and Spaces

A� The complement of the set A

1A The indicator function of the set A: 1A(ω) = 1 ifω ∈ A and 1A(ω) = 0 if ω /∈ A

§ 1.2.7

B(a, r) The ball of radius r around a

B(E;R

)Space of bounded, Borel measurable functionsfrom E into R

275

276 Notation

K ⊂⊂ E To be read: K is a compact subset of E

N The non-negative integers: N = {0} ∪ Z+

SN−1 The unit sphere in RN

Q The set of rational numbers

Z & Z+ Set of all integers and the subset of positive inte-gers

Cb

(E;R

)Space of bounded continuous functions from Einto R

Cc

(G;R

)The space of continuous, R-valued functions hav-ing compact support in the open set G

C1,2([0,∞)× R;R) The space of functions (t,x) ∈ [0,∞) × R −→ R

which are continuously differentiable once in t andtwice in x

W The space of paths w ∈ C([0,∞);R) with w(0)= 0

§ 6.3

Measure-Theoretic

L1(μ;R) The space of R-valued functions, μ-integrablefunctions

M1(E) The space of Borel probability measures on E

BE The Borel σ-algebra over a topological space E

B(E;R) The space of bounded, measurable functions on E

EP[X, A

]To be read: the expectation value of X with re-spect to μ on A; equivalent to

∫AX dμ; when A is

unspecified, it is assumed to be the whole space∫Γf dμ Integral of f on A with respect to μ § 2.4.1

EP[X∣∣Σ] To be read: the conditional expectation value of X

given the σ-algebra Σ§5.1.1

δa The unit point mass at a

F∗μ The image (pushforward) of μ under F

σ({Xi : i ∈ I}) The σ-algebra generated by the set of randomvariables {Xi : i ∈ I}

λA Lebesgue measure on the set A; usually A = RN

or some interval

Notation 277

γm,C The Gaussian distribution with mean m and co-variance C

§ 4.2.3

N(m,C) Class of normal (a.k.a. Gaussian) random vari-ables with distribution γm,C

§ 4.2.3

μ � ν The convolution of measures μ with ν

μ' ν The measure μ is absolutely continuous with re-spect to ν

§ 7.2.2

μ ⊥ ν The measure μ is singular to ν § 7.2.2

W Wiener measure, the distribution of Brownian mo-tion

§ 6.3

Bibliography

[1] Alon, N. and Spencer, J. H., The Probabilistic Method, 2nd. ed., J. Wiley, New York,2000.

[2] Diaconis, P., Group Representations in Probability and Statistics, Institute of Math-ematical Statistics Lecture Notes–Monograph Series, 11, Institute of MathematicalStatistics, Hayward, CA, 1988.

[3] Feller, Wm., An Introduction to Probability Theory and Its Applications, Vol. I,J. Wiley, New York, 1968.

[4] Kolmogorov, A. N., Foundations of Probability, Chelsea, New York, 1956.

[5] Morters, P. and Peres, Y., Brownian Motion, with an appendix by Oded Schrammand Wendelin Werner, Cambridge Series in Statistical and Probabilistic Mathematics,Cambridge Univ. Press, Cambridge, 2010.

[6] Neveu, J., Discrete-Parameter Martingales, translated from the French by T. P.Speed, revised edition, North-Holland Mathematical Library, Vol. 10, North-HollandPublishing Co., Amsterdam-Oxford; American Elsevier Publishing Co., Inc., NewYork, 1975.

[7] Revuz, D. and Yor, M., Continuous Martingales and Brownian Motion, Grunlehren,293, Springer-Verlag, Heidelberg, 1999.

[8] Stirzaker, D., Elementary Probability, Oxford Univ. Press, Oxford, 1994.

[9] Stroock, D., Probability Theory, An Analytic View, 2nd. ed., Cambridge Univ. Press,New York, 2011.

[10] , Essentials of Integration Theory for Analysis, Springer-Verlag, Heidelberg,2010.

[11] , Markov Processes from K. Ito’s Perspective, Annals of Mathematics Studies,155, Princeton Univer. Press, Princeton, NJ, 2003.

[12] , An Introducion to Markov Processes, Graduate Texts in Mathematics, 230,Springer-Verlag, Heidelberg, 2005.

[13] Stroock, D. W. and Varadhan, S. R. S., Multidimensional Diffusion Processes, reprintof the 1997 edition, Classics in Mathematics, Springer-Verlag, Berlin, 2006.

[14] Wax, N., editor, Selected Papers on Noise and Stochastic Processes, Dover Press,N.Y., 1954.

[15] Williams, D., Probability and Martingales, Cambridge Univ. Press, Cambridge, 2001.

279

Index

absolutely continuous, 237almost everywhere, 77

convergence, 78almost surely, 110arc sine law, 26

Azuma’s inequality, 232

Bayes’s formula, 28

Bernoulli measure, 33Bernstein polynomial, 132Berry–Esseen theorem, 142biased random walk, 35

binomial, 11coefficient, 11

N choose m, 11

distributionparameters (n, p), 34

Borelmeasurable function, 56

measurable set, 56measure, 57

Borel–Cantelli lemma, 32, 244

branching process, 178extinction, 178

Brownian motion, 206Feynman–Kac formula, 269

Levy’s characterization, 263relative to {Ft : t ≥ 0}, 210scaling property, 209

strong law, 213time inversion invariance, 213

centered Gaussian family, 152

central limit theorem, 140

DeMoivre’s, 18

Lindeberg’s, 137

Chapman–Kolmogorov equation, 194

Chebychev’s inequality, 44

complement, 4

concave function, 85

conditional expectation, 110

conditional probability, 27

contraction, 171

convergence

μ-almost everywhere, 78

in μ-measure, 79

convex

function, 85

set, 84

convolution, 121

countably additive, 57

covariance, 143

cumulant of a random variable, 51

DeMorgan’s law, 6

decreasing events, 5

density of a distribution, 117

difference, 4

discrete arcsine measure, 33

disjoint sets, 4

distribution, 59

having density, 117

of a random variable, 32

of a stochastic process, 193

distribution function, 70

Doeblin’s

281

282 Index

condition, 174theorem, 173

Doob’s decomposition lemma, 234Doob’s stopping time theorem

continuous parameter, 259discrete parameter, 243

doubly stochastic, 188

empiricalmean, 145, 155variance, 145, 155

empty set, 4ergodic, 177

theory, 185error function, 118Euler’s

Beta function, 103Gamma function, 100

event, 3exchangeable random variables, 249expected value, 42, 105

exists, 42non-negative discrete, 42of RN -valued random variable, 143

exponential distribution, 117

Fatou’s lemma, 50Feynman–Kac formula, 269finite measure, 57Fubini’s theorem, 90

gambler’s ruin problem, 163Gauss density, 100Gaussian distribution, 117

concentration property, 150Maury–Pisier estimate, 147parameters m and σ2, 117standard, 144tail estimate, 157with mean m and covariance C, 146

Gaussian family, 152centered, 152

graph, 13complete, 13

two-colorings, 30edges, 13vertices, 13

Hardy–Littlewood maximal function,229

inequality, 229Hewitt–Savage 0–1 law, 251Holder’s inequality, 88

Hunt’s stopping time theoremcontinuous parameter, 259discrete parameter, 245

identically distributed randomvariables, 129

image of a measure, 59increasing events, 5independent

σ-algebras, 105events, 21random variables, 48

existence of, 109indicator function, 24inequality

Chebychev’s, 44Holder’s, 88Jensen’s, 85Markov’s, 43Minkowski’s, 88Schwarz’s, 92

integer part, 22integrable, 42, 77integral

exists, 76of a function, 73

intersection, 3Ito’s formula, 263

Jensen’s inequality, 85for conditional expectations, 116

Kolmogorov’s0–1 law, 107backward equation, 204forward equation, 204inequality, 126strong law, 129

L1(μ;R), 77Λ-system, 58Laplace transform, 132law of large numbers

Kolmogorov’s strong law, 129weak, 125

law of the iterated logarithm, 133Lebesgue decomposition, 239Lebesgue measure

on [0 1], 67on R, 68on RN , 95

alternative construction, 103scaling property, 70

Index 283

under linear transformations, 96Lebesgue’s dominated convergence

theorem, 50limits of sets, 6

marginal distribution, 119Markov chain, 171

renewal equation, 182state space, 179time-homogeneous, 171

Markov process, 194time-homogeneous, 194

Markov property, 171Markov’s inequality, 43, 77martingales, 225

Azuma’s inequality, 232continuous, 259continuous parameter, 255convergence theorem, 236, 246discrete parameter, 225Doob’s decomposition lemma, 234reversed, 248upcrossing inequality, 246

mean value, 43, 105, 143measurable function, 56, 72

Borel measurable, 56measurable space, 56measure, 57

Borel, 57finite, 57non-atomic, 65probability, 57

measure space, 57finite, 57probability, 57

median, 45variational characterization, 52vs. expectation value, 52

minimum principle, 166Minkowski’s inequality, 88moment generating function, 47, 120moment of a random variable, 47monotone class, 59monotone convergence theorem, 50

negative part, 7nested partitions, 230non-atomic measure, 65normal random variable, 118

conditioning, 154standard, 118, 144with mean m and covariance C, 146

optional stopping time, 271Ornstein–Uhlenbeck process, 222

Π-system, 58Paley–Wiener integral, 222point mass, 174Poisson

approximation, 36measure, 36process

compound, 203simple, 199

random variable, 39, 51polar coordinates, 99positive part, 7probability function, 10probability measure, 6, 57

determined by p, 10probability space, 57progressively measurable, 225

continuous parameter, 254discrete parameter, 225

Radon–Nikodymderivative, 237theorem, 237, 238

random variable, 32, 105kth moment of, 47exchangeable, 249integrable, 42mean value, 43normal, 118

standard, 118, 144variance of, 44

random variablesmutually independent, 107sums of independent, 39

random walkbiased, 35

transience, 37symmetric, 15

recurrence, 23reflection principle

for Brownian motion, 216for symmetric random walks, 16

renewal equation, 23, 182return time

symmetric random walk, 22reversed

martingale, 248submartingale, 248

right-continuous, 70

284 Index

paths, 255

σ-algebraBorel, 56countably generated, 110generated by, 56generated by a random variable, 107over Ω, 56

σ-finite, 89sample

point, 3space, 3

Schwarz’s inequality, 92simple function, 73simple Poisson process, 199singular measures, 239square integrable, 129standard normal random variable, 144state space, 179, 193stationary distribution, 174

for transition probability function,200

non-existence of, 190Stirling’s formula, 101, 142stochastic process, 159

homogeneous, independentincrements, 199

stopping time, 242continuous parameter, 257discrete parameter, 242old definition, 271optional, 271

stopping time theoremDoob’s, 243, 259Hunt’s, 245, 259

strong law of large numbers, 125sub-Gaussian, 121submartingale, 226

continuous parameter, 255discrete parameter, 226reversed, 248

supermartingale, 235surface measure on SN−1, 98

tail σ-algebra, 106time shift map, 181, 211time-homogeneous, 171tournament, 14transient, 37transition probability, 169

doubly stochastic, 188transition probability function, 194

backward equation, 204forward equation, 204

translation invariant, 69, 95translation map

on [0 1), 67on R, 68

triangle inequality, 92

uniform distribution, 117uniform probability measure

on [0 1], 67characterization, 68

on finite set, 10uniformly integrable, 93union of sets, 3upcrossing inequality, 246

variance, 44variation distance, 172

weak law of large numbers, 125Weierstrass approximation theorem, 131Wiener measure, 205

GSM/149

For additional information and updates on this book, visit

www.ams.org/bookpages/gsm-149

www.ams.orgAMS on the Webwww.ams.org

This book covers the basics of modern probability theory. It begins with probability �� course on measure theory, which is followed by some initial applications to probability theory, including independence and conditional expectations. The second half of the book deals with Gaussian random variables, with Markov chains, with a few continuous �� discrete and continuous parameter ones.

The book is a self-contained introduction to probability theory and the measure theory required to study it.

Documents

Mathematics of Probability · Stochastic processes. 2. Probabilities. I. Title. QA274.S854 2013 519.2—dc23 2013011622 Copying and reprinting. Individual readers of this publication,