Upload
others
View
3
Download
0
Embed Size (px)
Citation preview
American Mathematical Society
Daniel W. Stroock
Mathematics ofProbability
��������������in Mathematics
��������
Mathematics of Probability
Daniel W. Stroock
American Mathematical SocietyProvidence, Rhode Island
Graduate Studies in Mathematics
Volume 149
https://doi.org/10.1090//gsm/149
EDITORIAL COMMITTEE
David Cox (Chair)Daniel S. FreedRafe Mazzeo
Gigliola Staffilani
2010 Mathematics Subject Classification. Primary 60A99, 60J10, 60J99, 60G42, 60G44.
For additional information and updates on this book, visitwww.ams.org/bookpages/gsm-149
Library of Congress Cataloging-in-Publication Data
Stroock, Daniel W.Mathematics of probability / Daniel W. Stroock.
pages cm. — (Graduate studies in mathematics ; volume 149)Includes bibliographical references and index.ISBN 978-1-4704-0907-4 (alk. paper)1. Stochastic processes. 2. Probabilities. I. Title.
QA274.S854 2013519.2—dc23
2013011622
Copying and reprinting. Individual readers of this publication, and nonprofit librariesacting for them, are permitted to make fair use of the material, such as to copy a chapter for usein teaching or research. Permission is granted to quote brief passages from this publication inreviews, provided the customary acknowledgment of the source is given.
Republication, systematic copying, or multiple reproduction of any material in this publicationis permitted only under license from the American Mathematical Society. Requests for suchpermission should be addressed to the Acquisitions Department, American Mathematical Society,201 Charles Street, Providence, Rhode Island 02904-2294 USA. Requests can also be made bye-mail to [email protected].
c© 2013 by the author.Printed in the United States of America.
©∞ The paper used in this book is acid-free and falls within the guidelinesestablished to ensure permanence and durability.
Visit the AMS home page at http://www.ams.org/
10 9 8 7 6 5 4 3 2 1 18 17 16 15 14 13
Contents
Preface ix
Chapter 1. Some Background and Preliminaries 1
§1.1. The Language of Probability Theory 21.1.1. Sample Spaces and Events 31.1.2. Probability Measures 4
Exercises for § 1.1 6
§1.2. Finite and Countable Sample Spaces 71.2.1. Probability Theory on a Countable Space 71.2.2. Uniform Probabilities and Coin Tossing 101.2.3. Tournaments 131.2.4. Symmetric Random Walk 151.2.5. DeMoivre’s Central Limit Theorem 171.2.6. Independent Events 201.2.7. The Arc Sine Law 241.2.8. Conditional Probability 27
Exercises for § 1.2 29
§1.3. Some Non-Uniform Probability Measures 321.3.1. Random Variables and Their Distributions 321.3.2. Biased Coins 331.3.3. Recurrence and Transience of Random Walks 36
Exercises for § 1.3 39
§1.4. Expectation Values 401.4.1. Some Elementary Examples 451.4.2. Independence and Moment Generating Functions 47
iii
iv Contents
1.4.3. Basic Convergence Results 49
Exercises for § 1.4 51
Comments on Chapter 1 52
Chapter 2. Probability Theory on Uncountable Sample Spaces 55
§2.1. A Little Measure Theory 562.1.1. Sigma Algebras, Measurable Functions, and Measures 562.1.2. Π- and Λ-Systems 58
Exercises for § 2.1 59
§2.2. A Construction of Pp on {0, 1}Z+59
2.2.1. The Metric Space {0, 1}Z+59
2.2.2. The Construction 61
Exercises for § 2.2 65
§2.3. Other Probability Measures 652.3.1. The Uniform Probability Measure on [0, 1] 662.3.2. Lebesgue Measure on R 682.3.3. Distribution Functions and Probability Measures 70
Exercises for § 2.3 71
§2.4. Lebesgue Integration 712.4.1. Integration of Functions 722.4.2. Some Properties of the Lebesgue Integral 772.4.3. Basic Convergence Theorems 802.4.4. Inequalities 842.4.5. Fubini’s Theorem 88
Exercises for § 2.4 91
§2.5. Lebesgue Measure on RN 952.5.1. Polar Coordinates 982.5.2. Gaussian Computations and Stirling’s Formula 99
Exercises for § 2.5 102
Comments on Chapter 2 104
Chapter 3. Some Applications to Probability Theory 105
§3.1. Independence and Conditioning 1053.1.1. Independent σ-Algebras 1053.1.2. Independent Random Variables 1073.1.3. Conditioning 1093.1.4. Some Properties of Conditional Expectations 113
Exercises for § 3.1 114
§3.2. Distributions that Admit a Density 1173.2.1. Densities 117
Contents v
3.2.2. Densities and Conditioning 119
Exercises for § 3.2 120
§3.3. Summing Independent Random Variables 1213.3.1. Convolution of Distributions 1213.3.2. Some Important Examples 1223.3.3. Kolmogorov’s Inequality and the Strong Law 124
Exercises for § 3.3 130
Comments on Chapter 3 134
Chapter 4. The Central Limit Theorem and Gaussian Distributions 135
§4.1. The Central Limit Theorem 1354.1.1. Lindeberg’s Theorem 137
Exercises for § 4.1 142
§4.2. Families of Normal Random Variables 1434.2.1. Multidimensional Gaussian Distributions 1434.2.2. Standard Normal Random Variables 1444.2.3. More General Normal Random Variables 1464.2.4. A Concentration Property of Gaussian Distributions 1474.2.5. Linear Transformations of Normal Random Variables 1504.2.6. Gaussian Families 152
Exercises for § 4.2 155
Comments on Chapter 4 158
Chapter 5. Discrete Parameter Stochastic Processes 159
§5.1. Random Walks Revisited 1595.1.1. Immediate Rewards 1595.1.2. Computations via Conditioning 162
Exercises for § 5.1 167
§5.2. Processes with the Markov Property 1685.2.1. Sequences of Dependent Random Variables 1685.2.2. Markov Chains 1715.2.3. Long-Time Behavior 1715.2.4. An Extension 174
Exercises for § 5.2 178
§5.3. Markov Chains on a Countable State Space 1795.3.1. The Markov Property 1815.3.2. Return Times and the Renewal Equation 1825.3.3. A Little Ergodic Theory 185
Exercises for § 5.3 188
Comments on Chapter 5 190
vi Contents
Chapter 6. Some Continuous-Time Processes 193
§6.1. Transition Probability Functions and Markov Processes 1936.1.1. Transition Probability Functions 194
Exercises for § 6.1 196
§6.2. Markov Chains Run with a Poisson Clock 1966.2.1. The Simple Poisson Process 1976.2.2. A Generalization 1996.2.3. Stationary Measures 200
Exercises for § 6.2 203
§6.3. Brownian Motion 2046.3.1. Some Preliminaries 2056.3.2. Levy’s Construction 2066.3.3. Some Elementary Properties of Brownian Motion 2096.3.4. Path Properties 2166.3.5. The Ornstein–Uhlenbeck Process 219
Exercises for § 6.3 222
Comments on Chapter 6 224
Chapter 7. Martingales 225
§7.1. Discrete Parameter Martingales 2257.1.1. Doob’s Inequality 226
Exercises for § 7.1 232
§7.2. The Martingale Convergence Theorem 2337.2.1. The Convergence Theorem 2347.2.2. Application to the Radon–Nikodym Theorem 237
Exercises for § 7.2 241
§7.3. Stopping Times 2427.3.1. Stopping Time Theorems 2427.3.2. Reversed Martingales 2477.3.3. Exchangeable Sequences 249
Exercises for § 7.3 252
§7.4. Continuous Parameter Martingales 2547.4.1. Progressively Measurable Functions 2547.4.2. Martingales and Submartingales 2557.4.3. Stopping Times Again 2577.4.4. Continuous Martingales and Brownian Motion 2597.4.5. Brownian Motion and Differential Equations 266
Exercises for § 7.4 271
Comments on Chapter 7 274
Contents vii
Notation 275
Bibliography 279
Index 281
Preface
There are a myriad of books about probability theory already on the mar-ket. Nonetheless, a few years ago Sergei Gelfand asked if I would write aprobability theory book for advanced undergraduate and beginning gradu-ate students who are interested in mathematics. He had in mind an updatedversion of the first volume of William Feller’s renowned An Introduction toProbability Theory and Its Applications [3]. Had I been capable of doingso, I would have loved to oblige, but, unfortunately, I am neither the math-ematician that Feller was nor have I his vast reservoir of experience withapplications. Thus, shortly after I started the project, I realized that Iwould not be able to produce the book that Gelfand wanted. In addition,I learned that there already exists a superb replacement for Feller’s book.Namely, for those who enjoy combinatorics and want to see how probabilitytheory can be used to obtain combinatorial results, it is hard to imaginea better book than N. Alon and J. H. Spencer’s The Probabilistic Method[1]. For these reasons, I have written instead a book that is a much moreconventional introduction to the ideas and techniques of modern probabil-ity theory. I have already authored such a book, Probability Theory, AnAnalytic View [9], but that book makes demands on the reader that thisone does not. In particular, that book assumes a solid grounding in analy-sis, especially Lebesgue’s theory of integration. In the hope that it will beappropriate for students who lack that background, I have made this onemuch more self-contained and developed the measure theory that it uses.
Chapter 1 contains my attempt to explain the basic concepts in proba-bility theory unencumbered by measure-theoretic technicalities. After intro-ducing the terminology, I devote the rest of the chapter to probability theoryon finite and countable sample spaces. In large part because I am such a
ix
x Preface
poor combinatorialist myself, I have emphasized computations that do notrequire a mastery of counting techniques. Most of the examples involveBernoulli trials. I have not shied away from making the same computationsseveral times, each time employing a different line of reasoning. My hope isthat in this way I will have made it clear to the reader why concepts likeindependence and conditioning have been developed.
Many of the results in Chapter 1 are begging for the existence of aprobability measure on an uncountable sample space. For example, whendiscussing random walks in Chapter 1, only computations involving a finitenumber of steps can be discussed. Thus, answers to questions about recur-rence were deficient. Using this deficiency as motivation, in Chapter 2 Ifirst introduce the fundamental ideas of measure theory and then construct
the Bernoulli measures on {0, 1}Z+. Once I have the Bernoulli measures, I
obtain Lebesgue measure as the image of the symmetric Bernoulli measureand spend some time discussing its translation invariance properties. Theremainder of Chapter 2 gives a brief introduction to Lebesgue’s theory ofintegration.
With the tools developed in Chapter 2 at hand, Chapter 3 explains howKolmogorov fashioned those tools into what has become the standard mathe-matical model of probability theory. Specifically, Kolmogorov’s formulationsof independence and conditioning are given, and the chapter concludes withhis strong law of large numbers.
Chapter 4 is devoted to Gaussian distributions and normal random vari-ables. It begins with Lindeberg’s derivation of the central limit theoremand then moves on to explain some of the transformation properties ofmulti-dimensional normal random variables. The final topic here is centeredGaussian families.
In the first section of Chapter 5 I revisit the topic that I used to motivatethe contents of Chapter 2. That is, I do several computations of quantitiesthat require the Bernoulli measures constructed in § 2.2. I then turn to asomewhat cursory treatment of Markov chains, concluding with a discussionof their ergodic properties when the state space is finite or countable.
Chapter 6 begins with Markov processes that are the continuous pa-rameter analog of Markov chains. Here I also introduce transition probabil-ity functions and discuss some properties of general continuous parameterMarkov processes. The second part of this chapter contains Levy’s construc-tion of Brownian motion and proves a few of its elementary path properties.The chapter concludes with a brief discussion of the Ornstein–Uhlenbeckprocess.
Martingale theory is the subject of Chapter 7. The first three sectionsgive the discrete parameter theory, and the continuous parameter theory
Preface xi
is given in the final section. In both settings, I try to emphasize the con-tributions that martingale theory can make to topics treated earlier. Inparticular, in the final section, I discuss the relationship between continuousmartingales and Brownian motion and give some examples that indicate howmartingales provide a bridge between differential equations and probabilitytheory.
In conclusion, it is clear that I have not written the book that Gelfandasked for, but, if I had written that book, it undoubtedly would have lookedpale by comparison to Feller’s. Nonetheless, I hope that, for those who readit, the book that I have written will have some value. I will be posting anerrata file on www.ams.org/bookpages/gsm-149. I expect that this file willgrow over time.
Daniel W. Stroock Nederland, CO
Notation
General
Notation Description See
a ∧ b & a ∨ b The minimum and the maximum of a and b
a+ & a− The non-negative part, a ∨ 0, and non-positivepart, −(a ∧ 0), of a ∈ R
δi,j Kronecker delta: δi,j = 0 if i = j and δi,j = 1 ifi = j
(x, y)RN The inner product of x, y ∈ RN
f � S The restriction of the function f to the set S
‖ · ‖u The uniform (supremum) norm
Γ(t) Euler’s Gamma function (2.5.6)
�t� max{n ∈ Z : n ≤ t}
$t% min{n ∈ Z : n ≥ t}
Sets and Spaces
A� The complement of the set A
1A The indicator function of the set A: 1A(ω) = 1 ifω ∈ A and 1A(ω) = 0 if ω /∈ A
§ 1.2.7
B(a, r) The ball of radius r around a
B(E;R
)Space of bounded, Borel measurable functionsfrom E into R
275
276 Notation
K ⊂⊂ E To be read: K is a compact subset of E
N The non-negative integers: N = {0} ∪ Z+
SN−1 The unit sphere in RN
Q The set of rational numbers
Z & Z+ Set of all integers and the subset of positive inte-gers
Cb
(E;R
)Space of bounded continuous functions from Einto R
Cc
(G;R
)The space of continuous, R-valued functions hav-ing compact support in the open set G
C1,2([0,∞)× R;R) The space of functions (t,x) ∈ [0,∞) × R −→ R
which are continuously differentiable once in t andtwice in x
W The space of paths w ∈ C([0,∞);R) with w(0)= 0
§ 6.3
Measure-Theoretic
L1(μ;R) The space of R-valued functions, μ-integrablefunctions
M1(E) The space of Borel probability measures on E
BE The Borel σ-algebra over a topological space E
B(E;R) The space of bounded, measurable functions on E
EP[X, A
]To be read: the expectation value of X with re-spect to μ on A; equivalent to
∫AX dμ; when A is
unspecified, it is assumed to be the whole space∫Γf dμ Integral of f on A with respect to μ § 2.4.1
EP[X∣∣Σ] To be read: the conditional expectation value of X
given the σ-algebra Σ§5.1.1
δa The unit point mass at a
F∗μ The image (pushforward) of μ under F
σ({Xi : i ∈ I}) The σ-algebra generated by the set of randomvariables {Xi : i ∈ I}
λA Lebesgue measure on the set A; usually A = RN
or some interval
Notation 277
γm,C The Gaussian distribution with mean m and co-variance C
§ 4.2.3
N(m,C) Class of normal (a.k.a. Gaussian) random vari-ables with distribution γm,C
§ 4.2.3
μ � ν The convolution of measures μ with ν
μ' ν The measure μ is absolutely continuous with re-spect to ν
§ 7.2.2
μ ⊥ ν The measure μ is singular to ν § 7.2.2
W Wiener measure, the distribution of Brownian mo-tion
§ 6.3
Bibliography
[1] Alon, N. and Spencer, J. H., The Probabilistic Method, 2nd. ed., J. Wiley, New York,2000.
[2] Diaconis, P., Group Representations in Probability and Statistics, Institute of Math-ematical Statistics Lecture Notes–Monograph Series, 11, Institute of MathematicalStatistics, Hayward, CA, 1988.
[3] Feller, Wm., An Introduction to Probability Theory and Its Applications, Vol. I,J. Wiley, New York, 1968.
[4] Kolmogorov, A. N., Foundations of Probability, Chelsea, New York, 1956.
[5] Morters, P. and Peres, Y., Brownian Motion, with an appendix by Oded Schrammand Wendelin Werner, Cambridge Series in Statistical and Probabilistic Mathematics,Cambridge Univ. Press, Cambridge, 2010.
[6] Neveu, J., Discrete-Parameter Martingales, translated from the French by T. P.Speed, revised edition, North-Holland Mathematical Library, Vol. 10, North-HollandPublishing Co., Amsterdam-Oxford; American Elsevier Publishing Co., Inc., NewYork, 1975.
[7] Revuz, D. and Yor, M., Continuous Martingales and Brownian Motion, Grunlehren,293, Springer-Verlag, Heidelberg, 1999.
[8] Stirzaker, D., Elementary Probability, Oxford Univ. Press, Oxford, 1994.
[9] Stroock, D., Probability Theory, An Analytic View, 2nd. ed., Cambridge Univ. Press,New York, 2011.
[10] , Essentials of Integration Theory for Analysis, Springer-Verlag, Heidelberg,2010.
[11] , Markov Processes from K. Ito’s Perspective, Annals of Mathematics Studies,155, Princeton Univer. Press, Princeton, NJ, 2003.
[12] , An Introducion to Markov Processes, Graduate Texts in Mathematics, 230,Springer-Verlag, Heidelberg, 2005.
[13] Stroock, D. W. and Varadhan, S. R. S., Multidimensional Diffusion Processes, reprintof the 1997 edition, Classics in Mathematics, Springer-Verlag, Berlin, 2006.
[14] Wax, N., editor, Selected Papers on Noise and Stochastic Processes, Dover Press,N.Y., 1954.
[15] Williams, D., Probability and Martingales, Cambridge Univ. Press, Cambridge, 2001.
279
Index
absolutely continuous, 237almost everywhere, 77
convergence, 78almost surely, 110arc sine law, 26
Azuma’s inequality, 232
Bayes’s formula, 28
Bernoulli measure, 33Bernstein polynomial, 132Berry–Esseen theorem, 142biased random walk, 35
binomial, 11coefficient, 11
N choose m, 11
distributionparameters (n, p), 34
Borelmeasurable function, 56
measurable set, 56measure, 57
Borel–Cantelli lemma, 32, 244
branching process, 178extinction, 178
Brownian motion, 206Feynman–Kac formula, 269
Levy’s characterization, 263relative to {Ft : t ≥ 0}, 210scaling property, 209
strong law, 213time inversion invariance, 213
centered Gaussian family, 152
central limit theorem, 140
DeMoivre’s, 18
Lindeberg’s, 137
Chapman–Kolmogorov equation, 194
Chebychev’s inequality, 44
complement, 4
concave function, 85
conditional expectation, 110
conditional probability, 27
contraction, 171
convergence
μ-almost everywhere, 78
in μ-measure, 79
convex
function, 85
set, 84
convolution, 121
countably additive, 57
covariance, 143
cumulant of a random variable, 51
DeMorgan’s law, 6
decreasing events, 5
density of a distribution, 117
difference, 4
discrete arcsine measure, 33
disjoint sets, 4
distribution, 59
having density, 117
of a random variable, 32
of a stochastic process, 193
distribution function, 70
Doeblin’s
281
282 Index
condition, 174theorem, 173
Doob’s decomposition lemma, 234Doob’s stopping time theorem
continuous parameter, 259discrete parameter, 243
doubly stochastic, 188
empiricalmean, 145, 155variance, 145, 155
empty set, 4ergodic, 177
theory, 185error function, 118Euler’s
Beta function, 103Gamma function, 100
event, 3exchangeable random variables, 249expected value, 42, 105
exists, 42non-negative discrete, 42of RN -valued random variable, 143
exponential distribution, 117
Fatou’s lemma, 50Feynman–Kac formula, 269finite measure, 57Fubini’s theorem, 90
gambler’s ruin problem, 163Gauss density, 100Gaussian distribution, 117
concentration property, 150Maury–Pisier estimate, 147parameters m and σ2, 117standard, 144tail estimate, 157with mean m and covariance C, 146
Gaussian family, 152centered, 152
graph, 13complete, 13
two-colorings, 30edges, 13vertices, 13
Hardy–Littlewood maximal function,229
inequality, 229Hewitt–Savage 0–1 law, 251Holder’s inequality, 88
Hunt’s stopping time theoremcontinuous parameter, 259discrete parameter, 245
identically distributed randomvariables, 129
image of a measure, 59increasing events, 5independent
σ-algebras, 105events, 21random variables, 48
existence of, 109indicator function, 24inequality
Chebychev’s, 44Holder’s, 88Jensen’s, 85Markov’s, 43Minkowski’s, 88Schwarz’s, 92
integer part, 22integrable, 42, 77integral
exists, 76of a function, 73
intersection, 3Ito’s formula, 263
Jensen’s inequality, 85for conditional expectations, 116
Kolmogorov’s0–1 law, 107backward equation, 204forward equation, 204inequality, 126strong law, 129
L1(μ;R), 77Λ-system, 58Laplace transform, 132law of large numbers
Kolmogorov’s strong law, 129weak, 125
law of the iterated logarithm, 133Lebesgue decomposition, 239Lebesgue measure
on [0 1], 67on R, 68on RN , 95
alternative construction, 103scaling property, 70
Index 283
under linear transformations, 96Lebesgue’s dominated convergence
theorem, 50limits of sets, 6
marginal distribution, 119Markov chain, 171
renewal equation, 182state space, 179time-homogeneous, 171
Markov process, 194time-homogeneous, 194
Markov property, 171Markov’s inequality, 43, 77martingales, 225
Azuma’s inequality, 232continuous, 259continuous parameter, 255convergence theorem, 236, 246discrete parameter, 225Doob’s decomposition lemma, 234reversed, 248upcrossing inequality, 246
mean value, 43, 105, 143measurable function, 56, 72
Borel measurable, 56measurable space, 56measure, 57
Borel, 57finite, 57non-atomic, 65probability, 57
measure space, 57finite, 57probability, 57
median, 45variational characterization, 52vs. expectation value, 52
minimum principle, 166Minkowski’s inequality, 88moment generating function, 47, 120moment of a random variable, 47monotone class, 59monotone convergence theorem, 50
negative part, 7nested partitions, 230non-atomic measure, 65normal random variable, 118
conditioning, 154standard, 118, 144with mean m and covariance C, 146
optional stopping time, 271Ornstein–Uhlenbeck process, 222
Π-system, 58Paley–Wiener integral, 222point mass, 174Poisson
approximation, 36measure, 36process
compound, 203simple, 199
random variable, 39, 51polar coordinates, 99positive part, 7probability function, 10probability measure, 6, 57
determined by p, 10probability space, 57progressively measurable, 225
continuous parameter, 254discrete parameter, 225
Radon–Nikodymderivative, 237theorem, 237, 238
random variable, 32, 105kth moment of, 47exchangeable, 249integrable, 42mean value, 43normal, 118
standard, 118, 144variance of, 44
random variablesmutually independent, 107sums of independent, 39
random walkbiased, 35
transience, 37symmetric, 15
recurrence, 23reflection principle
for Brownian motion, 216for symmetric random walks, 16
renewal equation, 23, 182return time
symmetric random walk, 22reversed
martingale, 248submartingale, 248
right-continuous, 70
284 Index
paths, 255
σ-algebraBorel, 56countably generated, 110generated by, 56generated by a random variable, 107over Ω, 56
σ-finite, 89sample
point, 3space, 3
Schwarz’s inequality, 92simple function, 73simple Poisson process, 199singular measures, 239square integrable, 129standard normal random variable, 144state space, 179, 193stationary distribution, 174
for transition probability function,200
non-existence of, 190Stirling’s formula, 101, 142stochastic process, 159
homogeneous, independentincrements, 199
stopping time, 242continuous parameter, 257discrete parameter, 242old definition, 271optional, 271
stopping time theoremDoob’s, 243, 259Hunt’s, 245, 259
strong law of large numbers, 125sub-Gaussian, 121submartingale, 226
continuous parameter, 255discrete parameter, 226reversed, 248
supermartingale, 235surface measure on SN−1, 98
tail σ-algebra, 106time shift map, 181, 211time-homogeneous, 171tournament, 14transient, 37transition probability, 169
doubly stochastic, 188transition probability function, 194
backward equation, 204forward equation, 204
translation invariant, 69, 95translation map
on [0 1), 67on R, 68
triangle inequality, 92
uniform distribution, 117uniform probability measure
on [0 1], 67characterization, 68
on finite set, 10uniformly integrable, 93union of sets, 3upcrossing inequality, 246
variance, 44variation distance, 172
weak law of large numbers, 125Weierstrass approximation theorem, 131Wiener measure, 205
GSM/149
For additional information and updates on this book, visit
www.ams.org/bookpages/gsm-149
www.ams.orgAMS on the Webwww.ams.org
This book covers the basics of modern probability theory. It begins with probability �������������� ������������������������� �����������������������������������course on measure theory, which is followed by some initial applications to probability theory, including independence and conditional expectations. The second half of the book deals with Gaussian random variables, with Markov chains, with a few continuous ��������������������� ���� ����������������� � �� ����������������������� �����discrete and continuous parameter ones.
The book is a self-contained introduction to probability theory and the measure theory required to study it.