8/11/2019 Probability Slides
1/84
Probability
Computer Science Tripos, Part IA
R.J. Gibbens
Computer Laboratory, University of Cambridge
Lent Term 2010/11
Last revision: 2010-12-17/r-43
1
Outline
Elementary probability theory (2 lectures)
Probability spaces, random variables, discrete/continuous distributions, means and variances, independence, conditional probabilities, Bayes' theorem.
Probability generating functions (1 lecture)
Definitions and properties; use in calculating moments of random variables and for finding the distribution of sums of independent random variables.
Multivariate distributions and independence (1 lecture)
Random vectors and independence; joint and marginal density functions; variance, covariance and correlation; conditional density functions.
Elementary stochastic processes (2 lectures)
Simple random walks; recurrence and transience; the Gambler's Ruin Problem and solution using difference equations.
Elementary probability theory
Event spaces
We formalize the notion of an event space, F, by requiring the following to hold.

Definition (Event space)
1. F is non-empty
2. E ∈ F ⇒ Ω∖E ∈ F
3. (∀i ∈ I. E_i ∈ F) ⇒ ∪_{i∈I} E_i ∈ F

Example: Ω any set and F = P(Ω), the power set of Ω.

Example: Ω any set with some event E and F = {∅, E, Ω∖E, Ω}. Note that Ω∖E is often written using the shorthand E^c for the complement of E with respect to Ω.
Probability spaces
Given an experiment with outcomes in a sample space Ω with an event space F we associate probabilities to events by defining a probability function P : F → R as follows.

Definition (Probability function)
1. ∀E ∈ F. P(E) ≥ 0
2. P(Ω) = 1 and P(∅) = 0
3. If E_i ∈ F for i ∈ I are disjoint (that is, E_i ∩ E_j = ∅ for i ≠ j) then
P(∪_{i∈I} E_i) = Σ_{i∈I} P(E_i).

We call the triple (Ω, F, P) a probability space.
Examples of probability spaces
Ω any set with some event E (E ≠ ∅, E ≠ Ω). Take F = {∅, E, Ω∖E, Ω} as before and define the probability function P by
P(∅) = 0, P(E) = p, P(Ω∖E) = 1 − p, P(Ω) = 1
for any 0 ≤ p ≤ 1.
Ω = {ω_1, ω_2, ..., ω_n} with F = P(Ω) and probabilities given for all E ∈ F by
P(E) = |E|/n.
For a six-sided fair die Ω = {1, 2, 3, 4, 5, 6} we take P({i}) = 1/6.
Conditional probabilities
Given a probability space (Ω, F, P) and two events E_1, E_2 ∈ F, how does knowledge that the random event E_2, say, has occurred influence the probability that E_1 has also occurred? This question leads to the notion of conditional probability.

Definition (Conditional probability)
If P(E_2) > 0, define the conditional probability, P(E_1|E_2), of E_1 given E_2 by
P(E_1|E_2) = P(E_1 ∩ E_2)/P(E_2).

Note that P(E_2|E_2) = 1.
Exercise: check that for any E_2 ∈ F such that P(E_2) > 0, (Ω, F, Q) is a probability space where Q : F → R is defined by Q(E) = P(E|E_2) for all E ∈ F.
Independent events
Given a probability space (Ω, F, P) we can define independence between random events as follows.

Definition (Independent events)
Two events E_1, E_2 ∈ F are independent if
P(E_1 ∩ E_2) = P(E_1)P(E_2).

Otherwise, the events are dependent. Note that if E_1 and E_2 are independent events then
P(E_1|E_2) = P(E_1) and P(E_2|E_1) = P(E_2).
Independence of multiple events
More generally, a collection of events {E_i | i ∈ I} are independent events if for all subsets J of I
P(∩_{j∈J} E_j) = Π_{j∈J} P(E_j).
When this holds just for all those subsets J such that |J| = 2 we have pairwise independence. Note that pairwise independence does not imply independence (unless |I| = 2).
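Pairwise independence genuinely is weaker: the classic two-coin construction gives three events for which every pair multiplies but the triple does not. A minimal sketch in Python (the sample space and events here are a standard textbook choice, not from the slides):

```python
from itertools import product

# Sample space: two fair coin tosses, each of the 4 outcomes has probability 1/4.
omega = list(product([0, 1], repeat=2))

def prob(event):
    """Probability of an event (a set of outcomes) under the uniform measure."""
    return len(event) / len(omega)

E1 = {w for w in omega if w[0] == 1}        # first toss is heads
E2 = {w for w in omega if w[1] == 1}        # second toss is heads
E3 = {w for w in omega if w[0] == w[1]}     # the two tosses agree

# Every pair of events multiplies: pairwise independence holds.
pairwise = all(prob(A & B) == prob(A) * prob(B)
               for A, B in [(E1, E2), (E1, E3), (E2, E3)])

# The triple intersection does not multiply: full independence fails.
triple = prob(E1 & E2 & E3) == prob(E1) * prob(E2) * prob(E3)

print(pairwise, triple)   # True False
```

Here P(E_1 ∩ E_2 ∩ E_3) = 1/4 while P(E_1)P(E_2)P(E_3) = 1/8.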
Venn diagrams
John Venn (1834–1923)

Example (|I| = 3 events)
E_1, E_2, E_3 are independent events if
P(E_1 ∩ E_2) = P(E_1)P(E_2)
P(E_1 ∩ E_3) = P(E_1)P(E_3)
P(E_2 ∩ E_3) = P(E_2)P(E_3)
and
P(E_1 ∩ E_2 ∩ E_3) = P(E_1)P(E_2)P(E_3).
Bayes' theorem
Thomas Bayes (1702–1761)

Theorem (Bayes' theorem)
If E_1 and E_2 are two events with P(E_1) > 0 and P(E_2) > 0 then
P(E_1|E_2) = P(E_2|E_1)P(E_1)/P(E_2).

Proof.
We have that
P(E_1|E_2)P(E_2) = P(E_1 ∩ E_2) = P(E_2 ∩ E_1) = P(E_2|E_1)P(E_1).

Thus Bayes' theorem provides a way to reverse the order of conditioning.
Partitions
Given a probability space (Ω, F, P) define a partition of Ω as follows.

Definition (Partition)
A partition of Ω is a collection of disjoint events {E_i ∈ F | i ∈ I} with ∪_{i∈I} E_i = Ω.

We then have the following theorem (a.k.a. the law of total probability).

Theorem (Partition theorem)
If {E_i ∈ F | i ∈ I} is a partition of Ω and P(E_i) > 0 for all i ∈ I then
P(E) = Σ_{i∈I} P(E|E_i)P(E_i)  ∀E ∈ F.
Proof of partition theorem
We prove the partition theorem as follows.

Proof.
P(E) = P(E ∩ (∪_{i∈I} E_i))
= P(∪_{i∈I} (E ∩ E_i))
= Σ_{i∈I} P(E ∩ E_i)
= Σ_{i∈I} P(E|E_i)P(E_i).
Bayes' theorem and partitions
A (slight) generalization of Bayes' theorem can be stated as follows, combining Bayes' theorem with the partition theorem:
P(E_i|E) = P(E|E_i)P(E_i) / Σ_{j∈I} P(E|E_j)P(E_j)  ∀i ∈ I
where {E_i ∈ F | i ∈ I} forms a partition of Ω.
As a special case consider the partition {E_1, E_2 = Ω∖E_1}. Then we have
P(E_1|E) = P(E|E_1)P(E_1) / (P(E|E_1)P(E_1) + P(E|Ω∖E_1)P(Ω∖E_1)).
Bayes' theorem example
Suppose that you have a good game of table football two times in three, otherwise a poor game. Your chance of scoring a goal is 3/4 in a good game and 1/4 in a poor game.
What is your chance of scoring a goal in any given game? Conditional on having scored in a game, what is the chance that you had a good game?
So we know that
P(Good) = 2/3, P(Poor) = 1/3, P(Score|Good) = 3/4, P(Score|Poor) = 1/4.
Bayes' theorem example, ctd
Thus, noting that {Good, Poor} forms a partition of the sample space of outcomes,
P(Score) = P(Score|Good)P(Good) + P(Score|Poor)P(Poor)
= (3/4)(2/3) + (1/4)(1/3) = 7/12.
Then by Bayes' theorem we have that
P(Good|Score) = P(Score|Good)P(Good)/P(Score) = (3/4)(2/3)/(7/12) = 6/7.
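The two quantities computed in this example can be reproduced with exact rational arithmetic; a small sketch:

```python
from fractions import Fraction

# The data from the example.
p_good = Fraction(2, 3)
p_poor = Fraction(1, 3)
p_score_given_good = Fraction(3, 4)
p_score_given_poor = Fraction(1, 4)

# Partition theorem (law of total probability).
p_score = p_score_given_good * p_good + p_score_given_poor * p_poor

# Bayes' theorem reverses the conditioning.
p_good_given_score = p_score_given_good * p_good / p_score

print(p_score, p_good_given_score)   # 7/12 6/7
```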
Examples of discrete distributions

Example (Bernoulli distribution)
[Bar chart of the pmf, e.g. p = 0.75.]
Here Im(X) = {0, 1} and for given p ∈ [0, 1]
P(X = k) = p if k = 1, and 1 − p if k = 0.

RV X: Bernoulli; Parameters: p ∈ [0, 1]; Im(X): {0, 1}; Mean: p; Variance: p(1 − p).
Examples of discrete distributions, ctd

Example (Binomial distribution, Bin(n, p))
[Bar chart of the pmf, e.g. n = 10, p = 0.5.]
Here Im(X) = {0, 1, ..., n} for some positive integer n and given p ∈ [0, 1]
P(X = k) = C(n, k) p^k (1 − p)^(n−k)  for k ∈ {0, 1, ..., n}
where C(n, k) is the binomial coefficient.

RV X: Bin(n, p); Parameters: n ∈ {1, 2, ...}, p ∈ [0, 1]; Im(X): {0, 1, ..., n}; Mean: np; Variance: np(1 − p).

We use the notation X ∼ Bin(n, p) as a shorthand for the statement that the RV X is distributed according to the stated Binomial distribution. We shall use this shorthand notation for our other named distributions.
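The table entries can be sanity-checked numerically by summing the pmf directly; a sketch with the illustrative values n = 10, p = 0.5 used in the figure:

```python
from math import comb

def binom_pmf(n, p, k):
    """P(X = k) for X ~ Bin(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

n, p = 10, 0.5
pmf = [binom_pmf(n, p, k) for k in range(n + 1)]

total = sum(pmf)                                           # should be 1
mean = sum(k * q for k, q in enumerate(pmf))               # np = 5
var = sum(k * k * q for k, q in enumerate(pmf)) - mean**2  # np(1 - p) = 2.5

print(total, mean, var)   # 1.0 5.0 2.5
```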
Examples of discrete distributions, ctd

Example (Geometric distribution, Geo(p))
[Bar chart of the pmf, e.g. p = 0.75.]
Here Im(X) = {1, 2, ...} and for 0 < p ≤ 1
P(X = k) = p(1 − p)^(k−1)  for k ∈ {1, 2, ...}.

RV X: Geo(p); Parameters: 0 < p ≤ 1; Im(X): {1, 2, ...}; Mean: 1/p; Variance: (1 − p)/p².

Notationally we write X ∼ Geo(p).
Beware possible confusion: some authors prefer to define our X − 1 as a Geometric RV!
Examples of discrete distributions, ctd

Example (Poisson distribution, Pois(λ))
[Bar charts of the pmf for λ = 1 and λ = 5.]
Here Im(X) = {0, 1, ...} and λ > 0
P(X = k) = λ^k e^(−λ)/k!  for k ∈ {0, 1, ...}.

RV X: Pois(λ); Parameters: λ > 0; Im(X): {0, 1, ...}; Mean: λ; Variance: λ.

Notationally we write X ∼ Pois(λ).
Expectation
One way to summarize the distribution of some RV, X, would be to construct a weighted average of the observed values, weighted by the probabilities of actually observing these values. This is the idea of expectation, defined as follows.

Definition (Expectation)
The expectation, E(X), of a discrete RV X is defined as
E(X) = Σ_{x∈Im(X)} x P(X = x)
so long as this sum is (absolutely) convergent (that is, Σ_{x∈Im(X)} |x P(X = x)| < ∞).

The expectation of a RV X is also known as the expected value, the mean, the first moment or simply the average.
Expectations and transformations
Suppose that X is a discrete RV and g : R → R is some transformation. We can check that Y = g(X) is again a RV, defined by Y(ω) = g(X)(ω) = g(X(ω)).

Theorem
We have that
E(g(X)) = Σ_x g(x) P(X = x)
whenever the sum is absolutely convergent.

Proof.
E(g(X)) = E(Y) = Σ_{y∈g(Im(X))} y P(Y = y)
= Σ_{y∈g(Im(X))} y Σ_{x∈Im(X): g(x)=y} P(X = x)
= Σ_{x∈Im(X)} g(x) P(X = x).
First and second moments of random variables
Just as the expectation or mean, E(X), is called the first moment of the RV X, E(X²) is called the second moment of X. The variance Var(X) = E((X − E(X))²) is called the second central moment of X since it measures the dispersion in the values of X centred about their mean value.
Note that we have the following property, where a, b ∈ R:
Var(aX + b) = E((aX + b − E(aX + b))²)
= E((aX + b − aE(X) − b)²)
= E(a²(X − E(X))²)
= a² Var(X).
Calculating variances
Note that we can expand our expression for the variance, where again we use μ = E(X), as follows:
Var(X) = Σ_x (x − μ)² P(X = x)
= Σ_x (x² − 2μx + μ²) P(X = x)
= Σ_x x² P(X = x) − 2μ Σ_x x P(X = x) + μ² Σ_x P(X = x)
= E(X²) − 2μ² + μ²
= E(X²) − μ²
= E(X²) − (E(X))².
This useful result determines the second central moment of a RV X in terms of the first and second moments of X. This is usually the best method to calculate the variance.
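For instance, for a fair six-sided die the central definition and the shortcut E(X²) − (E(X))² give the same value; a small sketch in exact rationals:

```python
from fractions import Fraction

# Fair die: P(X = k) = 1/6 for k = 1..6.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

mean = sum(k * q for k, q in pmf.items())        # E(X) = 7/2
second = sum(k * k * q for k, q in pmf.items())  # E(X^2) = 91/6

# Variance from the central definition ...
var_central = sum((k - mean) ** 2 * q for k, q in pmf.items())
# ... and from the shortcut just derived.
var_shortcut = second - mean ** 2

print(var_central, var_shortcut)   # 35/12 35/12
```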
Bivariate random variables
Given a probability space (Ω, F, P), we may have two RVs, X and Y, say. We can then use a joint probability mass function
P(X = x, Y = y) = P({ω|X(ω) = x} ∩ {ω|Y(ω) = y})
for all x, y ∈ R. We can recover the individual probability mass functions for X and Y as follows:
P(X = x) = P({ω|X(ω) = x})
= P(∪_{y∈Im(Y)} ({ω|X(ω) = x} ∩ {ω|Y(ω) = y}))
= Σ_{y∈Im(Y)} P(X = x, Y = y).
Similarly,
P(Y = y) = Σ_{x∈Im(X)} P(X = x, Y = y).
Transformations of random variables
If g : R² → R then we get a similar result to that obtained in the univariate case:
E(g(X, Y)) = Σ_{x∈Im(X)} Σ_{y∈Im(Y)} g(x, y) P(X = x, Y = y).
This idea can be extended to probability mass functions in the multivariate case with three or more RVs.
The linear transformation occurs frequently and is given by g(x, y) = ax + by + c where a, b, c ∈ R. In this case we find that
E(aX + bY + c) = Σ_x Σ_y (ax + by + c) P(X = x, Y = y)
= a Σ_x x P(X = x) + b Σ_y y P(Y = y) + c
= a E(X) + b E(Y) + c.
Independence of random variables
We have defined independence for events and can use the same idea for pairs of RVs X and Y.

Definition
Two RVs X and Y are independent if {ω|X(ω) = x} and {ω|Y(ω) = y} are independent for all x, y ∈ R.

Thus, if X and Y are independent,
P(X = x, Y = y) = P(X = x)P(Y = y).
If X and Y are independent discrete RVs with expected values E(X) and E(Y) respectively then
E(XY) = Σ_x Σ_y xy P(X = x, Y = y)
= Σ_x Σ_y xy P(X = x)P(Y = y)
= (Σ_x x P(X = x))(Σ_y y P(Y = y))
= E(X)E(Y).
Variance of sums of RVs and Covariance
Given a pair of RVs X and Y consider the variance of their sum X + Y:
Var(X + Y) = E((X + Y − E(X + Y))²)
= E(((X − E(X)) + (Y − E(Y)))²)
= E((X − E(X))²) + 2E((X − E(X))(Y − E(Y))) + E((Y − E(Y))²)
= Var(X) + 2Cov(X, Y) + Var(Y)
where the covariance of X and Y is given by
Cov(X, Y) = E((X − E(X))(Y − E(Y))) = E(XY) − E(X)E(Y).
So, if X and Y are independent RVs then E(XY) = E(X)E(Y) and so Cov(X, Y) = 0 and we have that
Var(X + Y) = Var(X) + Var(Y).
Notice also that if Y = X then Cov(X, X) = Var(X).
Distribution functions
Given a probability space (Ω, F, P) we have so far considered discrete RVs that can take a countable number of values. More generally, we define X : Ω → R as a random variable if
{ω|X(ω) ≤ x} ∈ F  ∀x ∈ R.
Note that a discrete random variable, X, is a random variable since
{ω|X(ω) ≤ x} = ∪_{x'∈Im(X): x'≤x} {ω|X(ω) = x'} ∈ F.

Definition (Distribution function)
If X is a RV then the distribution function of X, written F_X(x), is defined by
F_X(x) = P({ω|X(ω) ≤ x}) = P(X ≤ x).
Properties of the distribution function
[Graph of a distribution function F_X(x) = P(X ≤ x), increasing from 0 to 1.]
1. If x ≤ y then F_X(x) ≤ F_X(y).
2. If x → −∞ then F_X(x) → 0.
3. If x → ∞ then F_X(x) → 1.
4. If a < b then P(a < X ≤ b) = F_X(b) − F_X(a).
Continuous random variables
Random variables that take just a countable number of values are called discrete. More generally, we have that a RV can be defined by its distribution function, F_X(x). A RV is said to be a continuous random variable when the distribution function has sufficient smoothness that
F_X(x) = P(X ≤ x) = ∫_{−∞}^{x} f_X(u) du
for some function f_X(x). We can then take
f_X(x) = dF_X(x)/dx if the derivative exists at x, and 0 otherwise.
The function f_X(x) is called the probability density function of the continuous RV X, or often just the density of X.
The density function for continuous RVs plays the analogous rôle to the probability mass function for discrete RVs.
Properties of the density function
[Graph of a density function f_X(x).]
1. ∀x ∈ R. f_X(x) ≥ 0.
2. ∫_{−∞}^{∞} f_X(x) dx = 1.
3. If a ≤ b then P(a ≤ X ≤ b) = ∫_a^b f_X(x) dx.
Examples of continuous random variables
We define some common continuous RVs, X, by their density functions, f_X(x).

Example (Uniform distribution, U(a, b))
[Graph of the density: height 1/(b − a) on (a, b).]
Given a ∈ R and b ∈ R with a < b then
f_X(x) = 1/(b − a) if a < x < b, and 0 otherwise.

RV X: U(a, b); Parameters: a, b ∈ R with a < b; Im(X): (a, b); Mean: (a + b)/2; Variance: (b − a)²/12.

Notationally we write X ∼ U(a, b).
Examples of continuous random variables, ctd
Example (Exponential distribution, Exp(λ))
[Graph of the density, e.g. λ = 1.]
Given λ > 0 then
f_X(x) = λe^(−λx) if x > 0, and 0 otherwise.

RV X: Exp(λ); Parameters: λ > 0; Im(X): R⁺; Mean: 1/λ; Variance: 1/λ².

Notationally we write X ∼ Exp(λ).
Examples of continuous random variables, ctd
Example (Normal distribution, N(μ, σ²))
[Graph of the bell-shaped density.]
Given μ ∈ R and σ² > 0 then
f_X(x) = (1/√(2πσ²)) e^(−(x−μ)²/(2σ²))  for −∞ < x < ∞.

RV X: N(μ, σ²); Parameters: μ ∈ R, σ² > 0; Im(X): R; Mean: μ; Variance: σ².

Notationally we write X ∼ N(μ, σ²).
Expectations of continuous random variables
Just as for discrete RVs we can define the expectation of a continuous RV with density function f_X(x) by a weighted averaging.

Definition (Expectation)
The expectation of X is given by
E(X) = ∫_{−∞}^{∞} x f_X(x) dx
whenever the integral exists.

In a similar way to the discrete case we have that if g : R → R then
E(g(X)) = ∫_{−∞}^{∞} g(x) f_X(x) dx
whenever the integral exists.
Variances of continuous random variables
Similarly, we can define the variance of a continuous RV X.

Definition (Variance)
The variance, Var(X), of a continuous RV X with density function f_X(x) is defined as
Var(X) = E((X − E(X))²) = ∫_{−∞}^{∞} (x − μ)² f_X(x) dx
whenever the integral exists and where μ = E(X).

Exercise: check that we again find the useful result connecting the second central moment to the first and second moments:
Var(X) = E(X²) − (E(X))².
Example: exponential distribution, Exp ( )
Suppose that the RV X has an exponential distribution with parameter λ > 0. Then, using integration by parts,
E(X) = ∫_0^∞ x λe^(−λx) dx
= [−xe^(−λx)]_0^∞ + ∫_0^∞ e^(−λx) dx
= 0 + (1/λ) ∫_0^∞ λe^(−λx) dx = 1/λ
and
E(X²) = ∫_0^∞ x² λe^(−λx) dx
= [−x²e^(−λx)]_0^∞ + ∫_0^∞ 2x e^(−λx) dx
= 0 + (2/λ) ∫_0^∞ x λe^(−λx) dx = 2/λ².
Hence, Var(X) = E(X²) − (E(X))² = 2/λ² − (1/λ)² = 1/λ².
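Both integrals can be checked numerically; a rough sketch using trapezoidal integration (the rate λ = 2 and the truncation point are illustrative choices, not from the slides):

```python
import math

lam = 2.0                                  # rate parameter (assumed value)
f = lambda x: lam * math.exp(-lam * x)     # Exp(lam) density on x > 0

def integrate(g, a, b, n=200_000):
    """Crude trapezoidal rule; the exponential tail beyond b is negligible."""
    h = (b - a) / n
    s = 0.5 * (g(a) + g(b)) + sum(g(a + i * h) for i in range(1, n))
    return s * h

upper = 40 / lam                           # truncation point for the tail
mean = integrate(lambda x: x * f(x), 0.0, upper)         # ~ 1/lam
second = integrate(lambda x: x * x * f(x), 0.0, upper)   # ~ 2/lam**2
var = second - mean ** 2                                 # ~ 1/lam**2

print(round(mean, 4), round(var, 4))   # 0.5 0.25
```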
Bivariate continuous random variables
Given a probability space (Ω, F, P), we may have multiple continuous RVs, X and Y, say.

Definition (Joint probability distribution function)
The joint probability distribution function is given by
F_{X,Y}(x, y) = P({ω|X(ω) ≤ x} ∩ {ω|Y(ω) ≤ y}) = P(X ≤ x, Y ≤ y)
for all x, y ∈ R.

Independence follows in a similar way to the discrete case and we say that two continuous RVs X and Y are independent if
F_{X,Y}(x, y) = F_X(x)F_Y(y)
for all x, y ∈ R.
Bivariate density functions
The bivariate density of two continuous RVs X and Y satisfies
F_{X,Y}(x, y) = ∫_{u=−∞}^{x} ∫_{v=−∞}^{y} f_{X,Y}(u, v) dv du
and is given by
f_{X,Y}(x, y) = ∂²F_{X,Y}(x, y)/∂x∂y if the derivative exists at (x, y), and 0 otherwise.
We have that
f_{X,Y}(x, y) ≥ 0  ∀x, y ∈ R
and that
∫_{−∞}^{∞} ∫_{−∞}^{∞} f_{X,Y}(x, y) dx dy = 1.
Marginal densities and independence
If X and Y have a joint density function f_{X,Y}(x, y) then we have marginal densities
f_X(x) = ∫_{v=−∞}^{∞} f_{X,Y}(x, v) dv
and
f_Y(y) = ∫_{u=−∞}^{∞} f_{X,Y}(u, y) du.
In the case that X and Y are also independent then
f_{X,Y}(x, y) = f_X(x)f_Y(y)
for all x, y ∈ R.
Conditional density functions
The marginal density f_Y(y) tells us about the variation of the RV Y when we have no information about the RV X. Consider the opposite extreme when we have full information about X, namely that X = x, say. We cannot evaluate an expression like P(Y ≤ y | X = x) directly since for a continuous RV P(X = x) = 0 and our definition of conditional probability does not apply.
Instead, we first evaluate P(Y ≤ y | x ≤ X ≤ x + δx) for any δx > 0. We find that
P(Y ≤ y | x ≤ X ≤ x + δx) = P(Y ≤ y, x ≤ X ≤ x + δx) / P(x ≤ X ≤ x + δx)
= (∫_{u=x}^{x+δx} ∫_{v=−∞}^{y} f_{X,Y}(u, v) dv du) / (∫_{u=x}^{x+δx} f_X(u) du).
Conditional density functions, ctd
Now divide the numerator and denominator by δx and take the limit as δx → 0 to give
P(Y ≤ y | x ≤ X ≤ x + δx) → ∫_{v=−∞}^{y} (f_{X,Y}(x, v)/f_X(x)) dv = G(y), say,
where G(y) is a distribution function with corresponding density
g(y) = f_{X,Y}(x, y)/f_X(x).
Accordingly, we define the notion of a conditional density function as follows.

Definition
The conditional density function of Y given X = x is defined as
f_{Y|X}(y|x) = f_{X,Y}(x, y)/f_X(x)
defined for all y ∈ R and x ∈ R such that f_X(x) > 0.
Probability generating functions
A very common situation is when a RV, X, can take only non-negative integer values, that is Im(X) ⊆ {0, 1, 2, ...}. The probability mass function, P(X = k), is given by a sequence of values p_0, p_1, p_2, ... where
p_k = P(X = k)  ∀k ∈ {0, 1, 2, ...}
and we have that
p_k ≥ 0  ∀k ∈ {0, 1, 2, ...} and Σ_{k=0}^{∞} p_k = 1.
The terms of this sequence can be wrapped together to define a certain function called the probability generating function (PGF).

Definition (Probability generating function)
The probability generating function, G_X(z), of a (non-negative integer-valued) RV X is defined as
G_X(z) = Σ_{k=0}^{∞} p_k z^k
for all values of z such that the sum converges appropriately.
Elementary properties of the PGF
1. G_X(z) = Σ_{k=0}^{∞} p_k z^k so
G_X(0) = p_0 and G_X(1) = 1.
2. If g(t) = z^t then
G_X(z) = Σ_{k=0}^{∞} p_k z^k = Σ_{k=0}^{∞} g(k)P(X = k) = E(g(X)) = E(z^X).
3. The PGF is defined for all |z| ≤ 1 since
Σ_{k=0}^{∞} |p_k z^k| ≤ Σ_{k=0}^{∞} p_k = 1.
4. Importantly, the PGF characterizes the distribution of a RV in the sense that
G_X(z) = G_Y(z)  ∀z
if and only if
P(X = k) = P(Y = k)  ∀k ∈ {0, 1, 2, ...}.
Examples of PGFs
Example (Bernoulli distribution)
G_X(z) = q + pz where q = 1 − p.

Example (Binomial distribution, Bin(n, p))
G_X(z) = Σ_{k=0}^{n} C(n, k) p^k q^(n−k) z^k = (q + pz)^n where q = 1 − p.

Example (Geometric distribution, Geo(p))
G_X(z) = Σ_{k=1}^{∞} p q^(k−1) z^k = pz Σ_{k=0}^{∞} (qz)^k = pz/(1 − qz)
if |z| < q^(−1) and q = 1 − p.
Example (Uniform distribution, U(1, n))
G_X(z) = Σ_{k=1}^{n} z^k (1/n) = (z/n) Σ_{k=0}^{n−1} z^k = z(1 − z^n)/(n(1 − z)).

Example (Poisson distribution, Pois(λ))
G_X(z) = Σ_{k=0}^{∞} (λ^k e^(−λ)/k!) z^k = e^(λz) e^(−λ) = e^(λ(z−1)).
Further derivatives of the PGF
Taking the second derivative gives
G_X''(z) = Σ_{k=0}^{∞} k(k − 1) z^(k−2) P(X = k).
So that
G_X''(1) = Σ_{k=0}^{∞} k(k − 1) P(X = k) = E(X(X − 1)).
Generally, we have the following result.

Theorem
If the RV X has PGF G_X(z) then the r-th derivative of the PGF, written G_X^(r)(z), evaluated at z = 1 is such that
G_X^(r)(1) = E(X(X − 1)···(X − r + 1)).
Sums of independent random variables
The following theorem shows how PGFs can be used to find the PGF
of the sum of independent RVs.

Theorem
If X and Y are independent RVs with PGFs G_X(z) and G_Y(z) respectively then
G_{X+Y}(z) = G_X(z)G_Y(z).

Proof.
Using the independence of X and Y we have that
G_{X+Y}(z) = E(z^(X+Y)) = E(z^X z^Y) = E(z^X)E(z^Y) = G_X(z)G_Y(z).
PGF example: Poisson RVs
For example, suppose that X and Y are independent RVs
with X ∼ Pois(λ_1) and Y ∼ Pois(λ_2), respectively. Then
G_{X+Y}(z) = G_X(z)G_Y(z) = e^(λ_1(z−1)) e^(λ_2(z−1)) = e^((λ_1+λ_2)(z−1)).
Hence X + Y ∼ Pois(λ_1 + λ_2) is again a Poisson RV but with the parameter λ_1 + λ_2.
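The same closed-form fact can be checked directly on truncated pmfs: convolving the pmfs of Pois(λ_1) and Pois(λ_2) reproduces the pmf of Pois(λ_1 + λ_2) term by term. A sketch (the λ values and the truncation point are illustrative):

```python
import math

def pois_pmf(lam, kmax):
    """Truncated Poisson pmf; mass beyond kmax is negligible for these values."""
    return [math.exp(-lam) * lam**k / math.factorial(k) for k in range(kmax + 1)]

def convolve(p, q):
    """PMF of the sum of two independent non-negative integer RVs."""
    r = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            r[i + j] += pi * qj
    return r

lam1, lam2, kmax = 1.0, 5.0, 40
sum_pmf = convolve(pois_pmf(lam1, kmax), pois_pmf(lam2, kmax))
target = pois_pmf(lam1 + lam2, 2 * kmax)

err = max(abs(a - b) for a, b in zip(sum_pmf, target))
print(err < 1e-9)   # True
```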
PGF example: Uniform RVs
Consider the case of two fair dice with IID
outcomes X and Y, respectively, so that X ∼ U(1, 6) and Y ∼ U(1, 6). Let the total score be Z = X + Y and consider the probability generating function of Z given by G_Z(z) = G_X(z)G_Y(z). Then
G_Z(z) = (1/6)(z + z² + ··· + z⁶) × (1/6)(z + z² + ··· + z⁶)
= (1/36)[z² + 2z³ + 3z⁴ + 4z⁵ + 5z⁶ + 6z⁷ + 5z⁸ + 4z⁹ + 3z¹⁰ + 2z¹¹ + z¹²].
[Bar charts of the pmfs of X, Y and of the total score Z.]
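Since multiplying PGFs just convolves their coefficient sequences, the coefficients of G_Z(z) can be computed mechanically; a minimal sketch:

```python
# PGF coefficients are pmf values, so PGF multiplication is polynomial
# (coefficient) convolution.
die = [0.0] + [1 / 6] * 6          # index k holds P(X = k) for k = 0..6

def poly_mul(p, q):
    r = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            r[i + j] += a * b
    return r

total_pmf = poly_mul(die, die)     # pmf of Z = X + Y, coefficients of G_Z

# Coefficients of z^2..z^12, scaled by 36, recover the triangular shape.
print([round(36 * c) for c in total_pmf[2:]])   # [1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1]
```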
Elementary stochastic processes
Random walks
Consider a sequence Y_1, Y_2, ... of independent and identically distributed (IID) RVs with P(Y_i = 1) = p and P(Y_i = −1) = 1 − p where p ∈ [0, 1].

Definition (Simple random walk)
The simple random walk is a sequence of RVs {X_n | n ∈ {1, 2, ...}} defined by
X_n = X_0 + Y_1 + Y_2 + ··· + Y_n
where X_0 ∈ R is the starting value.

Definition (Simple symmetric random walk)
A simple symmetric random walk is a simple random walk with the choice p = 1/2.
[Plot of an example path X_n against n with starting value X_0 = 2.]
Examples
Practical examples of random walks abound across the physical sciences (motion of atomic particles) and the non-physical sciences (epidemics, gambling, asset prices).
The following is a simple model for the operation of a casino. Suppose that a gambler enters with a capital of X_0. At each stage the gambler places a stake of 1 and with probability p wins the gamble; otherwise the stake is lost. If the gambler wins, the stake is returned together with an additional sum of 1. Thus at each stage the gambler's capital increases by 1 with probability p or decreases by 1 with probability 1 − p. The gambler's capital X_n at stage n thus follows a simple random walk, except that the gambler is bankrupt if X_n reaches 0 and then cannot continue to any further stages.
Returning to the starting state for a simple random walk
Let X_n be a simple random walk and let
r_n = P(X_n = X_0) for n = 1, 2, ...
be the probability of returning to the starting state at time n. We will show the following theorem.

Theorem
If n is odd then r_n = 0; if n = 2m is even then
r_{2m} = C(2m, m) p^m (1 − p)^m.
Proof.
The position of the random walk will change by an amount
X_n − X_0 = Y_1 + Y_2 + ··· + Y_n between times 0 and n. Hence, for this change X_n − X_0 to be 0 there must be an equal number of up steps as down steps. This can never happen if n is odd and so r_n = 0 in this case. If n = 2m is even then note that the number of up steps in a total of n steps is a binomial RV with parameters 2m and p. Thus,
r_{2m} = P(X_n − X_0 = 0) = C(2m, m) p^m (1 − p)^m.
This result tells us about the probability of returning to the starting state at a given time n. We will now look at the probability that we ever return to our starting state. For convenience, and without loss of generality, we shall take our starting value as X_0 = 0 from now on.
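The formula for r_{2m} can be checked against brute-force enumeration of every ±1 path for a small m (the values of p and m here are illustrative):

```python
from itertools import product
from math import comb

p, m = 0.3, 3                      # illustrative parameters; 2m = 6 steps

# Brute force: add up the probability of every path of length 2m ending at 0.
r_exact = 0.0
for steps in product([+1, -1], repeat=2 * m):
    if sum(steps) == 0:
        ups = steps.count(+1)
        r_exact += p**ups * (1 - p)**(2 * m - ups)

# Closed form from the theorem.
r_formula = comb(2 * m, m) * p**m * (1 - p)**m

print(abs(r_exact - r_formula) < 1e-12)   # True
```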
Recurrence and transience of simple random walks
Note first that E(Y_i) = p − (1 − p) = 2p − 1 for each i ∈ {1, 2, ...}. Thus there is a net drift upwards if p > 1/2 and a net drift downwards if p < 1/2. Only in the case p = 1/2 is there neither a net drift upwards nor downwards.
We say that the simple random walk is recurrent if it is certain to revisit its starting state at some time in the future and transient otherwise.
We shall prove the following theorem.

Theorem
For a simple random walk with starting state X_0 = 0 the probability of revisiting the starting state is
P(X_n = 0 for some n ∈ {1, 2, ...}) = 1 − |2p − 1|.
Thus a simple random walk is recurrent only when p = 1/2.
Proof, ctd
Define generating functions for the sequences r_n and f_n (where f_n denotes the probability that the first return to the starting state occurs at time n, with f_0 = 0) by
R(z) = Σ_{n=0}^{∞} r_n z^n and F(z) = Σ_{n=0}^{∞} f_n z^n
where r_0 = 1 and f_0 = 0, and take |z| < 1. We have that
Σ_{n=1}^{∞} r_n z^n = Σ_{n=1}^{∞} Σ_{m=1}^{n} f_m r_{n−m} z^n
= Σ_{m=1}^{∞} Σ_{n=m}^{∞} f_m z^m r_{n−m} z^(n−m)
= Σ_{m=1}^{∞} f_m z^m Σ_{k=0}^{∞} r_k z^k
= F(z)R(z).
The left hand side is R(z) − r_0 z^0 = R(z) − 1; thus we have that
R(z) = R(z)F(z) + 1 if |z| < 1.
Proof, ctd
Now,
R(z) = Σ_{n=0}^{∞} r_n z^n
= Σ_{m=0}^{∞} r_{2m} z^(2m)  (as r_n = 0 if n is odd)
= Σ_{m=0}^{∞} C(2m, m) (p(1 − p)z²)^m
= (1 − 4p(1 − p)z²)^(−1/2).
The last step follows from the binomial series expansion of (1 − 4θ)^(−1/2) and the choice θ = p(1 − p)z².
Hence, F(z) = 1 − (1 − 4p(1 − p)z²)^(1/2) for |z| < 1.
Letting z → 1 then gives the probability of ever returning as F(1) = 1 − (1 − 4p(1 − p))^(1/2) = 1 − |2p − 1|.
Mean return time
Consider the recurrent case when p = 1/2 and set
T = min{n ≥ 1 | X_n = 0}, so that P(T = n) = f_n, where T is the time of the first return to the starting state. Then
E(T) = Σ_{n=1}^{∞} n f_n = G_T'(1)
where G_T(z) is the PGF of the RV T, and for p = 1/2 we have that 4p(1 − p) = 1 so
G_T(z) = 1 − (1 − z²)^(1/2)
so that
G_T'(z) = z(1 − z²)^(−1/2) → ∞ as z → 1.
Thus, the simple symmetric random walk (p = 1/2) is recurrent but the expected time to first return to the starting state is infinite.
The Gambler's ruin problem
We now consider a variant of the simple random walk. Consider two players A and B with a joint capital between them of N. Suppose that
initially A has X_0 = a (0 ≤ a ≤ N). At each time step player B gives A 1 with probability p, and with probability q = 1 − p player A gives 1 to B instead. The outcomes at each time step are independent. The game ends at the first time T_a at which either X_{T_a} = 0 or X_{T_a} = N, for some T_a ∈ {0, 1, ...}. We can think of A's wealth, X_n, at time n as a simple random walk on the states {0, 1, ..., N} with absorbing barriers at 0 and N. Define the probability of ruin for gambler A as
θ_a = P(A is ruined) = P(B wins) for 0 ≤ a ≤ N.
[Plot of an example path with N = 5 and X_0 = a = 2 reaching X_4 = 0, so that T_2 = 4.]
Solution of the Gambler's ruin problem

Theorem
The probability of ruin when A starts with an initial capital of a is given by
θ_a = (ρ^a − ρ^N)/(1 − ρ^N) if p ≠ q,
θ_a = 1 − a/N if p = q = 1/2,
where ρ = q/p.
For illustration here is a set of graphs of θ_a for N = 100 and three possible choices of p.
[Graphs of θ_a against a = 0, ..., 100 for p = 0.49, p = 0.5 and p = 0.51.]
Proof
Consider what happens at the first time step:
θ_a = P(ruin ∩ Y_1 = +1 | X_0 = a) + P(ruin ∩ Y_1 = −1 | X_0 = a)
= p P(ruin | X_0 = a + 1) + q P(ruin | X_0 = a − 1)
= p θ_{a+1} + q θ_{a−1}.
Now look for a solution to this difference equation with boundary conditions θ_0 = 1 and θ_N = 0. Try a solution of the form θ_a = λ^a to give
λ^a = p λ^(a+1) + q λ^(a−1).
Hence,
p λ² − λ + q = 0
with solutions λ = 1 and λ = q/p.
Proof, ctd
If p ≠ q there are two distinct solutions and the general solution of the difference equation is of the form A + B(q/p)^a.
Applying the boundary conditions
1 = θ_0 = A + B and 0 = θ_N = A + B(q/p)^N
we get
A = −B(q/p)^N
and
1 = B − B(q/p)^N
so
B = 1/(1 − (q/p)^N) and A = −(q/p)^N/(1 − (q/p)^N).
Hence,
θ_a = ((q/p)^a − (q/p)^N)/(1 − (q/p)^N).
Proof, ctd
If p = q = 1/2 then the general solution is C + Da. So with the boundary conditions
1 = θ_0 = C + D(0) and 0 = θ_N = C + D(N).
Therefore,
C = 1 and 0 = 1 + D(N)
so
D = −1/N
and
θ_a = 1 − a/N.
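The p ≠ q branch of the theorem can be checked by confirming that the formula satisfies the boundary conditions and the difference equation θ_a = pθ_{a+1} + qθ_{a−1} used in the proof above; a sketch with illustrative N and p:

```python
N, p = 10, 0.6                     # illustrative parameters (p != q)
q = 1 - p
rho = q / p

def ruin(a):
    """theta_a from the theorem, p != q branch."""
    return (rho**a - rho**N) / (1 - rho**N)

# Boundary conditions theta_0 = 1 and theta_N = 0 ...
ok_boundary = abs(ruin(0) - 1) < 1e-12 and abs(ruin(N)) < 1e-12
# ... and the difference equation theta_a = p*theta_{a+1} + q*theta_{a-1}.
ok_recurrence = all(
    abs(ruin(a) - (p * ruin(a + 1) + q * ruin(a - 1))) < 1e-12
    for a in range(1, N)
)
print(ok_boundary, ok_recurrence)   # True True
```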
Mean duration time
Set T_a as the time to be absorbed at either 0 or N starting from the initial state a, and write μ_a = E(T_a).
Then, conditioning on the first step as before,
μ_a = 1 + p μ_{a+1} + q μ_{a−1} for 1 ≤ a ≤ N − 1
and μ_0 = μ_N = 0. It can be shown that μ_a is given by
μ_a = (1/(p − q)) (N ((q/p)^a − 1)/((q/p)^N − 1) − a) if p ≠ q,
μ_a = a(N − a) if p = q = 1/2.
We skip the proof here but note the following cases can be used to establish the result.
Case p ≠ q: trying a particular solution of the form μ_a = ca shows that c = 1/(q − p), and the general solution is then of the form μ_a = A + B(q/p)^a + a/(q − p). Fixing the boundary conditions gives the result.
Case p = q = 1/2: now the particular solution is −a², so the general solution is of the form μ_a = A + Ba − a², and fixing the boundary conditions gives the result.
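The stated formula for μ_a can likewise be checked against its boundary conditions and the difference equation μ_a = 1 + pμ_{a+1} + qμ_{a−1}; a sketch for the p ≠ q branch (N and p are illustrative):

```python
N, p = 10, 0.6                     # illustrative parameters (p != q)
q = 1 - p
rho = q / p

def duration(a):
    """mu_a = E(T_a) from the stated formula, p != q branch."""
    return (N * (rho**a - 1) / (rho**N - 1) - a) / (p - q)

# Boundary conditions mu_0 = mu_N = 0 ...
ok_boundary = abs(duration(0)) < 1e-9 and abs(duration(N)) < 1e-9
# ... and the difference equation mu_a = 1 + p*mu_{a+1} + q*mu_{a-1}.
ok_recurrence = all(
    abs(duration(a) - (1 + p * duration(a + 1) + q * duration(a - 1))) < 1e-9
    for a in range(1, N)
)
print(ok_boundary, ok_recurrence)   # True True
```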
Properties of discrete RVs
RV X      | Parameters                   | Im(X)          | P(X = k)                   | E(X)      | Var(X)      | G_X(z)
Bernoulli | p ∈ [0, 1]                   | {0, 1}         | 1 − p if k = 0, p if k = 1 | p         | p(1 − p)    | 1 − p + pz
Bin(n, p) | n ∈ {1, 2, ...}, p ∈ [0, 1]  | {0, 1, ..., n} | C(n, k) p^k (1 − p)^(n−k)  | np        | np(1 − p)   | (1 − p + pz)^n
Geo(p)    | 0 < p ≤ 1                    | {1, 2, ...}    | p(1 − p)^(k−1)             | 1/p       | (1 − p)/p²  | pz/(1 − (1 − p)z)
U(1, n)   | n ∈ {1, 2, ...}              | {1, 2, ..., n} | 1/n                        | (n + 1)/2 | (n² − 1)/12 | z(1 − z^n)/(n(1 − z))
Pois(λ)   | λ > 0                        | {0, 1, ...}    | λ^k e^(−λ)/k!              | λ         | λ           | e^(λ(z−1))
Properties of continuous RVs
RV X     | Parameters      | Im(X)  | f_X(x)                        | E(X)      | Var(X)
U(a, b)  | a, b ∈ R, a < b | (a, b) | 1/(b − a)                     | (a + b)/2 | (b − a)²/12
Exp(λ)   | λ > 0           | R⁺     | λe^(−λx)                      | 1/λ       | 1/λ²
N(μ, σ²) | μ ∈ R, σ² > 0   | R      | (1/√(2πσ²)) e^(−(x−μ)²/(2σ²)) | μ         | σ²
Notation
Ω : sample space of possible outcomes
F : event space, a set of random events
I(E) : indicator function of the event E ∈ F
P(E) : probability that event E occurs, e.g. E = {X = k}
RV : random variable
X ∼ U(0, 1) : RV X has the distribution U(0, 1)
P(X = k) : probability mass function of RV X
F_X(x) : distribution function, F_X(x) = P(X ≤ x)
f_X(x) : density of RV X given, when it exists, by F_X'(x)
PGF : probability generating function G_X(z) for RV X
E(X) : expected value of RV X
E(X^n) : n-th moment of RV X, for n = 1, 2, ...
Var(X) : variance of RV X
IID : independent, identically distributed
X̄_n : sample mean of the random sample X_1, X_2, ..., X_n