
Probability
Computer Science Tripos, Part IA

R.J. Gibbens

Computer Laboratory
University of Cambridge

Lent Term 2010/11

Last revision: 2010-12-17/r-43

Outline

Elementary probability theory (2 lectures)
Probability spaces, random variables, discrete/continuous distributions, means and variances, independence, conditional probabilities, Bayes's theorem.

Probability generating functions (1 lecture)
Definitions and properties; use in calculating moments of random variables and for finding the distribution of sums of independent random variables.

Multivariate distributions and independence (1 lecture)
Random vectors and independence; joint and marginal density functions; variance, covariance and correlation; conditional density functions.

Elementary stochastic processes (2 lectures)
Simple random walks; recurrence and transience; the Gambler's Ruin Problem and solution using difference equations.

    Elementary probability theory


Event spaces

We formalize the notion of an event space, $\mathcal{F}$, by requiring the following to hold.

Definition (Event space)
1. $\mathcal{F}$ is non-empty
2. $E \in \mathcal{F} \Rightarrow \Omega \setminus E \in \mathcal{F}$
3. $(\forall i \in I.\ E_i \in \mathcal{F}) \Rightarrow \bigcup_{i \in I} E_i \in \mathcal{F}$

Example
$\Omega$ any set and $\mathcal{F} = \mathcal{P}(\Omega)$, the power set of $\Omega$.

Example
$\Omega$ any set with some event $E \subseteq \Omega$ and $\mathcal{F} = \{\emptyset, E, \Omega \setminus E, \Omega\}$.
Note that $\Omega \setminus E$ is often written using the shorthand $E^c$ for the complement of $E$ with respect to $\Omega$.

Probability spaces

Given an experiment with outcomes in a sample space $\Omega$ with an event space $\mathcal{F}$, we associate probabilities to events by defining a probability function $P : \mathcal{F} \to \mathbb{R}$ as follows.

Definition (Probability function)
1. $\forall E \in \mathcal{F}.\ P(E) \geq 0$
2. $P(\Omega) = 1$ and $P(\emptyset) = 0$
3. If $E_i \in \mathcal{F}$ for $i \in I$ are disjoint (that is, $E_i \cap E_j = \emptyset$ for $i \neq j$) then
$$P\left(\bigcup_{i \in I} E_i\right) = \sum_{i \in I} P(E_i).$$

We call the triple $(\Omega, \mathcal{F}, P)$ a probability space.

Examples of probability spaces

$\Omega$ any set with some event $E$ ($E \neq \emptyset$, $E \neq \Omega$). Take $\mathcal{F} = \{\emptyset, E, \Omega \setminus E, \Omega\}$ as before and define the probability function $P$ by
$$P(\emptyset) = 0, \quad P(E) = p, \quad P(\Omega \setminus E) = 1 - p, \quad P(\Omega) = 1$$
for any $0 \leq p \leq 1$.

$\Omega = \{\omega_1, \omega_2, \ldots, \omega_n\}$ with $\mathcal{F} = \mathcal{P}(\Omega)$ and probabilities given for all $E \in \mathcal{F}$ by
$$P(E) = \frac{|E|}{n}.$$
For a six-sided fair die, $\Omega = \{1, 2, 3, 4, 5, 6\}$ and we take $P(\{i\}) = \frac{1}{6}$.

Conditional probabilities

Given a probability space $(\Omega, \mathcal{F}, P)$ and two events $E_1, E_2 \in \mathcal{F}$, how does knowledge that the random event $E_2$, say, has occurred influence the probability that $E_1$ has also occurred? This question leads to the notion of conditional probability.

Definition (Conditional probability)
If $P(E_2) > 0$, define the conditional probability, $P(E_1 \mid E_2)$, of $E_1$ given $E_2$ by
$$P(E_1 \mid E_2) = \frac{P(E_1 \cap E_2)}{P(E_2)}.$$

Note that $P(E_2 \mid E_2) = 1$.
Exercise: check that for any $E' \in \mathcal{F}$ such that $P(E') > 0$, $(\Omega, \mathcal{F}, Q)$ is a probability space, where $Q : \mathcal{F} \to \mathbb{R}$ is defined by $Q(E) = P(E \mid E')$ for all $E \in \mathcal{F}$.

Independent events

Given a probability space $(\Omega, \mathcal{F}, P)$ we can define independence between random events as follows.

Definition (Independent events)
Two events $E_1, E_2 \in \mathcal{F}$ are independent if
$$P(E_1 \cap E_2) = P(E_1)P(E_2).$$
Otherwise, the events are dependent.

Note that if $E_1$ and $E_2$ are independent events then
$$P(E_1 \mid E_2) = P(E_1) \quad \text{and} \quad P(E_2 \mid E_1) = P(E_2).$$

Independence of multiple events

More generally, a collection of events $\{E_i \mid i \in I\}$ are independent events if for all subsets $J$ of $I$
$$P\left(\bigcap_{j \in J} E_j\right) = \prod_{j \in J} P(E_j).$$
When this holds just for all those subsets $J$ such that $|J| = 2$ we have pairwise independence.
Note that pairwise independence does not imply independence (unless $|I| = 2$).

Venn diagrams
John Venn 1834–1923

Example ($|I| = 3$ events)
$E_1, E_2, E_3$ are independent events if
$$P(E_1 \cap E_2) = P(E_1)P(E_2)$$
$$P(E_1 \cap E_3) = P(E_1)P(E_3)$$
$$P(E_2 \cap E_3) = P(E_2)P(E_3)$$
$$P(E_1 \cap E_2 \cap E_3) = P(E_1)P(E_2)P(E_3).$$

Bayes' theorem
Thomas Bayes (1702–1761)

Theorem (Bayes' theorem)
If $E_1$ and $E_2$ are two events with $P(E_1) > 0$ and $P(E_2) > 0$ then
$$P(E_1 \mid E_2) = \frac{P(E_2 \mid E_1)P(E_1)}{P(E_2)}.$$

Proof.
We have that
$$P(E_1 \mid E_2)P(E_2) = P(E_1 \cap E_2) = P(E_2 \cap E_1) = P(E_2 \mid E_1)P(E_1).$$

Thus Bayes' theorem provides a way to reverse the order of conditioning.

Partitions

Given a probability space $(\Omega, \mathcal{F}, P)$ define a partition of $\Omega$ as follows.

Definition (Partition)
A partition of $\Omega$ is a collection of disjoint events $\{E_i \in \mathcal{F} \mid i \in I\}$ with
$$\bigcup_{i \in I} E_i = \Omega.$$

We then have the following theorem (a.k.a. the law of total probability).

Theorem (Partition theorem)
If $\{E_i \in \mathcal{F} \mid i \in I\}$ is a partition of $\Omega$ and $P(E_i) > 0$ for all $i \in I$ then
$$P(E) = \sum_{i \in I} P(E \mid E_i)P(E_i) \quad \forall E \in \mathcal{F}.$$

Proof of partition theorem

We prove the partition theorem as follows.

Proof.
$$P(E) = P\left(E \cap \left(\bigcup_{i \in I} E_i\right)\right) = P\left(\bigcup_{i \in I} (E \cap E_i)\right) = \sum_{i \in I} P(E \cap E_i) = \sum_{i \in I} P(E \mid E_i)P(E_i).$$

Bayes' theorem and partitions

A (slight) generalization of Bayes' theorem can be stated as follows, combining Bayes' theorem with the partition theorem:
$$P(E_i \mid E) = \frac{P(E \mid E_i)P(E_i)}{\sum_{j \in I} P(E \mid E_j)P(E_j)} \quad i \in I$$
where $\{E_i \in \mathcal{F} \mid i \in I\}$ forms a partition of $\Omega$.

As a special case consider the partition $\{E_1, E_2 = \Omega \setminus E_1\}$. Then we have
$$P(E_1 \mid E) = \frac{P(E \mid E_1)P(E_1)}{P(E \mid E_1)P(E_1) + P(E \mid \Omega \setminus E_1)P(\Omega \setminus E_1)}.$$

Bayes' theorem example

Suppose that you have a good game of table football two times in three, otherwise a poor game. Your chance of scoring a goal is 3/4 in a good game and 1/4 in a poor game.

What is your chance of scoring a goal in any given game? Conditional on having scored in a game, what is the chance that you had a good game?

So we know that
$$P(\mathit{Good}) = 2/3, \quad P(\mathit{Poor}) = 1/3, \quad P(\mathit{Score} \mid \mathit{Good}) = 3/4, \quad P(\mathit{Score} \mid \mathit{Poor}) = 1/4.$$

Bayes' theorem example, ctd

Thus, noting that $\{\mathit{Good}, \mathit{Poor}\}$ forms a partition of the sample space of outcomes,
$$P(\mathit{Score}) = P(\mathit{Score} \mid \mathit{Good})P(\mathit{Good}) + P(\mathit{Score} \mid \mathit{Poor})P(\mathit{Poor}) = (3/4)(2/3) + (1/4)(1/3) = 7/12.$$

Then by Bayes' theorem we have that
$$P(\mathit{Good} \mid \mathit{Score}) = \frac{P(\mathit{Score} \mid \mathit{Good})P(\mathit{Good})}{P(\mathit{Score})} = \frac{(3/4)(2/3)}{7/12} = 6/7.$$
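As a quick numerical check, here is a minimal Python sketch (not part of the original slides) that computes both quantities directly from the stated probabilities; the variable names are illustrative only.

```python
# Sketch: verify the table-football Bayes example with exact arithmetic.
from fractions import Fraction

p_good = Fraction(2, 3)              # P(Good)
p_poor = Fraction(1, 3)              # P(Poor)
p_score_given_good = Fraction(3, 4)  # P(Score | Good)
p_score_given_poor = Fraction(1, 4)  # P(Score | Poor)

# Law of total probability: P(Score)
p_score = p_score_given_good * p_good + p_score_given_poor * p_poor

# Bayes' theorem: P(Good | Score)
p_good_given_score = p_score_given_good * p_good / p_score

print(p_score)             # 7/12
print(p_good_given_score)  # 6/7
```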


Examples of discrete distributions

Example (Bernoulli distribution)
(Plot: probability mass function $P(X = k)$ for $k \in \{0, 1\}$, e.g. $p = 0.75$.)
Here $\mathrm{Im}(X) = \{0, 1\}$ and, for given $p \in [0, 1]$,
$$P(X = k) = \begin{cases} p & k = 1 \\ 1 - p & k = 0. \end{cases}$$

RV, X      Parameters    Im(X)   Mean   Variance
Bernoulli  p in [0, 1]   {0, 1}  p      p(1 - p)

Examples of discrete distributions, ctd

Example (Binomial distribution, Bin(n, p))
(Plot: probability mass function $P(X = k)$, e.g. $n = 10$, $p = 0.5$.)
Here $\mathrm{Im}(X) = \{0, 1, \ldots, n\}$ for some positive integer $n$ and given $p \in [0, 1]$
$$P(X = k) = \binom{n}{k} p^k (1 - p)^{n - k} \quad k \in \{0, 1, \ldots, n\}.$$

RV, X      Parameters                    Im(X)           Mean   Variance
Bin(n, p)  n in {1, 2, ...}, p in [0,1]  {0, 1, ..., n}  np     np(1 - p)

We use the notation $X \sim \mathrm{Bin}(n, p)$ as a shorthand for the statement that the RV $X$ is distributed according to the stated Binomial distribution. We shall use this shorthand notation for our other named distributions.

Examples of discrete distributions, ctd

Example (Geometric distribution, Geo(p))
(Plot: probability mass function $P(X = k)$, e.g. $p = 0.75$.)
Here $\mathrm{Im}(X) = \{1, 2, \ldots\}$ and $0 < p \leq 1$
$$P(X = k) = p(1 - p)^{k - 1} \quad k \in \{1, 2, \ldots\}.$$

RV, X    Parameters  Im(X)        Mean  Variance
Geo(p)   0 < p <= 1  {1, 2, ...}  1/p   (1 - p)/p^2

Notationally we write $X \sim \mathrm{Geo}(p)$.

Beware possible confusion: some authors prefer to define our $X - 1$ as a Geometric RV!

Examples of discrete distributions, ctd

Example (Poisson distribution, Pois(λ))
(Plot: probability mass functions for $\lambda = 1$ and $\lambda = 5$.)
Here $\mathrm{Im}(X) = \{0, 1, \ldots\}$ and $\lambda > 0$
$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!} \quad k \in \{0, 1, \ldots\}.$$

RV, X     Parameters  Im(X)        Mean  Variance
Pois(λ)   λ > 0       {0, 1, ...}  λ     λ

Notationally we write $X \sim \mathrm{Pois}(\lambda)$.
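The following Python sketch (not part of the original slides) evaluates these named probability mass functions and checks that each sums to (approximately) 1 with the stated mean; the parameter values and truncation point kmax are arbitrary choices for illustration.

```python
# Sketch: evaluate the Binomial, Geometric and Poisson PMFs numerically.
from math import comb, exp, factorial

def binomial_pmf(n, p):
    return {k: comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)}

def geometric_pmf(p, kmax=100):
    return {k: p * (1 - p)**(k - 1) for k in range(1, kmax + 1)}  # truncated

def poisson_pmf(lam, kmax=100):
    return {k: lam**k * exp(-lam) / factorial(k) for k in range(kmax + 1)}  # truncated

for name, pmf, mean in [("Bin(10, 0.5)", binomial_pmf(10, 0.5), 10 * 0.5),
                        ("Geo(0.75)", geometric_pmf(0.75), 1 / 0.75),
                        ("Pois(5)", poisson_pmf(5.0), 5.0)]:
    total = sum(pmf.values())
    est_mean = sum(k * q for k, q in pmf.items())
    print(f"{name}: total={total:.6f}, mean={est_mean:.6f} (expected {mean:.6f})")
```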

Expectation

One way to summarize the distribution of some RV, $X$, would be to construct a weighted average of the observed values, weighted by the probabilities of actually observing these values. This is the idea of expectation, defined as follows.

Definition (Expectation)
The expectation, $E(X)$, of a discrete RV $X$ is defined as
$$E(X) = \sum_{x \in \mathrm{Im}(X)} x\, P(X = x)$$
so long as this sum is (absolutely) convergent (that is, $\sum_{x \in \mathrm{Im}(X)} |x\, P(X = x)| < \infty$).

The expectation of a RV $X$ is also known as the expected value, the mean, the first moment or simply the average.

Expectations and transformations

Suppose that $X$ is a discrete RV and $g : \mathbb{R} \to \mathbb{R}$ is some transformation. We can check that $Y = g(X)$ is again a RV, defined by $Y(\omega) = g(X)(\omega) = g(X(\omega))$.

Theorem
We have that
$$E(g(X)) = \sum_x g(x)\, P(X = x)$$
whenever the sum is absolutely convergent.

Proof.
$$E(g(X)) = E(Y) = \sum_{y \in g(\mathrm{Im}(X))} y\, P(Y = y) = \sum_{y \in g(\mathrm{Im}(X))} y \sum_{x \in \mathrm{Im}(X) : g(x) = y} P(X = x) = \sum_{x \in \mathrm{Im}(X)} g(x)\, P(X = x).$$

First and second moments of random variables

Just as the expectation or mean, $E(X)$, is called the first moment of the RV $X$, $E(X^2)$ is called the second moment of $X$. The variance $\mathrm{Var}(X) = E\big((X - E(X))^2\big)$ is called the second central moment of $X$, since it measures the dispersion in the values of $X$ centred about their mean value.

Note that we have the following property, where $a, b \in \mathbb{R}$:
$$\mathrm{Var}(aX + b) = E\big((aX + b - E(aX + b))^2\big) = E\big((aX + b - aE(X) - b)^2\big) = E\big(a^2 (X - E(X))^2\big) = a^2\, \mathrm{Var}(X).$$

Calculating variances

Note that we can expand our expression for the variance, where again we write $\mu = E(X)$:
$$\mathrm{Var}(X) = \sum_x (x - \mu)^2 P(X = x) = \sum_x (x^2 - 2\mu x + \mu^2) P(X = x) = \sum_x x^2 P(X = x) - 2\mu \sum_x x P(X = x) + \mu^2 \sum_x P(X = x) = E(X^2) - 2\mu^2 + \mu^2 = E(X^2) - \mu^2 = E(X^2) - (E(X))^2.$$

This useful result expresses the second central moment of a RV $X$ in terms of the first and second moments of $X$. This is usually the best method to calculate the variance.
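As a small illustration (not part of the original slides), the sketch below computes the variance of a fair die score both from the definition and via $E(X^2) - (E(X))^2$, confirming that the two agree.

```python
# Sketch: variance of a fair six-sided die computed two ways.
pmf = {k: 1 / 6 for k in range(1, 7)}   # fair die

mean = sum(x * p for x, p in pmf.items())
var_central = sum((x - mean) ** 2 * p for x, p in pmf.items())   # definition
second_moment = sum(x ** 2 * p for x, p in pmf.items())
var_moments = second_moment - mean ** 2                          # E(X^2) - (E X)^2

print(mean)          # 3.5
print(var_central)   # 2.9166...
print(var_moments)   # 2.9166... (same value)
```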

Bivariate random variables

Given a probability space $(\Omega, \mathcal{F}, P)$, we may have two RVs, $X$ and $Y$, say. We can then use a joint probability mass function
$$P(X = x, Y = y) = P(\{\omega \mid X(\omega) = x\} \cap \{\omega \mid Y(\omega) = y\})$$
for all $x, y \in \mathbb{R}$.

We can recover the individual probability mass functions for $X$ and $Y$ as follows:
$$P(X = x) = P(\{\omega \mid X(\omega) = x\}) = P\left(\bigcup_{y \in \mathrm{Im}(Y)} \{\omega \mid X(\omega) = x\} \cap \{\omega \mid Y(\omega) = y\}\right) = \sum_{y \in \mathrm{Im}(Y)} P(X = x, Y = y).$$
Similarly,
$$P(Y = y) = \sum_{x \in \mathrm{Im}(X)} P(X = x, Y = y).$$

Transformations of random variables

If $g : \mathbb{R}^2 \to \mathbb{R}$ then we get a similar result to that obtained in the univariate case:
$$E(g(X, Y)) = \sum_{x \in \mathrm{Im}(X)} \sum_{y \in \mathrm{Im}(Y)} g(x, y)\, P(X = x, Y = y).$$
This idea can be extended to probability mass functions in the multivariate case with three or more RVs.

The linear transformation occurs frequently and is given by $g(x, y) = ax + by + c$ where $a, b, c \in \mathbb{R}$. In this case we find that
$$E(aX + bY + c) = \sum_x \sum_y (ax + by + c)\, P(X = x, Y = y) = a \sum_x x\, P(X = x) + b \sum_y y\, P(Y = y) + c = a E(X) + b E(Y) + c.$$

Independence of random variables

We have defined independence for events and can use the same idea for pairs of RVs $X$ and $Y$.

Definition
Two RVs $X$ and $Y$ are independent if $\{\omega \mid X(\omega) = x\}$ and $\{\omega \mid Y(\omega) = y\}$ are independent for all $x, y \in \mathbb{R}$.

Thus, if $X$ and $Y$ are independent,
$$P(X = x, Y = y) = P(X = x)P(Y = y).$$

If $X$ and $Y$ are independent discrete RVs with expected values $E(X)$ and $E(Y)$ respectively then
$$E(XY) = \sum_x \sum_y xy\, P(X = x, Y = y) = \sum_x \sum_y xy\, P(X = x)P(Y = y) = \sum_x x\, P(X = x) \sum_y y\, P(Y = y) = E(X)E(Y).$$

Variance of sums of RVs and covariance

Given a pair of RVs $X$ and $Y$, consider the variance of their sum $X + Y$:
$$\mathrm{Var}(X + Y) = E\big((X + Y - E(X + Y))^2\big) = E\big(((X - E(X)) + (Y - E(Y)))^2\big) = E\big((X - E(X))^2\big) + 2E\big((X - E(X))(Y - E(Y))\big) + E\big((Y - E(Y))^2\big) = \mathrm{Var}(X) + 2\,\mathrm{Cov}(X, Y) + \mathrm{Var}(Y)$$
where the covariance of $X$ and $Y$ is given by
$$\mathrm{Cov}(X, Y) = E\big((X - E(X))(Y - E(Y))\big) = E(XY) - E(X)E(Y).$$

So, if $X$ and $Y$ are independent RVs then $E(XY) = E(X)E(Y)$, so $\mathrm{Cov}(X, Y) = 0$ and we have that
$$\mathrm{Var}(X + Y) = \mathrm{Var}(X) + \mathrm{Var}(Y).$$
Notice also that if $Y = X$ then $\mathrm{Cov}(X, X) = \mathrm{Var}(X)$.
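A minimal sketch (not part of the original slides) checking the identity $\mathrm{Var}(X+Y) = \mathrm{Var}(X) + 2\,\mathrm{Cov}(X,Y) + \mathrm{Var}(Y)$ on a small, entirely hypothetical joint probability mass function.

```python
# Sketch: verify Var(X+Y) = Var(X) + 2 Cov(X,Y) + Var(Y) for a toy joint PMF.
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}  # hypothetical joint pmf

def expect(f):
    return sum(f(x, y) * p for (x, y), p in joint.items())

ex, ey = expect(lambda x, y: x), expect(lambda x, y: y)
var_x = expect(lambda x, y: (x - ex) ** 2)
var_y = expect(lambda x, y: (y - ey) ** 2)
cov = expect(lambda x, y: (x - ex) * (y - ey))
var_sum = expect(lambda x, y: (x + y - ex - ey) ** 2)

print(var_sum, var_x + 2 * cov + var_y)  # the two values agree
```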

Distribution functions

Given a probability space $(\Omega, \mathcal{F}, P)$ we have so far considered discrete RVs that can take a countable number of values. More generally, we define $X : \Omega \to \mathbb{R}$ as a random variable if
$$\{\omega \mid X(\omega) \leq x\} \in \mathcal{F} \quad \forall x \in \mathbb{R}.$$
Note that a discrete random variable, $X$, is a random variable in this sense since
$$\{\omega \mid X(\omega) \leq x\} = \bigcup_{x' \in \mathrm{Im}(X) : x' \leq x} \{\omega \mid X(\omega) = x'\} \in \mathcal{F}.$$

Definition (Distribution function)
If $X$ is a RV then the distribution function of $X$, written $F_X(x)$, is defined by
$$F_X(x) = P(\{\omega \mid X(\omega) \leq x\}) = P(X \leq x).$$

Properties of the distribution function

(Plot: a distribution function $F_X(x) = P(X \leq x)$ increasing from 0 to 1.)

1. If $x \leq y$ then $F_X(x) \leq F_X(y)$.
2. If $x \to -\infty$ then $F_X(x) \to 0$.
3. If $x \to \infty$ then $F_X(x) \to 1$.
4. If $a < b$ then $P(a < X \leq b) = F_X(b) - F_X(a)$.

Continuous random variables

Random variables that take just a countable number of values are called discrete. More generally, we have that a RV can be defined by its distribution function, $F_X(x)$. A RV is said to be a continuous random variable when the distribution function has sufficient smoothness that
$$F_X(x) = P(X \leq x) = \int_{-\infty}^{x} f_X(u)\,du$$
for some function $f_X(x)$. We can then take
$$f_X(x) = \begin{cases} \dfrac{dF_X(x)}{dx} & \text{if the derivative exists at } x \\ 0 & \text{otherwise.} \end{cases}$$
The function $f_X(x)$ is called the probability density function of the continuous RV $X$, or often just the density of $X$. The density function for continuous RVs plays the analogous role to the probability mass function for discrete RVs.

Properties of the density function

(Plot: a non-negative density function $f_X(x)$ with unit area.)

1. $\forall x \in \mathbb{R}.\ f_X(x) \geq 0$.
2. $\int_{-\infty}^{\infty} f_X(x)\,dx = 1$.
3. If $a \leq b$ then $P(a \leq X \leq b) = \int_a^b f_X(x)\,dx$.

Examples of continuous random variables

We define some common continuous RVs, $X$, by their density functions, $f_X(x)$.

Example (Uniform distribution, U(a, b))
(Plot: density equal to $1/(b - a)$ on the interval $(a, b)$ and zero elsewhere.)
Given $a \in \mathbb{R}$ and $b \in \mathbb{R}$ with $a < b$ then
$$f_X(x) = \begin{cases} \dfrac{1}{b - a} & \text{if } a < x < b \\ 0 & \text{otherwise.} \end{cases}$$

RV, X    Parameters         Im(X)   Mean       Variance
U(a, b)  a, b in R, a < b   (a, b)  (a + b)/2  (b - a)^2/12

Notationally we write $X \sim U(a, b)$.

Examples of continuous random variables, ctd

Example (Exponential distribution, Exp(λ))
(Plot: density $f_X(x) = \lambda e^{-\lambda x}$ for $x > 0$, e.g. $\lambda = 1$.)
Given $\lambda > 0$ then
$$f_X(x) = \begin{cases} \lambda e^{-\lambda x} & \text{if } x > 0 \\ 0 & \text{otherwise.} \end{cases}$$

RV, X    Parameters  Im(X)  Mean  Variance
Exp(λ)   λ > 0       R+     1/λ   1/λ^2

Notationally we write $X \sim \mathrm{Exp}(\lambda)$.

Examples of continuous random variables, ctd

Example (Normal distribution, N(μ, σ²))
(Plot: bell-shaped density centred at $\mu$.)
Given $\mu \in \mathbb{R}$ and $\sigma^2 > 0$ then
$$f_X(x) = \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-(x - \mu)^2 / (2\sigma^2)} \quad -\infty < x < \infty.$$

RV, X      Parameters       Im(X)  Mean  Variance
N(μ, σ²)   μ in R, σ² > 0   R      μ     σ²

Notationally we write $X \sim N(\mu, \sigma^2)$.

Expectations of continuous random variables

Just as for discrete RVs we can define the expectation of a continuous RV with density function $f_X(x)$ by a weighted averaging.

Definition (Expectation)
The expectation of $X$ is given by
$$E(X) = \int_{-\infty}^{\infty} x f_X(x)\,dx$$
whenever the integral exists.

In a similar way to the discrete case, we have that if $g : \mathbb{R} \to \mathbb{R}$ then
$$E(g(X)) = \int_{-\infty}^{\infty} g(x) f_X(x)\,dx$$
whenever the integral exists.

Variances of continuous random variables

Similarly, we can define the variance of a continuous RV $X$.

Definition (Variance)
The variance, $\mathrm{Var}(X)$, of a continuous RV $X$ with density function $f_X(x)$ is defined as
$$\mathrm{Var}(X) = E\big((X - E(X))^2\big) = \int_{-\infty}^{\infty} (x - \mu)^2 f_X(x)\,dx$$
whenever the integral exists and where $\mu = E(X)$.

Exercise: check that we again find the useful result connecting the second central moment to the first and second moments:
$$\mathrm{Var}(X) = E(X^2) - (E(X))^2.$$

Example: exponential distribution, Exp(λ)

Suppose that the RV $X$ has an exponential distribution with parameter $\lambda > 0$. Then, using integration by parts,
$$E(X) = \int_0^{\infty} x \lambda e^{-\lambda x}\,dx = \left[-x e^{-\lambda x}\right]_0^{\infty} + \int_0^{\infty} e^{-\lambda x}\,dx = 0 + \frac{1}{\lambda} \int_0^{\infty} \lambda e^{-\lambda x}\,dx = \frac{1}{\lambda}$$
and
$$E(X^2) = \int_0^{\infty} x^2 \lambda e^{-\lambda x}\,dx = \left[-x^2 e^{-\lambda x}\right]_0^{\infty} + \int_0^{\infty} 2x e^{-\lambda x}\,dx = 0 + \frac{2}{\lambda} \int_0^{\infty} x \lambda e^{-\lambda x}\,dx = \frac{2}{\lambda^2}.$$
Hence,
$$\mathrm{Var}(X) = E(X^2) - (E(X))^2 = \frac{2}{\lambda^2} - \left(\frac{1}{\lambda}\right)^2 = \frac{1}{\lambda^2}.$$
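A quick Monte Carlo sketch (not part of the original slides) checking the mean $1/\lambda$ and variance $1/\lambda^2$ of the exponential distribution; the value of $\lambda$ and the sample size are arbitrary.

```python
# Sketch: Monte Carlo check of the Exp(lambda) mean and variance.
import random

lam = 2.0
n = 200_000
samples = [random.expovariate(lam) for _ in range(n)]

mean = sum(samples) / n
var = sum((x - mean) ** 2 for x in samples) / n

print(mean, 1 / lam)      # both ~0.5
print(var, 1 / lam ** 2)  # both ~0.25
```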

Bivariate continuous random variables

Given a probability space $(\Omega, \mathcal{F}, P)$, we may have multiple continuous RVs, $X$ and $Y$, say.

Definition (Joint probability distribution function)
The joint probability distribution function is given by
$$F_{X,Y}(x, y) = P(\{\omega \mid X(\omega) \leq x\} \cap \{\omega \mid Y(\omega) \leq y\}) = P(X \leq x, Y \leq y)$$
for all $x, y \in \mathbb{R}$.

Independence follows in a similar way to the discrete case, and we say that two continuous RVs $X$ and $Y$ are independent if
$$F_{X,Y}(x, y) = F_X(x) F_Y(y)$$
for all $x, y \in \mathbb{R}$.

Bivariate density functions

The bivariate density of two continuous RVs $X$ and $Y$ satisfies
$$F_{X,Y}(x, y) = \int_{u=-\infty}^{x} \int_{v=-\infty}^{y} f_{X,Y}(u, v)\,dv\,du$$
and is given by
$$f_{X,Y}(x, y) = \begin{cases} \dfrac{\partial^2}{\partial x\, \partial y} F_{X,Y}(x, y) & \text{if the derivative exists at } (x, y) \\ 0 & \text{otherwise.} \end{cases}$$
We have that
$$f_{X,Y}(x, y) \geq 0 \quad \forall x, y \in \mathbb{R}$$
and that
$$\int_{-\infty}^{\infty} \int_{-\infty}^{\infty} f_{X,Y}(x, y)\,dx\,dy = 1.$$

Marginal densities and independence

If $X$ and $Y$ have a joint density function $f_{X,Y}(x, y)$ then we have the marginal densities
$$f_X(x) = \int_{v=-\infty}^{\infty} f_{X,Y}(x, v)\,dv$$
and
$$f_Y(y) = \int_{u=-\infty}^{\infty} f_{X,Y}(u, y)\,du.$$
In the case that $X$ and $Y$ are also independent,
$$f_{X,Y}(x, y) = f_X(x) f_Y(y)$$
for all $x, y \in \mathbb{R}$.

Conditional density functions

The marginal density $f_Y(y)$ tells us about the variation of the RV $Y$ when we have no information about the RV $X$. Consider the opposite extreme when we have full information about $X$, namely that $X = x$, say. We cannot evaluate an expression like
$$P(Y \leq y \mid X = x)$$
directly, since for a continuous RV $P(X = x) = 0$ and our definition of conditional probability does not apply. Instead, we first evaluate $P(Y \leq y \mid x \leq X \leq x + \delta x)$ for any $\delta x > 0$. We find that
$$P(Y \leq y \mid x \leq X \leq x + \delta x) = \frac{P(Y \leq y, x \leq X \leq x + \delta x)}{P(x \leq X \leq x + \delta x)} = \frac{\int_{u=x}^{x+\delta x} \int_{v=-\infty}^{y} f_{X,Y}(u, v)\,dv\,du}{\int_{u=x}^{x+\delta x} f_X(u)\,du}.$$

Conditional density functions, ctd

Now divide the numerator and denominator by $\delta x$ and take the limit as $\delta x \to 0$ to give
$$P(Y \leq y \mid x \leq X \leq x + \delta x) \to \int_{v=-\infty}^{y} \frac{f_{X,Y}(x, v)}{f_X(x)}\,dv = G(y), \text{ say,}$$
where $G(y)$ is a distribution function with corresponding density
$$g(y) = \frac{f_{X,Y}(x, y)}{f_X(x)}.$$
Accordingly, we define the notion of a conditional density function as follows.

Definition
The conditional density function of $Y$ given $X = x$ is defined as
$$f_{Y \mid X}(y \mid x) = \frac{f_{X,Y}(x, y)}{f_X(x)}$$
defined for all $y \in \mathbb{R}$ and $x \in \mathbb{R}$ such that $f_X(x) > 0$.


    Probability generating functions

A very common situation is when a RV, $X$, can take only non-negative integer values, that is $\mathrm{Im}(X) \subseteq \{0, 1, 2, \ldots\}$. The probability mass function, $P(X = k)$, is given by a sequence of values $p_0, p_1, p_2, \ldots$ where
$$p_k = P(X = k) \quad k \in \{0, 1, 2, \ldots\}$$
and we have that
$$p_k \geq 0 \quad k \in \{0, 1, 2, \ldots\} \quad \text{and} \quad \sum_{k=0}^{\infty} p_k = 1.$$
The terms of this sequence can be wrapped together to define a certain function called the probability generating function (PGF).

Definition (Probability generating function)
The probability generating function, $G_X(z)$, of a (non-negative integer-valued) RV $X$ is defined as
$$G_X(z) = \sum_{k=0}^{\infty} p_k z^k$$
for all values of $z$ such that the sum converges appropriately.

Elementary properties of the PGF

1. $G_X(z) = \sum_{k=0}^{\infty} p_k z^k$, so
   $$G_X(0) = p_0 \quad \text{and} \quad G_X(1) = 1.$$
2. If $g(t) = z^t$ then
   $$G_X(z) = \sum_{k=0}^{\infty} p_k z^k = \sum_{k=0}^{\infty} g(k)\, P(X = k) = E(g(X)) = E(z^X).$$
3. The PGF is defined for all $|z| \leq 1$ since
   $$\sum_{k=0}^{\infty} |p_k z^k| \leq \sum_{k=0}^{\infty} p_k = 1.$$
4. Importantly, the PGF characterizes the distribution of a RV in the sense that
   $$G_X(z) = G_Y(z) \quad \forall z$$
   if and only if
   $$P(X = k) = P(Y = k) \quad k \in \{0, 1, 2, \ldots\}.$$

Examples of PGFs

Example (Bernoulli distribution)
$$G_X(z) = q + pz \quad \text{where } q = 1 - p.$$

Example (Binomial distribution, Bin(n, p))
$$G_X(z) = \sum_{k=0}^{n} \binom{n}{k} p^k q^{n-k} z^k = (q + pz)^n \quad \text{where } q = 1 - p.$$

Example (Geometric distribution, Geo(p))
$$G_X(z) = \sum_{k=1}^{\infty} p q^{k-1} z^k = pz \sum_{k=0}^{\infty} (qz)^k = \frac{pz}{1 - qz}$$
if $|z| < q^{-1}$ and $q = 1 - p$.

Examples of PGFs, ctd

Example (Uniform distribution, U(1, n))
$$G_X(z) = \sum_{k=1}^{n} z^k \frac{1}{n} = \frac{z}{n} \sum_{k=0}^{n-1} z^k = \frac{z(1 - z^n)}{n(1 - z)}.$$

Example (Poisson distribution, Pois(λ))
$$G_X(z) = \sum_{k=0}^{\infty} \frac{\lambda^k e^{-\lambda}}{k!} z^k = e^{\lambda z} e^{-\lambda} = e^{\lambda(z - 1)}.$$
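The following sketch (not part of the original slides) checks the Binomial PGF identity numerically by comparing $\sum_k P(X = k) z^k$ with the closed form $(q + pz)^n$ at a few points; the values of $n$, $p$ and $z$ are arbitrary.

```python
# Sketch: numerical check that the Bin(n, p) PGF equals (q + p*z)**n.
from math import comb

def pgf_from_pmf(n, p, z):
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) * z**k for k in range(n + 1))

n, p = 10, 0.3
for z in (0.0, 0.5, 1.0):
    print(pgf_from_pmf(n, p, z), (1 - p + p * z) ** n)  # the two columns agree
```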

Further derivatives of the PGF

Taking the second derivative gives
$$G''_X(z) = \sum_{k=0}^{\infty} k(k - 1) z^{k-2}\, P(X = k).$$
So that
$$G''_X(1) = \sum_{k=0}^{\infty} k(k - 1)\, P(X = k) = E(X(X - 1)).$$
Generally, we have the following result.

Theorem
If the RV $X$ has PGF $G_X(z)$ then the $r$-th derivative of the PGF, written $G^{(r)}_X(z)$, evaluated at $z = 1$ is such that
$$G^{(r)}_X(1) = E(X(X - 1) \cdots (X - r + 1)).$$
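For instance, differentiating the Poisson PGF $G_X(z) = e^{\lambda(z-1)}$ twice and evaluating at $z = 1$ gives $E(X(X-1)) = \lambda^2$. The sketch below (not part of the original slides) checks this by summing the factorial moment directly from the PMF; the value of $\lambda$ and the truncation are arbitrary.

```python
# Sketch: E[X(X-1)] for X ~ Pois(lam) should equal lam**2 (the second
# derivative of the PGF at z = 1).
from math import exp, factorial

lam = 3.0
e_fact2 = sum(k * (k - 1) * lam**k * exp(-lam) / factorial(k) for k in range(100))
print(e_fact2, lam ** 2)  # both ~9.0
```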

Sums of independent random variables

The following theorem shows how PGFs can be used to find the PGF of the sum of independent RVs.

Theorem
If $X$ and $Y$ are independent RVs with PGFs $G_X(z)$ and $G_Y(z)$ respectively then
$$G_{X+Y}(z) = G_X(z) G_Y(z).$$

Proof.
Using the independence of $X$ and $Y$ we have that
$$G_{X+Y}(z) = E(z^{X+Y}) = E(z^X z^Y) = E(z^X) E(z^Y) = G_X(z) G_Y(z).$$

PGF example: Poisson RVs

For example, suppose that $X$ and $Y$ are independent RVs with $X \sim \mathrm{Pois}(\lambda_1)$ and $Y \sim \mathrm{Pois}(\lambda_2)$, respectively. Then
$$G_{X+Y}(z) = G_X(z) G_Y(z) = e^{\lambda_1 (z - 1)} e^{\lambda_2 (z - 1)} = e^{(\lambda_1 + \lambda_2)(z - 1)}.$$
Hence $X + Y \sim \mathrm{Pois}(\lambda_1 + \lambda_2)$ is again a Poisson RV, but with parameter $\lambda_1 + \lambda_2$.
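A simulation sketch (not part of the original slides) comparing the empirical distribution of $X + Y$ with the $\mathrm{Pois}(\lambda_1 + \lambda_2)$ PMF; the rates, sample size and seed are arbitrary choices.

```python
# Sketch: simulate X ~ Pois(l1), Y ~ Pois(l2) and compare the empirical PMF of
# X + Y with the Pois(l1 + l2) PMF.
from collections import Counter
from math import exp, factorial
import numpy as np

rng = np.random.default_rng(0)
l1, l2, n = 2.0, 3.0, 100_000
total = rng.poisson(l1, n) + rng.poisson(l2, n)
counts = Counter(total.tolist())

for k in range(10):
    empirical = counts[k] / n
    exact = (l1 + l2) ** k * exp(-(l1 + l2)) / factorial(k)
    print(k, round(empirical, 4), round(exact, 4))
```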

PGF example: Uniform RVs

Consider the case of two fair dice with IID outcomes $X$ and $Y$, respectively, so that $X \sim U(1, 6)$ and $Y \sim U(1, 6)$. Let the total score be $Z = X + Y$ and consider the probability generating function of $Z$ given by $G_Z(z) = G_X(z) G_Y(z)$. Then
$$G_Z(z) = \frac{1}{6}(z + z^2 + \cdots + z^6) \cdot \frac{1}{6}(z + z^2 + \cdots + z^6) = \frac{1}{36}\left[z^2 + 2z^3 + 3z^4 + 4z^5 + 5z^6 + 6z^7 + 5z^8 + 4z^9 + 3z^{10} + 2z^{11} + z^{12}\right].$$

(Plots: the uniform PMFs of $X$ and $Y$ on $\{1, \ldots, 6\}$, each with mass $1/6$, and the triangular PMF of $Z$ on $\{2, \ldots, 12\}$ with peak $6/36$ at $k = 7$.)
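Multiplying the two PGFs is the same as convolving the two PMFs, and the coefficients can be recovered mechanically, as in this sketch (not part of the original slides).

```python
# Sketch: convolve the PMFs of two fair dice (i.e. multiply their PGFs as
# polynomials) to recover the distribution of the total score Z.
from fractions import Fraction

die = {k: Fraction(1, 6) for k in range(1, 7)}   # PGF coefficients of one die

pmf_z = {}
for i, pi in die.items():
    for j, pj in die.items():
        pmf_z[i + j] = pmf_z.get(i + j, Fraction(0)) + pi * pj

for k in range(2, 13):
    print(k, pmf_z[k])   # 2: 1/36, 3: 1/18, ..., 7: 1/6, ..., 12: 1/36
```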

    Elementary stochastic processes

Random walks

Consider a sequence $Y_1, Y_2, \ldots$ of independent and identically distributed (IID) RVs with $P(Y_i = 1) = p$ and $P(Y_i = -1) = 1 - p$, with $p \in [0, 1]$.

Definition (Simple random walk)
The simple random walk is a sequence of RVs $\{X_n \mid n \in \{1, 2, \ldots\}\}$ defined by
$$X_n = X_0 + Y_1 + Y_2 + \cdots + Y_n$$
where $X_0 \in \mathbb{R}$ is the starting value.

Definition (Simple symmetric random walk)
A simple symmetric random walk is a simple random walk with the choice $p = 1/2$.

(Plot: a sample path of a simple random walk started from $X_0 = 2$ over the first few steps.)
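A minimal sketch (not part of the original slides) simulating one path of a simple random walk; the starting value, step probability and seed are arbitrary.

```python
# Sketch: simulate one path of a simple random walk.
import random

def simple_random_walk(x0, p, n_steps, seed=1):
    rng = random.Random(seed)
    path = [x0]
    for _ in range(n_steps):
        step = 1 if rng.random() < p else -1   # +1 with probability p, else -1
        path.append(path[-1] + step)
    return path

print(simple_random_walk(x0=2, p=0.5, n_steps=10))
```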

Examples

Practical examples of random walks abound across the physical sciences (motion of atomic particles) and the non-physical sciences (epidemics, gambling, asset prices).

The following is a simple model for the operation of a casino. Suppose that a gambler enters with a capital of $X_0$. At each stage the gambler places a stake of 1 and with probability $p$ wins the gamble, otherwise the stake is lost. If the gambler wins, the stake is returned together with an additional sum of 1. Thus at each stage the gambler's capital increases by 1 with probability $p$ or decreases by 1 with probability $1 - p$.

The gambler's capital $X_n$ at stage $n$ thus follows a simple random walk, except that the gambler is bankrupt if $X_n$ reaches 0 and then cannot continue to any further stages.

Returning to the starting state for a simple random walk

Let $X_n$ be a simple random walk and
$$r_n = P(X_n = X_0) \quad \text{for } n = 1, 2, \ldots$$
the probability of returning to the starting state at time $n$. We will show the following theorem.

Theorem
If $n$ is odd then $r_n = 0$; else if $n = 2m$ is even then
$$r_{2m} = \binom{2m}{m} p^m (1 - p)^m.$$

Proof.
The position of the random walk will change by an amount
$$X_n - X_0 = Y_1 + Y_2 + \cdots + Y_n$$
between times 0 and $n$. Hence, for this change $X_n - X_0$ to be 0 there must be an equal number of up steps as down steps. This can never happen if $n$ is odd, and so $r_n = 0$ in this case. If $n = 2m$ is even then note that the number of up steps in a total of $n$ steps is a binomial RV with parameters $2m$ and $p$. Thus,
$$r_{2m} = P(X_n - X_0 = 0) = \binom{2m}{m} p^m (1 - p)^m.$$

This result tells us about the probability of returning to the starting state at a given time $n$. We will now look at the probability that we ever return to our starting state. For convenience, and without loss of generality, we shall take our starting value as $X_0 = 0$ from now on.
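A simulation sketch (not part of the original slides) comparing the formula $r_{2m} = \binom{2m}{m} p^m (1-p)^m$ with the empirical frequency of $X_{2m} = X_0$; the values of $p$, $m$, the number of trials and the seed are arbitrary.

```python
# Sketch: empirical check of the return probability r_{2m}.
from math import comb
import random

p, m, trials = 0.4, 5, 100_000
rng = random.Random(0)

hits = 0
for _ in range(trials):
    position = 0
    for _ in range(2 * m):
        position += 1 if rng.random() < p else -1
    hits += (position == 0)

print(hits / trials, comb(2 * m, m) * p**m * (1 - p)**m)  # roughly equal
```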

Recurrence and transience of simple random walks

Note first that $E(Y_i) = p - (1 - p) = 2p - 1$ for each $i \in \{1, 2, \ldots\}$. Thus there is a net drift upwards if $p > 1/2$ and a net drift downwards if $p < 1/2$. Only in the case $p = 1/2$ is there no net drift upwards nor downwards.

We say that the simple random walk is recurrent if it is certain to revisit its starting state at some time in the future, and transient otherwise. We shall prove the following theorem.

Theorem
For a simple random walk with starting state $X_0 = 0$, the probability of revisiting the starting state is
$$P(X_n = 0 \text{ for some } n \in \{1, 2, \ldots\}) = 1 - |2p - 1|.$$
Thus a simple random walk is recurrent only when $p = 1/2$.

Proof, ctd

Define generating functions for the sequences $r_n$ and $f_n$ (where $f_n$ is the probability that the walk first returns to its starting state at time $n$) by
$$R(z) = \sum_{n=0}^{\infty} r_n z^n \quad \text{and} \quad F(z) = \sum_{n=0}^{\infty} f_n z^n$$
where $r_0 = 1$ and $f_0 = 0$, and take $|z| < 1$. Decomposing a return at time $n$ according to the time $m$ of the first return, we have that
$$\sum_{n=1}^{\infty} r_n z^n = \sum_{n=1}^{\infty} \sum_{m=1}^{n} f_m r_{n-m} z^n = \sum_{m=1}^{\infty} \sum_{n=m}^{\infty} f_m z^m r_{n-m} z^{n-m} = \sum_{m=1}^{\infty} f_m z^m \sum_{k=0}^{\infty} r_k z^k = F(z) R(z).$$
The left hand side is $R(z) - r_0 z^0 = R(z) - 1$, thus we have that
$$R(z) = R(z) F(z) + 1 \quad \text{if } |z| < 1.$$

Proof, ctd

Now,
$$R(z) = \sum_{n=0}^{\infty} r_n z^n = \sum_{m=0}^{\infty} r_{2m} z^{2m} \quad \text{as } r_n = 0 \text{ if } n \text{ is odd}$$
$$= \sum_{m=0}^{\infty} \binom{2m}{m} \big(p(1 - p)z^2\big)^m = \big(1 - 4p(1 - p)z^2\big)^{-\frac{1}{2}}.$$
The last step follows from the binomial series expansion of $(1 - 4\theta)^{-\frac{1}{2}}$ and the choice $\theta = p(1 - p)z^2$.

Hence,
$$F(z) = 1 - \big(1 - 4p(1 - p)z^2\big)^{\frac{1}{2}} \quad \text{for } |z| < 1.$$

Mean return time

Consider the recurrent case when $p = 1/2$ and set
$$T = \min\{n \geq 1 \mid X_n = 0\} \quad \text{so that } P(T = n) = f_n,$$
where $T$ is the time of the first return to the starting state. Then
$$E(T) = \sum_{n=1}^{\infty} n f_n = G'_T(1)$$
where $G_T(z)$ is the PGF of the RV $T$, and for $p = 1/2$ we have that $4p(1 - p) = 1$, so
$$G_T(z) = 1 - (1 - z^2)^{\frac{1}{2}}$$
so that
$$G'_T(z) = z(1 - z^2)^{-\frac{1}{2}} \to \infty \quad \text{as } z \to 1.$$
Thus, the simple symmetric random walk ($p = 1/2$) is recurrent but the expected time to first return to the starting state is infinite.

The Gambler's ruin problem

We now consider a variant of the simple random walk. Consider two players A and B with a joint capital between them of $N$. Suppose that initially A has $X_0 = a$ ($0 \leq a \leq N$).

At each time step player B gives A 1 with probability $p$, and with probability $q = 1 - p$ player A gives 1 to B instead. The outcomes at each time step are independent. The game ends at the first time $T_a$ at which either $X_{T_a} = 0$ or $X_{T_a} = N$, for some $T_a \in \{0, 1, \ldots\}$.

We can think of A's wealth, $X_n$, at time $n$ as a simple random walk on the states $\{0, 1, \ldots, N\}$ with absorbing barriers at 0 and $N$. Define the probability of ruin for gambler A as
$$\rho_a = P(\text{A is ruined}) = P(\text{B wins}) \quad \text{for } 0 \leq a \leq N.$$

(Plot: a sample path with $N = 5$ and $X_0 = a = 2$ that is absorbed at 0 at time $T_2 = 4$, so $X_{T_2} = X_4 = 0$.)

Solution of the Gambler's ruin problem

Theorem
The probability of ruin when A starts with an initial capital of $a$ is given by
$$\rho_a = \begin{cases} \dfrac{\theta^a - \theta^N}{1 - \theta^N} & \text{if } p \neq q \\[1ex] 1 - \dfrac{a}{N} & \text{if } p = q = 1/2 \end{cases}$$
where $\theta = q/p$.

(Plot: for illustration, graphs of $\rho_a$ against $a$ for $N = 100$ and three possible choices of $p$: $p = 0.49$, $p = 0.5$ and $p = 0.51$.)
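A sketch (not part of the original slides) comparing the ruin-probability formula with a Monte Carlo estimate; the values of $a$, $N$, $p$, the number of trials and the seed are arbitrary.

```python
# Sketch: Gambler's Ruin -- formula vs. simulation.
import random

def ruin_probability(a, N, p):
    q = 1 - p
    if p == q:
        return 1 - a / N
    theta = q / p
    return (theta**a - theta**N) / (1 - theta**N)

def simulate_ruin(a, N, p, trials=20_000, seed=0):
    rng = random.Random(seed)
    ruined = 0
    for _ in range(trials):
        x = a
        while 0 < x < N:
            x += 1 if rng.random() < p else -1
        ruined += (x == 0)
    return ruined / trials

a, N, p = 6, 20, 0.48
print(ruin_probability(a, N, p), simulate_ruin(a, N, p))  # roughly equal
```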

Proof

Consider what happens at the first time step:
$$\rho_a = P(\text{ruin} \cap Y_1 = +1 \mid X_0 = a) + P(\text{ruin} \cap Y_1 = -1 \mid X_0 = a) = p\, P(\text{ruin} \mid X_0 = a + 1) + q\, P(\text{ruin} \mid X_0 = a - 1) = p \rho_{a+1} + q \rho_{a-1}.$$
Now look for a solution to this difference equation with boundary conditions $\rho_0 = 1$ and $\rho_N = 0$. Try a solution of the form $\rho_a = \theta^a$ to give
$$\theta^a = p \theta^{a+1} + q \theta^{a-1}.$$
Hence,
$$p\theta^2 - \theta + q = 0$$
with solutions $\theta = 1$ and $\theta = q/p$.

Proof, ctd

If $p \neq q$ there are two distinct solutions and the general solution of the difference equation is of the form $\rho_a = A + B(q/p)^a$. Applying the boundary conditions
$$1 = \rho_0 = A + B \quad \text{and} \quad 0 = \rho_N = A + B(q/p)^N$$
we get
$$A = -B(q/p)^N \quad \text{and} \quad 1 = B - B(q/p)^N$$
so
$$B = \frac{1}{1 - (q/p)^N} \quad \text{and} \quad A = \frac{-(q/p)^N}{1 - (q/p)^N}.$$
Hence,
$$\rho_a = \frac{(q/p)^a - (q/p)^N}{1 - (q/p)^N}.$$

Proof, ctd

If $p = q = 1/2$ then the general solution is $\rho_a = C + Da$. So with the boundary conditions
$$1 = \rho_0 = C + D \cdot 0 \quad \text{and} \quad 0 = \rho_N = C + D \cdot N.$$
Therefore,
$$C = 1 \quad \text{and} \quad 0 = 1 + DN$$
so
$$D = -1/N$$
and
$$\rho_a = 1 - a/N.$$

Mean duration time

Set $T_a$ as the time to be absorbed at either 0 or $N$ starting from the initial state $a$, and write $\mu_a = E(T_a)$. Then, conditioning on the first step as before,
$$\mu_a = 1 + p \mu_{a+1} + q \mu_{a-1} \quad \text{for } 1 \leq a \leq N - 1$$
and $\mu_0 = \mu_N = 0$. It can be shown that $\mu_a$ is given by
$$\mu_a = \begin{cases} \dfrac{1}{p - q}\left(N\, \dfrac{(q/p)^a - 1}{(q/p)^N - 1} - a\right) & \text{if } p \neq q \\[1ex] a(N - a) & \text{if } p = q = 1/2. \end{cases}$$
We skip the proof here but note that the following cases can be used to establish the result.

Case $p \neq q$: trying a particular solution of the form $\mu_a = ca$ shows that $c = 1/(q - p)$, and the general solution is then of the form $\mu_a = A + B(q/p)^a + a/(q - p)$. Fixing the boundary conditions gives the result.

Case $p = q = 1/2$: now the particular solution is $-a^2$, so the general solution is of the form $\mu_a = A + Ba - a^2$, and fixing the boundary conditions gives the result.
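For the symmetric case the formula $\mu_a = a(N - a)$ is easy to check by simulation, as in this sketch (not part of the original slides); the values of $a$, $N$ and the trial count are arbitrary.

```python
# Sketch: for p = 1/2 the mean duration should be a*(N - a).
import random

def mean_duration(a, N, p, trials=20_000, seed=0):
    rng = random.Random(seed)
    total_steps = 0
    for _ in range(trials):
        x, steps = a, 0
        while 0 < x < N:
            x += 1 if rng.random() < p else -1
            steps += 1
        total_steps += steps
    return total_steps / trials

a, N = 4, 10
print(mean_duration(a, N, 0.5), a * (N - a))  # both ~24
```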

Properties of discrete RVs

RV, X      Parameters                    Im(X)           P(X = k)                   E(X)       Var(X)        G_X(z)
Bernoulli  p in [0,1]                    {0, 1}          1-p if k=0, p if k=1       p          p(1-p)        1 - p + pz
Bin(n, p)  n in {1,2,...}, p in [0,1]    {0, 1, ..., n}  C(n,k) p^k (1-p)^(n-k)     np         np(1-p)       (1 - p + pz)^n
Geo(p)     0 < p <= 1                    {1, 2, ...}     p (1-p)^(k-1)              1/p        (1-p)/p^2     pz / (1 - (1-p)z)
U(1, n)    n in {1,2,...}                {1, 2, ..., n}  1/n                        (n+1)/2    (n^2-1)/12    z(1 - z^n) / (n(1 - z))
Pois(λ)    λ > 0                         {0, 1, ...}     λ^k e^(-λ) / k!            λ          λ             e^(λ(z-1))

Properties of continuous RVs

RV, X      Parameters         Im(X)   f_X(x)                                 E(X)       Var(X)
U(a, b)    a, b in R, a < b   (a, b)  1/(b - a)                              (a+b)/2    (b-a)^2/12
Exp(λ)     λ > 0              R+      λ e^(-λx)                              1/λ        1/λ^2
N(μ, σ²)   μ in R, σ² > 0     R       (1/sqrt(2πσ²)) e^(-(x-μ)²/(2σ²))       μ          σ²

Notation

Ω            sample space of possible outcomes
F            event space: set of random events E
I(E)         indicator function of the event E ∈ F
P(E)         probability that event E occurs, e.g. E = {X = k}
RV           random variable
X ~ U(0,1)   RV X has the distribution U(0,1)
P(X = k)     probability mass function of RV X
F_X(x)       distribution function, F_X(x) = P(X ≤ x)
f_X(x)       density of RV X given, when it exists, by F'_X(x)
PGF          probability generating function G_X(z) for RV X
E(X)         expected value of RV X
E(X^n)       n-th moment of RV X, for n = 1, 2, ...
Var(X)       variance of RV X
IID          independent, identically distributed
X̄_n          sample mean of random sample X_1, X_2, ..., X_n