
    Probability

    Prof. F.P. Kelly

    Lent 1996

    These notes are maintained by Paul Metcalfe.

    Comments and corrections to [email protected].


    Revision: 1.1

    Date: 2002/09/20 14:45:43

    The following people have maintained these notes.

– June 2000: Kate Metcalfe
June 2000 – July 2004: Andrew Rogers
July 2004 – date: Paul Metcalfe


    Contents

Introduction

1 Basic Concepts
  1.1 Sample Space
  1.2 Classical Probability
  1.3 Combinatorial Analysis
  1.4 Stirling's Formula

2 The Axiomatic Approach
  2.1 The Axioms
  2.2 Independence
  2.3 Distributions
  2.4 Conditional Probability

3 Random Variables
  3.1 Expectation
  3.2 Variance
  3.3 Indicator Function
  3.4 Inclusion-Exclusion Formula
  3.5 Independence

4 Inequalities
  4.1 Jensen's Inequality
  4.2 Cauchy-Schwarz Inequality
  4.3 Markov's Inequality
  4.4 Chebyshev's Inequality
  4.5 Law of Large Numbers

5 Generating Functions
  5.1 Combinatorial Applications
  5.2 Conditional Expectation
  5.3 Properties of Conditional Expectation
  5.4 Branching Processes
  5.5 Random Walks

6 Continuous Random Variables
  6.1 Jointly Distributed Random Variables
  6.2 Transformation of Random Variables
  6.3 Moment Generating Functions
  6.4 Central Limit Theorem
  6.5 Multivariate Normal Distribution


    Introduction

These notes are based on the course Probability given by Prof. F.P. Kelly in Cambridge in the Lent Term 1996. This typed version of the notes is totally unconnected with Prof. Kelly.

Other sets of notes are available for different courses. At the time of typing these courses were:

    Probability Discrete Mathematics

    Analysis Further Analysis

    Methods Quantum Mechanics

    Fluid Dynamics 1 Quadratic Mathematics

    Geometry Dynamics of D.E.s

    Foundations of QM Electrodynamics

    Methods of Math. Phys Fluid Dynamics 2

    Waves (etc.) Statistical Physics

    General Relativity Dynamical Systems

    Combinatorics Bifurcations in Nonlinear Convection

    They may be downloaded from

    http://www.istari.ucam.org/maths/ .


    Chapter 1

    Basic Concepts

    1.1 Sample Space

Suppose we have an experiment with a set Ω of outcomes. Then Ω is called the sample space. A potential outcome ω ∈ Ω is called a sample point.

For instance, if the experiment is tossing a coin, then Ω = {H, T}; if the experiment is tossing two dice, then Ω = {(i, j) : i, j ∈ {1, . . . , 6}}.

A subset A of Ω is called an event. An event A occurs if, when the experiment is performed, the outcome ω satisfies ω ∈ A. For the coin-tossing experiment the event of a head appearing is A = {H}, and for the two dice the event "rolling a four" would be A = {(1, 3), (2, 2), (3, 1)}.

    1.2 Classical Probability

If Ω is finite, Ω = {ω1, . . . , ωn}, and each of the n sample points is equally likely, then the probability of event A occurring is

P(A) = |A| / |Ω|.

Example. Choose r digits from a table of random numbers. Find the probability that, for 0 ≤ k ≤ 9,

1. no digit exceeds k,

2. k is the greatest digit drawn.

Solution. The event that no digit exceeds k is

Ak = {(a1, . . . , ar) : 0 ≤ ai ≤ k, i = 1, . . . , r}.

Now |Ak| = (k + 1)^r, so that P(Ak) = ((k + 1)/10)^r.

Let Bk be the event that k is the greatest digit drawn. Then Bk = Ak \ Ak−1. Also Ak−1 ⊆ Ak, so that |Bk| = (k + 1)^r − k^r. Thus

P(Bk) = ((k + 1)^r − k^r) / 10^r.


    The problem of the points

Players A and B play a series of games. The winner of a game wins a point. The two players are equally skilful, and the stake will be won by the first player to reach a target. They are forced to stop when A is within 2 points and B is within 3 points of the target. How should the stake be divided?

    Pascal suggested that the following continuations were equally likely

AAAA
AAAB AABA ABAA BAAA
AABB ABBA ABAB BABA BAAB BBAA
ABBB BABB BBAB BBBA
BBBB

This makes the ratio 11 : 5. It was previously thought that the ratio should be 6 : 4 on considering termination, but these results are not equally likely.

    1.3 Combinatorial Analysis

    The fundamental rule is:

Suppose r experiments are such that the first may result in any of n1 possible outcomes, and such that for each of the possible outcomes of the first i − 1 experiments there are ni possible outcomes to experiment i. Let ai be the outcome of experiment i. Then there are a total of ∏_{i=1}^r ni distinct r-tuples (a1, . . . , ar) describing the possible outcomes of the r experiments.

    Proof. Induction.

1.4 Stirling's Formula

For functions g(n) and h(n), we say that g is asymptotically equivalent to h, and write g(n) ∼ h(n), if g(n)/h(n) → 1 as n → ∞.

Theorem 1.1 (Stirling's Formula). As n → ∞,

log ( n! / (√(2πn) n^n e^(−n)) ) → 0,

and thus n! ∼ √(2πn) n^n e^(−n).

We first prove the weak form of Stirling's formula, that log n! ∼ n log n.

Proof. log n! = Σ_{k=1}^n log k. Now

∫_1^n log x dx ≤ Σ_{k=1}^n log k ≤ ∫_1^(n+1) log x dx,

and ∫_1^z log x dx = z log z − z + 1, and so

n log n − n + 1 ≤ log n! ≤ (n + 1) log(n + 1) − n.

Divide by n log n and let n → ∞ to sandwich (log n!)/(n log n) between terms that tend to 1. Therefore log n! ∼ n log n.


    Now we prove the strong form.

Proof. For x > 0, we have

1 − x + x² − x³ < 1/(1 + x) < 1 − x + x².

Now integrate from 0 to y to obtain

y − y²/2 + y³/3 − y⁴/4 < log(1 + y) < y − y²/2 + y³/3.

Let hn = log ( n! e^n / n^(n+1/2) ). Then¹ we obtain

1/(12n²) − 1/(12n³) ≤ hn − hn+1 ≤ 1/(12n²) + 1/(6n³).

For n ≥ 2, 0 ≤ hn − hn+1 ≤ 1/n². Thus hn is a decreasing sequence, and

0 ≤ h2 − hn+1 = Σ_{r=2}^n (hr − hr+1) ≤ Σ_{r=1}^∞ 1/r².

Therefore hn is bounded below and decreasing, so it is convergent. Let the limit be A. We have obtained

n! ∼ e^A n^(n+1/2) e^(−n).

We need a trick to find A. Let Ir = ∫_0^(π/2) sin^r θ dθ. We obtain the recurrence Ir = ((r − 1)/r) Ir−2 by integrating by parts. Therefore

I2n = ((2n)! / (2^n n!)²) (π/2)   and   I2n+1 = (2^n n!)² / (2n + 1)!.

Now In is decreasing, so

1 ≤ I2n/I2n+1 ≤ I2n−1/I2n+1 = 1 + 1/(2n) → 1.

But by substituting our formula for n! in, we get that

I2n/I2n+1 ∼ π(2n + 1) / (n e^(2A)) → 2π / e^(2A).

Therefore e^(2A) = 2π, that is e^A = √(2π), as required.

¹ by playing silly buggers with log(1 + 1/n).
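A quick numerical sketch (an addition to these notes, not part of the original): comparing n! with √(2πn) n^n e^(−n) for a few n, the ratio should tend to 1 even though the absolute error grows.

```python
import math

def stirling(n: int) -> float:
    """Stirling's approximation: sqrt(2*pi*n) * n^n * e^(-n)."""
    return math.sqrt(2 * math.pi * n) * n ** n * math.exp(-n)

for n in (1, 2, 5, 10, 20, 50):
    exact = math.factorial(n)
    approx = stirling(n)
    # The ratio tends to 1; the absolute difference still grows with n.
    print(f"n = {n:3d}   n!/approx = {exact / approx:.6f}")
```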


    Chapter 2

    The Axiomatic Approach

    2.1 The Axioms

Let Ω be a sample space. Then probability P is a real-valued function defined on subsets of Ω satisfying:

1. 0 ≤ P(A) ≤ 1 for A ⊆ Ω,

2. P(Ω) = 1,

3. for a finite or infinite sequence A1, A2, . . . of disjoint events, P(⋃_i Ai) = Σ_i P(Ai).

    The number P(A) is called the probability of event A.

We can look at some distributions here. Consider an arbitrary finite or countable Ω = {ω1, ω2, . . . } and an arbitrary collection {p1, p2, . . . } of non-negative numbers with sum 1. If we define

P(A) = Σ_{i : ωi ∈ A} pi,

it is easy to see that this function satisfies the axioms. The numbers p1, p2, . . . are called a probability distribution. If Ω is finite with n elements, and if p1 = p2 = · · · = pn = 1/n, we recover the classical definition of probability.

Another example would be to let Ω = {0, 1, . . . } and attach to outcome r the probability pr = e^(−λ) λ^r / r! for some λ > 0. This is a distribution (as may be easily verified), and is called the Poisson distribution with parameter λ.

Theorem 2.1 (Properties of P). A probability P satisfies

1. P(Aᶜ) = 1 − P(A),

2. P(∅) = 0,

3. if A ⊆ B then P(A) ≤ P(B),

4. P(A ∪ B) = P(A) + P(B) − P(A ∩ B).


Proof. Note that Ω = A ∪ Aᶜ and A ∩ Aᶜ = ∅. Thus 1 = P(Ω) = P(A) + P(Aᶜ). Now we can use this to obtain P(∅) = 1 − P(∅ᶜ) = 1 − P(Ω) = 0. If A ⊆ B, write B = A ∪ (B ∩ Aᶜ), so that P(B) = P(A) + P(B ∩ Aᶜ) ≥ P(A). Finally, write A ∪ B = A ∪ (B ∩ Aᶜ) and B = (B ∩ A) ∪ (B ∩ Aᶜ). Then P(A ∪ B) = P(A) + P(B ∩ Aᶜ) and P(B) = P(B ∩ A) + P(B ∩ Aᶜ), which gives the result.

Theorem 2.2 (Boole's Inequality). For any A1, A2, . . . ,

P(⋃_{i=1}^n Ai) ≤ Σ_{i=1}^n P(Ai)   and   P(⋃_{i=1}^∞ Ai) ≤ Σ_{i=1}^∞ P(Ai).

Proof. Let B1 = A1 and then inductively let Bi = Ai \ ⋃_{k=1}^{i−1} Bk. Thus the Bi are disjoint and ⋃_i Bi = ⋃_i Ai. Therefore

P(⋃_i Ai) = P(⋃_i Bi) = Σ_i P(Bi) ≤ Σ_i P(Ai),   as Bi ⊆ Ai.

Theorem 2.3 (Inclusion-Exclusion Formula).

P(⋃_{i=1}^n Ai) = Σ_{S ⊆ {1,...,n}, S ≠ ∅} (−1)^(|S|−1) P(⋂_{j∈S} Aj).

Proof. We know that P(A1 ∪ A2) = P(A1) + P(A2) − P(A1 ∩ A2). Thus the result is true for n = 2. We also have that

P(A1 ∪ · · · ∪ An) = P(A1 ∪ · · · ∪ An−1) + P(An) − P((A1 ∪ · · · ∪ An−1) ∩ An).

But by distributivity, we have

P(⋃_{i=1}^n Ai) = P(⋃_{i=1}^{n−1} Ai) + P(An) − P(⋃_{i=1}^{n−1} (Ai ∩ An)).

Application of the inductive hypothesis yields the result.

Corollary (Bonferroni Inequalities).

Σ_{S ⊆ {1,...,n}, 1 ≤ |S| ≤ r} (−1)^(|S|−1) P(⋂_{j∈S} Aj)   ≤ P(⋃_{i=1}^n Ai)   or   ≥ P(⋃_{i=1}^n Ai)

according as r is even or odd. In other words, if the inclusion-exclusion formula is truncated, the error has the sign of the omitted term and is smaller in absolute value.

Note that the case r = 1 is Boole's inequality.


Proof. The result is true for r = n = 2. If it is true for n − 1 (and all r ≤ n − 1), then it is true for n and 1 ≤ r ≤ n − 1, by the inductive step above, which expresses an n-fold union in terms of two (n − 1)-fold unions. It is true for r = n by the inclusion-exclusion formula.

Example (Derangements). After a dinner, the n guests take coats at random from a pile. Find the probability that at least one guest has the right coat.

Solution. Let Ak be the event that guest k has his own coat. We want P(⋃_{i=1}^n Ai). Now,

P(Ai1 ∩ · · · ∩ Air) = (n − r)! / n!,

by counting the number of ways of matching guests and coats after i1, . . . , ir have taken theirs. Thus

Σ_{i1 < · · · < ir} P(Ai1 ∩ · · · ∩ Air) = C(n, r) (n − r)!/n! = 1/r!,

and so, by the inclusion-exclusion formula,

P(⋃_{i=1}^n Ai) = Σ_{r=1}^n (−1)^(r−1) / r! → 1 − e^(−1)   as n → ∞.
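A small check of this example (an addition, not from the original notes): the truncated inclusion-exclusion sum, a shuffle-based simulation and the limit 1 − 1/e should all roughly agree; the helper names below are just for this sketch.

```python
import math
import random

def p_at_least_one_match(n: int) -> float:
    """P(at least one guest gets the right coat), by inclusion-exclusion."""
    return sum((-1) ** (r - 1) / math.factorial(r) for r in range(1, n + 1))

def simulate(n: int, trials: int = 100_000) -> float:
    """Monte Carlo estimate: shuffle the coats and look for a fixed point."""
    hits = 0
    for _ in range(trials):
        coats = list(range(n))
        random.shuffle(coats)
        if any(guest == coat for guest, coat in enumerate(coats)):
            hits += 1
    return hits / trials

n = 10
print("inclusion-exclusion:", round(p_at_least_one_match(n), 6))
print("simulation:         ", round(simulate(n), 6))
print("limit 1 - 1/e:      ", round(1 - math.exp(-1), 6))
```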

2.2 Independence

Definition. Events A and B are independent if

P(A ∩ B) = P(A)P(B).

More generally, events A1, . . . , An are independent if for every subset {i1, . . . , ir} ⊆ {1, . . . , n},

P(Ai1 ∩ · · · ∩ Air) = P(Ai1) · · · P(Air).

Example. Two fair dice are thrown. Let A1 be the event that the first die shows an odd number, A2 the event that the second die shows an odd number, and A3 the event that the total is odd. Are A1, A2 and A3 independent?

Solution. We first calculate the probabilities of the events A1, A2, A3, A1 ∩ A2, A1 ∩ A3 and A1 ∩ A2 ∩ A3.

Event            Probability
A1               18/36 = 1/2
A2               as above, 1/2
A3               18/36 = 1/2
A1 ∩ A2          (3 · 3)/36 = 1/4
A1 ∩ A3          (3 · 3)/36 = 1/4
A1 ∩ A2 ∩ A3     0

Thus, by a series of multiplications, we can see that A1 and A2 are independent, A1 and A3 are independent (also A2 and A3), but that A1, A2 and A3 are not independent.

Now we wish to state what we mean by two independent experiments². Consider Ω1 = {α1, α2, . . . } and Ω2 = {β1, β2, . . . } with associated probability distributions {p1, p2, . . . } and {q1, q2, . . . }. Then, by two independent experiments, we mean the sample space Ω1 × Ω2 with probability distribution P((αi, βj)) = pi qj.

Now, suppose A ⊆ Ω1 and B ⊆ Ω2. The event A can be interpreted as an event in Ω1 × Ω2, namely A × Ω2, and similarly for B. Then

P(A ∩ B) = Σ_{i∈A} Σ_{j∈B} pi qj = (Σ_{i∈A} pi)(Σ_{j∈B} qj) = P(A)P(B),

which is why they are called independent experiments. The obvious generalisation to n experiments can be made, but for an infinite sequence of experiments we mean a sample space Ω1 × Ω2 × · · · satisfying the appropriate formula for all n ∈ ℕ.

You might like to find the probability that n independent tosses of a biased coin with probability of heads p result in a total of r heads.

    2.3 Distributions

The binomial distribution with parameters n and p, 0 ≤ p ≤ 1, has Ω = {0, . . . , n} and probabilities

pi = C(n, i) p^i (1 − p)^(n−i).

Theorem 2.4 (Poisson approximation to binomial). If n → ∞ and p → 0 with np = λ held fixed, then

C(n, r) p^r (1 − p)^(n−r) → e^(−λ) λ^r / r!.

² or, more generally, n independent experiments.


Proof.

C(n, r) p^r (1 − p)^(n−r) = [n(n − 1) · · · (n − r + 1) / r!] p^r (1 − p)^(n−r)
                          = (n/n)((n − 1)/n) · · · ((n − r + 1)/n) · ((np)^r / r!) (1 − p)^(n−r)
                          = [∏_{i=1}^r (n − i + 1)/n] · (λ^r / r!) (1 − λ/n)^(n−r)
                          → 1 · (λ^r / r!) · e^(−λ)
                          = e^(−λ) λ^r / r!.

Suppose an infinite sequence of independent trials is to be performed. Each trial results in a success with probability p ∈ (0, 1) or a failure with probability 1 − p. Such a sequence is called a sequence of Bernoulli trials. The probability that the first success occurs after exactly r failures is pr = p(1 − p)^r. This is the geometric distribution with parameter p. Since Σ_{r=0}^∞ pr = 1, the probability that all trials result in failure is zero.
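A short numerical illustration of Theorem 2.4 (an addition, not from the notes): for large n and small p with np = λ, the binomial probabilities are close to the Poisson ones. The choice n = 1000, λ = 2 is arbitrary.

```python
import math

def binom_pmf(n: int, p: float, r: int) -> float:
    return math.comb(n, r) * p ** r * (1 - p) ** (n - r)

def poisson_pmf(lam: float, r: int) -> float:
    return math.exp(-lam) * lam ** r / math.factorial(r)

n, lam = 1000, 2.0
p = lam / n
for r in range(6):
    print(f"r = {r}   binomial = {binom_pmf(n, p, r):.6f}   poisson = {poisson_pmf(lam, r):.6f}")
```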

    2.4 Conditional Probability

Definition 2.2. Provided P(B) > 0, we define the conditional probability of A given B³ to be

P(A|B) = P(A ∩ B) / P(B).

Whenever we write P(A|B), we assume that P(B) > 0. Note that if A and B are independent then P(A|B) = P(A).

Theorem 2.5.

1. P(A ∩ B) = P(A|B)P(B),

2. P(A ∩ B ∩ C) = P(A|B ∩ C)P(B|C)P(C),

3. P(A|B ∩ C) = P(A ∩ B|C) / P(B|C),

4. the function P(·|B) restricted to subsets of B is a probability function on B.

    4. the function P(|B) restricted to subsets ofB is a probability function on B.Proof. Results 1 to 3 are immediate from the definition of conditional probability. For

    result 4, note that AB B, so P(A B) P(B) and thus P(A|B) 1. P(B|B) =1 (obviously), so it just remains to show the last axiom. For disjoint Ais,

    P

    i

    Ai

    B

    =P(

    i(Ai B))P(B)

    =

    i P(Ai B)P(B)

    =i

    P(Ai|B) , as required.

³ read "A given B".


    Chapter 3

    Random Variables

Let Ω be finite or countable, and let pω = P({ω}) for ω ∈ Ω.

Definition 3.1. A random variable X is a function X : Ω → ℝ.

Note that "random variable" is a somewhat inaccurate term: a random variable is neither random nor a variable.

Example. If Ω = {(i, j) : 1 ≤ i, j ≤ t}, then we can define random variables X and Y by X(i, j) = i + j and Y(i, j) = max{i, j}.

Let RX be the image of Ω under X. When the range is finite or countable then the random variable is said to be discrete.

We write P(X = xi) for Σ_{ω : X(ω) = xi} pω, and for B ⊆ ℝ,

P(X ∈ B) = Σ_{x∈B} P(X = x).

Then

(P(X = x), x ∈ RX)

is the distribution of the random variable X. Note that it is a probability distribution over RX.

3.1 Expectation

Definition 3.2. The expectation of a random variable X is the number

E[X] = Σ_{ω∈Ω} pω X(ω),

provided that this sum converges absolutely.


Note that

E[X] = Σ_{ω∈Ω} pω X(ω)
     = Σ_{x∈RX} Σ_{ω : X(ω) = x} pω X(ω)
     = Σ_{x∈RX} x Σ_{ω : X(ω) = x} pω
     = Σ_{x∈RX} x P(X = x).

Absolute convergence allows the sum to be taken in any order.

If X is a positive random variable and if Σ_ω pω X(ω) = ∞ we write E[X] = +∞. If

Σ_{x∈RX, x≥0} x P(X = x) = +∞   and   Σ_{x∈RX, x<0} x P(X = x) = −∞,

then E[X] is undefined.

Example. Let X have the binomial distribution with parameters n and p. Find E[X].

Solution.

E[X] = Σ_{r=0}^n r C(n, r) p^r (1 − p)^(n−r)
     = Σ_{r=0}^n r [n! / (r!(n − r)!)] p^r (1 − p)^(n−r)
     = n Σ_{r=1}^n [(n − 1)! / ((r − 1)!(n − r)!)] p^r (1 − p)^(n−r)
     = np Σ_{r=1}^n [(n − 1)! / ((r − 1)!(n − r)!)] p^(r−1) (1 − p)^(n−r)
     = np Σ_{r=0}^{n−1} [(n − 1)! / (r!(n − 1 − r)!)] p^r (1 − p)^(n−1−r)
     = np Σ_{r=0}^{n−1} C(n − 1, r) p^r (1 − p)^(n−1−r)
     = np.

For any function f : ℝ → ℝ, the composition of f and X defines a new random variable f(X) given by

f(X)(ω) = f(X(ω)).

Example. If a, b and c are constants, then a + bX and (X − c)² are random variables defined by

(a + bX)(ω) = a + bX(ω)   and   (X − c)²(ω) = (X(ω) − c)².

Note that E[X] is a constant.

Theorem 3.1.

1. If X ≥ 0 then E[X] ≥ 0.

2. If X ≥ 0 and E[X] = 0 then P(X = 0) = 1.

3. If a and b are constants then E[a + bX] = a + bE[X].

4. For any random variables X, Y, E[X + Y] = E[X] + E[Y].

5. E[X] is the constant which minimises E[(X − c)²].

Proof. 1. X ≥ 0 means X(ω) ≥ 0 for all ω ∈ Ω, so E[X] = Σ_{ω∈Ω} pω X(ω) ≥ 0.

2. If there exists ω ∈ Ω with pω > 0 and X(ω) > 0 then E[X] > 0; therefore if E[X] = 0 then P(X = 0) = 1.


3.

E[a + bX] = Σ_{ω∈Ω} (a + bX(ω)) pω = a Σ_ω pω + b Σ_ω pω X(ω) = a + bE[X].

4. Trivial (expand the sum defining E[X + Y]).

5. Now

E[(X − c)²] = E[(X − E[X] + E[X] − c)²]
            = E[(X − E[X])² + 2(X − E[X])(E[X] − c) + (E[X] − c)²]
            = E[(X − E[X])²] + 2(E[X] − c)E[X − E[X]] + (E[X] − c)²
            = E[(X − E[X])²] + (E[X] − c)².

This is clearly minimised when c = E[X].

Theorem 3.2. For any random variables X1, X2, . . . , Xn,

E[Σ_{i=1}^n Xi] = Σ_{i=1}^n E[Xi].

Proof.

E[Σ_{i=1}^n Xi] = E[Σ_{i=1}^{n−1} Xi + Xn] = E[Σ_{i=1}^{n−1} Xi] + E[Xn].

The result follows by induction.

3.2 Variance

For a random variable X, the variance is

Var X = E[(X − E[X])²] = σ²,

and the standard deviation is σ = √(Var X).

Theorem 3.3 (Properties of Variance).

(i) Var X ≥ 0; if Var X = 0, then P(X = E[X]) = 1.

Proof: from properties 1 and 2 of expectation.

(ii) If a, b are constants, Var(a + bX) = b² Var X.


Proof.

Var(a + bX) = E[(a + bX − a − bE[X])²] = b² E[(X − E[X])²] = b² Var X.

(iii) Var X = E[X²] − E[X]².

Proof.

E[(X − E[X])²] = E[X² − 2XE[X] + (E[X])²]
              = E[X²] − 2E[X]E[X] + E[X]²
              = E[X²] − (E[X])².

Example. Let X have the geometric distribution P(X = r) = pq^r with r = 0, 1, 2, . . . and p + q = 1. Then E[X] = q/p and Var X = q/p².

Solution.

E[X] = Σ_{r=0}^∞ r p q^r = pq Σ_{r=0}^∞ r q^(r−1)
     = pq Σ_{r=0}^∞ d/dq (q^r) = pq d/dq (1/(1 − q))
     = pq / (1 − q)² = q/p.

E[X²] = Σ_{r=0}^∞ r² p q^r
      = pq [ Σ_{r=1}^∞ r(r + 1) q^(r−1) − Σ_{r=1}^∞ r q^(r−1) ]
      = pq [ 2/(1 − q)³ − 1/(1 − q)² ]
      = 2q/p² − q/p.

Var X = E[X²] − E[X]² = 2q/p² − q/p − (q/p)² = q/p².

Definition 3.3. The covariance of random variables X and Y is

Cov(X, Y) = E[(X − E[X])(Y − E[Y])].

The correlation of X and Y is

Corr(X, Y) = Cov(X, Y) / √(Var X Var Y).


    Linear Regression

Theorem 3.4. Var(X + Y) = Var X + Var Y + 2 Cov(X, Y).

Proof.

Var(X + Y) = E[((X + Y) − E[X] − E[Y])²]
           = E[(X − E[X])² + (Y − E[Y])² + 2(X − E[X])(Y − E[Y])]
           = Var X + Var Y + 2 Cov(X, Y).

3.3 Indicator Function

Definition 3.4. The indicator function I[A] of an event A ⊆ Ω is the function

I[A](ω) = 1 if ω ∈ A;  0 if ω ∉ A.   (3.1)

Note that I[A] is a random variable.

1. E[I[A]] = P(A), since

E[I[A]] = Σ_{ω∈Ω} pω I[A](ω) = P(A).

2. I[Aᶜ] = 1 − I[A].

3. I[A ∩ B] = I[A] I[B].

4. I[A ∪ B] = I[A] + I[B] − I[A] I[B]:

I[A ∪ B](ω) = 1 if ω ∈ A or ω ∈ B, and checking the cases, I[A ∪ B](ω) = I[A](ω) + I[B](ω) − I[A](ω)I[B](ω). WORKS!

Example. n couples are arranged randomly around a table such that males and females alternate. Let N be the number of husbands sitting next to their wives. Calculate E[N] and Var N.

Solution.

N = Σ_{i=1}^n I[Ai],   where Ai is the event that couple i sit together.

E[N] = E[Σ_{i=1}^n I[Ai]] = Σ_{i=1}^n E[I[Ai]] = Σ_{i=1}^n 2/n = 2.

E[N²] = E[(Σ_{i=1}^n I[Ai])²]
      = E[Σ_{i=1}^n I[Ai]² + Σ_{i≠j} I[Ai] I[Aj]]
      = n E[I[A1]²] + n(n − 1) E[I[A1] I[A2]].

Now E[I[A1]²] = E[I[A1]] = 2/n, and

E[I[A1] I[A2]] = P(A1 ∩ A2) = P(A1) P(A2|A1)
              = (2/n) [ (1/(n − 1)) (1/(n − 1)) + ((n − 2)/(n − 1)) (2/(n − 1)) ]

(conditioning on whether husband 2 occupies the one remaining seat adjacent to wife 1). Therefore

E[N²] = 2 + 2(1 + 2(n − 2))/(n − 1),

and

Var N = E[N²] − E[N]² = 2(1 + 2(n − 2))/(n − 1) − 2 = 2(n − 2)/(n − 1).
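A Monte Carlo check of this example (an addition, not in the original notes): seat the women in fixed alternating seats, place the men at random in between, count adjacent couples, and compare with E[N] = 2 and Var N = 2(n − 2)/(n − 1).

```python
import random
from statistics import mean, pvariance

def adjacent_couples(n: int) -> int:
    """Women sit in fixed alternating seats; men are placed at random in the gaps.
    The man in gap j has women j and (j + 1) mod n as neighbours."""
    men = list(range(n))
    random.shuffle(men)
    return sum(1 for j, m in enumerate(men) if m == j or m == (j + 1) % n)

n, trials = 10, 200_000
samples = [adjacent_couples(n) for _ in range(trials)]
print("sample mean:    ", round(mean(samples), 3), "(theory: 2)")
print("sample variance:", round(pvariance(samples), 3),
      "(theory:", round(2 * (n - 2) / (n - 1), 3), ")")
```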

3.5 Independence

Definition. Discrete random variables X1, . . . , Xn are independent if, for all x1, . . . , xn,

P(X1 = x1, . . . , Xn = xn) = ∏_{i=1}^n P(Xi = xi).

Theorem 3.5. If X1, . . . , Xn are independent random variables and f1, . . . , fn are functions ℝ → ℝ, then f1(X1), . . . , fn(Xn) are independent random variables.

Proof.

P(f1(X1) = y1, . . . , fn(Xn) = yn) = Σ_{x1 : f1(x1) = y1, . . . , xn : fn(xn) = yn} P(X1 = x1, . . . , Xn = xn)
                                   = ∏_{i=1}^n Σ_{xi : fi(xi) = yi} P(Xi = xi)
                                   = ∏_{i=1}^n P(fi(Xi) = yi).

Theorem 3.6. If X1, . . . , Xn are independent random variables then

E[∏_{i=1}^n Xi] = ∏_{i=1}^n E[Xi].

NOTE that E[Σ Xi] = Σ E[Xi] without requiring independence.

Proof. Write Ri for RXi, the range of Xi. Then

E[∏_{i=1}^n Xi] = Σ_{x1∈R1} · · · Σ_{xn∈Rn} x1 · · · xn P(X1 = x1, . . . , Xn = xn)
               = ∏_{i=1}^n Σ_{xi∈Ri} xi P(Xi = xi)
               = ∏_{i=1}^n E[Xi].

Theorem 3.7. If X1, . . . , Xn are independent random variables and f1, . . . , fn are functions ℝ → ℝ then

E[∏_{i=1}^n fi(Xi)] = ∏_{i=1}^n E[fi(Xi)].

Proof. Obvious from the last two theorems!

Theorem 3.8. If X1, . . . , Xn are independent random variables then

Var(Σ_{i=1}^n Xi) = Σ_{i=1}^n Var Xi.


Proof.

Var(Σ_{i=1}^n Xi) = E[(Σ_{i=1}^n Xi)²] − (E[Σ_{i=1}^n Xi])²
                  = E[Σ_i Xi² + Σ_{i≠j} Xi Xj] − (Σ_{i=1}^n E[Xi])²
                  = Σ_i E[Xi²] + Σ_{i≠j} E[Xi Xj] − Σ_i E[Xi]² − Σ_{i≠j} E[Xi]E[Xj]
                  = Σ_i (E[Xi²] − E[Xi]²)
                  = Σ_i Var Xi,

using E[Xi Xj] = E[Xi]E[Xj] for i ≠ j, by independence.

Theorem 3.9. If X1, . . . , Xn are independent identically distributed random variables then

Var((1/n) Σ_{i=1}^n Xi) = (1/n) Var X1.

Proof.

Var((1/n) Σ_{i=1}^n Xi) = (1/n²) Var(Σ_{i=1}^n Xi) = (1/n²) Σ_{i=1}^n Var Xi = (1/n) Var X1.

Example (Experimental Design). Two rods have unknown lengths a, b. A rule can measure a length, but with an error having mean 0 (unbiased) and variance σ². Errors are independent from measurement to measurement. To estimate a and b we could take separate measurements A, B of each rod:

E[A] = a, Var A = σ²,   E[B] = b, Var B = σ².


Can we do better? Yep! Measure a + b as X and a − b as Y:

E[X] = a + b, Var X = σ²,   E[Y] = a − b, Var Y = σ².

Then

E[(X + Y)/2] = a,   Var((X + Y)/2) = σ²/2,
E[(X − Y)/2] = b,   Var((X − Y)/2) = σ²/2.

So this is better: the same two measurements estimate both lengths with half the variance.

Example (Non-standard dice). You choose one die, then I choose one. (Figure of non-standard dice omitted.) Around the cycle shown, each die beats the next with probability P(A beats B) = 2/3, so the relation "A is better than B" is not transitive.


    Chapter 4

    Inequalities

4.1 Jensen's Inequality

A function f : (a, b) → ℝ is convex if

f(px + (1 − p)y) ≤ pf(x) + (1 − p)f(y)   for all x, y ∈ (a, b) and p ∈ (0, 1).

It is strictly convex if strict inequality holds whenever x ≠ y. f is concave if −f is convex, and strictly concave if −f is strictly convex.

(Figure: examples of a convex function, a concave function, and a function that is neither convex nor concave.)

We know that if f is twice differentiable and f″(x) ≥ 0 for x ∈ (a, b) then f is convex, and strictly convex if f″(x) > 0 for x ∈ (a, b).

Example. f(x) = −log x:

f′(x) = −1/x,   f″(x) = 1/x² > 0,

so f(x) is strictly convex on (0, ∞).

Example. f(x) = −x log x:

f′(x) = −(1 + log x),   f″(x) = −1/x < 0,

so f is strictly concave on (0, ∞).

Example. f(x) = x³ is strictly convex on (0, ∞) but not on (−∞, ∞).

Theorem 4.1. Let f : (a, b) → ℝ be a convex function. Then

Σ_{i=1}^n pi f(xi) ≥ f(Σ_{i=1}^n pi xi)

for all x1, . . . , xn ∈ (a, b) and p1, . . . , pn ∈ (0, 1) with Σ pi = 1. Furthermore, if f is strictly convex then equality holds if and only if all the xi are equal.

In particular, for a random variable X taking values in (a, b),

E[f(X)] ≥ f(E[X]).


Proof. By induction on n. For n = 1 there is nothing to prove, and n = 2 is the definition of convexity. Assume the result holds up to n − 1. Consider x1, . . . , xn ∈ (a, b) and p1, . . . , pn ∈ (0, 1) with Σ pi = 1. For i = 2, . . . , n set

p′i = pi / (1 − p1),   so that Σ_{i=2}^n p′i = 1.

Then by the inductive hypothesis twice, first for n − 1, then for 2,

Σ_{i=1}^n pi f(xi) = p1 f(x1) + (1 − p1) Σ_{i=2}^n p′i f(xi)
                   ≥ p1 f(x1) + (1 − p1) f(Σ_{i=2}^n p′i xi)
                   ≥ f(p1 x1 + (1 − p1) Σ_{i=2}^n p′i xi)
                   = f(Σ_{i=1}^n pi xi).

If f is strictly convex, n ≥ 3 and not all the xi are equal, then we may assume that not all of x2, . . . , xn are equal. But then

(1 − p1) Σ_{i=2}^n p′i f(xi) > (1 − p1) f(Σ_{i=2}^n p′i xi),

so the inequality is strict.

Corollary (AM/GM Inequality). For positive real numbers x1, . . . , xn,

(∏_{i=1}^n xi)^(1/n) ≤ (1/n) Σ_{i=1}^n xi.

Equality holds if and only if x1 = x2 = · · · = xn.

Proof. Let X be a random variable with P(X = xi) = 1/n, and let f(x) = −log x, a strictly convex function on (0, ∞). So, by Jensen's inequality,

E[f(X)] ≥ f(E[X]),   that is   E[−log X] ≥ −log E[X].   [1]

Therefore

(1/n) Σ_{i=1}^n log xi ≤ log((1/n) Σ_{i=1}^n xi),

and so

(∏_{i=1}^n xi)^(1/n) ≤ (1/n) Σ_{i=1}^n xi.   [2]

For strictness: since f is strictly convex, equality holds in [1], and hence in [2], if and only if x1 = x2 = · · · = xn.
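A tiny numerical illustration of the corollary (an addition, not in the notes): for random positive numbers the geometric mean never exceeds the arithmetic mean, with equality only when the numbers coincide.

```python
import math
import random

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # exp of the mean of the logs, to avoid overflow for long lists
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

random.seed(0)
for _ in range(5):
    xs = [random.uniform(0.1, 10.0) for _ in range(8)]
    am, gm = arithmetic_mean(xs), geometric_mean(xs)
    print(f"GM = {gm:.4f} <= AM = {am:.4f}: {gm <= am}")

equal = [3.7] * 8
print("all equal:", round(geometric_mean(equal), 6), round(arithmetic_mean(equal), 6))
```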


If f : (a, b) → ℝ is a convex function then it can be shown that at each point y ∈ (a, b) there is a linear function αy + βy x such that

f(x) ≥ αy + βy x for all x ∈ (a, b),   with   f(y) = αy + βy y.

If f is differentiable at y then this linear function is the tangent f(y) + (x − y)f′(y).

Let y = E[X], α = αy and β = βy, so that f(E[X]) = α + βE[X]. Then for any random variable X taking values in (a, b),

E[f(X)] ≥ E[α + βX] = α + βE[X] = f(E[X]).

This gives another proof of Jensen's inequality for random variables.

    4.2 Cauchy-Schwarz Inequality

Theorem 4.2. For any random variables X, Y,

E[XY]² ≤ E[X²] E[Y²].

Proof. For a, b ∈ ℝ, let Z = aX − bY. Then

0 ≤ E[Z²] = E[(aX − bY)²] = a² E[X²] − 2ab E[XY] + b² E[Y²].

This is a quadratic in a with at most one real root, and therefore its discriminant must be non-positive. Taking b ≠ 0,

E[XY]² ≤ E[X²] E[Y²].

Corollary. |Corr(X, Y)| ≤ 1.

Proof. Apply Cauchy-Schwarz to the random variables X − E[X] and Y − E[Y].

4.3 Markov's Inequality

Theorem 4.3. If X is any random variable with finite mean then, for any a > 0,

P(|X| ≥ a) ≤ E[|X|] / a.

Proof. Let A = {|X| ≥ a}. Then

|X| ≥ a I[A].

Taking expectations,

E[|X|] ≥ a P(A) = a P(|X| ≥ a).

4.4 Chebyshev's Inequality

Theorem 4.4. Let X be a random variable with E[X²] < ∞. Then for all ε > 0,

P(|X| ≥ ε) ≤ E[X²] / ε².

Proof. For all x,

I[|X| ≥ ε] ≤ X² / ε².

Taking expectations,

P(|X| ≥ ε) ≤ E[X² / ε²] = E[X²] / ε².

Note

1. The result is distribution free - no assumption is made about the distribution of X (other than E[X²] < ∞).

2. It is the best possible inequality, in the following sense. Let

X = +ε with probability c/(2ε²),
X = −ε with probability c/(2ε²),
X = 0 with probability 1 − c/ε².

Then P(|X| ≥ ε) = c/ε² and E[X²] = c, so that

P(|X| ≥ ε) = c/ε² = E[X²]/ε².

3. If μ = E[X] then applying the inequality to X − μ gives

P(|X − μ| ≥ ε) ≤ Var X / ε².

Often this is the most useful form.

4.5 Law of Large Numbers

Theorem 4.5 (Weak law of large numbers). Let X1, X2, . . . be a sequence of independent identically distributed random variables with mean μ and variance σ², and let

Sn = Σ_{i=1}^n Xi.

Then for all ε > 0,

P(|Sn/n − μ| > ε) → 0   as n → ∞.


Proof. By Chebyshev's inequality,

P(|Sn/n − μ| > ε) ≤ E[(Sn/n − μ)²] / ε²
                  = E[(Sn − nμ)²] / (n²ε²)   by properties of expectation
                  = Var Sn / (n²ε²),   since E[Sn] = nμ.

But Var Sn = nσ², and thus

P(|Sn/n − μ| > ε) ≤ nσ² / (n²ε²) = σ² / (nε²) → 0.

Example. A1, A2, . . . are independent events, each with probability p. Let Xi = I[Ai]. Then

Sn/n = nA/n = (number of times A occurs) / (number of trials),

and μ = E[I[Ai]] = P(Ai) = p. The theorem states that

P(|Sn/n − p| > ε) → 0   as n → ∞,

which recovers the intuitive definition of probability.

Example. A random sample of size n is a sequence X1, X2, . . . , Xn of independent identically distributed random variables (n observations), and

X̄ = (Σ_{i=1}^n Xi) / n

is called the sample mean. The theorem states that, provided the variance of Xi is finite, the probability that the sample mean differs from the mean of the distribution by more than ε approaches 0 as n → ∞.
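A simulation sketch of the weak law (an addition, not from the notes), for Bernoulli(p) trials with arbitrary p = 0.3 and ε = 0.05: the proportion of sample means further than ε from p shrinks as n grows, in line with the Chebyshev bound σ²/(nε²).

```python
import random

def sample_mean(n: int, p: float) -> float:
    return sum(1 if random.random() < p else 0 for _ in range(n)) / n

p, eps, reps = 0.3, 0.05, 2000
for n in (10, 100, 1000):
    bad = sum(1 for _ in range(reps) if abs(sample_mean(n, p) - p) > eps)
    bound = p * (1 - p) / (n * eps * eps)  # Chebyshev bound sigma^2 / (n eps^2)
    print(f"n = {n:5d}   estimated P(|mean - p| > eps) = {bad / reps:.3f}   Chebyshev bound = {bound:.3f}")
```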

We have shown the weak law of large numbers. Why "weak"? There is also a strong form of the law of large numbers:

P(Sn/n → μ as n → ∞) = 1.

This is NOT the same as the weak form. What does this mean? Each ω ∈ Ω determines Sn(ω)/n, n = 1, 2, . . . , as a sequence of real numbers; hence it either tends to μ or it doesn't. The strong law states that

P({ω : Sn(ω)/n → μ as n → ∞}) = 1.


    Chapter 5

    Generating Functions

In this chapter, assume that X is a random variable taking values in the range 0, 1, 2, . . . . Let pr = P(X = r), r = 0, 1, 2, . . . .

Definition 5.1. The probability generating function (p.g.f.) of the random variable X, or of the distribution pr, r = 0, 1, 2, . . . , is

p(z) = E[z^X] = Σ_{r=0}^∞ P(X = r) z^r = Σ_{r=0}^∞ pr z^r.

This p(z) is a polynomial or a power series. If a power series, then it is convergent for |z| ≤ 1, by comparison with a geometric series:

|p(z)| ≤ Σ_r pr |z|^r ≤ Σ_r pr = 1.

Example. For a fair die,

pr = 1/6,   r = 1, . . . , 6,

p(z) = E[z^X] = (1/6)(z + z² + · · · + z⁶) = (z/6) (1 − z⁶)/(1 − z).

Theorem 5.1. The distribution of X is uniquely determined by the p.g.f. p(z).

Proof. We know that we can differentiate p(z) term by term for |z| < 1:

p′(z) = p1 + 2p2 z + 3p3 z² + · · · ,

and so p′(0) = p1 (and p(0) = p0). Repeated differentiation gives

p^(i)(z) = Σ_{r=i}^∞ [r!/(r − i)!] pr z^(r−i),

and hence p^(i)(0) = i! pi. Thus we can recover p0, p1, . . . from p(z).


Theorem 5.2 (Abel's Lemma).

E[X] = lim_{z→1⁻} p′(z).

Proof.

p′(z) = Σ_{r=1}^∞ r pr z^(r−1),   |z| < 1.

For z ∈ (0, 1), p′(z) is a non-decreasing function of z and is bounded above by

E[X] = Σ_{r=1}^∞ r pr.

Choose ε > 0 and N large enough that

Σ_{r=1}^N r pr ≥ E[X] − ε.

Then

lim_{z→1⁻} Σ_{r=1}^∞ r pr z^(r−1) ≥ lim_{z→1⁻} Σ_{r=1}^N r pr z^(r−1) = Σ_{r=1}^N r pr ≥ E[X] − ε.

This is true for all ε > 0, and so E[X] = lim_{z→1⁻} p′(z).

Usually p′(z) is continuous at z = 1, and then E[X] = p′(1). (Recall that for the fair die p(z) = (z/6)(1 − z⁶)/(1 − z).)

Theorem 5.3.

E[X(X − 1)] = lim_{z→1⁻} p″(z).

Proof.

p″(z) = Σ_{r=2}^∞ r(r − 1) pr z^(r−2).

The proof is now the same as for Abel's Lemma.

Theorem 5.4. Suppose that X1, X2, . . . , Xn are independent random variables with p.g.f.s p1(z), p2(z), . . . , pn(z). Then the p.g.f. of

X1 + X2 + · · · + Xn

is

p1(z) p2(z) · · · pn(z).

Proof.

E[z^(X1+X2+···+Xn)] = E[z^X1 z^X2 · · · z^Xn] = E[z^X1] E[z^X2] · · · E[z^Xn] = p1(z) p2(z) · · · pn(z).


Example. Suppose X has a Poisson distribution,

P(X = r) = e^(−λ) λ^r / r!,   r = 0, 1, . . . .

Then

E[z^X] = Σ_{r=0}^∞ z^r e^(−λ) λ^r / r! = e^(−λ) e^(λz) = e^(−λ(1−z)).

Let's calculate the variance of X:

p′(z) = λ e^(−λ(1−z)),   p″(z) = λ² e^(−λ(1−z)).

Then

E[X] = lim_{z→1⁻} p′(z) = p′(1) = λ   (since p′(z) is continuous at z = 1),
E[X(X − 1)] = p″(1) = λ²,
Var X = E[X²] − E[X]² = E[X(X − 1)] + E[X] − E[X]² = λ² + λ − λ² = λ.

Example. Suppose that Y has a Poisson distribution with parameter μ. If X and Y are independent then

E[z^(X+Y)] = E[z^X] E[z^Y] = e^(−λ(1−z)) e^(−μ(1−z)) = e^(−(λ+μ)(1−z)).

But this is the p.g.f. of a Poisson random variable with parameter λ + μ. By uniqueness (the first theorem on p.g.f.s) this must be the distribution of X + Y.

Example. X has a binomial distribution,

P(X = r) = C(n, r) p^r (1 − p)^(n−r),   r = 0, 1, . . . , n.

E[z^X] = Σ_{r=0}^n C(n, r) p^r (1 − p)^(n−r) z^r = (pz + 1 − p)^n.

This shows that X has the same distribution as Y1 + Y2 + · · · + Yn, where Y1, Y2, . . . , Yn are independent random variables each with

P(Yi = 1) = p,   P(Yi = 0) = 1 − p.

Note: if the p.g.f. factorizes, look to see if the random variable can be written as a sum.


5.1 Combinatorial Applications

Tile a 2 × n bathroom with 2 × 1 tiles. How many ways can this be done? Say fn. Considering the left-hand edge,

fn = fn−1 + fn−2,   f0 = f1 = 1.

Let

F(z) = Σ_{n=0}^∞ fn z^n.

Then fn z^n = fn−1 z^n + fn−2 z^n, and summing for n ≥ 2,

Σ_{n=2}^∞ fn z^n = Σ_{n=2}^∞ fn−1 z^n + Σ_{n=2}^∞ fn−2 z^n,
F(z) − f0 − zf1 = z(F(z) − f0) + z² F(z),
F(z)(1 − z − z²) = f0(1 − z) + zf1 = 1 − z + z = 1.

Since f0 = f1 = 1, F(z) = 1/(1 − z − z²). Let

α1 = (1 + √5)/2,   α2 = (1 − √5)/2,

so that 1 − z − z² = (1 − α1 z)(1 − α2 z). Then

F(z) = 1/((1 − α1 z)(1 − α2 z))
     = (1/(α1 − α2)) [ α1/(1 − α1 z) − α2/(1 − α2 z) ]
     = (1/(α1 − α2)) [ α1 Σ_{n=0}^∞ α1^n z^n − α2 Σ_{n=0}^∞ α2^n z^n ].

The coefficient of z^n, that is fn, is

fn = (α1^(n+1) − α2^(n+1)) / (α1 − α2).
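The closed form can be checked against the recurrence directly; a small sketch (an addition to the notes):

```python
import math

def f_recurrence(n: int) -> int:
    a, b = 1, 1  # f0 = f1 = 1
    for _ in range(n):
        a, b = b, a + b
    return a

def f_closed_form(n: int) -> int:
    a1 = (1 + math.sqrt(5)) / 2
    a2 = (1 - math.sqrt(5)) / 2
    return round((a1 ** (n + 1) - a2 ** (n + 1)) / (a1 - a2))

for n in range(11):
    assert f_recurrence(n) == f_closed_form(n)
    print(n, f_recurrence(n))
```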

5.2 Conditional Expectation

Let X and Y be random variables with joint distribution

P(X = x, Y = y).

Then the distribution of X is

P(X = x) = Σ_{y∈RY} P(X = x, Y = y).

This is often called the marginal distribution of X. The conditional distribution of X given Y = y is

P(X = x|Y = y) = P(X = x, Y = y) / P(Y = y).


Definition 5.2. The conditional expectation of X given Y = y is

E[X|Y = y] = Σ_{x∈RX} x P(X = x|Y = y).

The conditional expectation of X given Y is the random variable E[X|Y] defined by

E[X|Y](ω) = E[X|Y = Y(ω)].

Thus E[X|Y] : Ω → ℝ.

Example. Let X1, X2, . . . , Xn be independent identically distributed random variables with P(X1 = 1) = p and P(X1 = 0) = 1 − p. Let

Y = X1 + X2 + · · · + Xn.

Then

P(X1 = 1|Y = r) = P(X1 = 1, Y = r) / P(Y = r)
               = P(X1 = 1, X2 + · · · + Xn = r − 1) / P(Y = r)
               = P(X1 = 1) P(X2 + · · · + Xn = r − 1) / P(Y = r)
               = p C(n − 1, r − 1) p^(r−1) (1 − p)^(n−r) / [C(n, r) p^r (1 − p)^(n−r)]
               = C(n − 1, r − 1) / C(n, r)
               = r/n.

Then

E[X1|Y = r] = 0 · P(X1 = 0|Y = r) + 1 · P(X1 = 1|Y = r) = r/n,

so E[X1|Y = Y(ω)] = Y(ω)/n and therefore E[X1|Y] = Y/n.

Note that this is a random variable — a function of Y.

    5.3 Properties of Conditional Expectation

    Theorem 5.5.

    E[E[X|Y]] = E[X]


Proof.

E[E[X|Y]] = Σ_{y∈RY} P(Y = y) E[X|Y = y]
          = Σ_y P(Y = y) Σ_{x∈RX} x P(X = x|Y = y)
          = Σ_y Σ_x x P(X = x, Y = y)
          = E[X].

Theorem 5.6. If X and Y are independent then

E[X|Y] = E[X].

Proof. If X and Y are independent then for any y ∈ RY,

E[X|Y = y] = Σ_{x∈RX} x P(X = x|Y = y) = Σ_x x P(X = x) = E[X].

Example. Let X1, X2, . . . be i.i.d. random variables with p.g.f. p(z). Let N be a random variable independent of X1, X2, . . . with p.g.f. h(z). What is the p.g.f. of X1 + X2 + · · · + XN?

E[z^(X1+···+XN)] = E[E[z^(X1+···+XN) | N]]
                = Σ_{n=0}^∞ P(N = n) E[z^(X1+···+Xn) | N = n]
                = Σ_{n=0}^∞ P(N = n) (p(z))^n
                = h(p(z)).

Then for example

E[X1 + · · · + XN] = d/dz h(p(z)) |_{z=1} = h′(1) p′(1) = E[N] E[X1].

Exercise: calculate d²/dz² h(p(z)) and hence find Var(X1 + · · · + XN) in terms of Var N and Var X1.
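A simulation check of E[X1 + · · · + XN] = E[N]E[X1] (an addition, with an arbitrary concrete choice: N Poisson with λ = 3 and the Xi geometric with p = 0.4, so E[N]E[X1] = 3 × 1.5 = 4.5).

```python
import math
import random

def sample_poisson(lam: float) -> int:
    """Inverse-transform sampling by accumulating Poisson probabilities."""
    u, r = random.random(), 0
    prob = math.exp(-lam)
    cum = prob
    while u > cum:
        r += 1
        prob *= lam / r
        cum += prob
    return r

def sample_geometric(p: float) -> int:
    """Number of failures before the first success."""
    k = 0
    while random.random() >= p:
        k += 1
    return k

lam, p, trials = 3.0, 0.4, 100_000
total = 0
for _ in range(trials):
    n = sample_poisson(lam)
    total += sum(sample_geometric(p) for _ in range(n))
print("simulated E[sum] =", round(total / trials, 3))
print("E[N] * E[X1]     =", round(lam * (1 - p) / p, 3))
```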


    5.4 Branching Processes

Let X0, X1, . . . be a sequence of random variables, where Xn is the number of individuals in the nth generation of a population. Assume:

1. X0 = 1.

2. Each individual lives for unit time and then, on death, produces k offspring with probability fk, where Σ_k fk = 1.

3. All offspring behave independently.

Then

Xn+1 = Y1^n + Y2^n + · · · + Y_{Xn}^n,

where the Yi^n are i.i.d. random variables, Yi^n being the number of offspring of individual i in generation n. Assume also

1. f0 > 0,

2. f0 + f1 < 1.

Let F(z) be the probability generating function of the Yi^n:

F(z) = Σ_{k=0}^∞ fk z^k = E[z^X1] = E[z^(Yi^n)].

Let

Fn(z) = E[z^Xn].

Then F1(z) = F(z), the probability generating function of the offspring distribution.

Theorem 5.7.

Fn+1(z) = Fn(F(z)) = F(F(. . . (F(z)) . . . )),

so Fn(z) is an n-fold iterate of F.


Proof.

Fn+1(z) = E[z^Xn+1]
        = E[E[z^Xn+1 | Xn]]
        = Σ_{k=0}^∞ P(Xn = k) E[z^Xn+1 | Xn = k]
        = Σ_{k=0}^∞ P(Xn = k) E[z^(Y1^n + Y2^n + · · · + Yk^n)]
        = Σ_{k=0}^∞ P(Xn = k) E[z^(Y1^n)] · · · E[z^(Yk^n)]
        = Σ_{k=0}^∞ P(Xn = k) (F(z))^k
        = Fn(F(z)).

Theorem 5.8 (Mean and variance of population size). Let

m = Σ_{k=0}^∞ k fk   and   σ² = Σ_{k=0}^∞ (k − m)² fk

be the mean and variance of the offspring distribution. Then

E[Xn] = m^n,
Var Xn = σ² m^(n−1) (m^n − 1)/(m − 1)   if m ≠ 1,
Var Xn = nσ²   if m = 1.   (5.1)

Proof. One can prove this by calculating F′n(z) and F″n(z). Alternatively,

E[Xn] = E[E[Xn|Xn−1]] = E[m Xn−1] = m E[Xn−1] = m^n   by induction.

Also

E[(Xn − m Xn−1)²] = E[E[(Xn − m Xn−1)² | Xn−1]]
                  = E[Var(Xn|Xn−1)]
                  = E[σ² Xn−1]
                  = σ² m^(n−1).

Thus

E[Xn²] − 2m E[Xn Xn−1] + m² E[Xn−1²] = σ² m^(n−1).


Now calculate

E[Xn Xn−1] = E[E[Xn Xn−1 | Xn−1]] = E[Xn−1 E[Xn|Xn−1]] = E[Xn−1 m Xn−1] = m E[Xn−1²].

Then E[Xn²] = σ² m^(n−1) + m² E[Xn−1²], and so

Var Xn = E[Xn²] − E[Xn]²
       = m² E[Xn−1²] + σ² m^(n−1) − m² E[Xn−1]²
       = m² Var Xn−1 + σ² m^(n−1)
       = m⁴ Var Xn−2 + σ²(m^(n−1) + m^n)
       = · · ·
       = m^(2(n−1)) Var X1 + σ²(m^(n−1) + m^n + · · · + m^(2n−3))
       = σ² m^(n−1) (1 + m + · · · + m^(n−1)),

which gives the stated formula.

To deal with extinction we need to be careful with limits as n → ∞. Let

An = {Xn = 0} = the event that extinction has occurred by generation n,

and let

A = ⋃_{n=1}^∞ An = the event that extinction ever occurs.

Can we calculate P(A) from P(An)?

More generally, let An be an increasing sequence

A1 ⊆ A2 ⊆ . . . ,

and define

A = lim_{n→∞} An = ⋃_{n=1}^∞ An.

Define Bn for n ≥ 1 by

B1 = A1,
Bn = An ∩ (⋃_{i=1}^{n−1} Ai)ᶜ = An ∩ A_{n−1}ᶜ.


The Bn, n ≥ 1, are disjoint events, and

⋃_{i=1}^∞ Ai = ⋃_{i=1}^∞ Bi,   ⋃_{i=1}^n Ai = ⋃_{i=1}^n Bi.

Thus

P(⋃_{i=1}^∞ Ai) = P(⋃_{i=1}^∞ Bi)
               = Σ_{i=1}^∞ P(Bi)
               = lim_{n→∞} Σ_{i=1}^n P(Bi)
               = lim_{n→∞} P(⋃_{i=1}^n Bi)
               = lim_{n→∞} P(⋃_{i=1}^n Ai)
               = lim_{n→∞} P(An).

Thus

P(lim_{n→∞} An) = lim_{n→∞} P(An):

probability is a continuous set function. Thus

P(extinction ever occurs) = lim_{n→∞} P(An) = lim_{n→∞} P(Xn = 0) = q, say.

Note that P(Xn = 0), n = 1, 2, 3, . . . , is an increasing sequence, so the limit exists. But

P(Xn = 0) = Fn(0),   where Fn is the p.g.f. of Xn.

So

q = lim_{n→∞} Fn(0).

Also

F(q) = F(lim_{n→∞} Fn(0)) = lim_{n→∞} F(Fn(0))   (since F is continuous)
     = lim_{n→∞} Fn+1(0) = q.

Thus F(q) = q. q is called the extinction probability.


Alternative derivation:

q = Σ_k P(X1 = k) P(extinction | X1 = k) = Σ_k P(X1 = k) q^k = F(q).

Theorem 5.9. The probability of extinction, q, is the smallest positive root of the equation F(q) = q. Let m be the mean of the offspring distribution. If m ≤ 1 then q = 1, while if m > 1 then q < 1.

Proof.

F(1) = 1,   m = Σ_{k=0}^∞ k fk = lim_{z→1⁻} F′(z),

and

F″(z) = Σ_{j=2}^∞ j(j − 1) fj z^(j−2) > 0   for 0 < z < 1, since f0 + f1 < 1.

Also F(0) = f0 > 0, so F is strictly convex and increasing on (0, 1). Thus if m ≤ 1, there does not exist a q ∈ (0, 1) with F(q) = q. If m > 1, let α be the smallest positive root of F(z) = z; then α ≤ 1. Further,

F(0) ≤ F(α) = α,   F(F(0)) ≤ F(α) = α,   . . . ,   so Fn(0) ≤ α for all n ≥ 1,

and therefore

q = lim_{n→∞} Fn(0) ≤ α.

Since q is a root of F(z) = z and α is the smallest positive root, q = α.
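A numeric sketch of this theorem (an addition, not from the notes): iterate q ← F(q) from q = 0, so that after n steps q = Fn(0) = P(Xn = 0). The offspring distribution below (f0 = 0.2, f1 = 0.3, f2 = 0.5, so m = 1.3 > 1) is just an arbitrary example; its smallest positive root of F(q) = q is 0.4.

```python
def F(z: float, f=(0.2, 0.3, 0.5)) -> float:
    """Offspring p.g.f. F(z) = sum_k f_k z^k."""
    return sum(fk * z ** k for k, fk in enumerate(f))

q = 0.0
for n in range(1, 101):
    q = F(q)  # q is now F_n(0) = P(X_n = 0)
    if n in (1, 2, 5, 10, 50, 100):
        print(f"F_{n}(0) = {q:.6f}")

print("fixed point check, F(q) - q =", abs(F(q) - q))
```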


    5.5 Random Walks

Let X1, X2, . . . be i.i.d. random variables, and let

Sn = S0 + X1 + X2 + · · · + Xn,   where usually S0 = 0.

Then Sn (n = 0, 1, 2, . . . ) is a 1-dimensional random walk. We shall assume

Xn = +1 with probability p,   Xn = −1 with probability q = 1 − p.   (5.2)

This is a simple random walk. If p = q = 1/2 then the random walk is called symmetric.

Example (Gambler's Ruin). You have an initial fortune of A and I have an initial fortune of B. We toss coins repeatedly; I win with probability p and you win with probability q. What is the probability that I bankrupt you before you bankrupt me?


Set a = A + B and z = B. Stop a random walk starting at z when it hits 0 or a.

Let pz be the probability that the random walk hits a before it hits 0, starting from z, and let qz be the probability that the random walk hits 0 before it hits a, starting from z. After the first step the gambler's fortune is either z − 1 or z + 1, with probability q and p respectively. From the law of total probability,

pz = q pz−1 + p pz+1,   0 < z < a,

with p0 = 0 and pa = 1. We must solve the auxiliary equation p t² − t + q = 0:

t = (1 ± √(1 − 4pq)) / 2p = (1 ± (1 − 2p)) / 2p = 1 or q/p.

The general solution for p ≠ q is

pz = A + B (q/p)^z.

From p0 = 0 and pa = 1, A + B = 0 and A = 1/(1 − (q/p)^a), and so

pz = (1 − (q/p)^z) / (1 − (q/p)^a).

If p = q, the general solution is A + Bz, and

pz = z/a.

To calculate qz, observe that this is the same problem with p, q, z replaced by q, p, a − z respectively. Thus

qz = ((q/p)^a − (q/p)^z) / ((q/p)^a − 1)   if p ≠ q,


or

qz = (a − z)/a   if p = q.

Thus qz + pz = 1 and so, as we expected, the game ends with probability one. Therefore

P(hits 0 before a) = qz = ((q/p)^a − (q/p)^z) / ((q/p)^a − 1)   if p ≠ q,
P(hits 0 before a) = qz = (a − z)/a   if p = q.

What happens as a → ∞? The events {path hits 0 before it hits a} increase with a to the event {path hits 0 ever}, so

P(hits 0 ever) = lim_{a→∞} P(hits 0 before a) = lim_{a→∞} qz
              = (q/p)^z   if p > q,
              = 1   if p ≤ q.

Let G be the ultimate gain or loss:

G = a − z with probability pz,
G = −z with probability qz.   (5.3)

Then

E[G] = a pz − z   if p ≠ q,
E[G] = 0   if p = q.   (5.4)

A fair game remains fair: if the coin is fair then games based on it have expected reward 0.

Duration of a game. Let Dz be the expected time until the random walk hits 0 or a, starting from z. Is Dz finite? Dz is bounded above by a times the mean of a geometric random variable (the number of windows of size a before a window consisting entirely of +1 steps or entirely of −1 steps), hence Dz is finite. Consider the first step:

E[duration] = E[E[duration | first step]]
            = p (E[duration | first step up]) + q (E[duration | first step down])
            = p(1 + Dz+1) + q(1 + Dz−1),

so

Dz = 1 + p Dz+1 + q Dz−1.

This equation holds for 0 < z < a, with D0 = Da = 0. Let's try a particular solution Dz = Cz:

Cz = Cp(z + 1) + Cq(z − 1) + 1   ⟹   C = 1/(q − p)   for p ≠ q.


Consider the homogeneous relation

p t² − t + q = 0,   t1 = 1, t2 = q/p.

The general solution for p ≠ q is

Dz = A + B (q/p)^z + z/(q − p).

Substituting z = 0 and z = a to find A and B gives

Dz = z/(q − p) − (a/(q − p)) (1 − (q/p)^z)/(1 − (q/p)^a),   p ≠ q.

If p = q then a particular solution is −z², so the general solution is Dz = −z² + A + Bz. Substituting the boundary conditions gives

Dz = z(a − z),   p = q.

Example (initial capital).

p     q     z    a    P(ruin)  E[gain]  E[duration]
0.5   0.5   90   100  0.1      0        900
0.45  0.55  9    10   0.21     -1.1     11
0.45  0.55  90   100  0.87     -77      766
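The table can be reproduced (up to rounding) from the formulas for qz, E[G] and Dz; a quick sketch, added to the notes:

```python
def ruin_probability(p: float, z: int, a: int) -> float:
    q = 1 - p
    if p == q:  # symmetric case p = q = 1/2
        return (a - z) / a
    r = q / p
    return (r ** a - r ** z) / (r ** a - 1)

def expected_duration(p: float, z: int, a: int) -> float:
    q = 1 - p
    if p == q:
        return z * (a - z)
    r = q / p
    return z / (q - p) - (a / (q - p)) * (1 - r ** z) / (1 - r ** a)

for p, z, a in ((0.5, 90, 100), (0.45, 9, 10), (0.45, 90, 100)):
    qz = ruin_probability(p, z, a)
    gain = a * (1 - qz) - z  # E[G] = a p_z - z
    print(f"p={p}  z={z}  a={a}  P(ruin)={qz:.2f}  "
          f"E[gain]={gain:.1f}  E[duration]={expected_duration(p, z, a):.0f}")
```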

Stop the random walk when it hits 0 or a, so that we have absorption at 0 and a. Let

Uz,n = P(random walk hits 0 at time n | starts at z).

Then

Uz,n+1 = p Uz+1,n + q Uz−1,n,   0 < z < a, n ≥ 0,

with U0,n = Ua,n = 0 for n > 0, U0,0 = 1 and Uz,0 = 0 for 0 < z ≤ a. Let

Uz(s) = Σ_{n=0}^∞ Uz,n s^n.

Now multiply the recurrence by s^(n+1) and add for n = 0, 1, 2, . . . :

Uz(s) = ps Uz+1(s) + qs Uz−1(s),

where U0(s) = 1 and Ua(s) = 0. Look for a solution of the form

Uz(s) = (λ(s))^z.


It must satisfy

λ(s) = ps (λ(s))² + qs.

There are two roots,

λ1(s), λ2(s) = (1 ± √(1 − 4pqs²)) / 2ps,

and every solution is of the form

Uz(s) = A(s) (λ1(s))^z + B(s) (λ2(s))^z.

Substitute U0(s) = 1 and Ua(s) = 0: A(s) + B(s) = 1 and

A(s) (λ1(s))^a + B(s) (λ2(s))^a = 0,

so

Uz(s) = [ (λ1(s))^a (λ2(s))^z − (λ1(s))^z (λ2(s))^a ] / [ (λ1(s))^a − (λ2(s))^a ].

But λ1(s) λ2(s) = q/p (recall the quadratic), so

Uz(s) = (q/p)^z [ (λ1(s))^(a−z) − (λ2(s))^(a−z) ] / [ (λ1(s))^a − (λ2(s))^a ].

The same method gives the generating function for the absorption probabilities at the other barrier. The generating function for the duration of the game is the sum of these two generating functions.


    Chapter 6

    Continuous Random Variables

In this chapter we drop the assumption that Ω is finite or countable. Assume we are given a probability P on some subsets of Ω.

For example, spin a pointer, and let ω ∈ Ω give the position at which it stops, with Ω = {ω : 0 ≤ ω < 2π}. Let

P(ω ∈ [0, θ]) = θ/2π,   0 ≤ θ < 2π.

Definition 6.1. A continuous random variable X is a function X : Ω → ℝ for which

P(a ≤ X(ω) ≤ b) = ∫_a^b f(x) dx,

where f(x) is a function satisfying

1. f(x) ≥ 0,

2. ∫_{−∞}^{+∞} f(x) dx = 1.

The function f is called the probability density function (pdf).

For example, if X(ω) = ω is the position of the pointer, then X is a continuous random variable with pdf

f(x) = 1/2π for 0 ≤ x < 2π,   f(x) = 0 otherwise.   (6.1)

This is an example of a uniformly distributed random variable, on the interval [0, 2π] in this case.


Intuition about probability density functions is based on the approximate relation

P(X ∈ [x, x + δx]) = ∫_x^(x+δx) f(z) dz.

Proofs, however, more often use the distribution function

F(x) = P(X ≤ x),

which is increasing in x. If X is a continuous random variable then

F(x) = ∫_{−∞}^x f(z) dz,

and so F is continuous and differentiable, with

F′(x) = f(x)

(at any point x where the fundamental theorem of calculus applies).

The distribution function is also defined for a discrete random variable,

F(x) = Σ_{ω : X(ω) ≤ x} pω,

in which case F is a step function. In either case

P(a ≤ X ≤ b) = P(X ≤ b) − P(X ≤ a) = F(b) − F(a).


Example (The exponential distribution). Let

F(x) = 1 − e^(−λx) for 0 ≤ x < ∞,   F(x) = 0 for x < 0.   (6.2)

The corresponding pdf is

f(x) = λ e^(−λx),   0 ≤ x < ∞;

this is known as the exponential distribution with parameter λ. If X has this distribution then

P(X ≥ x + z | X ≥ z) = P(X ≥ x + z) / P(X ≥ z) = e^(−λ(x+z)) / e^(−λz) = e^(−λx) = P(X ≥ x).

This is known as the memoryless property of the exponential distribution.

Theorem 6.1. If X is a continuous random variable with pdf f(x) and h(x) is a continuous, strictly increasing function with h⁻¹(x) differentiable, then h(X) is a continuous random variable with pdf

f_h(x) = f(h⁻¹(x)) d/dx h⁻¹(x).

Proof. The distribution function of h(X) is

P(h(X) ≤ x) = P(X ≤ h⁻¹(x)) = F(h⁻¹(x)),

since h is strictly increasing and F is the distribution function of X. Then

d/dx P(h(X) ≤ x) = f(h⁻¹(x)) d/dx h⁻¹(x),

so h(X) is a continuous random variable with pdf f_h as claimed. Note: it is usually better to repeat this proof than to remember the result.


Example. Suppose X ∼ U[0, 1], that is, X is uniformly distributed on [0, 1]. Consider Y = −log X. Then

P(Y ≤ y) = P(−log X ≤ y) = P(X ≥ e^(−y)) = ∫_{e^(−y)}^1 1 dx = 1 − e^(−y).

Thus Y is exponentially distributed.

More generally:

Theorem 6.2. Let U ∼ U[0, 1]. For any continuous distribution function F, the random variable X defined by X = F⁻¹(U) has distribution function F.

Proof.

P(X ≤ x) = P(F⁻¹(U) ≤ x) = P(U ≤ F(x)) = F(x),   since U ∼ U[0, 1].

Remarks:

1. It is a bit more messy for discrete random variables. If P(X = xj) = pj, j = 0, 1, . . . , let

X = xj if Σ_{i=0}^{j−1} pi ≤ U < Σ_{i=0}^j pi,   U ∼ U[0, 1].

2. This is useful for simulations.
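Remark 2 in action (an addition to the notes): using X = F⁻¹(U) with F the exponential distribution function, so X = −log(1 − U)/λ, and checking the sample mean and a tail probability against the theory. The choice λ = 2 is arbitrary.

```python
import math
import random

def sample_exponential(lam: float) -> float:
    """Inverse transform: F(x) = 1 - exp(-lam*x), so F^{-1}(u) = -log(1-u)/lam."""
    u = random.random()
    return -math.log(1.0 - u) / lam

lam, n = 2.0, 200_000
xs = [sample_exponential(lam) for _ in range(n)]
print("sample mean:", round(sum(xs) / n, 4), "  theory 1/lambda =", 1 / lam)

t = 1.0
tail = sum(1 for x in xs if x >= t) / n
print("P(X >= 1):  ", round(tail, 4), "  theory exp(-lambda) =", round(math.exp(-lam * t), 4))
```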

6.1 Jointly Distributed Random Variables

For two random variables X and Y, the joint distribution function is

F(x, y) = P(X ≤ x, Y ≤ y),   F : ℝ² → [0, 1].

Let

FX(x) = P(X ≤ x) = P(X ≤ x, Y < ∞) = F(x, ∞) = lim_{y→∞} F(x, y).


This is called the marginal distribution of X. Similarly

FY(y) = F(∞, y).

X1, X2, . . . , Xn are jointly distributed continuous random variables if, for a set C ⊆ ℝⁿ,

P((X1, X2, . . . , Xn) ∈ C) = ∫· · ·∫_{(x1,...,xn)∈C} f(x1, . . . , xn) dx1 . . . dxn

for some function f, called the joint probability density function, satisfying the obvious conditions:

1. f(x1, . . . , xn) ≥ 0,

2. ∫· · ·∫_{ℝⁿ} f(x1, . . . , xn) dx1 . . . dxn = 1.

Example (n = 2).

F(x, y) = P(X ≤ x, Y ≤ y) = ∫_{−∞}^x ∫_{−∞}^y f(u, v) du dv,

and so

f(x, y) = ∂²F(x, y) / ∂x∂y,

provided this is defined at (x, y).

Theorem 6.3. If X and Y are jointly continuous random variables then they are individually continuous.

Proof. Since X and Y are jointly continuous random variables,

P(X ∈ A) = P(X ∈ A, Y ∈ (−∞, +∞)) = ∫_A ∫_{−∞}^∞ f(x, y) dy dx = ∫_A fX(x) dx,

where

fX(x) = ∫_{−∞}^∞ f(x, y) dy

is the pdf of X.

Jointly continuous random variables X and Y are independent if

f(x, y) = fX(x) fY(y).

Then P(X ∈ A, Y ∈ B) = P(X ∈ A)P(Y ∈ B).

Similarly, jointly continuous random variables X1, . . . , Xn are independent if

f(x1, . . . , xn) = ∏_{i=1}^n fXi(xi),


where fXi(xi) are the pdfs of the individual random variables.

Example. Two points X and Y are tossed at random and independently onto a line segment of length L. What is the probability that |X − Y| ≤ ℓ?

Suppose that "at random" means uniformly, so that

f(x, y) = 1/L²,   (x, y) ∈ [0, L]².


The desired probability is

∫∫_A f(x, y) dx dy = (area of A)/L² = (L² − 2·½(L − ℓ)²)/L² = (2Lℓ − ℓ²)/L²,

where A = {(x, y) ∈ [0, L]² : |x − y| ≤ ℓ}.

Example (Buffon's Needle Problem). A needle of length ℓ is tossed at random onto a floor marked with parallel lines a distance L apart, ℓ ≤ L. What is the probability that the needle intersects one of the parallel lines?

Let θ ∈ [0, π) be the angle between the needle and the parallel lines, and let X be the distance from the bottom of the needle to the line closest to it. It is reasonable to suppose that

X ∼ U[0, L],   θ ∼ U[0, π),

and that X and θ are independent. Thus

f(x, θ) = 1/(πL),   0 ≤ x < L and 0 ≤ θ < π.


The needle intersects the line if and only if X ≤ ℓ sin θ. Call this event A. Then

P(A) = ∫∫_A f(x, θ) dx dθ = ∫_0^π (ℓ sin θ)/(πL) dθ = 2ℓ/(πL).
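A Monte Carlo sketch of Buffon's needle (an addition to the notes): throw random (X, θ) pairs under the model above and count crossings; the crossing frequency should approach 2ℓ/(πL), which also gives a crude estimate of π. The lengths ℓ = 1, L = 2 are arbitrary.

```python
import math
import random

def buffon(l: float, L: float, trials: int = 200_000) -> float:
    """Fraction of needle throws that cross a line (requires l <= L)."""
    hits = 0
    for _ in range(trials):
        x = random.uniform(0.0, L)            # distance to the nearest line
        theta = random.uniform(0.0, math.pi)  # angle with the lines
        if x <= l * math.sin(theta):
            hits += 1
    return hits / trials

l, L = 1.0, 2.0
freq = buffon(l, L)
print("observed frequency:", round(freq, 4))
print("theory 2l/(pi L):  ", round(2 * l / (math.pi * L), 4))
print("pi estimate:       ", round(2 * l / (freq * L), 4))
```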

Definition 6.2. The expectation or mean of a continuous random variable X is

E[X] = ∫_{−∞}^∞ x f(x) dx,

provided that not both of ∫_{−∞}^0 x f(x) dx and ∫_0^∞ x f(x) dx are infinite.

Example (Normal Distribution). Let

f(x) = (1/(σ√(2π))) e^(−(x−μ)²/2σ²),   −∞ < x < ∞.

This is non-negative; for it to be a pdf we also need to check that

∫_{−∞}^∞ f(x) dx = 1.

Make the substitution z = (x − μ)/σ. Then

I = (1/(σ√(2π))) ∫_{−∞}^∞ e^(−(x−μ)²/2σ²) dx = (1/√(2π)) ∫_{−∞}^∞ e^(−z²/2) dz.

Thus

I² = (1/2π) ∫_{−∞}^∞ e^(−x²/2) dx ∫_{−∞}^∞ e^(−y²/2) dy
   = (1/2π) ∫_{−∞}^∞ ∫_{−∞}^∞ e^(−(x²+y²)/2) dx dy
   = (1/2π) ∫_0^{2π} ∫_0^∞ r e^(−r²/2) dr dθ
   = (1/2π) ∫_0^{2π} dθ = 1.

Therefore I = 1. A random variable with the pdf f(x) given above has a normal distribution with parameters μ and σ²; we write this as X ∼ N[μ, σ²]. The expectation is

E[X] = (1/(σ√(2π))) ∫_{−∞}^∞ x e^(−(x−μ)²/2σ²) dx
     = (1/(σ√(2π))) ∫_{−∞}^∞ (x − μ) e^(−(x−μ)²/2σ²) dx + μ (1/(σ√(2π))) ∫_{−∞}^∞ e^(−(x−μ)²/2σ²) dx.


The first term is convergent and equals zero by symmetry, so that

E[X] = 0 + μ = μ.

Theorem 6.4. If X is a continuous random variable then

E[X] = ∫_0^∞ P(X ≥ x) dx − ∫_0^∞ P(X ≤ −x) dx.

Proof.

∫_0^∞ P(X ≥ x) dx = ∫_0^∞ ( ∫_x^∞ f(y) dy ) dx
                  = ∫_0^∞ ∫_0^∞ I[y ≥ x] f(y) dy dx
                  = ∫_0^∞ ( ∫_0^y dx ) f(y) dy
                  = ∫_0^∞ y f(y) dy.

Similarly

∫_0^∞ P(X ≤ −x) dx = −∫_{−∞}^0 y f(y) dy,

and the result follows.

Note: this holds for discrete random variables too, and is useful as a general way of finding the expectation whether the random variable is discrete or continuous. If X takes values in the set {0, 1, 2, . . . }, the theorem states

E[X] = Σ_{n=1}^∞ P(X ≥ n),

and a direct proof follows:

Σ_{n=1}^∞ P(X ≥ n) = Σ_{n=1}^∞ Σ_{m=0}^∞ I[m ≥ n] P(X = m)
                   = Σ_{m=0}^∞ Σ_{n=1}^∞ I[m ≥ n] P(X = m)
                   = Σ_{m=0}^∞ m P(X = m).

Theorem 6.5. Let X be a continuous random variable with pdf f(x) and let h(x) be a continuous real-valued function. Then, provided ∫_{−∞}^∞ |h(x)| f(x) dx < ∞,

E[h(X)] = ∫_{−∞}^∞ h(x) f(x) dx.


Proof.

∫_0^∞ P(h(X) ≥ y) dy = ∫_0^∞ ( ∫_{x : h(x) ≥ y} f(x) dx ) dy
                     = ∫_0^∞ ∫_{x : h(x) ≥ 0} I[h(x) ≥ y] f(x) dx dy
                     = ∫_{x : h(x) ≥ 0} ( ∫_0^{h(x)} dy ) f(x) dx
                     = ∫_{x : h(x) ≥ 0} h(x) f(x) dx.

Similarly

∫_0^∞ P(h(X) ≤ −y) dy = −∫_{x : h(x) < 0} h(x) f(x) dx.

So the result follows from the last theorem.

Definition 6.3. The variance of a continuous random variable X is

Var X = E[(X − E[X])²].

Note: the properties of expectation and variance are the same for discrete and continuous random variables — just replace Σ with ∫ in the proofs. For example,

Var X = E[X²] − E[X]² = ∫_{−∞}^∞ x² f(x) dx − ( ∫_{−∞}^∞ x f(x) dx )².

Example. Suppose X ∼ N[μ, σ²]. Let Z = (X − μ)/σ. Then

P(Z ≤ z) = P((X − μ)/σ ≤ z) = P(X ≤ μ + σz)
         = ∫_{−∞}^{μ+σz} (1/(σ√(2π))) e^(−(x−μ)²/2σ²) dx.

Substituting u = (x − μ)/σ,

P(Z ≤ z) = ∫_{−∞}^z (1/√(2π)) e^(−u²/2) du = Φ(z),

the distribution function of a N(0, 1) random variable. So Z ∼ N(0, 1).


What is the variance of Z?

Var Z = E[Z²] − E[Z]²   (the last term is zero)
      = (1/√(2π)) ∫_{−∞}^∞ z² e^(−z²/2) dz
      = [ −(1/√(2π)) z e^(−z²/2) ]_{−∞}^∞ + (1/√(2π)) ∫_{−∞}^∞ e^(−z²/2) dz
      = 0 + 1 = 1.

So Var Z = 1. What is the variance of X? Since X = μ + σZ,

E[X] = μ   (which we knew already),
Var X = σ² Var Z = σ²,

so X ∼ N(μ, σ²) has mean μ and variance σ².

    6.2 Transformation of Random Variables

Suppose $X_1, X_2, \dots, X_n$ have joint pdf $f(x_1, \dots, x_n)$, and let
\[
Y_1 = r_1(X_1, X_2, \dots, X_n), \quad
Y_2 = r_2(X_1, X_2, \dots, X_n), \quad \dots, \quad
Y_n = r_n(X_1, X_2, \dots, X_n).
\]
Let $R \subseteq \mathbb{R}^n$ be such that
\[
P((X_1, X_2, \dots, X_n) \in R) = 1.
\]
Let $S$ be the image of $R$ under the above transformation, and suppose the transformation from $R$ to $S$ is 1-1 (bijective).


Then there are inverse functions
\[
x_1 = s_1(y_1, y_2, \dots, y_n), \quad
x_2 = s_2(y_1, y_2, \dots, y_n), \quad \dots, \quad
x_n = s_n(y_1, y_2, \dots, y_n).
\]
Assume that $\frac{\partial s_i}{\partial y_j}$ exists and is continuous at every point $(y_1, y_2, \dots, y_n)$ in $S$, and define the Jacobian
\[
J = \det
\begin{pmatrix}
\frac{\partial s_1}{\partial y_1} & \dots & \frac{\partial s_1}{\partial y_n} \\
\vdots & \ddots & \vdots \\
\frac{\partial s_n}{\partial y_1} & \dots & \frac{\partial s_n}{\partial y_n}
\end{pmatrix}.
\tag{6.3}
\]

If $A \subseteq R$,
\[
P((X_1, \dots, X_n) \in A)
= \int_A f(x_1, \dots, x_n)\,dx_1 \dots dx_n
\tag{1}
\]
\[
= \int_B f(s_1(y_1,\dots,y_n), \dots, s_n(y_1,\dots,y_n))\, |J|\, dy_1 \dots dy_n,
\]
where $B$ is the image of $A$, and this equals
\[
P((Y_1, \dots, Y_n) \in B).
\tag{2}
\]
Since the transformation is 1-1, (1) and (2) are the same. Thus the density of $Y_1, \dots, Y_n$ is
\[
g(y_1, y_2, \dots, y_n) = f(s_1(y_1, y_2, \dots, y_n), \dots, s_n(y_1, y_2, \dots, y_n))\, |J|
\]
for $(y_1, y_2, \dots, y_n) \in S$, and $0$ otherwise.

Example (density of products and quotients). Suppose that $(X, Y)$ has density
\[
f(x, y) =
\begin{cases}
4xy, & 0 \le x \le 1,\ 0 \le y \le 1, \\
0, & \text{otherwise}.
\end{cases}
\tag{6.4}
\]
Let $U = X/Y$ and $V = XY$.


Then
\[
X = \sqrt{UV}, \qquad Y = \sqrt{V/U},
\]
so $x = \sqrt{uv}$, $y = \sqrt{v/u}$, and
\[
\frac{\partial x}{\partial u} = \frac{1}{2}\sqrt{\frac{v}{u}}, \quad
\frac{\partial x}{\partial v} = \frac{1}{2}\sqrt{\frac{u}{v}}, \quad
\frac{\partial y}{\partial u} = -\frac{1}{2}\, v^{1/2} u^{-3/2}, \quad
\frac{\partial y}{\partial v} = \frac{1}{2}\frac{1}{\sqrt{uv}}.
\]
Therefore $|J| = \frac{1}{2u}$ and so
\[
g(u, v) = \frac{1}{2u}(4xy) = \frac{1}{2u} \cdot 4\sqrt{uv}\sqrt{\frac{v}{u}} = \frac{2v}{u}
\]
if $(u, v) \in D$, the image of the unit square, and $0$ otherwise.

Note. $U$ and $V$ are NOT independent:
\[
g(u, v) = \frac{2v}{u}\, I[(u, v) \in D]
\]
is not the product of a function of $u$ alone and a function of $v$ alone, since the indicator of $D$ does not factorize.
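The density $2v/u$ can be cross-checked by simulation: sample $(X, Y)$ with density $4xy$ (i.e. $X$ and $Y$ independent, each with density $2t$ on $[0,1]$) and estimate the density of $(U, V)$ near a point of $D$. An illustrative sketch assuming numpy; the test point $(1.2, 0.5)$ is arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4_000_000

# X, Y independent, each with density 2t on [0, 1], so f(x, y) = 4xy;
# a variable with density 2t is sqrt of a U[0, 1] variable.
x = np.sqrt(rng.uniform(size=n))
y = np.sqrt(rng.uniform(size=n))
u, v = x / y, x * y

# empirical joint density of (U, V) in a small box around (u0, v0) in D
u0, v0, h = 1.2, 0.5, 0.02
in_box = (np.abs(u - u0) < h) & (np.abs(v - v0) < h)
print(in_box.mean() / (2 * h) ** 2, 2 * v0 / u0)   # both ~ 0.83
```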

When the transformations are linear, things are simpler still. Let $A$ be an $n \times n$ invertible matrix, and set
\[
\begin{pmatrix} Y_1 \\ \vdots \\ Y_n \end{pmatrix}
= A \begin{pmatrix} X_1 \\ \vdots \\ X_n \end{pmatrix}.
\]
Then
\[
|J| = |\det A^{-1}| = |\det A|^{-1},
\]
and the pdf of $(Y_1, \dots, Y_n)$ is
\[
g(y_1, \dots, y_n) = \frac{1}{|\det A|}\, f(A^{-1} y).
\]

Example. Suppose $X_1, X_2$ have the joint pdf $f(x_1, x_2)$. Calculate the pdf of $X_1 + X_2$. Let $Y = X_1 + X_2$ and $Z = X_2$. Then $X_1 = Y - Z$ and $X_2 = Z$, so
\[
A^{-1} = \begin{pmatrix} 1 & -1 \\ 0 & 1 \end{pmatrix},
\tag{6.5}
\]
and $\det A^{-1} = 1 = 1/\det A$. Then
\[
g(y, z) = f(x_1, x_2) = f(y - z, z).
\]


This is the joint density of $Y$ and $Z$. The marginal density of $Y$ is
\[
g(y) = \int_{-\infty}^{\infty} f(y - z, z)\,dz,
\]
or, by a change of variable,
\[
g(y) = \int_{-\infty}^{\infty} f(z, y - z)\,dz.
\]
If $X_1$ and $X_2$ are independent, with pdfs $f_1$ and $f_2$, then
\[
f(x_1, x_2) = f_1(x_1) f_2(x_2)
\]
and so
\[
g(y) = \int_{-\infty}^{\infty} f_1(y - z) f_2(z)\,dz,
\]
the convolution of $f_1$ and $f_2$.
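For example, if $X_1$ and $X_2$ are independent $\exp(\lambda)$ random variables, the convolution formula gives $g(y) = \lambda^2 y e^{-\lambda y}$; a numerical sketch assuming scipy:

```python
import numpy as np
from scipy.integrate import quad

lam = 1.5
f1 = lambda x: lam * np.exp(-lam * x) * (x >= 0)   # Exp(lam) density
f2 = f1

def g(y):
    """Density of X1 + X2 by the convolution formula."""
    val, _ = quad(lambda z: f1(y - z) * f2(z), 0, y)
    return val

y = 2.0
print(g(y), lam ** 2 * y * np.exp(-lam * y))   # both ~ 0.224
```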

For a pdf $f(x)$, $\hat{x}$ is a mode if $f(\hat{x}) \ge f(x)$ for all $x$, and $\hat{x}$ is a median if
\[
\int_{-\infty}^{\hat{x}} f(x)\,dx = \int_{\hat{x}}^{\infty} f(x)\,dx = \frac{1}{2}.
\]
For a discrete random variable, $\hat{x}$ is a median if
\[
P(X \le \hat{x}) \ge \frac{1}{2} \quad \text{and} \quad P(X \ge \hat{x}) \ge \frac{1}{2}.
\]

If $X_1, \dots, X_n$ is a sample from the distribution, then recall that the sample mean is
\[
\frac{1}{n} \sum_{i=1}^{n} X_i.
\]
Let $Y_1, \dots, Y_n$ (the order statistics) be the values of $X_1, \dots, X_n$ arranged in increasing order. Then the sample median is $Y_{\frac{n+1}{2}}$ if $n$ is odd, or any value in $\left[ Y_{\frac{n}{2}}, Y_{\frac{n}{2}+1} \right]$ if $n$ is even.

If $Y_n = \max\{X_1, \dots, X_n\}$ and $X_1, \dots, X_n$ are iidrvs with distribution function $F$ and density $f$, then
\[
P(Y_n \le y) = P(X_1 \le y, \dots, X_n \le y) = (F(y))^n.
\]
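For example, for the maximum of $n$ iid $U[0,1]$ variables, $P(Y_n \le y) = y^n$; a quick simulation check assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(3)
n, trials, y = 5, 1_000_000, 0.8

maxima = rng.uniform(size=(trials, n)).max(axis=1)
print((maxima <= y).mean(), y ** n)   # both ~ 0.328
```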


(Continuing with the order statistics $Y_1 < \dots < Y_n$ of $n$ iid exponential, parameter $\lambda$, random variables, with joint density $g(y_1, \dots, y_n) = n!\, f(y_1) \cdots f(y_n)$ on $y_1 < \dots < y_n$.) Consider the spacings
\[
Z = AY, \quad \text{where} \quad
A =
\begin{pmatrix}
1 & 0 & 0 & \dots & 0 & 0 \\
-1 & 1 & 0 & \dots & 0 & 0 \\
0 & -1 & 1 & \dots & 0 & 0 \\
\vdots & & \ddots & \ddots & & \vdots \\
0 & 0 & \dots & 0 & -1 & 1
\end{pmatrix},
\tag{6.7}
\]
so that $Z_1 = Y_1$ and $Z_i = Y_i - Y_{i-1}$ for $i \ge 2$. Then $\det(A) = 1$ and
\[
h(z_1, \dots, z_n) = g(y_1, \dots, y_n)
= n!\, f(y_1) \cdots f(y_n)
= n!\, \lambda^n e^{-\lambda y_1} \cdots e^{-\lambda y_n}
= n!\, \lambda^n e^{-\lambda(y_1 + \dots + y_n)}
= n!\, \lambda^n e^{-\lambda(n z_1 + (n-1) z_2 + \dots + z_n)}
= \prod_{i=1}^{n} i\lambda\, e^{-i\lambda z_{n+1-i}}.
\]
Thus $h(z_1, \dots, z_n)$ is expressed as the product of $n$ density functions and
\[
Z_{n+1-i} \sim \exp(i\lambda),
\]
exponentially distributed with parameter $i\lambda$, with $Z_1, \dots, Z_n$ independent.
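This can be checked by simulation: the $i$-th spacing $Z_i$ should be $\exp((n+1-i)\lambda)$, so its mean should be $1/((n+1-i)\lambda)$. A sketch assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(4)
lam, n, trials = 2.0, 4, 500_000

samples = rng.exponential(scale=1 / lam, size=(trials, n))
order = np.sort(samples, axis=1)
spacings = np.diff(order, axis=1, prepend=0.0)   # Z_1 = Y_1, Z_i = Y_i - Y_{i-1}

print(spacings.mean(axis=0))                               # empirical means
print([1 / ((n + 1 - i) * lam) for i in range(1, n + 1)])  # theoretical means
```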

Example. Let $X$ and $Y$ be independent $N(0,1)$ random variables. Let
\[
D = R^2 = X^2 + Y^2, \qquad \tan\Theta = \frac{Y}{X},
\]
so that
\[
d = x^2 + y^2, \qquad \theta = \arctan\frac{y}{x},
\]
and
\[
|J| = \left| \det
\begin{pmatrix}
2x & 2y \\[2pt]
\dfrac{-y/x^2}{1 + (y/x)^2} & \dfrac{1/x}{1 + (y/x)^2}
\end{pmatrix}
\right| = 2.
\tag{6.8}
\]
Since this $J$ is the Jacobian of $(d, \theta)$ with respect to $(x, y)$, the joint density of $(D, \Theta)$ is
\[
g(d, \theta) = \frac{1}{|J|}\, \frac{1}{2\pi}\, e^{-d/2} = \frac{1}{2} \cdot \frac{1}{2\pi}\, e^{-d/2}, \qquad d > 0,\ 0 \le \theta < 2\pi,
\]
so $D$ and $\Theta$ are independent, with $D \sim \exp\!\left(\tfrac{1}{2}\right)$ and $\Theta \sim U[0, 2\pi)$.
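The conclusion is easy to check by simulation: for independent standard normals, $X^2 + Y^2$ should be exponential with parameter $\tfrac{1}{2}$ (mean 2, variance 4). A sketch assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(1_000_000)
y = rng.standard_normal(1_000_000)

d = x ** 2 + y ** 2
print(d.mean(), d.var())                     # Exp(1/2): mean 2, variance 4
print((d > 3.0).mean(), np.exp(-3.0 / 2))    # tail probability check
```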


(These are the remaining two ways of choosing a "random chord" of a circle of radius 1; in each case we compute the probability that the chord is longer than $\sqrt{3}$, the side of the inscribed equilateral triangle.)

(2) The chord is perpendicular to a given diameter and the point of intersection is uniformly distributed over the diameter. A chord at distance $a$ from the centre is longer than the side of the triangle when
\[
a^2 + \left( \frac{\sqrt{3}}{2} \right)^2 < 1, \quad \text{i.e. } |a| < \frac{1}{2},
\]
so the answer is $\dfrac{1}{2}$.

(3) The foot of the perpendicular to the chord from the centre of the circle is uniformly distributed over the interior of the circle. The chord is longer than the side of the triangle when the foot lies in the interior circle of radius $\frac{1}{2}$, so the
\[
\text{answer} = \frac{\pi \left( \frac{1}{2} \right)^2}{\pi \cdot 1^2} = \frac{1}{4}.
\]
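Both answers can be reproduced by simulating the two chord-generating mechanisms for the unit circle; a sketch assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(6)
n = 1_000_000
side = np.sqrt(3.0)   # side of the equilateral triangle inscribed in the unit circle

# (2) point of intersection uniform on a diameter: chord length 2*sqrt(1 - a^2)
a = rng.uniform(-1.0, 1.0, n)
len2 = 2 * np.sqrt(1 - a ** 2)

# (3) foot of the perpendicular (chord midpoint) uniform over the disc
r = np.sqrt(rng.uniform(size=n))        # radius of a uniform point in the disc
len3 = 2 * np.sqrt(1 - r ** 2)

print((len2 > side).mean(), (len3 > side).mean())   # ~ 0.5 and ~ 0.25
```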

    6.3 Moment Generating Functions

If $X$ is a continuous random variable then the analogue of the pgf is the moment generating function, defined by
\[
m(\theta) = E\!\left[ e^{\theta X} \right]
\]
for those $\theta$ such that $m(\theta)$ is finite. Equivalently,
\[
m(\theta) = \int_{-\infty}^{\infty} e^{\theta x} f(x)\,dx,
\]
where $f(x)$ is the pdf of $X$.


Theorem 6.6. The moment generating function determines the distribution of $X$, provided $m(\theta)$ is finite for some interval containing the origin.

Proof. Not proved.

Theorem 6.7. If $X$ and $Y$ are independent random variables with moment generating functions $m_X(\theta)$ and $m_Y(\theta)$, then $X + Y$ has the moment generating function
\[
m_{X+Y}(\theta) = m_X(\theta) \cdot m_Y(\theta).
\]
Proof.
\[
E\!\left[ e^{\theta(X+Y)} \right] = E\!\left[ e^{\theta X} e^{\theta Y} \right]
= E\!\left[ e^{\theta X} \right] E\!\left[ e^{\theta Y} \right]
= m_X(\theta)\, m_Y(\theta).
\]

Theorem 6.8. The $r$th moment of $X$, i.e. the expected value of $X^r$, $E[X^r]$, is the coefficient of $\frac{\theta^r}{r!}$ in the series expansion of $m(\theta)$.

Proof. Sketch of proof:
\[
e^{\theta X} = 1 + \theta X + \frac{\theta^2}{2!} X^2 + \dots
\]
\[
E\!\left[ e^{\theta X} \right] = 1 + \theta E[X] + \frac{\theta^2}{2!} E\!\left[ X^2 \right] + \dots
\]

Example. Recall that $X$ has an exponential distribution, parameter $\lambda$, if it has density $\lambda e^{-\lambda x}$ for $0 \le x < \infty$. Then
\[
E\!\left[ e^{\theta X} \right]
= \int_0^{\infty} e^{\theta x} \lambda e^{-\lambda x}\,dx
= \lambda \int_0^{\infty} e^{-(\lambda - \theta)x}\,dx
= \frac{\lambda}{\lambda - \theta} = m(\theta), \qquad \theta < \lambda.
\]
Hence
\[
E[X] = m'(0) = \left. \frac{\lambda}{(\lambda - \theta)^2} \right|_{\theta = 0} = \frac{1}{\lambda},
\qquad
E\!\left[ X^2 \right] = m''(0) = \left. \frac{2\lambda}{(\lambda - \theta)^3} \right|_{\theta = 0} = \frac{2}{\lambda^2}.
\]
Thus
\[
\operatorname{Var} X = E\!\left[ X^2 \right] - E[X]^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}.
\]
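The same moments can be recovered by differentiating $m(\theta) = \lambda/(\lambda - \theta)$ symbolically; a sketch assuming sympy is available:

```python
import sympy as sp

theta, lam = sp.symbols('theta lambda', positive=True)
m = lam / (lam - theta)          # mgf of Exp(lambda), valid for theta < lambda

EX = sp.diff(m, theta).subs(theta, 0)        # first moment
EX2 = sp.diff(m, theta, 2).subs(theta, 0)    # second moment
print(sp.simplify(EX), sp.simplify(EX2), sp.simplify(EX2 - EX ** 2))
# 1/lambda, 2/lambda**2, lambda**(-2)
```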


(This proves that for independent $X \sim N[\mu_1, \sigma_1^2]$ and $Y \sim N[\mu_2, \sigma_2^2]$: (1) $X + Y \sim N[\mu_1 + \mu_2, \sigma_1^2 + \sigma_2^2]$, and (2) $aX \sim N[a\mu_1, a^2\sigma_1^2]$; it uses the fact that the mgf of $N[\mu, \sigma^2]$ is $e^{\mu\theta + \frac{1}{2}\sigma^2\theta^2}$.)

Proof. 1.
\[
E\!\left[ e^{\theta(X+Y)} \right]
= E\!\left[ e^{\theta X} \right] E\!\left[ e^{\theta Y} \right]
= e^{\mu_1\theta + \frac{1}{2}\sigma_1^2\theta^2}\, e^{\mu_2\theta + \frac{1}{2}\sigma_2^2\theta^2}
= e^{(\mu_1+\mu_2)\theta + \frac{1}{2}(\sigma_1^2+\sigma_2^2)\theta^2},
\]
which is the moment generating function of
\[
N[\mu_1 + \mu_2,\ \sigma_1^2 + \sigma_2^2].
\]
2.
\[
E\!\left[ e^{\theta(aX)} \right]
= E\!\left[ e^{(a\theta)X} \right]
= e^{\mu_1(a\theta) + \frac{1}{2}\sigma_1^2(a\theta)^2}
= e^{(a\mu_1)\theta + \frac{1}{2}a^2\sigma_1^2\theta^2},
\]
which is the moment generating function of
\[
N[a\mu_1,\ a^2\sigma_1^2].
\]

    6.4 Central Limit Theorem

Let $X_1, \dots, X_n$ be iidrvs with mean $0$ and variance $\sigma^2$, so $\operatorname{Var} X_i = \sigma^2$. Then $X_1 + \dots + X_n$ has variance
\[
\operatorname{Var}(X_1 + \dots + X_n) = n\sigma^2.
\]


Meanwhile $\frac{X_1 + \dots + X_n}{n}$ has variance
\[
\operatorname{Var}\!\left( \frac{X_1 + \dots + X_n}{n} \right) = \frac{\sigma^2}{n},
\]
and $\frac{X_1 + \dots + X_n}{\sqrt{n}}$ has variance
\[
\operatorname{Var}\!\left( \frac{X_1 + \dots + X_n}{\sqrt{n}} \right) = \sigma^2.
\]

Theorem 6.10. Let $X_1, \dots, X_n$ be iidrvs with $E[X_i] = \mu$ and $\operatorname{Var} X_i = \sigma^2 < \infty$, and let
\[
S_n = \sum_{i=1}^{n} X_i.
\]
Then for all $(a, b)$ such that $a < b$,
\[
\lim_{n \to \infty} P\!\left( a \le \frac{S_n - n\mu}{\sigma\sqrt{n}} \le b \right)
= \int_a^b \frac{1}{\sqrt{2\pi}}\, e^{-\frac{z^2}{2}}\,dz,
\]
where the integrand is the pdf of an $N[0,1]$ random variable.


Proof. Sketch of proof. WLOG take $\mu = 0$ and $\sigma^2 = 1$ (we can replace $X_i$ by $\frac{X_i - \mu}{\sigma}$). The mgf of $X_i$ is
\[
m_{X_i}(\theta) = E\!\left[ e^{\theta X_i} \right]
= 1 + \theta E[X_i] + \frac{\theta^2}{2} E\!\left[ X_i^2 \right] + \frac{\theta^3}{3!} E\!\left[ X_i^3 \right] + \dots
= 1 + \frac{\theta^2}{2} + \frac{\theta^3}{3!} E\!\left[ X_i^3 \right] + \dots
\]
The mgf of $\frac{S_n}{\sqrt{n}}$ is
\[
E\!\left[ e^{\theta \frac{S_n}{\sqrt{n}}} \right]
= E\!\left[ e^{\frac{\theta}{\sqrt{n}}(X_1 + \dots + X_n)} \right]
= E\!\left[ e^{\frac{\theta}{\sqrt{n}} X_1} \right] \cdots E\!\left[ e^{\frac{\theta}{\sqrt{n}} X_n} \right]
= \left( E\!\left[ e^{\frac{\theta}{\sqrt{n}} X_1} \right] \right)^n
= \left( m_{X_1}\!\left( \frac{\theta}{\sqrt{n}} \right) \right)^n
\]
\[
= \left( 1 + \frac{\theta^2}{2n} + \frac{\theta^3 E\!\left[ X^3 \right]}{3!\, n^{3/2}} + \dots \right)^n
\to e^{\frac{\theta^2}{2}} \quad \text{as } n \to \infty,
\]
which is the mgf of an $N[0,1]$ random variable.
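The theorem is easy to see numerically: standardized sums of, say, $U[0,1]$ variables ($\mu = \tfrac{1}{2}$, $\sigma^2 = \tfrac{1}{12}$) already look close to $N[0,1]$ for moderate $n$. A sketch assuming numpy:

```python
import numpy as np

rng = np.random.default_rng(7)
n, trials = 30, 200_000

x = rng.uniform(size=(trials, n))         # Xi ~ U[0,1]: mu = 1/2, sigma^2 = 1/12
s = x.sum(axis=1)
z = (s - n * 0.5) / np.sqrt(n / 12)       # (S_n - n mu) / (sigma sqrt(n))

# compare P(-1 <= Z <= 1) with the N[0,1] value Phi(1) - Phi(-1) ~ 0.6827
print(((z >= -1) & (z <= 1)).mean())
```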

Note. If $S_n \sim \operatorname{Bin}[n, p]$ (so that $X_i = 1$ with probability $p$ and $X_i = 0$ with probability $q = 1 - p$), then
\[
\frac{S_n - np}{\sqrt{npq}} \approx N[0, 1].
\]
This is called the normal approximation to the binomial distribution; it applies as $n \to \infty$ with $p$ constant. Earlier we discussed the Poisson approximation to the binomial, which applies when $n \to \infty$ and $np$ is constant.

Example. There are two competing airlines. $n$ passengers each select one of the two planes at random, so the number of passengers in plane one is
\[
S \sim \operatorname{Bin}\!\left[ n, \tfrac{1}{2} \right].
\]
Suppose each plane has $s$ seats and let $f(s) = P(S > s)$. Since $\frac{S - np}{\sqrt{npq}} \approx N[0,1]$,
\[
f(s) = P\!\left( \frac{S - \frac{1}{2}n}{\frac{1}{2}\sqrt{n}} > \frac{s - \frac{1}{2}n}{\frac{1}{2}\sqrt{n}} \right)
\approx 1 - \Phi\!\left( \frac{2s - n}{\sqrt{n}} \right).
\]
Therefore if $n = 1000$ and $s = 537$ then $f(s) \approx 0.01$: the two planes hold $1074$ seats in total, only $74$ in excess of the number of passengers.
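The value $s = 537$ can be recovered from the normal approximation by searching for the smallest $s$ with $1 - \Phi\!\left(\frac{2s - n}{\sqrt{n}}\right) \le 0.01$; a sketch assuming scipy:

```python
from math import sqrt
from scipy.stats import norm

n = 1000
# smallest s with P(S > s) ~ 1 - Phi((2s - n)/sqrt(n)) <= 0.01
s = n // 2
while 1 - norm.cdf((2 * s - n) / sqrt(n)) > 0.01:
    s += 1
print(s)   # 537
```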


Example. An unknown fraction of the electorate, $p$, vote Labour. It is desired to find $p$ with an error not exceeding $0.005$. How large should the sample be?

Let the fraction of Labour voters in the sample be $p'$. We can never be certain (without complete enumeration) that $|p' - p| \le 0.005$; instead choose $n$ so that the event $|p' - p| \le 0.005$ has probability at least $0.95$. Writing $S_n$ for the number of Labour voters in the sample,
\[
P(|p' - p| \le 0.005) = P(|S_n - np| \le 0.005n)
= P\!\left( \frac{|S_n - np|}{\sqrt{npq}} \le 0.005 \frac{\sqrt{n}}{\sqrt{pq}} \right).
\]
Choose $n$ such that this probability is at least $0.95$. Since
\[
\int_{-1.96}^{1.96} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x^2}{2}}\,dx = 2\Phi(1.96) - 1 \approx 0.95,
\]
we must choose $n$ so that
\[
0.005 \sqrt{\frac{n}{pq}} \ge 1.96.
\]
But we don't know $p$. However $pq \le \frac{1}{4}$, with the worst case $p = q = \frac{1}{2}$, so
\[
n \ge \frac{1.96^2}{0.005^2} \cdot \frac{1}{4} \approx 40{,}000
\]
is sufficient. If we replace $0.005$ by $0.01$ then $n \approx 10{,}000$ will be sufficient, and if we replace $0.005$ by $0.045$ then $n \approx 475$ will suffice.

Note. The answer does not depend upon the total size of the population.
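The three sample sizes come from $n \ge (1.96/\varepsilon)^2 \cdot \frac{1}{4}$; a one-line check for each tolerance $\varepsilon$, assuming scipy for the exact $0.975$ quantile:

```python
from scipy.stats import norm

z = norm.ppf(0.975)                       # ~ 1.96
for eps in (0.005, 0.01, 0.045):
    print(eps, (z / eps) ** 2 / 4)        # ~ 38415, 9604, 474
```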


    6.5 Multivariate normal distribution

Let $X_1, \dots, X_n$ be iid $N[0,1]$ random variables with joint density $g(x_1, \dots, x_n)$. Then
\[
g(x_1, \dots, x_n) = \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi}}\, e^{-\frac{x_i^2}{2}}
= \frac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2}\sum_{i=1}^{n} x_i^2}
= \frac{1}{(2\pi)^{n/2}}\, e^{-\frac{1}{2} x^T x}.
\]

Write
\[
X = \begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_n \end{pmatrix}
\]
and let $Z = \mu + AX$, where $A$ is an invertible matrix, so that $X = A^{-1}(Z - \mu)$. The density of $Z$ is
\[
f(z_1, \dots, z_n)
= \frac{1}{(2\pi)^{n/2}}\, \frac{1}{|\det A|}\, e^{-\frac{1}{2}(A^{-1}(z-\mu))^T (A^{-1}(z-\mu))}
= \frac{1}{(2\pi)^{n/2} |\Sigma|^{1/2}}\, e^{-\frac{1}{2}(z-\mu)^T \Sigma^{-1} (z-\mu)},
\]
where $\Sigma = AA^T$. This is the multivariate normal density:
\[
Z \sim MVN[\mu, \Sigma].
\]

Now
\[
\operatorname{Cov}(Z_i, Z_j) = E[(Z_i - \mu_i)(Z_j - \mu_j)],
\]
which is the $(i, j)$ entry of
\[
E\!\left[ (Z - \mu)(Z - \mu)^T \right]
= E\!\left[ (AX)(AX)^T \right]
= A\, E\!\left[ XX^T \right] A^T
= A I A^T = AA^T = \Sigma,
\]
the covariance matrix.
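This construction is also how multivariate normal samples are generated in practice: take $A$ to be any matrix with $AA^T = \Sigma$ (e.g. the Cholesky factor), set $Z = \mu + AX$, and the sample covariance of $Z$ approaches $\Sigma$. A sketch assuming numpy; the values of $\mu$ and $\Sigma$ are only illustrative:

```python
import numpy as np

rng = np.random.default_rng(8)
mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

A = np.linalg.cholesky(Sigma)             # any A with A A^T = Sigma will do
X = rng.standard_normal((500_000, 2))     # iid N[0,1] components
Z = mu + X @ A.T                          # Z = mu + A X, row by row

print(Z.mean(axis=0))                     # ~ mu
print(np.cov(Z, rowvar=False))            # ~ Sigma
```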

If the covariance matrix of the MVN distribution is diagonal, then the components of the random vector $Z$ are independent, since
\[
f(z_1, \dots, z_n) = \prod_{i=1}^{n} \frac{1}{(2\pi)^{1/2} \sigma_i}\, e^{-\frac{1}{2} \left( \frac{z_i - \mu_i}{\sigma_i} \right)^2},
\]
where
\[
\Sigma =
\begin{pmatrix}
\sigma_1^2 & 0 & \dots & 0 \\
0 & \sigma_2^2 & \dots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \dots & \sigma_n^2
\end{pmatrix}.
\]
This is not necessarily true if the distribution is not MVN; recall sheet 2, question 9.


Example (bivariate normal).
\[
f(x_1, x_2) = \frac{1}{2\pi (1 - \rho^2)^{1/2} \sigma_1 \sigma_2}
\exp\!\left\{ -\frac{1}{2(1 - \rho^2)} \left[
\left( \frac{x_1 - \mu_1}{\sigma_1} \right)^2
- 2\rho \left( \frac{x_1 - \mu_1}{\sigma_1} \right)\!\left( \frac{x_2 - \mu_2}{\sigma_2} \right)
+ \left( \frac{x_2 - \mu_2}{\sigma_2} \right)^2
\right] \right\},
\]
with $\sigma_1, \sigma_2 > 0$ and $-1 < \rho < +1$. This is the joint density of a bivariate normal random variable.

This is an example with
\[
\Sigma^{-1} = \frac{1}{1 - \rho^2}
\begin{pmatrix}
\sigma_1^{-2} & -\rho\, \sigma_1^{-1}\sigma_2^{-1} \\
-\rho\, \sigma_1^{-1}\sigma_2^{-1} & \sigma_2^{-2}
\end{pmatrix},
\qquad
\Sigma =
\begin{pmatrix}
\sigma_1^2 & \rho\, \sigma_1\sigma_2 \\
\rho\, \sigma_1\sigma_2 & \sigma_2^2
\end{pmatrix}.
\]
Then $E[X_i] = \mu_i$, $\operatorname{Var} X_i = \sigma_i^2$, $\operatorname{Cov}(X_1, X_2) = \sigma_1\sigma_2\rho$, and
\[
\operatorname{Correlation}(X_1, X_2) = \frac{\operatorname{Cov}(X_1, X_2)}{\sigma_1\sigma_2} = \rho.
\]