Probability Theory Presentation 11

    BST 401 Probability Theory

Xing Qiu, Ha Youn Lee

Department of Biostatistics and Computational Biology, University of Rochester

    October 12, 2009

    Outline

1 Convergence of Sequences of Measurable Functions

    Random Variables Review

I'll start with the simplest non-trivial probability space: (Ω1, 2^Ω1, μ1), where Ω1 = {H, T} and μ1({H}) = μ1({T}) = 1/2.

I can define a sequence of random variables X1, X2, . . . on this probability space in this way:

Xn(ω) = 0 if ω = H, and Xn(ω) = 1/n if ω = T.

My point: different random variables are only different ways to assign numbers to events. They do not change the whole space nor the probability measure.

Bernoulli random variable review: X ~ Bernoulli(p) means P(X = 1) = p and P(X = 0) = 1 − p. So it is more than just a coin-tossing distribution: it must send the two possible outcomes to the numbers 0 and 1.
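To make this concrete, here is a minimal Python sketch (mine, not from the slides; the function X and the dictionary mu are just illustrative names). It encodes the two-point space and the sequence Xn above, and shows that Xn(ω) → 0 for both outcomes, so the sequence converges (surely, hence almost surely) to the constant 0.

    # The two-point space Omega_1 = {"H", "T"} with the fair-coin measure,
    # and the sequence X_n(H) = 0, X_n(T) = 1/n from the slide.
    def X(n, omega):
        """X_n evaluated at an outcome omega in {"H", "T"}."""
        return 0.0 if omega == "H" else 1.0 / n

    mu = {"H": 0.5, "T": 0.5}  # the probability measure mu_1

    for n in (1, 10, 100, 1000):
        print(n, X(n, "H"), X(n, "T"))  # X_n(T) = 1/n shrinks toward 0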

    Random Variables on the Same Space

Let Y be the casino r.v.: P(Y = 1) = q, P(Y = −1) = 1 − q. Y is not a Bernoulli r.v.!

The following examples show that X and Y can be defined on the same probability space: a) Y = 2X − 1 (q = p); b) Y = 1 − 2X (q = 1 − p); c) Y ≡ 1 (q = 1); d) Y ≡ −1 (q = 0).

If X1 ~ Bernoulli(1/2) and X2 ~ Bernoulli(1/3), they cannot be defined on the same probability space! (On the two-point fair-coin space above, every event has probability 0, 1/2, or 1, so no event can have probability 1/3.)
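As a quick sketch (mine, not on the slide; the names X and Y are illustrative), case a) can be written out on the two-point space directly, which shows X and Y living on the same space and being comparable outcome by outcome:

    # X ~ Bernoulli(1/2) on the fair coin, and the casino r.v. Y = 2X - 1,
    # both defined on the same space {"H", "T"}.
    def X(omega):
        return 1 if omega == "H" else 0

    def Y(omega):
        return 2 * X(omega) - 1  # takes values +1 and -1, here with q = p = 1/2

    for omega in ("H", "T"):
        print(omega, X(omega), Y(omega))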

    Random Variables and the Product Space (I)

Let X1, X2 be two separate¹ yet identical Bernoulli r.v.s defined on Ω1 and Ω2, where Ω2 is just a copy of Ω1.

My point is: though Ω2 is a copy of Ω1 and X2 assigns the same numbers to the same events, X1 ≠ X2, because they can take different values.

In probability theory, X1 = X2 is very strict. It means that a) X1 and X2 are defined on the same probability space; b) X1(ω) = X2(ω) for all ω.

In the same spirit, Xn →a.s. X means that a) the Xn are defined on the same probability space; b) they converge to X almost surely.

¹ I cannot use the word "independent" here because I haven't defined it yet.

    Random Variables and the Product Space (II)

The product space/measure is a way to connect the otherwise separate probability spaces/random variables.

X1 lives on Ω1, X2 on Ω2. We may consider them as X̃1 and X̃2 on Ω1 × Ω2 in this way:

X̃1 : Ω1 × Ω2 → R, X̃1(ω1, ω2) = X1(ω1).

X̃2 : Ω1 × Ω2 → R, X̃2(ω1, ω2) = X2(ω2).

We can do this for an infinite sequence of r.v.s. Let X1, X2, . . . be a sequence of r.v.s defined on separate probability spaces Ω1, Ω2, . . . The product space ∏n Ωn contains outcomes such as (H, H, T, H, T, T, . . .).

X̃n : ∏n Ωn → R, X̃n(ω1, ω2, . . .) = Xn(ωn).
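A small Python sketch of this lifting (not part of the slides; X_tilde and the tuple omega are illustrative names): an outcome of the product space is a whole sequence of coin flips, and X̃n simply reads off its n-th coordinate.

    # Lifting coordinate random variables to the product space: X_tilde_n looks
    # only at the n-th coordinate of the product-space outcome.
    def X_tilde(n, omega):
        """omega is a tuple like ("H", "H", "T", ...); apply the n-th coordinate r.v."""
        return 1 if omega[n - 1] == "H" else 0  # each coordinate r.v. is a Bernoulli here

    omega = ("H", "H", "T", "H", "T", "T")  # one outcome of the product space
    print([X_tilde(n, omega) for n in range(1, 7)])  # -> [1, 1, 0, 1, 0, 0]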

    Random Variables and the Product Space (III)

The point: X̃n is just Xn defined for the product space, so we don't need to make any distinction in practice.

The X̃n's are defined on the same probability space now, so they can be compared.

Apparently X̃n ≠ X̃m in general. There is only one exception: Xn(ωn) = Xm(ωm) = const. for all ωn ∈ Ωn and ωm ∈ Ωm. It turns out to be the case for the strong law of large numbers (SLLN).

SLLN (without proof, just state the conclusion for a special case): let Zn be (1/n) ∑_{i=1}^{n} Xi. Demonstrate the behavior of Zn up to n = 3.
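A simulation sketch (mine, not from the slides): for i.i.d. fair-coin Xi, the SLLN says Zn = (1/n) ∑ Xi → 1/2 almost surely, and the running averages along one simulated product-space outcome illustrate this.

    import random

    # One simulated outcome of the product space: a long sequence of fair coin flips.
    # Z_n is the running average (1/n) * (X_1 + ... + X_n); by the SLLN it tends to 1/2.
    random.seed(0)
    total = 0
    for n in range(1, 10001):
        total += random.randint(0, 1)  # X_n, a fair Bernoulli flip
        if n in (1, 2, 3, 100, 10000):
            print(n, total / n)  # Z_1, Z_2, Z_3 still bounce around; Z_10000 is near 0.5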

    About the Homework

Back to homework #6, problem 1. It asks you to prove a.s. convergence. Without any handy theorems/tools, you must start from scratch, that is, prove that for almost every ω, the sequence of real numbers X1(ω), X2(ω), . . . converges.

Homework #7, problem 2. Limits are defined for ω ∈ Ω. You must use countably many set operations on rectangles (essentially finite-dimensional rectangles) to define those sets.

    Convergence in measure/probability

fn →μ f iff for every ε > 0, μ({ω : |fn(ω) − f(ω)| ≥ ε}) → 0.

Convergence in measure says that the measure of the set of bad points (those where fn is at least ε away from f) shrinks to zero. Or, in probability theory: the probability of seeing an outlier ω (one with |fn(ω) − f(ω)| > ε) decreases to zero.

It looks awfully like a.e. convergence! Counterexample: shrinking but bouncy indicators, sketched below.
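Here is one way to write the counterexample down (the indexing below is my reconstruction of the usual "typewriter" sequence, not taken from the slides): on [0, 1] with Lebesgue measure, fn is the indicator of an interval of length 1/2^m that marches across [0, 1]. The interval length tends to 0, so fn → 0 in measure, but every point is hit infinitely often, so fn(ω) converges at no ω, hence certainly not a.e.

    # The "typewriter" sequence of shrinking but bouncy indicator functions on [0, 1].
    def f(n, omega):
        """Indicator of the n-th typewriter interval, evaluated at omega in [0, 1)."""
        m = 0
        while 2 ** (m + 1) - 1 < n:  # generation m holds indices 2**m, ..., 2**(m+1) - 1
            m += 1
        k = n - 2 ** m               # position of the interval within generation m
        left, right = k / 2 ** m, (k + 1) / 2 ** m
        return 1 if left <= omega < right else 0

    omega = 0.3
    print([f(n, omega) for n in range(1, 16)])  # keeps returning to 1: no pointwise limit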

    Weak Convergence of Measures

All the convergences we defined so far are convergences of measurable functions/random variables w.r.t. a fixed probability measure.

In the SLLN, we need convergence in probability or even a.e. convergence. But in the CLT, we are satisfied by knowing that the resulting distribution is normal; we don't really care about pointwise convergence.

This leads us to consider a totally different kind of convergence: a convergence of distributions/measures instead of a convergence of random variables.

    Definition

Let P1, P2, . . . be probability measures on R. Pn →w P iff any one of the following equivalent conditions holds:

Fn(x) → F(x) at all continuity points x (including ±∞). (The definition in Durrett's book.)

Pn(A) → P(A) for all continuity sets A of P, which are sets such that P(∂A) = 0.

∫ f dPn → ∫ f dP for all bounded, continuous functions f.

Several other criteria. See Thm 2.8.1 in Ash's book.

We say a sequence of r.v.s X1, X2, . . . converges weakly (converges in distribution) to X if the distribution functions Fn associated with the Xn converge weakly to that of X.

This topic will be re-studied in the CLT chapter.
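A numerical sketch of the CDF criterion (my illustration, not from the slides; Phi and F_n are just helper names): take Xn to be a standardized Binomial(n, 1/2); by the CLT its distribution function Fn(x) converges to the standard normal CDF Φ(x) at every continuity point, e.g. at x = 0.5.

    import math

    def Phi(x):
        """Standard normal CDF."""
        return 0.5 * (1 + math.erf(x / math.sqrt(2)))

    def F_n(n, x):
        """Exact CDF at x of (S_n - n/2) / sqrt(n/4), with S_n ~ Binomial(n, 1/2)."""
        cut = x * math.sqrt(n / 4) + n / 2
        return sum(math.comb(n, k) for k in range(0, math.floor(cut) + 1)) / 2 ** n

    for n in (10, 100, 1000):
        print(n, F_n(n, 0.5), Phi(0.5))  # F_n(0.5) approaches Phi(0.5) ~ 0.6915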

    Relations Between Different Convergences

Lp convergence implies convergence in measure.

If μ is a probability measure, a.e. convergence implies convergence in measure.

For finite measures (probabilities), L∞ convergence implies Lp convergence; Lp convergence implies Lp′ convergence if p > p′ (a homework problem).
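For the first implication, the one-line reason (a sketch using Markov's inequality; the slide itself does not spell it out) is: for every ε > 0,

    μ({ω : |fn(ω) − f(ω)| ≥ ε}) ≤ (1/ε^p) ∫ |fn − f|^p dμ,

so if the right-hand side tends to 0 (Lp convergence), the left-hand side does too (convergence in measure).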
