of 25/25
PROBABILITY THEORY AND STATISTICS V. Tassion D-ITET Spring 2020 (updated: March 6, 2020)

PROBABILITY THEORY AND STATISTICSH2. A∈F ⇒Ac ∈F ifAisanevent,“nonA“ isalsoanevent. ... In this case, the probability of landing a fixed point is always 0 and the equation

  • View
    0

  • Download
    0

Embed Size (px)

Text of PROBABILITY THEORY AND STATISTICSH2. A∈F ⇒Ac ∈F ifAisanevent,“nonA“ isalsoanevent. ... In...

  • PROBABILITY THEORY AND STATISTICS

    V. Tassion

    D-ITET Spring 2020 (updated: March 6, 2020)

  • Introduction

    Some questions you may askWhat is probability?

    Ü A mathematical language describing systems involving randomness.

    Where are probabilities used?

    Ü Describe random experiments in the real world (coin flip, dice rolling, arrivaltimes of customers in a shop, weather in 1 year,...).

    Ü Express uncertainty. For example, when a machine performs a measurement,the value is rarely exact. One may use probability theory in this context by sayingthat the value obtained is equal to the real value plus a small random error.

    Ü Decision making. Probability theory can be used to describe a system when onlypart of the information is known. In such context, it may help to make a decision.

    Ü Randomized algorithms in computer science. Sometimes, it is more efficient toadd some randomness to perform an algorithm. Examples: Google web search, antssearching food.

    Ü Simplify complex systems. Examples: water molecules in water, or cars on thehighway.

    The notion of probabilistic modelIf one wants a precise physical description of a coin flip one would need a lot (really!)of information: the exact position of the fingers and the coins, the initial velocity, theinitial angular velocity, imperfections of the coin, the surface characteristics of the table,air currents, the brain activity of the gambler... These parameters are almost impossibleto measure precisely, and a tiny change in one of them may affect completely the result.For practical use, it is very convenient to use probabilistic description which simplifies alot the system. We are only interested in the result, namely the set of possible outcomes.

    1

  • 2

    For the coin flip, the probabilistic model is given by 2 possible outcomes (head andtails) and each outcome has probability phead = ptail = 1/2 to be realized. In other words,

    Coin flip = {{head, tail}, phead = 1/2, ptail = 1/2}

    A surprising analysis: coin flips are not fair! If one tosses a coin it has morechance to fall on the same face as its initial face! See the youtube video of Persi Diaconis:How random is a coin toss? - Numberphile

    Probability laws: randomness vs orderingIf one performs a single random experiment (for example a coin flip), the result is unpre-dictable. In contrast, when one performs many random experiments, then some generallaws can be observed. For example if one tosses 10000 independent coins, one shouldgenerally observe 5000 heads and 5000 tails approximately: This is an instance of a fun-damental probability law, called the law of large numbers. One goal of probability theoryis to describe how ordering can emerge out of many random experiments, and establishsome general probability laws.

    https://www.youtube.com/watch?v=AYnJv68T3MM

  • Chapter 1

    Mathematical framework

    The probabilistic model for the die

    When one throws a cubic die (with six faces), we know that half of the time, the upper faceof the die shows an even number: 2, 4, or 6. One would like to give a precise mathematicalmeaning to this phenomena. More precisely, our goal is to give a rigorous sense to

    P[“the upper face of the die is even”] = 12. (1.1)

    First, we have to give a sense to “the upper face of the die is even”, and then to thenotation P[.] which represents the probability. A first goal of the chapter is to introducethe notion of probability space used to model general random experiments. It will be givenby a triple (Ω,F ,P), where Ω is called the sample space, F the set of events and P theprobability measure. Before giving the general definitions, we construct a first exampleof probability space in the simple case of the die.

    Step 1: Determine the set of possible outcomes Ω.In probability, we are not interested in the exact protocol of the experiment, we focus

    on its result. The first thing we identify is the set of possible outcomes, called the samplespace and denoted by Ω. For the throw of a die, we have

    Ω = {1,2,3,4,5,6}

    Step 2: Give a mathematical sense to “the die is even”.We want to evaluate the probability of certain “observations”, for example that the

    upper face of the die is even. These observations are formalized by the notion of event:

    An event is given by a subset A ⊂ Ω.

    3

  • CHAPTER 1. MATHEMATICAL FRAMEWORK 4

    We write F for the set of events. Here, we have F = P(E).Examples of events:

    A = {2,4,6}, “the die is even”B = {1,2,3}. “the die is smaller than or equal to 3”

    Set operations on events:The intersection corresponds to the logic operation and:

    C ∶= A ∩B = {2}. “the die is even and smaller than or equal to 3”The union corresponds to the logic operation or:

    D ∶= A ∪B = {1,2,3,4,6}. “the die is even or smaller than or equal to 3”The complement corresponds to the logic operation not:

    E ∶= Ac = {1,3,5}. “the die is not even”

    Step 3: Give a mathematical sense to the probability P.We want to associate to each event A a probability P[A] ∈ [0,1]. This number rep-

    resents the “frequency” at which the event A occurs if one performs many experiments.Before giving the general formula, let us look at some examples.

    If one performs many throws of the die, the event A = {1} (“the die is equal to 1”) willoccur typically one time over six. Hence, it is natural to define P[A] = 1/6. The eventA = {1,2,3} will typically occur half of the time and we define P[A] = 1/2 in this case.More generally, we use the following definition:

    The probability measure is the function P ∶ F → [0,1] defined by

    ∀A ∈ F P[A] = ∣A∣6.

    For example, the probability of the event A = {2,4,6} (“the die is even”) is

    P[A] = ∣{2,4,6}∣6

    = 36= 1

    2.

    Ü the equation above is the mathematical way to express Equation (1.1).Notice that the probability measure P satisfies the following properties

    • P[Ω] = 1,

    • P[A ∪B] = P[A] + P[B] if A and B are disjoint.

    Summary: the probabilistic model for the die

    Sample space Ω = {1,2,3,4,5,6}

    Set of events F = P(E)

    Probability measure P ∶ F → [0,1]A ↦ ∣A∣6

  • CHAPTER 1. MATHEMATICAL FRAMEWORK 5

    1.1 Sample spaces and events (textbook p.1-4)

    Sample space

    We want to model a random experiment. The first mathematical object needed is the setof all possible outcomes of the experiment, denoted by Ω.

    Definition 1.1. The set Ω is called the sample space.An element ω ∈ Ω is called an outcome (or elementary experiment).

    Example 1: Coin toss

    Ω = {head, tail} or Ω = {0,1}.

    Example 2: Throw of a die

    Ω = {1,2,3,4,5,6}.

    Example 3: Throw of three dice

    Ω = {1,2,3,4,5,6}3.In this case an outcome ω ∈ Ω has three components ω = (ω1, ω2, ω3) and ωi represents thevalue of the i-th die.Example 4: Throw of infinitely many dice

    Ω = {1,2,3,4,5,6}N.In this case an outcome ω ∈ Ω has infinitely many components ω = (ω1, ω2, . . .) and ωirepresents the value of the i-th die.Example 5: Number of customers visiting a shop in one day

    Ω = N(= {0,1, . . .}).

    Example 6: Droplet of water falling on a square

    Ω = [0,1]2.

  • CHAPTER 1. MATHEMATICAL FRAMEWORK 6

    Events

    Our goal is to measure the probability that the outcome of the experiment satisfies acertain property. Mathematically, such property can be represented by the set of all theoutcomes that satisfy this property. For example, for the throw of die, the property “theupper face of the die is even” corresponds to the subset

    {2,4,6} ⊂ {1,2,3,4,5,6}.

    A priori, an event could be any subset A ⊂ Ω. This works very well if the samplespace Ω is finite or countable. But for more complicated (uncountable) sample spaces(e.g. [0,1]), one needs to impose some conditions on the events. This is due to the factthat later, we want to be able to estimate the probability of the events, and this is possibleonly if the events are sufficiently “nice”. For this course, this technicality is not important,but it explains why the following definition may look complicated at a first sight.

    Definition 1.2. We write F for the set of all the events. It is given by a subset of P(E)satisfying the following hypotheses:

    H1. Ω ∈ F

    H2. A ∈ F ⇒ Ac ∈ F if A is an event, “non A“ is also an event.

    H3. A1,A2, . . . ∈ F ⇒∞

    ⋃i=1

    Ai ∈ F .if A1,A2, . . . are events, then“A1 or A2 or ...” is an event

    Remark 1.3. A set F ⊂ P(E) satisfying H1, H2 and H3 above is called a σ-algebra.

    Example 1: Throw of a dieThe event corresponding to “the upper face of the die is even” is mathematically definedby

    A = {2,4,6}.

    Example 2: Droplet falling on a squareThe event corresponding to “the droplet falls on the upper part of the square” is mathe-matically defined by

    A = [0,1] × [1/2,1].

    In general, an event can be defined in two ways

    Ü one can list all the elements in the event, as in the two examples above.

    Ü one can define it as the set of ω ∈ Ω satisfying a certain property. For example, forthe throw of a die, the event “the upper face is smaller than or equal to 3” can bedefined by

    A = {ω ∈ Ω ∶ ω ≤ 3}.

  • CHAPTER 1. MATHEMATICAL FRAMEWORK 7

    Terminology

    Let ω ∈ Ω (a possible outcome). Let A be an event.We say the event A occurs (for ω) if ω ∈ A.

    A

    ω

    We say that it does not occur if ω ∉ A.

    A

    ω

    Remark 1.4. The event A = ∅ never occurs. “we never have ω ∈ ∅”The event A = Ω always occurs. “we always have ω ∈ Ω”

    Operations on events and interpretation

    Since events are defined as subsets of Ω, one can use operations from set theory (union,intersection, complement, symmetric difference,. . . ). From the definition, we know thatwe can take the complement of an event (by H2), or a countable union of events (by H3).The following proposition asserts that the other standard set operations are allowed.

    Proposition 1.5 (Consequences of the definition). We have

    i. ∅ ∈ F ,

    ii. A1,A2, . . . ∈ F ⇒∞

    ⋂i=1

    Ai ∈ F ,

    iii. A,B ∈ F ⇒ A ∪B ∈ F ,

    iv. A,B ∈ F ⇒ A ∩B ∈ F .Proof. We prove the items one after the other.

    i. By H1 in Definition 1.2, we have Ω ∈ F . Hence, by H3 in Definition 1.2, we have

    ∅ = Ωc ∈ F .

    ii. Let A1,A2, . . . ∈ F . By H2, we also have Ac1,Ac2, . . . ∈ F . Then, by H3, we have⋃∞i=1(Ai)c ∈ F . Finally, using H2 again, we conclude that

    ⋂i=1

    Ai = (∞

    ⋃i=1

    (Ai)c)c

    ∈ F .

  • CHAPTER 1. MATHEMATICAL FRAMEWORK 8

    iii. Let A,B ∈ F . Define A1 = A, A2 = B, and for every i ≥ 3 Ai = ∅. By H3, we have

    A ∪B =∞

    ⋃i=1

    Ai ∈ F .

    iv. Let A,B ∈ F . By H2, Ac,Bc ∈ F . Then by iii. above, we have Ac ∪Bc ∈ F . Finally,by H2, we deduce

    A ∩B = (Ac ∪Bc)c ∈ F .

    Let A,B ∈ F . “A and B are two events”

    Event Graphical representation Probab. interpretation

    Ac

    A

    AcA does not occur

    A ∩BA B

    A and B occur

    A ∪BA B

    A or B occurs

    A∆B

    A B

    one and only one of A or Boccurs

  • CHAPTER 1. MATHEMATICAL FRAMEWORK 9

    Relations between events and interpretations

    Relation Graphical representation Probab. interpretation

    A ⊂ B AB

    If A occurs, then B occurs

    A ∩B = ∅A B

    A and B cannot occur at thesame time

    Ω = A1 ∪A2 ∪A3 withA1,A2,A3 pairwise disjoint

    A1 A3

    A2

    for each outcome ω, one andonly one of the events A1,

    A2, A3 is satisfied.

    1.2 Mathematical definition of probability spaces

    Motivation: frequencies of events in random experiments

    As previously mentioned, in the random experiments that we consider, if we look at theoccurence of a certain event, a stable relative frequency appears after repeating manytimes the experiment (for instance: when throwing a needle onto a parquet floor, “theneedle intersects two laths”). Hence, if we introduce

    n = how many times the experiment is repeated,

    andnA = the number of occurences of the event A,

    We assume that n is very large (e.g. n = 1000000000), we want to say that the probabilityof A corresponds to the proportion of times the event A occurs. Hence, we would likedefine

    P[A] ≃ nAn.

    In other words, we assign a number P[A] ∈ [0,1] “the probability of A”, to every event A.Let us assume that it is rigorous and let us look at the properties satisfied by P.

    • By definition, one has nΩ = n since the event A = Ω (sample space) always occurs.We will thus assume that

    P[Ω] = 1.

  • CHAPTER 1. MATHEMATICAL FRAMEWORK 10

    • In the case when A = A1 ∪A2 ∪⋯ ∪Ak and the Ai are pairwise disjoint, one has

    nA = nA1 + nA2 +⋯ + nAk .

    Dividing by n, we obtain

    nAn

    = nA1n

    + nA2n

    +⋯ + nAkn.

    Hence, the probability should satisfy

    P(A) = P (A1) +⋯ + P (Ak), if A = A1 ∪A2⋯∪Ak (disjoint union).

    Probability measure and probability space

    Definition 1.6. Let Ω be a sample space, let F be a set of events. A probability measureon (Ω,F) is a map

    P ∶ F → [0,1]A ↦ P[A]

    that satisfies the following two properties

    P1. P[Ω] = 1.

    P2. (countable additivity) P[A] = ∑∞i=1 P[Ai] if A = ⋃∞i=1Ai (disjoint union).

    “A probability measure is a map that associates to each event a number in [0,1].”

    Definition 1.7. Let Ω be a sample space, F a set of events, and P a probability measure.The triple (Ω,F ,P) is called a probability space.

    To summarize, if one want to construct a probabilistic model, we give

    • a sample space Ω, “all the possible outcomes of the experiment”

    • a set of events F ⊂ P(Ω), “the set of possible observations”

    • a probability measure P. “gives a number in [0,1] to every event”

    Example 1: Throw of three dice

    • Ω = {1,2,3,4,5,6}3, an outcome is ω = ( ω1´¸¶die 1

    , ω2´¸¶die 2

    , ω3´¸¶die 3

    ),

    • F = P(Ω), any subset A ⊂ Ω is an event

    • for A ∈ F , P[A] = ∣A∣∣Ω∣ . (∣A∣ =number of elements in A).

  • CHAPTER 1. MATHEMATICAL FRAMEWORK 11

    Example 2: First time we see “tail” in a biased coinWe throw a biased coin multiple times, at each throw, the coin falls on head with

    probability p, and it falls on tail with probability 1 − p (p is a fixed parameter in [0,1]).We stop at the first time we see a tail. The probability that we stop exactly at time k isgiven by

    pk = pk−1(1 − p).(Indeed, we stop at time k, if we have seen exactly k − 1 heads and 1 tail.)

    For this experiment, one possible probability space is given by

    • Ω = N/{0} = {1,2,3, . . .},

    • F = P(Ω),

    • for A ∈ F , P[A] = ∑k∈A

    pk.

    A tempting approach to define a probability measure is to first associate to every ωthe probability pω that the output of the experiment is ω. Then, for an event A ⊂ Ω definethe probability of A by the formula

    P[A] = ∑ω∈A

    pω. (1.2)

    This approach works perfectly well, when the sample space Ω is finite or countable (thisis the case of the two examples above). But this approach does not work well if Ω isuncountable. For example, in the case of the droplet of water in a square Ω = [0,1]2.In this case, the probability of landing a fixed point is always 0 and the equation (1.2)does not make sense. This is for this reason that we use an axiomatic definition (inDefinition 1.6) of probability measure.Example 3: Droplet of water on a square

    • Ω = [0,1]2,

    • F = Borel σ-algebra1

    • for A ∈ F , P(A) = Area(A).1the Borel σ-algebra F is defined as follows: it contains all A = [x1, x2]× [y1, y2], with 0 ≤ x1 ≤ x2 ≤ 1,

    0 ≤ y1 ≤ y2 ≤ 1, and it is the smallest collection of subsets of Ω which satisfies H1, H2 and H3 inDefinition 1.2.

  • CHAPTER 1. MATHEMATICAL FRAMEWORK 12

    Direct consequences of the definition

    Proposition 1.8. Let P be a probability measure on (Ω,F).

    i. We haveP[∅] = 0.

    ii. (additivity) Let k ≥ 1, let A1, . . . ,Ak be k pairwise disjoint events, then

    P[A1 ∪⋯ ∪Ak] = P[A1] +⋯ + P[Ak].

    iii. Let A be an event, thenP[Ac] = 1 − P[A].

    iv. If A and B are two events (not necessarily disjoint), then

    P[A ∪B] = P[A] + P[B] − P[A ∩B].

    Proof. We prove the items one after the other

    i. Define x = P[∅]. We already know that x ∈ [0,1] because x is the probability ofsome event. Defining A1 = A2 = ⋯ = ∅, we have

    ∅ =∞

    ⋃i=1

    Ai.

    The events Ai are disjoint and countable additivity implies

    ∑i=1

    P [Ai] = P[∅].

    Since P[Ai] = x for every i and P [∅] ≤ 1, we have∞

    ∑i=1

    x ≤ 1,

    and therefore x = 0.

    ii. Define Ak+1 = Ak+2 = ⋯ = ∅. In this way we have

    A1 ∪⋯ ∪Ak = A1 ∪⋯ ∪Ak ∪ ∅ ∪∅ ∪⋯ =∞

    ⋃i=1

    Ai.

  • CHAPTER 1. MATHEMATICAL FRAMEWORK 13

    Since the events Ai are pairwise disjoint, one can apply countable additivity asfollows:

    P[A1 ∪⋯ ∪Ak] = P[∞

    ⋃i=1

    Ai]

    countableadditivity

    =∞

    ∑i=1

    P[Ai]

    = P[A1] +⋯ + P[Ak] +∑i>k

    P[Ai]´¹¹¸¹¹¶

    =0

    .

    iii. By definition of the complement, we have Ω = A ∪Ac, and therefore

    1 = P[Ω] = P[A ∪Ac].

    Since the two events A, Ac are disjoint, additivity finally gives

    1 = P[A] + P[Ac].

    iv. A ∪B is the disjoint union of A with B/A. Hence, by additivity, we have

    P[A ∪B] = P[A] + P[B/A]. (1.3)

    Also B = (B ∩A) ∪ (B ∩Ac) = (B ∩A) ∪ (B/A). Hence, by additivity,

    P[B] = P[B ∩A] + P[B/A],

    which give P[B/A] = P[B]−P[A∩B]. Plugging this estimate in Eq. (1.3) we obtainthe result.

    Useful Inequalities

    Proposition 1.9 (Monotonicity). Let A,B ∈ F , then

    A ⊂ B ⇒ P[A] ≤ P[B].

    Proof. If A ⊂ B, then we have B = A ∪ (B/A) (disjoint union). Hence, by additivity, wehave

    P[B] = P[A] + P[B/A] ≥ P[A].

  • CHAPTER 1. MATHEMATICAL FRAMEWORK 14

    A simple application: Consider the “droplet of water falling on a square”. Consider theevent En that the droplet falls at Euclidean distance 1/n from the point (1/2,1/2). Onecan see that En ⊂ [1/2 − 1/n,1/2 + 1/n]2. Therefore,

    P[En] ≤ P[[1/2 − 1/n,1/2 + 1/n]2] = 4/n,

    which implies that P[En] converges to 0 as n tends to infinity.

    Proposition 1.10 (Union bound). Let A1,A2, . . . be a sequence of events (not necessarilydisjoint), then we have

    P[∞

    ⋃i=1

    Ai] ≤∞

    ∑i=1

    P[Ai].

    Remark 1.11. The union bound also applies to a finite collection of events.

    Proof. For i ≥ 1, defineÃi = Ai ∩Aci−1 ∩⋯ ∩Ac1.

    One can check that∞

    ⋃i=1

    Ai =∞

    ⋃i=1

    Ãi.

    (To prove the direct inclusion, consider ω in the left hand side. Then define the smallesti such that ω ∈ Ai. For this i, we have ω ∈ Ãi, which implies that ω belongs to the righthand side. The other inclusion is clear because Ãi ⊂ Ai for every i.) Now, one can applythe countable additivity to the Ãi, because they are disjoint. We get

    P[∞

    ⋃i=1

    Ai] = P[∞

    ⋃i=1

    Ãi]

    =∞

    ∑i=1

    P[Ãi]

    ≤∞

    ∑i=1

    P[Ai].

    Application: We throw a die n times. We consider the probability space given by

    • Ω = {1,2,3,4,5,6}n, an outcome is ω = ( ω1´¸¶die 1

    ,⋯, ωn´¸¶die n

    ),

    • F = P(Ω),

    • for A ∈ F , P[A] = ∣A∣∣Ω∣ .

  • CHAPTER 1. MATHEMATICAL FRAMEWORK 15

    We want to prove that the probability to see more than ` ∶= 7 logn successive 1’s is smallif n is large. Consider the event A that there exist ` successive 1’s. For every 0 ≤ k ≤ n−`,define the event

    Ak = {ω ∶ ωk+1 = ωk+2 = ⋯ = ωk+` = 1} “` successive 1’s between k + 1 and k + ` ”.Then, we have

    A =n−`

    ⋃k=0

    Ak.

    Using the union bound, we have

    P[A] ≤n−`

    ∑k=0

    P[Ak] ≤ n ⋅ (1

    6)`

    ≤ n ⋅ n− log(7)/ log(6),

    and therefore we see that the probability of seeing more than 7 logn consecutive 1’sconverge to 0 as n tends to infinity.

    Continuity properties of probability measures

    Proposition 1.12. Let (An) be an increasing sequence of events (i.e. An ⊂ An+1 for everyn). Then

    limn→∞

    P [An] = P[∞

    ⋃n=1

    An]. "increasing limit"

    Let (Bn) be a decreasing sequence of events (i.e. Bn ⊃ Bn+1 for every n). Then

    limn→∞

    P [Bn] = P[∞

    ⋂n=1

    Bn]. "decreasing limit"

    A1

    ∞⋃n=1

    An

    A2A3 ∞⋂

    n=1

    Bn B1B2B3

    Remark 1.13. By monotonicity, we have P[An] ≤ P[An+1] and P[Bn] ≥ P[Bn+1] forevery n. Hence the limits in the proposition are well defined as monotone limits.

    Proof. Let (An)n≥1 be an increasing sequence of events. Define Ã1 = A1 and for everyn ≥ 2

    Ãn = An/An−1.

    The events Ãn are disjoint and satisfy

    ⋃n=1

    An =∞

    ⋃n=1

    Ãn and AN =N

    ⋃n=1

    Ãn.

  • CHAPTER 1. MATHEMATICAL FRAMEWORK 16

    Using first countable additivity and then additivity, we have

    P[∞

    ⋃n=1

    An] = P[∞

    ⋃n=1

    Ãn]

    =∞

    ∑n=1

    P[Ãn]

    = limN→∞

    N

    ∑n=1

    P[Ãn]

    = limN→∞

    P[AN].

    Now, let (Bn) be a decreasing sequence of events. Then (Bcn) is increasing, and we canapply the previous result in the following way:

    P[∞

    ⋂n=1

    Bn] = 1 − P[∞

    ⋃n=1

    Bcn]

    = 1 − limn→∞

    P[Bcn]= lim

    n→∞P[Bn].

    1.3 Laplace models and counting (textbook p. 21–26)We now discuss a particular type of probability spaces that appear in many concreteexamples. The sample space Ω is an arbitrary finite set, and all the outcomes have thesame probability pω = 1∣Ω∣ .

    Definition 1.14. Let Ω be a finite sample space. The Laplace model on Ω is the triple(Ω,F ,P), where

    • F = P(Ω),

    • P ∶ F → [0,1] is defined by∀A ∈ F P[A] = ∣A∣∣Ω∣ .

    One can easily check that the mapping P above defines a probaility measure in thesense of the definition 1.6. In this context, estimating the probability P[A] boils down tocounting the number of elements in A and in Ω.Example:

    We consider n ≥ 3 points on a circle, from which we select 2 at random. What is theprobability that these two points selected are neighbors?We consider the Laplace model on

    Ω = {E ⊂ {1,2,⋯, n} ∶ ∣E∣ = 2}.

  • CHAPTER 1. MATHEMATICAL FRAMEWORK 17

    12

    3

    4 5

    6

    Figure 1.1: A circle with n = 6 points, and the subset {1,3} is selected.

    The event “the two points of E are neighbors” is given by

    A = {{1,2},{2,3},⋯,{n − 1, n},{n,1}},

    and we haveP[A] = ∣A∣∣Ω∣ =

    n

    (n2)= 2n − 1 .

    1.4 Random variables and distribution functions (textbookp 9–12)

    Most often, the probabilistic model under consideration is rather complicated, and oneis only interested in certain quantities in the model. For this reason, one introduces thenotion of random variables.

    Definition 1.15. Let (Ω,F ,P) be a probability space. A random variable (r.v.) is amap X ∶ Ω→ R such that for all a ∈ R,

    {ω ∈ Ω ∶ X(ω) ≤ a} ∈ F .

    Ü The condition {ω ∈ Ω ∶ X(ω) ≤ a} ∈ F is needed for P[{ω ∈ Ω ∶ X(ω) ≤ a}] to bewell-defined.

    Notation: When events are defined in terms of random variable, we will omit the de-pendence in ω. For example, for a ≤ b we write

    {X ≤ a} = {ω ∈ Ω ∶ X(ω) ≤ a},{a

  • CHAPTER 1. MATHEMATICAL FRAMEWORK 18

    Example 1: Gambling with one dieWe throw a fair die. The sample space is Ω = {1,2,3,4,5,6} and the associated probabilityspace (Ω,F ,P) is defined on page 4. Suppose that we gamble on the outcome in such away that our profit is

    −1 if the outcome is 1, 2 or 3, (1.4)0 if the outcome is 4,2 if the outcome is 5 or 6,

    where a negative profit correspond to a loss. Our profit can be represented by the randomvariable X defined by

    ∀ω ∈ Ω X(ω) =⎧⎪⎪⎪⎪⎨⎪⎪⎪⎪⎩

    −1 if ω = 1,2,3,0 if ω = 4,2 if ω = 5,6.

    (1.5)

    Example 2: First coordinate of a random pointAs in Example 3 on page 11, we define

    • Ω = [0,1]2,

    • F = Borel σ-algebra,

    • for A ∈ F , P(A) = Area(A).

    Ü The map which returns the x-coordinate, defined by

    Y ∶ Ω → R(ω1, ω2) ↦ ω1

    ,

    is a random variable. Indeed, for every a ∈ R, we have

    {Y ≤ a} = {(ω1, ω2) ∈ Ω ∶ ω1 ≤ a} =⎧⎪⎪⎪⎪⎨⎪⎪⎪⎪⎩

    ∅ if a < 0,[0, a] × [0,1] if 0 ≤ a ≤ 1,Ω if a > 1,

    and ∅, [0, a] × [0,1] and Ω are all elements of F .

    Example 3: Area on the left-down part of a random pointWe consider the same probability space as in Example 2 above (Ω = [0,1]2). For ω ∈ Ω,let R(ω) = [0, ω1] × [0, ω2] (it corresponds to rectangle at the left and below ω).Ü The map which returns the area of R(ω), defined by

    Z ∶ Ω → R(ω1, ω2) ↦ ω1ω2

    , (1.6)

  • CHAPTER 1. MATHEMATICAL FRAMEWORK 19

    is a random variable. To see this, notice that for every a ∈ R, we have

    {Z ≤ a} = {ω ∈ Ω ∶ Y (ω) ≤ a} =⎧⎪⎪⎪⎪⎨⎪⎪⎪⎪⎩

    ∅ if a < 0,Ea if 0 ≤ a ≤ 1,Ω if a > 1,

    where Ea = {(x, y) ∈ [0,1]2 ∶ xy ≤ a} is represented below.

    a

    1

    a 1

    Figure 1.2: The set Ea corresponds to the region Ea.

    The sets ∅, Ω and Ea are elements of F for every a. For the culture, we now give theformal proof that Ea is in F , but this proof is not important for applications and can beskipped in a first reading.

    One just need to check that Ea is in F if 0 < a ≤ 1 (the case a = 0 can easily be treatedindependently). Let us first prove that

    Eca = ⋃0 a, one can pick a rational number q in theinterval (a/ω2, ω1). For such a choice, we have ω ∈ (q,1] × (a/q,1].

    ⊃ Assume that ω ∈ (q,1] × (a/q,1] for some 0 < q ≤ 1. Then ω1ω2 > q ⋅ a/q = a, andtherefore ω ∉ Ea.

    Now for every q, the rectangle (q,1]× (a/q,1] is in F . Therefore, Eca is also in F , since itis a countable union of elements of F . This concludes that Ea is an event, by taking thecomplement.

    Definition 1.16. Let X be a random variable on a probability space (Ω,F ,P). Thedistribution function of X is the function FX ∶ R→ [0,1] defined by

    ∀a ∈ R FX(a) = P[X ≤ a].

    Proposition 1.17 (Basic identity). Let a < b be two real numbers. Then

    P[a

  • CHAPTER 1. MATHEMATICAL FRAMEWORK 20

    Proof. We have {X ≤ b} = {X ≤ a} ∪ {a 1.

  • CHAPTER 1. MATHEMATICAL FRAMEWORK 21

    0 1

    1

    Figure 1.4: Graph of the distribution function FY

    Example 3: Area of the left-down part of a random pointConsider the random variable Z defined by Eq. (1.6) on page 18. For 0 ≤ a ≤ 1, the event

    {Z ≤ a} = {(ω1, ω2) ∈ [0,1]2 ∶ ω1ω2 ≤ a} = Ea

    was represented on Fig. 1.4 and its probability is equal to its area, given by

    Area(Ea) = a + ∫1

    a

    a

    xdx = a + a log(1/a))

    (with the convention 0 log(1/0) = 0). The event {Z ≤ a} is empty if a < 0 and it is equalto Ω if a > 1. Therefore

    FZ(a) = P[Z ≤ a] =⎧⎪⎪⎪⎪⎨⎪⎪⎪⎪⎩

    0 if a < 0,a + a log(1/a) if 0 ≤ a ≤ 1,1 if a > 1.

    0 1

    1

    Figure 1.5: Graph of the distribution function of FZ

    Theorem 1.18 (Properties of distribution functions). Let X be a random variable onsome probability space (Ω,F ,P). The distribution function F = FX ∶ R → [0,1] of Xsatisfies the following properties.

    (i) F is nondecreasing.

    (ii) F is right continuous2.2i.e. F (a) = lim

    h↓0F (a + h) for every a ∈ R.

  • CHAPTER 1. MATHEMATICAL FRAMEWORK 22

    (iii) lima→−∞

    F (a) = 0 and lima→∞

    F (a) = 1.

    Proof. We prove the items one after the other.

    (i) For a ≤ b, we have {X ≤ a} ⊂ {X ≤ b}. Hence, by monotonicity, we have P[X ≤ a] ≤P[X ≤ b], i.e.

    F (a) ≤ F (b).

    (ii) Let an ↑ ∞. For every ω ∈ Ω, there exists n large enough such that X(ω) ≤ an.Hence,

    Ω = ⋃n≥1

    {X ≤ an}.

    Furthermore, we have {X ≤ an} ⊂ {X ≤ an+1} and the continuity properties ofprobability measures imply

    1 = P[Ω] = P[⋃n≥1

    {X ≤ an}]

    = limn→∞

    P[X ≤ an]= lim

    n→∞F (an).

    In the same way, one has lima→−∞

    F (a) = 0. Indeed, using that for every an ↓ −∞,

    ∅ = ⋂n≥1

    {X ≤ an}

    and {X ≤ an} ⊃ {X ≤ an+1}, it follows from the continuity properties of probabilitymeasures that

    0 = P[∅] = limn→∞

    P[X ≤ an] = limn→∞

    F (an).

    (iii) Let a ∈ R, let hn ↓ 0. We have

    {X ≤ a} = ⋂n≥1

    {X ≤ a + hn},

    where {X ≤ a + hn} ⊃ {X ≤ a + hn+1}. Hence by the continuity properties of proba-bility measures, we have

    F (a) = P[X ≤ a] = P[⋂n≥1

    {X ≤ a + hn}] = limn→∞

    P[X ≤ a + hn] = limn→∞

    F [a + hn].

    Properties (i), (ii) and (iii) actually characterize distribution functions, in the followingsense:

  • CHAPTER 1. MATHEMATICAL FRAMEWORK 23

    Theorem 1.19. Let F ∶ R → [0,1] satisfies (i), (ii) and (iii), then there exists a proba-bility space (Ω,F ,P) and a random variable X ∶ Ω→ R such that F = FX .

    Proof. Admitted.

    Theorem 1.19 above plays an important role in probability theory. Given a nonde-creasing, right continuous function F ∶ R → [0,1] with lim−∞F = 0 and lim+∞F = 1, onecan always consider a random variable X on some probability space (Ω,F ,P) such thatF = FX . This is very useful because it allows one to define a random variable via itsdistribution function. As soon as we have a function F as above, we have the right tosay:

    “Let X be a random variable on some probability space (Ω,F ,P) with distri-bution function F .”

    The precise choice of the probability space is often not important and we simply say:

    “Let X be a r.v. with distribution function F .”

    Discontinuity/continuity points of F We have seen that the distribution functionF = FX of a random variableX is always right continuous. What about the left continuity?In Example 1 (gambling on a die), we can see that it is not always left continuous, andthe following proposition gives an interpretation of the left limit

    F (a−) = limh↓0

    F (a − h)

    at a given point a for a general distribution function.

    Proposition 1.20 (probability of a given value). Let X ∶ Ω → R be a random variablewith distribution function F . Then for every a in R we have

    P[X = a] = F (a) − F (a−)

    We omit the proof, which can easily be obtained using the elementary identity to-gether with the continuity properties of probability measures. We rather insist on theinterpretation of this proposition. Fix a ∈ R.

    Ü If F is not continuous at a point a ∈ R, then the “jump size” F (a) − F (a−) is equalto the probability that X = a.

    Ü If F is continuous at a point a ∈ R, then P[X = a] = 0. This is what happens atevery point a in Example 2 or 3.

  • CHAPTER 1. MATHEMATICAL FRAMEWORK 24

    Discrete and continuous random variables

    Two different types of random variables will play an important role in this class:

    • the discrete random variables,

    • the continuous random variables.

    We give here the definitions but these two classes of random variables will be studiedin more detail.

    Definition 1.21 (Discrete random variables). A random variable X ∶ Ω→ R is said to bediscrete if its image

    X(Ω) = {x ∈ R ∶ ∃ω ∈ Ω X(ω) = x}is at most countable.

    In other words, a random variables is said to be discrete if it only takes finitely orcountably many values x1, x2, . . .. In this case the probabilistic properties of X are fullydescribed by the numbers

    p(x1) = P[X = x1], p(x2) = P[X = x2], . . .

    Example: The random variable of Example 1 (defined on Eq. (1.5) page 18) is discrete:it takes only three possible values x1 = −1, x2 = 0 and x3 = 2.

    Definition 1.22 (Continuous random variables). A random variable X ∶ Ω → R is saidto be continuous if its distribution function FX can be written as

    FX(a) = ∫a

    −∞f(x)dx for all a in R (1.7)

    for some nonnegative function f ∶ R→ R+, called the density of X.

    To understand the terminology “continuous”, observe that the formula (1.7) impliesthat FX is a continuous function. In particular, by Proposition 1.20, a continuous r.v. Xsatisfies P[X = x] = 0 for every fixed x. The probability that X is exactly equal to x is 0,but we may have a chance that X is very close to x and f(x)dx represent the probabilitythat X takes a value in the infinitesimal interval [x,x + dx].Example: The random variables Y and Z of Example 2 and 3 are continuous.

    Mathematical framework Sample spaces and events (textbook p.1-4)Mathematical definition of probability spacesLaplace models and counting (textbook p. 21–26)Random variables and distribution functions (textbook p 9–12)