
LECTURE 1: Probability models and axioms

• Readings: Sections 1.1, 1.2

Lecture outline

• Sample space

• Probability laws

– Axioms

– Some properties

• Examples

– Discrete

– Continuous

• Discussion

– Countable additivity

– Mathematical subtleties

• Interpretations of probabilities


Sample space

• List (set) of possible outcomes, Ω

• List must be:

– Mutually exclusive

– Collectively exhaustive

– At the “right” granularity

• Two steps:

– Describe possible outcomes

– Describe beliefs about likelihood of outcomes


Sample space: discrete/finite example

• Two rolls of a tetrahedral die

– Sample space vs. sequential description

[Figure: the 16 outcomes as a 4 × 4 grid, X = first roll (1 to 4) vs. Y = second roll (1 to 4), cells labeled (1,1), (1,2), (1,3), (1,4), …, (4,4); alongside, the same outcomes in a sequential description]
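A minimal Python sketch of the finite sample space above, assuming each outcome is represented as an ordered pair (x, y) of first and second roll; the names `ROLLS`, `omega`, and `sequential` are illustrative. It lists the sample space all at once and again stage by stage, which mirrors the sequential description.

```python
from itertools import product

# Outcomes of two rolls of a tetrahedral (4-sided) die, as ordered pairs
# (x, y) = (first roll, second roll).
ROLLS = (1, 2, 3, 4)
omega = set(product(ROLLS, ROLLS))   # the sample space, listed all at once

# Sequential description: generate the same outcomes stage by stage,
# one branch per first roll, then one sub-branch per second roll.
sequential = [(x, y) for x in ROLLS for y in ROLLS]

print(len(omega))                               # 16
print(set(sequential) == omega)                 # True: same sample space
print(len(sequential) == len(set(sequential)))  # True: no outcome listed twice
```

Both descriptions produce the same 16 outcomes; the sequential one is often easier to write down when an experiment happens in stages.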

Sample space: continuous example

• A continuous sample space:

• (x, y) such that 0 ≤ x, y ≤ 1

[Figure: the unit square in the (x, y) plane, with x and y each ranging from 0 to 1]

Probability axioms

• Event: a subset of the sample space

– Probability is assigned to events

• Axioms:

– Nonnegativity: P(A) ≥ 0

– Normalization: P(Ω) = 1

– (Finite) additivity (to be strengthened later): If A ∩ B = Ø, then P(A ∪ B) = P(A) + P(B)

• P({s1, s2, . . . , sk}) = P({s1}) + · · · + P({sk}) = P(s1) + · · · + P(sk)

– Axiom 3 (additivity) needs strengthening

– Do weird sets have probabilities?
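To make the axioms concrete, here is a sketch of a finite probability law in Python, assuming the law is given by the probability of each outcome (the tetrahedral-die model with 1/16 per outcome is used purely as an example). The helper `prob` is a hypothetical name: it computes P(A) by adding outcome probabilities, which is the property P({s1, ..., sk}) = P(s1) + ... + P(sk); the assertions check nonnegativity, normalization, and additivity for one pair of disjoint events.

```python
from fractions import Fraction
from itertools import product

# Example law (assumption): two rolls of a tetrahedral die, 1/16 per outcome.
omega = list(product(range(1, 5), repeat=2))
p = {s: Fraction(1, 16) for s in omega}

def prob(event):
    """P(A) for an event A (a subset of omega), computed by finite additivity:
    P({s1, ..., sk}) = P(s1) + ... + P(sk)."""
    return sum((p[s] for s in event), Fraction(0))

# Nonnegativity: every outcome probability is >= 0, so every P(A) >= 0.
assert all(v >= 0 for v in p.values())
# Normalization: P(omega) = 1.
assert prob(omega) == 1

# Finite additivity for two disjoint events.
A = {s for s in omega if s[0] == 1}   # first roll is 1
B = {s for s in omega if s[0] == 2}   # first roll is 2
assert A.isdisjoint(B)
assert prob(A | B) == prob(A) + prob(B)
print(prob(A), prob(A | B))           # 1/4 1/2
```

Exact `Fraction` arithmetic is used so that equalities such as P(Ω) = 1 can be checked without floating-point rounding.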



Some simple consequences of the axioms

• Axioms

For disjoint events: P(A ∪ B) = P(A) + P(B)

• Consequences

– P(A) ≤ 1

– P(Ø) = 0

– P(A) + P(Aᶜ) = 1

– A, B, C disjoint: P(A ∪ B ∪ C) = P(A) + P(B) + P(C), and similarly for k disjoint events

– P({s1, s2, . . . , sk}) = P({s1}) + · · · + P({sk}) = P(s1) + · · · + P(sk)
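Each of these consequences follows from the axioms in one or two lines. A sketch of the derivations, written in LaTeX and using only nonnegativity, normalization, and additivity for disjoint events (the last line assumes A, B, C are pairwise disjoint):

```latex
% Uses only: P(A) >= 0, P(\Omega) = 1, and additivity for disjoint events.
\begin{align*}
A \cap A^c = \varnothing,\ A \cup A^c = \Omega
  &\;\Rightarrow\; P(A) + P(A^c) = P(\Omega) = 1 \\
P(A^c) \ge 0
  &\;\Rightarrow\; P(A) = 1 - P(A^c) \le 1 \\
\varnothing = \Omega^c
  &\;\Rightarrow\; P(\varnothing) = 1 - P(\Omega) = 0 \\
A, B, C \text{ pairwise disjoint} \;\Rightarrow\; (A \cup B) \cap C = \varnothing
  &\;\Rightarrow\; P(A \cup B \cup C) = P(A \cup B) + P(C) = P(A) + P(B) + P(C)
\end{align*}
```

Repeating the last step k − 1 times gives the statement for k disjoint events, and applying it to the singletons {s1}, ..., {sk} gives the formula for a finite set of outcomes.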



More consequences of the axioms

• If A ⊂ B, then P(A) ≤ P(B)

• P(A ∪ B) = P(A) + P(B) − P(A ∩ B)

• P(A ∪ B) ≤ P(A) + P(B)

• P(A ∪ B ∪ C) = P(A) + P(Aᶜ ∩ B) + P(Aᶜ ∩ Bᶜ ∩ C)
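These identities can be sanity-checked numerically. The sketch below assumes the uniform law on the 16 tetrahedral-die outcomes and picks three arbitrary events A, B, C (illustrative choices, not from the slides). It checks monotonicity via A ∩ B ⊂ A, the inclusion-exclusion formula, the union bound, and the split of A ∪ B ∪ C into the disjoint pieces A, Aᶜ ∩ B, Aᶜ ∩ Bᶜ ∩ C. A check on one model illustrates the identities; it does not prove them.

```python
from fractions import Fraction
from itertools import product

# Assumed model: uniform law on the 16 outcomes of two tetrahedral-die rolls.
omega = set(product(range(1, 5), repeat=2))

def prob(event):
    """Uniform law: P(A) = |A| / |omega|."""
    return Fraction(len(event), len(omega))

def complement(event):
    return omega - event

A = {(x, y) for (x, y) in omega if x == 1}        # first roll is 1
B = {(x, y) for (x, y) in omega if y == 1}        # second roll is 1
C = {(x, y) for (x, y) in omega if x + y == 5}    # rolls sum to 5

# Monotonicity: A ∩ B is a subset of A, so P(A ∩ B) <= P(A)
assert prob(A & B) <= prob(A)
# P(A ∪ B) = P(A) + P(B) - P(A ∩ B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)
# Union bound: P(A ∪ B) <= P(A) + P(B)
assert prob(A | B) <= prob(A) + prob(B)
# A ∪ B ∪ C split into disjoint pieces A, A^c ∩ B, A^c ∩ B^c ∩ C
assert prob(A | B | C) == (prob(A)
                           + prob(complement(A) & B)
                           + prob(complement(A) & complement(B) & C))
print(prob(A | B), prob(A | B | C))   # 7/16 9/16
```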


Die roll example

[Figure: 4 × 4 grid of outcomes, X = first roll (1 to 4) vs. Y = second roll (1 to 4)]

• Let B be the event: min(X, Y) = 2

• Let M = max(X, Y)

• P(M = 1 | B) =

• P(M = 2 | B) =
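A sketch that evaluates the two blanks by counting, under the assumption that all 16 outcomes are equally likely. With equally likely outcomes, P(M = m | B) is the fraction of the outcomes in B that also have max(X, Y) = m; `cond_prob_max` is a hypothetical helper name.

```python
from fractions import Fraction
from itertools import product

# Two rolls of a tetrahedral die; assume all 16 outcomes are equally likely.
omega = set(product(range(1, 5), repeat=2))

B = {(x, y) for (x, y) in omega if min(x, y) == 2}   # event B: min(X, Y) = 2

def cond_prob_max(m):
    """P(M = m | B) with M = max(X, Y), under the equally-likely assumption:
    the fraction of outcomes in B whose max equals m."""
    favorable = {(x, y) for (x, y) in B if max(x, y) == m}
    return Fraction(len(favorable), len(B))

print(sorted(B))         # [(2, 2), (2, 3), (2, 4), (3, 2), (4, 2)]
print(cond_prob_max(1))  # 0
print(cond_prob_max(2))  # 1/5
```

M = 1 is impossible once the minimum is 2, and only (2, 2) among the five outcomes of B has maximum 2.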

Probability calculation: discrete/finite example

[Figure: 4 × 4 grid of outcomes, X = first roll (1 to 4) vs. Y = second roll (1 to 4)]

• Let every possible outcome have probability 1/16

• P(X = 1) =

• Let Z = min(X, Y)

• P(Z = 1) =

• P(Z = 2) =

• P(Z = 3) =

• P(Z = 4) =
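One way to fill in these values is to add up the probabilities (1/16 each) of the outcomes in each event, exactly as the additivity property allows. A minimal Python sketch, where `prob` is a hypothetical helper that takes the event as a condition on (x, y):

```python
from fractions import Fraction
from itertools import product

# Probability law from the slide: each of the 16 outcomes has probability 1/16.
omega = list(product(range(1, 5), repeat=2))
p = {outcome: Fraction(1, 16) for outcome in omega}

def prob(condition):
    """Add up the probabilities of the outcomes (x, y) satisfying `condition`."""
    return sum((p[(x, y)] for (x, y) in omega if condition(x, y)), Fraction(0))

print(prob(lambda x, y: x == 1))          # P(X = 1): prints 1/4
for z in range(1, 5):                     # Z = min(X, Y)
    print(z, prob(lambda x, y: min(x, y) == z))
# 1 7/16
# 2 5/16
# 3 3/16
# 4 1/16
```

The four values of P(Z = z) add up to 1, as they must, since the events {Z = 1}, ..., {Z = 4} are disjoint and together cover Ω.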

Discrete uniform law

• Let all sample points be equally likely

– Assume Ω consists of n equally likely elements, so each has prob = 1/n

– Assume A consists of k elements

• Then,

P(A) = (number of elements of A) / (number of elements of Ω) = k/n

• Just count. . .

Continuous uniform law

• Two “random” numbers in [0,1].

[Figure: the unit square, with x and y each ranging from 0 to 1]

• Uniform law: Probability = Area


Probability calculation: continuous example

• (x, y) such that 0 ≤ x, y ≤ 1

[Figure: the unit square, with x and y each ranging from 0 to 1]

• Uniform probability law: Probability = Area

P({(x, y) | x + y ≤ 1/2}) =

P({(0.5, 0.3)}) =
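Under the uniform law, the first blank is the area of the triangle {x + y ≤ 1/2} inside the square, (1/2)·(1/2)·(1/2) = 1/8, and the second is the area of a single point, 0. The Python sketch below compares the exact area with a Monte Carlo estimate; the simulation (and the helper name `estimate`, the sample size, and the seed) is an illustrative addition, not part of the slides.

```python
import random

# Uniform law on the unit square: probability of an event = its area.
# A = {(x, y) | x + y <= 1/2} is a right triangle with both legs 1/2.
exact_A = 0.5 * 0.5 * 0.5          # (1/2) * base * height = 1/8
# A single point such as (0.5, 0.3) has zero area, so its probability is 0.
exact_point = 0.0

def estimate(n_samples, seed=0):
    """Monte Carlo estimate of P(x + y <= 1/2): the fraction of uniformly
    drawn points in the square that land in the triangle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_samples):
        # random() draws from [0, 1); leaving out the edge at 1 changes no area
        x, y = rng.random(), rng.random()
        if x + y <= 0.5:
            hits += 1
    return hits / n_samples

print(exact_A)               # 0.125
print(estimate(200_000))     # close to 0.125
```

The estimate only approximates the area; the exact answer comes from geometry, which is the point of "Probability = Area".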

Probability calculation steps

• Specify the sample space

• Specify a probability law

• Identify an event of interest

• Calculate...

Probability calculation: discrete but infinite sample space

• Sample space: {1, 2, . . .}

– We are given P(n) = 2⁻ⁿ, n = 1, 2, . . .

[Figure: bar plot of p against n = 1, 2, 3, 4, …, with heights 1/2, 1/4, 1/8, 1/16, …]

• Find P(outcome is even)

• Solution:

P({2, 4, 6, . . .}) = P(2) + P(4) + · · · = 1/2² + 1/2⁴ + 1/2⁶ + · · · = (1/4) / (1 − 1/4) = 1/3

• Axiom needed: If A1, A2, . . . are disjoint events, then:

P(A1 ∪ A2 ∪ · · · ) = P(A1) + P(A2) + · · ·
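The series can be checked with exact rational arithmetic. Adding finitely many terms uses only finite additivity; the sketch shows that the partial sums P(2) + ... + P(2N) equal (1/3)(1 − 4^(−N)), which approach 1/3 but never reach it, so concluding that P(outcome is even) = 1/3 is exactly the step that needs the stronger axiom. The function names are illustrative.

```python
from fractions import Fraction

def p(n):
    """Given law on {1, 2, ...}: P(n) = 2^(-n)."""
    return Fraction(1, 2 ** n)

def partial_even(N):
    """P(2) + P(4) + ... + P(2N): a finite sum, so finite additivity suffices."""
    return sum((p(2 * k) for k in range(1, N + 1)), Fraction(0))

def partial_total(N):
    """P(1) + P(2) + ... + P(N); tends to 1, consistent with normalization."""
    return sum((p(n) for n in range(1, N + 1)), Fraction(0))

for N in (1, 2, 3, 10, 20):
    # closed form of the partial sum: (1/3) * (1 - 4^(-N))
    assert partial_even(N) == Fraction(1, 3) * (1 - Fraction(1, 4 ** N))
    print(N, partial_even(N), float(Fraction(1, 3) - partial_even(N)))
```

The printed gap to 1/3 shrinks by a factor of 4 with each additional term.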

Countable additivity axiom

• Strengthens the finite additivity axiom

Countable Additivity Axiom:

If A1, A2, A3, . . . is an infinite sequence of disjoint events, then P(A1 ∪ A2 ∪ A3 ∪ · · · ) = P(A1) + P(A2) + P(A3) + · · ·

Mathematical subtleties

• Additivity holds only for “countable” sequences of events

– e.g., each single point of the unit square has area 0, yet the square, the union of all its points, has area 1

• The unit square (similarly, the real line, etc.) is not countable (its elements cannot be arranged in a sequence)

• “Area” is a legitimate probability law on the unit square, as long as we do not try to assign probabilities/areas to “very strange” sets



Interpretations of probability theory

• A narrow view: a branch of math

– Axioms ⇒ theorems

“Thm:” “Frequency” of event A “is” P(A)

• Are probabilities frequencies?

– P(coin toss yields heads) = 1/2

– P(sole shooter of JFK) = 0.92

– P(a piece of equipment aboard the space shuttle fails) = 10⁻⁸

– P(the president of . . . will be reelected) = 0.7

• Probabilities are often interpreted as:

– Betting preferences

– Description of beliefs

• A framework for analyzing phenomena with uncertain outcomes

– Rules for consistent reasoning

– Used for predictions and decisions








The role of probability theory

[Figure: diagram relating the Real world and Probability theory through Data, Inference/Statistics, Model building, (Analysis), Predictions, and Decisions]
