Introduction to Probability

Chandrashekar L
Department of Computer Science and Automation, Indian Institute of Science
[email protected]
Coins, Dice and Cards

What is the probability of obtaining
- At least 1 Tail in 2 tosses of a coin?
  $\frac{3}{4}$   (1)
- An odd number in a single roll of a die?
  $\frac{3}{6} = \frac{1}{2}$   (2)
- A 'Spade' from a pack of shuffled cards?
  $\frac{13}{52} = \frac{1}{4}$   (3)
One step at a time

2 tosses of a coin
- What all can come? HH, HT, TH, TT
- What is it that we are interested in? HT, TH, TT
- Probability of the event of interest:
  $\frac{\text{number of favourable outcomes}}{\text{number of all possible outcomes}}$   (4)
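As a quick sanity check, here is a minimal Python sketch of this counting recipe, enumerating the two-toss sample space; the names `sample_space` and `favourable` are ours, purely for illustration:

```python
from itertools import product

# All possible outcomes of 2 tosses of a coin.
sample_space = list(product("HT", repeat=2))  # [('H','H'), ('H','T'), ('T','H'), ('T','T')]

# Event of interest: at least one Tail.
favourable = [o for o in sample_space if "T" in o]

# Probability = favourable outcomes / all possible outcomes, as in (4).
print(len(favourable) / len(sample_space))  # 0.75
```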
Some Terminology

- What all can come?
  Sample Space $S = \{HH, HT, TH, TT\}$
- What is it that we may be interested in?
  Set of Events $\mathcal{E}$:
  $\mathcal{E} = \big\{ \{HH\}, \{HT\}, \{TH\}, \{TT\}, \{HH,HT\}, \{HH,TH\}, \{HH,TT\}, \{HT,TH\}, \{HT,TT\}, \{TH,TT\},$
  $\{HH,HT,TH\}, \{HH,HT,TT\}, \{HH,TH,TT\}, \{HT,TH,TT\}, \{HH,HT,TH,TT\}, \emptyset \big\}$
- Probability of the event of interest:
  The probability assignment $P : \mathcal{E} \to [0, 1]$
Abstraction

How to put an elephant inside a refrigerator?
- Open the refrigerator.
- Put the elephant inside.
- Close the refrigerator.

How to compute probabilities?
- Identify S.
- Identify $\mathcal{E}$.
- Do the assignment P.
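A small Python sketch of the three-step recipe for the odd-number die question, under the equally-likely-outcomes assumption used in these slides:

```python
from fractions import Fraction

# Step 1: identify S -- outcomes of a single roll of a die.
S = {1, 2, 3, 4, 5, 6}

# Step 2: identify the event of interest E -- an odd number shows up.
E = {s for s in S if s % 2 == 1}

# Step 3: do the assignment P -- equally likely outcomes.
def P(event):
    return Fraction(len(event), len(S))

print(P(E))  # 1/2
```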
Abstraction and Axioms

A probability space is a 3-tuple $(S, \mathcal{E}, P)$ with the following conditions:
1. $0 \le P(E_1) \le 1$, for any event $E_1 \in \mathcal{E}$.
2. $P(S) = 1$.
3. For any sequence $E_1, E_2, \ldots$ of disjoint (mutually exclusive) sets we have
   $P(\cup_i E_i) = \sum_i P(E_i).$   (5)
4. The set $\mathcal{E}$ satisfies the following conditions:
   1. $E_1 \in \mathcal{E} \Rightarrow E_1^c \in \mathcal{E}$.
   2. $E_1, E_2 \in \mathcal{E} \Rightarrow (E_1 \cap E_2) \in \mathcal{E}$.
   3. For any sequence $E_1, E_2, \ldots \in \mathcal{E}$, $\cup_i E_i \in \mathcal{E}$.
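A minimal sketch, assuming a uniform assignment over the four-outcome two-toss space, that checks the axioms numerically with the power set as the event set:

```python
from itertools import combinations
from fractions import Fraction

S = frozenset({"HH", "HT", "TH", "TT"})

# Event set: the power set of S (the largest possible choice of E).
events = [frozenset(c) for r in range(len(S) + 1)
          for c in combinations(S, r)]

def P(event):                                  # uniform assignment
    return Fraction(len(event), len(S))

assert P(S) == 1                               # axiom 2
assert all(0 <= P(E) <= 1 for E in events)     # axiom 1
# Axiom 3 for a pair of disjoint events:
E1, E2 = frozenset({"HH"}), frozenset({"TT"})
assert P(E1 | E2) == P(E1) + P(E2)
```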
Certain Properties

- If $E_1, E_2 \in \mathcal{E}$ with $E_1 \subseteq E_2$, then $P(E_1) \le P(E_2)$.
- $P(E_1 \cup E_2) = P(E_1) + P(E_2) - P(E_1 \cap E_2)$.
- $P(S) = 1$.
- $P(\emptyset) = 0$.
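These properties can be verified by enumeration; a small sketch with two illustrative die events of our choosing:

```python
from fractions import Fraction

S = set(range(1, 7))                     # one roll of a die
P = lambda A: Fraction(len(A), len(S))

E1 = {2, 4, 6}                           # even number
E2 = {4, 5, 6}                           # number greater than 3

# Inclusion-exclusion: P(E1 u E2) = P(E1) + P(E2) - P(E1 n E2)
assert P(E1 | E2) == P(E1) + P(E2) - P(E1 & E2)
print(P(E1 | E2))                        # 2/3
```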
Why Abstract

[Figure: Abstract Probability Space, with events $E_1$, $E_2$, $E_3$ inside a sample space $S$]

- The world is not as simple as coins, dice and cards.
- Any abstraction is just as useful as $(a + b)^2 = a^2 + b^2 + 2ab$.
Conditional Probability

[Figure: events $E_1, \ldots, E_5$ in a sample space $S$, together with an event $A$]

Here we are interested in the probabilities of events given that an event A has occurred.

$P(E_1 | A) = \frac{P(E_1 \cap A)}{P(A)}$   (6)
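A minimal sketch of (6) for a single die roll, with illustrative events of our choosing:

```python
from fractions import Fraction

S = set(range(1, 7))
P = lambda A: Fraction(len(A), len(S))

A  = {4, 5, 6}           # given: the roll is greater than 3
E1 = {2, 4, 6}           # event: the roll is even

# P(E1 | A) = P(E1 n A) / P(A)
print(P(E1 & A) / P(A))  # 2/3
```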
Independent Events

Events $E_1$ and $E_2$ are independent when
$P(E_1 \cap E_2) = P(E_1) \times P(E_2).$   (7)

What is the probability of obtaining at least one 'Tail' when a coin is tossed twice, if the probability of obtaining 'Tail' in a single toss is p?

$P(\{TT, HT, TH\}) = p \times p + (1-p) \times p + p \times (1-p)$   (8)

Examples of independent events:
- Getting 'Tail' in the first toss is independent of getting 'Head' in the second toss.
- When I toss a coin and roll a die simultaneously, the outcomes are independent of each other.
- How many glasses of water I drink is independent of whether it will rain tomorrow.
The above examples are correct but useless.
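A sketch checking (8) against brute-force enumeration; the value of p is an arbitrary illustrative choice:

```python
from itertools import product

p = 0.3                                   # P(Tail) in one toss (illustrative)

def prob(outcome):                        # independence: multiply per-toss probabilities
    probs = {"T": p, "H": 1 - p}
    return probs[outcome[0]] * probs[outcome[1]]

# Sum over the favourable outcomes {TT, HT, TH}, as in (8).
total = sum(prob(o) for o in product("HT", repeat=2) if "T" in o)
print(total, p * p + (1 - p) * p + p * (1 - p))  # both 0.51
```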
Bayes' Formula

Let $E_1, E_2, \ldots, E_n$ be disjoint sets such that $\cup_i E_i = S$, and let A be any event. Then

$P(A) = P(A|E_1) P(E_1) + P(A|E_2) P(E_2) + \ldots + P(A|E_n) P(E_n)$   (9)

This is similar to the cut-off mark calculations.

[Figure: a partition $E_1, \ldots, E_5$ of $S$, with an event $A$ overlapping the parts]
Bayes' Formula Contd

This leads to the following relation:

$P(A|B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B|A)\,P(A)}{P(B|A)\,P(A) + P(B|A^c)\,P(A^c)}.$   (10)
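A small numeric sketch of (9) and (10) for a two-set partition; all the probabilities below are made-up illustrative numbers:

```python
# Two disjoint hypotheses E1, E2 partitioning S, and an observed event B.
P_E1, P_E2 = 0.6, 0.4                 # priors, P(E1) + P(E2) = 1
P_B_given_E1, P_B_given_E2 = 0.9, 0.2

# Total probability, as in (9):
P_B = P_B_given_E1 * P_E1 + P_B_given_E2 * P_E2

# Bayes' formula, as in (10):
P_E1_given_B = P_B_given_E1 * P_E1 / P_B
print(P_E1_given_B)                   # ~0.871
```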
Independent Events Contd

$S = [0,1], \quad E_1 = [0, \tfrac{1}{2}], \quad E_2 = [0, \tfrac{1}{4}] \cup [\tfrac{1}{2}, \tfrac{3}{4}], \quad E_3 = [\tfrac{1}{4}, \tfrac{3}{4}].$   (11)

[Figure: the intervals $E_1$, $E_2$, $E_3$ drawn inside $S = [0, 1]$]
Random Variable

A random variable is not
- Random.
- A Variable.

What is it? It is a function $X : S \to \mathbb{R}$.

[Figure: a map $X$ taking events $E_1$, $E_2$ in $S$ to values $x_1$, $x_2$ on the real line]
Coin, Dice and Cards

You win Rs. 1 on getting an odd number and Rs. 2 on getting an even number on a roll of a die. We can define X as follows:

$S = \{1, 2, 3, 4, 5, 6\}$
$X(1) = 1, \; X(2) = 2, \; X(3) = 1, \; X(4) = 2, \; X(5) = 1, \; X(6) = 2$   (12)
Coin, Dice and Cards

One can define another R.V. Y where you win Rs. x/2 when x shows up:

$S = \{1, 2, 3, 4, 5, 6\}$
$Y(1) = \tfrac{1}{2}, \; Y(2) = 1, \; Y(3) = \tfrac{3}{2}, \; Y(4) = 2, \; Y(5) = \tfrac{5}{2}, \; Y(6) = 3$   (13)

Compute $P(X = 2)$, $P(X = 1, Y = 1)$, $P(X = 1, Y = -1)$.
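A sketch of these computations, representing X and Y as plain Python dictionaries (our own encoding, not the slides' notation):

```python
from fractions import Fraction

S = range(1, 7)
X = {s: 1 if s % 2 == 1 else 2 for s in S}   # Rs. 1 on odd, Rs. 2 on even
Y = {s: Fraction(s, 2) for s in S}           # Rs. x/2 when x shows up

P = lambda pred: Fraction(sum(1 for s in S if pred(s)), 6)

print(P(lambda s: X[s] == 2))                 # P(X = 2)       = 1/2
print(P(lambda s: X[s] == 1 and Y[s] == 1))   # P(X = 1, Y = 1)  = 0
print(P(lambda s: X[s] == 1 and Y[s] == -1))  # P(X = 1, Y = -1) = 0
```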
Probability Mass Function

Associated with a R.V. X we are interested in its probability mass function (pmf):
$f_X(x) = P(X = x)$

Let us compute the pmfs for X and Y.

$f_X(1) = P(X = 1) = \tfrac{1}{2}, \quad f_X(2) = P(X = 2) = \tfrac{1}{2}$   (14)

$f_Y(\tfrac{1}{2}) = f_Y(1) = f_Y(\tfrac{3}{2}) = f_Y(2) = f_Y(\tfrac{5}{2}) = f_Y(3) = \tfrac{1}{6},$
$f_Y(4) = f_Y(5) = f_Y(6) = 0$   (15)

f(x) can also be thought of as the frequency of occurrence of x.
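The pmf of Y in (15) can be recovered by counting, as in this sketch:

```python
from collections import Counter
from fractions import Fraction

S = range(1, 7)
Y = {s: Fraction(s, 2) for s in S}

# pmf: f_Y(y) = P(Y = y); each sample point has probability 1/6.
counts = Counter(Y[s] for s in S)
f_Y = {y: Fraction(c, 6) for y, c in counts.items()}
print(f_Y)  # every value in {1/2, 1, 3/2, 2, 5/2, 3} has pmf 1/6
```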
Cumulative Distribution Function

Sometimes we are interested in the quantity $F(x) = P(X \le x)$. This is the cumulative distribution function (cdf).

[Figure: step plots of $F_X(x)$, the cdf of X, and $F_Y(y)$, the cdf of Y]
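A minimal sketch of the cdf of X, built by summing the pmf in (14):

```python
from fractions import Fraction

# pmf of X from (14): f_X(1) = f_X(2) = 1/2.
f_X = {1: Fraction(1, 2), 2: Fraction(1, 2)}

def F_X(x):
    """cdf: F(x) = P(X <= x) = sum of the pmf over values <= x."""
    return sum(p for v, p in f_X.items() if v <= x)

print(F_X(0.5), F_X(1), F_X(1.7), F_X(2))  # 0, 1/2, 1/2, 1
```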
Useful Random Variables

Bernoulli Random Variable: X is either Success (1) or Failure (0). Thus X takes the values 1 or 0, with
$P(X = 0) = 1 - p, \quad P(X = 1) = p, \quad 0 \le p \le 1.$

Binomial Random Variable: X is the number of Successes in n independent trials. X takes values from 0 to n, with
$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}.$   (16)

Geometric Random Variable: X is the number of trials needed to obtain a Success. X takes values $1, 2, 3, \ldots$ with
$P(X = k) = (1-p)^{k-1} p.$   (17)

Poisson Random Variable: X takes values $0, 1, 2, 3, \ldots$ with
$P(X = k) = \exp(-\lambda) \frac{\lambda^k}{k!}.$   (18)
Useful to model the number of arrivals in unit time.
Useful Random Variables

[Figure: pmf plots of the Bernoulli, Binomial, Poisson and Geometric random variables]
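All four pmfs are available in scipy.stats; a sketch with illustrative parameter values (the choices of p, n and lambda are ours):

```python
from scipy import stats

p, n, lam = 0.3, 10, 2.0                  # illustrative parameters

print(stats.bernoulli.pmf(1, p))          # P(X = 1) = p
print(stats.binom.pmf(3, n, p))           # C(10,3) p^3 (1-p)^7, as in (16)
print(stats.geom.pmf(4, p))               # (1-p)^3 p, as in (17)
print(stats.poisson.pmf(2, lam))          # exp(-lam) lam^2 / 2!, as in (18)
```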
Continuous Random Variables

- Till now the random variables assumed only discrete values (finite or countable).
- Consider a random variable that takes a continuum of values.
- Then $P(X = x)$ is 0 for all x, and the pmf does not make sense.
- However, the cdf still makes sense.
- As the analogue of the pmf, we define the probability density function (pdf).
- The pdf is the derivative of the cdf.
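A numeric sketch of that last point, assuming an exponential cdf and approximating the derivative by a central finite difference:

```python
import math

# Exponential R.V with rate lam: F(x) = 1 - exp(-lam x), f(x) = lam exp(-lam x).
lam, x, h = 1.5, 0.8, 1e-6

F = lambda t: 1 - math.exp(-lam * t)
numeric_pdf = (F(x + h) - F(x - h)) / (2 * h)   # derivative of the cdf
exact_pdf = lam * math.exp(-lam * x)

print(numeric_pdf, exact_pdf)                   # agree to ~6 decimal places
```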
Useful C.R.V

Uniform R.V X, where X takes values in $(a, b)$:
$f(x) = \frac{1}{b-a}$ for $x \in (a, b)$, and $f_X(x) = 0$ for $x \notin (a, b)$.

Gaussian R.V, where X takes values from $-\infty$ to $+\infty$:
$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$   (19)

Exponential R.V, where X takes values in $[0, \infty)$:
$f(x) = \lambda \exp(-\lambda x), \quad \forall x \ge 0.$   (20)

Gamma R.V, where X takes values in $[0, \infty)$:
$f(x) = \frac{\lambda \exp(-\lambda x) (\lambda x)^{n-1}}{(n-1)!}.$   (21)
Useful Continuous Random Variables

[Figure: pdf plots of the Uniform, Gaussian, Gamma and Exponential random variables]
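A sketch evaluating these densities with scipy.stats; all parameter values are illustrative choices of ours:

```python
from scipy import stats

a, b = 0.0, 1.0
mu, sigma, lam, n = 0.0, 1.0, 1.0, 3

print(stats.uniform.pdf(0.3, loc=a, scale=b - a))  # 1/(b-a) on (a, b)
print(stats.norm.pdf(0.0, mu, sigma))              # Gaussian density (19) at 0
print(stats.expon.pdf(2.0, scale=1 / lam))         # lam * exp(-lam x), as in (20)
print(stats.gamma.pdf(2.0, n, scale=1 / lam))      # Gamma with shape n, as in (21)
```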
Mean and Variance

If X is a D.R.V., the mean or expectation of X is
$E[X] = \sum_x x\,P(X = x).$   (22)

If X is a C.R.V., the mean or expectation of X is
$E[X] = \int_{-\infty}^{\infty} x\,f(x)\,dx.$   (23)

Properties of the Mean
- $E(g(X)) = \sum_x g(x)\,P(X = x)$ for a D.R.V., and $E(g(X)) = \int_{-\infty}^{\infty} g(x)\,f(x)\,dx$ for a C.R.V.
- $E(aX + bY) = a\,E(X) + b\,E(Y)$.
Variance

- The variance of X is given by $Var(X) = E[(X - E(X))^2]$.
- $Var(X)$ is always non-negative.
- Verify that $Var(X) = E(X^2) - (E(X))^2$.
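A sketch computing the mean and variance of the die R.V X from (12), using (22) and the identity above:

```python
from fractions import Fraction

# pmf of X from (14): Rs. 1 on odd, Rs. 2 on even.
pmf = {1: Fraction(1, 2), 2: Fraction(1, 2)}

E_X  = sum(x * p for x, p in pmf.items())      # mean, as in (22)
E_X2 = sum(x**2 * p for x, p in pmf.items())
var  = E_X2 - E_X**2                           # Var(X) = E(X^2) - (E X)^2

print(E_X, var)  # 3/2, 1/4
```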
Two Random Variables

[Figure: two random variables X and Y mapping events $E_1, \ldots, E_4$ in $S$ to values $x_1, x_2$ and $y_1, y_2$; the pair $(X, Y)$ maps $S$ to the pairs $(x_1, y_1)$, $(x_2, y_2)$]
Two Random Variables

We have the joint cdf, pmf and pdf, defined as follows:
- Joint cdf: $F_{XY}(x, y) = P(X \le x, Y \le y)$.
- Joint pmf: $f_{XY}(x, y) = P(X = x, Y = y)$.
- Joint pdf: $f_{XY}(x, y)$, where $F_{XY}(x, y) = \int_{-\infty}^{y} \int_{-\infty}^{x} f_{XY}(u, v)\,du\,dv$.

We have the marginal cdfs, pdfs and pmfs as follows:
- $f_X(x) = \int_{-\infty}^{\infty} f_{XY}(x, y)\,dy$ for a C.R.V.
- $f_X(x) = P(X = x) = \sum_y P(X = x, Y = y)$ for a D.R.V.

Given the marginals, the joint distribution cannot be determined in general.
We can also talk about the conditional distribution $f_{X|Y=y}(x)$ of X given that Y assumes a value y.
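A minimal sketch of that caveat, using an example of our choosing: two different joints on a pair of fair bits that share identical marginals:

```python
from fractions import Fraction

H = Fraction(1, 2)

# Joint 1: X and Y are independent fair bits.
joint_indep = {(x, y): H * H for x in (0, 1) for y in (0, 1)}

# Joint 2: Y = X (perfectly dependent), yet same marginals.
joint_equal = {(0, 0): H, (1, 1): H, (0, 1): 0, (1, 0): 0}

def marginal_X(joint):
    """Marginal pmf of X: sum the joint over y."""
    return {x: sum(p for (a, _), p in joint.items() if a == x) for x in (0, 1)}

print(marginal_X(joint_indep) == marginal_X(joint_equal))  # True, yet the joints differ
```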
Independent Random Variables

- Joint cdf:
  $F_{XY}(x, y) = P(X \le x, Y \le y) = P(X \le x)\,P(Y \le y) = F_X(x)\,F_Y(y)$
- The pdf (also the pmf) factorizes: $f_{XY}(x, y) = f_X(x)\,f_Y(y)$.

In the case of independent R.Vs:
- We can determine the joint given the marginals.
- The conditional distribution is the same as the marginal.
Identically Distributed R.Vs

Two R.Vs X and Y are said to be identically distributed if $F_X(x) = F_Y(x)$ for all x.
Consider the following R.Vs defined on $S = [0, 1]$:
$X(s) = 1$ for $s \in [0, 0.5)$, $X(s) = 0$ for $s \in [0.5, 1]$, and $Y = 1 - X$.

$P(X = 1) = 0.5, \quad P(Y = 1) = 0.5$
$P(X = 0) = 0.5, \quad P(Y = 0) = 0.5$   (24)

Thus X and Y are identically distributed, even though $Y \ne X$ on every sample point.
Illustration with Dice

X - outcome of the first toss, Y - outcome of the second toss. Rows are indexed by X, columns by Y:

(1,1) (1,2) (1,3) (1,4) (1,5) (1,6)
(2,1) (2,2) (2,3) (2,4) (2,5) (2,6)
(3,1) (3,2) (3,3) (3,4) (3,5) (3,6)
(4,1) (4,2) (4,3) (4,4) (4,5) (4,6)
(5,1) (5,2) (5,3) (5,4) (5,5) (5,6)
(6,1) (6,2) (6,3) (6,4) (6,5) (6,6)
Illustration with Dice

Z - sum of the two outcomes:

2  3  4  5  6  7
3  4  5  6  7  8
4  5  6  7  8  9
5  6  7  8  9  10
6  7  8  9  10 11
7  8  9  10 11 12
- Which R.Vs are independent?
- Which R.Vs are identically distributed?
- What is the marginal distribution of X, Y and Z?
- Is $f_{XZ}(x, z) = f_X(x)\,f_Z(z)$?
- Is $f_{XY}(x, y) = f_X(x)\,f_Y(y)$?
- What are the conditional distributions $f_{Z|X=3}$, $f_{Z|X>3}$, $f_{Z|X>3, Y<5}$, $f_{Z|X=5, Y=5}$?
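These questions can all be answered by enumerating the 36 equally likely pairs; a sketch for the marginal of Z and the conditional $f_{Z|X=3}$:

```python
from fractions import Fraction
from itertools import product
from collections import Counter

outcomes = list(product(range(1, 7), repeat=2))  # all (X, Y) pairs, each prob 1/36

# Marginal pmf of Z = X + Y.
f_Z = {z: Fraction(c, 36) for z, c in Counter(x + y for x, y in outcomes).items()}

# Conditional pmf f_{Z|X=3}: restrict to outcomes with X = 3 and renormalize.
given = [(x, y) for x, y in outcomes if x == 3]
f_Z_given_X3 = {z: Fraction(c, len(given))
                for z, c in Counter(x + y for x, y in given).items()}

print(f_Z[7])          # 1/6
print(f_Z_given_X3)    # uniform 1/6 on {4, 5, ..., 9}
```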
Conditional Mean and Covariance

The conditional mean, denoted $E(X|Y)$, is a function of Y, say $h(Y)$:

$h(y) = E(X \mid Y = y) = \sum_x x\,f_{X|Y=y}(x)$ for a D.R.V.
$\qquad\qquad = \int x\,f_{X|Y=y}(x)\,dx$ for a C.R.V.

One can also check that $E(h(Y)) = E(X)$.
The covariance between R.Vs X and Y is defined as
$E\big[(X - E(X))(Y - E(Y))\big]$   (25)
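A sketch for the dice illustration above, computing $h(x) = E(Z \mid X = x)$, checking $E(h(X)) = E(Z)$, and evaluating the covariance (25) between X and Z:

```python
from fractions import Fraction
from itertools import product

outcomes = list(product(range(1, 7), repeat=2))  # equally likely (X, Y) pairs
Z = {(x, y): x + y for x, y in outcomes}

def E(f):
    """Expectation of f over the 36 equally likely outcomes."""
    return Fraction(sum(f(o) for o in outcomes), len(outcomes))

def h(x):
    """Conditional mean h(x) = E(Z | X = x)."""
    rows = [o for o in outcomes if o[0] == x]
    return Fraction(sum(Z[o] for o in rows), len(rows))

print([h(x) for x in range(1, 7)])                 # 9/2, 11/2, ..., 19/2
print(E(lambda o: h(o[0])) == E(lambda o: Z[o]))   # E(h(X)) = E(Z): True

# Covariance between X and Z, via Cov = E(XZ) - E(X)E(Z).
cov = E(lambda o: o[0] * Z[o]) - E(lambda o: o[0]) * E(lambda o: Z[o])
print(cov)                                         # 35/12
```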
Approximation of a Random Variable X

- What is the best constant that approximates X, i.e., $\min_a E[(X - a)^2]$?
  $a = E(X).$
- What is the best possible function of Y that approximates X, i.e., $\min_g E[(X - g(Y))^2]$?
  $g(Y) = E(X|Y).$
- What is the best possible linear function of Y that approximates X, i.e., $\min_{a,b} E[(X - (bY + a))^2]$?
  $b = \frac{Cov(X, Y)}{Var(Y)}, \quad a = E(X - bY).$
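A simulation sketch of the linear case; the data-generating model below is an arbitrary choice for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)
y = rng.normal(size=100_000)
x = 2.0 * y + rng.normal(size=100_000)      # X depends linearly on Y plus noise

b = np.cov(x, y, bias=True)[0, 1] / np.var(y)  # b = Cov(X, Y) / Var(Y)
a = np.mean(x - b * y)                         # a = E(X - bY)

print(b, a)                                    # close to 2 and 0
```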
Sum of Two Independent Random Variables

Let X and Y be independent R.Vs, and let $Z = X + Y$. Then

$f_Z(z) = \int_{-\infty}^{\infty} f_X(x)\,f_Y(z - x)\,dx$

[Figure: pdfs of $X_1$, $X_1 + X_2$ and $X_1 + X_2 + X_3$]
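A numeric sketch of this convolution, assuming X and Y are Uniform(0, 1) and discretizing the integral on a grid:

```python
import numpy as np

dx = 0.001
x = np.arange(0, 1, dx)
f = np.ones_like(x)                 # pdf of Uniform(0, 1)

# f_Z is the convolution of f_X and f_Y, approximated on the grid.
f_Z = np.convolve(f, f) * dx        # triangular density on (0, 2)

z = np.arange(len(f_Z)) * dx
print(z[np.argmax(f_Z)])            # peak near z = 1
print(f_Z.sum() * dx)               # integrates to ~1
```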
Sum of Infinitely Many Random Variables

If $X_1, X_2, \ldots, X_n, \ldots$ are independent identically distributed random variables:
- $\lim_{n \to \infty} \frac{1}{n} \sum_{i=1}^{n} X_i \to E(X_1)$ (the Law of Large Numbers).
- If $E(X_1) = 0$ and $Var(X_1) = 1$, and $Y = \lim_{n \to \infty} \frac{1}{\sqrt{n}} \sum_{i=1}^{n} X_i$, then $f_Y$ is $\mathcal{N}(0, 1)$, i.e., Y is a Gaussian R.V with mean 0 and variance 1 (the Central Limit Theorem).
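A simulation sketch of both statements, assuming standardized uniform $X_i$ (shifted and scaled to mean 0, variance 1):

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 1_000, 2_000

# X_i: i.i.d. Uniform(0,1), standardized to mean 0 and variance 1.
x = (rng.uniform(size=(reps, n)) - 0.5) * np.sqrt(12)

# Law of Large Numbers: the sample mean approaches E(X_1) = 0.
print(x[0].mean())                     # close to 0 (within ~1/sqrt(n))

# Central Limit Theorem: (1/sqrt(n)) * sum is approximately N(0, 1).
y = x.sum(axis=1) / np.sqrt(n)
print(y.mean(), y.var())               # close to 0 and 1
```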