Review of probability calculus
June 11, 2017
Andreas Scheidegger
Eawag: Swiss Federal Institute of Aquatic Science and Technology
Random variables (RV)
“Mathematical machines that generate numbers”
Completely described by the cumulative probability distribution function (cdf) or the probability distribution/density function (pdf).
Some properties can be described by measures such as mean, variance, mode, ...
Andreas Scheidegger Univariate Random Variables 1
Probability Distribution/Density Function (pdf)
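[Figure: probability distribution $P_A$ of a discrete RV A over the outcomes $z_1, z_2, \ldots, z_n$, and probability density $f_B$ of a continuous RV B with $z_l$ and $z_r$ marked on the z axis.]
Discrete RV: Probability to obtain a certain output.
Continuous RV: Proportional to the probability to obtain an output close to a certain value.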
Cumulative Distribution Function (cdf)
[Figure: cumulative distribution functions $F_A$ (discrete RV) and $F_B$ (continuous RV), each rising from 0 to 1.]
Discrete and continuous RV: Probability to obtain an output equal to or smaller than a certain value.
cdf and pdf
Discrete RVs
Distribution function:
$F_A(z) = P(A \le z)$
Probability distribution:
$P_A(z_i)$ for $z_i \in \Omega_A$
Continuous RVs
Distribution function:
$F_B(z) = P(B \le z)$
Probability density:
$f_B(z) = \frac{d}{dz} F_B(z)$
$P(B \in [z_1, z_2]) = \int_{z_1}^{z_2} f_B(z)\, dz$
$P(B \in [z, z + \Delta]) \approx \Delta \cdot f_B(z)$
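The relations between cdf and pdf can be checked numerically. The following is a minimal sketch that uses the standard normal distribution as a stand-in for B; this choice and the values of `z1`, `z2`, and `delta` are assumptions made only for illustration.

```r
# Illustration with B ~ N(0, 1); z1, z2 and delta are arbitrary example values
z1 <- -0.5
z2 <- 1.2

# P(B in [z1, z2]) as the integral of the density ...
p_int <- integrate(dnorm, lower = z1, upper = z2)$value
# ... and as a difference of the cdf
p_cdf <- pnorm(z2) - pnorm(z1)

# For a small interval, the probability is approximately delta * f_B(z)
delta    <- 0.001
p_small  <- pnorm(z1 + delta) - pnorm(z1)
p_approx <- delta * dnorm(z1)

print(c(p_int, p_cdf))
print(c(p_small, p_approx))
```

Both pairs of numbers should agree closely; the second pair agrees better the smaller `delta` is chosen.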
Characteristics of Random Variables
Measures of Location
Expected value:
$E[A] = \sum_{z \in \Omega_A} z\, P_A(z)$ , $E[B] = \int_{\Omega_B} z\, f_B(z)\, dz$
Median:
$\mathrm{Med}[Z] = Q_{0.5}[Z]$ : $P(Z \le \mathrm{Med}[Z]) = P(Z > \mathrm{Med}[Z]) = 0.5$
Quantiles:
$Q_p[Z]$ : $P(Z \le Q_p[Z]) = p$ and $P(Z > Q_p[Z]) = 1 - p$
Mode:
$\mathrm{Mode}[A] = \arg\max_{z_i \in \Omega_A} P_A(z_i)$ , $\mathrm{Mode}[B] = \arg\max_{z \in \Omega_B} f_B(z)$
Characteristics of Random Variables
Measures of Location
Expected value of a function of a RV:
$E[g(A)] = \sum_{z \in \Omega_A} g(z)\, P_A(z)$
$E[g(B)] = \int_{\Omega_B} g(z)\, f_B(z)\, dz$
Attention! In general:
$E[g(X)] \ne g(E[X])$
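A small numerical illustration of the warning above, sketched for a fair six-sided die as the discrete RV A and $g(z) = z^2$ (both are assumptions chosen only as an example):

```r
# Fair six-sided die as an example of a discrete RV A (assumed for illustration)
z   <- 1:6               # sample space Omega_A
P_A <- rep(1/6, 6)       # probability distribution P_A(z)
g   <- function(z) z^2   # example function g

E_A  <- sum(z * P_A)      # E[A] = 3.5
E_gA <- sum(g(z) * P_A)   # E[g(A)] = 91/6
g_EA <- g(E_A)            # g(E[A]) = 12.25

print(c(E_gA, g_EA))
```

Here $E[A^2] = 91/6 \approx 15.17$, whereas $(E[A])^2 = 12.25$.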
Characteristics of Random Variables
Measures of Extension
Variance:
$\mathrm{Var}[Z] = E\big[(Z - E[Z])^2\big]$
Standard Deviation:
$\mathrm{SD}[Z] = \sqrt{\mathrm{Var}[Z]}$
Inter-Quantile Range:
$\mathrm{QR}_p[Z] = Q_{(1+p)/2}[Z] - Q_{(1-p)/2}[Z]$
Characteristics of Random Variables
$E[aZ + b] = a\, E[Z] + b$
$E[Z_1 \pm Z_2] = E[Z_1] \pm E[Z_2]$
$\mathrm{Var}[Z] = E[Z^2] - E[Z]^2$
$\mathrm{Var}[aZ + b] = a^2\, \mathrm{Var}[Z]$
If $Z_1$ and $Z_2$ are independent (more generally, uncorrelated):
$\mathrm{Var}[Z_1 \pm Z_2] = \mathrm{Var}[Z_1] + \mathrm{Var}[Z_2]$
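The identity $\mathrm{Var}[Z] = E[Z^2] - E[Z]^2$ follows from the definition of the variance and the linearity of the expected value. A short derivation:

```latex
\begin{align*}
\mathrm{Var}[Z] &= E\big[(Z - E[Z])^2\big] \\
                &= E\big[Z^2 - 2 Z\, E[Z] + E[Z]^2\big] \\
                &= E[Z^2] - 2\, E[Z]\, E[Z] + E[Z]^2 \\
                &= E[Z^2] - E[Z]^2
\end{align*}
```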
Multivariate random variables
Joint distribution
Discrete RV:
$P_{A,B}(a, b) = P_{A|B}(a|b) \cdot P_B(b) = P_{B|A}(b|a) \cdot P_A(a)$
E.g.: $P_{A,B}(3, 1)$: probability to obtain $a_i = 3$ and $b_i = 1$.
Continuous RV:
$f_{A,B}(a, b) = f_{A|B}(a|b) \cdot f_B(b) = f_{B|A}(b|a) \cdot f_A(a)$
E.g.: $f_{A,B}(3, 1)$: proportional to the probability to obtain a realization close to 3 and 1.
Conditional Distributions
Discrete RV:
$P_{A|B}(a|b) = \frac{P_{A,B}(a, b)}{P_B(b)}$
Continuous RV:
$f_{A|B}(a|b) = \frac{f_{A,B}(a, b)}{f_B(b)}$
Marginal distribution
Discrete random variables:
$P_A(a) = \sum_{b \in \Omega_B} P_{A,B}(a, b)$
Continuous random variables:
$f_A(a) = \int_{\Omega_B} f_{A,B}(a, b)\, db$
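For discrete RVs, joint, marginal, and conditional distributions can be handled as tables. The sketch below uses a made-up joint distribution; the numbers and the names `P_AB`, `P_A_given_B`, and `P_AB_rebuilt` are assumptions for illustration only.

```r
# Hypothetical joint distribution P_{A,B}(a, b) of two discrete RVs
# (rows: values of A, columns: values of B; the numbers are made up)
P_AB <- matrix(c(0.10, 0.20,
                 0.30, 0.15,
                 0.05, 0.20),
               nrow = 3, byrow = TRUE,
               dimnames = list(A = c("1", "2", "3"), B = c("0", "1")))

# Marginals: sum over the other variable
P_A <- rowSums(P_AB)   # P_A(a) = sum_b P_{A,B}(a, b)
P_B <- colSums(P_AB)   # P_B(b) = sum_a P_{A,B}(a, b)

# Conditional distribution P_{A|B}(a | b): divide each column by P_B(b)
P_A_given_B <- sweep(P_AB, 2, P_B, "/")

# joint = conditional x marginal recovers the original table
P_AB_rebuilt <- sweep(P_A_given_B, 2, P_B, "*")

print(P_A)
print(P_A_given_B)
```

Each column of `P_A_given_B` is itself a probability distribution over A and therefore sums to one.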
Independence
Definition:
$F_{A,B}(a, b) = F_A(a) \cdot F_B(b)$
Discrete random variables:
$P_{A,B}(a, b) = P_A(a) \cdot P_B(b)$
Continuous random variables:
$f_{A,B}(a, b) = f_A(a) \cdot f_B(b)$
Bayes’ Theorem [1]
Discrete random variables
Because
$P_{A|B}(a|b)\, P_B(b) = P_{B|A}(b|a)\, P_A(a)$
we can write
$P_{A|B}(a|b) = \frac{P_{B|A}(b|a)\, P_A(a)}{P_B(b)} = \frac{P_{B|A}(b|a)\, P_A(a)}{\sum_{a' \in \Omega_A} P_{B|A}(b|a')\, P_A(a')}$
[1] Bayes’ Theorem as we know it today was actually formulated by P. Laplace in 1774 and not by T. Bayes.
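A minimal numerical sketch of Bayes’ theorem for discrete RVs. The scenario (a true state A and an imperfect test result B) and all probabilities are made up for illustration.

```r
# Hypothetical example: A = true state ("polluted" or "clean"),
# B = outcome of an imperfect test ("positive" or "negative").
# All numbers are made up for illustration.
P_A <- c(polluted = 0.1, clean = 0.9)                       # prior P_A(a)
P_B_given_A <- rbind(polluted = c(positive = 0.95, negative = 0.05),
                     clean    = c(positive = 0.20, negative = 0.80))

b <- "positive"                            # observed value of B
numerator   <- P_B_given_A[, b] * P_A      # P_{B|A}(b | a) * P_A(a)
P_B_b       <- sum(numerator)              # P_B(b): sum over all a'
P_A_given_b <- numerator / P_B_b           # P_{A|B}(a | b)

print(P_A_given_b)
```

The denominator is just the sum of the numerators over all possible values of A, so the result always sums to one.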
Bayes’ Theorem
Continuous random variables
$f_{A|B}(a|b) = \frac{f_{B|A}(b|a)\, f_A(a)}{f_B(b)} = \frac{f_{B|A}(b|a)\, f_A(a)}{\int f_{B|A}(b|a')\, f_A(a')\, da'}$
Characteristics of Random Variables
Dependencies
Variance-Covariance Matrix:
$\mathrm{Var}[\mathbf{Z}] = E\big[(\mathbf{Z} - E[\mathbf{Z}])(\mathbf{Z} - E[\mathbf{Z}])^T\big]$
Individual Covariances:
$\mathrm{Cov}[Z_i, Z_j] = E\big[(Z_i - E[Z_i])(Z_j - E[Z_j])\big] = \mathrm{Var}[\mathbf{Z}]_{i,j}$
Correlation Matrix:
$\mathrm{Cor}[\mathbf{Z}]_{i,j} = \frac{\mathrm{Cov}[Z_i, Z_j]}{\sqrt{\mathrm{Var}[Z_i] \cdot \mathrm{Var}[Z_j]}}$
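The conversion from a variance-covariance matrix to a correlation matrix can be sketched in a few lines of R. The matrix `V` below is a made-up example; `cov2cor()` is the corresponding base R function.

```r
# Hypothetical variance-covariance matrix of a random vector Z = (Z1, Z2, Z3)
V <- matrix(c(4.0, 1.2, 0.00,
              1.2, 1.0, 0.30,
              0.0, 0.3, 0.25), nrow = 3, byrow = TRUE)

# Cor[Z]_ij = Cov[Z_i, Z_j] / sqrt(Var[Z_i] * Var[Z_j])
sds      <- sqrt(diag(V))
R_manual <- V / outer(sds, sds)

# Base R provides the same conversion
R_builtin <- cov2cor(V)

print(R_manual)
```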
Correlation
Correlation measures only linear dependencies!
Figure: Several sets of (x, y) points, with the correlation coefficient of x and y for each set. Source: Wikipedia.
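A small worked example, chosen only for illustration, shows that zero correlation does not imply independence. Let X take the values −1, 0, 1 with probability 1/3 each, and let $Y = X^2$, so that Y is completely determined by X:

```latex
\begin{align*}
E[X] &= \tfrac{1}{3}(-1 + 0 + 1) = 0, \qquad
E[Y] = \tfrac{1}{3}(1 + 0 + 1) = \tfrac{2}{3}, \qquad
E[XY] = E[X^3] = \tfrac{1}{3}(-1 + 0 + 1) = 0 \\
\mathrm{Cov}[X, Y] &= E[XY] - E[X]\, E[Y] = 0 - 0 \cdot \tfrac{2}{3} = 0
\end{align*}
```

The correlation is therefore zero although Y depends on X.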
Short Notation
Function argument corresponds to RV
$P_A(a)$, $P_{B|A}(b|a)$ ←→ $P(a)$, $P(b|a)$
$f_B(b)$, $f_{A|B}(a|b)$ ←→ $f(b)$, $f(a|b)$ or $p(b)$, $p(a|b)$
Example:
$f_{X_1|X_2,X_3}(x_1|x_2, x_3) = \frac{f_{X_2|X_1}(x_2|x_1)\, f_{X_1|X_3}(x_1|x_3)}{f_{X_2}(x_2)}$
$p(x_1|x_2, x_3) = \frac{p(x_2|x_1)\, p(x_1|x_3)}{p(x_2)}$
Directed Acyclic Graphs
Visualize independence structure of RV
[Figure: directed acyclic graph with the nodes A, B, C, and D. Each node is annotated with its conditional distribution: p(A), p(B | A), p(C | A, B), p(D | B).]
E.g. A and D are conditionally independent given B.
Joint distribution:
p(A, B, C, D) = p(A) p(B | A) p(C | A, B) p(D | B)
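The factorization of the joint distribution can be made concrete for binary RVs. The sketch below builds the joint distribution from made-up conditional probability tables and checks that p(D | A, B) does not depend on A; all numbers and variable names are assumptions for illustration.

```r
# Hypothetical conditional probability tables for binary RVs (index 1 and 2
# denote the two possible values); all numbers are made up for illustration.
p_A    <- c(0.6, 0.4)                                   # p(A)
p_B_A  <- matrix(c(0.7, 0.3,
                   0.2, 0.8), nrow = 2, byrow = TRUE)   # p(B | A), rows: A
p_D_B  <- matrix(c(0.9, 0.1,
                   0.5, 0.5), nrow = 2, byrow = TRUE)   # p(D | B), rows: B
p_C_AB <- array(0, dim = c(2, 2, 2))                    # p(C | A, B), index [A, B, C]
p_C_AB[, , 1] <- matrix(c(0.1, 0.4,
                          0.6, 0.8), nrow = 2, byrow = TRUE)
p_C_AB[, , 2] <- 1 - p_C_AB[, , 1]

# Joint distribution from the factorization, index [A, B, C, D]
joint <- array(0, dim = c(2, 2, 2, 2))
for (a in 1:2) for (b in 1:2) for (k in 1:2) for (d in 1:2) {
  joint[a, b, k, d] <- p_A[a] * p_B_A[a, b] * p_C_AB[a, b, k] * p_D_B[b, d]
}

# p(D | A, B) computed from the joint: it should not depend on A
p_ABD  <- apply(joint, c(1, 2, 4), sum)            # marginalize C
p_AB   <- apply(p_ABD, c(1, 2), sum)               # marginalize D
p_D_AB <- p_ABD / array(p_AB, dim = c(2, 2, 2))    # index [A, B, D]

print(p_D_AB[1, , ])   # p(D | A = first value, B)
print(p_D_AB[2, , ])   # p(D | A = second value, B)
```

Both printed tables should be identical and equal to `p_D_B`, which is what conditional independence of A and D given B means.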
Normal distribution
Central Limit Theorem
Let $X_1, X_2, \ldots$ be independent and identically distributed RVs with mean $\mu$ and finite variance $\sigma^2$. Further we define $S_n = X_1 + X_2 + \ldots + X_n$, which has mean $n\mu$ and variance $n\sigma^2$. Then the distribution of the standardized RV
$Z_n = \frac{S_n - n\mu}{\sqrt{n}\, \sigma}$
converges to the standard normal distribution for $n \to \infty$.
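The theorem can be explored by simulation. The following sketch uses uniformly distributed $X_i$ and n = 30; both choices, as well as the number of repetitions and the seed, are arbitrary assumptions for illustration.

```r
# Simulation sketch: X_i ~ Uniform(0, 1) is an arbitrary choice for illustration
set.seed(1)
n     <- 30              # number of summands
reps  <- 10000           # number of simulated sums
mu    <- 0.5             # mean of Uniform(0, 1)
sigma <- sqrt(1 / 12)    # standard deviation of Uniform(0, 1)

X   <- matrix(runif(n * reps), nrow = reps)   # each row holds X_1, ..., X_n
S_n <- rowSums(X)
Z_n <- (S_n - n * mu) / (sqrt(n) * sigma)     # standardized sum

# Compare a few empirical quantiles of Z_n with those of N(0, 1)
probs <- c(0.05, 0.25, 0.5, 0.75, 0.95)
print(rbind(empirical = quantile(Z_n, probs), normal = qnorm(probs)))
```

The empirical quantiles should lie close to the normal quantiles; replacing `runif` by another distribution with finite variance (and adapting `mu` and `sigma`) gives a similar picture.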
Central Limit Theorem Example
[Figure: twelve panels, n = 1 to n = 12, each showing a density over the range −2 to 2 (vertical axis: Density).]
Relationships of Univariate Distributions
Figure 1. Univariate distribution relationships.
From: Leemis, L. M. and McQueston, J. T. (2008) Univariate distribution relationships. The American Statistician, 62(1), 45–53.
Multivariate Normal Distribution
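Density of a multivariate Normal distribution of dimension n with a mean vector $\boldsymbol{\mu}$ and a variance-covariance matrix $\boldsymbol{\Sigma}$:
$\mathbf{Z} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma})$
$f_{N(\boldsymbol{\mu}, \boldsymbol{\Sigma})}(\mathbf{z}) = \frac{1}{(2\pi)^{n/2}} \frac{1}{|\boldsymbol{\Sigma}|^{1/2}} \exp\left(-\frac{1}{2} (\mathbf{z} - \boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1} (\mathbf{z} - \boldsymbol{\mu})\right)$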
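The density formula translates almost literally into R. The function name `dmvn_sketch` is a hypothetical helper introduced here, not a function from base R or from a package, and the numbers are made up for illustration.

```r
# Minimal sketch of the density formula; dmvn_sketch is a hypothetical helper
# name, not a function from base R or a package.
dmvn_sketch <- function(z, mu, Sigma) {
  n    <- length(mu)
  r    <- z - mu
  quad <- drop(t(r) %*% solve(Sigma, r))   # (z - mu)^T Sigma^{-1} (z - mu)
  exp(-0.5 * quad) / ((2 * pi)^(n / 2) * sqrt(det(Sigma)))
}

mu    <- c(1, -1)
Sigma <- matrix(c(2.0, 0.6,
                  0.6, 1.0), nrow = 2)
z     <- c(0.5, 0.0)
print(dmvn_sketch(z, mu, Sigma))

# Sanity check: for a diagonal Sigma the density factorizes into
# univariate normal densities
Sigma_diag <- diag(c(2.0, 1.0))
print(c(dmvn_sketch(z, mu, Sigma_diag),
        dnorm(z[1], mu[1], sqrt(2.0)) * dnorm(z[2], mu[2], sqrt(1.0))))
```

Using `solve(Sigma, r)` avoids forming the inverse matrix explicitly, which is usually the numerically safer choice.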
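Multivariate Normal Distribution
Properties
All marginals are normally distributed (here $\Sigma_{i,i}$ is the variance of $Z_i$):
$\mathbf{Z} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \Rightarrow Z_i \sim N(\mu_i, \Sigma_{i,i})$
Linear transformation:
$\mathbf{Z} \sim N(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \Rightarrow \mathbf{A}\mathbf{Z} + \mathbf{b} \sim N(\mathbf{A}\boldsymbol{\mu} + \mathbf{b}, \mathbf{A}\boldsymbol{\Sigma}\mathbf{A}^T)$
Conditional distribution:
$\mathbf{Z} = \begin{pmatrix} \mathbf{X} \\ \mathbf{Y} \end{pmatrix} \sim N\left( \begin{pmatrix} \boldsymbol{\mu}_X \\ \boldsymbol{\mu}_Y \end{pmatrix}, \begin{bmatrix} \boldsymbol{\Sigma}_{X,X} & \boldsymbol{\Sigma}_{X,Y} \\ \boldsymbol{\Sigma}_{X,Y}^T & \boldsymbol{\Sigma}_{Y,Y} \end{bmatrix} \right)$
$\Rightarrow \mathbf{X} \mid \mathbf{Y} = \mathbf{y} \sim N\left( \boldsymbol{\mu}_X + \boldsymbol{\Sigma}_{X,Y} \boldsymbol{\Sigma}_{Y,Y}^{-1} (\mathbf{y} - \boldsymbol{\mu}_Y),\ \boldsymbol{\Sigma}_{X,X} - \boldsymbol{\Sigma}_{X,Y} \boldsymbol{\Sigma}_{Y,Y}^{-1} \boldsymbol{\Sigma}_{X,Y}^T \right)$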
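The conditional distribution can be computed with a few matrix operations. The sketch below uses a made-up three-dimensional example in which X consists of the first two components and Y of the third; all numbers and names are assumptions for illustration.

```r
# Hypothetical 3-dimensional normal: X = (Z1, Z2), Y = Z3
mu    <- c(1, 0, -1)
Sigma <- matrix(c(2.0, 0.5, 0.6,
                  0.5, 1.5, 0.3,
                  0.6, 0.3, 1.0), nrow = 3, byrow = TRUE)
ix <- 1:2        # indices of X
iy <- 3          # index of Y
y  <- 0.5        # observed value of Y

S_XX <- Sigma[ix, ix, drop = FALSE]
S_XY <- Sigma[ix, iy, drop = FALSE]
S_YY <- Sigma[iy, iy, drop = FALSE]

# X | Y = y ~ N(mu_cond, S_cond)
mu_cond <- mu[ix] + drop(S_XY %*% solve(S_YY, y - mu[iy]))
S_cond  <- S_XX - S_XY %*% solve(S_YY, t(S_XY))

print(mu_cond)
print(S_cond)
```

Note that the conditional covariance does not depend on the observed value y, only the conditional mean does.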
Further Generalization
one-dimensional → n-dimensional → what’s next?
Discrete random process
“Random vectors with an infinitely large number of elements”
(0.11, 10.78, -10.24, -3.90, 5.91, ...)
(-1.11, -4.06, -8.64, -0.92, -2.27, ...)
(0.76, -8.54, 0.81, 2.03, 12.9, ...)
Continuous random processes
“Random functions”
What is a Probability?
Interpretation of probabilities
1. The probability for “head” is 1/2.
2. The probability that it rains tomorrow is 30%.
Frequentist
1. The frequency with which “head” occurs if the random experiment is repeated.
2. “Rain tomorrow” is not a repeatable experiment.
Subjective
1. Somebody’s belief that a coin toss results in “head”, given his/her experience.
2. Somebody’s belief that it rains tomorrow, given his/her experience.
Other probability interpretations: → http://www.webcitation.org/6YupVo9zG
Summary
joint = conditional × marginal
$f(a, b) = f(a|b)\, f(b) = f(b|a)\, f(a)$
Marginals:
$f(a) = \int f(a, b)\, db = \int f(a|b)\, f(b)\, db$
More information in Appendix A.2 – A.5.
Common distributions
Implemented distributions in R
For all distributions four functions are implemented:
d__(x, ...)   pdf evaluated at x
p__(x, ...)   cdf evaluated at x
q__(p, ...)   p-th quantile
r__(n, ...)   sample n random numbers

beta             *beta        binomial            *binom
Cauchy           *cauchy      chi-squared         *chisq
exponential      *exp         F                   *f
gamma            *gamma       geometric           *geom
hypergeometric   *hyper       log-normal          *lnorm
multinomial      *multinom    negative binomial   *nbinom
normal           *norm        Poisson             *pois
Student’s t      *t           uniform             *unif
Weibull          *weibull
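As a brief illustration of the naming scheme, the four functions for the normal distribution (`*norm`) can be used as follows. The argument values and the seed are arbitrary examples.

```r
# The four function families, shown for the normal distribution (*norm)
x    <- 1.0
dens <- dnorm(x, mean = 0, sd = 1)      # pdf evaluated at x
prob <- pnorm(x, mean = 0, sd = 1)      # cdf evaluated at x: P(Z <= x)
quan <- qnorm(prob, mean = 0, sd = 1)   # quantile function inverts the cdf
set.seed(42)
smpl <- rnorm(5, mean = 0, sd = 1)      # sample 5 random numbers

print(c(dens = dens, prob = prob, quan = quan))
print(smpl)
```

Because the quantile function is the inverse of the cdf, `quan` recovers the original `x`.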
Normal Distribution
Density
$Z \sim N(\mu, \sigma)$ , $f_{N(\mu,\sigma)}(z) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(z - \mu)^2}{2\sigma^2}\right)$
[Figure: density f and cumulative distribution function F of the Normal distribution with mean = 0 for sd = 0.1, 0.25, 0.5, 1, 2, 4, plotted for z from −3 to 3.]
Normal Distribution
Properties
$E[N(\mu, \sigma)] = \mathrm{Mode}[N(\mu, \sigma)] = \mathrm{Med}[N(\mu, \sigma)] = \mu$
$\mathrm{SD}[N(\mu, \sigma)] = \sigma$
Central limit theorem:
Let $X_1, X_2, \ldots$ be independent and identically distributed RVs with mean $\mu$ and finite variance $\sigma^2$. Further we define $S_n = X_1 + X_2 + \ldots + X_n$, which has mean $n\mu$ and variance $n\sigma^2$. Then the distribution of the standardized RV
$Z_n = \frac{S_n - n\mu}{\sqrt{n}\, \sigma}$
converges to the standard normal distribution for $n \to \infty$.
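Lognormal Distribution
Definition:
$Z = \exp(X)$ , $X \sim N(m, s)$
Density:
$Z \sim LN(\mu, \sigma)$
$f_{LN(\mu,\sigma)}(z) = \begin{cases} \frac{1}{\sqrt{2\pi}} \frac{1}{s z} \exp\left(-\frac{1}{2} \frac{\left(\log\left(\frac{z}{\mu}\right) + \frac{s^2}{2}\right)^2}{s^2}\right) & \text{for } z > 0 \\ 0 & \text{for } z \le 0 \end{cases}$
with
$s = \sqrt{\log\left(1 + \frac{\sigma^2}{\mu^2}\right)}$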
[Figure: density f and cumulative distribution function F of the Lognormal distribution with mean = 1 for sd = 0.1, 0.25, 0.5, 1, 2, 4, plotted for z from 0 to 3.]
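Lognormal Distribution
Properties
$E[LN(\mu, \sigma)] = \mu$
$\mathrm{Mode}[LN(\mu, \sigma)] = \frac{\mu}{\left(1 + \frac{\sigma^2}{\mu^2}\right)^{3/2}}$
$\mathrm{Med}[LN(\mu, \sigma)] = \frac{\mu}{\sqrt{1 + \frac{\sigma^2}{\mu^2}}}$
$\mathrm{SD}[LN(\mu, \sigma)] = \sigma$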
Lognormal Distribution
R implementation
Attention: The lognormal distribution in R is defined with m and s (the mean and standard deviation of X)!
The code below computes the arguments if mean µ and standard deviation σ are given:
## conversion, 'mu' and 'sigma' given
meanlog <- log(mu) - 0.5*log(1 + (sigma/mu)^2)
sdlog <- sqrt(log(1 + sigma^2/(mu^2)))
## generate 1000 random samples
rlnorm(1000, meanlog=meanlog, sdlog=sdlog)
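One way to convince oneself that the conversion is right is to compute the mean and standard deviation of the resulting distribution by numerical integration. This is a minimal sketch; the values `mu = 1` and `sigma = 0.5` are arbitrary examples.

```r
mu    <- 1      # example value, chosen arbitrarily
sigma <- 0.5    # example value, chosen arbitrarily
meanlog <- log(mu) - 0.5 * log(1 + (sigma / mu)^2)
sdlog   <- sqrt(log(1 + sigma^2 / mu^2))

# First and second moment of Z by numerical integration of the density
m1 <- integrate(function(z) z   * dlnorm(z, meanlog, sdlog), 0, Inf)$value
m2 <- integrate(function(z) z^2 * dlnorm(z, meanlog, sdlog), 0, Inf)$value

print(c(mean = m1, sd = sqrt(m2 - m1^2)))
```

Up to the accuracy of the numerical integration, the printed values should reproduce `mu` and `sigma`.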
χ² Distribution
Definition:
$Z = \sum_{i=1}^{n} X_i^2$ , $X_i \sim N(0, 1)$
Density:
$Z \sim \chi^2_n$ , $f_{\chi^2_n}(z) = \frac{z^{(n-2)/2} \exp(-z/2)}{2^{n/2}\, \Gamma(n/2)}$
[Figure: density f and cumulative distribution function F of the χ² distribution for df = 1, 2, 3, 4, 5, 10, plotted for z from 0 to 14.]
χ² Distribution
Properties
$E[\chi^2_n] = n$
$\mathrm{Mode}[\chi^2_n] = n - 2$ for $n \ge 2$
$\mathrm{SD}[\chi^2_n] = \sqrt{2n}$
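F Distribution
Definition:
$Z = \frac{X/n}{Y/m}$ , $X \sim \chi^2_n$ , $Y \sim \chi^2_m$
Density:
$Z \sim F_{n,m}$ , $f_{F_{n,m}}(z) = \frac{\Gamma\big((n + m)/2\big)\, (n/m)^{n/2}\, z^{(n-2)/2}}{\Gamma\big(n/2\big)\, \Gamma\big(m/2\big)\, \big(1 + n z / m\big)^{(n+m)/2}}$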
[Figure: density f and cumulative distribution function F of the F distribution for (df1, df2) = (2, 10), (3, 10), (5, 10), (5, 100), plotted for z from 0 to 4.]
F Distribution
Properties
$E[F_{n,m}] = \frac{m}{m - 2}$ for $m > 2$
$\mathrm{Mode}[F_{n,m}] = \frac{m (n - 2)}{n (m + 2)}$ for $n > 2$
$\mathrm{SD}[F_{n,m}] = \sqrt{\frac{2 m^2 (n + m - 2)}{n (m - 2)^2 (m - 4)}}$ for $m > 4$
t Distribution
Definition:
$Z = \frac{X}{\sqrt{Y/n}}$ , $X \sim N(0, 1)$ , $Y \sim \chi^2_n$
Density:
$Z \sim t_n$ , $f_{t_n}(z) = \frac{\Gamma\big((n + 1)/2\big)}{\sqrt{\pi n}\; \Gamma\big(n/2\big)\, (1 + z^2/n)^{(n+1)/2}}$
[Figure: density f and cumulative distribution function F of the t distribution for df = 1, 2, 4, 10, 100, plotted for z from −6 to 6.]
t Distribution
Properties
$E[t_n] = \mathrm{Mode}[t_n] = 0$ for $n > 1$
$\mathrm{SD}[t_n] = \sqrt{\frac{n}{n - 2}}$ for $n > 2$
Uniform Distribution
Density
$Z \sim U(z_{min}, z_{max})$ , $f_{U(z_{min}, z_{max})}(z) = \frac{1}{z_{max} - z_{min}}$ for $z_{min} \le z \le z_{max}$, and 0 otherwise
[Figure: density f and cumulative distribution function F of the Uniform distribution with mean = 0 for max = 1 and max = 2, plotted for z from −3 to 3.]
Uniform Distribution
Properties
$E[U(z_{min}, z_{max})] = \frac{z_{min} + z_{max}}{2}$
$\mathrm{Med}[U(z_{min}, z_{max})] = \frac{z_{min} + z_{max}}{2}$
$\mathrm{SD}[U(z_{min}, z_{max})] = \frac{z_{max} - z_{min}}{2\sqrt{3}}$