DESCRIPTION: This is a lecture I've put together summarizing the topics of mathematical probability. The presentation is at an undergraduate science (math, physics, engineering) level. In the upload process some of the figures and equations were lost. For a better version of this presentation please visit my website at http://solohermelin.com, in the Math folder, and open the Probability presentation. Please feel free to comment and suggest improvements to [email protected]!
1
Probability
SOLO HERMELIN
Updated: 6.06.09
2
SOLO
Table of Content
Probability
Set Theory
Probability Definitions
Theorem of Addition
Conditional Probability
Total Probability Theorem
Statistical Independent Events
Theorem of Multiplication
Conditional Probability - Bayes Formula
Random Variables
Probability Distribution and Probability Density Functions
Conditional Probability Distribution and Conditional Probability Density Functions
Expected Value or Mathematical Expectation
Variance
Moments
Functions of one Random Variable
Jointly Distributed Random Variables
Characteristic Function and Moment-Generating Function
Existence Theorems (Theorem 1 & Theorem 2)
3
SOLO
Table of Content (continue - 1)
Probability
Law of Large Numbers (History)
Markov’s Inequality
Chebyshev’s Inequality
Bienaymé’s Inequality
Chernoff’s and Hoeffding’s Bounds
Chernoff’s Bound
Hoeffding’s Bound
Convergence Concepts
The Law of Large Numbers
Central Limit Theorem
Distributions
Bernoulli Trials – The Binomial Distribution
Poisson Asymptotical Development (Law of Rare Events)
Normal (Gaussian) Distribution
De Moivre-Laplace Asymptotical Development
Laplacian Distribution
Gamma Distribution
Beta Distribution
4
SOLO
Table of Content (continue - 2)
Probability
Cauchy Distribution
Exponential Distribution
Chi-square Distribution
Student’s t-Distribution
Uniform Distribution (Continuous)
Rayleigh Distribution
Rice Distribution
Weibull Distribution
Kinetic Theory of Gases
Maxwell’s Velocity Distribution
Molecular Models
Boltzmann Statistics
Bose-Einstein Statistics
Fermi-Dirac Statistics
Monte Carlo Method
Generating Continuous Random Variables
Importance Sampling
Generating Discrete Random Variables
Metropolis & Metropolis–Hastings Algorithms
Markov Chain Monte Carlo (MCMC)
Gibbs Sampling
Monte Carlo Integration
5
SOLO
Table of Content (continue - 3)
Probability
Appendices
Permutations
Combinations
References
Random Processes
Stationarity of a Random Process
Ergodicity
Markov Processes
White Noise
Markov Chains
Existence Theorems (Theorem 3)
6
SOLO Set Theory
A set A is a collection of objects (the elements of the set) ζ1, ζ2, …, ζn.

Examples:
A = {ζ1, ζ2, …, ζn} - a set of n elements
A(x) = {x : |x| < 1} - the set of all numbers of magnitude smaller than 1
A(x,y) = {(x,y) : 0 < x < T, 0 < y < T} - a set of points (x,y) in a square
Ø - the empty set, which contains no elements
S - the set that contains all elements (the set space)
A = set space of a die: six independent events {1}, {2}, {3}, {4}, {5}, {6}
7
SOLO Set Theory
Set Operations

Inclusion: A ⊂ B if x ∈ A ⇒ x ∈ B
Equality: A = B iff A ⊂ B and B ⊂ A
Addition (union): x ∈ A ∪ B if x ∈ A or x ∈ B
Multiplication (intersection): x ∈ A ∩ B if x ∈ A and x ∈ B
Associativity: (A ∪ B) ∪ C = A ∪ (B ∪ C)
A ∪ A = A, A ∪ Ø = A, A ∪ S = S
A ∩ A = A, A ∩ Ø = Ø, A ∩ S = A
Complement of A: Ā, with A ∪ Ā = S and A ∩ Ā = Ø
Difference: A − B = A ∩ Ā̄ complement of B, i.e. A − B = A ∩ B̄
(Venn diagrams: A ∪ B, A ∩ B, Ā, A − B)
8
SOLO Set Theory
Set Operations (continue)

Incompatible Sets: A and B are incompatible iff A ∩ B = Ø

Decomposition of a Set:
If A = A1 ∪ A2 ∪ … ∪ An and Ai ∩ Aj = Ø for i ≠ j,
we say that A is decomposed into incompatible sets.
If S = A1 ∪ A2 ∪ … ∪ An and Ai ∩ Aj = Ø for i ≠ j,
we say that the set space S is decomposed into exhaustive and incompatible sets.

De Morgan Law: the complement of A ∪ B is Ā ∩ B̄, and the complement of A ∩ B is Ā ∪ B̄.
To find the complement of a set expression we must interchange ∪ and ∩, and use the complements of the sets.
Augustus De Morgan (1806 – 1871)

Another form of De Morgan Law: the complement of ∪i Ai is ∩i Āi, and the complement of ∩i Ai is ∪i Āi.
Table of Content
9
SOLO Probability
Probability Axiomatic Definition

Pr(A) is the probability of the event A if:
(1) Pr(A) ≥ 0
(2) Pr(S) = 1
(3) If A = A1 ∪ A2 ∪ … ∪ An and Ai ∩ Aj = Ø for i ≠ j,
then Pr(A) = Pr(A1) + Pr(A2) + … + Pr(An)

Probability Geometric Definition

Assume that the probability of an event in a geometric region A is defined as the ratio between the surface of A and the surface of S:
Pr(A) = Surface(A)/Surface(S)
This definition satisfies the three axioms:
(1) Pr(A) ≥ 0, (2) Pr(S) = 1, (3) additivity for incompatible regions.
10
SOLO Probability
From these definitions we can prove the following:

(1') Pr(Ø) = 0
Proof: S = S ∪ Ø and S ∩ Ø = Ø, so by (3): Pr(S) = Pr(S) + Pr(Ø) ⇒ Pr(Ø) = 0

(2') Pr(Ā) = 1 − Pr(A)
Proof: S = A ∪ Ā and A ∩ Ā = Ø, so by (2) and (3): 1 = Pr(S) = Pr(A) + Pr(Ā)

(3') 0 ≤ Pr(A) ≤ 1
Proof: by (2') Pr(A) = 1 − Pr(Ā) ≤ 1, and by (1) Pr(A) ≥ 0

(4') If A ⊂ B then Pr(A) ≤ Pr(B)
Proof: B = A ∪ (B − A) and A ∩ (B − A) = Ø, so by (3): Pr(B) = Pr(A) + Pr(B − A) ≥ Pr(A)

(5') Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)
Proof: A ∪ B = A ∪ (B − A ∩ B) with A ∩ (B − A ∩ B) = Ø, so Pr(A ∪ B) = Pr(A) + Pr(B − A ∩ B);
B = (A ∩ B) ∪ (B − A ∩ B) with (A ∩ B) ∩ (B − A ∩ B) = Ø, so Pr(B − A ∩ B) = Pr(B) − Pr(A ∩ B);
therefore Pr(A ∪ B) = Pr(A) + Pr(B) − Pr(A ∩ B)
Table of Content
11
SOLO Probability
Theorem of Addition

(6') Pr(A1 ∪ A2 ∪ … ∪ An) = Σi Pr(Ai) − Σ_{i<j} Pr(Ai ∩ Aj) + Σ_{i<j<k} Pr(Ai ∩ Aj ∩ Ak) − … + (−1)^(n+1) Pr(A1 ∩ A2 ∩ … ∩ An)

Proof by induction:

For n = 2 we found in (5') that Pr(A1 ∪ A2) = Pr(A1) + Pr(A2) − Pr(A1 ∩ A2), which satisfies the equation.

Assume the equation true for n − 1:
Pr(A1 ∪ … ∪ A(n−1)) = Σ_{i≤n−1} Pr(Ai) − Σ_{i<j≤n−1} Pr(Ai ∩ Aj) + … + (−1)^n Pr(A1 ∩ … ∩ A(n−1))

Let us calculate for n, using (5') with A = A1 ∪ … ∪ A(n−1) and B = An:
Pr(A1 ∪ … ∪ An) = Pr(A1 ∪ … ∪ A(n−1)) + Pr(An) − Pr((A1 ∩ An) ∪ … ∪ (A(n−1) ∩ An))

but the last term is itself a union of the n − 1 events Ai ∩ An, which can be expanded with the (n−1)-term formula:
Pr((A1 ∩ An) ∪ … ∪ (A(n−1) ∩ An)) = Σ_{i≤n−1} Pr(Ai ∩ An) − Σ_{i<j≤n−1} Pr(Ai ∩ Aj ∩ An) + … + (−1)^n Pr(A1 ∩ … ∩ An)
12
SOLO Probability
Theorem of Addition (continue)

(6') Proof by induction (continue):

Substituting both expansions and collecting the intersections of equal order reproduces the n-term formula:
Pr(A1 ∪ … ∪ An) = Σ_{i≤n} Pr(Ai) − Σ_{i<j≤n} Pr(Ai ∩ Aj) + Σ_{i<j<k≤n} Pr(Ai ∩ Aj ∩ Ak) − … + (−1)^(n+1) Pr(A1 ∩ … ∩ An)

Use the fact that the binomial coefficients satisfy
(n−1)!/[k!(n−1−k)!] + (n−1)!/[(k−1)!(n−k)!] = n!/[k!(n−k)!], i.e. C(n−1, k) + C(n−1, k−1) = C(n, k)
to verify that the terms of each order combine with the correct coefficient, and obtain (6').
q.e.d.
Table of Content
13
SOLO Probability
Conditional Probability

Given two events A and B decomposed into elementary events:
A = A1 ∪ A2 ∪ … ∪ An, Ai ∩ Aj = Ø for i ≠ j
B = A'1 ∪ A'2 ∪ … ∪ A'm, A'k ∩ A'l = Ø for k ≠ l
A ∩ B = A1 ∪ A2 ∪ … ∪ Ar, r ≤ m, n (the elementary events common to A and B)
Pr(A) = Pr(A1) + … + Pr(An), Pr(B) = Pr(A'1) + … + Pr(A'm), Pr(A ∩ B) = Pr(A1) + … + Pr(Ar)

We want to find the probability of the event A under the condition that the event B has occurred, designated as Pr(A|B):
Pr(A|B) = [Pr(A1) + … + Pr(Ar)]/[Pr(A'1) + … + Pr(A'm)] = Pr(A ∩ B)/Pr(B)
14
SOLO Probability
Conditional Probability (continue)

Pr(A|B) = Pr(A ∩ B)/Pr(B), Pr(B|A) = Pr(A ∩ B)/Pr(A)

If the events A and B are statistically independent, the fact that B occurred will not affect the probability of A to occur:
Pr(A|B) = Pr(A) ⇒ Pr(A ∩ B) = Pr(A|B) Pr(B) = Pr(B|A) Pr(A) = Pr(A) Pr(B)

Definition:
n events Ai, i = 1, 2, …, n are statistically independent if:
Pr(A_{i1} ∩ … ∩ A_{ir}) = Pr(A_{i1}) ⋯ Pr(A_{ir}) for every choice of r = 2, …, n distinct indices.
Table of Content
15
SOLO Probability
Conditional Probability - Bayes Formula

Using the relation:
Pr(Al ∩ B) = Pr(Al|B) Pr(B) = Pr(B|Al) Pr(Al)
and, for exhaustive and incompatible events A1, …, Am:
B = (A1 ∩ B) ∪ … ∪ (Am ∩ B), with (Ak ∩ B) ∩ (Al ∩ B) = Ø for l ≠ k
Pr(B) = Σ_{k=1}^{m} Pr(Ak ∩ B) = Σ_{k=1}^{m} Pr(B|Ak) Pr(Ak)
we obtain Bayes Formula:
Pr(Al|B) = Pr(B|Al) Pr(Al)/Pr(B) = Pr(B|Al) Pr(Al) / Σ_{k=1}^{m} Pr(B|Ak) Pr(Ak)

Thomas Bayes 1702 - 1761

Table of Content
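As a numerical illustration of Bayes' formula and the total-probability denominator, here is a minimal sketch with a hypothetical two-urn experiment (the urns, their contents, and the event names are assumptions for the example, not from the slides):

```python
# Hypothetical experiment: urn A1 holds 3 white / 1 black balls, urn A2 holds
# 1 white / 3 black; an urn is chosen at random and one ball is drawn.
# B = "the drawn ball is white".

prior = {"A1": 0.5, "A2": 0.5}            # Pr(A_k)
likelihood = {"A1": 0.75, "A2": 0.25}     # Pr(B | A_k)

# Total Probability Theorem: Pr(B) = sum_k Pr(B|A_k) Pr(A_k)
pr_b = sum(likelihood[k] * prior[k] for k in prior)

# Bayes formula: Pr(A_l | B) = Pr(B|A_l) Pr(A_l) / Pr(B)
posterior = {k: likelihood[k] * prior[k] / pr_b for k in prior}

print(pr_b)        # 0.5
print(posterior)   # {'A1': 0.75, 'A2': 0.25}
```

Observing a white ball raises the probability of urn A1 from 0.5 to 0.75, exactly the likelihood-weighted renormalization the formula prescribes.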
16
SOLO Probability
Total Probability Theorem

If S = A1 ∪ A2 ∪ … ∪ An with Ai ∩ Aj = Ø for i ≠ j, we say that the set space S is decomposed into exhaustive and incompatible (exclusive) sets.

The Total Probability Theorem states that for any event B, its probability can be decomposed in terms of conditional probabilities as follows:
Pr(B) = Σ_{i=1}^{n} Pr(Ai ∩ B) = Σ_{i=1}^{n} Pr(B|Ai) Pr(Ai)

Proof: for any event B,
B = B ∩ S = (A1 ∩ B) ∪ … ∪ (An ∩ B), with (Ak ∩ B) ∩ (Al ∩ B) = Ø for k ≠ l
so Pr(B) = Σ_{k=1}^{n} Pr(Ak ∩ B); using the relation Pr(Ak ∩ B) = Pr(B|Ak) Pr(Ak) we obtain the theorem.

Table of Content
17
SOLO Probability
Statistical Independent Events

From the Theorem of Addition, if the events Ai, i = 1, 2, …, n are statistically independent, i.e.
Pr(A_{i1} ∩ … ∩ A_{ir}) = Pr(A_{i1}) ⋯ Pr(A_{ir}), r = 2, …, n
then
Pr(A1 ∪ … ∪ An) = Σi Pr(Ai) − Σ_{i<j} Pr(Ai) Pr(Aj) + Σ_{i<j<k} Pr(Ai) Pr(Aj) Pr(Ak) − … + (−1)^(n+1) Pr(A1) ⋯ Pr(An)
= 1 − Π_{i=1}^{n} [1 − Pr(Ai)]

Therefore
1 − Pr(A1 ∪ … ∪ An) = Π_{i=1}^{n} [1 − Pr(Ai)]

Since the complement of ∪i Ai is ∩i Āi (De Morgan), and Pr(Āi) = 1 − Pr(Ai):
Pr(Ā1 ∩ … ∩ Ān) = 1 − Pr(A1 ∪ … ∪ An) = Π_{i=1}^{n} Pr(Āi)

If the n events Ai, i = 1, 2, …, n are statistically independent, then the complements Āi are also statistically independent.
Table of Content
18
SOLO Probability
Theorem of Multiplication

Pr(A1 ∩ A2 ∩ … ∩ An) = Pr(A1) Pr(A2|A1) Pr(A3|A1 ∩ A2) ⋯ Pr(An|A1 ∩ … ∩ A(n−1))

Proof: start from Pr(A ∩ B) = Pr(B|A) Pr(A). Then
Pr(A1 ∩ … ∩ An) = Pr(An|A1 ∩ … ∩ A(n−1)) Pr(A1 ∩ … ∩ A(n−1))
and, in the same way,
Pr(A1 ∩ … ∩ A(n−1)) = Pr(A(n−1)|A1 ∩ … ∩ A(n−2)) Pr(A1 ∩ … ∩ A(n−2)), …, down to Pr(A1 ∩ A2) = Pr(A2|A1) Pr(A1).
From those results we obtain the theorem. q.e.d.
Table of Content
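The multiplication (chain) rule above can be checked on a small sketch: three draws without replacement from an assumed urn of 5 red and 5 blue balls, with Ai = "draw i is red" (the urn is an illustrative assumption, not from the slides):

```python
# Chain rule: Pr(A1∩A2∩A3) = Pr(A1) Pr(A2|A1) Pr(A3|A1∩A2),
# computed exactly with fractions and compared to a direct count.
from fractions import Fraction

p_a1 = Fraction(5, 10)            # Pr(A1): 5 red of 10 balls
p_a2_given_a1 = Fraction(4, 9)    # Pr(A2|A1): 4 red of 9 remain
p_a3_given_a12 = Fraction(3, 8)   # Pr(A3|A1∩A2): 3 red of 8 remain

chain = p_a1 * p_a2_given_a1 * p_a3_given_a12

# Direct count: ordered all-red triples over all ordered triples
direct = Fraction(5 * 4 * 3, 10 * 9 * 8)

print(chain, direct)              # 1/12 1/12
assert chain == direct
```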
19
SOLO Review of Probability
Random Variables

Assign to each outcome or event a real number, so that we have a one-to-one correspondence between the real numbers and the Space of Events. Any function that assigns a real number to each event in the Space of Events is called a random variable (a random function would be more correct, but this is the accepted terminology).

The random variables can be:
- Discrete random variables, for discrete events
- Continuous random variables, for continuous events
Table of Content
20
SOLO Review of Probability
Probability Distribution and Probability Density Functions

The random variables map the space of events X to the space of real numbers x.

The Probability Distribution Function or Cumulative Probability Distribution Function of x can be defined as:
P_X(x) := Pr{X ≤ x}

The Probability Distribution Function has the following properties:
(1) P_X(x) is a monotonic increasing function
(2) 0 ≤ P_X(x) ≤ 1, with P_X(−∞) = 0 and P_X(+∞) = 1
(3) P_X(x1) ≤ P_X(x2) for x1 ≤ x2

The probability that X lies in the interval (a, b] is given by:
Pr{a < X ≤ b} = P_X(b) − P_X(a) ≥ 0

If P_X(x) is a continuous differentiable function of x we can define
p_X(x) := lim_{Δx→0} Pr{x < X ≤ x + Δx}/Δx = dP_X(x)/dx ≥ 0
the Probability Density Function of x.
21
SOLO Review of Probability
Probability Distribution and Probability Density Functions (continue – 1)
The Probability Distribution and Probability Density Functions of x can be defined also for discrete random variables.
Example: set space of a die - six independent events {x=1}, {x=2}, …, {x=6}, each with probability 1/6.

p_X(x) = (1/6) Σ_{i=1}^{6} δ(x − i)
where δ(x) is the Dirac delta function: δ(x) = 0 for x ≠ 0 and ∫ δ(x) dx = 1.

P_X(k) = Pr{X ≤ k} = ∫_{−∞}^{k+} p_X(x) dx = k/6 for integer k = 0, 1, …, 6
a staircase that rises by 1/6 at each face: 1/6, 1/3, 1/2, 2/3, 5/6, 1.
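The staircase distribution function of the die can be sketched directly from the point masses (exact rationals are used so the steps come out as 1/6, 1/3, …, 1):

```python
# Fair-die example: discrete density 1/6 per face, CDF P_X(k) = k/6.
from fractions import Fraction

density = {i: Fraction(1, 6) for i in range(1, 7)}   # p_X at each face

def cdf(x):
    """P_X(x) = Pr{X <= x}: sum the point masses at faces <= x."""
    return sum(p for i, p in density.items() if i <= x)

print([cdf(k) for k in range(0, 7)])   # 0, 1/6, 1/3, 1/2, 2/3, 5/6, 1
```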
22
SOLO Review of Probability
Probability Distribution and Probability Density Functions (Examples)
(1) Binomial (Bernoulli) Distribution
p(k, n) = n!/[k!(n − k)!] p^k (1 − p)^(n−k), 0 ≤ p ≤ 1, k = 0, 1, …, n

(2) Poisson’s Distribution
p(k) = λ^k exp(−λ)/k!, λ > 0, k = 0, 1, 2, …

(3) Normal (Gaussian) Distribution
p(x; μ, σ) = exp[−(x − μ)²/(2σ²)]/(σ√(2π))

(4) Laplacian Distribution
p(x; μ, b) = (1/(2b)) exp(−|x − μ|/b)
23
SOLO Review of Probability
Probability Distribution and Probability Density Functions (Examples)
(5) Gamma Distribution
p(x; k, θ) = x^(k−1) exp(−x/θ)/[Γ(k) θ^k] for x > 0, 0 otherwise

(6) Beta Distribution
p(x; α, β) = x^(α−1) (1 − x)^(β−1) / ∫_0^1 u^(α−1) (1 − u)^(β−1) du, 0 ≤ x ≤ 1, α, β > 0

(7) Cauchy Distribution
p(x; x0, γ) = (1/π) γ/[(x − x0)² + γ²]
24
SOLO Review of Probability
Probability Distribution and Probability Density Functions (Examples)
(8) Exponential Distribution
p(x; λ) = λ exp(−λx) for x ≥ 0, 0 otherwise

(9) Chi-square Distribution
p(x; k) = x^(k/2−1) exp(−x/2)/[2^(k/2) Γ(k/2)] for x > 0, 0 otherwise
Γ is the gamma function: Γ(a) = ∫_0^∞ t^(a−1) exp(−t) dt

(10) Student’s t-Distribution
p(x; ν) = Γ((ν + 1)/2)/[√(νπ) Γ(ν/2)] (1 + x²/ν)^(−(ν+1)/2)
25
SOLO Review of Probability
Probability Distribution and Probability Density Functions (Examples)
(11) Uniform Distribution (Continuous)
p(x; a, b) = 1/(b − a) for a ≤ x ≤ b, 0 otherwise

(12) Rayleigh Distribution
p(x; σ) = (x/σ²) exp(−x²/(2σ²)), x ≥ 0

(13) Rice Distribution
p(x; ν, σ) = (x/σ²) exp(−(x² + ν²)/(2σ²)) I0(xν/σ²), x ≥ 0
where I0 is the modified Bessel function of the first kind of order zero.
26
SOLO Review of Probability
Probability Distribution and Probability Density Functions (Examples)
(14) Weibull Distribution
p(x; λ, k) = (k/λ)(x/λ)^(k−1) exp[−(x/λ)^k] for x ≥ 0, 0 otherwise; λ, k > 0

Table of Content
27
SOLO Review of Probability
Conditional Probability Distribution and Conditional Probability Density Functions
The random variables map the space of events X to the space of real numbers x.

The Conditional Probability Distribution Function or Cumulative Conditional Probability Distribution Function of x, given y = Y, is defined as:
P_{X/Y}(x/y) := Pr{X ≤ x / Y = y}
(1) 0 ≤ P_{X/Y}(x/y) ≤ 1
(2) P_{X/Y}(x/y) is a monotonic increasing function
(3) P_{X/Y}(x1/y) ≤ P_{X/Y}(x2/y) for x1 ≤ x2

The probability that X lies in the interval (a, b], given Y, is given by:
Pr{a < X ≤ b / Y} = P_{X/Y}(b/y) − P_{X/Y}(a/y) ≥ 0

If P_{X/Y}(x/y) is a continuous differentiable function of x we can define
p_{X/Y}(x/y) := lim_{Δx→0} Pr{x < X ≤ x + Δx / Y}/Δx = dP_{X/Y}(x/y)/dx ≥ 0
the Conditional Probability Density Function of x.
28
SOLO Review of Probability
Conditional Probability Distribution and Conditional Probability Density Functions
Example 1: Given P_X(x) and p_X(x), find P_{X/Y}(x / x ≤ a) and p_{X/Y}(x / x ≤ a):
P_{X/Y}(x / x ≤ a) = P_X(x)/P_X(a) for x ≤ a, 1 for x > a
p_{X/Y}(x / x ≤ a) = p_X(x)/P_X(a) for x ≤ a, 0 for x > a

Example 2: Given P_X(x) and p_X(x), find P_{X/Y}(x / b < x ≤ a) and p_{X/Y}(x / b < x ≤ a):
P_{X/Y}(x / b < x ≤ a) = 0 for x ≤ b, [P_X(x) − P_X(b)]/[P_X(a) − P_X(b)] for b < x ≤ a, 1 for x > a
p_{X/Y}(x / b < x ≤ a) = p_X(x)/[P_X(a) − P_X(b)] for b < x ≤ a, 0 otherwise
Table of Content
29
SOLO Review of Probability
Expected Value or Mathematical Expectation
Given a probability density function p_X(x) we define the Expected Value:

For a continuous random variable: E[x] := ∫ x p_X(x) dx
For a discrete random variable: E[x] := Σ_k x_k p_X(x_k)
For a general function g(x) of the random variable x: E[g(x)] := ∫ g(x) p_X(x) dx

Since ∫ p_X(x) dx = 1:
E[x] = ∫ x p_X(x) dx / ∫ p_X(x) dx
the Expected Value is the centroid of the surface enclosed between the Probability Density Function and the x axis.
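The defining integral can be sketched two ways for a concrete density: by direct quadrature of ∫ x p_X(x) dx and by a Monte Carlo sample mean. The exponential density and the rate λ = 2 are assumptions chosen for the example (E[x] = 1/λ exactly):

```python
# E[x] for p(x) = λ exp(−λx): quadrature vs. Monte Carlo, both ≈ 1/λ = 0.5.
import math
import random

lam = 2.0                      # assumed rate parameter

# Riemann sum of ∫ x λ e^{−λx} dx over [0, 20] (tail beyond 20 is negligible)
dx = 1e-4
e_quad = sum(x * lam * math.exp(-lam * x) * dx
             for x in (i * dx for i in range(200000)))

# Sample mean of draws from the same density
random.seed(0)
e_mc = sum(random.expovariate(lam) for _ in range(100000)) / 100000

print(e_quad, e_mc)            # both close to 0.5
```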
Table of Content
30
SOLO Review of Probability
Variance
Given a probability density function p(x) we define the Variance:
Var(x) := E[(x − E[x])²] = E[x² − 2x E[x] + (E[x])²] = E[x²] − (E[x])²

Moment of order k about the origin: μ'_k := E[x^k]

Central Moment of order k about the mean E(x):
μ_k := E[(x − E[x])^k] = Σ_{j=0}^{k} C(k, j) (−1)^(k−j) E[x^j] (E[x])^(k−j)
Table of Content
31
SOLO Review of Probability
Moments
Normal Distribution: p_X(x; σ) = exp(−x²/(2σ²))/(σ√(2π))

E[x^n] = 1·3·5 ⋯ (n − 1) σ^n for n even, 0 for n odd
i.e. E[x^n] = 0 for n = 2k − 1, and E[x^(2k)] = (2k)!/(2^k k!) σ^(2k) = 1·3 ⋯ (2k − 1) σ^(2k)

Proof: start from ∫_{−∞}^{∞} exp(−a x²) dx = √(π/a) and differentiate k times with respect to a:
∫_{−∞}^{∞} x^(2k) exp(−a x²) dx = [1·3 ⋯ (2k − 1)/2^k] √π a^(−(2k+1)/2)
Substitute a = 1/(2σ²) to obtain E[x^(2k)].

For the absolute odd moments, the substitution y = x²/(2σ²) gives
E[|x|^(2k+1)] = (2/(σ√(2π))) ∫_0^∞ x^(2k+1) exp(−x²/(2σ²)) dx = 2^k k! σ^(2k+1) √(2/π)

Now let us compute:
E[x⁴] = 3σ⁴ = 3 (E[x²])²
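The even-moment formula E[x^(2k)] = 1·3⋯(2k−1) σ^(2k) can be sketched numerically by quadrature against the normal density (σ = 1.5 is an assumption for the example):

```python
# Even moments of N(0, σ²) by numerical integration vs. 1·3·…·(2k−1) σ^(2k).
import math

sigma = 1.5
dx = 1e-3
# integrate over [−12σ, 12σ]; the tail contribution is negligible
grid = [(-12 * sigma) + i * dx for i in range(int(24 * sigma / dx))]

def moment(n):
    return sum(x ** n * math.exp(-x * x / (2 * sigma ** 2)) * dx
               for x in grid) / (sigma * math.sqrt(2 * math.pi))

def double_factorial_odd(k):          # 1·3·…·(2k−1)
    out = 1
    for j in range(1, 2 * k, 2):
        out *= j
    return out

for k in (1, 2, 3):
    exact = double_factorial_odd(k) * sigma ** (2 * k)
    print(2 * k, moment(2 * k), exact)   # E[x²]=σ², E[x⁴]=3σ⁴, E[x⁶]=15σ⁶
```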
32
SOLO Review of Probability
Moments
Gamma Distribution: p(x; k, θ) = x^(k−1) exp(−x/θ)/[Γ(k) θ^k], x > 0
E[x^n] = ∫_0^∞ x^n x^(k−1) exp(−x/θ) dx / [Γ(k) θ^k] = [Γ(k + n)/Γ(k)] θ^n = k(k + 1) ⋯ (k + n − 1) θ^n

Beta Distribution: p(x; α, β) = x^(α−1)(1 − x)^(β−1) / ∫_0^1 u^(α−1)(1 − u)^(β−1) du

Γ is the gamma function: Γ(a) = ∫_0^∞ t^(a−1) exp(−t) dt
33
SOLO Review of Probability
Moments
Uniform Distribution (Continuous) on [−c, c]: p(x; c) = 1/(2c) for −c ≤ x ≤ c, 0 otherwise
E[x^n] = (1/(2c)) ∫_{−c}^{c} x^n dx = c^n/(n + 1) for n even, 0 for n odd

Rayleigh Distribution: p(x; σ) = (x/σ²) exp(−x²/(2σ²)), x ≥ 0
E[x^n] = ∫_0^∞ (x^(n+1)/σ²) exp(−x²/(2σ²)) dx
= 1·3 ⋯ n σ^n √(π/2) for n odd, and 2^k k! σ^(2k) for n = 2k even
34
SOLO Review of Probability
Example
Repeat an experiment m times to obtain X1, X2, …, Xm, where the Xi are uncorrelated, each with mean μ = E[Xi] and variance σ² = E[(Xi − μ)²]; since the experiments are uncorrelated, E[(Xi − μ)(Xj − μ)] = 0 for i ≠ j.

Define:
Statistical Estimation (sample mean): X̄m := (X1 + X2 + … + Xm)/m
Sample Variation: Vm := [(X1 − X̄m)² + (X2 − X̄m)² + … + (Xm − X̄m)²]/m

Then:
E[X̄m] = (E[X1] + … + E[Xm])/m = μ
Var(X̄m) = E[(X̄m − μ)²] = E[((X1 − μ) + … + (Xm − μ))²]/m² = mσ²/m² = σ²/m

Let us compute E[Vm]:
E[(Xi − X̄m)²] = E[((Xi − μ) − (X̄m − μ))²] = σ² − 2 E[(Xi − μ)(X̄m − μ)] + E[(X̄m − μ)²]
E[(Xi − μ)(X̄m − μ)] = (1/m) Σ_j E[(Xi − μ)(Xj − μ)] = σ²/m (only the j = i term survives)
Therefore:
E[(Xi − X̄m)²] = σ² − 2σ²/m + σ²/m = (m − 1)σ²/m
E[Vm] = (1/m) Σ_i E[(Xi − X̄m)²] = (m − 1)σ²/m
so Vm is a biased estimator of σ²; dividing by m − 1 instead of m makes it unbiased.
Table of Content
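The result E[Vm] = (m−1)σ²/m above can be sketched by simulation; uniform variates (σ² = 1/12) and m = 5 are assumptions for the example:

```python
# Empirical mean of the sample variation V_m vs. the theoretical (m−1)σ²/m.
import random

random.seed(1)
m, trials = 5, 200000
acc = 0.0
for _ in range(trials):
    xs = [random.random() for _ in range(m)]       # Var(U(0,1)) = 1/12
    mean = sum(xs) / m
    acc += sum((x - mean) ** 2 for x in xs) / m    # V_m for this run
est = acc / trials

expected = (m - 1) / m * (1 / 12)                  # (m−1)σ²/m
print(est, expected)
```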
36
SOLO Review of Probability
Functions of one Random Variable
Let y = g(x) be a given function of the random variable x, defined on the domain Ω, with probability density p_X(x). We want to find p_Y(y).

Fundamental Theorem
Let x1, x2, …, xn be all the solutions of the equation y = g(x1) = g(x2) = … = g(xn). Then
p_Y(y) = p_X(x1)/|g'(x1)| + p_X(x2)/|g'(x2)| + … + p_X(xn)/|g'(xn)|, where g'(x) := dg(x)/dx

Proof
p_Y(y) dy = Pr{y < Y ≤ y + dy} = Σ_{i=1}^{n} Pr{xi < X ≤ xi + dxi} = Σ_{i=1}^{n} p_X(xi) |dxi| = Σ_{i=1}^{n} [p_X(xi)/|g'(xi)|] dy
q.e.d.
37
SOLO Review of Probability
Functions of one Random Variable (continue – 1)
Example 1: y = a x + b ⇒ p_Y(y) = (1/|a|) p_X((y − b)/a)
Example 2: y = a/x ⇒ p_Y(y) = (a/y²) p_X(a/y)
Example 3: y = a x², a > 0 ⇒ p_Y(y) = [p_X(√(y/a)) + p_X(−√(y/a))]/(2√(a y)) · U(y)
Example 4: y = |x| ⇒ p_Y(y) = [p_X(y) + p_X(−y)] U(y)
where U(y) is the unit step function.
Table of Content
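Example 3 with a = 1 and x standard normal can be sketched numerically: the fundamental theorem gives the chi-square density with one degree of freedom, and its integral over (0, 1] should match the Monte Carlo frequency of X² ≤ 1 (≈ 0.6827):

```python
# p_Y(y) = [p_X(√y) + p_X(−√y)] / (2√y) for Y = X², X ~ N(0,1):
# compare Pr{Y ≤ 1} by quadrature of p_Y with a Monte Carlo estimate.
import math
import random

def p_x(x):                       # standard normal density
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

def p_y(y):                       # density of Y = X² via the fundamental theorem
    r = math.sqrt(y)
    return (p_x(r) + p_x(-r)) / (2 * r)

dy = 1e-5                          # midpoint rule avoids the 1/√y singularity at 0
analytic = sum(p_y((i + 0.5) * dy) * dy for i in range(100000))

random.seed(2)
n = 200000
mc = sum(1 for _ in range(n) if random.gauss(0, 1) ** 2 <= 1) / n

print(analytic, mc)               # both ≈ 0.6827
```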
38
SOLO Review of Probability
Jointly Distributed Random Variables

We are interested in functions of several variables.

The Jointly Cumulative Probability Distribution of the random variables X1, X2, …, Xn is defined as:
P_{X1…Xn}(x1, x2, …, xn) := Pr{X1 ≤ x1, X2 ≤ x2, …, Xn ≤ xn}

The Cumulative Probability Distribution of the random variable Xi can be obtained from it by letting the other arguments go to +∞:
P_{Xi}(xi) = P_{X1…Xn}(+∞, …, +∞, xi, +∞, …, +∞)

If the Jointly Cumulative Probability Distribution is continuous and differentiable in each of the components, we can define the Joint Probability Density Function as:
p_{X1…Xn}(x1, x2, …, xn) := ∂^n P_{X1…Xn}(x1, …, xn)/(∂x1 ∂x2 ⋯ ∂xn)
and the marginal density:
p_{Xi}(xi) = ∫⋯∫ p_{X1…Xn}(x1, …, xn) dx1 ⋯ dx(i−1) dx(i+1) ⋯ dxn
39
SOLO Review of Probability
Jointly Distributed Random Variables (continue – 1)

We define:
E[g(x1, …, xn)] := ∫⋯∫ g(x1, …, xn) p_{X1…Xn}(x1, …, xn) dx1 ⋯ dxn

Example: given the Sum of m Variables Sm := X1 + X2 + … + Xm
E[Sm] = Σ_{i=1}^{m} ∫⋯∫ xi p_{X1…Xn}(x1, …, xn) dx1 ⋯ dxn = Σ_{i=1}^{m} E[Xi]
Var(Sm) = E[(Sm − E[Sm])²] = Σ_{i=1}^{m} Var(Xi) + 2 Σ_{i=1}^{m} Σ_{j=i+1}^{m} Cov(Xi, Xj)
where Cov(Xi, Xj) := E[(Xi − E[Xi])(Xj − E[Xj])].
40
SOLO Review of Probability
Jointly Distributed Random Variables (continue – 2)

Given the joint density function p_{X1…Xn}(x1, …, xn) of n random variables X1, …, Xn, we want to find the joint density function of n random variables Y1, …, Yn that are related to X1, …, Xn through:
Y1 = g1(X1, …, Xn), Y2 = g2(X1, …, Xn), …, Yn = gn(X1, …, Xn)

The differentials are related through the matrix of partial derivatives:
[dY1 … dYn]ᵀ = [∂gi/∂Xj] [dX1 … dXn]ᵀ, i, j = 1, …, n

Assuming that the Jacobian
J(X1, …, Xn) = det[∂gi/∂Xj]
is nonzero for each (X1, …, Xn), there exists a unique solution (Y1, …, Yn).
41
SOLO Review of Probability
Jointly Distributed Random Variables (continue – 3)

Assume that for a given (Y1, …, Yn) we can find k solutions (X1, …, Xn)1, …, (X1, …, Xn)k. Then
p_{Y1…Yn}(y1, …, yn) dy1 ⋯ dyn = Pr{y1 < Y1 ≤ y1 + dy1, …, yn < Yn ≤ yn + dyn}
= Σ_{i=1}^{k} Pr{(x1 < X1 ≤ x1 + dx1, …, xn < Xn ≤ xn + dxn)i} = Σ_{i=1}^{k} p_{X1…Xn}((x1, …, xn)i) |dx1 ⋯ dxn|i

The relation between the differential volume in (Y1, …, Yn) and the differential volume in (X1, …, Xn) is given by
dy1 ⋯ dyn = |J(x1, …, xn)| dx1 ⋯ dxn
Therefore
p_{Y1…Yn}(y1, …, yn) = Σ_{i=1}^{k} p_{X1…Xn}((x1, …, xn)i)/|J((x1, …, xn)i)|
42
SOLO Review of Probability
Jointly Distributed Random Variables (continue – 4)

Example 1: X and Y are independent gamma random variables with parameters (α, λ) and (β, λ) respectively:
p_{X,Y}(x, y) = [λ exp(−λx)(λx)^(α−1)/Γ(α)] [λ exp(−λy)(λy)^(β−1)/Γ(β)], x, y > 0
Compute the joint density of U = X + Y and V = X/(X + Y):
U = g1(X, Y) = X + Y, V = g2(X, Y) = X/(X + Y) ⇒ X = UV, Y = U(1 − V)
J(X, Y) = det[[1, 1], [Y/(X + Y)², −X/(X + Y)²]] = −1/(X + Y), so |J| = 1/u
p_{U,V}(u, v) = p_{X,Y}(uv, u(1 − v))/|J| = [λ exp(−λu)(λu)^(α+β−1)/Γ(α + β)] · [v^(α−1)(1 − v)^(β−1) Γ(α + β)/(Γ(α)Γ(β))]

Therefore U and V are independent, with
p_U(u) = λ exp(−λu)(λu)^(α+β−1)/Γ(α + β) (a gamma distribution)
p_V(v) = v^(α−1)(1 − v)^(β−1) Γ(α + β)/[Γ(α)Γ(β)] (a beta distribution)
Table of Content
43
SOLO Review of Probability
Characteristic Function and Moment-Generating Function
Given a probability density function p_X(x) we define the Characteristic Function or Moment-Generating Function:
Φ_X(ω) := E[exp(jωx)] = ∫ exp(jωx) p_X(x) dx (continuous case)
Φ_X(ω) := Σ_x exp(jωx) p_X(x) (discrete case)

This is in fact the complex conjugate of the Fourier Transform of the Probability Density Function. This function is always defined, since the condition for the existence of a Fourier Transform,
∫ |p_X(x)| dx = ∫ p_X(x) dx = 1 < ∞
is always fulfilled.

Given the Characteristic Function we can find the Probability Density Function p_X(x) using the Inverse Fourier Transform:
p_X(x) = (1/2π) ∫ Φ_X(ω) exp(−jωx) dω
44
SOLO Review of Probability
Properties of Moment-Generating Function
Φ_X(ω) = ∫ exp(jωx) p_X(x) dx
Φ_X(0) = ∫ p_X(x) dx = 1
dΦ_X(ω)/dω = ∫ (jx) exp(jωx) p_X(x) dx ⇒ dΦ_X/dω|_{ω=0} = j E[x]
d²Φ_X(ω)/dω² = ∫ (jx)² exp(jωx) p_X(x) dx ⇒ d²Φ_X/dω²|_{ω=0} = j² E[x²]
⋮
d^nΦ_X(ω)/dω^n = ∫ (jx)^n exp(jωx) p_X(x) dx ⇒ d^nΦ_X/dω^n|_{ω=0} = j^n E[x^n]

This is the reason why Φ_X(ω) is also called the Moment-Generating Function.
45
SOLO Review of Probability
Properties of Moment-Generating Function
Develop Φ_X(ω) = ∫ exp(jωx) p_X(x) dx in a Taylor series about ω = 0:
Φ_X(ω) = Φ_X(0) + (dΦ_X/dω)|₀ ω + (1/2!)(d²Φ_X/dω²)|₀ ω² + …
= 1 + (jω/1!) E[x] + ((jω)²/2!) E[x²] + … + ((jω)^n/n!) E[x^n] + …
46
SOLO Review of Probability
Moment-Generating Function
Binomial Distribution: p(k, n) = n!/[k!(n − k)!] p^k (1 − p)^(n−k)
Φ(ω) = E[exp(jωk)] = Σ_{k=0}^{n} exp(jωk) n!/[k!(n − k)!] p^k (1 − p)^(n−k) = Σ_{k=0}^{n} C(n, k) [p exp(jω)]^k (1 − p)^(n−k) = [p exp(jω) + 1 − p]^n

Poisson Distribution: p(k; λ) = λ^k exp(−λ)/k!, k a nonnegative integer
Φ(ω) = Σ_{k=0}^{∞} exp(jωk) λ^k exp(−λ)/k! = exp(−λ) Σ_{k=0}^{∞} [λ exp(jω)]^k/k! = exp(−λ) exp[λ exp(jω)] = exp[λ(exp(jω) − 1)]

Exponential Distribution: p(x; λ) = λ exp(−λx) for x ≥ 0, 0 otherwise
Φ(ω) = ∫_0^∞ λ exp(jωx) exp(−λx) dx = λ/(λ − jω)
47
SOLO Review of Probability
Moment-Generating Function
Normal Distribution: p(x; μ, σ) = (1/(σ√(2π))) exp(−(x − μ)²/(2σ²))
Φ(ω) = (1/(σ√(2π))) ∫ exp(jωx) exp(−(x − μ)²/(2σ²)) dx

Let us write (completing the square):
−(x − μ)²/(2σ²) + jωx = −(x − μ − jωσ²)²/(2σ²) + jωμ − σ²ω²/2

Therefore
Φ(ω) = exp(jωμ − σ²ω²/2) (1/(σ√(2π))) ∫ exp(−(x − μ − jωσ²)²/(2σ²)) dx = exp(jωμ − ½σ²ω²)
since the remaining Gaussian integral equals 1.
48
SOLO Review of Probability
Properties of Moment-Generating Function
Moment-Generating Function of the Sum of Independent Random Variables

Given the sum of independent random variables Sm := X1 + X2 + … + Xm, independence gives
p_{X1…Xm}(x1, …, xm) = p_{X1}(x1) p_{X2}(x2) ⋯ p_{Xm}(xm)
so
Φ_{Sm}(ω) = E[exp(jω(X1 + … + Xm))] = ∏_{i=1}^{m} ∫ exp(jωxi) p_{Xi}(xi) dxi = Φ_{X1}(ω) Φ_{X2}(ω) ⋯ Φ_{Xm}(ω)

Example 1: Sum of Poisson Independent Random Variables, Sm := X1 + … + Xm with
p_{Xi}(ki) = λi^{ki} exp(−λi)/ki!, Φ_{Xi}(ω) = exp[λi(exp(jω) − 1)], i = 1, 2, …, m
Φ_{Sm}(ω) = exp[(λ1 + λ2 + … + λm)(exp(jω) − 1)]
The Sum of Poisson Independent Random Variables is a Poisson Random Variable with λ_{Sm} = λ1 + λ2 + … + λm.
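Example 1 can be sketched empirically: for a Poisson variable both mean and variance equal λ, so the sum X1 + X2 should show mean ≈ variance ≈ λ1 + λ2 (the sampler below is Knuth's classic product method; the rates 1.5 and 2.5 are assumptions for the example):

```python
# Sum of independent Poisson variables: sample mean and variance ≈ λ1 + λ2.
import math
import random

def poisson(lam, rng):
    """Knuth's method: multiply uniforms until the product drops below e^(−λ)."""
    limit, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= limit:
            return k
        k += 1

rng = random.Random(3)
lam1, lam2, n = 1.5, 2.5, 200000
s = [poisson(lam1, rng) + poisson(lam2, rng) for _ in range(n)]
mean = sum(s) / n
var = sum((x - mean) ** 2 for x in s) / n

print(mean, var)       # both ≈ λ1 + λ2 = 4.0
```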
49
SOLO Review of Probability
Properties of Moment-Generating Function
Example 2: Sum of Normal Independent Random Variables, Sm := X1 + … + Xm with
p_{Xi}(x; μi, σi) = (1/(σi√(2π))) exp(−(x − μi)²/(2σi²)), Φ_{Xi}(ω) = exp(jωμi − ½σi²ω²)
Φ_{Sm}(ω) = exp(jωμ1 − ½σ1²ω²) ⋯ exp(jωμm − ½σm²ω²) = exp[jω(μ1 + … + μm) − ½(σ1² + … + σm²)ω²]

The Sum of Normal Independent Random Variables is a Normal Random Variable with
μ_{Sm} = μ1 + μ2 + … + μm, σ²_{Sm} = σ1² + σ2² + … + σm²
Therefore the Sm probability distribution is:
p_{Sm}(s; μ_{Sm}, σ_{Sm}) = (1/(σ_{Sm}√(2π))) exp(−(s − μ_{Sm})²/(2σ²_{Sm}))
Table of Content
50
SOLO Review of Probability
Existence Theorems
Existence Theorem 1

Given a function G(x) such that
G(−∞) = 0, G(+∞) = 1
G(x1) ≤ G(x2) if x1 < x2 (G(x) is monotonic non-decreasing)
G(xn) → G(x) as xn ↓ x (right-continuity)
we can find an experiment X and a random variable x, defined on X, such that its distribution function P(x) equals the given function G(x).

Proof of Existence Theorem 1
Assume that the outcome of the experiment X is any real number −∞ < x < +∞. We consider as events all intervals, and the intersections or unions of intervals, on the real axis.
To specify the probability of those events we define P(x) = Prob{x ≤ x1} = G(x1). From our definition of G(x) it follows that P(x) is a distribution function.
51
SOLO Review of Probability
Existence Theorems
Existence Theorem 2

If a function F(x, y) is such that
F(−∞, y) = F(x, −∞) = 0, F(+∞, +∞) = 1
F(x2, y2) − F(x1, y2) − F(x2, y1) + F(x1, y1) ≥ 0
for every x1 < x2, y1 < y2, then two random variables x and y can be found such that F(x, y) is their joint distribution function.

Proof of Existence Theorem 2
Assume that the outcome of the experiment X is any real number −∞ < x < +∞, and that the outcome of the experiment Y is any real number −∞ < y < +∞. We consider as events all intervals, and the intersections or unions of intervals, on the real axes x and y.
To specify the probability of those events we define P(x, y) = Prob{x ≤ x1, y ≤ y1} = F(x1, y1). From our definition of F(x, y) it follows that P(x, y) is a joint distribution function.
The proof is similar to that of Existence Theorem 1.
52
SOLO Review of Probability
Histogram
A histogram is a mapping mi that counts the number of observations that fall into various disjoint categories (known as bins); the graph of a histogram is merely one way to represent it.

If we let n be the total number of observations and k be the total number of bins, the histogram mi meets the condition:
n = Σ_{i=1}^{k} mi

A cumulative histogram is a mapping that counts the cumulative number of observations in all of the bins up to the specified bin. That is, the cumulative histogram Mi of a histogram mi is defined as:
Mi = Σ_{j=1}^{i} mj

(Figure: an ordinary and a cumulative histogram of the same data - a random sample of 10,000 points from a normal distribution with mean 0 and standard deviation 1.)
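The two definitions can be sketched directly; the bin count k = 8 and the equal-width binning rule are assumptions for the example, and the data mirrors the figure's N(0, 1) sample of 10,000 points:

```python
# Ordinary and cumulative histograms: n = Σ m_i and M_i = Σ_{j≤i} m_j.
import random

random.seed(4)
data = [random.gauss(0, 1) for _ in range(10000)]   # N(0,1) sample as in the figure

k = 8
lo, hi = min(data), max(data)
width = (hi - lo) / k
m = [0] * k                                         # bin counts m_i
for x in data:
    i = min(int((x - lo) / width), k - 1)           # clamp the maximum into the last bin
    m[i] += 1

M = []                                              # cumulative histogram M_i
total = 0
for count in m:
    total += count
    M.append(total)

print(m)
print(M)
assert sum(m) == len(data)      # n = Σ m_i
assert M[-1] == len(data)       # the last cumulative bin holds all observations
```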
53
SOLO Review of Probability
Law of Large Numbers (History)
The Weak Law of Large Numbers was first proved by the Swiss mathematician James Bernoulli in the fourth part of his work “Ars Conjectandi” published posthumously in 1713.
Jacob Bernoulli1654-1705
The Law of Large Numbers has three versions:• Weak Law of Large Numbers (WLLN)• Strong Law of Large Numbers (SLLN)• Uniform Law of Large Numbers (ULLN)
The French mathematician Siméon Poisson generalizedBernoulli’s theorem around 1800.
Siméon Denis Poisson1781-1840
The next contribution was by Bienaymé and later, in 1866, by Chebyshev; the result is known as the Bienaymé–Chebyshev Inequality.
Pafnuty LvovichChebyshev1821 - 1894
Irénée-Jules Bienaymé1796 - 1878
Weak Law of Large Numbers (WLLN)
54
SOLO Review of Probability
Law of Large Numbers (History - continue)
Francesco Paolo Cantelli
1875-1966
Félix Edouard Justin ĖmileBorel
1871-1956
Andrey Nikolaevich Kolmogorov1903 - 1987
Table of Content
Borel-Cantelli Lemma
55
SOLO Review of Probability
Markov’s Inequality
If X is a random variable which takes only nonnegative values, then for any value a > 0:
Pr{X ≥ a} ≤ E[X]/a

Proof: suppose X is continuous with probability density function p_X(x):
E[X] = ∫_0^∞ x p_X(x) dx = ∫_0^a x p_X(x) dx + ∫_a^∞ x p_X(x) dx ≥ ∫_a^∞ x p_X(x) dx ≥ a ∫_a^∞ p_X(x) dx = a Pr{X ≥ a}
Since a > 0:
Pr{X ≥ a} ≤ E[X]/a

Andrey Andreyevich Markov 1856 - 1922
Table of Content
56
SOLO Review of Probability
Chebyshev’s Inequality
If X is a random variable with mean μ = E(X) and variance σ² = E[(X − μ)²], then for any value k > 0:
Pr{|X − μ| ≥ k} ≤ σ²/k²

Proof: since (X − μ)² is a nonnegative random variable, we can apply Markov's inequality with a = k² to obtain
Pr{(X − μ)² ≥ k²} ≤ E[(X − μ)²]/k² = σ²/k²
But (X − μ)² ≥ k² if and only if |X − μ| ≥ k, so the above is equivalent to
Pr{|X − μ| ≥ k} ≤ σ²/k²

Take kσ instead of k to obtain
Pr{|X − μ| ≥ kσ} ≤ 1/k²

Pafnuty Lvovich Chebyshev 1821 - 1894
Table of Content
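Chebyshev's inequality can be sketched against an empirical tail; the uniform sample (μ = 0.5, σ² = 1/12) and the chosen k values are assumptions for the example (the bound is loose for this distribution, which is the expected behavior):

```python
# Empirical Pr{|X − μ| ≥ kσ} vs. the Chebyshev bound 1/k² for U(0,1) samples.
import math
import random

random.seed(5)
mu, sigma = 0.5, math.sqrt(1 / 12)
n = 100000
xs = [random.random() for _ in range(n)]

results = []
for k in (1.5, 2.0, 3.0):
    frac = sum(1 for x in xs if abs(x - mu) >= k * sigma) / n
    results.append((k, frac, 1 / k ** 2))

for k, frac, bound in results:
    print(k, frac, bound)          # the empirical tail never exceeds the bound
```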
57
SOLO Review of Probability
Bienaymé’s Inequality

If X is a random variable, then for any values a and k, n > 0:
Pr{|X − a| ≥ k} ≤ E[|X − a|^n]/k^n

Proof: let us first prove that if the random variable y takes only nonnegative values, then for any α > 0:
E[y] = ∫_0^∞ y p_Y(y) dy ≥ ∫_α^∞ y p_Y(y) dy ≥ α ∫_α^∞ p_Y(y) dy = α Pr{y ≥ α}
(this is Markov's inequality).
Define y := |X − a|^n ≥ 0 and choose α = k^n > 0:
Pr{|X − a|^n ≥ k^n} ≤ E[|X − a|^n]/k^n
Since |X − a|^n ≥ k^n if and only if |X − a| ≥ k:
Pr{|X − a| ≥ k} ≤ E[|X − a|^n]/k^n

For n = 2 and a = μ we obtain Chebyshev's Inequality. For this reason Chebyshev's Inequality is also known as the Bienaymé - Chebyshev Inequality.

Irénée-Jules Bienaymé 1796 - 1878
Table of Content
58
SOLO Review of Probability
Chernoff’s and Hoeffding’s Bounds

The Markov, Chebyshev and Bienaymé inequalities use only Expectation Value information. Let us try to obtain a tighter bound when the probability distribution function is known.

Start from Markov's Inequality for a nonnegative random variable Z and γ > 0:
Pr{Z ≥ γ} ≤ E[Z]/γ

Now take a random variable Y and define the logarithmic generating function:
Λ_Y(t) := ln E[exp(tY)] if E[exp(tY)] < ∞, +∞ otherwise

Using the fact that exp(x) is a monotonic increasing function, {Y ≥ γ} = {exp(tY) ≥ exp(tγ)} for t > 0, and applying Markov's inequality with Z := exp(tY) and γ := exp(tγ), we obtain:
Pr{Y ≥ γ} = Pr{exp(tY) ≥ exp(tγ)} ≤ E[exp(tY)]/exp(tγ) = exp[Λ_Y(t) − tγ], t > 0
Therefore:
Pr{Y ≥ γ} ≤ inf_{t>0} exp[Λ_Y(t) − tγ] (inf = infimum)

From this inequality, by using different Y, we obtain the Chernoff and Hoeffding bounds. To compute Λ_Y(t) we need to know the distribution function p_Y(y).
Table of Content
59
SOLO Review of Probability
Chernoff’s Bound
Let X1, X2, … be independent Bernoulli random variables with Pr(Xi = 1) = p and Pr(Xi = 0) = 1 − p. Define Y := (X1 + … + Xm)/m.

For one variable:
Λ_{Xi}(t) = ln E[exp(tXi)] = ln[(1 − p) exp(t·0) + p exp(t·1)] = ln[1 − p + p exp(t)]
For Y:
Λ_Y(t) = ln E[exp(t(X1 + … + Xm)/m)] = m ln[1 − p + p exp(t/m)]

Use Pr{Y ≥ γ} ≤ inf_{t>0} exp[Λ_Y(t) − tγ] with γ = p + ε:
inf_{t>0} exp[Λ_Y(t) − t(p + ε)] = exp(−sup_{t>0} {t(p + ε) − m ln[1 − p + p exp(t/m)]})
Setting the derivative with respect to t to zero:
d/dt {t(p + ε) − m ln[1 − p + p exp(t/m)]} = (p + ε) − p exp(t/m)/[1 − p + p exp(t/m)] = 0
⇒ exp(t*/m) = (p + ε)(1 − p)/[p(1 − p − ε)]
and substituting back:
t*(p + ε) − Λ_Y(t*) = m[(p + ε) ln((p + ε)/p) + (1 − p − ε) ln((1 − p − ε)/(1 − p))]
60
SOLO Review of Probability
Chernoff’s Bound (continue – 1)

Therefore
Pr{(X1 + … + Xm)/m − p ≥ ε} ≤ exp[−H(p + ε | p)], 0 < ε < 1 − p
where
H(p + ε | p) := m[(p + ε) ln((p + ε)/p) + (1 − p − ε) ln((1 − p − ε)/(1 − p))], ε ∈ (0, 1 − p)

Properties of H:
H(p | p) = 0 (at ε = 0)
dH/dε = m[ln((p + ε)/p) − ln((1 − p − ε)/(1 − p))], dH/dε|_{ε=0} = 0
d²H/dε² = m[1/(p + ε) + 1/(1 − p − ε)] = m/[(p + ε)(1 − p − ε)] ≥ 4m
since x(1 − x) ≤ 1/4, with the maximum at p + ε = 0.5.
61
SOLO Review of Probability
Chernoff’s Bound (continue – 2)

By a Taylor expansion of H about ε = 0, for some ξ ∈ (0, ε):
H(p + ε | p) = H(p | p) + ε dH/dε|_{ε=0} + (ε²/2) d²H/dε²|_{ξ} ≥ 0 + 0 + (ε²/2)(4m) = 2mε²

From which we arrive at the Chernoff Bound:
Pr{(X1 + … + Xm)/m − p ≥ ε} ≤ exp(−2mε²), ε ∈ (0, 1 − p)
62
SOLO Review of Probability
Chernoff’s Bound (continue – 3)

Define now: Y := 1 − (X1 + … + Xm)/m = (X'1 + … + X'm)/m, where X'i := 1 − Xi is Bernoulli with parameter 1 − p.
Using the Chernoff Bound for the X'i:
Pr{1 − (X1 + … + Xm)/m − (1 − p) ≥ ε} ≤ exp(−2mε²), i.e. Pr{p − (X1 + … + Xm)/m ≥ ε} ≤ exp(−2mε²)
together with:
Pr{(X1 + … + Xm)/m − p ≥ ε} ≤ exp(−2mε²)
By summing those two inequalities we obtain:
Pr{|(X1 + … + Xm)/m − p| ≥ ε} ≤ 2 exp(−2mε²), ε ∈ (0, 1)
Table of Content
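The two-sided Chernoff bound above can be sketched against simulation; p = 0.3, m = 100 and ε = 0.1 are assumptions chosen for the example:

```python
# Empirical Pr{|X̄_m − p| ≥ ε} vs. the Chernoff bound 2 exp(−2mε²) for Bernoulli trials.
import math
import random

random.seed(6)
p, m, eps, trials = 0.3, 100, 0.1, 20000
hits = 0
for _ in range(trials):
    xbar = sum(1 for _ in range(m) if random.random() < p) / m
    if abs(xbar - p) >= eps:
        hits += 1
empirical = hits / trials
bound = 2 * math.exp(-2 * m * eps ** 2)

print(empirical, bound)         # empirical tail well below the bound ≈ 0.271
```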
63
SOLO Review of Probability
Hoeffding’s Bound
Let us start with a simpler problem: suppose that Y is a random variable with a ≤ Y ≤ b almost surely, for some finite a and b, and assume E(Y) = 0.

Define θ := (b − Y)/(b − a) ∈ [0, 1]; then Y = θa + (1 − θ)b.

Since exp(·) is a convex function, for any t ≥ 0 we have:
exp(tY) = exp[θ(ta) + (1 − θ)(tb)] ≤ θ exp(ta) + (1 − θ) exp(tb) = [(b − Y)/(b − a)] exp(ta) + [(Y − a)/(b − a)] exp(tb)

Let us take the expectation of this inequality, using E(Y) = 0, and define p := −a/(b − a) ∈ [0, 1] (note a ≤ 0 ≤ b since E(Y) = 0):
E[exp(tY)] ≤ [b/(b − a)] exp(ta) + [−a/(b − a)] exp(tb) = (1 − p) exp(ta) + p exp(tb) =: exp[φ(u)]
64
SOLO Review of Probability
Hoeffding’s Bound (continue – 1)

E[exp(tY)] ≤ (1 − p) exp(ta) + p exp(tb) = exp[φ(u)]
where u := t(b − a) and (since ta = −pu):
φ(u) := −pu + ln[1 − p + p exp(u)], φ(0) = 0

Differentiating:
dφ/du = −p + p exp(u)/[1 − p + p exp(u)], dφ/du|_{u=0} = 0
d²φ/du² = p(1 − p) exp(u)/[1 − p + p exp(u)]²
Writing s := p exp(u)/[1 − p + p exp(u)] ∈ (0, 1):
d²φ/du² = s(1 − s) ≤ 1/4
65
SOLO Review of Probability
Hoeffding’s Bound (continue – 2)

By a Taylor expansion about u = 0, for some ξ:
φ(u) = φ(0) + u dφ/du|_{u=0} + (u²/2) d²φ/du²|_{ξ} ≤ 0 + 0 + u²/8 = t²(b − a)²/8

This ends the simpler problem: if Y is a random variable with a ≤ Y ≤ b almost surely and E(Y) = 0, then
E[exp(tY)] ≤ exp[t²(b − a)²/8]
66
SOLO Review of Probability
Hoeffding’s Bound (continue – 3)

Generalize the result: suppose X1, X2, …, Xm are independent random variables with ai ≤ Xi ≤ bi for i = 1, 2, …, m. Define Zi := Xi − E(Xi), meaning E(Zi) = 0 and ai − E(Xi) ≤ Zi ≤ bi − E(Xi), an interval of the same width bi − ai; therefore we have
E[exp(tZi)] ≤ exp[t²(bi − ai)²/8]

Use Pr{Y ≥ γ} ≤ E[exp(tY)]/exp(tγ), t > 0, with Y = Z1 + … + Zm; by independence:
Pr{Z1 + … + Zm ≥ γ} ≤ exp(−tγ) E[exp(t(Z1 + … + Zm))] = exp(−tγ) ∏_{i=1}^{m} E[exp(tZi)] ≤ exp[−tγ + (t²/8) Σ_{i=1}^{m} (bi − ai)²]
67
SOLO Review of Probability
Hoeffding’s Bound (continue – 4)

Minimize the exponent over t > 0:
inf_{t>0} [−tγ + (t²/8) Σ_i (bi − ai)²] is attained at t* = 4γ/Σ_i (bi − ai)², giving the value −2γ²/Σ_i (bi − ai)²

Applying the same bound to −(Z1 + … + Zm) and summing, we finally obtain Hoeffding's Bound:
Pr{|Z1 + … + Zm| ≥ γ} ≤ 2 exp[−2γ²/Σ_{i=1}^{m} (bi − ai)²]

Wassily Hoeffding 1914 - 1991
Table of Content
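Hoeffding's bound can be sketched for bounded variables of known ranges; ten uniform variates on (0, 1) and γ = 2.5 are assumptions for the example (as usual for small m the bound is loose, which the comparison shows):

```python
# Empirical Pr{|Σ Z_i| ≥ γ} vs. Hoeffding's bound 2 exp(−2γ²/Σ(b_i−a_i)²),
# with Z_i = X_i − E(X_i) and X_i ~ U(0,1), so a_i = 0, b_i = 1.
import math
import random

random.seed(7)
m, gamma, trials = 10, 2.5, 50000
denom = m * 1.0 ** 2                 # Σ (b_i − a_i)² = 10

hits = 0
for _ in range(trials):
    z = sum(random.random() - 0.5 for _ in range(m))   # centered sum
    if abs(z) >= gamma:
        hits += 1
empirical = hits / trials
bound = 2 * math.exp(-2 * gamma ** 2 / denom)

print(empirical, bound)              # empirical tail ≤ bound ≈ 0.573
```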
68
SOLO Review of Probability
Convergence Concepts
Convergence Almost Everywhere (a.e.) (or with Probability 1, or Strongly)
We say that the sequence Xn converges to X with probability 1 if the set of outcomes x such that
lim_{n→∞} Xn(x) = X(x)
has probability 1, or:
Pr{|Xn − X| < ε} → 1 for n → ∞

Convergence in the Mean-Square sense (m.s.)
We say that the sequence Xn converges to X in the mean-square sense if:
E[(Xn − X)²] → 0 for n → ∞

Convergence in Probability (p) (or Stochastic Convergence, or Convergence in Measure)
We say that the sequence Xn converges to X in the probability sense if, for every ε > 0:
Pr{|Xn − X| > ε} → 0 for n → ∞

Convergence in Distribution (d) (weak convergence)
We say that the sequence Xn converges to X in the distribution sense if:
p_{Xn}(x) → p_X(x) for n → ∞

Implications: (a.e.) implies (p), (m.s.) implies (p), and (p) implies (d); (a.e.) and (m.s.) do not imply each other.
69

Convergence Concepts (continue – 1)

Cauchy Criterion of Convergence

Augustin Louis Cauchy (1789 – 1857)

According to the Cauchy Criterion of Convergence, the sequence Xn converges to a (possibly unknown) limit if

$$|X_{n+m} - X_n| \to 0 \quad \text{as} \quad n \to \infty,\ \text{for any}\ m > 0$$

Convergence Almost Everywhere (a.e.):

$$\Pr\{|X_{n+m} - X_n| < \varepsilon \ \ \forall\, n \ge n(\varepsilon),\ \forall\, m > 0\} \to 1$$

Convergence in the Mean-Square sense (m.s.):

$$E\{|X_{n+m} - X_n|^2\} \to 0 \quad \text{as} \quad n \to \infty,\ \text{for any}\ m > 0$$

Using the Chebyshev Inequality

$$\Pr\{|X_n - X| \ge \varepsilon\} \le \frac{E\{|X_n - X|^2\}}{\varepsilon^2}$$

If Xn → X in the m.s. sense, then the right-hand side, for a given ε, tends to zero, and so does the left-hand side, i.e., we have Convergence in Probability (p):

$$\Pr\{|X_n - X| \ge \varepsilon\} \to 0 \quad \text{as} \quad n \to \infty$$

The opposite is not true: convergence in probability doesn't imply convergence in m.s.
70

The Laws of Large Numbers

The Law of Large Numbers is a fundamental concept in statistics and probability that describes how the average of a randomly selected sample of a large population is likely to be close to the average of the whole population. There are two laws of large numbers: the Weak Law and the Strong Law.

The Weak Law of Large Numbers

The Weak Law of Large Numbers states that if X1, X2, …, Xn, … is an infinite sequence of random variables that have the same expected value μ and variance σ², and are uncorrelated (i.e., the correlation between any two of them is zero), then

$$\bar{X}_n := \frac{X_1 + X_2 + \cdots + X_n}{n}$$

converges in probability (a weak convergence sense) to μ. We have

$$\lim_{n\to\infty} \Pr\{|\bar{X}_n - \mu| < \varepsilon\} = 1 \qquad \text{(converges in probability)}$$

The Strong Law of Large Numbers

The Strong Law of Large Numbers states that if X1, X2, …, Xn, … is an infinite sequence of random variables that have the same expected value μ and variance σ², are uncorrelated, and E (|Xi|) < ∞, then

$$\Pr\left\{\lim_{n\to\infty} \bar{X}_n = \mu\right\} = 1$$

i.e., $\bar{X}_n$ converges almost surely to μ.
71

The Law of Large Numbers

Differences between the Weak Law and the Strong Law

The Weak Law states that, for a specified large n, (X1 + ... + Xn) / n is likely to be near μ. Thus, it leaves open the possibility that | (X1 + ... + Xn) / n − μ | > ε happens an infinite number of times, although at infrequent intervals.

The Strong Law shows that this almost surely will not occur. In particular, it implies that with probability 1, for any positive value ε, the inequality | (X1 + ... + Xn) / n − μ | > ε is true only a finite number of times (as opposed to an infinite, but infrequent, number of times).

Almost sure convergence is also called strong convergence of random variables. This version is called the strong law because random variables which converge strongly (almost surely) are guaranteed to converge weakly (in probability). The strong law implies the weak law.
72

The Law of Large Numbers

Proof of the Weak Law of Large Numbers

Given:

$$E(X_i) = \mu, \qquad Var(X_i) = \sigma^2, \qquad E[(X_i-\mu)(X_j-\mu)] = 0 \quad i \ne j$$

we have:

$$E(\bar{X}_n) = \frac{E(X_1) + \cdots + E(X_n)}{n} = \frac{n\,\mu}{n} = \mu$$

$$Var(\bar{X}_n) = E[(\bar{X}_n-\mu)^2] = \frac{1}{n^2}\, E\left[\left(\sum_{i=1}^{n}(X_i-\mu)\right)^2\right] \overset{E[(X_i-\mu)(X_j-\mu)]=0,\ i\ne j}{=} \frac{1}{n^2}\sum_{i=1}^{n} E[(X_i-\mu)^2] = \frac{n\,\sigma^2}{n^2} = \frac{\sigma^2}{n}$$

Using Chebyshev's inequality on $\bar{X}_n$ we obtain:

$$\Pr\{|\bar{X}_n - \mu| \ge \varepsilon\} \le \frac{\sigma^2}{n\,\varepsilon^2}$$

Using this equation we obtain:

$$\Pr\{|\bar{X}_n - \mu| < \varepsilon\} = 1 - \Pr\{|\bar{X}_n - \mu| \ge \varepsilon\} \ge 1 - \frac{\sigma^2}{n\,\varepsilon^2}$$

As n approaches infinity, the expression approaches 1.  q.e.d.
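The proof above can be illustrated numerically (a sketch, not part of the original slides; the choice of uniform(0,1) summands, ε = 0.05, and the trial counts are arbitrary): the empirical probability that the sample mean deviates from μ by at least ε shrinks with n, and stays below the Chebyshev bound σ²/(nε²).

```python
import random

random.seed(1)
mu, var = 0.5, 1.0 / 12.0          # uniform(0,1): mean 1/2, variance 1/12
eps, trials = 0.05, 2000

def freq_outside(n):
    """Empirical Pr{|sample mean of n uniforms - mu| >= eps}."""
    count = 0
    for _ in range(trials):
        xbar = sum(random.random() for _ in range(n)) / n
        if abs(xbar - mu) >= eps:
            count += 1
    return count / trials

f_small, f_large = freq_outside(10), freq_outside(200)
chebyshev_large = var / (200 * eps ** 2)   # sigma^2 / (n eps^2) bound for n = 200
```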
73

Central Limit Theorem

The first version of this theorem was postulated by the French-born English mathematician Abraham de Moivre in 1733, using the normal distribution to approximate the distribution of the number of heads resulting from many tosses of a fair coin. This was published in 1756 in "The Doctrine of Chances", 3rd ed.

Abraham de Moivre (1667 – 1754)

This finding was forgotten until 1812, when the French mathematician Pierre-Simon Laplace recovered it in his work "Théorie Analytique des Probabilités", in which he approximated the binomial distribution with the normal distribution. This is known as the De Moivre – Laplace Theorem.

Pierre-Simon Laplace (1749 – 1827)

The present form of the Central Limit Theorem was given by the Russian mathematician Alexandr Lyapunov in 1901.

Alexandr Mikhailovich Lyapunov (1857 – 1918)
74

Central Limit Theorem (continue – 1)

Let X1, X2, …, Xm be a sequence of independent random variables with the same probability distribution function pX (x). Define the statistical mean:

$$\bar{X} = \frac{X_1 + X_2 + \cdots + X_m}{m}$$

We have:

$$E(\bar{X}) = \frac{E(X_1) + \cdots + E(X_m)}{m} = \frac{m\,\mu}{m} = \mu$$

$$Var(\bar{X}) = E[(\bar{X}-\mu)^2] = \frac{1}{m^2}\, E\left[\left(\sum_{i=1}^{m}(X_i-\mu)\right)^2\right] = \frac{m\,\sigma^2}{m^2} = \frac{\sigma^2}{m}$$

Define also the new random variable

$$Y := \frac{\bar{X} - E(\bar{X})}{\sigma_{\bar{X}}} = \frac{X_1 + X_2 + \cdots + X_m - m\,\mu}{\sigma\sqrt{m}}$$

The probability distribution of Y tends to become gaussian (normal) as m tends to infinity, regardless of the probability distribution of the random variable, as long as the mean μ and the variance σ² are finite.
75

Central Limit Theorem (continue – 2)

$$Y := \frac{X_1 + X_2 + \cdots + X_m - m\,\mu}{\sigma\sqrt{m}}$$

Proof

The Characteristic Function

$$\Phi_Y(\omega) = E\left[e^{j\omega Y}\right] = E\left[\exp\left(j\omega \sum_{i=1}^{m}\frac{X_i-\mu}{\sigma\sqrt{m}}\right)\right] \overset{\text{independence}}{=} \prod_{i=1}^{m} E\left[\exp\left(j\omega\,\frac{X_i-\mu}{\sigma\sqrt{m}}\right)\right]$$

Develop each factor in a Taylor series:

$$E\left[\exp\left(j\omega\,\frac{X_i-\mu}{\sigma\sqrt{m}}\right)\right] = 1 + \frac{j\omega}{\sigma\sqrt{m}}\underbrace{E(X_i-\mu)}_{0} + \frac{(j\omega)^2}{2!\,\sigma^2 m}\underbrace{E[(X_i-\mu)^2]}_{\sigma^2} + \cdots = 1 - \frac{\omega^2}{2m} + o\!\left(\frac{1}{m}\right), \qquad \lim_{m\to\infty} m\, o\!\left(\frac{1}{m}\right) = 0$$
76

Central Limit Theorem (continue – 3)

Proof (continue – 1)

The Characteristic Function

$$\Phi_Y(\omega) = \left[1 - \frac{\omega^2}{2m} + o\!\left(\frac{1}{m}\right)\right]^m \xrightarrow[m\to\infty]{} \exp\left(-\frac{\omega^2}{2}\right)$$

which is the Characteristic Function of the Normal Distribution (zero mean, unit variance). Therefore

$$p_Y(y) = \frac{1}{2\pi}\int_{-\infty}^{+\infty} \Phi_Y(\omega)\, e^{-j\omega y}\, d\omega \xrightarrow[m\to\infty]{} \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{y^2}{2}\right)$$

The probability distribution of Y tends to become gaussian (normal) as m tends to infinity (Convergence in Distribution).
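A short simulation sketch of this convergence (not part of the original slides; the uniform summands, m = 50, and the trial count are arbitrary choices) checks that the standardized sum has approximately zero mean, unit variance, and median at zero, as the standard normal limit requires:

```python
import math
import random

random.seed(2)
m, trials = 50, 4000
mu, sigma = 0.5, math.sqrt(1.0 / 12.0)    # uniform(0,1) building blocks

# Standardized sums Y = (X_1 + ... + X_m - m mu) / (sigma sqrt(m))
ys = [
    (sum(random.random() for _ in range(m)) - m * mu) / (sigma * math.sqrt(m))
    for _ in range(trials)
]
mean_y = sum(ys) / trials
var_y = sum(y * y for y in ys) / trials
# Pr{Y <= 0} should approach the standard-normal value 0.5
p_le_0 = sum(y <= 0.0 for y in ys) / trials
```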
77

Bernoulli Trials – The Binomial Distribution

Jacob Bernoulli (1654 – 1705)

Probability Density Function

$$p(k, n) = \binom{n}{k} p^k (1-p)^{n-k} = \frac{n!}{k!\,(n-k)!}\; p^k (1-p)^{n-k}$$

Cumulative Distribution Function

$$P(k; n, p) = \sum_{i=0}^{k} \binom{n}{i} p^i (1-p)^{n-i}$$

Mean Value:  E (x) = n p

Variance:  Var (x) = n p (1 – p)

Moment Generating Function

$$\Phi(\omega) = \left[1 - p + p\,e^{j\omega}\right]^n$$
78

Bernoulli Trials – The Binomial Distribution (continue – 1)

Given a random event r = {0, 1}:

p – probability of success (r = 1) of a given discrete trial
q – probability of failure (r = 0) of the given discrete trial,  p + q = 1
n – number of independent trials
p (k, n) – probability of k successes in n independent trials (Bernoulli Trials)

The number of ways of obtaining k successful trials out of n independent trials is given by the number of combinations

$$\binom{n}{k} = \frac{n!}{k!\,(n-k)!}$$

each occurring with probability $p^k (1-p)^{n-k}$. Therefore the probability of k successful trials from n independent trials is given by

$$p(k, n) = \binom{n}{k} p^k (1-p)^{n-k} = \frac{n!}{k!\,(n-k)!}\; p^k (1-p)^{n-k}$$

Using the binomial theorem we obtain

$$(p + q)^n = \sum_{k=0}^{n} \binom{n}{k} p^k (1-p)^{n-k} = 1$$

therefore the previous distribution is called the binomial distribution.
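The pmf just derived can be checked directly (a small sketch, not from the original slides; n and p are arbitrary): it sums to one by the binomial theorem, and its first two moments match the tabulated mean n p and variance n p (1 – p).

```python
import math

def binom_pmf(k, n, p):
    # p(k, n) = C(n, k) p^k (1 - p)^(n - k)
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

n, p = 12, 0.3
pmf = [binom_pmf(k, n, p) for k in range(n + 1)]
total = sum(pmf)                                              # binomial theorem: = 1
mean = sum(k * pmf[k] for k in range(n + 1))                  # = n p
var = sum(k * k * pmf[k] for k in range(n + 1)) - mean ** 2   # = n p (1 - p)
```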
79

Bernoulli Trials – The Binomial Distribution (continue – 2)

$$p(k, n) = \binom{n}{k} p^k (1-p)^{n-k}$$

Mean Value

$$E(X) = \sum_{i=0}^{n} i\;\frac{n!}{i!\,(n-i)!}\; p^i (1-p)^{n-i} = n\,p \sum_{i=1}^{n} \frac{(n-1)!}{(i-1)!\,(n-i)!}\; p^{i-1} (1-p)^{n-i} \overset{k=i-1}{=} n\,p \sum_{k=0}^{n-1} \binom{n-1}{k} p^{k} (1-p)^{n-1-k} = n\,p$$

Moment Generating Function

$$E\left[e^{j\omega X}\right] = \sum_{k=0}^{n} e^{j\omega k}\, \frac{n!}{k!\,(n-k)!}\; p^k (1-p)^{n-k} = \sum_{k=0}^{n} \frac{n!}{k!\,(n-k)!}\, \left(p\,e^{j\omega}\right)^k (1-p)^{n-k} = \left[1 - p + p\,e^{j\omega}\right]^n$$
80

Bernoulli Trials – The Binomial Distribution (continue – 3)

$$E(X^2) = \sum_{i=0}^{n} i^2\,\frac{n!}{i!\,(n-i)!}\; p^i (1-p)^{n-i} = \sum_{i=0}^{n} \left[i(i-1)+i\right]\frac{n!}{i!\,(n-i)!}\; p^i (1-p)^{n-i}$$

$$= n(n-1)\,p^2 \sum_{i=2}^{n} \frac{(n-2)!}{(i-2)!\,(n-i)!}\; p^{i-2}(1-p)^{n-i} + n\,p \sum_{i=1}^{n} \frac{(n-1)!}{(i-1)!\,(n-i)!}\; p^{i-1}(1-p)^{n-i}$$

$$\overset{m=i-2,\ k=i-1}{=} n(n-1)\,p^2 \underbrace{\sum_{m=0}^{n-2}\binom{n-2}{m}p^m(1-p)^{n-2-m}}_{1} + n\,p\, \underbrace{\sum_{k=0}^{n-1}\binom{n-1}{k}p^k(1-p)^{n-1-k}}_{1} = n(n-1)\,p^2 + n\,p$$

Variance

$$Var(X) = E(X^2) - [E(X)]^2 = n(n-1)\,p^2 + n\,p - n^2 p^2 = n\,p\,(1-p)$$
81

Bernoulli Trials – The Binomial Distribution (continue – 4)

Mean Value:  E (x) = n p
Variance:  Var (x) = E {[X – E (X)]²} = n p (1 – p)

Let apply Chebyshev's Inequality:

$$\Pr\{|X - E(X)| \ge k\} \le \frac{E\{[X-E(X)]^2\}}{k^2}$$

we obtain:

$$\Pr\{|X - n\,p| \ge k\} \le \frac{n\,p\,(1-p)}{k^2}$$

An upper bound to this inequality, when p varies (0 ≤ p ≤ 1), can be obtained by taking the derivative of p (1 – p), equating to zero, and solving for p. The result is p = 0.5, i.e. p (1 – p) ≤ 1/4:

$$\Pr\{|X - n\,p| \ge k\} \le \frac{n}{4\,k^2}$$

Taking k = n ε, we see that

$$\Pr\left\{\left|\frac{X}{n} - p\right| \ge \varepsilon\right\} \le \frac{1}{4\,n\,\varepsilon^2} \xrightarrow[n\to\infty]{} 0$$

i.e., X/n (the relative frequency of successes) converges in Probability to p. This is known as Bernoulli's Theorem.
82

Generalized Bernoulli Trials

Consider now r mutually exclusive events A1, A2, …, Ar

$$A_i \cap A_j = \emptyset \qquad i \ne j, \quad i, j = 1, 2, \ldots, r$$

with their sum equal to the certain event S:  A1 ∪ A2 ∪ … ∪ Ar = S

and the probabilities of occurrence  p (A1) = p1, p (A2) = p2, …, p (Ar) = pr.

Therefore  p (A1) + p (A2) + … + p (Ar) = p1 + p2 + … + pr = 1.

We want to find the probability that in n trials we will obtain A1, k1 times, A2, k2 times, and so on, and Ar, kr times, such that k1 + k2 + … + kr = n.

The number of possible combinations of k1 events A1, k2 events A2, …, kr events Ar is

$$\frac{n!}{k_1!\,k_2!\cdots k_r!}$$

and the probability of each combination is $p_1^{k_1} p_2^{k_2} \cdots p_r^{k_r}$.

We obtain the probability of the Generalized Bernoulli Trials as

$$p(k_1, k_2, \ldots, k_r; n) = \frac{n!}{k_1!\,k_2!\cdots k_r!}\; p_1^{k_1} p_2^{k_2} \cdots p_r^{k_r}$$
83

Poisson Asymptotical Development (Law of Rare Events)

Siméon Denis Poisson (1781 – 1840)

Start with the Binomial Distribution

$$p(k, n) = \frac{n!}{k!\,(n-k)!}\; p^k (1-p)^{n-k}$$

We assume that n ≫ 1 and p ≪ 1, with n p moderate and k ≪ n. Then

$$p(0, n) = (1-p)^n = \left(1 - \frac{n\,p}{n}\right)^n \xrightarrow[n\to\infty]{} e^{-n p}$$

$$p(k, n) = \frac{n\,(n-1)\cdots(n-k+1)}{k!}\; p^k (1-p)^{n-k} = \frac{(n p)^k}{k!}\left(1-\frac{1}{n}\right)\cdots\left(1-\frac{k-1}{n}\right)(1-p)^{-k}\,(1-p)^{n} \approx \frac{(n p)^k}{k!}\; p(0, n)$$

so that

$$p(k, n) \xrightarrow[n\to\infty]{} \frac{(n p)^k}{k!}\, e^{-n p}$$

This is the Poisson Asymptotical Development (Law of Rare Events).
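The law of rare events can be verified numerically (an illustrative sketch, not from the original slides; n, p, and the range of k checked are arbitrary, chosen so that n ≫ 1, p ≪ 1, λ = n p = 3): the binomial and Poisson pmfs agree pointwise to high accuracy.

```python
import math

def binom_pmf(k, n, p):
    return math.comb(n, k) * p ** k * (1.0 - p) ** (n - k)

def poisson_pmf(k, lam):
    return math.exp(-lam) * lam ** k / math.factorial(k)

n, p = 2000, 0.0015          # n >> 1, p << 1, lambda = n p = 3
lam = n * p
max_diff = max(abs(binom_pmf(k, n, p) - poisson_pmf(k, lam)) for k in range(20))
```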
84

Poisson Distribution

Probability Density Function

$$p(k; \lambda) = e^{-\lambda}\,\frac{\lambda^k}{k!} \qquad k = 0, 1, 2, \ldots \qquad \lambda > 0$$

Cumulative Distribution Function

$$P(K \le k; \lambda) = \sum_{i=0}^{k} e^{-\lambda}\,\frac{\lambda^i}{i!}$$

Mean Value

$$E(X) = \sum_{i=0}^{\infty} i\; e^{-\lambda}\frac{\lambda^i}{i!} = \lambda\, e^{-\lambda}\sum_{i=1}^{\infty} \frac{\lambda^{i-1}}{(i-1)!} \overset{k=i-1}{=} \lambda\, e^{-\lambda}\sum_{k=0}^{\infty}\frac{\lambda^k}{k!} = \lambda$$

Moment Generating Function

$$E\left[e^{j\omega k}\right] = \sum_{m=0}^{\infty} e^{j\omega m}\; e^{-\lambda}\frac{\lambda^m}{m!} = e^{-\lambda}\sum_{m=0}^{\infty}\frac{\left(\lambda\, e^{j\omega}\right)^m}{m!} = \exp\left[\lambda\left(e^{j\omega}-1\right)\right]$$

Variance

$$E(X^2) = \sum_{i=0}^{\infty} i^2\; e^{-\lambda}\frac{\lambda^i}{i!} = \sum_{i=0}^{\infty} \left[i(i-1)+i\right] e^{-\lambda}\frac{\lambda^i}{i!} = \lambda^2 + \lambda$$

$$Var(x) = E(X^2) - [E(X)]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda$$
85

Poisson Distribution – Approximation to the Gaussian Distribution

Moment Generating Function

$$\Phi(\omega) = \exp\left[\lambda\left(e^{j\omega}-1\right)\right] = \exp\left[\lambda\left(\cos\omega - 1\right)\right]\exp\left(j\lambda\sin\omega\right)$$

For λ sufficiently large, Φ (ω) is negligible for all but very small values of ω, in which case

$$\cos\omega - 1 \approx -\frac{\omega^2}{2}, \qquad \sin\omega \approx \omega
\qquad\Longrightarrow\qquad
\Phi(\omega) \approx \exp\left(j\lambda\omega - \frac{\lambda\,\omega^2}{2}\right)$$

For a normal distribution with mean μ and variance σ² we found the Moment Generating Function:

$$\Phi(\omega) = \exp\left(j\mu\omega - \frac{\sigma^2\omega^2}{2}\right)$$

Therefore the Poisson Distribution can be approximated by a Gaussian Distribution with mean μ = λ and variance σ² = λ:

$$p(k; \lambda) = e^{-\lambda}\frac{\lambda^k}{k!} \sim \frac{1}{\sqrt{2\pi\lambda}}\exp\left[-\frac{(k-\lambda)^2}{2\lambda}\right]$$
86

Poisson Distribution (summary)

Probability Density Function

$$p(k; \lambda) = e^{-\lambda}\,\frac{\lambda^k}{k!} \qquad k = 0, 1, 2, \ldots \qquad \lambda > 0$$

Cumulative Distribution Function

$$P(K \le k; \lambda) = \sum_{i=0}^{k} e^{-\lambda}\,\frac{\lambda^i}{i!} = 1 - \frac{\gamma(k+1, \lambda)}{k!}$$

where $\gamma(a, x) = \int_0^x t^{a-1} e^{-t}\, dt$ is the incomplete gamma function.

Mean Value:  E (x) = λ
Variance:  Var (x) = λ
Moment Generating Function:  $\Phi(\omega) = \exp\left[\lambda\left(e^{j\omega}-1\right)\right]$
87

Normal (Gaussian) Distribution

Karl Friederich Gauss (1777 – 1855)

Probability Density Function

$$p(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left[-\frac{(x-\mu)^2}{2\sigma^2}\right]$$

Cumulative Distribution Function

$$P(x; \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{x}\exp\left[-\frac{(u-\mu)^2}{2\sigma^2}\right] du$$

Mean Value:  E (x) = μ
Variance:  Var (x) = σ²

Moment Generating Function

$$E\left[e^{j\omega x}\right] = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{+\infty} e^{j\omega u}\exp\left[-\frac{(u-\mu)^2}{2\sigma^2}\right] du = \exp\left(j\mu\omega - \frac{\sigma^2\omega^2}{2}\right)$$
88

De Moivre–Laplace Asymptotical Development

Start with the Binomial Distribution

$$p(k, n) = \frac{n!}{k!\,(n-k)!}\; p^k q^{n-k}, \qquad q = 1-p$$

Use Stirling's asymptotical approximation  $n! \approx \sqrt{2\pi n}\; n^n e^{-n}$:

$$p(k, n) \approx \frac{\sqrt{2\pi n}\; n^n e^{-n}}{\sqrt{2\pi k}\; k^k e^{-k}\;\sqrt{2\pi (n-k)}\;(n-k)^{n-k}\, e^{-(n-k)}}\; p^k q^{n-k} = \sqrt{\frac{n}{2\pi\,k\,(n-k)}}\left(\frac{n p}{k}\right)^k \left(\frac{n q}{n-k}\right)^{n-k}$$

Define the deviation from the mean

$$\delta_k := k - n p \qquad\Longrightarrow\qquad k = n p + \delta_k, \quad n - k = n q - \delta_k$$
89

De Moivre–Laplace Asymptotical Development (continue – 1)

With  k = n p + δk,  n – k = n q – δk,  the pre-factor becomes

$$\sqrt{\frac{n}{2\pi\,k\,(n-k)}} = \sqrt{\frac{n}{2\pi\,(n p+\delta_k)(n q-\delta_k)}} \xrightarrow[\ \delta_k \ll n\ ]{} \frac{1}{\sqrt{2\pi\, n\, p\, q}}$$

and

$$p(k, n) \approx \frac{1}{\sqrt{2\pi\, n\, p\, q}}\left(1 + \frac{\delta_k}{n p}\right)^{-(n p+\delta_k)}\left(1 - \frac{\delta_k}{n q}\right)^{-(n q-\delta_k)}$$
90

De Moivre–Laplace Asymptotical Development (continue – 2)

$$p(k, n) \approx \frac{1}{\sqrt{2\pi\, n\, p\, q}}\left(1 + \frac{\delta_k}{n p}\right)^{-(n p+\delta_k)}\left(1 - \frac{\delta_k}{n q}\right)^{-(n q-\delta_k)}$$

Take the logarithm and use  $\ln(1 \pm x) \approx \pm x - x^2/2$:

$$\ln\left[\sqrt{2\pi\, n\, p\, q}\; p(k, n)\right] \approx -(n p+\delta_k)\left(\frac{\delta_k}{n p} - \frac{\delta_k^2}{2\,n^2 p^2}\right) - (n q-\delta_k)\left(-\frac{\delta_k}{n q} - \frac{\delta_k^2}{2\,n^2 q^2}\right) \approx -\frac{\delta_k^2}{2\,n\, p\, q}$$

from which

$$p(k, n) \approx \frac{1}{\sqrt{2\pi\, n\, p\, q}}\exp\left[-\frac{(k - n\,p)^2}{2\,n\, p\, q}\right]$$

This result was first published by De Moivre in 1756 in "The Doctrine of Chances", 3rd ed., and reviewed by Laplace, "Théorie Analytique des Probabilités", 1820.
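The quality of this approximation near the mean can be checked directly (a sketch, not from the original slides; n, p, and the ±10 window around n p are arbitrary choices):

```python
import math

n, p = 400, 0.3
q = 1.0 - p

def binom_pmf(k):
    return math.comb(n, k) * p ** k * q ** (n - k)

def dml_approx(k):
    # De Moivre-Laplace: (2 pi n p q)^(-1/2) exp(-(k - n p)^2 / (2 n p q))
    return math.exp(-(k - n * p) ** 2 / (2 * n * p * q)) / math.sqrt(2 * math.pi * n * p * q)

# compare near the mean, where the approximation is stated to hold
rel_err = max(
    abs(binom_pmf(k) - dml_approx(k)) / binom_pmf(k)
    for k in range(int(n * p) - 10, int(n * p) + 11)
)
```

Within about one standard deviation of the mean the relative error here is a few percent, and it shrinks as n grows.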
91

De Moivre–Laplace Asymptotical Development for Generalized Bernoulli Trials

Consider the r mutually exclusive events A1, A2, …, Ar (Ai ∩ Aj = ∅, i ≠ j, i, j = 1, 2, …, r) with their sum equal to the certain event S, probabilities p (Ai) = pi, and p1 + p2 + … + pr = 1.

The probability that in n trials we will obtain A1, k1 times, A2, k2 times, and so on, and Ar, kr times, such that k1 + k2 + … + kr = n, is

$$p(k_1, k_2, \ldots, k_r; n) = \frac{n!}{k_1!\,k_2!\cdots k_r!}\; p_1^{k_1} p_2^{k_2} \cdots p_r^{k_r}$$

For n going to infinity with  $|k_i - n\,p_i| \ll n\,p_i$,  we have

$$\frac{n!}{k_1!\cdots k_r!}\; p_1^{k_1}\cdots p_r^{k_r} \approx \frac{1}{\sqrt{(2\pi n)^{r-1}\; p_1 p_2 \cdots p_r}}\;\exp\left[-\sum_{i=1}^{r}\frac{(k_i - n\,p_i)^2}{2\,n\,p_i}\right]$$
92

De Moivre–Laplace Asymptotical Development for Generalized Poisson Trials

Consider the r – 1 mutually exclusive events A1, A2, …, Ar−1 (Ai ∩ Aj = ∅, i ≠ j) with small probabilities of occurrence p (Ai) = pi ≪ 1, and define pr := 1 – [p (A1) + p (A2) + … + p (Ar−1)].

The probability that in n trials we will obtain A1, k1 times, A2, k2 times, and so on, and Ar−1, kr−1 times, with kr := n – (k1 + k2 + … + kr−1), is

$$p(k_1, k_2, \ldots, k_r; n) = \frac{n!}{k_1!\,k_2!\cdots k_r!}\; p_1^{k_1} p_2^{k_2} \cdots p_r^{k_r}$$

For n going to infinity

$$\frac{n!}{k_1!\cdots k_r!}\; p_1^{k_1}\cdots p_r^{k_r} \approx \frac{(n\,p_1)^{k_1}\, e^{-n p_1}}{k_1!}\;\cdots\;\frac{(n\,p_{r-1})^{k_{r-1}}\, e^{-n p_{r-1}}}{k_{r-1}!}$$
93

Laplacian Distribution

Pierre-Simon Laplace (1749 – 1827)

Probability Density Function

$$p(x; \mu, b) = \frac{1}{2b}\exp\left(-\frac{|x-\mu|}{b}\right)$$

Cumulative Distribution Function

$$P(x; \mu, b) = \frac{1}{2b}\int_{-\infty}^{x}\exp\left(-\frac{|u-\mu|}{b}\right) du$$

Mean Value:  E (x) = μ
Variance:  Var (x) = 2 b²

Moment Generating Function

$$\Phi_X(\omega) = E\left[e^{j\omega x}\right] = \frac{1}{2b}\int_{-\infty}^{+\infty} e^{j\omega u}\exp\left(-\frac{|u-\mu|}{b}\right) du = \frac{e^{j\mu\omega}}{1 + b^2\omega^2}$$
94

Gamma Distribution

Probability Density Function

$$p(x; k, \theta) = \begin{cases} \dfrac{x^{k-1}\exp(-x/\theta)}{\Gamma(k)\,\theta^k} & x > 0 \\[4pt] 0 & x \le 0 \end{cases}$$

Cumulative Distribution Function

$$P(x; k, \theta) = \begin{cases} \dfrac{\gamma(k,\, x/\theta)}{\Gamma(k)} & x > 0 \\[4pt] 0 & x \le 0 \end{cases}$$

where $\Gamma(a) = \int_0^{\infty} t^{a-1} e^{-t}\, dt$ is the gamma function and $\gamma(a, x) = \int_0^{x} t^{a-1} e^{-t}\, dt$ is the incomplete gamma function.

Mean Value:  E (x) = k θ
Variance:  Var (x) = k θ²

Moment Generating Function

$$\Phi_X(\omega) = E\left[e^{j\omega x}\right] = \left(1 - j\omega\theta\right)^{-k}$$
95

Beta Distribution

Probability Density Function

$$p(x; \alpha, \beta) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{\displaystyle\int_0^1 u^{\alpha-1}(1-u)^{\beta-1}\, du} = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\; x^{\alpha-1}(1-x)^{\beta-1}, \qquad 0 \le x \le 1$$

Cumulative Distribution Function

$$P(x; \alpha, \beta) = \frac{\Gamma(\alpha+\beta)}{\Gamma(\alpha)\,\Gamma(\beta)}\int_0^{x} u^{\alpha-1}(1-u)^{\beta-1}\, du$$

where $\Gamma(a) = \int_0^{\infty} t^{a-1} e^{-t}\, dt$ is the gamma function.

Mean Value:  $E(x) = \dfrac{\alpha}{\alpha+\beta}$

Variance:  $Var(x) = \dfrac{\alpha\,\beta}{(\alpha+\beta)^2\,(\alpha+\beta+1)}$

Moment Generating Function

$$\Phi_X(\omega) = E\left[e^{j\omega x}\right] = 1 + \sum_{k=1}^{\infty}\left(\prod_{r=0}^{k-1}\frac{\alpha+r}{\alpha+\beta+r}\right)\frac{(j\omega)^k}{k!}$$
96

Cauchy Distribution

Augustin Louis Cauchy (1789 – 1857)

Probability Density Function

$$p(x; x_0, \gamma) = \frac{1}{\pi}\;\frac{\gamma}{(x-x_0)^2 + \gamma^2}$$

Cumulative Distribution Function

$$P(x; x_0, \gamma) = \frac{1}{\pi}\arctan\left(\frac{x-x_0}{\gamma}\right) + \frac{1}{2}$$

Mean Value: not defined
Variance: not defined
Moment Generating Function: not defined
97
SOLO Review of ProbabilityCauchy Distribution
elsewere
p
0
2
111
1
Example of Cauchy Distribution DerivationParticle
Trajectory
O
a
y
x
Assume a particle is leaving the origin, moving with constant velocity toward a wall situated at a distance a from the origin. The angle θ, between particle velocity vector and Ox axis, is a random variable uniform distributed between – θ1 and + θ1. Find the probability distribution function of y, the distance from Ox axis at which the particle hits the wall. tanay y
2/1
12/
elsewere
ya
a
elsewere
a
add
pypY
0
2/
0
tan1
2/1
tan
11221
1121
Therefore we obtainFunctions of
One Random Variable
2/1 12/ 2/1 12/
p ypY
Table of Content
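This derivation can be sketched as a simulation (not part of the original slides; a = 2 and the sample count are arbitrary choices): draw θ uniformly on (−π/2, π/2), form y = a tan θ, and compare the empirical CDF with the arctan formula of the Cauchy distribution.

```python
import math
import random

random.seed(3)
a, trials = 2.0, 20000

# y = a tan(theta), theta uniform on (-pi/2, pi/2)  ->  Cauchy with x0 = 0, gamma = a
ys = [a * math.tan(random.uniform(-math.pi / 2, math.pi / 2)) for _ in range(trials)]

def cauchy_cdf(y):
    return math.atan(y / a) / math.pi + 0.5

# empirical CDF vs the arctan formula at a few points
err = max(
    abs(sum(v <= y for v in ys) / trials - cauchy_cdf(y))
    for y in (-2.0, 0.0, 2.0)
)
```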
98

Exponential Distribution

Probability Density Function

$$p(x; \lambda) = \begin{cases} \lambda\, e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases}$$

Cumulative Distribution Function

$$P(x; \lambda) = \int_0^{x} \lambda\, e^{-\lambda u}\, du = \begin{cases} 1 - e^{-\lambda x} & x \ge 0 \\ 0 & x < 0 \end{cases}$$

Mean Value (integration by parts, u = x, dv = λ e−λx dx)

$$E(x) = \int_0^{\infty} x\,\lambda\, e^{-\lambda x}\, dx = \left[-x\, e^{-\lambda x}\right]_0^{\infty} + \int_0^{\infty} e^{-\lambda x}\, dx = \frac{1}{\lambda}$$

Moment Generating Function

$$\Phi_X(\omega) = E\left[e^{j\omega x}\right] = \int_0^{\infty} e^{j\omega x}\,\lambda\, e^{-\lambda x}\, dx = \frac{\lambda}{\lambda - j\omega} = \left(1 - \frac{j\omega}{\lambda}\right)^{-1}$$

Second Moment and Variance

$$E(x^2) = \frac{1}{j^2}\,\frac{d^2\Phi_X}{d\omega^2}\bigg|_{\omega=0} = \frac{2}{\lambda^2}
\qquad\Longrightarrow\qquad
Var(x) = E(x^2) - [E(x)]^2 = \frac{2}{\lambda^2} - \frac{1}{\lambda^2} = \frac{1}{\lambda^2}$$
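The closed-form CDF above makes exponential variates easy to generate by inversion (a sketch, not from the original slides; λ and the sample count are arbitrary): if U is uniform on (0,1), then x = −ln(1 − U)/λ has CDF 1 − e^(−λx), and the sample mean and variance recover 1/λ and 1/λ².

```python
import math
import random

random.seed(4)
lam, trials = 2.0, 50000

# Inverse-CDF sampling: solve P(x) = 1 - exp(-lambda x) = U  for x
xs = [-math.log(1.0 - random.random()) / lam for _ in range(trials)]
mean = sum(xs) / trials                              # ~ 1 / lambda
var = sum((x - mean) ** 2 for x in xs) / trials      # ~ 1 / lambda^2
```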
99

Chi-square Distribution

Probability Density Function

$$p(x; k) = \begin{cases} \dfrac{x^{k/2-1}\exp(-x/2)}{2^{k/2}\,\Gamma(k/2)} & x > 0 \\[4pt] 0 & x \le 0 \end{cases}$$

Cumulative Distribution Function

$$P(x; k) = \begin{cases} \dfrac{\gamma(k/2,\, x/2)}{\Gamma(k/2)} & x > 0 \\[4pt] 0 & x \le 0 \end{cases}$$

where $\Gamma(a) = \int_0^{\infty} t^{a-1} e^{-t}\, dt$ is the gamma function and $\gamma(a, x) = \int_0^{x} t^{a-1} e^{-t}\, dt$ is the incomplete gamma function.

Mean Value:  E (x) = k
Variance:  Var (x) = 2 k

Moment Generating Function

$$\Phi_X(\omega) = E\left[e^{j\omega x}\right] = (1 - 2 j\omega)^{-k/2}$$
100

Derivation of Chi and Chi-square Distributions

Given k normal random independent variables X1, X2, …, Xk with zero mean values and the same variance σ², their joint density is given by

$$p_{X_1\cdots X_k}(x_1, \ldots, x_k) = \prod_{i=1}^{k}\frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{x_i^2}{2\sigma^2}\right) = \frac{1}{(2\pi)^{k/2}\,\sigma^k}\exp\left(-\frac{x_1^2+\cdots+x_k^2}{2\sigma^2}\right)$$

Define

Chi-square:  $y = \chi_k^2 := x_1^2 + \cdots + x_k^2 \ge 0$
Chi:  $\chi_k := \sqrt{x_1^2 + \cdots + x_k^2} \ge 0$

$$\Pr\{\chi \le \chi_k \le \chi + d\chi_k\} = p_{X_1\cdots X_k}\, dV$$

The region in χk space where pX1⋯Xk is constant is a hyper-shell of volume

$$dV = A\,\chi_k^{k-1}\, d\chi_k \qquad (A\ \text{to be determined})$$

(e.g., for k = 3:  dV = 4π χ² dχ). Therefore

$$p_{\chi_k}(\chi_k) = \frac{A\,\chi_k^{k-1}}{(2\pi)^{k/2}\,\sigma^k}\exp\left(-\frac{\chi_k^2}{2\sigma^2}\right)$$
101

Derivation of Chi and Chi-square Distributions (continue – 1)

Change the variable to the Chi-square  $y = \chi_k^2 \ge 0$  (Function of One Random Variable):

$$p_Y(y) = \frac{p_{\chi_k}(\sqrt{y})}{\left|\dfrac{dy}{d\chi_k}\right|} = \frac{p_{\chi_k}(\sqrt{y})}{2\sqrt{y}} = \frac{A}{2\,(2\pi)^{k/2}\,\sigma^k}\; y^{k/2-1}\exp\left(-\frac{y}{2\sigma^2}\right), \qquad y > 0$$

A is determined from the condition $\int_0^\infty p_Y(y)\, dy = 1$. Using

$$\int_0^{\infty} y^{k/2-1}\exp\left(-\frac{y}{2\sigma^2}\right) dy = \Gamma(k/2)\,(2\sigma^2)^{k/2}$$

we obtain  $A = \dfrac{2\,\pi^{k/2}}{\Gamma(k/2)}$, and

$$p_Y(y; k, \sigma) = \frac{y^{k/2-1}}{(2\sigma^2)^{k/2}\,\Gamma(k/2)}\exp\left(-\frac{y}{2\sigma^2}\right) U(y) \qquad \text{(Chi-square)}$$

$$p_{\chi_k}(\chi_k) = \frac{\chi_k^{k-1}}{2^{k/2-1}\,\sigma^k\,\Gamma(k/2)}\exp\left(-\frac{\chi_k^2}{2\sigma^2}\right) U(\chi_k) \qquad \text{(Chi)}$$

where $\Gamma(a) = \int_0^{\infty} t^{a-1} e^{-t}\, dt$ is the gamma function and

$$U(a) := \begin{cases} 1 & a \ge 0 \\ 0 & a < 0 \end{cases}$$
102

Derivation of Chi and Chi-square Distributions (continue – 2)

Chi-square:  $y = \chi_k^2 := x_1^2 + \cdots + x_k^2$, where the xi are gaussian with zero mean and variance σ².

Mean Value

$$E(\chi_k^2) = E(x_1^2) + \cdots + E(x_k^2) = k\,\sigma^2$$

Variance. Using the 4th moment of a zero-mean Gauss Distribution, $E(x_i^4) = 3\,\sigma^4$:

$$E\left[(\chi_k^2)^2\right] = E\left[\left(\sum_{i=1}^k x_i^2\right)^2\right] = \underbrace{\sum_{i=1}^k E(x_i^4)}_{\text{main diagonal: } 3 k \sigma^4} + \underbrace{\sum_{i\ne j} E(x_i^2)\,E(x_j^2)}_{k(k-1)\,\sigma^4} = \left(k^2 + 2k\right)\sigma^4$$

$$Var(\chi_k^2) = E\left[(\chi_k^2)^2\right] - \left[E(\chi_k^2)\right]^2 = (k^2+2k)\,\sigma^4 - k^2\sigma^4 = 2\,k\,\sigma^4$$
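The mean k σ² and variance 2 k σ⁴ just derived can be checked by direct simulation (a sketch, not from the original slides; k, σ, and the trial count are arbitrary choices):

```python
import random

random.seed(5)
k, sigma, trials = 4, 1.5, 20000

# chi-square variable y = x_1^2 + ... + x_k^2, with x_i ~ N(0, sigma^2)
ys = [sum(random.gauss(0.0, sigma) ** 2 for _ in range(k)) for _ in range(trials)]
mean = sum(ys) / trials                              # ~ k sigma^2
var = sum((y - mean) ** 2 for y in ys) / trials      # ~ 2 k sigma^4
```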
103

Derivation of Chi and Chi-square Distributions (continue – 3)

Tail probabilities of the chi-square and normal densities.

The Table presents the points on the chi-square distribution for a given upper-tail probability

$$Q = \Pr\{y > x\}$$

where y = χn² and n is the number of degrees of freedom. This tabulated function is also known as the complementary distribution.

An alternative way of writing the previous equation is:

$$\Pr\{y \le x\} = 1 - Q$$

which indicates that at the left of the point x the probability mass is 1 – Q. This is the 100 (1 – Q) percentile point.

Examples

1. The 95 % probability region for a χ2² variable can be taken as the one-sided probability region (cutting off the 5 % upper tail): [0, 5.99].

2. Or the two-sided probability region (cutting off both 2.5 % tails): [0.05, 7.38].

3. For a χ100² variable, the two-sided 95 % probability region (cutting off both 2.5 % tails) is: [74, 130].
104

Derivation of Chi and Chi-square Distributions (continue – 4)

Note the skewedness of the chi-square distribution: the above two-sided regions are not symmetric about the corresponding means

$$E(\chi_n^2) = n$$

For degrees of freedom above 100, the following approximation of the points on the chi-square distribution can be used:

$$\chi_n^2(1-Q) \approx \frac{1}{2}\left[G(1-Q) + \sqrt{2n-1}\right]^2$$

where G ( ) is given in the last line of the Table and shows the point x on the standard (zero mean and unit variance) Gaussian distribution for the same tail probabilities. In the case Pr {y} = N (y; 0, 1) and with Q = Pr {y > x}, we have x (1 – Q) := G (1 – Q).
105

Student's t-Distribution

It got its name after W. S. Gosset, who wrote under the pseudonym "Student".

William Sealy Gosset (1876 – 1937)

Probability Density Function

$$p(x; \nu) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\;\Gamma\!\left(\frac{\nu}{2}\right)}\left(1 + \frac{x^2}{\nu}\right)^{-(\nu+1)/2}$$

where $\Gamma(a) = \int_0^{\infty} t^{a-1} e^{-t}\, dt$ is the gamma function and ν is the number of degrees of freedom.

Cumulative Distribution Function: obtained by integrating the density; for integer ν it reduces to closed-form expressions in $x/\sqrt{\nu+x^2}$ and $\arctan(x/\sqrt{\nu})$.

Mean Value:  E (x) = 0 for ν > 1, undefined otherwise
Variance:  Var (x) = ν / (ν – 2) for ν > 2, otherwise undefined
Moment Generating Function: not defined
106

Uniform Distribution (Continuous)

Probability Density Function

$$p(x; a, b) = \begin{cases} \dfrac{1}{b-a} & a \le x \le b \\[4pt] 0 & x < a \ \text{or}\ x > b \end{cases}$$

Cumulative Distribution Function

$$P(x; a, b) = \begin{cases} 0 & x < a \\[2pt] \dfrac{x-a}{b-a} & a \le x \le b \\[4pt] 1 & x > b \end{cases}$$

Mean Value:  E (x) = (a + b) / 2
Variance:  Var (x) = (b – a)² / 12

Moment Generating Function

$$E\left[e^{j\omega x}\right] = \frac{e^{j\omega b} - e^{j\omega a}}{j\omega\,(b-a)}$$
107

Rayleigh Distribution

John William Strutt, Lord Rayleigh (1842 – 1919)

Probability Density Function

$$p(x; \sigma) = \frac{x}{\sigma^2}\exp\left(-\frac{x^2}{2\sigma^2}\right), \qquad x \ge 0$$

Cumulative Distribution Function

$$P(x; \sigma) = 1 - \exp\left(-\frac{x^2}{2\sigma^2}\right)$$

Mean Value:  $E(x) = \sigma\sqrt{\pi/2}$
Variance:  $Var(x) = \left(2 - \dfrac{\pi}{2}\right)\sigma^2 = \dfrac{4-\pi}{2}\,\sigma^2$

Moment Generating Function

$$\Phi(\omega) = 1 - \sigma\omega\, e^{-\sigma^2\omega^2/2}\,\sqrt{\frac{\pi}{2}}\left[\mathrm{erfi}\!\left(\frac{\sigma\omega}{\sqrt{2}}\right) - j\right]$$

The Rayleigh Distribution is the chi-distribution with k = 2:

$$p_{\chi_k}(\chi_k) = \frac{\chi_k^{k-1}}{2^{k/2-1}\,\sigma^k\,\Gamma(k/2)}\exp\left(-\frac{\chi_k^2}{2\sigma^2}\right) U(\chi_k)$$
108

Rayleigh Distribution – Example of Rayleigh Distribution

Given X and Y, two independent gaussian random variables, with zero means and the same variances σ²:

$$p_{XY}(x, y) = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{x^2+y^2}{2\sigma^2}\right)$$

find the distributions of R and Θ given by:  $R = \sqrt{X^2+Y^2}, \qquad \Theta = \tan^{-1}(Y/X)$

Solution

With x = r cos θ, y = r sin θ and dx dy = r dr dθ:

$$p_{R\Theta}(r, \theta)\, dr\, d\theta = p_{XY}(x, y)\, dx\, dy = \frac{1}{2\pi\sigma^2}\exp\left(-\frac{r^2}{2\sigma^2}\right) r\, dr\, d\theta = p_R(r)\, dr\; p_\Theta(\theta)\, d\theta$$

where:

$$p_\Theta(\theta) = \frac{1}{2\pi}, \quad 0 \le \theta < 2\pi \qquad \text{(Uniform Distribution)}$$

$$p_R(r) = \frac{r}{\sigma^2}\exp\left(-\frac{r^2}{2\sigma^2}\right), \quad r \ge 0 \qquad \text{(Rayleigh Distribution)}$$
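The construction above can be sketched as a simulation (not part of the original slides; σ and the sample count are arbitrary choices): forming R = √(X² + Y²) from two independent zero-mean Gaussians recovers the Rayleigh mean σ√(π/2) and variance (2 − π/2) σ².

```python
import math
import random

random.seed(6)
sigma, trials = 1.0, 30000

# R = sqrt(X^2 + Y^2) with X, Y independent N(0, sigma^2) is Rayleigh(sigma)
rs = [math.hypot(random.gauss(0, sigma), random.gauss(0, sigma)) for _ in range(trials)]
mean = sum(rs) / trials                              # ~ sigma sqrt(pi/2)
var = sum((r - mean) ** 2 for r in rs) / trials      # ~ (2 - pi/2) sigma^2
```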
109

Rice Distribution

Named after Stephen O. Rice (1907 – 1986).

Probability Density Function

$$p(x; \nu, \sigma) = \frac{x}{\sigma^2}\exp\left(-\frac{x^2+\nu^2}{2\sigma^2}\right) I_0\!\left(\frac{x\,\nu}{\sigma^2}\right), \qquad x \ge 0$$

where:

$$I_0(z) = \frac{1}{2\pi}\int_0^{2\pi} \exp\left(z\cos\theta'\right) d\theta'$$

is the zero-order modified Bessel function of the first kind.

The mean and variance involve confluent hypergeometric (Laguerre) functions of $-\nu^2/2\sigma^2$; for ν = 0 they reduce to the Rayleigh values $E(x) = \sigma\sqrt{\pi/2}$ and $Var(x) = (2-\pi/2)\,\sigma^2$.
110

Rice Distribution – Example of Rice Distribution

The Rice Distribution applies to the statistics of the envelope of the output of a bandpass filter consisting of signal plus noise:

$$s(t) = A\cos(\omega_0 t + \varphi) + n(t) = \left[n_C(t) + A\cos\varphi\right]\cos\omega_0 t - \left[n_S(t) + A\sin\varphi\right]\sin\omega_0 t$$

X = nC (t) and Y = nS (t) are gaussian random variables, with zero mean and the same variances σ², and φ is the unknown but constant signal phase.

Define the output envelope R and phase Θ:

$$R = \sqrt{\left(n_C + A\cos\varphi\right)^2 + \left(n_S + A\sin\varphi\right)^2}, \qquad \Theta = \tan^{-1}\frac{n_S + A\sin\varphi}{n_C + A\cos\varphi}$$

Solution

$$p_{R\Theta}(r, \theta)\, dr\, d\theta = p_{XY}(x, y)\, dx\, dy = \frac{r}{2\pi\sigma^2}\exp\left(-\frac{r^2 + A^2 - 2\,rA\cos(\theta-\varphi)}{2\sigma^2}\right) dr\, d\theta$$

$$p_R(r) = \int_0^{2\pi} p_{R\Theta}(r, \theta)\, d\theta = \frac{r}{\sigma^2}\exp\left(-\frac{r^2+A^2}{2\sigma^2}\right)\frac{1}{2\pi}\int_0^{2\pi}\exp\left(\frac{rA\cos\theta'}{\sigma^2}\right) d\theta'$$
111

Rice Distribution – Example of Rice Distribution (continue – 1)

$$p_R(r) = \frac{r}{\sigma^2}\exp\left(-\frac{r^2+A^2}{2\sigma^2}\right)\underbrace{\frac{1}{2\pi}\int_0^{2\pi}\exp\left(\frac{rA\cos\theta'}{\sigma^2}\right) d\theta'}_{I_0\left(rA/\sigma^2\right)}$$

where I0 is the zero-order modified Bessel function of the first kind, so that

$$p_R(r; A, \sigma) = \frac{r}{\sigma^2}\exp\left(-\frac{r^2+A^2}{2\sigma^2}\right) I_0\!\left(\frac{rA}{\sigma^2}\right) \qquad \text{(Rice Distribution)}$$

Since I0 (0) = 1, if in the Rice Distribution we take A = 0 we obtain:

$$p_R(r; 0, \sigma) = \frac{r}{\sigma^2}\exp\left(-\frac{r^2}{2\sigma^2}\right) \qquad \text{(Rayleigh Distribution)}$$
112

Weibull Distribution

Ernst Hjalmar Waloddi Weibull (1887 – 1979)

Probability Density Function

$$p(x; \lambda, k) = \begin{cases} \dfrac{k}{\lambda}\left(\dfrac{x}{\lambda}\right)^{k-1}\exp\left[-\left(\dfrac{x}{\lambda}\right)^k\right] & x \ge 0 \\[4pt] 0 & x < 0 \end{cases}$$

Cumulative Distribution Function

$$P(x; \lambda, k) = \int_0^{x} p(u; \lambda, k)\, du = 1 - \exp\left[-\left(\frac{x}{\lambda}\right)^k\right]$$

Mean Value:  $E(x) = \lambda\,\Gamma\!\left(1 + \dfrac{1}{k}\right)$

Variance:  $Var(x) = \lambda^2\,\Gamma\!\left(1 + \dfrac{2}{k}\right) - \left[E(x)\right]^2$

where $\Gamma(a) = \int_0^{\infty} t^{a-1} e^{-t}\, dt$ is the gamma function.
113

KINETIC THEORY OF GASES – SOLO

MAXWELL'S VELOCITY DISTRIBUTION

JAMES CLERK MAXWELL (1831 – 1879)

In 1859 Maxwell proposed the following model:

Assume that the velocity components of N molecules, enclosed in a cube with side l, along each of the three coordinate axes are independently and identically distributed according to the density f0 (α) = f0 (–α), i.e.,

$$f_0(\vec{v})\, d^3 v = f_0(v_x)\, f_0(v_y)\, f_0(v_z)\, dv_x\, dv_y\, dv_z = A\exp\left[-B\left(v_x^2+v_y^2+v_z^2\right)\right] dv_x\, dv_y\, dv_z$$

f (vi) d vi = the probability that the i velocity component is between vi and vi + d vi ; i = x, y, z. Maxwell assumed that the distribution depends only on the magnitude of the velocity.
114

MAXWELL'S VELOCITY DISTRIBUTION (CONTINUE)

Let us find the constants A, B and $\vec{v}_0$ in $f_0(\vec{v}) = A\exp\left(-B\,|\vec{v}-\vec{v}_0|^2\right)$.

Since the definition of the total number of particles N is:

$$N = \int d^3 r \int d^3 v\; f(\vec{r}, \vec{v}, t)$$

we have in equilibrium

$$\frac{N}{V} = \int d^3 v\; f_0(\vec{v}) = A\int_{-\infty}^{\infty} e^{-B v_x^2}\, dv_x \int_{-\infty}^{\infty} e^{-B v_y^2}\, dv_y \int_{-\infty}^{\infty} e^{-B v_z^2}\, dv_z = A\left(\frac{\pi}{B}\right)^{3/2}$$

where V is the volume of the container, $V = \int d^3 r$. It follows that B > 0 and

$$A = \frac{N}{V}\left(\frac{B}{\pi}\right)^{3/2}$$
115

MAXWELL'S VELOCITY DISTRIBUTION (CONTINUE)

The average velocity is given by:

$$\langle\vec{v}\rangle = \frac{\int d^3 v\; \vec{v}\, f_0(\vec{v})}{\int d^3 v\; f_0(\vec{v})} = \frac{V A}{N}\int d^3 v\; \left(\vec{v}-\vec{v}_0+\vec{v}_0\right)\exp\left(-B\,|\vec{v}-\vec{v}_0|^2\right) = \vec{v}_0 = 0$$

(the gas as a whole is at rest). The average kinetic energy of the molecules, ε, is then

$$\varepsilon = \frac{\int d^3 v\; \frac{1}{2} m v^2\, f_0(\vec{v})}{\int d^3 v\; f_0(\vec{v})} = \frac{V A\, m}{2 N}\int d^3 v\; v^2 e^{-B v^2} = \frac{3\,m}{4\,B}$$

We found also that for a monoatomic gas  ε = (3/2) k T. Therefore

$$B = \frac{m}{2\,k\,T}, \qquad A = \frac{N}{V}\left(\frac{m}{2\pi\,k\,T}\right)^{3/2}$$
116

MAXWELL'S VELOCITY DISTRIBUTION (CONTINUE)

The Maxwell velocity distribution becomes

$$f_0(\vec{v}) = \frac{N}{V}\left(\frac{m}{2\pi\,k\,T}\right)^{3/2}\exp\left(-\frac{m\,|\vec{v}|^2}{2\,k\,T}\right)$$

or

$$f_0(\vec{v})\, d^3 v = f(v_x) f(v_y) f(v_z)\, dv_x\, dv_y\, dv_z = \frac{N}{V}\left(\frac{m}{2\pi\,k\,T}\right)^{3/2}\exp\left[-\frac{m\left(v_x^2+v_y^2+v_z^2\right)}{2\,k\,T}\right] dv_x\, dv_y\, dv_z$$
117

MAXWELL'S VELOCITY DISTRIBUTION (CONTINUE)

The distribution of the speed v = |v⃗| is obtained by integrating over directions (spherical shell of volume 4π v² dv; M denotes the molecular mass):

$$f_0(v) = \frac{N}{V}\; 4\pi\left(\frac{M}{2\pi\,k\,T}\right)^{3/2} v^2 \exp\left(-\frac{M v^2}{2\,k\,T}\right)$$

Most probable speed:  $v_{mp} = \sqrt{\dfrac{2\,k\,T}{M}}$

Mean speed:  $\langle v\rangle = \sqrt{\dfrac{8\,k\,T}{\pi\,M}}$

Root mean squared speed:  $v_{rms} = \sqrt{\dfrac{3\,k\,T}{M}}$

Maxwell's Distribution is the chi-distribution with k = 3:

$$p_{\chi_k}(\chi_k) = \frac{\chi_k^{k-1}}{2^{k/2-1}\,\sigma^k\,\Gamma(k/2)}\exp\left(-\frac{\chi_k^2}{2\sigma^2}\right) U(\chi_k)$$
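The three characteristic speeds can be checked by sampling (a sketch, not from the original slides; working in units where k T / M = 1 is an arbitrary normalization): the speed of a molecule with three independent Gaussian velocity components reproduces ⟨v⟩ = √(8/π) and v_rms = √3, with the ordering v_mp < ⟨v⟩ < v_rms.

```python
import math
import random

random.seed(7)
kT_over_M, trials = 1.0, 30000      # units where k T / M = 1

# speed v = |v_vec|, with components N(0, k T / M)  -> chi-distribution, k = 3
s = math.sqrt(kT_over_M)
vs = [
    math.sqrt(random.gauss(0, s) ** 2 + random.gauss(0, s) ** 2 + random.gauss(0, s) ** 2)
    for _ in range(trials)
]
v_mean = sum(vs) / trials                                  # ~ sqrt(8 k T / (pi M))
v_rms = math.sqrt(sum(v * v for v in vs) / trials)         # ~ sqrt(3 k T / M)
```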
118

MOLECULAR MODELS

BOLTZMANN STATISTICS (LUDWIG BOLTZMANN)
• Distinguishable particles
• No limit on the number of particles per quantum state.

Number of microstates for a given macrostate:

$$w_{Boltz} = N!\prod_j \frac{g_j^{N_j}}{N_j!}$$

BOSE–EINSTEIN STATISTICS (SATYENDRA NATH BOSE, ALBERT EINSTEIN)
• Indistinguishable particles
• No limit on the number of particles per quantum state.

Number of microstates for a given macrostate:

$$w_{B\text{-}E} = \prod_j \frac{(N_j + g_j - 1)!}{N_j!\,(g_j-1)!}$$

FERMI–DIRAC STATISTICS (ENRICO FERMI, PAUL A. M. DIRAC)
• Indistinguishable particles
• One particle per quantum state.

Number of microstates for a given macrostate:

$$w_{F\text{-}D} = \prod_j \frac{g_j!}{N_j!\,(g_j - N_j)!}$$

Constraints:

$$\sum_j N_j = N, \qquad \sum_j \varepsilon'_j N_j = E$$
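The two combinatorial counts for indistinguishable particles can be verified by brute-force enumeration of occupation vectors (a small sketch, not from the original slides; the single-level case with N = 3 particles and g = 4 states is an arbitrary choice):

```python
import math
from itertools import product

def w_bose_einstein(N, g):
    # (N + g - 1)! / (N! (g - 1)!)  -- indistinguishable, unlimited occupancy
    return math.comb(N + g - 1, N)

def w_fermi_dirac(N, g):
    # g! / (N! (g - N)!)            -- indistinguishable, at most one per state
    return math.comb(g, N)

def brute_force(N, g, fermi):
    # enumerate occupation vectors (n_1, ..., n_g) with sum N
    count = 0
    for occ in product(range(N + 1), repeat=g):
        if sum(occ) == N and (not fermi or max(occ) <= 1):
            count += 1
    return count

N, g = 3, 4
ok_be = w_bose_einstein(N, g) == brute_force(N, g, fermi=False)
ok_fd = w_fermi_dirac(N, g) == brute_force(N, g, fermi=True)
```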
119

MOLECULAR MODELS – BOLTZMANN STATISTICS

• Distinguishable particles
• No limit on the number of particles per quantum state.

Mass (N) fixed, Volume (V) fixed, Energy (E) fixed:

$$\sum_j N_j = N, \qquad \sum_j \varepsilon'_j N_j = E$$

A macrostate is defined by
– quantum states g1, g2, …, gj at the energy levels ε'1, ε'2, …, ε'j
– number of particles N1, N2, …, Nj in states g1, g2, …, gj

The number of ways N distinguishable particles can be divided in groups with N1, N2, …, Nj, … particles is

$$\frac{N!}{\prod_j N_j!}, \qquad \sum_j N_j = N$$

The number of ways Nj particles can be placed in the gj states is $g_j^{N_j}$. Hence the number of microstates for a given macrostate is

$$w_{Boltz} = N!\prod_j \frac{g_j^{N_j}}{N_j!}$$
120

THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE

BOLTZMANN STATISTICS

$$w_{Boltz} = N!\prod_j \frac{g_j^{N_j}}{N_j!}$$

Using the Stirling formula  $\ln a! \approx a\ln a - a$:

$$\ln w = \ln N! + \sum_j N_j \ln g_j - \sum_j \ln N_j! \overset{\text{Stirling}}{\approx} N\ln N - N + \sum_j N_j \ln g_j - \sum_j \left(N_j\ln N_j - N_j\right)$$

To calculate the most probable macrostate we must compute the differential

$$d(\ln w) = \sum_j \left(\ln g_j - \ln N_j\right) dN_j = 0$$

constrained by:

$$dN = \sum_j dN_j = 0, \qquad dE = \sum_j \varepsilon'_j\, dN_j = 0$$
121

THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE

BOLTZMANN STATISTICS (CONTINUE)

We obtain

$$d(\ln w) = \sum_j \ln\frac{g_j}{N_j}\, dN_j = 0, \qquad \sum_j dN_j = 0, \qquad \sum_j \varepsilon'_j\, dN_j = 0$$

Let adjoin the constraints using the Lagrange multipliers α, β:

$$\sum_j \left(\ln\frac{g_j}{N_j} - \alpha - \beta\,\varepsilon'_j\right) dN_j = 0
\qquad\Longrightarrow\qquad
\ln\frac{g_j}{N_j^*} = \alpha + \beta\,\varepsilon'_j$$

or

$$N_j^*\big|_{Boltz} = g_j\; e^{-\alpha}\, e^{-\beta\,\varepsilon'_j} \qquad \text{(Boltzmann most probable macrostate)}$$
122

MOLECULAR MODELS – BOSE–EINSTEIN STATISTICS

• Indistinguishable particles
• No limit on the number of particles per quantum state.

Satyendra Nath Bose (1894 – 1974), Albert Einstein (1879 – 1955)

Mass (N) fixed, Volume (V) fixed, Energy (E) fixed:

$$\sum_j N_j = N, \qquad \sum_j \varepsilon'_j N_j = E$$

A macrostate is defined by
– quantum states g1, g2, …, gj at the energy levels ε'1, ε'2, …, ε'j
– number of particles N1, N2, …, Nj in states g1, g2, …, gj

The number of ways Nj indistinguishable particles can be placed in the gj states is

$$\frac{(N_j + g_j - 1)!}{N_j!\,(g_j-1)!}$$

Hence the number of microstates for a given macrostate is

$$w_{B\text{-}E} = \prod_j \frac{(N_j + g_j - 1)!}{N_j!\,(g_j-1)!}$$
123

THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE

BOSE–EINSTEIN STATISTICS (CONTINUE)

$$w_{B\text{-}E} = \prod_j \frac{(N_j + g_j - 1)!}{N_j!\,(g_j-1)!} \approx \prod_j \frac{(N_j + g_j)!}{N_j!\,g_j!}$$

Using the Stirling formula  $\ln a! \approx a\ln a - a$:

$$\ln w \approx \sum_j \left[(N_j+g_j)\ln(N_j+g_j) - N_j\ln N_j - g_j\ln g_j\right]$$

To calculate the most probable macrostate we must compute the differential

$$d(\ln w) = \sum_j \ln\frac{N_j+g_j}{N_j}\, dN_j = \sum_j \ln\left(1 + \frac{g_j}{N_j}\right) dN_j = 0$$
KINETIC THEORY OF GASESSOLOTHE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE
BOSE-EINSTEIN STATISTICS (CONTINUE)
0' j
jj Nd0j
jNd
WE OBTAIN
LET ADJOIN THE CONSTRAINTS USING THE LAGRANGE MULTIPLIERS
0'*
1ln0'1ln
j
j
j
jjj
j
j
N
gNd
N
g
,
TO OBTAIN
OR
1* '
jee
gN j
EBj BOSE-EINSTEIN
MOST PROBABLE MACROSTATE
j jj
jj
j jj
jjEB Ng
Ng
Ng
Ngw
!!
!
!!1
!1
01lnln
jj
j
j NdN
gwd
Table of Content
125

MOLECULAR MODELS – FERMI–DIRAC STATISTICS

• Indistinguishable particles
• One particle per quantum state.

Enrico Fermi (1901 – 1954), Paul A. M. Dirac (1902 – 1984)

Mass (N) fixed, Volume (V) fixed, Energy (E) fixed:

$$\sum_j N_j = N, \qquad \sum_j \varepsilon'_j N_j = E$$

A macrostate is defined by
– quantum states g1, g2, …, gj at the energy levels ε'1, ε'2, …, ε'j
– number of particles N1, N2, …, Nj at the energy levels, in states g1, g2, …, gj

The number of ways Nj indistinguishable particles can be placed in the gj states (at most one per state) is

$$\frac{g_j!}{N_j!\,(g_j - N_j)!}$$

Hence the number of microstates for a given macrostate is

$$w_{F\text{-}D} = \prod_j \frac{g_j!}{N_j!\,(g_j - N_j)!}$$
126

THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE

FERMI–DIRAC STATISTICS (CONTINUE)

$$w_{F\text{-}D} = \prod_j \frac{g_j!}{N_j!\,(g_j - N_j)!}$$

Using the Stirling formula  $\ln a! \approx a\ln a - a$:

$$\ln w \approx \sum_j \left[g_j\ln g_j - N_j\ln N_j - (g_j-N_j)\ln(g_j-N_j)\right]$$

To calculate the most probable macrostate we must compute the differential

$$d(\ln w) = \sum_j \ln\frac{g_j - N_j}{N_j}\, dN_j = 0$$
127

THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE

FERMI–DIRAC STATISTICS (CONTINUE)

$$d(\ln w) = \sum_j \ln\frac{g_j - N_j}{N_j}\, dN_j = 0, \qquad \sum_j dN_j = 0, \qquad \sum_j \varepsilon'_j\, dN_j = 0$$

Let adjoin the constraints using the Lagrange multipliers α, β:

$$\sum_j \left(\ln\frac{g_j - N_j}{N_j} - \alpha - \beta\,\varepsilon'_j\right) dN_j = 0
\qquad\Longrightarrow\qquad
\ln\frac{g_j - N_j^*}{N_j^*} = \alpha + \beta\,\varepsilon'_j$$

or

$$N_j^*\big|_{F\text{-}D} = \frac{g_j}{e^{\alpha}\, e^{\beta\,\varepsilon'_j} + 1} \qquad \text{(Fermi–Dirac most probable macrostate)}$$
128

THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE

$$w_{F\text{-}D} = \prod_j \frac{g_j!}{N_j!\,(g_j - N_j)!}, \qquad
w_{B\text{-}E} = \prod_j \frac{(N_j + g_j - 1)!}{N_j!\,(g_j-1)!}, \qquad
w_{Boltz} = N!\prod_j \frac{g_j^{N_j}}{N_j!}$$

For gases at low pressures or high temperature the number of quantum states gj available at any level is much larger than the number of particles in that level Nj:  gj ≫ Nj. Then

$$\frac{(N_j + g_j - 1)!}{(g_j - 1)!} = g_j\,(g_j+1)\cdots(g_j + N_j - 1) \overset{g_j \gg N_j}{\approx} g_j^{N_j}$$

$$\frac{g_j!}{(g_j - N_j)!} = g_j\,(g_j-1)\cdots(g_j - N_j + 1) \overset{g_j \gg N_j}{\approx} g_j^{N_j}$$

so that

$$w_{B\text{-}E}\big|_{g_j \gg N_j} \approx w_{F\text{-}D}\big|_{g_j \gg N_j} \approx \prod_j \frac{g_j^{N_j}}{N_j!} = \frac{w_{Boltz}}{N!}$$

and

$$N_j^*\big|_{B\text{-}E} \approx N_j^*\big|_{F\text{-}D} \approx N_j^*\big|_{Boltz} = g_j\; e^{-\alpha}\, e^{-\beta\,\varepsilon'_j}$$
129

THE MOST PROBABLE MACROSTATE – THE THERMODYNAMIC EQUILIBRIUM STATE

$$w_{B\text{-}E}\big|_{g_j \gg N_j} \approx w_{F\text{-}D}\big|_{g_j \gg N_j} \approx \prod_j \frac{g_j^{N_j}}{N_j!} = \frac{w_{Boltz}}{N!},
\qquad
N_j^*\big|_{B\text{-}E} \approx N_j^*\big|_{F\text{-}D} \approx N_j^*\big|_{Boltz} = g_j\; e^{-\alpha}\, e^{-\beta\,\varepsilon'_j}$$

Dividing the value of w for Boltzmann statistics, which assumed distinguishable particles, by N! has the effect of discounting the distinguishability of the N particles.
130

Monte Carlo Method

Monte Carlo methods are a class of computational algorithms that rely on repeated random sampling to compute their results. Monte Carlo methods are often used when simulating physical and mathematical systems. Because of their reliance on repeated computation and random or pseudo-random numbers, Monte Carlo methods are most suited to calculation by a computer. Monte Carlo methods tend to be used when it is infeasible or impossible to compute an exact result with a deterministic algorithm.

The term Monte Carlo method was coined in the 1940s by the physicists Stanislaw Ulam, Enrico Fermi, John von Neumann, and Nicholas Metropolis, working on nuclear weapon projects in the Los Alamos National Laboratory (a reference to the Monte Carlo Casino in Monaco, where Ulam's uncle would borrow money to gamble).

Stanislaw Ulam (1909 – 1984), Enrico Fermi (1901 – 1954), John von Neumann (1903 – 1957), Nicholas Constantine Metropolis (1915 – 1999)
131

Monte Carlo Approximation

Monte Carlo runs generate a set of random samples x(L), L = 1, …, P, drawn from the distribution p (x):

$$x^{(L)} \sim p(x)$$

So, with P samples, expectations with respect to the distribution are approximated by

$$\int f(x)\, p(x)\, dx \approx \frac{1}{P}\sum_{L=1}^{P} f\!\left(x^{(L)}\right)$$

and, in the usual way for Monte Carlo, one can obtain all the moments etc. of the distribution up to some degree of approximation:

$$E(x) = \int x\, p(x)\, dx \approx \frac{1}{P}\sum_{L=1}^{P} x^{(L)}, \qquad
E(x^n) = \int x^n\, p(x)\, dx \approx \frac{1}{P}\sum_{L=1}^{P} \left(x^{(L)}\right)^n$$
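The approximation above can be sketched in a few lines (not part of the original slides; taking p (x) uniform on (0, 1), f (x) = x², and P = 200 000 are arbitrary choices, with exact answer ∫₀¹ x² dx = 1/3):

```python
import random

random.seed(8)
P = 200000

# estimate E(x^2) = integral of x^2 p(x) dx for p uniform on (0, 1); exact value 1/3
samples = [random.random() for _ in range(P)]   # x^(L) ~ p(x)
estimate = sum(x * x for x in samples) / P      # (1/P) sum f(x^(L))
```

The error of such an estimate shrinks like 1/√P, independent of the dimension of the integral, which is the practical appeal of the method.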
132

Estimation of the Mean and Variance of a Random Variable (Unknown Statistics)

A random variable, x, may take on any values in the range – ∞ to + ∞. Based on a sample of k values, xi, i = 1, 2, …, k, we wish to compute the sample mean, m̂k, and sample variance, σ̂k², as estimates of the population mean, m, and variance, σ². Monte Carlo simulations assume independent and identically distributed (i.i.d.) samples:

$$E(x_i) = m, \qquad E[(x_i-m)(x_j-m)] = \begin{cases}\sigma^2 & i = j\\ 0 & i \ne j \ (\text{independent})\end{cases}$$

Define the estimation of the population mean:

$$\hat{m}_k := \frac{1}{k}\sum_{i=1}^{k} x_i
\qquad\Longrightarrow\qquad
E(\hat{m}_k) = \frac{1}{k}\sum_{i=1}^{k} E(x_i) = m \qquad \text{(Unbiased)}$$

Compute, using the identity $\frac{1}{k}\sum_i (x_i-\hat{m}_k)^2 = \frac{1}{k}\sum_i (x_i-m)^2 - (\hat{m}_k-m)^2$:

$$E\left[\frac{1}{k}\sum_{i=1}^{k}(x_i - \hat{m}_k)^2\right] = E\left[\frac{1}{k}\sum_{i=1}^{k}(x_i-m)^2\right] - E\left[(\hat{m}_k - m)^2\right] = \sigma^2 - \frac{\sigma^2}{k} = \frac{k-1}{k}\,\sigma^2 \qquad \text{(Biased)}$$
133
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 1)

Since the estimator above satisfies

$$E\left\{\frac{1}{k}\sum_{i=1}^{k}\left(x_i - \hat{m}_k\right)^2\right\} = \frac{k-1}{k}\,\sigma^2 \qquad \text{(Biased)}$$

the unbiased estimation of the sample variance of the population is defined as:

$$\hat{\sigma}_k^2 := \frac{1}{k-1}\sum_{i=1}^{k}\left(x_i - \hat{m}_k\right)^2 \qquad\Longrightarrow\qquad E\{\hat{\sigma}_k^2\} = \sigma^2 \qquad \text{(Unbiased)}$$

Monte Carlo simulations assume independent and identically distributed (i.i.d.) samples.
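A quick numerical check of the bias result (a sketch: the Gaussian population, its variance and the sample size are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)
sigma2 = 4.0    # true population variance
k = 10          # small sample size, so the bias factor (k-1)/k = 0.9 is visible
trials = 200_000

samples = rng.normal(0.0, np.sqrt(sigma2), size=(trials, k))
m_hat = samples.mean(axis=1)

# Biased estimator: divide by k; its expectation is ((k-1)/k) * sigma^2 = 3.6.
biased = ((samples - m_hat[:, None])**2).sum(axis=1) / k
# Unbiased estimator: divide by k-1; its expectation is sigma^2 = 4.0.
unbiased = ((samples - m_hat[:, None])**2).sum(axis=1) / (k - 1)
```

Averaging `biased` and `unbiased` over the trials reproduces the two expectations derived above.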
134
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 2)

A random variable, x, may take on any values in the range - ∞ to + ∞. Based on a sample of k values, xi, i = 1,2,…,k, we wish to compute the sample mean, m̂k, and sample variance, σ̂k², as estimates of the population mean, m, and variance, σ².

[Figure: the samples xi, i = 1, 2, 3, …, k, scattered about the population mean m]

$$\hat{m}_k := \frac{1}{k}\sum_{i=1}^{k} x_i, \qquad E\{\hat{m}_k\} = m, \qquad \hat{\sigma}_k^2 := \frac{1}{k-1}\sum_{i=1}^{k}\left(x_i - \hat{m}_k\right)^2, \qquad E\{\hat{\sigma}_k^2\} = \sigma^2$$

Monte Carlo simulations assume independent and identically distributed (i.i.d.) samples.
135
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 3)

We found:

$$E\{\hat{m}_k\} = E\left\{\frac{1}{k}\sum_{i=1}^{k} x_i\right\} = m, \qquad E\{\hat{\sigma}_k^2\} = E\left\{\frac{1}{k-1}\sum_{i=1}^{k}\left(x_i - \hat{m}_k\right)^2\right\} = \sigma^2$$

Let us compute the variance of the mean estimator, σ²_{m̂k} := E{(m̂k - m)²}:

$$\sigma_{\hat{m}_k}^2 := E\left\{\left(\hat{m}_k - m\right)^2\right\} = E\left\{\left[\frac{1}{k}\sum_{i=1}^{k}\left(x_i - m\right)\right]^2\right\} = \frac{1}{k^2}\sum_{i=1}^{k} E\left\{\left(x_i - m\right)^2\right\} + \frac{1}{k^2}\sum_{i=1}^{k}\sum_{\substack{j=1\\ j\ne i}}^{k}\underbrace{E\left\{\left(x_i - m\right)\left(x_j - m\right)\right\}}_{0} = \frac{\sigma^2}{k}$$
136
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 4)

Let us compute the variance of the variance estimator, σ²_{σ̂k²} := E{(σ̂k² - σ²)²} = E{σ̂k⁴} - σ⁴.

Using m̂k - m = (1/k) Σ (xi - m), write

$$\hat{\sigma}_k^2 = \frac{1}{k-1}\sum_{i=1}^{k}\left(x_i - \hat{m}_k\right)^2 = \frac{1}{k-1}\left[\sum_{i=1}^{k}\left(x_i - m\right)^2 - k\left(\hat{m}_k - m\right)^2\right]$$

Squaring and taking expectations:

$$E\{\hat{\sigma}_k^4\} = \frac{1}{(k-1)^2}\,E\left\{\left[\sum_{i=1}^{k}\left(x_i - m\right)^2\right]^2 - 2k\left(\hat{m}_k - m\right)^2\sum_{i=1}^{k}\left(x_i - m\right)^2 + k^2\left(\hat{m}_k - m\right)^4\right\}$$
137
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 4)

Since (xi - m), (xj - m) and (m̂k - m) involve independent samples for i ≠ j, the individual expectations, with the fourth central moment μ₄ := E{(xi - m)⁴}, are

$$E\left\{\left[\sum_{i=1}^{k}\left(x_i - m\right)^2\right]^2\right\} = k\,\mu_4 + k(k-1)\,\sigma^4$$

$$E\left\{\left(\hat{m}_k - m\right)^2\sum_{i=1}^{k}\left(x_i - m\right)^2\right\} = \frac{\mu_4 + (k-1)\,\sigma^4}{k}, \qquad E\left\{\left(\hat{m}_k - m\right)^4\right\} = \frac{\mu_4 + 3(k-1)\,\sigma^4}{k^3}$$

Collecting the terms:

$$E\{\hat{\sigma}_k^4\} = \frac{1}{(k-1)^2}\left[k\,\mu_4 + k(k-1)\,\sigma^4 - 2\left(\mu_4 + (k-1)\,\sigma^4\right) + \frac{\mu_4 + 3(k-1)\,\sigma^4}{k}\right] = \frac{\mu_4}{k} + \frac{k^2 - 2k + 3}{k\,(k-1)}\,\sigma^4$$

so that

$$\sigma^2_{\hat{\sigma}_k^2} := E\left\{\left(\hat{\sigma}_k^2 - \sigma^2\right)^2\right\} = E\{\hat{\sigma}_k^4\} - \sigma^4 = \frac{1}{k}\left[\mu_4 - \frac{k-3}{k-1}\,\sigma^4\right]$$
138
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 5)

We found:

$$E\{\hat{m}_k\} = m, \qquad E\{\hat{\sigma}_k^2\} = \sigma^2, \qquad \sigma^2_{\hat{m}_k} := E\left\{\left(\hat{m}_k - m\right)^2\right\} = \frac{\sigma^2}{k}$$

$$\sigma^2_{\hat{\sigma}_k^2} := E\left\{\left(\hat{\sigma}_k^2 - \sigma^2\right)^2\right\} = \frac{1}{k}\left[\mu_4 - \frac{k-3}{k-1}\,\sigma^4\right], \qquad \mu_4 := E\left\{\left(x_i - m\right)^4\right\}$$

Define the Kurtosis of the random variable xi as λ := μ₄/σ⁴; then

$$\sigma^2_{\hat{\sigma}_k^2} = \frac{\sigma^4}{k}\left[\lambda - \frac{k-3}{k-1}\right]$$
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 6)

For high values of k, according to the Central Limit Theorem, the estimations of mean and of variance are approximately Gaussian Random Variables:

$$\hat{m}_k \sim N\!\left(m;\; \hat{\sigma}_k^2/k\right) \qquad \& \qquad \hat{\sigma}_k^2 \sim N\!\left(\sigma^2;\; (\lambda - 1)\,\hat{\sigma}_k^4/k\right)$$

We want to find a region around σ̂k² that will contain σ² with a predefined probability φ, as a function of the number of iterations k:

$$\text{Prob}\left(\left|\hat{\sigma}_k^2 - \sigma^2\right| \le n_\sigma\,\sigma_{\hat{\sigma}_k^2}\right) = \varphi$$

Since σ̂k² is approximately Gaussian, nσ is given by solving:

$$\varphi = \frac{1}{\sqrt{2\pi}}\int_{-n_\sigma}^{+n_\sigma} \exp\left(-\frac{\zeta^2}{2}\right)d\zeta$$

Cumulative Probability within nσ Standard Deviations of the Mean for a Gaussian Random Variable:

nσ        φ
1.000     0.6827
1.645     0.9000
1.960     0.9500
2.576     0.9900

With σ²_{σ̂k²} ≈ (λ - 1) σ⁴/k for large k, the confidence region becomes:

$$-\,n_\sigma\sqrt{\frac{\lambda - 1}{k}} \;\le\; \frac{\hat{\sigma}_k^2}{\sigma^2} - 1 \;\le\; +\,n_\sigma\sqrt{\frac{\lambda - 1}{k}}$$
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 7)

$$\text{Prob}\left(\left|\hat{\sigma}_k^2 - \sigma^2\right| \le n_\sigma\,\sigma_{\hat{\sigma}_k^2}\right) = \varphi \qquad\Longrightarrow\qquad 1 - n_\sigma\sqrt{\frac{\lambda - 1}{k}} \;\le\; \frac{\hat{\sigma}_k^2}{\sigma^2} \;\le\; 1 + n_\sigma\sqrt{\frac{\lambda - 1}{k}}$$

Taking square roots, the ratio of the estimated to the true standard deviation lies, with probability φ, between the confidence interval multipliers:

$$\sqrt{1 - n_\sigma\sqrt{\frac{\lambda - 1}{k}}} \;\le\; \frac{\hat{\sigma}_k}{\sigma} \;\le\; \sqrt{1 + n_\sigma\sqrt{\frac{\lambda - 1}{k}}}$$
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 8)

[Figure: Typical Confidence Interval Multipliers for the Estimated Standard Deviation of a Gaussian Random Variable (λ = 3). The upper and lower limits √(1 ± nσ√((λ-1)/k)) of σ̂/σ are plotted against the number of Monte Carlo trials k (100 to 500), for 95% confidence (nσ = 1.96, interval within roughly 0.9 to 1.1 near k = 256) and 99% confidence (nσ = 2.576, near k = 440).]
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 9)

[Figure: Effect of Kurtosis on Confidence Interval Limits. The multipliers √(1 ± nσ√((λ-1)/k)) are plotted against the kurtosis λ (5 to 20), for a Degree of Confidence of 95% (nσ = 1.96) and k = 256 Monte Carlo trials performed: the confidence interval for the estimated standard deviation widens as the kurtosis grows.]
143
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 10)

Monte-Carlo Procedure

1  Choose the Confidence Level φ and find the corresponding nσ using the normal (Gaussian) distribution:

   nσ        φ
   1.000     0.6827
   1.645     0.9000
   1.960     0.9500
   2.576     0.9900

2  Run a few samples k₀ > 20 and estimate λ according to

$$\hat{m}_{k_0} := \frac{1}{k_0}\sum_{i=1}^{k_0} x_i, \qquad \hat{\lambda} := \frac{\dfrac{1}{k_0}\displaystyle\sum_{i=1}^{k_0}\left(x_i - \hat{m}_{k_0}\right)^4}{\left[\dfrac{1}{k_0}\displaystyle\sum_{i=1}^{k_0}\left(x_i - \hat{m}_{k_0}\right)^2\right]^2}$$

3  Compute the lower and upper limits √(1 - nσ√((λ̂-1)/k)) and √(1 + nσ√((λ̂-1)/k)) as functions of k.

4  Find k for which Prob( |σ̂k² - σ²| ≤ nσ σ_{σ̂k²} ) = φ meets the required accuracy.

5  Run the remaining k - k₀ simulations.
144
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 11)

Monte-Carlo Procedure

Example: assume a Gaussian distribution, λ = 3.

1  Choose the Confidence Level φ = 95%, which gives the corresponding nσ = 1.96.

2  The kurtosis is λ = 3.

3  Require also that, with probability φ = 95%, |σ̂k² - σ²| ≤ 0.1 σ², and find k for which

$$\text{Prob}\left(\left|\hat{\sigma}_k^2 - \sigma^2\right| \le 1.96\,\sqrt{\frac{2}{k}}\;\sigma^2\right) = 0.95 \qquad\Longrightarrow\qquad 1.96\sqrt{\frac{2}{k}} \le 0.1 \qquad\Longrightarrow\qquad k \ge 2\left(\frac{1.96}{0.1}\right)^2 \approx 768$$

4  Run k ≥ 800 simulations.
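A small numerical check of this sizing rule (a sketch: the standard Gaussian population and the number of repeated runs are illustrative assumptions):

```python
import math
import numpy as np

# Requirement: with 95% confidence the relative error of the variance
# estimate must stay below 10%, for a Gaussian population (lambda = 3).
n_sigma, lam, rel_err = 1.96, 3.0, 0.1
k_required = math.ceil((lam - 1.0) * (n_sigma / rel_err) ** 2)   # 769; the slide rounds to 800

# Empirical verification with k = 800: fraction of runs whose unbiased
# sample variance falls within +/-10% of the true variance (= 1).
rng = np.random.default_rng(2)
s2 = rng.normal(0.0, 1.0, size=(20_000, 800)).var(axis=1, ddof=1)
coverage = np.mean(np.abs(s2 - 1.0) <= rel_err)
```

With k = 800 > 769 the observed coverage comes out slightly above the requested 95%.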
145
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 12)

Kurtosis

Kurtosis (from the Greek word κυρτός, kyrtos or kurtos, meaning bulging) is a measure of the "peakedness" of the probability distribution of a real-valued random variable. Higher kurtosis means more of the variance is due to infrequent extreme deviations, as opposed to frequent modestly-sized deviations.

The Kurtosis of the random variable xi is

$$\lambda := \frac{E\left\{\left(x_i - m\right)^4\right\}}{\left[E\left\{\left(x_i - m\right)^2\right\}\right]^2}$$

1905 - Karl Pearson (1857 - 1936) defined Kurtosis as a measure of departure from normality in a paper published in Biometrika. λ = 3 for the normal distribution, and the terms leptokurtic (λ > 3), mesokurtic (λ = 3) and platykurtic (λ < 3) were introduced.
A leptokurtic distribution has a more acute "peak" around the mean (that is, a higher probability than a normally distributed variable of values near the mean) and "fat tails" (that is, a higher probability than a normally distributed variable of extreme values). A platykurtic distribution has a smaller "peak" around the mean (that is, a lower probability than a normally distributed variable of values near the mean) and "thin tails" (that is, a lower probability than a normally distributed variable of extreme values).
146

SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 13)

Distribution         Functional Representation                              Kurtosis λ    Excess Kurtosis λ-3

Normal               f(x) = exp(-x²/(2σ²)) / √(2πσ²)                        3             0
Laplace              f(x) = exp(-|x|/b) / (2b)                              6             3
Hyperbolic-Secant    f(x) = (1/2) sech(πx/2)                                5             2
Uniform              f(x) = 1/(b-a) for a ≤ x ≤ b, 0 otherwise              1.8           -1.2
Wigner (semicircle)  f(x) = (2/(πR²)) √(R² - x²) for |x| ≤ R, 0 otherwise   2             -1
147
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable (continue - 14)

Skewness

The Skewness of the random variable xi is

$$\gamma := \frac{E\left\{\left(x_i - m\right)^3\right\}}{\left[E\left\{\left(x_i - m\right)^2\right\}\right]^{3/2}}$$

Karl Pearson (1857 - 1936)

[Figure: density shapes with Negative Skew and Positive Skew]

1  Negative skew: the left tail is longer; the mass of the distribution is concentrated on the right of the figure. The distribution is said to be left-skewed. There is more data in the left tail than would be expected in a normal distribution.

2  Positive skew: the right tail is longer; the mass of the distribution is concentrated on the left of the figure. The distribution is said to be right-skewed. There is more data in the right tail than would be expected in a normal distribution.

Karl Pearson suggested two simpler calculations as a measure of skewness:
• (mean - mode) / standard deviation
• 3 (mean - median) / standard deviation
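The two shape measures above can be estimated directly from samples (a sketch: the Gaussian and exponential test populations are illustrative assumptions):

```python
import numpy as np

def skewness(x):
    """Sample skewness gamma = E{(x-m)^3} / (E{(x-m)^2})^(3/2)."""
    d = x - x.mean()
    return (d**3).mean() / (d**2).mean() ** 1.5

def kurtosis(x):
    """Sample kurtosis lambda = E{(x-m)^4} / (E{(x-m)^2})^2 (= 3 for a Gaussian)."""
    d = x - x.mean()
    return (d**4).mean() / (d**2).mean() ** 2

rng = np.random.default_rng(3)
lam_normal = kurtosis(rng.normal(size=500_000))        # ~ 3 (mesokurtic)
gamma_expo = skewness(rng.exponential(size=500_000))   # ~ 2 (right-skewed)
```

The exponential distribution has theoretical skewness 2, so `gamma_expo` lands close to that value.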
148
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable using a Recursive Filter (Unknown Statistics)

A random variable, x, may take on any values in the range - ∞ to + ∞. Based on a sample of k values, xi, i = 1,2,…,k, we wish to estimate the sample mean, x̂k, and the variance, pk, by a Recursive Filter.

We found that using k measurements the estimated mean and variance are given in batch form by:

$$\hat{x}_k := \frac{1}{k}\sum_{i=1}^{k} x_i, \qquad p_k := \frac{1}{k-1}\sum_{i=1}^{k}\left(x_i - \hat{x}_k\right)^2$$

The k+1 measurement will give:

$$\hat{x}_{k+1} = \frac{1}{k+1}\sum_{i=1}^{k+1} x_i = \hat{x}_k + \frac{1}{k+1}\left(x_{k+1} - \hat{x}_k\right), \qquad p_{k+1} = \frac{1}{k}\sum_{i=1}^{k+1}\left(x_i - \hat{x}_{k+1}\right)^2$$

Therefore the Recursive Filter form of the mean for the k+1 measurement will be:

$$\hat{x}_{k+1} = \hat{x}_k + \frac{1}{k+1}\left(x_{k+1} - \hat{x}_k\right)$$
149
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable using a Recursive Filter (Unknown Statistics) (continue - 1)

We found that using k+1 measurements the estimated variance is given in batch form by pk+1 = (1/k) Σᵢ₌₁ᵏ⁺¹ (xi - x̂k+1)². Substituting x̂k+1 = x̂k + (xk+1 - x̂k)/(k+1) and expanding:

$$p_{k+1} = \frac{1}{k}\sum_{i=1}^{k+1}\left[\left(x_i - \hat{x}_k\right) - \left(\hat{x}_{k+1} - \hat{x}_k\right)\right]^2 = \frac{1}{k}\left[\sum_{i=1}^{k}\left(x_i - \hat{x}_k\right)^2 + \frac{k}{k+1}\left(x_{k+1} - \hat{x}_k\right)^2\right]$$

so the Recursive Filter form of the variance is:

$$p_{k+1} = \frac{k-1}{k}\,p_k + \frac{1}{k+1}\left(x_{k+1} - \hat{x}_k\right)^2$$
150
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable using a Recursive Filter (Unknown Statistics) (continue - 2)

Summarizing, the Recursive Filter for the sample mean and variance is:

$$\hat{x}_{k+1} = \hat{x}_k + \frac{1}{k+1}\left(x_{k+1} - \hat{x}_k\right)$$

$$p_{k+1} = \frac{k-1}{k}\,p_k + \frac{1}{k+1}\left(x_{k+1} - \hat{x}_k\right)^2$$
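These two recursions can be sketched directly; the stream of Gaussian samples used to exercise them is an illustrative assumption:

```python
import numpy as np

def recursive_mean_var(stream):
    """Absorb samples one at a time using the slide's recursions:
    x_hat_{k+1} = x_hat_k + (x - x_hat_k)/(k+1)
    p_{k+1}     = ((k-1)/k) p_k + (x - x_hat_k)^2/(k+1)   (k = samples seen so far)
    """
    x_hat, p, k = 0.0, 0.0, 0
    for x in stream:
        if k == 0:
            x_hat, p = x, 0.0
        else:
            d = x - x_hat
            x_hat = x_hat + d / (k + 1)
            p = (k - 1) / k * p + d * d / (k + 1)
        k += 1
    return x_hat, p

rng = np.random.default_rng(4)
data = rng.normal(5.0, 2.0, size=1000)
m, v = recursive_mean_var(data)
# m matches the batch mean, v matches the batch unbiased variance
```

The recursive result agrees with the batch formulas to floating-point precision, which confirms the derivation.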
151
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable with Known Statistics Moments Using a Discrete Recursive Filter

Estimate the value of a constant x, given discrete measurements of x corrupted by an uncorrelated gaussian noise sequence with zero mean and variance r0. The scalar equations describing this situation are:

System:        xk+1 = xk
Measurement:   zk = xk + vk,    vk ~ N(0, r0)

This is the general form xk+1 = Φk xk + wk, zk = Hk xk + vk with Φk = 1, Hk = 1, wk ≡ 0 (Qk = 0). The Discrete Kalman Filter is given by:

$$\hat{x}_{k+1} = \hat{x}_k + \underbrace{p_{k+1}\,r_0^{-1}}_{K_{k+1}}\left(z_{k+1} - \hat{x}_k\right), \qquad \hat{x}_0 = \hat{x}(0)$$

where the covariance before the measurement is pk+1⁻ = Φk pk Φkᵀ + Qk = pk, and after the measurement:

$$p_{k+1} = p_{k+1}^- - p_{k+1}^-\,H^T\left(H\,p_{k+1}^-\,H^T + r_0\right)^{-1}H\,p_{k+1}^- = \frac{p_k\,r_0}{p_k + r_0}, \qquad p_0 = p(0)$$
152
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable with Known Statistics Moments Using a Discrete Recursive Filter (continue - 1)

Estimate the value of a constant x, given discrete measurements of x corrupted by an uncorrelated gaussian noise sequence with zero mean and variance r0.

We found that the Discrete Kalman Filter is given by:

$$\hat{x}_{k+1} = \hat{x}_k + K_{k+1}\left(z_{k+1} - \hat{x}_k\right), \qquad p_{k+1} = \frac{p_k\,r_0}{p_k + r_0}$$

Iterating the covariance recursion from p0:

$$p_1 = \frac{p_0\,r_0}{p_0 + r_0}, \qquad p_2 = \frac{p_1\,r_0}{p_1 + r_0} = \frac{p_0\,r_0}{2\,p_0 + r_0}, \qquad\cdots\qquad p_k = \frac{p_0\,r_0}{k\,p_0 + r_0}$$

so that the Kalman Gain is

$$K_{k+1} = \frac{p_{k+1}}{r_0} = \frac{p_0}{(k+1)\,p_0 + r_0}$$

and the filter becomes

$$\hat{x}_{k+1} = \hat{x}_k + \frac{p_0}{(k+1)\,p_0 + r_0}\left(z_{k+1} - \hat{x}_k\right) \;\xrightarrow{\;p_0 \gg r_0\;}\; \hat{x}_k + \frac{1}{k+1}\left(z_{k+1} - \hat{x}_k\right)$$
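A scalar sketch of this filter; the value of the constant, the noise variance and the prior covariance are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(5)
x_true, r0, p0 = 3.0, 0.25, 100.0   # constant, measurement noise variance, prior variance

x_hat, p = 0.0, p0
for z in x_true + np.sqrt(r0) * rng.standard_normal(2000):
    p = p * r0 / (p + r0)            # covariance update: p_{k+1} = p_k r0 / (p_k + r0)
    K = p / r0                       # Kalman gain K_{k+1} = p_{k+1} / r0
    x_hat = x_hat + K * (z - x_hat)  # state update

# After k steps, p ~ p0*r0/(k*p0 + r0) -> r0/k, and x_hat -> x_true.
```

Since p0 ≫ r0, the gain is close to 1/(k+1) and the filter behaves like the recursive sample mean, as the limit above predicts.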
153
SOLO Review of Probability
Estimation of the Mean and Variance of a Random Variable with Known Statistics Moments Using a Continuous Recursive Filter

Estimate the value of a constant x, given continuous measurements of x corrupted by an uncorrelated gaussian noise with zero mean and variance r0. The scalar equations describing this situation are:

System:        ẋ = 0
Measurement:   z = x + v,    v ~ N(0, r0)

The Continuous Kalman Filter (general form with A = 0, H = 1, Q = 0) is given by:

$$\dot{\hat{x}}(t) = \underbrace{p(t)\,r_0^{-1}}_{K}\left(z(t) - \hat{x}(t)\right), \qquad \hat{x}(0) = \hat{x}_0$$

$$\dot{p}(t) = A\,p + p\,A^T + G\,Q\,G^T - p\,H^T r_0^{-1} H\,p = -\frac{p^2(t)}{r_0}, \qquad p(0) = p_0$$

Separating variables and integrating:

$$\int_{p_0}^{p}\frac{d\,p}{p^2} = -\int_0^t\frac{d\,\tau}{r_0} \qquad\Longrightarrow\qquad p(t) = \frac{p_0}{1 + p_0\,t/r_0}$$

so the gain is K(t) = p(t)/r0 = p0/(r0 + p0 t), and the filter becomes

$$\dot{\hat{x}}(t) = \frac{p_0}{r_0 + p_0\,t}\left(z(t) - \hat{x}(t)\right) \;\xrightarrow{\;p_0 \gg r_0\;}\; \frac{1}{t}\left(z(t) - \hat{x}(t)\right)$$
154
SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators
• First attempts to generate “random numbers”:- Draw balls out of a stirred urn- Roll dice
• 1927: L.H.C. Tippett published a table of 40,000 digits taken “at random” from census reports.
• 1939: M.G. Kendall and B. Babington-Smith created a mechanical machine to generate random numbers. They published a table of 100,000 digits.
• 1946: J. Von Neumann proposed the “middle square method”.
• 1948: D.H. Lehmer introduced the “linear congruential method”.
• 1955: RAND Corporation published a table of 1,000,000 random digits obtained from electronic noise.

• 1965: M.D. MacLaren and G. Marsaglia proposed to combine two congruential generators.
• 1989: R.S. Wikramaratna proposed the additive congruential method.
Routine RANDU (IBM Corp)“We guarantee that each number is random individually, but we don’t guaranteethat more than one of them is random”
155
SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators
On a computer the “random numbers” are not random at all – they are strictlydeterministic and reproducible, but they look like a stream of random numbers.For this reason the computer programs are called “Pseudo-Random Number Generators”.
Essential Properties of a Pseudo-Random Number Generator
Repeatability – the same sequence should be produced with the same initial values (or seeds)
Randomness – should produce independent uniformly distributed random variables that passes all statistical tests for randomness.
Long Period – a pseudo-random number sequence uses finite precision arithmetic, so the sequence must repeat itself with a finite period. This should be much longer than the amount of random numbers needed for simulation.
Insensitive to seeds – period and randomness properties should not depend on the initial seeds.
156
SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators
Essential Properties of a Pseudo-Random Number Generator (continue -1)
Portability – should give the same results on different computers
Efficiency – should be fast (small number of floating point operations) and not use much memory.
Disjoint subsequences – different seeds should produce long independent (disjoint) subsequences so that there are no correlations between simulations with different initial seeds.
Homogeneity – sequences of all bits should be random.
157
SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators
A Random Number represents the value of a random variable uniformly distributed on (0,1). Pseudo-Random Numbers constitute a sequence of values which, although deterministically generated, have all the appearances of being independent uniform (0,1) random variables. One approach:

1. Define x0 = integer initial condition or seed.

2. Using integers a and m, recursively compute

$$x_{n+1} = a\,x_n \;\text{modulo}\; m$$

Therefore xn takes the values 0,1,…,m-1, and the quantity un = xn/m, called a pseudo-random number, is an approximation to the value of a uniform (0,1) random variable.
In general the integers a and m should be chose to satisfy three criteria:
1. For any initial seed, the resultant sequence has the “appearance” of being a sequence of independent (0,1) random variables.
2. For any initial seed, the number of variables that can be generated before repetition begins is large.
3. The values can be computed efficiently on a digital computer.
Multiplicative congruential method
158
SOLO Review of Probability
Generating Discrete Random Variables
Pseudo-Random Number Generators (continue – 1)
A guideline is to choose m to be a large prime number comparable to the computer word size.

Examples (multiplicative congruential):

• 32-bit word computer: m = 2³¹ - 1, a = 7⁵ = 16,807 (some IBM systems)
• 36-bit word computer: m = 2³⁵ - 31, a = 5⁵

Another generator of pseudo-random numbers uses recursions of the type:

$$x_{n+1} = \left(a\,x_n + c\right) \;\text{modulo}\; m$$

Mixed congruential method

• 32-bit word computer: m = 2³², a = 69,069 (VAX)
• 32-bit word computer: m = 2³², a = 1,664,525 (transputers)
• 48-bit word computer: m = 2⁴⁸, a = 5DEECE66D₁₆, c = B₁₆ (UNIX, RAND48 routine)
• 48-bit word computer: m = 2⁴⁷, a = 5¹⁵, c = 0 (CDC vector machines)
• 48-bit word computer: m = 2⁴⁸, a = 2875A2E7B175₁₆, c = 0 (Cray vector machines)
• 64-bit word computer: m = 2⁵⁹, a = 13¹³, c = 0 (Numerical Algorithms Group)
Return to Table of Content
159
SOLO Review of Probability

Generating Discrete Random Variables

Histograms

A histogram is a graphical display of tabulated frequencies, shown as bars. It shows what proportion of cases fall into each of several categories: it is a form of data binning. The categories are usually specified as non-overlapping intervals of some variable. The categories (bars) must be adjacent. The intervals (or bands, or bins) are generally of the same size.

Histograms are used to plot the density of data, and often for density estimation: estimating the probability density function of the underlying variable. The total area of a histogram always equals 1. If the lengths of the intervals on the x-axis are all 1, then a histogram is identical to a relative frequency plot.

Mathematical Definition

In a more general mathematical sense, a histogram is a mapping mi that counts the number of observations that fall into various disjoint categories (known as bins), whereas the graph of a histogram is merely one way to represent a histogram. Thus, if we let n be the total number of observations and k be the total number of bins, the histogram mi meets the following condition:

$$n = \sum_{i=1}^{k} m_i$$

A cumulative histogram is a mapping that counts the cumulative number of observations in all of the bins up to the specified bin. That is, the cumulative histogram Mi of a histogram mi is defined as:

$$M_i = \sum_{j=1}^{i} m_j$$

[Figure: an ordinary and a cumulative histogram of the same data. The data shown is a random sample of 10,000 points from a normal distribution with a mean of 0 and a standard deviation of 1.]

Return to Table of Content
160
SOLO Review of Probability

Generating Discrete Random Variables

The Inverse Transform Method

Suppose we want to generate a discrete random variable X having probability density function

$$p(x) = \sum_{j} p_j\,\delta\left(x - x_j\right), \qquad \sum_j p_j = 1, \qquad j = 0, 1, \ldots$$

To accomplish this, let us generate a random number U that is uniformly distributed over (0,1) and set:

$$X = \begin{cases} x_0 & \text{if } U < p_0 \\ x_1 & \text{if } p_0 \le U < p_0 + p_1 \\ \;\vdots & \\ x_j & \text{if } \displaystyle\sum_{i=0}^{j-1} p_i \le U < \sum_{i=0}^{j} p_i \\ \;\vdots & \end{cases}$$

Since U is uniformly distributed, P (a ≤ U < b) = b - a for any a and b such that 0 < a < b < 1, and we have:

$$P\left(X = x_j\right) = P\left(\sum_{i=0}^{j-1} p_i \le U < \sum_{i=0}^{j} p_i\right) = p_j$$

and so X has the desired distribution.
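A direct sketch of this construction; the support points and probabilities are illustrative assumptions:

```python
import random

def discrete_inverse_transform(xs, ps, rng=random.random):
    """Return x_j when sum(p_0..p_{j-1}) <= U < sum(p_0..p_j),
    i.e. invert the cumulative sums of the p_j against a uniform U."""
    u = rng()
    cum = 0.0
    for x, p in zip(xs, ps):
        cum += p
        if u < cum:
            return x
    return xs[-1]   # guard against floating-point round-off in the cumulative sum

random.seed(0)
draws = [discrete_inverse_transform([1, 2, 3], [0.2, 0.5, 0.3]) for _ in range(100_000)]
freq2 = draws.count(2) / len(draws)   # empirical frequency of x_1 = 2, ~ p_1 = 0.5
```

The empirical frequencies of the three support points reproduce the target probabilities, as the identity above guarantees.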
SOLO Review of Probability

Generating Discrete Random Variables

The Inverse Transform Method (continue - 1)

Suppose we want to draw X, N times, from the discrete probability density function p(x) = Σj pj δ(x - xj), and to build the histogram of the results.

Algorithm (repeated N times):
1. Initialize i = 0, P = p0.
2. Generate a random number U (uniformly distributed on (0,1)).
3. If U < P, set X = xi and stop.
4. Otherwise set i := i + 1, P := P + pi, and return to step 3.

[Figure: the target probabilities pj at x0,…,x6 and the histogram of the N drawn values of X (bar heights in units of 1/N), which match.]
SOLO Review of Probability

Generating Discrete Random Variables

The Inverse Transform Method (continue - 2)

Generating a Poisson Random Variable:

$$p_i = P\left(X = i\right) = e^{-\lambda}\,\frac{\lambda^i}{i!}, \qquad i = 0, 1, \ldots$$

The key to using the inverse transform is the recursion

$$\frac{p_{i+1}}{p_i} = \frac{e^{-\lambda}\,\lambda^{i+1}/(i+1)!}{e^{-\lambda}\,\lambda^{i}/i!} = \frac{\lambda}{i+1} \qquad\Longrightarrow\qquad p_{i+1} = \frac{\lambda}{i+1}\,p_i$$

Algorithm (repeated N times):
1. Initialize i = 0, p = e^{-λ}, P = p.
2. Generate a random number U (uniformly distributed on (0,1)).
3. If U < P, set X = i and stop.
4. Otherwise set i := i + 1, p := p λ/i, P := P + p, and return to step 3.

[Figure: the Poisson probabilities for λ = 4 (i = 0 to 20) and the histogram of N draws of X, which match.]
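The recursion-based inverse transform can be sketched as follows; λ = 4 follows the slide's figure:

```python
import math
import random

def poisson_inverse_transform(lam, rng=random.random):
    """Draw a Poisson(lam) variate by inverting the CDF, using
    the recursion p_{i+1} = p_i * lam/(i+1) to build it on the fly."""
    u = rng()
    i, p = 0, math.exp(-lam)
    cum = p
    while u >= cum:
        i += 1
        p *= lam / i
        cum += p
    return i

random.seed(1)
draws = [poisson_inverse_transform(4.0) for _ in range(100_000)]
mean = sum(draws) / len(draws)   # Poisson mean = lambda = 4
```

Building each probability from the previous one avoids evaluating factorials and powers at every step.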
SOLO Review of Probability

Generating Discrete Random Variables

The Inverse Transform Method (continue - 3)

Generating a Binomial Random Variable:

$$p_i = P\left(X = i\right) = \frac{n!}{i!\,(n-i)!}\,p^i\left(1-p\right)^{n-i}, \qquad i = 0, 1, \ldots, n$$

The key to using the inverse transform is the recursion

$$\frac{p_{i+1}}{p_i} = \frac{n-i}{i+1}\cdot\frac{p}{1-p} \qquad\Longrightarrow\qquad p_{i+1} = \frac{n-i}{i+1}\cdot\frac{p}{1-p}\;p_i$$

Algorithm (repeated N times):
1. Initialize i = 0, pr = (1-p)ⁿ, P = pr.
2. Generate a random number U (uniformly distributed on (0,1)).
3. If U < P, set X = i and stop.
4. Otherwise set pr := pr (n-i) p / ((i+1)(1-p)), i := i + 1, P := P + pr, and return to step 3.

[Figure: the binomial probabilities P(k,n) for k = 0,…,14 and the histogram of the drawn results, which match.]

Return to Table of Content
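A sketch of the binomial recursion; n = 10 and p = 0.3 are illustrative parameters:

```python
import random

def binomial_inverse_transform(n, p, rng=random.random):
    """Draw a Binomial(n, p) variate by inverting the CDF, using
    the recursion p_{i+1} = p_i * (n-i)/(i+1) * p/(1-p)."""
    u = rng()
    i, pr = 0, (1.0 - p) ** n
    cum = pr
    while u >= cum and i < n:   # i < n guards against round-off in cum
        pr *= (n - i) / (i + 1) * p / (1.0 - p)
        i += 1
        cum += pr
    return i

random.seed(2)
draws = [binomial_inverse_transform(10, 0.3) for _ in range(100_000)]
mean = sum(draws) / len(draws)   # binomial mean = n*p = 3
```

As with the Poisson case, the recursion avoids recomputing factorials for each term of the CDF.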
164
SOLO Review of Probability

Generating Discrete Random Variables

The Acceptance-Rejection Technique

Suppose we have an efficient method for simulating a random variable having a probability density function { qj, j ≥ 0 }. We want to use this to obtain a random variable that has the probability density function { pj, j ≥ 0 }.

Let c be a constant such that:

$$\frac{p_j}{q_j} \le c \quad \text{for all } j \text{ such that } p_j > 0$$

If such a c exists, it must satisfy c ≥ 1, since

$$1 = \sum_j p_j \le c\sum_j q_j = c$$

Rejection Method

Step 1: Simulate the value of Y, having probability density function qj.
Step 2: Generate a random number U (that is uniformly distributed over (0,1)).
Step 3: If U < pY / (c qY), set X = Y and stop. Otherwise return to Step 1.
165
SOLO Review of Probability

Generating Discrete Random Variables

The Acceptance-Rejection Technique (continue - 1)

Theorem

The random variable X obtained by the rejection method has probability density function P { X = i } = pi.

Proof

By Bayes' rule,

$$P\left(X = i\right) = P\left(Y = i \,\middle|\, \text{Acceptance}\right) = \frac{P\left(Y = i,\, \text{Acceptance}\right)}{P\left(\text{Acceptance}\right)} = \frac{P\left(U < \dfrac{p_i}{c\,q_i}\,\middle|\, Y = i\right)P\left(Y = i\right)}{P\left(\text{Acceptance}\right)}$$

Since U is uniformly distributed on (0,1) and independent of Y:

$$P\left(Y = i,\, \text{Acceptance}\right) = \frac{p_i}{c\,q_i}\;q_i = \frac{p_i}{c}$$

Summing over all i yields

$$1 = \sum_i P\left(X = i\right) = \frac{\sum_i p_i}{c\,P\left(\text{Acceptance}\right)} = \frac{1}{c\,P\left(\text{Acceptance}\right)} \qquad\Longrightarrow\qquad P\left(\text{Acceptance}\right) = \frac{1}{c}$$

and therefore P (X = i) = pi.    q.e.d.
166
SOLO Review of Probability

Generating Discrete Random Variables

The Acceptance-Rejection Technique (continue - 2)

Example

Generate a truncated Gaussian using the Accept-Reject method. Consider the case with

$$p(x) \propto \begin{cases} \exp\left(-x^2/2\right)/\sqrt{2\pi} & x \in (-4, 4) \\ 0 & \text{otherwise} \end{cases}$$

Consider the Uniform proposal function

$$q(x) = \begin{cases} 1/8 & x \in (-4, 4) \\ 0 & \text{otherwise} \end{cases}$$

[Figure: the results of the Accept-Reject method using N = 10,000 samples match the truncated Gaussian density.]

Return to Table of Content
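A sketch of this example. With c = 8/√(2π), the acceptance ratio p(y)/(c q(y)) simplifies to exp(-y²/2); the unnormalized target is enough, since the normalization of the truncated density is absorbed into c:

```python
import math
import random

def truncated_gaussian(rng=random.random):
    """Accept-Reject draw from a standard Gaussian truncated to (-4, 4),
    with uniform proposal q(x) = 1/8 on (-4, 4) and c = 8/sqrt(2*pi),
    so p(y)/(c*q(y)) = exp(-y*y/2)."""
    while True:
        y = -4.0 + 8.0 * rng()      # Y ~ q = U(-4, 4)
        if rng() < math.exp(-y * y / 2.0):
            return y

random.seed(3)
draws = [truncated_gaussian() for _ in range(50_000)]
mean = sum(draws) / len(draws)      # ~ 0 by symmetry
```

The expected acceptance rate is 1/c = √(2π)/8 ≈ 0.31, so roughly three proposals are needed per accepted sample.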
167
SOLO Review of Probability
Generating Continuous Random Variables

The Inverse Transform Algorithm

Let U be a uniform (0,1) random variable. For any continuous distribution function F, the random variable X defined by

$$X = F^{-1}\left(U\right)$$

has distribution F. [ F⁻¹(u) is defined to be that value of x such that F (x) = u. ]

Proof

Let PX(x) denote the Probability Distribution Function of X = F⁻¹(U):

$$P_X\left(x\right) = P\left(X \le x\right) = P\left(F^{-1}\left(U\right) \le x\right)$$

Since F is a distribution function, F (x) is a monotonically increasing function of x, and so the inequality "a ≤ b" is equivalent to the inequality "F (a) ≤ F (b)"; therefore

$$P_X\left(x\right) = P\left(F\left(F^{-1}\left(U\right)\right) \le F\left(x\right)\right) = P\left(U \le F\left(x\right)\right) \underset{U \sim (0,1) \text{ uniform}}{=} F\left(x\right)$$

where the last equality holds because 0 ≤ F (x) ≤ 1.

Algorithm: generate a random number U (uniformly distributed on (0,1)) and set X = F⁻¹(U).

Return to Table of Content
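For example, for the exponential distribution F(x) = 1 - e^{-λx}, the inverse is F⁻¹(u) = -ln(1 - u)/λ, which gives a one-line sampler (λ = 2 is an illustrative choice):

```python
import math
import random

def exponential_inverse_transform(lam, rng=random.random):
    """X = F^{-1}(U) = -ln(1 - U)/lam has distribution F(x) = 1 - exp(-lam*x)."""
    return -math.log(1.0 - rng()) / lam

random.seed(4)
draws = [exponential_inverse_transform(2.0) for _ in range(100_000)]
mean = sum(draws) / len(draws)   # exponential mean = 1/lambda = 0.5
```

Any distribution with a closed-form inverse CDF can be sampled the same way from a single uniform draw.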
168
SOLO Review of Probability

Generating Continuous Random Variables

The Acceptance-Rejection Technique

Suppose we have an efficient method for simulating a random variable having a probability density function g (x). We want to use this to obtain a random variable that has the probability density function f (x).

Let c be a constant such that:

$$\frac{f\left(y\right)}{g\left(y\right)} \le c \quad \text{for all } y$$

If such a c exists, it must satisfy c ≥ 1, since

$$1 = \int f\left(y\right)dy \le c\int g\left(y\right)dy = c$$

Rejection Method

Step 1: Simulate the value of Y, having probability density function g (Y).
Step 2: Generate a random number U (that is uniformly distributed over (0,1)).
Step 3: If U < f (Y)/(c g (Y)), set X = Y and stop. Otherwise return to Step 1.
169
SOLO Review of Probability

Generating Continuous Random Variables

The Acceptance-Rejection Technique (continue - 1)

Theorem

The random variable X obtained by the rejection method has probability density function f (y).

Proof

By Bayes' rule, the density of the accepted values satisfies

$$P\left(Y = y \,\middle|\, \text{Acceptance}\right) = \frac{P\left(U < \dfrac{f\left(y\right)}{c\,g\left(y\right)}\,\middle|\, Y = y\right)g\left(y\right)}{P\left(\text{Acceptance}\right)} = \frac{\dfrac{f\left(y\right)}{c\,g\left(y\right)}\;g\left(y\right)}{P\left(\text{Acceptance}\right)} = \frac{f\left(y\right)}{c\,P\left(\text{Acceptance}\right)}$$

using the fact that U is uniformly distributed on (0,1) and independent of Y. Integrating over all y yields

$$1 = \int P\left(Y = y \,\middle|\, \text{Acceptance}\right)dy = \frac{\int f\left(y\right)dy}{c\,P\left(\text{Acceptance}\right)} = \frac{1}{c\,P\left(\text{Acceptance}\right)} \qquad\Longrightarrow\qquad P\left(\text{Acceptance}\right) = \frac{1}{c}$$

and therefore the density of X is f (y).    q.e.d.

Return to Table of Content
170
SOLO
The Bootstrap

• Popularized by Bradley Efron (b. 1938, Stanford U.) in 1979.

• The Bootstrap is a name generically applied to statistical resampling schemes that allow uncertainty in the data to be assessed from the data themselves, in other words "pulling yourself up by your bootstraps".

The advantage of bootstrapping over analytical methods is its great simplicity: it is straightforward to apply the bootstrap to derive estimates of standard errors and confidence intervals for complex estimators of complex parameters of the distribution, such as percentile points, proportions, odds ratios, and correlation coefficients.

The disadvantage of bootstrapping is that while (under some conditions) it is asymptotically consistent, it does not provide general finite-sample guarantees, and it has a tendency to be overly optimistic. The apparent simplicity may conceal the fact that important assumptions are being made when undertaking the bootstrap analysis (e.g. independence of samples), where these would be more formally stated in other approaches.

Generating Discrete Random Variables

Review of Probability
171
SOLO
The Bootstrap (continue -1)
• Given n observations zi, i = 1,…,n, and a calculated statistic S, what is the uncertainty in S?
• The Procedure:
Generating Discrete Random Variables
- Draw m values z’i i=1,…,m from the original data with replacement
- Calculate the statistic S’ from the “bootstrapped” sample
- Repeat L times to build a distribution of uncertainty in S.
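The procedure above can be sketched as follows; the data, the chosen statistic (the median) and the resample count are illustrative assumptions:

```python
import random
import statistics

def bootstrap(data, statistic, n_resamples=2000, rng=random.Random(5)):
    """Resample the data with replacement n_resamples times and return the
    list of statistic values S'; their spread estimates the uncertainty in S."""
    n = len(data)
    return [statistic([data[rng.randrange(n)] for _ in range(n)])
            for _ in range(n_resamples)]

data = [2.1, 2.5, 2.8, 3.0, 3.2, 3.3, 3.7, 4.0, 4.4, 5.1]
s_values = bootstrap(data, statistics.median)
spread = statistics.stdev(s_values)   # bootstrap standard error of the median
```

Percentile confidence intervals follow directly by sorting `s_values` and reading off the desired quantiles.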
Review of Probability
Return to Table of Content
172
SOLO Review of Probability

Importance Sampling (IS)

Let Y = (Y1,…,Ym) be a vector of random variables having a joint probability density function p (y1,…,ym), and suppose that we are interested in estimating

$$\theta = E_p\left\{g\left(Y_1,\ldots,Y_m\right)\right\} = \int\cdots\int g\left(y_1,\ldots,y_m\right)p\left(y_1,\ldots,y_m\right)dy_1\cdots dy_m$$

Suppose that a direct generation of the random vector Y, so as to compute g (Y), is inefficient, possibly because (a) it is difficult to generate the random vector Y, or (b) the variance of g (Y) is large, or (c) both of the above.

Suppose that W = (W1,…,Wm) is another random vector, which takes values in the same domain as Y, and has a joint density function q (w1,…,wm) that can be easily generated. The estimation θ can be expressed as:

$$\theta = E_p\left\{g\left(Y\right)\right\} = \int\cdots\int g\left(w\right)\frac{p\left(w\right)}{q\left(w\right)}\;q\left(w\right)dw_1\cdots dw_m = E_q\left\{g\left(W\right)\frac{p\left(W\right)}{q\left(W\right)}\right\}$$

Therefore, we can estimate θ by generating values of the random vector W, and then using as the estimator the resulting average of the values g (W) p (W)/q (W).

Generating Discrete Random Variables
173
SOLO Review of Probability

Importance Sampling (IS) (continue - 1)

Example: Importance Sampling for a Bi-Modal Distribution

Consider the following distribution:

$$p\left(x\right) = \frac{1}{2}\,N\left(x;\,0,\,1\right) + \frac{1}{2}\,N\left(x;\,3,\,1/2\right)$$

We want to calculate the mean value (g (x) = x) using Importance Sampling. Use:

$$g\left(x\right) = x \qquad \& \qquad q\left(x\right) = U\left(-5,\,5\right)$$

For i = 1,…,N, sample (draw) xi using q (x): xi ~ q (x), and form the Importance Weights

$$w_i := \frac{p\left(x_i\right)}{q\left(x_i\right)}$$

We obtain:

$$E_p\left\{x\right\} = \int x\,\frac{p\left(x\right)}{q\left(x\right)}\;q\left(x\right)dx = E_q\left\{x\,\frac{p\left(x\right)}{q\left(x\right)}\right\} \approx \frac{1}{N}\sum_{i=1}^{N} x_i\,\frac{p\left(x_i\right)}{q\left(x_i\right)} = \frac{1}{N}\sum_{i=1}^{N} w_i\,x_i$$

For N = 10,000 samples we obtain Ep [x] = 1.4915 instead of 1.5.

[Figure: the histogram weighted by the Importance Weights wi matches the true bi-modal PDF.]

Return to Table of Content
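A sketch of this example; the mixture parameters follow the slide, with 1/2 read as the variance of the second mode (an assumption), and the true mean is 0.5·0 + 0.5·3 = 1.5:

```python
import numpy as np

rng = np.random.default_rng(6)
N = 10_000

def p(x):
    """Bi-modal target: 0.5*N(x; 0, 1) + 0.5*N(x; 3, 1/2)."""
    n1 = np.exp(-x**2 / 2.0) / np.sqrt(2.0 * np.pi)
    n2 = np.exp(-(x - 3.0)**2 / (2.0 * 0.5)) / np.sqrt(2.0 * np.pi * 0.5)
    return 0.5 * n1 + 0.5 * n2

x = rng.uniform(-5.0, 5.0, size=N)   # draws from the proposal q = U(-5, 5)
w = p(x) / (1.0 / 10.0)              # importance weights w_i = p(x_i)/q(x_i)
mean_est = np.mean(w * x)            # ~ 1.5
```

The estimator converges to the correct mean even though no sample is ever drawn from p itself.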
174
SOLO Review of Probability

Generating Discrete Random Variables

Metropolis Algorithm

• This method of generation of an arbitrary probability distribution was invented by Metropolis, Rosenbluth and Teller (supposedly at a Los Alamos dinner party) and published in June 1953:

Metropolis, N., Rosenbluth, A., Rosenbluth, M., Teller, A., Teller, E., "Equations of state calculations by fast computing machine", Journal of Chemical Physics, 1953, Vol. 21(6), pp. 1087-1092

Nicholas Constantine Metropolis (1915 - 1999)

This is also called the Markov Chain Monte Carlo (MCMC) method.

Procedure
• Set up a Markov Chain that has as a unique stationary solution the required π (x) Probability Distribution Function (PDF).
• Run the chain until stationary.
• All subsequent samples are from the stationary distribution π (x), as required.

[Figure: a three-state Markov chain X1, X2, X3 with its transition probabilities]
175
SOLO Review of Probability

Generating Discrete Random Variables

Metropolis Algorithm (continue - 1)

Proof of the Procedure

Pr (X,t) - the probability of being in the state X at time t.
Pr (X→Y) = Pr (Y|X) - the transition probability, per unit time, of going from state X to state Y.

$$\Pr\left(X, t+1\right) = \Pr\left(X, t\right) - \sum_Y \Pr\left(Y \,\middle|\, X\right)\Pr\left(X, t\right) + \sum_Y \Pr\left(X \,\middle|\, Y\right)\Pr\left(Y, t\right)$$

with ΣY Pr (Y|X) = 1 (the sum of the probabilities of all states reached from X).

At large t, once the arbitrary initial state is "forgotten", we want Pr (X,t) → Pr (X). Clearly a sufficient (but not necessary) condition for an equilibrium (time-independent) probability distribution is the so-called Detailed Balance Condition:

$$\Pr\left(Y \,\middle|\, X\right)\Pr\left(X, t\right) = \Pr\left(X \,\middle|\, Y\right)\Pr\left(Y, t\right)$$

This method can be used for any probability distribution, but Metropolis used:

$$\Pr\left(B \,\middle|\, A\right) = \begin{cases} 1 & \Delta E \le 0 \\ e^{-\Delta E/kT} & \Delta E > 0 \end{cases}, \qquad \Delta E := E\left(B\right) - E\left(A\right)$$

Note: E (A) is equivalent to the Energy level of state A.
176
SOLO Review of Probability

Generating Discrete Random Variables

Metropolis Algorithm (continue - 2)

Detailed Balance Condition:

$$\Pr\left(Y \,\middle|\, X\right)\Pr\left(X, t\right) = \Pr\left(X \,\middle|\, Y\right)\Pr\left(Y, t\right)$$

Metropolis defined a symmetric Q (Y|X) = Q (X|Y) as a candidate generating density for Pr (Y|X), such that ΣY Q (Y|X) = 1.

In general Q (Y|X) will not satisfy the Detailed Balance condition; for example:

$$Q\left(Y \,\middle|\, X\right)\Pr\left(X, t\right) > Q\left(X \,\middle|\, Y\right)\Pr\left(Y, t\right)$$

The process moves from X to Y too often and from Y to X too rarely. A convenient way to correct this is to reduce the number of moves from X to Y by introducing a probability 0 < A (Y|X) ≤ 1, called the Acceptance Probability:

$$\Pr\left(Y \,\middle|\, X\right) = Q\left(Y \,\middle|\, X\right)A\left(Y \,\middle|\, X\right)$$
177
SOLO Review of Probability

Generating Discrete Random Variables

Metropolis Algorithm (continue - 3)

Let us define the Acceptance Probabilities as:

$$A\left(Y \,\middle|\, X\right) = \min\left\{1,\; \frac{\Pr\left(Y\right)}{\Pr\left(X\right)}\right\}, \qquad A\left(X \,\middle|\, Y\right) = \min\left\{1,\; \frac{\Pr\left(X\right)}{\Pr\left(Y\right)}\right\}$$

with Pr (Y|X) = Q (Y|X) A (Y|X) and Pr (X|Y) = Q (X|Y) A (X|Y).

If Pr (X) ≤ Pr (Y), then A (Y|X) = 1 and A (X|Y) = Pr (X)/Pr (Y).
If Pr (X) > Pr (Y), then A (Y|X) = Pr (Y)/Pr (X) and A (X|Y) = 1.

In both cases, using the symmetry Q (Y|X) = Q (X|Y):

$$\frac{\Pr\left(Y \,\middle|\, X\right)}{\Pr\left(X \,\middle|\, Y\right)} = \frac{Q\left(Y \,\middle|\, X\right)A\left(Y \,\middle|\, X\right)}{Q\left(X \,\middle|\, Y\right)A\left(X \,\middle|\, Y\right)} = \frac{\Pr\left(Y\right)}{\Pr\left(X\right)}$$

which is just the Detailed Balance condition.
178
SOLO Review of Probability

Generating Discrete Random Variables

Metropolis Algorithm (continue - 4)

Detailed Balance Condition:

$$\Pr\left(B \,\middle|\, A\right)\Pr\left(A, t\right) = \Pr\left(A \,\middle|\, B\right)\Pr\left(B, t\right)$$

This method can be used for any probability distribution, but Metropolis used:

$$\Pr\left(B \,\middle|\, A\right) = \begin{cases} 1 & \Delta E \le 0 \\ e^{-\Delta E/kT} & \Delta E > 0 \end{cases}, \qquad \Delta E := E\left(B\right) - E\left(A\right)$$

Therefore

$$\frac{\Pr\left(B \,\middle|\, A\right)}{\Pr\left(A \,\middle|\, B\right)} = e^{-\left[E\left(B\right) - E\left(A\right)\right]/kT} = \frac{e^{-E\left(B\right)/kT}}{e^{-E\left(A\right)/kT}} = \frac{\Pr\left(B, t\right)}{\Pr\left(A, t\right)}$$

so the chain converges to the Boltzmann distribution Pr (A) ∝ e^{-E(A)/kT}.
179
SOLO Review of Probability

Generating Discrete Random Variables

Metropolis-Hastings (M-H) Algorithm

W. Keith Hastings improved the Metropolis algorithm by allowing a non-symmetrical Candidate Generating Density.

Hastings, W., "Monte Carlo Simulation Methods Using Markov Chains and Their Applications", Biometrika, 1970, No. 57, pp. 97-109

• Set up a Markov Chain T (x'|x) that has as a unique stationary solution the required π (x') Probability Distribution Function (PDF):

$$\pi\left(x'\right) = \int T\left(x' \,\middle|\, x\right)\pi\left(x\right)dx$$

Here we give the development for Continuous Random Variables (for Discrete Random Variables the development is similar to that used for the Metropolis Algorithm).
180
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Continuous Random Variables
Review of Probability
• The problem is to find the conditional transition probability distribution T (x’|x) of the Markov Chain, that has states converging, after a transition time, to π (x’).
π (x’) = ∫ T (x’|x) π (x) d x

To satisfy this requirement a “necessary condition” (but “not sufficient”) is the “Detailed Balance” (or “Reversibility Condition”, or “Time Reversibility”):

T (x’|x) π (x) = T (x|x’) π (x’)

Proof:

∫ T (x’|x) π (x) d x = ∫ T (x|x’) π (x’) d x = π (x’) ∫ T (x|x’) d x = π (x’)   (since ∫ T (x|x’) d x = 1)     q.e.d.

Let define Q (x’|x) as a candidate generating density, for T (x’|x), such that:

∫ Q (x’|x) d x’ = 1

In general Q (x’|x) will not satisfy the “Detailed Balance” condition, for example:

Q (x’|x) π (x) > Q (x|x’) π (x’)

Loosely speaking, the process moves from x to x’ too often and from x’ to x too rarely.
181
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Continuous Random Variables
Review of Probability
In general Q (x’|x) will not satisfy the “Detailed Balance” condition, for example:

Q (x’|x) π (x) > Q (x|x’) π (x’)

Loosely speaking, the process moves from x to x’ too often and from x’ to x too rarely.

A convenient way to correct this is to reduce the number of moves from x to x’ by introducing a probability 0 < α (x’|x) ≤ 1. This is called the Acceptance Probability:

T (x’|x) = Q (x’|x) α (x’|x)

If the move is not made, the process again returns x as a value from the target distribution.

The Detailed Balance is

Q (x’|x) α (x’|x) π (x) = Q (x|x’) α (x|x’) π (x’)

From which, taking α (x|x’) = 1 on the side that moves too rarely:

α (x’|x) = min { 1, [Q (x|x’) π (x’)] / [Q (x’|x) π (x)] }

In the same way (by interchanging x’ with x):

α (x|x’) = min { 1, [Q (x’|x) π (x)] / [Q (x|x’) π (x’)] }
182
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Continuous Random Variables
Review of Probability
Let prove that we satisfy the “Detailed Balance” condition:

T (x’|x) π (x) = Q (x’|x) min { 1, [Q (x|x’) π (x’)] / [Q (x’|x) π (x)] } π (x)

T (x|x’) π (x’) = Q (x|x’) min { 1, [Q (x’|x) π (x)] / [Q (x|x’) π (x’)] } π (x’)

Suppose Q (x’|x) π (x) > Q (x|x’) π (x’). Then:

T (x’|x) π (x) = Q (x’|x) { [Q (x|x’) π (x’)] / [Q (x’|x) π (x)] } π (x) = Q (x|x’) π (x’)

T (x|x’) π (x’) = Q (x|x’) · 1 · π (x’) = Q (x|x’) π (x’)

Therefore T (x’|x) π (x) = T (x|x’) π (x’)     q.e.d.
183
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Continuous Random Variables
Review of Probability
The Transition Kernel of the Metropolis Hastings Algorithm is:
T (x’|x) = Q (x’|x) α (x’|x) + [1 – ∫ Q (y|x) α (y|x) d y] δ_x (x’)
where δx is the Dirac-mass on {x}.
184
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Continuous Random Variables
Review of Probability
Therefore the M-H Algorithm will:

1  Use the previously generated x(t).

2  Draw a new value xnew from the candidate distribution Q (xnew|x(t)):

     xnew ~ Q (xnew|x(t))

3  Compute the acceptance probability α (xnew|x(t)):

     α (xnew|x(t)) = min { 1, [Q (x(t)|xnew) π (xnew)] / [Q (xnew|x(t)) π (x(t))] }

4  Use the Acceptance/Rejection method with U uniform distributed on (0,1) and c = 1 (U [0,1] > α (xnew|x(t)) is possible): generate a random number U; if U ≤ α (xnew|x(t)) then x(t+1) = xnew, else x(t+1) = x(t). Set t := t + 1 and return to step 2.
185
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Continuous Random Variables
Review of Probability
[Flowchart of the M-H iteration]

Start: t = 0, x0

1. Generate a candidate xnew from the probability density function Q (xnew|x(t)).
2. Generate a random number U (uniform distributed on (0,1)).
3. If U ≤ α (xnew|x(t)) = min { 1, [Q (x(t)|xnew) π (xnew)] / [Q (xnew|x(t)) π (x(t))] }:
     Yes → x(t+1) = xnew
     No  → x(t+1) = x(t)
4. t := t + 1; return to 1.
186
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Continuous Random Variables
Review of Probability
[Animated figure: a growing set of candidate samples drawn under p (x), each marked in turn “sample accepted” or “sample rejected”]
187
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Continuous Random Variables
Review of Probability
The convergence of the M-H Algorithm to the desired unique stationary solution, the required π (x), occurs under the following conditions:

• Irreducibility: every state is eventually reachable from any start state; for all x there exists a t such that Pr (x, t) > 0.

• Aperiodicity: the chain doesn’t get caught in cycles.

The process is ergodic if it is both irreducible and aperiodic.

In the M-H algorithm the draws are used as samples from the target density π (x) only after the Markov Chain has passed the transient stage and the effect of the chosen starting value x0 has become so small that it can be ignored. The rate of convergence of the Markov Chain is a function of the chosen candidate generating density Q (x’, x). The efficiency of the algorithm depends on how close the Acceptance Probability α is to 1.
188
SOLO
Metropolis-Hastings (M-H) Algorithm
Generating Continuous Random Variables
Review of Probability
Example:

π (x) ∝ 0.3 exp (–0.2 x²) + 0.7 exp (–0.2 (x – 10)²)

Proposed Candidate Distribution:

Q (xnew|x(t)) = N (x(t), 100)

Ramon Sagarna, “Lecture 19: Markov Chain Monte Carlo Methods (MCMC)”
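A minimal sketch of the algorithm for this example (function and variable names are illustrative). The target is the unnormalized bimodal density above; since the N (x(t), 100) random-walk proposal is symmetric, the Q-ratio cancels in α:

```python
import math
import random

def target(x):
    # Unnormalized bimodal target from the example slide.
    return 0.3 * math.exp(-0.2 * x**2) + 0.7 * math.exp(-0.2 * (x - 10.0)**2)

def metropolis_hastings(n_iter, x0=0.0, proposal_sd=10.0, seed=1):
    # Random-walk proposal N(x(t), 100): symmetric, so alpha reduces to
    # the ratio of target densities.
    rng = random.Random(seed)
    x, samples = x0, []
    for _ in range(n_iter):
        x_new = rng.gauss(x, proposal_sd)
        alpha = min(1.0, target(x_new) / target(x))
        if rng.random() <= alpha:
            x = x_new                 # accept the candidate
        samples.append(x)             # on reject, x(t+1) = x(t)
    return samples

samples = metropolis_hastings(20000)
burned = samples[2000:]               # discard the transient stage
mean = sum(burned) / len(burned)
```

With mixture weights 0.3 at 0 and 0.7 at 10, the posterior mean is near 7, which the chain reproduces after burn-in.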
189
SOLO
Metropolis Algorithm
Generating Continuous Random Variables
Review of Probability
If we choose a symmetric candidate generating density, Q (x’|x) = Q (x|x’) for each x’, x, then:

α (x’|x) = min { 1, π (x’)/π (x) }

α (x|x’) = min { 1, π (x)/π (x’) }

We obtain the Metropolis Algorithm.

Metropolis has chosen π (x) ∝ exp (–E (x)/kT), for which:

α (x’|x) = { 1                ΔE := E (x’) – E (x) ≤ 0
           { exp (–ΔE/kT)     ΔE > 0
190
SOLO
Metropolis Algorithm
Generating Continuous Random Variables
Review of Probability
[Flowchart of the Metropolis iteration]

Start: t = 0, x0

1. Generate a candidate xnew from the probability density function Q (xnew|x(t)) = Q (x(t)|xnew).
2. Generate a random number U (uniform distributed on (0,1)).
3. If U ≤ α (xnew|x(t)) = min { 1, π (xnew)/π (x(t)) }:
     Yes → x(t+1) = xnew
     No  → x(t+1) = x(t)
4. t := t + 1; return to 1.
Return to Table of Content
191
SOLO
Gibbs Sampling
Generating Discrete Random Variables
Review of Probability
Stuart Geman, Brown University
Donald Geman, Johns Hopkins University
Josiah Willard Gibbs 1839 – 1903
In mathematics and physics, Gibbs sampling is an algorithm to generate a sequence of samples from the joint probability distribution of two or more random variables. The purpose of such a sequence is to approximate the joint distribution, or to compute an integral (such as an expected value). Gibbs sampling is a special case of the Metropolis-Hastings algorithm, and thus an example of a Markov chain Monte Carlo algorithm. The algorithm is named after the physicist J. W. Gibbs, in reference to an analogy between the sampling algorithm and statistical physics. The algorithm was devised by Stuart Geman and Donald Geman, some eight decades after the passing of Gibbs, and is also called the Gibbs sampler.
Geman, S. and Geman, D., “Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images”, IEEE Transactions on Pattern Analysis and Machine Intelligence, 1984, 6, pp. 721 – 741
192
SOLO
Gibbs Sampling (continue – 1)
Generating Discrete Random Variables
Review of Probability
Suppose that x = (x1, x2, …, xk) is k ( ≥2 ) dimensional.

Gibbs sampler uses what are called the full (or complete) conditional distributions:

π (xj|x1, …, xj-1, xj+1, …, xk) = π (x1, …, xj, …, xk) / π (x1, …, xj-1, xj+1, …, xk)
                               = π (x1, …, xj, …, xk) / ∫ π (x1, …, xj, …, xk) d xj     (Bayes)

The Gibbs sampler samples one variable in turn:

X1(t+1) ~ π (x1|x2(t), x3(t), …, xk(t))
X2(t+1) ~ π (x2|x1(t+1), x3(t), …, xk(t))
X3(t+1) ~ π (x3|x1(t+1), x2(t+1), x4(t), …, xk(t))
…
Xk(t+1) ~ π (xk|x1(t+1), x2(t+1), …, xk-1(t+1))

[Figure: zig-zag path of the Gibbs updates in the (X1, X2) plane through the iterations (0), (1), (2), (3)]

Gibbs sampler always uses the most recent values.
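A minimal sketch of the component-wise scheme above, for an assumed two-dimensional target (a bivariate normal with correlation ρ, chosen here only because its full conditionals are known in closed form; it is not from the slides):

```python
import math
import random

# Illustrative target: bivariate normal, zero means, unit variances,
# correlation rho.  Its full conditionals are
#   x1 | x2 ~ N(rho*x2, 1 - rho^2),   x2 | x1 ~ N(rho*x1, 1 - rho^2)
rho = 0.8
cond_sd = math.sqrt(1.0 - rho**2)

rng = random.Random(7)
x1, x2 = 0.0, 0.0
draws = []
for _ in range(20000):
    x1 = rng.gauss(rho * x2, cond_sd)   # sample x1 from pi(x1 | x2)
    x2 = rng.gauss(rho * x1, cond_sd)   # the most recent x1 is used here
    draws.append((x1, x2))

n = len(draws)
m1 = sum(a for a, _ in draws) / n
m2 = sum(b for _, b in draws) / n
cov = sum((a - m1) * (b - m2) for a, b in draws) / n
v1 = sum((a - m1)**2 for a, _ in draws) / n
v2 = sum((b - m2)**2 for _, b in draws) / n
corr = cov / math.sqrt(v1 * v2)
```

The sample correlation recovers ρ, illustrating that the chain of single-variable draws targets the joint distribution.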
193
SOLO
Gibbs Sampling (continue – 2)
Generating Discrete Random Variables
Review of Probability
Gibbs Sampling is a special case of the Metropolis–Hastings Algorithm. To see this let define the candidate generating density Q (xnew|x(t)) as:

Q (xnew|x(t)) = { Pr (xj_new|x_-j(t))   if x_-j_new = x_-j(t)
              { 0                      otherwise

where x_-j(t) := (x1(t+1), …, xj-1(t+1), xj+1(t), …, xk(t)) denotes all components except xj.

At any moment one variable is drawn: xj_new ~ Pr (xj|x_-j(t)).

The xnew will be xnew = (x1(t+1), …, xj-1(t+1), xj_new, xj+1(t), …, xk(t)).

The acceptance probability α (xnew|x(t)) is:

α (xnew|x(t)) = min { 1, [Q (x(t)|xnew) Pr (xnew)] / [Q (xnew|x(t)) Pr (x(t))] }
             = min { 1, [Pr (xj(t)|x_-j(t)) Pr (xnew)] / [Pr (xj_new|x_-j(t)) Pr (x(t))] }
194
SOLO
Gibbs Sampling (continue – 3)
Generating Discrete Random Variables
Review of Probability
By Bayes:

Pr (xj(t)|x_-j(t)) = Pr (xj(t), x_-j(t)) / Pr (x_-j(t)) = Pr (x(t)) / Pr (x_-j(t))

Pr (xj_new|x_-j(t)) = Pr (xj_new, x_-j(t)) / Pr (x_-j(t)) = Pr (xnew) / Pr (x_-j(t))

since Pr (x(t)) = Pr (xj(t), x_-j(t)) and Pr (xnew) = Pr (xj_new, x_-j(t)).

Therefore:

α (xnew|x(t)) = min { 1, [Pr (x(t)) Pr (xnew)] / [Pr (xnew) Pr (x(t))] } = min { 1, 1 } = 1

Gibbs Sampling always accepts xj_new.

Gibbs Sampling is a special case of the Metropolis–Hastings Algorithm, with the candidate generating density Q (x’|x(t)) above.
195
SOLO
Gibbs Sampling (continue – 4)
Generating Discrete Random Variables
Review of Probability
Return to Table of Content
SOLO Review of Probability
Monte Carlo Integration
Monte Carlo Method can be used to numerically evaluate multidimensional integrals:

I = ∫ g (x̄) d x̄ = ∫ … ∫ g (x1, …, xm) d x1 … d xm

To use Monte Carlo we factorize

g (x̄) = f (x̄) p (x̄)

in such a way that p (x̄) is interpreted as a Probability Density Function:

p (x̄) ≥ 0   &   ∫ p (x̄) d x̄ = 1

We assume that we can draw NS samples x̄^i, i = 1, …, NS from p (x̄):

x̄^i ~ p (x̄),   i = 1, …, NS

Using Monte Carlo we can approximate:

p (x̄) ≈ (1/NS) Σ_{i=1}^{NS} δ (x̄ – x̄^i)

I = ∫ f (x̄) p (x̄) d x̄ ≈ I_NS = (1/NS) Σ_{i=1}^{NS} f (x̄^i)
SOLO Review of Probability
Monte Carlo Integration
We draw NS samples x̄^i, i = 1, …, NS from p (x̄):

I = ∫ f (x̄) p (x̄) d x̄ ≈ I_NS = (1/NS) Σ_{i=1}^{NS} f (x̄^i)

If the samples x̄^i are independent, then I_NS is an unbiased estimate of I.

According to the Law of Large Numbers, I_NS will almost surely converge to I:

I_NS → I (a.s.) as NS → ∞

If the variance of f (x̄) is finite, i.e.

σ_f² := ∫ [f (x̄) – I]² p (x̄) d x̄ < ∞

then the Central Limit Theorem holds and the estimation error converges in distribution to a Normal Distribution:

lim_{NS→∞} √NS (I_NS – I) ~ N (0, σ_f²)

The error of the MC estimate, e = I_NS – I, is of the order of O (NS^{-1/2}), meaning that the rate of convergence of the estimate is independent of the dimension of the integrand.

Return to Table of Content
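A minimal sketch of the estimator I_NS; the factorization is an illustrative choice (not from the slides): with p the standard normal density and f (x) = x², the integral is the second moment of N (0,1), so I = 1.

```python
import random

# Estimate I = ∫ f(x) p(x) dx with p = N(0,1) and f(x) = x^2, so I = 1.
rng = random.Random(42)
n_s = 100000
samples = [rng.gauss(0.0, 1.0) for _ in range(n_s)]   # x_i ~ p(x)
i_ns = sum(x * x for x in samples) / n_s              # (1/Ns) sum f(x_i)

# The error is O(Ns^{-1/2}): here sigma_f = sqrt(2), so about 0.0045.
error = abs(i_ns - 1.0)
```

Quadrupling NS roughly halves the error, regardless of the dimension of the integrand, which is the point of the O (NS^{-1/2}) rate.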
198
SOLO
Random Processes

Random Variable: A variable x determined by the outcome Ω of a random experiment:

x = x (Ω)

Random Process or Stochastic Process: A function of time x determined by the outcome Ω of a random experiment:

x (t) = x (t, Ω)

[Figure: sample functions x (t, Ω1), …, x (t, Ω4) of the ensemble]

This is a family or an ensemble of functions of time, in general different for each outcome Ω.

Mean or Ensemble Average of the Random Process:

x̄ (t) := E { x (t, Ω) } = ∫ ξ p_{x(t)} (ξ) d ξ

Autocorrelation of the Random Process:

R (t1, t2) := E { x (t1, Ω) x (t2, Ω) } = ∫∫ ξ1 ξ2 p_{x(t1), x(t2)} (ξ1, ξ2) d ξ1 d ξ2

Autocovariance of the Random Process:

C (t1, t2) := E { [x (t1) – x̄ (t1)] [x (t2) – x̄ (t2)] } = R (t1, t2) – x̄ (t1) x̄ (t2)
Table of Content
199
SOLO
Stationarity of a Random Process

1. Wide Sense Stationarity of a Random Process:

• Mean Average of the Random Process is time invariant:

x̄ (t) := E { x (t) } = ∫ ξ p_{x(t)} (ξ) d ξ = x̄ = const.

• Autocorrelation of the Random Process is of the form:

R (t1, t2) := E { x (t1) x (t2) } = R (t2 – t1) =: R (τ),   τ := t2 – t1

Since R (t1, t2) = R (t2, t1), we have R (τ) = R (–τ).

Power Spectrum or Power Spectral Density of a Stationary Random Process:

S (ω) := ∫ R (τ) exp (–j ω τ) d τ

2. Strict Sense Stationarity of a Random Process: All probability density functions are time invariant: p_{x(t)} (ξ, t) = p_x (ξ), constant in t.

Ergodicity: A Stationary Random Process for which Time Average = Ensemble Average:

lim_{T→∞} (1/2T) ∫_{–T}^{+T} x (t, Ω) d t = E { x (t, Ω) }
Random Processes
200
SOLO
Ergodicity:

lim_{T→∞} (1/2T) ∫_{–T}^{+T} x (t, Ω) d t = E { x (t, Ω) }

Time Autocorrelation:

For an Ergodic Random Process define:

⟨x (t, Ω) x (t + τ, Ω)⟩ := lim_{T→∞} (1/2T) ∫_{–T}^{+T} x (t, Ω) x (t + τ, Ω) d t = R (τ)

Finite Signal Energy Assumption:

R (0) = lim_{T→∞} (1/2T) ∫_{–T}^{+T} x² (t, Ω) d t < ∞

Define the truncated process:

x_T (t, Ω) := { x (t, Ω)   –T ≤ t ≤ T
             { 0           otherwise

R_T (τ) := (1/2T) ∫ x_T (t, Ω) x_T (t + τ, Ω) d t

Let compute the difference between R_T (τ) and the finite-time average of the full process. Since x_T and x coincide on (–T, T), the difference involves only the edge interval of length |τ|, so:

lim_{T→∞} | R_T (τ) – (1/2T) ∫_{–T}^{+T} x (t, Ω) x (t + τ, Ω) d t | = 0

therefore:

lim_{T→∞} R_T (τ) = R (τ)
Random Processes
201
SOLO
Ergodicity (continue):

Define the finite-time Fourier Transform:

X_T (ω, Ω) := ∫ x_T (t, Ω) exp (–j ω t) d t = ∫_{–T}^{+T} x (t, Ω) exp (–j ω t) d t

where * means complex-conjugate. Let compute:

∫ R_T (τ) exp (–j ω τ) d τ = (1/2T) ∫∫ x_T (t, Ω) x_T (t + τ, Ω) exp (–j ω τ) d t d τ
                           = (1/2T) [∫ x_T (t, Ω) exp (+j ω t) d t] [∫ x_T (v, Ω) exp (–j ω v) d v]
                           = X_T (ω, Ω) X_T* (ω, Ω) / 2T

Define:

S (ω) := lim_{T→∞} E { X_T (ω, Ω) X_T* (ω, Ω) } / 2T = lim_{T→∞} ∫ E { x_T (t, Ω) x_T (t + τ, Ω) } exp (–j ω τ) d τ

Since the Random Process is Ergodic we can use the Wide Sense Stationarity Assumption:

E { x_T (t, Ω) x_T (t + τ, Ω) } = R (τ),   –T ≤ t, t + τ ≤ T

S (ω) := lim_{T→∞} E { X_T (ω, Ω) X_T* (ω, Ω) } / 2T = ∫ R (τ) exp (–j ω τ) d τ
Random Processes
202
SOLO
Ergodicity (continue):

We obtained the Wiener–Khinchine Theorem (Wiener 1930):

S (ω) := lim_{T→∞} E { X_T (ω, Ω) X_T* (ω, Ω) } / 2T = ∫ R (τ) exp (–j ω τ) d τ
Norbert Wiener 1894 – 1964
Alexander Yakovlevich Khinchine 1894 – 1959
The Power Spectrum or Power Spectral Density of a Stationary Random Process S (ω) is the Fourier Transform of the Autocorrelation Function R (τ).
Random Processes
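As a numerical illustration of the theorem, the sketch below (an assumed example, not from the slides) computes the Fourier Transform of the exponential autocorrelation R (τ) = σ² exp (–|τ|/T) by quadrature and compares it with its known Lorentzian spectrum S (ω) = 2 σ² T / (1 + ω² T²):

```python
import math

# Autocorrelation R(tau) = sigma2 * exp(-|tau|/T); its Fourier Transform
# is the Lorentzian S(w) = 2*sigma2*T / (1 + (w*T)^2).
sigma2, T = 2.0, 0.5

def S_numeric(w, tau_max=20.0, n=100000):
    # S(w) = ∫ R(tau) e^{-j w tau} dtau = 2 ∫_0^inf R(tau) cos(w tau) dtau
    # (R is even), evaluated with the trapezoid rule on [0, tau_max].
    h = tau_max / n
    total = 0.0
    for k in range(n + 1):
        tau = k * h
        weight = 0.5 if k in (0, n) else 1.0
        total += weight * sigma2 * math.exp(-tau / T) * math.cos(w * tau)
    return 2.0 * h * total

w = 3.0
analytic = 2.0 * sigma2 * T / (1.0 + (w * T)**2)
numeric = S_numeric(w)
```

The truncation at tau_max = 20 is harmless here because R (τ) has decayed to e^{–40} of its peak by then.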
203
SOLO
White Noise

Wide Sense Whiteness

A (not necessarily stationary) Random Process whose Autocorrelation is zero for any two different times is called white noise in the wide sense:

R (t1, t2) = E { x (t1) x (t2) } = σ² (t1) δ (t1 – t2),   σ² (t) – instantaneous variance

Strict Sense Whiteness

A (not necessarily stationary) Random Process in which the outcomes for any two different times are independent is called white noise in the strict sense:

p_{x(t1), x(t2)} (ξ1, ξ2; t1, t2) = p_{x(t1)} (ξ1) p_{x(t2)} (ξ2),   t1 ≠ t2

A Stationary White Noise Random Process has the Autocorrelation:

R (τ) = E { x (t) x (t + τ) } = σ² δ (τ)

Note: In general whiteness requires Strict Sense Whiteness. In practice we have only moments (typically up to second order) and thus only Wide Sense Whiteness.
Random Processes
204
SOLO
White Noise

A Stationary White Noise Random Process has the Autocorrelation:

R (τ) = E { x (t) x (t + τ) } = σ² δ (τ)

The Power Spectral Density is given by performing the Fourier Transform of the Autocorrelation:

S (ω) = ∫ R (τ) exp (–j ω τ) d τ = ∫ σ² δ (τ) exp (–j ω τ) d τ = σ²

We can see that the Power Spectral Density contains all frequencies at the same amplitude. This is the reason that it is called White Noise.

The Power of the Noise is defined as: P := R (0) = σ² δ (0)
Random Processes
205
SOLO
Table of Content
Markov Processes

A Markov Process is defined by:

p (x (t), t|x (τ), τ ≤ t1) = p (x (t), t|x (t1), t1),   t ≥ t1

i.e. for the Random Process, the past up to any time t1 is fully summarized by the process at t1.

Andrei Andreevich Markov 1856 – 1922

Examples of Markov Processes:

1. Continuous Dynamic System

d x (t)/d t = f (x, u, v, t)
z (t) = h (x, u, w, t)

2. Discrete Dynamic System

x (k+1) = f_k (x (k), u (k), v (k))
z (k) = h_k (x (k), u (k), w (k))

x – state space vector (n x 1)
u – input vector (m x 1)
v – white input noise vector (n x 1)
z – measurement vector (p x 1)
w – white measurement noise vector (p x 1)
Random Processes
206
SOLO
Table of Content
Markov Processes

Examples of Markov Processes:

3. Continuous Linear Dynamic System

d x (t)/d t = A x (t) + v (t)
z (t) = C x (t)

Using the Fourier Transform we obtain:

Z (ω) = H (ω) V (ω),   H (ω) = C (j ω I – A)^{-1}

Using the Inverse Fourier Transform we obtain:

z (t) = (1/2π) ∫ H (ω) V (ω) exp (j ω t) d ω
      = (1/2π) ∫ H (ω) [∫ v (τ) exp (–j ω τ) d τ] exp (j ω t) d ω
      = ∫ v (τ) [(1/2π) ∫ H (ω) exp (j ω (t – τ)) d ω] d τ     (change of order of integration)
      = ∫ h (t – τ) v (τ) d τ

[Block diagram: v (t) → h (t) → z (t)]
Random Processes
207
SOLO
Table of Content
Markov Processes

Examples of Markov Processes:

3. Continuous Linear Dynamic System

d x (t)/d t = A x (t) + v (t),   z (t) = C x (t),   z (t) = ∫ h (t – τ) v (τ) d τ

[Block diagram: v (t) → h (t) → z (t)]

The Autocorrelation of the output is:

R_zz (τ) = E { z (t) z (t + τ) }
         = E { ∫ h (t – τ1) v (τ1) d τ1 ∫ h (t + τ – τ2) v (τ2) d τ2 }
         = ∫∫ h (t – τ1) h (t + τ – τ2) E { v (τ1) v (τ2) } d τ1 d τ2

With white input noise, E { v (τ1) v (τ2) } = σ_v² δ (τ2 – τ1):

R_zz (τ) = σ_v² ∫ h (λ) h (λ + τ) d λ

R_vv (τ) = E { v (t) v (t + τ) } = σ_v² δ (τ),   S_vv (ω) = ∫ R_vv (τ) exp (–j ω τ) d τ = σ_v²

S_zz (ω) = ∫ R_zz (τ) exp (–j ω τ) d τ = σ_v² ∫∫ h (λ) h (λ + τ) exp (–j ω τ) d λ d τ
         = σ_v² [∫ h (λ) exp (+j ω λ) d λ] [∫ h (μ) exp (–j ω μ) d μ] = σ_v² H (ω) H* (ω)

S_zz (ω) = H (ω) H* (ω) S_vv (ω)
Random Processes
208
SOLO
Table of Content
Markov Processes
Examples of Markov Processes:

4. Continuous Linear Dynamic System

z (t) = ∫ h (t – τ) v (τ) d τ,   R_vv (τ) = E { v (t) v (t + τ) } = σ_v² δ (τ),   S_vv (ω) = σ_v²

[Block diagram: v (t) → H (ω) = K/(1 + j ω/ω_x) → z (t)]

The Power Spectral Density of the output is:

S_zz (ω) = H (ω) H* (ω) S_vv (ω) = K² σ_v² / (1 + ω²/ω_x²)

The Autocorrelation of the output is:

R_zz (τ) = (1/2π) ∫ S_zz (ω) exp (j ω τ) d ω = (1/2π) ∫ [K² σ_v² / (1 + ω²/ω_x²)] exp (j ω τ) d ω

Evaluating by residues, with s = j ω (closing the contour on the side where exp (s τ) decays, for τ > 0 and τ < 0 respectively):

R_zz (τ) = (K² σ_v² ω_x / 2) exp (–ω_x |τ|)

[Figure: the Lorentzian spectrum S_zz (ω), of peak K² σ_v² and half-power frequency ω_x, and the exponential autocorrelation R_zz (τ), of peak K² σ_v² ω_x/2]
Random Processes
209
SOLO
Markov Processes
Examples of Markov Processes:
5. Continuous Linear Dynamic System with Time Variable Coefficients

d x (t)/d t = F (t) x (t) + G (t) w (t)

[Block diagram: w (t) → G (t) → ∫ → x (t), with feedback F (t)]

e_x (t) := x (t) – E { x (t) },   e_w (t) := w (t) – E { w (t) },   E { e_w (t1) e_w^T (t2) } = Q (t1) δ (t2 – t1)

The solution of the Linear System is:

x (t) = Φ (t, t0) x (t0) + ∫_{t0}^{t} Φ (t, τ) G (τ) w (τ) d τ

where:

d Φ (t, t0)/d t = F (t) Φ (t, t0),   Φ (t0, t0) = I,   Φ (t3, t1) = Φ (t3, t2) Φ (t2, t1)

Subtracting the mean equation d E { x (t) }/d t = F (t) E { x (t) } + G (t) E { w (t) }:

e_x (t) = Φ (t, t0) e_x (t0) + ∫_{t0}^{t} Φ (t, τ) G (τ) e_w (τ) d τ
Random Processes
210
SOLO
Markov Processes
Examples of Markov Processes:
5. Continuous Linear Dynamic System with Time Variable Coefficients (continue – 1)

e_x (t) = Φ (t, t0) e_x (t0) + ∫_{t0}^{t} Φ (t, τ) G (τ) e_w (τ) d τ

Define the Variance and the Autocorrelation of the state:

V_x (t) := Var { x (t) } = E { e_x (t) e_x^T (t) },   R_x (t1, t2) := E { e_x (t1) e_x^T (t2) }

Since e_x (t0) and e_w (τ), τ > t0, are uncorrelated:

V_x (t) = Φ (t, t0) V_x (t0) Φ^T (t, t0) + ∫_{t0}^{t} Φ (t, τ) G (τ) Q (τ) G^T (τ) Φ^T (t, τ) d τ
Random Processes
211
SOLO Markov Processes
Examples of Markov Processes:
5. Continuous Linear Dynamic System with Time Variable Coefficients (continue – 2)

[Block diagram: w (t) → G (t) → ∫ → x (t), with feedback F (t)]

e_x (t) = Φ (t, t0) e_x (t0) + ∫_{t0}^{t} Φ (t, τ) G (τ) e_w (τ) d τ

The Autocorrelation of the state is:

R_x (t1, t2) := E { e_x (t1) e_x^T (t2) } = { Φ (t1, t2) V_x (t2),     t1 ≥ t2
                                           { V_x (t1) Φ^T (t2, t1),   t2 ≥ t1

where

V_x (t) = Φ (t, t0) V_x (t0) Φ^T (t, t0) + ∫_{t0}^{t} Φ (t, τ) G (τ) Q (τ) G^T (τ) Φ^T (t, τ) d τ
Random Processes
212
SOLO Markov Processes
Examples of Markov Processes:
5. Continuous Linear Dynamic System with Time Variable Coefficients (continue – 3)

Differentiating

V_x (t) = Φ (t, t0) V_x (t0) Φ^T (t, t0) + ∫_{t0}^{t} Φ (t, τ) G (τ) Q (τ) G^T (τ) Φ^T (t, τ) d τ

and using d Φ (t, t0)/d t = F (t) Φ (t, t0), we obtain the Variance differential equation:

d V_x (t)/d t = F (t) V_x (t) + V_x (t) F^T (t) + G (t) Q (t) G^T (t)
Random Processes
213
SOLO Markov Processes
Examples of Markov Processes:
5. Continuous Linear Dynamic System with Time Variable Coefficients (continue – 4)

In the same way, differentiating the Autocorrelation with respect to each argument gives:

∂ R_x (t1, t2)/∂ t1 = F (t1) R_x (t1, t2),   t1 ≥ t2

∂ R_x (t1, t2)/∂ t2 = R_x (t1, t2) F^T (t2),   t2 ≥ t1

with the boundary condition R_x (t, t) = V_x (t), where V_x (t) satisfies

d V_x (t)/d t = F (t) V_x (t) + V_x (t) F^T (t) + G (t) Q (t) G^T (t)
Random Processes
214
SOLO Markov Processes
Examples of Markov Processes:
6. How to Decide if an Input Noise can be Approximated by a White or a Colored Noise

Given a Continuous Linear System:

d x (t)/d t = F (t) x (t) + G (t) w (t)

[Block diagram: w (t) → G (t) → ∫ → x (t), with feedback F (t)]

we want to decide if w (t) can be approximated by a white noise.

Let start with a first order linear system with white noise input w’ (t):

d w (t)/d t = –(1/T) w (t) + (1/T) w’ (t)

[Block diagram: w’ (t) → H (s) = 1/(1 + T s) → w (t)]

Φ_w (t, t0) = e^{–(t – t0)/T},   E { w’ (t) w’ (t + τ) } = Q δ (τ)

Using d V (t)/d t = F V (t) + V (t) F^T + G Q G^T with F = –1/T, G = 1/T:

d V_ww (t)/d t = –(2/T) V_ww (t) + Q/T²

where V_ww (t) := Var { w (t) }.
Random Processes
215
SOLO Markov Processes
Examples of Markov Processes:
6. How to Decide if an Input Noise can be Approximated by a White or a Colored Noise (continue – 1)

d V_ww (t)/d t = –(2/T) V_ww (t) + Q/T²

V_ww (t) = V_ww (0) e^{–2t/T} + (Q/2T) (1 – e^{–2t/T})

V_ww steady-state = Q/(2T)

[Figure: V_ww (t) consisting of the decaying term V_ww (0) e^{–2t/T} and the rising term (Q/2T)(1 – e^{–2t/T}), settling at Q/(2T) after about t = 5T/2]

For t ≥ 5T/2:   V_ww (t) ≈ V_ww steady-state = Q/(2T), and

R_ww (t, t + τ) ≈ R_ww (τ) = V_ww steady-state e^{–|τ|/T} = (Q/2T) e^{–|τ|/T}

[Block diagram: w’ (t) → H (s) = 1/(1 + T s) → w (t)]
Random Processes
216
SOLO Markov Processes
Examples of Markov Processes:
6. How to Decide if an Input Noise can be Approximated by a White or a Colored Noise (continue – 2)

V_ww (τ) = (Q/2T) e^{–|τ|/T},   V_ww (0) = Q/(2T)

Area = ∫ V_ww (τ) d τ = (Q/2T) ∫ e^{–|τ|/T} d τ = Q

T is the correlation time of the noise w (t) and can be found from V_ww (τ) by taking the time at which V_ww (τ) has dropped to V_ww steady-state/e.

One other way to find T is by taking the double-sided Laplace Transform L2 on τ. For the white noise:

L2 { Q δ (τ) } = Q

and for the colored noise:

Φ_ww (s) = L2 { (Q/2T) e^{–|τ|/T} } = Q / [(1 + T s)(1 – T s)] = H (s) Q H (–s)

Φ_ww (ω) = Q / (1 + ω² T²)

T can be found by taking the frequency ω_{1/2} at which the power spectrum drops to half of its peak value, Q/2; then T = 1/ω_{1/2}.
Random Processes
217
SOLO Markov Processes
Examples of Markov Processes:
[Figure: Lorentzian power spectrum Φ_ww (ω) = Q/(1 + ω² T²), of peak Q and half-power frequency ω_{1/2} = 1/T]

Let return to the original system:

d x (t)/d t = F (t) x (t) + G (t) w (t)

[Block diagram: w (t) → G (t) → ∫ → x (t), with feedback F (t)]

6. How to Decide if an Input Noise can be Approximated by a White or a Colored Noise (continue – 3)

Compute the power spectrum Φ_ww (ω) of w (t), s = j ω, and define Q and T.

If T ≤ (1/5) · (minimum time constant of F) = (1/5) · 1/|maximum eigenvalue of F|,
then w (t) can be approximated by the white noise w’ (t) with

E { w’ (t) w’ (t + τ) } = Q δ (τ)

If T > (1/5) · (minimum time constant of F) = (1/5) · 1/|maximum eigenvalue of F|,
then w (t) can be approximated by a colored noise that can be obtained by passing the predefined white noise w’ (t) through the filter

H (s) = 1/(1 + T s)
Random Processes
218
SOLO Markov Processes
Examples of Markov Processes: 7. Digital Simulation of a Continuous Process

Let start with a first order linear system with white noise input w’ (t):

d w (t)/d t = –(1/T) w (t) + (1/T) w’ (t),   E { w’ (t) w’ (t + τ) } = Q δ (τ)

[Block diagram: w’ (t) → H (s) = 1/(1 + T s) → w (t)]

Φ_w (t, t0) = e^{–(t – t0)/T},   Φ_w (t0, t0) = 1,   d Φ_w (t, t0)/d t = –(1/T) Φ_w (t, t0)

w (t) = e^{–(t – t0)/T} w (t0) + (1/T) ∫_{t0}^{t} e^{–(t – τ)/T} w’ (τ) d τ

Let choose t = (k+1) ΔT and t0 = k ΔT:

w ((k+1) ΔT) = e^{–ΔT/T} w (k ΔT) + (1/T) ∫_{k ΔT}^{(k+1) ΔT} e^{–((k+1) ΔT – τ)/T} w’ (τ) d τ
Random Processes
219
SOLO Markov Processes
Examples of Markov Processes: 7. Digital Simulation of a Continuous Process (continue – 1)

w ((k+1) ΔT) = e^{–ΔT/T} w (k ΔT) + (1/T) ∫_{k ΔT}^{(k+1) ΔT} e^{–((k+1) ΔT – τ)/T} w’ (τ) d τ

Define: β := e^{–ΔT/T}

Using E { w’ (τ1) w’ (τ2) } = Q δ (τ2 – τ1), the variance of the integral term is:

E { [(1/T) ∫_{k ΔT}^{(k+1) ΔT} e^{–((k+1) ΔT – τ)/T} w’ (τ) d τ]² } = (Q/T²) ∫_0^{ΔT} e^{–2s/T} d s = (Q/2T) (1 – β²)

Define the discrete noise sequence w’ (k) such that:

E { w’ (k) w’ (j) } = (Q/2T) δ_kj

Therefore:

w ((k+1) ΔT) = β w (k ΔT) + √(1 – β²) w’ (k)
Random Processes
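The discrete recursion w ((k+1) ΔT) = β w (k ΔT) + √(1 – β²) w’ (k) can be simulated directly; in this sketch the driving sequence w’ (k) is taken as zero-mean Gaussian with an assumed variance σ² (playing the role of Q/2T), so the steady-state variance of w (k) is σ² and the lag-one correlation is β:

```python
import math
import random

# Discrete simulation of a first-order Gauss-Markov process:
#   w[k+1] = beta * w[k] + sqrt(1 - beta^2) * wp[k],  beta = exp(-dT/T)
dT, T = 0.1, 1.0
beta = math.exp(-dT / T)
sigma2 = 4.0                    # assumed variance of the driving noise w'(k)

rng = random.Random(3)
w, ws = 0.0, []
for _ in range(50000):
    w = beta * w + math.sqrt(1.0 - beta**2) * rng.gauss(0.0, math.sqrt(sigma2))
    ws.append(w)

mean = sum(ws) / len(ws)
var = sum((x - mean)**2 for x in ws) / len(ws)
lag1 = sum((ws[i] - mean) * (ws[i + 1] - mean)
           for i in range(len(ws) - 1)) / (len(ws) - 1) / var
```

The simulated variance settles at σ² and the lag-one autocorrelation at β, matching the continuous process sampled at intervals ΔT.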
220
SOLO
Markov Chains
Random Processes
[Figure: three-state chain X1, X2, X3 with transition probabilities p11 = 0.1, p12 = 0.5, p13 = 0.6, p21 = 0.6, p22 = 0.2, p23 = 0.3, p31 = 0.3, p32 = 0.3, p33 = 0.1]

A Markov chain, named after Andrey Markov, is a stochastic process with the Markov property. Having the Markov property means that, given the present state, future states are independent of the past states. In other words, the description of the present state fully captures all the information that could influence the future evolution of the process. Being a stochastic process means that all state transitions are probabilistic.

Andrey Andreevich Markov 1856 – 1922

At each step the system may change its state from the current state to another state (or remain in the same state) according to a probability distribution. The changes of state are called transitions, and the probabilities associated with the various state-changes are called transition probabilities.

Definition of Markov Chains

A Markov chain is a sequence of random variables X1, X2, X3, ... with the Markov property, namely that, given the present state, the future and past states are independent:

Pr (X_{n+1} = x|X_1 = x_1, …, X_n = x_n) = Pr (X_{n+1} = x|X_n = x_n)
221
SOLO
Markov Chains
Random Processes
Properties of Markov Chains
Define the probability of going from state i to state j in m time steps as:

p_ji^(m) = Pr (X_m = j|X_0 = i)

and the single step transition as:

p_ji = Pr (X_1 = j|X_0 = i)

[Figure: three-state chain with p11 = 0.1, p12 = 0.5, p13 = 0.6, p21 = 0.6, p22 = 0.2, p23 = 0.3, p31 = 0.3, p32 = 0.3, p33 = 0.1]

For a time-homogeneous Markov Chain:

p_ji^(m) = Pr (X_{k+m} = j|X_k = i)   for any k

and:

p_ji = Pr (X_{k+1} = j|X_k = i)

so the n-step transition satisfies the Chapman–Kolmogorov equation: for any k such that 0 < k < n,

p_ji^(n) = Σ_{r∈S} p_jr^(n–k) p_ri^(k)
SOLOMarkov Chains
Random Processes
Properties of Markov Chains (continue – 1)
The marginal distribution Pr (Xk = x) is the distribution over states at time k:
iXjXkp kkji |Pr 1
X3 X2
X1
1.011 p
6.021 p
5.012 p
3.032 p
3.023 p
3.031 p
6.013 p
1.033 p 2.022 p
In Matrix form it can be written as:
kN
kK
NNNN
N
N
kN X
X
X
ppp
ppp
ppp
X
X
X
2
1
21
22221
11211
1
2
1
PrPr
where N is the number of states of the Markov Chain.
1.03.03.0
3.02.06.0
6.05.01.0
KProperties of the Transition Matrix K:
10 np ji
11
N
jji np
1
2
For a time-homogeneous Markov Chain:
Srkjr
Srkkkk
rXkp
rXrXjXjX
Pr
Pr|PrPr 11
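The matrix form above can be iterated directly; a minimal sketch with the three-state K from the slide, applying Pr (X^{k+1}) = K Pr (X^k) repeatedly until the marginal distribution stops changing:

```python
# Transition matrix of the three-state example, with K[j][i] = p_ji
# (each column sums to 1), acting on a column of state probabilities.
K = [[0.1, 0.5, 0.6],
     [0.6, 0.2, 0.3],
     [0.3, 0.3, 0.1]]

def step(p):
    # One application of Pr(X^{k+1}) = K Pr(X^k)
    return [sum(K[j][i] * p[i] for i in range(3)) for j in range(3)]

p = [1.0, 0.0, 0.0]              # start surely in state X1
for _ in range(200):
    p = step(p)                  # iterate the marginal forward

stationary = p
# A fixed point of the iteration satisfies K @ stationary = stationary.
residual = max(abs(a - b) for a, b in zip(step(stationary), stationary))
```

Because the non-unit eigenvalues of this K have magnitude below 1, the iterated marginal converges to the stationary distribution regardless of the starting state.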
223
SOLO
Markov Chains
Random Processes
Properties of Markov Chains (continue – 2)
Reducibility

A state j is said to be accessible from a state i (written i → j) if a system started in state i has a non-zero probability chance of transitioning into state j at some point. Formally, state j is accessible from state i if there exists an integer n ≥ 0 such that:

Pr (X_n = j|X_0 = i) = p_ji^(n) > 0

Allowing n to be zero means that every state is defined to be accessible from itself. A state i is said to communicate with state j (written i ↔ j) if both i → j and j → i.

A set of states C is a communicating class if every pair of states in C communicates with each other, and no state in C communicates with any state not in C. It can be shown that communication in this sense is an equivalence relation and thus that communicating classes are the equivalence classes of this relation. A communicating class is closed if the probability of leaving the class is zero, namely that if i is in C but j is not, then j is not accessible from i.

Finally, a Markov chain is said to be irreducible if its state space is a single communicating class; in other words, if it is possible to get to any state from any state.
224
SOLO
Markov Chains
Random Processes
Properties of Markov Chains (continue – 3)
Periodicity

A state i has period k if any return to state i must occur in multiples of k time steps. Formally, the period of a state is defined as:

k := greatest common divisor { n : Pr (X_n = i|X_0 = i) > 0 }

Note that even though a state has period k, it may not be possible to reach the state in k steps. For example, suppose it is possible to return to the state in {6, 8, 10, 12, ...} time steps; then k would be 2, even though 2 does not appear in this list. If k = 1, then the state is said to be aperiodic; otherwise (k > 1), the state is said to be periodic with period k. It can be shown that every state in a communicating class must have the same period.
225
SOLO Review of Probability
Existence Theorems
Existence Theorem 3
Given a function S (ω) = S (–ω) or, equivalently, a positive-defined function R (τ) (R (τ) = R (–τ), and R (0) = max R (τ), for all τ), we can find a stochastic process x (t) having S (ω) as its power spectrum or R (τ) as its autocorrelation.

Proof of Existence Theorem 3

Define

a² := (1/π) ∫ S (ω) d ω   &   f (ω) := S (ω) / (π a²)

Since f (ω) ≥ 0 and ∫ f (ω) d ω = 1, according to Existence Theorem 1, we can find a random variable ω with the even density function f (ω), and probability distribution function

P (ω) := ∫_{–∞}^{ω} f (u) d u

We now form the process x (t) := a cos (ω t + φ), where φ is a random variable uniform distributed in the interval (–π, +π) and independent of ω.
226
SOLO Review of Probability
Existence Theorems
Existence Theorem 3
Proof of Existence Theorem 3 (continue – 1)

Since φ is uniform distributed in the interval (–π, +π) and independent of ω:

E { e^{j φ} } = (1/2π) ∫_{–π}^{+π} e^{j φ} d φ = (e^{j π} – e^{–j π}) / (2π j) = sin π / π = 0

or

E { e^{j φ} } = E { cos φ } + j E { sin φ } = 0   →   E { cos φ } = E { sin φ } = 0

Hence, by independence:

E { x (t) } = a E { cos (ω t) } E { cos φ } – a E { sin (ω t) } E { sin φ } = 0

E { x (t) x (t + τ) } = a² E { cos (ω t + φ) cos (ω (t + τ) + φ) }
                     = (a²/2) E { cos (ω τ) } + (a²/2) E { cos (ω (2t + τ) + 2φ) }
                     = (a²/2) E { cos (ω τ) }

since E { cos 2φ } = E { sin 2φ } = 0 and φ is independent of ω.
227
SOLO Review of Probability
Existence Theorems
Existence Theorem 3
Proof of Existence Theorem 3 (continue – 2)

We have x (t) := a cos (ω t + φ) with

E { x (t) } = 0

E { x (t) x (t + τ) } = (a²/2) E { cos (ω τ) } = (a²/2) ∫ f (ω) cos (ω τ) d ω =: R_x (τ)

Because of those two properties x (t) is wide-sense stationary, with a power spectrum given by the Fourier Transform of R_x (τ). Since R_x (τ) = R_x (–τ):

S_x (ω) = ∫ R_x (τ) [cos (ω τ) – j sin (ω τ)] d τ = ∫ R_x (τ) cos (ω τ) d τ

and, inverting,

R_x (τ) = (1/2π) ∫ S_x (ω) cos (ω τ) d ω

Comparing with R_x (τ) = (a²/2) ∫ f (ω) cos (ω τ) d ω and using the definition f (ω) = S (ω)/(π a²):

S_x (ω) = π a² f (ω) = S (ω)     q.e.d.
228
SOLO Permutation & Combinations
Permutations
Given n objects, that can be arranged in a row, how many different permutations (new orders of the objects) are possible?

[Figure: n objects 1, 2, 3, …, n–1, n arranged in a row]

To count the possible permutations, let start by moving only the first object {1}: inserting {1} in each of the n possible positions gives n different arrangements.

[Figure: the n arrangements obtained by inserting object {1} in each of the n positions]

By moving only the first object {1}, we obtained n permutations.
229
SOLO Permutation & Combinations
Permutations (continue – 1)

Since we obtained all the possible positions of the first object, we will perform the same procedure with the second object {2}, that will change position with all other objects, in each of the n permutations that we obtained before.

For example, from the group 1 we obtain n – 1 new permutations.

[Figure: the n – 1 new arrangements obtained by moving object {2} through the remaining positions]

Since this is true for all permutations (n – 1 new permutations for each of the first n permutations), we obtain a total of n (n – 1) permutations.
230
SOLO Permutation & Combinations
Permutations (continue – 2)

If we perform the same procedure with the third object {3}, which will change position with all other objects, besides those positions with objects {1} and {2} that we already obtained, in each of the n (n – 1) permutations that we obtained before, we will obtain a total of n (n – 1) (n – 2) permutations.

We continue the procedure with the objects {4}, {5}, …, {n}, to obtain finally the total number of permutations of the n objects:

n (n – 1) (n – 2) (n – 3) … 1 = n !

Gamma Function Γ

The gamma function Γ is defined as:

Γ (a) := ∫_0^∞ t^{a–1} exp (–t) d t

If a = n is an integer then, integrating by parts:

Γ (n + 1) = ∫_0^∞ t^n exp (–t) d t = [–t^n exp (–t)]_0^∞ + n ∫_0^∞ t^{n–1} exp (–t) d t = n Γ (n)

Γ (1) = ∫_0^∞ exp (–t) d t = 1

Therefore: Γ (n + 1) = n (n – 1) (n – 2) … 2 · 1 · Γ (1) = n !

Table of Content
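The relation Γ (n + 1) = n ! can be spot-checked with the standard library:

```python
import math

# Verify Gamma(n+1) = n! for small integers.
for n in range(10):
    assert math.isclose(math.gamma(n + 1), math.factorial(n), rel_tol=1e-12)

gamma_6 = math.gamma(6)          # equals 5! = 120 up to rounding
```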
231
SOLO Permutation & Combinations
Combinations

Given k boxes, each box having a maximum capacity (for box i the maximum object capacity is ni).

[Figure: n objects 1, 2, …, n and k boxes of capacities n1, n2, …, nk]

Given also n objects, that must be arranged in the k boxes, each box filled to its maximum capacity:

n1 + n2 + … + nk = n

The order of the objects in a box is not important.

Example: A box with a capacity of three objects in which we arranged the objects {2}, {4}, {7}: all 3! = 6 arrangements of {2}, {4}, {7} in the box are equivalent, i.e. 1 outcome.
232
SOLO Permutation & Combinations
Combinations (continue – 1)

In order to count the different combinations we start with the n ! different arrangements of the n objects.

In each of the n ! arrangements the first n1 objects will go to box no. 1, the next n2 objects in box no. 2, and so on, and the last nk objects in box no. k, and since:

n1 + n2 + … + nk = n

all the objects are in one of the boxes.

[Figure: each arrangement of the n objects split into consecutive groups of n1, n2, …, nk objects]
233
SOLO Permutation & Combinations
Combinations (continue – 2)

But since the order of the objects in the boxes is not important, to obtain the number of different combinations, we must divide the total number of permutations n ! by n1 !, because of box no. 1, as seen in the example below, where we used n1 = 2.

[Figure: arrangements that differ only by the order of the n1 = 2 objects in box no. 1 represent the same combination]

Therefore, since the order of the objects in the boxes is not important, and because box no. 1 can contain only n1 objects, the number of combinations is

n ! / n1 !
234
SOLO Permutation & Combinations
Combinations (continue – 3)

Since the order of the objects in the boxes is not important, to obtain the number of different combinations, we must divide the total number of arrangements n ! by n1 !, because of box no. 1, by n2 !, because of box no. 2, and so on, until nk !, because of box no. k, to obtain

Combinations = n ! / (n1 ! n2 ! … nk !)

to Bernoulli Trials
To Generalized Bernoulli Trials
Table of Content
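A direct implementation of the combination count above (names are illustrative):

```python
from math import factorial

def combinations(n, box_sizes):
    # n! / (n1! n2! ... nk!), valid when the box capacities sum to n.
    # Each intermediate quotient is an integer, so // is exact here.
    assert sum(box_sizes) == n
    result = factorial(n)
    for n_i in box_sizes:
        result //= factorial(n_i)
    return result

# Two boxes of sizes 2 and 3: the binomial coefficient C(5,2) = 10.
print(combinations(5, [2, 3]))   # 10
```

For k = 2 boxes this reduces to the ordinary binomial coefficient, which links this count to the Bernoulli-trials slides.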
235
SOLO Review of Probability
References
[1] W.B. Davenport, Jr., and W.L. Root, “An Introduction to the Theory of Random Signals and Noise”, McGraw-Hill, 1958

[2] A. Papoulis, “Probability, Random Variables and Stochastic Processes”, McGraw-Hill, 1965

[3] K. Sam Shanmugan, and A.M. Breipohl, “Random Signals – Detection, Estimation and Data Analysis”, John Wiley & Sons, 1988

[4] S.M. Ross, “Introduction to Probability Models”, 4th Ed., Academic Press, 1989

[5] S.M. Ross, “A Course in Simulation”, Macmillan & Collier Macmillan Publishers, 1990

[6] R.M. McDonough, and A.W. Whalen, “Detection of Signals in Noise”, 2nd Ed., Academic Press, 1995

[7] Al. Spătaru, “Teoria Transmisiunii Informaţiei – Semnale şi Perturbaţii” (Information Transmission Theory – Signals and Perturbations, in Romanian), Editura Tehnică, Bucureşti, 1965

[8] http://www.york.ac.uk/depts/maths/histstat/people/welcome.htm

[9] http://en.wikipedia.org/wiki/Category:Probability_and_statistics

[10] http://www-groups.dcs.st-and.ac.uk/~history/Biographies
Table of Content
236
SOLO Review of Probability
Integrals Used in Probability
∫_0^1 u^n (1 – u)^m d u = n ! m ! / (n + m + 1) !

∫_0^∞ x exp (–a x²) d x = 1/(2a)

∫_0^∞ x² exp (–a x²) d x = (1/4a) √(π/a)

∫_0^∞ exp (–x²) d x = √π / 2

∫_0^∞ exp (–a x²) d x = (1/2) √(π/a),   a > 0

∫_{–∞}^{+∞} exp (–a x²) d x = √(π/a),   a > 0

∫_0^∞ x^n exp (–x) d x = n !,   n = 0, 1, 2, 3, …

∫_0^∞ x^n exp (–a x) d x = n ! / a^{n+1},   a > 0,   n = 0, 1, 2, 3, …
237
SOLO Review of Probability
Gamma Function
238
SOLO Review of Probability
Incomplete Gamma Function
239
SOLO

Technion – Israeli Institute of Technology
1964 – 1968 BSc EE
1968 – 1971 MSc EE

Israeli Air Force
1970 – 1974

RAFAEL – Israeli Armament Development Authority
1974 – 2013

Stanford University
1983 – 1986 PhD AA
240
SOLO Review of Probability
Ferdinand Georg Frobenius (1849 –1919)
Perron–Frobenius Theorem
In linear algebra, the Perron–Frobenius Theorem, named after Oskar Perron and Georg Frobenius, asserts that a real square matrix with positive entries has a unique largest real eigenvalue and that the corresponding eigenvector has strictly positive components. This theorem has important applications to probability theory (ergodicity of Markov chains) and the theory of dynamical systems (subshifts of finite type).
Oskar Perron (1880 – 1975)
SOLO Review of Probability
Monte Carlo Categories
1. Monte Carlo Calculations
Design various random or pseudo-random number generators.
2. Monte Carlo Sampling
Develop efficient (variance – reduction oriented) sampling techniques for estimation.
3. Monte Carlo Optimization
Optimize some (non-convex, non-differentiable) functions using, to name a few, simulated annealing, dynamic weighting, and genetic algorithms.