Introduction to Monte Carlo Methods
Prof. Nicholas Zabaras
Center for Informatics and Computational Science
https://cics.nd.edu/
University of Notre Dame
Notre Dame, Indiana, USA
Email: [email protected]
URL: https://www.zabaras.com/
October 4, 2018
Contents
Review of the Bayesian inference framework; introducing Monte Carlo simulation; review of the Central Limit Theorem and the Law of Large Numbers; indicator functions; error estimates
Examples showing convergence of the MC simulator; generalization of the MC estimator to high dimensions
Sample representation of the Monte Carlo estimator
Deterministic vs. MC integration; using MC for computing integrals, expectations, and Bayes factors
Following closely:
C. Robert and G. Casella, Monte Carlo Statistical Methods (Ch. 1, 2, 3.1 & 3.2)
J. S. Liu, MC Strategies in Scientific Computing (Chapters 1 & 2)
J-M Marin and C. P. Robert, Bayesian Core (Chapter 2)
Statistical Computing & Monte Carlo Methods, A. Doucet (course notes, 2007)
The Bayesian Model
Let us revisit our Bayesian model:

    \pi(\theta \mid x) = \frac{\pi(\theta)\, f(x \mid \theta)}{\int \pi(\theta)\, f(x \mid \theta)\, d\theta}

In most problems of interest, there is no analytical closed form for the posterior (using conjugate priors is one exception).

Also note that the denominator in Bayes' equation above implies the calculation of the following high-dimensional integral:

    \int \pi(\theta)\, f(x \mid \theta)\, d\theta
The Bayesian Model

Consider the posterior model

    \pi(\theta \mid x) = \frac{\pi(\theta)\, f(x \mid \theta)}{\int \pi(\theta)\, f(x \mid \theta)\, d\theta}

Typical point estimates based on the posterior include the following:

    \mathbb{E}[\theta \mid x] = \int \theta\, \pi(\theta \mid x)\, d\theta, \qquad \mathrm{Var}[\theta \mid x] = \int \left(\theta - \mathbb{E}[\theta \mid x]\right)^2 \pi(\theta \mid x)\, d\theta

We are often interested in the mode of the posterior:

    \theta_{MAP} = \arg\max_{\theta}\, \pi(\theta \mid x)

If \theta = (\theta_1, \theta_2) and \theta_2 is a nuisance parameter, marginal distributions are also often needed:

    \pi(\theta_1 \mid x) = \int \pi(\theta_1, \theta_2 \mid x)\, d\theta_2
The Bayesian Model

Consider the posterior model

    \pi(\theta \mid x) = \frac{\pi(\theta)\, f(x \mid \theta)}{\int \pi(\theta)\, f(x \mid \theta)\, d\theta}

The predictive distribution of Y, Y \sim g(y \mid x), and its mean \mathbb{E}[Y \mid x], are:

    g(y \mid x) = \int f(y \mid \theta)\, \pi(\theta \mid x)\, d\theta, \qquad \mathbb{E}[Y \mid x] = \int\!\!\int y\, f(y \mid \theta)\, \pi(\theta \mid x)\, d\theta\, dy

Similarly, for model selection, we have seen that the posterior takes the form:

    \pi(k, \theta_k \mid x) = \frac{p(k)\, \pi_k(\theta_k)\, f(x \mid \theta_k, k)}{\sum_k p(k) \int \pi_k(\theta_k)\, f(x \mid \theta_k, k)\, d\theta_k}
Monte Carlo Simulation

All of the calculations reviewed above require computing high-dimensional integrals.

Monte Carlo simulation provides the means for effectively calculating these integrals, and for addressing many related problems.

Some historical (early) references on Monte Carlo Methods:
N. Metropolis and S. Ulam, The Monte Carlo Method, J. Amer. Statist. Assoc., Vol. 44, pp. 335-341 (1949)
N. Metropolis, The Beginning of the Monte Carlo Method, Los Alamos Science Special Issue (1987)
Introducing Monte Carlo Simulation

Consider a 2 x 2 square S, as shown in the figure below. Let D be the inscribed circle of radius 1 in the square S.

[Figure: the square S of side 2 with the inscribed circle D of radius 1.]
Introducing Monte Carlo Simulation

Let us consider dropping darts uniformly on the square S. This means that the probability of a dart falling in a subdomain A of S is proportional to the area of A.

Let D = (x, y) be a random variable on S that represents the location where the dart lands. We have:

    P(D \in A) = \frac{\int_A dx\, dy}{\int_S dx\, dy}

Assume N independent drops of the dart on the square S, i.e. D_1, D_2, \ldots, D_N.
Introducing Monte Carlo Simulation

A common-sense estimate of the probability

    P(D \in A) = \frac{\int_A dx\, dy}{\int_S dx\, dy}

would be

    P(D \in A) \simeq \frac{\text{number of times the dart fell in } A}{N},

where N is the total number of dart drops. Can we give a statistical justification of this result?
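Before the formal justification, here is a minimal MATLAB sketch (ours, separate from the implementations linked later in these slides) of the dart experiment, taking A = D, the inscribed circle, so that the hit fraction estimates P(D in A) = pi/4:

% Dart experiment: estimate P(D in A) = pi/4 by N uniform drops on S.
N   = 1e5;
x   = 2*rand(N,1) - 1;           % uniform on [-1,1]
y   = 2*rand(N,1) - 1;
hit = (x.^2 + y.^2) <= 1;        % dart fell inside the inscribed circle
fprintf('P estimate: %.4f,  4*P (pi estimate): %.4f\n', mean(hit), 4*mean(hit));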
Indicator Functions

Let us introduce the indicator function for the event A, i.e. let

    \mathbb{I}_A(x, y) = 1 if the drop point of the dart d = (x, y) \in A, and 0 otherwise.

Let us compute the probability of D \in A:

    P(D \in A) = \frac{\int_S \mathbb{I}_A(x, y)\, dx\, dy}{\int_S dx\, dy} = \frac{1}{4}\int_S \mathbb{I}_A(x, y)\, dx\, dy = \frac{\int_A dx\, dy}{4}

The last equality follows immediately by noticing that

    \int_S \mathbb{I}_A(x, y)\, dx\, dy = \int_A \mathbb{I}_A\, dx\, dy + \int_{S \setminus A} \mathbb{I}_A\, dx\, dy = \int_A 1\, dx\, dy + \int_{S \setminus A} 0\, dx\, dy = \int_A dx\, dy
Indicator Functions

The probability density associated with P is 1/4; this is the density of the uniform distribution on S, denoted U_S.

Let us introduce the random variable V(D) := \mathbb{I}_A(D) := \mathbb{I}_A(X, Y), where X, Y are the random variables representing the Cartesian coordinates of the uniformly distributed point d = (x, y) on S where a dart falls, i.e. D \sim U_S.

With this notation,

    P(D \in A) = \int \mathbb{I}_A(x, y)\, \frac{1}{4}\, dx\, dy = \mathbb{E}_{U_S}(V)
Strong Law of Large Numbers

Let X_i, i = 1, 2, \ldots, N, be independent and identically distributed (i.i.d.) random variables with mean \mathbb{E}(X_i) = \mu and variance V(X_i) = \sigma^2 < \infty. The strong LLN states the following for the sample mean \bar{X}_N:

    \lim_{N \to \infty} \bar{X}_N = \mu \quad \text{almost surely}
Weak Law of Large Numbers

Let X_i, i = 1, 2, \ldots, N, be i.i.d. random variables with mean \mathbb{E}(X_i) = \mu and variance V(X_i) = \sigma^2 < \infty. The weak LLN states that the sample mean

    \bar{X}_N = \frac{1}{N}\sum_{i=1}^N X_i

is a random variable that converges (in probability) to the true mean as N \to \infty, i.e. \bar{X}_N \to \mu. More formally:

    \lim_{N \to \infty} \Pr\left(|\bar{X}_N - \mu| \ge \varepsilon\right) = 0 \quad \forall \varepsilon > 0

Note that

    \mathbb{E}[\bar{X}_N] = \frac{1}{N}\sum_{i=1}^N \mathbb{E}[X_i] = \mu, \qquad \mathrm{Var}[\bar{X}_N] = \frac{1}{N^2}\sum_{i=1}^N \mathrm{Var}[X_i] = \frac{\sigma^2}{N}
The Central Limit Theorem

Let X_i, i = 1, 2, \ldots, N, be independent and identically distributed (i.i.d.) random variables each with expectation \mu and variance \sigma^2. Then for \bar{X}_N = \frac{1}{N}\sum_{i=1}^N X_i:

    \mathbb{E}[\bar{X}_N] = \frac{1}{N}\sum_{i=1}^N \mathbb{E}[X_i] = \frac{1}{N}(N\mu) = \mu, \qquad \mathrm{Var}[\bar{X}_N] = \frac{1}{N^2}\sum_{i=1}^N \mathrm{Var}[X_i] = \frac{1}{N^2}(N\sigma^2) = \frac{\sigma^2}{N}

Under some weak conditions (such as finite variance), the sample mean has a limiting normal distribution:

    \lim_{N \to \infty} \frac{\bar{X}_N - \mu}{\sqrt{\sigma^2/N}} \sim \mathcal{N}(0, 1), \quad \text{or, as } N \to \infty, \quad \bar{X}_N \sim \mathcal{N}\!\left(\mu, \frac{\sigma^2}{N}\right)
Back to Indicator Functions

Let X_i, i = 1, 2, \ldots, N, be indicator functions for an event F, i.e. let

    X_i = 1 if the outcome of experiment i is F, and 0 otherwise.

Then, as a result of the law of large numbers,

    \bar{X}_N = \frac{1}{N}\sum_{i=1}^N X_i \to \Pr(F) \quad \text{as } N \to \infty.

Note that herein we used

    \mathbb{E}[X_i] = 1 \cdot \Pr(F) + 0 \cdot \Pr(\bar{F}) = \Pr(F)
Returning to Our Dart Drop Experiment

Let us introduce the random variables

    V_i := V(D_i) := \mathbb{I}_A(D_i), \quad i = 1, 2, \ldots, N,

associated to the drops D_i, i = 1, 2, \ldots, N, and consider the empirical average of the i.i.d. V_i:

    S_N = \frac{1}{N}\sum_{i=1}^N V_i = \frac{\text{number of drops that fell in } A}{N}

As N \to \infty, the law of large numbers (since D_i \sim U_S) yields

    \lim_{N \to \infty} S_N = \mathbb{E}_{U_S}(V) \quad \text{almost surely},

where we already proved that

    \Pr(D \in A) = \mathbb{E}_{U_S}(V).

As N \to \infty, this justifies our intuitive result introduced earlier.
Mean Square Error

Take A = \mathbb{D}, the inscribed disk. Since

    P(D \in \mathbb{D}) = \frac{1}{4}\int_{\mathbb{D}} dx\, dy = \frac{\pi}{4},

S_N is an unbiased estimator of \pi/4: S_N is a random variable with \mathbb{E}[S_N] = \pi/4.

To characterize the precision of the estimator, we can use the following:

    \mathrm{Var}(S_N) = \mathbb{E}\left[\left(S_N - \frac{\pi}{4}\right)^2\right] = \frac{1}{N^2}\sum_{i=1}^N \mathrm{Var}(V_i) = \frac{\mathrm{Var}(V)}{N} \quad \text{(recall the } V_i\text{'s are i.i.d.)}

Thus the mean square error between S_N and \Pr(D \in \mathbb{D}) = \pi/4 decreases as 1/N. For our problem,

    \mathrm{Var}(V) = P(D \in \mathbb{D}) - P(D \in \mathbb{D})^2 = \frac{\pi}{4} - \left(\frac{\pi}{4}\right)^2 \approx 0.1685.
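A minimal MATLAB sketch (ours, not the linked C++ code) that checks the 1/N decay of the mean square error empirically across repeated realizations of the dart experiment; the sample size and number of realizations are illustrative choices:

% Empirical check of Var(S_N) = Var(V)/N ~ 0.1685/N across R realizations.
R = 100;  N = 1e4;
S = zeros(R,1);
for r = 1:R
    x = 2*rand(N,1) - 1;  y = 2*rand(N,1) - 1;   % uniform drops on the square
    S(r) = mean(x.^2 + y.^2 <= 1);               % S_N for realization r
end
fprintf('empirical Var(S_N): %.2e,  theory: %.2e\n', var(S), 0.1685/N);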
Properties of the Estimator

Applying the central limit theorem (since \mathrm{Var}(V) < \infty):

    \frac{S_N - \pi/4}{\sqrt{\mathrm{Var}(V)/N}} \xrightarrow{d} \mathcal{N}(0, 1)

The probability of the error being larger than 2\sqrt{\mathrm{Var}(V)/N} is then given as:

    \Pr\left(\left|S_N - \frac{\pi}{4}\right| \ge 2\sqrt{\frac{\mathrm{Var}(V)}{N}}\right) \simeq 2\left(1 - \Phi(2)\right) \approx 0.0456,

where \Phi(x) = \frac{1}{2}\left(1 + \mathrm{erf}\left(x/\sqrt{2}\right)\right) is the CDF of \mathcal{N}(0, 1) (see http://en.wikipedia.org/wiki/Standard_normal_table).

One can similarly prove that for any integer N \ge 1 and \varepsilon > 0,

    \Pr\left(\left|S_N - \frac{\pi}{4}\right| \ge \varepsilon\sqrt{\frac{\mathrm{Var}(V)}{N}}\right) \simeq \mathrm{erfc}\left(\frac{\varepsilon}{\sqrt{2}}\right) \le C\, e^{-\varepsilon^2/2},

using \mathrm{erfc}(x) \approx \frac{e^{-x^2}}{x\sqrt{\pi}} for large x (see http://mathworld.wolfram.com/Erf.html).
Coefficient of Variation

The coefficient of variation of our Monte Carlo estimator is given as:

    \mathrm{COV}(S_N) = \frac{\sqrt{\mathrm{Var}(S_N)}}{\mathbb{E}(S_N)} = \frac{\sqrt{\frac{\pi}{4}\left(1 - \frac{\pi}{4}\right)/N}}{\pi/4} \approx \frac{0.5227}{\sqrt{N}}

For different COVs, the number of samples needed is:

    COV = 10%:  N = 28
    COV = 5%:   N = 110
    COV = 1%:   N = 2733

With all of the above error estimates, we conclude that the approximation error varies as O(1/\sqrt{N}).
Properties of the Estimator

An alternative non-asymptotic error estimate, using a Bernstein-type inequality, is:^a

    \Pr\left(\left|S_N - \frac{\pi}{4}\right| \ge \sqrt{\frac{\log(2/\delta)}{2N}}\right) \le \delta \quad \text{for } \delta \in (0, 1],\ N \ge 1.

Using this result, we can show that:

    \Pr\left(\left|S_N - \frac{\pi}{4}\right| \ge \sqrt{\frac{\log(40)}{2N}}\right) \le 0.05 \quad \forall N \ge 1.

Alternatively, for any N \ge 1 and \varepsilon > 0, the equation above can be written as:

    \Pr\left(\left|S_N - \frac{\pi}{4}\right| \ge \varepsilon\right) \le 2\, e^{-2N\varepsilon^2}.

These expressions for the approximation error show that it is inversely proportional to \sqrt{N}.

^a From Statistical Computing & Monte Carlo Methods, A. Doucet (http://people.cs.ubc.ca/~arnaud/stat535/slides7.pdf); see also http://en.wikipedia.org/wiki/Bernstein_inequalities_(probability_theory).
Convergence of the Simulator

[Figure: convergence of 4 S_N as a function of N (one realization).]

See https://www.dropbox.com/s/8xnqtvao8ua4a8u/Dart_experiment_simulator.rar?dl=0 for a C++ implementation (Meanerror.plt).
Convergence of the Simulator

[Figure: convergence of 4 S_N as a function of N (one hundred realizations).]

See https://www.dropbox.com/s/8xnqtvao8ua4a8u/Dart_experiment_simulator.rar?dl=0 for a C++ implementation (Meanerror.plt).
Convergence of the Simulator

[Figure: square-root empirical mean square error of 4 S_N across 100 realizations as a function of N, with \sqrt{\mathrm{Var}(V)/N} (dotted).]

See https://www.dropbox.com/s/8xnqtvao8ua4a8u/Dart_experiment_simulator.rar?dl=0 for a C++ implementation (Var_error.plt).
Convergence of the Estimator

[Figure: Monte Carlo estimation of the area of a unit circle: (a) circles: MC mean averaged over the S = 100 runs; (b) theoretical error bars using the estimator discussed earlier; (c) average of the MC numerical error bars (from the S = 100 MC runs for each N).] A MatLab implementation is given at https://www.dropbox.com/s/6vl1rfq457k78xi/AreaCircle.rar?dl=0.

    MC area estimator of the circle (run i) = \frac{(\text{area of square}) \times (\text{number of samples falling in the circle})}{N} = 4 S_N, our MC estimator of \pi r^2 (r = 1).

    Mean area estimator at each N (circles) = \frac{1}{S}\sum_{i=1}^S \left[\text{MC area estimator of the circle, run } i\right].

    Theoretical area error bars (solid lines) = 4\sqrt{\frac{\pi}{4}\left(1 - \frac{\pi}{4}\right)/N}, the exact standard deviation of the 4 S_N estimator.

    Numerical area error bars (run i) = 4\sqrt{\hat{\sigma}_N^2/N}, with \hat{\sigma}_N^2 = \frac{1}{N-1}\sum_{j=1}^N \left(\mathbb{I}_A(D_j) - S_N\right)^2.

    Sampled area error bars (colored area) = \frac{1}{S}\sum_{i=1}^S \left[\text{numerical area error bars, run } i\right].
Generalization of the Dart Drop Experiment

Consider now the case where \theta \in \mathbb{R}^{n_\theta}, n_\theta \ge 1. We assume a hypercube S \subset \mathbb{R}^{n_\theta} and the inscribed hyperball \mathbb{D}^{n_\theta} \subset \mathbb{R}^{n_\theta}.

We can build the same estimator as in 2D. Our indicator function now becomes \mathbb{I}_{\mathbb{D}^{n_\theta}}.

All of our earlier derivations are still applicable. The rate of convergence of the estimator in the mean square sense is independent of n_\theta and equal to 1/\sqrt{N}.

This is not the case using a deterministic method on a grid of regularly spaced points, where the convergence rate is typically of the form 1/N^{r/n_\theta}, with r related to the smoothness of the contours of \mathbb{D}^{n_\theta}.

Monte Carlo methods are thus attractive when n_\theta is large.
Generalization of the Dart Drop Experiment

Assume you are interested in computing the volume of the hypersphere of radius R = 1 in n_\theta dimensions (http://en.wikipedia.org/wiki/N-sphere):

    \mathrm{vol}\left(\mathbb{D}^{n_\theta}\right) = \frac{\pi^{n_\theta/2}}{\Gamma\left(\frac{n_\theta}{2} + 1\right)} \to 0 \quad \text{as } n_\theta \to \infty

We want to do so using samples from the hypercube S = [-1, 1]^{n_\theta} of volume 2^{n_\theta}.

Using N samples, the variance of our MC estimator is

    \mathrm{Var}(\bar{X}) = \frac{p_{n_\theta}\left(1 - p_{n_\theta}\right)}{N} \approx \frac{p_{n_\theta}}{N} \quad \text{as } n_\theta \to \infty, \qquad \text{with } p_{n_\theta} = \frac{\mathrm{vol}\left(\mathbb{D}^{n_\theta}\right)}{2^{n_\theta}},

where X \sim \mathrm{Bernoulli}\left(p_{n_\theta}\right) (http://en.wikipedia.org/wiki/Bernoulli_distribution).
Monte Carlo and the Curse of Dimensionality

The coefficient of variation is then:

    \mathrm{COV} = \frac{\sqrt{\mathrm{Var}(\bar{X})}}{\mathbb{E}(\bar{X})} = \frac{\sqrt{p_{n_\theta}\left(1 - p_{n_\theta}\right)/N}}{p_{n_\theta}} \approx \frac{1}{\sqrt{N\, p_{n_\theta}}} \quad \text{as } n_\theta \to \infty

To obtain a reasonable relative error, we would need N = 100\, p_{n_\theta}^{-1}. For this choice, \mathrm{COV} \approx 0.1.

For n_\theta = 20:

    N = 100\, p_{20}^{-1} = 100 \cdot 2^{20} \cdot \frac{10!}{\pi^{10}} \approx 4.06 \times 10^{9}

Clearly, a plain Monte Carlo approach is not as helpful in 20 and higher dimensions: you are trying to hit an infinitesimal hypersphere inside a huge volume of the hypercube!
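A short MATLAB sketch (ours, with illustrative dimensions and sample size) of the collapse of the hit probability p_n and the resulting blow-up of the relative error:

% Hit probability and COV of the MC hyperball-volume estimator as the
% dimension n grows, at fixed sample size N. At n = 20 you will
% typically see zero hits with N = 1e5 samples.
N = 1e5;
for n = [2 5 10 20]
    X   = 2*rand(N,n) - 1;                  % N uniform samples in [-1,1]^n
    p   = mean(sum(X.^2, 2) <= 1);          % fraction inside the unit ball
    vol = p * 2^n;                          % volume estimate
    fprintf('n=%2d  p=%.2e  vol=%.4g  COV~%.2f\n', n, p, vol, 1/sqrt(N*max(p,eps)));
end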
Generalization of the Dart Drop Experiment

To obtain a fixed relative precision in the variance, a number of samples that is exponential in the dimension is needed:

    Consider \mathbb{E}(X) = \mu = \gamma^{n_\theta} with \gamma < 1, and \mathrm{Var}(X) = C\mu. Then \mathrm{Var}(\bar{X}_N) = \frac{C\mu}{N}, and holding the relative error \sqrt{\mathrm{Var}(\bar{X}_N)}/\mu fixed requires N \propto \mu^{-1} = \gamma^{-n_\theta}.

So using a plain Monte Carlo method still suffers from the curse of dimensionality.

MC is also not appropriate for modeling rare events.
Generalization

Assume i.i.d. samples \theta^{(i)} \sim \pi, i = 1, 2, \ldots, N.

Now consider any set A \subset \Theta and assume that we are interested in

    \pi(A) = P(\theta \in A) \quad \text{for } \theta \sim \pi.

We naturally choose the following estimator:

    \hat{\pi}(A) \simeq \frac{\text{number of samples in } A}{\text{total number of samples}},

which, by the law of large numbers, is a consistent estimator of \pi(A), since

    \lim_{N \to \infty} \frac{1}{N}\sum_{i=1}^N \mathbb{I}_A\left(\theta^{(i)}\right) = \mathbb{E}_\pi\left(\mathbb{I}_A(\theta)\right) = \pi(A).
Generalization

Now we generalize the idea to tackle the generic problem of estimating

    \mathbb{E}_\pi\left(f(\theta)\right) = \int_\Theta f(\theta)\, \pi(\theta)\, d\theta,

where f: \Theta \to \mathbb{R}^{n_f} and \pi is a probability distribution on \Theta \subset \mathbb{R}^{n_\theta}.

We assume that \mathbb{E}_\pi\left(|f(\theta)|\right) < \infty and that \mathbb{E}_\pi\left(f(\theta)\right) cannot be calculated analytically.
Generalization

To evaluate \mathbb{E}_\pi\left(f(\theta)\right), we consider the unbiased estimator

    S_N(f) = \frac{1}{N}\sum_{i=1}^N f\left(\theta^{(i)}\right).

From the law of large numbers, S_N(f) converges:

    \lim_{N \to \infty} \frac{1}{N}\sum_{i=1}^N f\left(\theta^{(i)}\right) = \mathbb{E}_\pi\left(f(\theta)\right) \quad \text{(a.s.)}

A good measure of the approximation is the variance of S_N(f):

    \mathrm{Var}\left(S_N(f)\right) = \frac{1}{N^2}\sum_{i=1}^N \mathrm{Var}\left(f\left(\theta^{(i)}\right)\right) = \frac{\mathrm{Var}_\pi\left(f(\theta)\right)}{N}.

From the central limit theorem (if \mathrm{Var}_\pi\left(f(\theta)\right) < \infty), we have:

    \sqrt{N}\left(S_N(f) - \mathbb{E}_\pi f\right) \xrightarrow{d} \mathcal{N}\left(0, \mathrm{Var}_\pi\left(f(\theta)\right)\right), \quad \text{or} \quad S_N(f) \xrightarrow{d} \mathcal{N}\left(\mathbb{E}_\pi f, \frac{\mathrm{Var}_\pi\left(f(\theta)\right)}{N}\right) \text{ as } N \to \infty.
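A small MATLAB sketch (ours) of the generic estimator S_N(f) with its CLT-based error bar, for the illustrative choice \pi = \mathcal{N}(0, 1) and f(\theta) = \theta^2, so that \mathbb{E}_\pi f = 1:

% Generic MC estimator S_N(f) with a CLT error bar.
N  = 1e4;
th = randn(N,1);                 % theta_i ~ pi = N(0,1)
f  = th.^2;                      % f(theta) = theta^2, E_pi[f] = 1
S  = mean(f);                    % S_N(f)
se = sqrt(var(f)/N);             % estimate of sqrt(Var_pi(f)/N)
fprintf('S_N = %.3f +/- %.3f (exact 1)\n', S, 2*se);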
Generalization

The rate of convergence is independent of the dimension of \theta.

Integration in complex domains is now not a problem.

The method is easy to implement and rather general. You will need:
    to be able to evaluate f(\theta);
    to be able to produce samples \theta^{(i)} distributed according to \pi.
Sample Representation of the MC Estimator

Let us introduce the Dirac delta function \delta_{\theta_0}(\theta) for \theta_0 \in \Theta, defined for any f: \Theta \to \mathbb{R}^{n_f} as follows:

    \int_\Theta f(\theta)\, \delta_{\theta_0}(\theta)\, d\theta = f(\theta_0).

Note that this implies in particular that, for A \subset \Theta,

    \int_A \delta_{\theta_0}(\theta)\, d\theta = \mathbb{I}_A(\theta_0) = 1 \text{ if } \theta_0 \in A, \text{ and } 0 \text{ otherwise}.

For \theta^{(i)} \sim \pi, i = 1, \ldots, N, we can introduce the following mixture of Dirac delta functions:

    \hat{\pi}_N(\theta) := \frac{1}{N}\sum_{i=1}^N \delta_{\theta^{(i)}}(\theta),

which is the empirical measure, and consider for any A:

    \hat{\pi}_N(A) \triangleq \int_A \hat{\pi}_N(\theta)\, d\theta = \sum_{i=1}^N \int_A \frac{1}{N}\delta_{\theta^{(i)}}(\theta)\, d\theta = \frac{1}{N}\sum_{i=1}^N \mathbb{I}_A\left(\theta^{(i)}\right) = S_N(A) \quad \text{(number of samples in } A\text{, over } N\text{)}.
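A small MATLAB sketch (ours) of the empirical measure: for the illustrative target \pi = \mathcal{N}(0, 1) and A = [0, 1], the fraction of samples in A converges to \pi(A) = \Phi(1) - \Phi(0) \approx 0.3413:

% Empirical measure pi_N(A) for pi = N(0,1) and A = [0,1].
N  = 1e5;
th = randn(N,1);                        % theta_i ~ N(0,1)
piN_A = mean(th >= 0 & th <= 1);        % (1/N) sum_i I_A(theta_i)
fprintf('pi_N(A) = %.4f, exact = %.4f\n', piN_A, 0.3413);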
Sample Representation of \pi(\theta)

The concentration of the samples \theta^{(i)} in a given region of the space represents \pi(\theta).

This is in contrast with parametric statistics, where one starts with samples and then introduces a distribution with an algebraic representation of the underlying population.

Note that each sample has a weight of 1/N, but it is also possible to consider weighted sample representations of \pi(\theta).
Sample Representation of \pi(\theta)

[Figure: sample representation of a Gaussian distribution.]

See https://www.dropbox.com/s/tak5ny5evo3jqeh/GaussianSampleInterpretation.m?dl=0 for a MatLab implementation (GaussianSampleInterpretation.m).
Deterministic vs. Monte Carlo Integration

[Figure: integration points placed on a regular grid (Riemann integration) vs. placed randomly (MC integration) under \pi(x).]
Sample Representation of the MC Estimator

Now consider the problem of estimating \mathbb{E}_\pi(f). We simply replace \pi with its sample representation \hat{\pi}_N(\theta) and obtain:

    \mathbb{E}_\pi(f) \simeq \int_\Theta f(\theta)\, \hat{\pi}_N(\theta)\, d\theta = \sum_{i=1}^N \int_\Theta \frac{1}{N} f(\theta)\, \delta_{\theta^{(i)}}(\theta)\, d\theta = \frac{1}{N}\sum_{i=1}^N f\left(\theta^{(i)}\right).

This is precisely S_N(f), the Monte Carlo estimator suggested earlier.

Clearly, based on \hat{\pi}_N(\theta), we can easily estimate \mathbb{E}_\pi(f) for any f. For example, the variance of f is approximated as:

    \mathrm{Var}_\pi(f) = \mathbb{E}_\pi\left(f^2\right) - \mathbb{E}_\pi^2(f) \simeq \frac{1}{N}\sum_{i=1}^N f^2\left(\theta^{(i)}\right) - \left(\frac{1}{N}\sum_{i=1}^N f\left(\theta^{(i)}\right)\right)^2.
From the Algebraic to the Sample Representation

Similarly, if \left(\theta_1^{(i)}, \theta_2^{(i)}\right) \sim \pi(\theta_1, \theta_2), we have:

    \hat{\pi}_N(\theta_1, \theta_2) = \frac{1}{N}\sum_{i=1}^N \delta_{\left(\theta_1^{(i)}, \theta_2^{(i)}\right)}(\theta_1, \theta_2).

The marginal distribution is given as:

    \hat{\pi}_N(\theta_1) = \frac{1}{N}\sum_{i=1}^N \delta_{\theta_1^{(i)}}(\theta_1).

If we want to estimate \theta_{MAP} = \arg\max_\theta \pi(\theta) and \pi is only known up to a normalizing constant, then a reasonable estimate is the following:

    \hat{\theta}_{MAP} = \arg\max_{\theta^{(i)},\ i = 1, \ldots, N} \pi\left(\theta^{(i)}\right).
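A minimal MATLAB sketch (ours) of these sample-based operations, using an illustrative bivariate Gaussian target known only through its unnormalized log-density:

% Marginal and MAP estimates from samples of a bivariate target.
N    = 1e4;
th   = randn(N,2);                         % (theta1, theta2) ~ N(0, I)
logp = @(t) -0.5*sum(t.^2, 2);             % unnormalized log pi (constant dropped)
histogram(th(:,1), 50);                    % sample representation of the theta1 marginal
[~, imax] = max(logp(th));                 % argmax of pi over the samples
theta_map = th(imax, :);                   % MAP estimate (near [0 0] here)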
Sampling from an Arbitrary Distribution

We can now see that if we can sample easily from an arbitrary distribution, then we can easily compute any quantities of interest.

But how do we sample from an arbitrary distribution?

We will discuss MCMC and other methods in follow-up lectures.
Monte Carlo Integration

We can use the same estimation for performing integration. Consider computing the following:

    I = \int_D f(x)\, dx.

We can write this integration as I = |D|\, \mathbb{E}_\pi\left[f(x)\right], where the mean is with respect to the uniform distribution on D:

    \pi(x) = \frac{1}{|D|} \text{ if } x \in D, \text{ and } 0 \text{ otherwise}.

To compute the integral I, we thus need to draw samples x_i from this distribution. Our estimator will then be:

    \hat{I} = \frac{|D|}{n}\sum_{i=1}^n f(x_i),

where the factor |D| = \int_D dx accounts for the volume of D. The estimator converges as O(1/\sqrt{N}).
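A brief MATLAB sketch (ours) of this estimator for an illustrative integrand over the unit disk D, drawing uniform samples on D by rejection from the bounding square, with the |D| factor made explicit:

% I = int_D (x^2 + y^2) dx dy over the unit disk (exact value: pi/2).
N = 1e5;
P = 2*rand(4*N,2) - 1;                 % oversample the bounding square
P = P(sum(P.^2,2) <= 1, :);            % keep points: uniform on the disk D
f = sum(P.^2, 2);                      % integrand at the samples
I = pi * mean(f);                      % |D| = pi for the unit disk
fprintf('I = %.4f (exact %.4f)\n', I, pi/2);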
Monte Carlo Integration

We can extend this to high (d) dimensions:

    I = \int_D f(\mathbf{x})\, d\mathbf{x}, \quad \mathbf{x} \in \mathbb{R}^d.

The estimator converges as O(1/\sqrt{N}) regardless of the dimensionality d.

For a Riemann integration of this, the rate of convergence is better, O(1/N), where a grid of equally spaced points is used (e.g. for D = [0, 1], \Delta x = 1/N).

However, in, say, ten dimensions, D = [0, 1]^{10}, you will need O(N^{10}) grid points to achieve the same rate of convergence as in 1D!

Generally, the rate of convergence of the Riemann approximation of the above integral is O(N^{-r/d}), where N is the total number of grid points and r depends on the smoothness of the domain D.

J. S. Liu, MC Strategies in Scientific Computing (Chapter 2, Introduction)
Monte Carlo for Integration

[Figure: Riemann integration on a regular grid vs. MC integration at random points.]

In computing I = \int_D f(x)\, dx, D = [0, 1], with Riemann integration, one constructs a grid with \Delta x = 1/N, computes f(x_i) = f(i \Delta x), and, with approximation error O(1/N), evaluates:

    I_N = \sum_{i=1}^N f(x_i)\, \Delta x = \frac{1}{N}\sum_{i=1}^N f(x_i).

In d dimensions, we need N^d evaluations to keep an error O(1/N).

Clearly, the MC estimator is slower in convergence, O(1/\sqrt{N}), but it converges independently of the dimension d, and it requires no grid of points. But you need to be able to sample from p(x).
Monte Carlo Integration

Consider calculating, for any function f(x), the following integral:

    \mu(f) = \int_0^1 f(x)\, dx.

We can write this integral as:

    \mu(f) = \int_{-\infty}^{+\infty} f(x)\, \mathbb{I}_{[0,1]}(x)\, dx = \mathbb{E}\left[f(X)\right].

Here X \sim \mathcal{U}[0, 1]. The Monte Carlo estimator of the integral is now:

    \hat{\mu}_N(f) = \frac{1}{N}\sum_{i=1}^N f(X_i), \quad X_i \overset{i.i.d.}{\sim} \mathcal{U}[0, 1].
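A minimal MATLAB sketch (ours, standing in for the linked MonteCarloEx1.m, which we have not reproduced verbatim) estimating \mu(f) for the three test functions shown on the next slide:

% MC integration on [0,1]: exact means are 0.5, 1/3 and 0.
N = 1e4;
X = rand(N,1);                             % X_i ~ U[0,1]
fprintf('f=x:         %.4f (exact 0.5000)\n', mean(X));
fprintf('f=x^2:       %.4f (exact 0.3333)\n', mean(X.^2));
fprintf('f=cos(pi*x): %.4f (exact 0.0000)\n', mean(cos(pi*X)));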
Monte Carlo Integration

[Figure: empirical average vs. theoretical mean as a function of N for f(x) = x (\mu = 0.5), f(x) = x^2 (\mu = 0.333), and f(x) = \cos(\pi x) (\mu = 0).]

MatLab implementation: https://www.dropbox.com/s/kp98e697plbkkia/MonteCarloEx1.m?dl=0
Monte Carlo Integration: Variance

From the earlier discussed properties of the MC estimator:

    \mathbb{E}\left[\hat{\mu}_N\right] = \mu, \qquad \mathrm{Var}\left[\hat{\mu}_N\right] = \frac{1}{N}\mathrm{Var}\left(f(X)\right) = \frac{1}{N}\sigma^2(f), \quad X \sim \mathcal{U}[0, 1],

where:

    \sigma^2(f) = \mathrm{Var}\left(f(X)\right) = \int \left(f(x) - \mu(f)\right)^2 \mathbb{I}_{[0,1]}(x)\, dx.

When the variance is unknown, we can use its MC estimator instead:

    \hat{\sigma}_N^2(f) = \frac{1}{N-1}\sum_{i=1}^N \left(f(X_i) - \hat{\mu}_N(f)\right)^2, \qquad \hat{\mu}_N(f) = \frac{1}{N}\sum_{j=1}^N f(X_j), \quad X_i \overset{i.i.d.}{\sim} \mathcal{U}[0, 1].
Monte Carlo Integration: Empirical Variance

[Figure: empirical variance \hat{\sigma}_N^2(f) vs. theoretical variance as a function of N for f(x) = x (\mathrm{Var}(f(X)) = 1/12), f(x) = x^2 (\mathrm{Var}(f(X)) = 4/45), and f(x) = \cos(\pi x) (\mathrm{Var}(f(X)) = 1/2), with \hat{\sigma}_N^2(f) = \frac{1}{N-1}\sum_{i=1}^N \left(f(X_i) - \hat{\mu}_N(f)\right)^2, X_i \overset{i.i.d.}{\sim} \mathcal{U}[0, 1].]

MatLab implementation: https://www.dropbox.com/s/kp98e697plbkkia/MonteCarloEx1.m?dl=0
Optimal Number of MC Samples

We can compute the number of MC samples needed for a desired accuracy of the estimator:

    \Pr\left(\left|\hat{\mu}_N(f) - \mu(f)\right| \le \varepsilon\right) = 1 - \alpha \;\Rightarrow\; \Pr\left(\frac{\left|\hat{\mu}_N(f) - \mu(f)\right|}{\sigma/\sqrt{N}} \le \frac{\varepsilon\sqrt{N}}{\sigma}\right) = 1 - \alpha.

From the asymptotic properties

    \sqrt{N}\left(\hat{\mu}_N - \mu\right) \xrightarrow[N \to \infty]{D} \mathcal{N}\left(0, \sigma^2(f)\right),

we can estimate:

    N(\varepsilon) = \frac{X_\alpha^2\, \sigma^2(f)}{\varepsilon^2},

where X_\alpha = \Phi^{-1}\left(1 - \frac{\alpha}{2}\right), with \Phi^{-1} the inverse CDF of \mathcal{N}(0, 1) and \alpha \in [0, 1].

Approximating the variance \sigma^2(f) by \hat{\sigma}_N^2(f), the number N of samples needed should satisfy:

    \hat{\sigma}_N^2(f) \le \frac{N \varepsilon^2}{X_\alpha^2}.
Optimal Number of MC Samples

The condition \hat{\sigma}_N^2(f) \le N\varepsilon^2 / X_\alpha^2 can be satisfied iteratively (see the sketch after this list):

1. Start with n_1 MC samples X_1, \ldots, X_{n_1} \sim \mathcal{U}[0, 1].
2. If \hat{\sigma}_{n_1}^2(f) \le n_1 \varepsilon^2 / X_\alpha^2, then stop; otherwise
3. Evaluate k_1 = \left\lfloor X_\alpha^2\, \hat{\sigma}_{n_1}^2(f) / \varepsilon^2 \right\rfloor - n_1 and generate k_1 additional samples X_{n_1 + 1}, \ldots, X_{n_1 + k_1} \sim \mathcal{U}[0, 1], where \lfloor x \rfloor indicates the integer part of x.
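A short MATLAB sketch (ours) of this iterative rule, with the illustrative choices f(x) = x^2, \varepsilon = 0.01, \alpha = 0.05 and n_1 = 100:

% Iterative choice of the MC sample size (steps 1-3 above).
f    = @(x) x.^2;
eps0 = 0.01;
Xa   = 1.96;                                   % Phi^{-1}(1 - 0.05/2)
X    = rand(100,1);                            % step 1: n1 initial samples
while var(f(X)) > numel(X)*eps0^2/Xa^2         % step 2: stopping test
    k = max(floor(Xa^2*var(f(X))/eps0^2) - numel(X), 1);   % step 3
    X = [X; rand(k,1)];                        % k additional U[0,1] samples
end
fprintf('N = %d, mu_hat = %.4f\n', numel(X), mean(f(X)));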
Computing Expectations

Expectations with respect to any distribution also involve high-dimensional integration:

    I = \int_D f(\mathbf{x})\, \pi(\mathbf{x})\, d\mathbf{x}.

If we can draw samples x_i \sim \pi(\mathbf{x}), then we can use the estimator

    \hat{I} = \frac{1}{N}\sum_{i=1}^N f(x_i), \qquad \text{with } \mathrm{Coeff.\ of\ Variat.}(\hat{I}) = \frac{\sqrt{\mathrm{Var}_\pi\left(f(x)\right)}}{\sqrt{N}\, \mathbb{E}_\pi\left(f(x)\right)}.
Bayes Factor Approximation

For the CMBdata (https://www.dropbox.com/s/7w9peyrlpwcdv4m/CMBdata?dl=0), consider two subsamples x_1, \ldots, x_n and y_1, \ldots, y_n.

We consider that both come from normal distributions:

    x_1, \ldots, x_n \sim \mathcal{N}\left(\mu_x, \sigma^2\right), \qquad y_1, \ldots, y_n \sim \mathcal{N}\left(\mu_y, \sigma^2\right).

We want to decide if both means are the same, i.e. test H_0: \mu_x = \mu_y.

We assume that the variance (error) \sigma^2 is the same for both models and use the prior \pi(\sigma^2) = \sigma^{-2}.

The Bayes factor can then be computed as follows:

    B_{10}^\pi = \frac{\int \ell\left(\mu_x, \mu_y, \sigma^2 \mid \mathcal{D}\right) \pi\left(\mu_x, \mu_y, \sigma^2\right) d\mu_x\, d\mu_y\, d\sigma^2}{\int \ell\left(\mu, \sigma^2 \mid \mathcal{D}\right) \pi\left(\mu, \sigma^2\right) d\mu\, d\sigma^2}.

Note that the Bayes factor does not depend on the normalizing constant of the prior on the common parameters (the same prior appears in both numerator and denominator), and thus the use of an improper prior is fine.

J-M Marin and C. P. Robert, Bayesian Core (Chapter 2)
Bayes Factor Approximation

We introduce a new parametrization by writing:

    \mu_x = \mu + \xi, \qquad \mu_y = \mu - \xi,

with the prior \xi \sim \mathcal{N}(0, 1). This allows the same prior to be used under both H_0 and the alternative H_1: \mu_x \ne \mu_y.

We choose the improper prior \pi\left(\mu, \sigma^2\right) = 1/\sigma^2.

The Bayes factor can now be re-written as:

    B_{10}^\pi = \frac{\int \exp\left(-\frac{n(\mu + \xi - \bar{x})^2 + s_x^2}{2\sigma^2}\right) \exp\left(-\frac{n(\mu - \xi - \bar{y})^2 + s_y^2}{2\sigma^2}\right) \left(\sigma^2\right)^{-n-1} \frac{1}{\sqrt{2\pi}}\, e^{-\xi^2/2}\, d\mu\, d\xi\, d\sigma^2}{\int \exp\left(-\frac{n(\mu - \bar{x})^2 + s_x^2}{2\sigma^2}\right) \exp\left(-\frac{n(\mu - \bar{y})^2 + s_y^2}{2\sigma^2}\right) \left(\sigma^2\right)^{-n-1} d\mu\, d\sigma^2}.
Bayes Factor Approximation

To simplify the Bayes factor, let us define:

    S^2 = \frac{s_x^2}{n} + \frac{s_y^2}{n} = \frac{\sum_{i=1}^n (x_i - \bar{x})^2 + \sum_{i=1}^n (y_i - \bar{y})^2}{n}.

We can then write:

    B_{10}^\pi = \frac{\int \left(\sigma^2\right)^{-n-1} \exp\left(-\frac{n\left[(\mu + \xi - \bar{x})^2 + (\mu - \xi - \bar{y})^2 + S^2\right]}{2\sigma^2}\right) \frac{1}{\sqrt{2\pi}}\, e^{-\xi^2/2}\, d\mu\, d\xi\, d\sigma^2}{\int \left(\sigma^2\right)^{-n-1} \exp\left(-\frac{n\left[(\mu - \bar{x})^2 + (\mu - \bar{y})^2 + S^2\right]}{2\sigma^2}\right) d\mu\, d\sigma^2}.

Performing the integrals in \sigma^2:

    B_{10}^\pi = \frac{\int \left[(\mu + \xi - \bar{x})^2 + (\mu - \xi - \bar{y})^2 + S^2\right]^{-n} \frac{1}{\sqrt{2\pi}}\, e^{-\xi^2/2}\, d\mu\, d\xi}{\int \left[(\mu - \bar{x})^2 + (\mu - \bar{y})^2 + S^2\right]^{-n} d\mu}.

Let us now integrate in \mu. Starting with the denominator, note that:

    (\mu - \bar{x})^2 + (\mu - \bar{y})^2 = 2\left(\mu - \frac{\bar{x} + \bar{y}}{2}\right)^2 + \frac{(\bar{x} - \bar{y})^2}{2}.
Bayes Factor Approximation

So the denominator takes the form:

    \int \left[2\left(\mu - \frac{\bar{x} + \bar{y}}{2}\right)^2 + \frac{(\bar{x} - \bar{y})^2}{2} + S^2\right]^{-n} d\mu.

With the definition \nu = 2n - 1 and using the normalizing constant of the t-distribution (http://en.wikipedia.org/wiki/Student's_t-distribution), an integral of the form \int \left[2(\mu - m)^2 + c\right]^{-n} d\mu evaluates to c^{-n + 1/2} times a constant that is common to the numerator and the denominator of B_{10}^\pi. The numerator is handled the same way, since

    (\mu + \xi - \bar{x})^2 + (\mu - \xi - \bar{y})^2 = 2\left(\mu - \frac{\bar{x} + \bar{y}}{2}\right)^2 + \frac{(\bar{x} - \bar{y} - 2\xi)^2}{2}.

The common factor cancels, and we can simplify to:

    B_{10}^\pi = \frac{\frac{1}{\sqrt{2\pi}} \int \left[\frac{(\bar{x} - \bar{y} - 2\xi)^2}{2} + S^2\right]^{-n + 1/2} e^{-\xi^2/2}\, d\xi}{\left[\frac{(\bar{x} - \bar{y})^2}{2} + S^2\right]^{-n + 1/2}}.
Bayes Factor Approximation

Thus, for the normal case with

    x_1, \ldots, x_n \sim \mathcal{N}\left(\mu + \xi, \sigma^2\right), \qquad y_1, \ldots, y_n \sim \mathcal{N}\left(\mu - \xi, \sigma^2\right), \qquad H_0: \xi = 0,

under the prior \pi\left(\mu, \sigma^2\right) = 1/\sigma^2 and \xi \sim \mathcal{N}(0, 1):

    B_{01}^\pi = \frac{\left[(\bar{x} - \bar{y})^2/2 + S^2\right]^{-n + 1/2}}{\frac{1}{\sqrt{2\pi}} \int \left[(\bar{x} - \bar{y} - 2\xi)^2/2 + S^2\right]^{-n + 1/2} e^{-\xi^2/2}\, d\xi}.

For the CMBdata, we simulate \xi_1, \ldots, \xi_{1000} \sim \mathcal{N}(0, 1) and approximate B_{01}^\pi with (for one simulation that we ran):

    \hat{B}_{01}^\pi = \frac{\left[(\bar{x} - \bar{y})^2 + 2S^2\right]^{-n + 1/2}}{\frac{1}{1000}\sum_{i=1}^{1000} \left[(2\xi_i + \bar{x} - \bar{y})^2 + 2S^2\right]^{-n + 1/2}} = 43.3309

when \bar{x} = 0.0888, \bar{y} = 0.1078, S^2 = 0.00875, n = 100. Thus H_0 is much more likely given the available data.
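A compact MATLAB sketch (ours) of this Monte Carlo approximation, using the summary statistics quoted above:

% MC approximation of B01 with the quoted summary statistics.
xb = 0.0888;  yb = 0.1078;  S2 = 0.00875;  n = 100;
xi  = randn(1000,1);                                   % xi_i ~ N(0,1)
num = ((xb - yb)^2 + 2*S2)^(-n + 0.5);
den = mean(((2*xi + xb - yb).^2 + 2*S2).^(-n + 0.5));
B01 = num/den    % compare with the value 43.3309 quoted above (varies by run)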
Precision Evaluation in Bayes Factor

For an estimator of I = \int h(\mathbf{x})\, \pi(\mathbf{x})\, d\mathbf{x} with samples x_i \sim \pi(\mathbf{x}), the variance can be computed (in terms of the empirical mean \bar{h}_N) as

    v_N = \frac{1}{N}\, \frac{1}{N - 1}\sum_{j=1}^N \left(h(x_j) - \bar{h}_N\right)^2,

and for N large,

    \frac{\bar{h}_N - \mathbb{E}_f\left[h(X)\right]}{\sqrt{v_N}} \sim \mathcal{N}(0, 1).

We can thus quantify the variability in the Bayes factor estimation, and construct a convergence test and confidence bounds on the approximation of B_{01}^\pi.

[Figure: histogram of 1000 realizations of the approximation of B_{01}^\pi, each based on 1000 simulations.]

MatLab implementation: https://www.dropbox.com/s/noh9vipbkl9pfrk/PrecisionEvaluation.m?dl=0
Example (Cauchy-Normal)

For estimating a normal mean, a robust prior is a Cauchy prior (https://en.wikipedia.org/wiki/Cauchy_distribution):

    x \sim \mathcal{N}(\theta, 1), \qquad \theta \sim \mathcal{C}(0, 1).

Under squared error loss, the posterior mean is given as:

    \delta^\pi(x) = \frac{\int_{-\infty}^{\infty} \frac{\theta}{1 + \theta^2}\, e^{-(x - \theta)^2/2}\, d\theta}{\int_{-\infty}^{\infty} \frac{1}{1 + \theta^2}\, e^{-(x - \theta)^2/2}\, d\theta}.

The form of \delta^\pi suggests simulating i.i.d. variables \theta_1, \ldots, \theta_m \sim \mathcal{N}(x, 1) and calculating:

    \delta_m^\pi(x) = \frac{\sum_{i=1}^m \frac{\theta_i}{1 + \theta_i^2}}{\sum_{i=1}^m \frac{1}{1 + \theta_i^2}}.

The LLN implies \delta_m^\pi(x) \to \delta^\pi(x) as m \to \infty.

[Figure: running estimate \delta_m^\pi(x) as a function of m.]

MatLab implementation: https://www.dropbox.com/s/xxfqlloltlqchek/CauchyPrior.m?dl=0
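A minimal MATLAB sketch (ours, standing in for the linked CauchyPrior.m) of the estimator \delta_m^\pi(x), with x = 10 as an illustrative observation:

% Posterior mean under a Cauchy prior via samples theta_i ~ N(x,1).
x  = 10;                         % observed value (illustrative choice)
m  = 1000;
th = x + randn(m,1);             % theta_i ~ N(x, 1)
w  = 1./(1 + th.^2);             % Cauchy-prior weights
delta_m = sum(w.*th)/sum(w)      % -> delta(x) as m -> infinity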