Seminar at Stanford University, August 9, 2010
Vanilla Rao–Blackwellisation of Metropolis–Hastings algorithms

Christian P. Robert
Université Paris-Dauphine and CREST, Paris, France
Joint work with Randal Douc, Pierre Jacob and Murray Smith

August 9, 2010
Main themes

1. Rao–Blackwellisation of MCMC output.
2. Can be performed in any Metropolis–Hastings algorithm.
3. Asymptotically more efficient than the usual MCMC estimate, at a controlled additional computing cost.
4. Can take advantage of parallel capacities at a very basic level.
Outline

1. Metropolis–Hastings revisited
2. Rao–Blackwellisation
   - Formal importance sampling
   - Variance reduction
   - Asymptotic results
   - Illustrations
Metropolis–Hastings algorithm

1. We wish to approximate
   \[ I = \frac{\int h(x)\,\pi(x)\,dx}{\int \pi(x)\,dx} = \int h(x)\,\bar\pi(x)\,dx . \]
2. \(x \mapsto \pi(x)\) is known, but not \(\int \pi(x)\,dx\).
3. Approximate \(I\) with \(\delta = \frac{1}{N}\sum_{t=1}^{N} h(x^{(t)})\), where \((x^{(t)})\) is a Markov chain with limiting distribution \(\bar\pi\).
4. Convergence is obtained from the Law of Large Numbers or the CLT for Markov chains.
Metropolis–Hastings algorithm

Suppose that \(x^{(t)}\) is drawn.

1. Simulate \(y_t \sim q(\cdot\mid x^{(t)})\).
2. Set \(x^{(t+1)} = y_t\) with probability
   \[ \alpha(x^{(t)}, y_t) = \min\left\{ 1,\ \frac{\pi(y_t)}{\pi(x^{(t)})}\, \frac{q(x^{(t)}\mid y_t)}{q(y_t\mid x^{(t)})} \right\} ; \]
   otherwise, set \(x^{(t+1)} = x^{(t)}\).
3. \(\alpha\) is such that the detailed balance equation is satisfied:
   \[ \pi(x)\,q(y\mid x)\,\alpha(x, y) = \pi(y)\,q(x\mid y)\,\alpha(y, x) , \]
   so \(\pi\) is the stationary distribution of \((x^{(t)})\).

▸ The accepted candidates are simulated with a rejection algorithm.
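The two steps above can be sketched in a few lines. This is a minimal illustration, not code from the talk: the target (a standard normal, via its log-density) and the Gaussian random-walk proposal are illustrative choices; the proposal is symmetric, so the \(q\)-ratio in \(\alpha\) cancels.

```python
import math
import random

def metropolis_hastings(log_pi, n_iter, x0, scale=1.0, rng=None):
    """Random-walk Metropolis-Hastings sketch. The proposal is a
    symmetric Gaussian step, so alpha = min(1, pi(y)/pi(x))."""
    rng = rng or random.Random()
    chain = [x0]
    x = x0
    for _ in range(n_iter - 1):
        y = x + rng.gauss(0.0, scale)           # simulate y_t ~ q(.|x^(t))
        log_alpha = min(0.0, log_pi(y) - log_pi(x))
        if math.log(rng.random()) < log_alpha:  # accept with probability alpha
            x = y                               # x^(t+1) = y_t
        chain.append(x)                         # otherwise x^(t+1) = x^(t)
    return chain

# Illustrative target: pi = N(0, 1), so E[X] = 0.
chain = metropolis_hastings(lambda x: -0.5 * x * x, 20000, 0.0,
                            scale=2.0, rng=random.Random(42))
print(sum(chain) / len(chain))  # ergodic average of h(x) = x, near 0
```

The repeated states in `chain` (rejections) are exactly the multiplicities \(n_i\) exploited in the next slides.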
Some properties of the HM algorithm

1. An alternative representation of the estimator \(\delta\) is
   \[ \delta = \frac{1}{N}\sum_{t=1}^{N} h(x^{(t)}) = \frac{1}{N}\sum_{i=1}^{M_N} n_i\, h(z_i) , \]
   where
   - the \(z_i\)'s are the accepted \(y_j\)'s,
   - \(M_N\) is the number of accepted \(y_j\)'s up to time \(N\),
   - \(n_i\) is the number of times \(z_i\) appears in the sequence \((x^{(t)})_t\).
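This identity can be checked mechanically on any realised chain by compressing runs of repeated states into \((z_i, n_i)\) pairs; the short chain below is a made-up toy sequence, where repeats stand for rejections.

```python
from itertools import groupby

def compress(chain):
    """Collapse an MH chain into accepted values z_i and occupation
    counts n_i (how many times z_i repeats before the next acceptance)."""
    return [(z, len(list(g))) for z, g in groupby(chain)]

def rb_average(chain, h):
    """delta = (1/N) sum_t h(x^(t)) = (1/N) sum_i n_i h(z_i)."""
    pairs = compress(chain)
    N = sum(n for _, n in pairs)
    return sum(n * h(z) for z, n in pairs) / N

# Hypothetical chain with repeats standing for rejections:
chain = [0.0, 0.0, 1.5, 1.5, 1.5, -0.7, 2.0, 2.0]
h = lambda x: x
# Both representations give the same value:
assert abs(rb_average(chain, h) - sum(map(h, chain)) / len(chain)) < 1e-12
```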
The chain of accepted values moves according to
\[ \tilde q(\cdot\mid z_i) = \frac{\alpha(z_i,\cdot)\, q(\cdot\mid z_i)}{p(z_i)} \le \frac{q(\cdot\mid z_i)}{p(z_i)} , \]
where \(p(z_i) = \int \alpha(z_i, y)\, q(y\mid z_i)\, dy\). To simulate according to \(\tilde q(\cdot\mid z_i)\):

1. Propose a candidate \(y \sim q(\cdot\mid z_i)\).
2. Accept it with probability
   \[ \tilde q(y\mid z_i) \Big/ \frac{q(y\mid z_i)}{p(z_i)} = \alpha(z_i, y) ; \]
   otherwise, reject it and start again.

▸ This is exactly the transition of the HM algorithm.

The transition kernel \(\tilde q\) admits \(\tilde\pi\) as a stationary distribution:
\[ \tilde\pi(x)\,\tilde q(y\mid x)
= \underbrace{\frac{\pi(x)\,p(x)}{\int \pi(u)\,p(u)\,du}}_{\tilde\pi(x)}\;
  \underbrace{\frac{\alpha(x, y)\, q(y\mid x)}{p(x)}}_{\tilde q(y\mid x)}
= \frac{\pi(x)\,\alpha(x, y)\, q(y\mid x)}{\int \pi(u)\,p(u)\,du}
= \frac{\pi(y)\,\alpha(y, x)\, q(x\mid y)}{\int \pi(u)\,p(u)\,du}
= \tilde\pi(y)\,\tilde q(x\mid y) , \]
by the detailed balance equation.
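The rejection mechanism above can be sketched directly. The Gaussian random-walk proposal and normal target are illustrative choices (not from the slides); the returned trial count is the holding time of \(z\), geometric with success probability \(p(z)\).

```python
import math
import random

def next_accepted(z, log_pi, scale, rng):
    """Simulate from q~(.|z) by rejection: propose y ~ q(.|z), accept
    with probability alpha(z, y), otherwise start again. The number of
    trials until acceptance is geometric with parameter p(z)."""
    trials = 0
    while True:
        trials += 1
        y = z + rng.gauss(0.0, scale)             # y ~ q(.|z), symmetric
        log_alpha = min(0.0, log_pi(y) - log_pi(z))
        if math.log(rng.random()) < log_alpha:    # accept w.p. alpha(z, y)
            return y, trials

rng = random.Random(1)
z_next, n = next_accepted(0.0, lambda x: -0.5 * x * x, 2.0, rng)
print(z_next, n)  # the next accepted value and its geometric trial count
```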
Lemma

The sequence \((z_i, n_i)\) satisfies:

1. \((z_i, n_i)_i\) is a Markov chain;
2. \(z_{i+1}\) and \(n_i\) are independent given \(z_i\);
3. \(n_i\) is distributed as a geometric random variable with probability parameter
   \[ p(z_i) := \int \alpha(z_i, y)\, q(y\mid z_i)\, dy \,; \tag{1} \]
4. \((z_i)_i\) is a Markov chain with transition kernel \(\tilde Q(z, dy) = \tilde q(y\mid z)\,dy\) and stationary distribution \(\tilde\pi\) such that
   \[ \tilde q(\cdot\mid z) \propto \alpha(z, \cdot)\, q(\cdot\mid z) \quad\text{and}\quad \tilde\pi(\cdot) \propto \pi(\cdot)\, p(\cdot) . \]
Old bottle, new wine [or vice-versa]

[Diagram: the accepted values form the chain \(z_{i-1} \to z_i \to z_{i+1}\); each count \(n_i\) hangs off \(z_i\) and, given \(z_i\), is independent of the rest of the graph.]

\[ \delta = \frac{1}{N}\sum_{t=1}^{N} h(x^{(t)}) = \frac{1}{N}\sum_{i=1}^{M_N} n_i\, h(z_i) . \]
Rao–Blackwellisation
Importance sampling perspective

1. A natural idea:
   \[ \delta^* = \frac{1}{N}\sum_{i=1}^{M_N} \frac{h(z_i)}{p(z_i)} , \]
   or, in self-normalised form,
   \[ \delta^* \simeq \frac{\sum_{i=1}^{M_N} h(z_i)/p(z_i)}{\sum_{i=1}^{M_N} 1/p(z_i)}
      = \frac{\sum_{i=1}^{M_N} \frac{\pi(z_i)}{\tilde\pi(z_i)}\, h(z_i)}{\sum_{i=1}^{M_N} \frac{\pi(z_i)}{\tilde\pi(z_i)}} . \]
2. But \(p\) is not available in closed form.
3. The geometric \(n_i\) is the obvious replacement, used in the original Metropolis–Hastings estimate, since \(\mathbb{E}[n_i \mid z_i] = 1/p(z_i)\).
The crude estimate of \(1/p(z_i)\),
\[ n_i = 1 + \sum_{j=1}^{\infty} \prod_{\ell\le j} \mathbb{I}\{u_\ell \ge \alpha(z_i, y_\ell)\} , \]
can be improved:

Lemma

If \((y_j)_j\) is an iid sequence with distribution \(q(y\mid z_i)\), the quantity
\[ \xi_i = 1 + \sum_{j=1}^{\infty} \prod_{\ell\le j} \{1 - \alpha(z_i, y_\ell)\} \]
is an unbiased estimator of \(1/p(z_i)\) whose variance, conditional on \(z_i\), is lower than the conditional variance of \(n_i\), \(\{1 - p(z_i)\}/p^2(z_i)\).
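The unbiasedness claim lends itself to a quick Monte Carlo sanity check. This is an illustrative stand-in, not the talk's setting: the acceptance probabilities \(\alpha(z_i, y_\ell)\) are abstracted into iid Uniform(0.2, 1) draws, so \(p(z_i) = \mathbb{E}[\alpha] = 0.6\) is known in closed form, and the infinite sum is truncated once the running product is numerically negligible.

```python
import random

def xi_estimate(draw_alpha, rng, tol=1e-12):
    """xi_i = 1 + sum_j prod_{l<=j} (1 - alpha(z_i, y_l)), with the tail
    truncated once the running product drops below tol (a numerical
    cutoff, not part of the original estimator)."""
    total, prod = 1.0, 1.0
    while prod > tol:
        prod *= 1.0 - draw_alpha(rng)  # one more factor 1 - alpha(z_i, y_l)
        total += prod
    return total

# Stand-in for y ~ q(.|z_i) followed by alpha(z_i, y): alpha ~ U(0.2, 1),
# so p(z_i) = 0.6 and the estimator should average to 1/0.6.
rng = random.Random(7)
draw_alpha = lambda r: r.uniform(0.2, 1.0)
m = 20000
xi_mean = sum(xi_estimate(draw_alpha, rng) for _ in range(m)) / m
print(xi_mean)  # close to 1/p(z_i) = 1/0.6
```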
Rao–Blackwellised, for sure?

\[ \xi_i = 1 + \sum_{j=1}^{\infty} \prod_{\ell\le j} \{1 - \alpha(z_i, y_\ell)\} \]

1. The sum is infinite, but it terminates after finitely many terms with (at least) positive probability: since
   \[ \alpha(x^{(t)}, y_t) = \min\left\{ 1,\ \frac{\pi(y_t)}{\pi(x^{(t)})}\, \frac{q(x^{(t)}\mid y_t)}{q(y_t\mid x^{(t)})} \right\} \]
   can equal 1, a factor \(1 - \alpha(z_i, y_\ell)\) can vanish and kill all remaining products. For example: take a symmetric random walk as a proposal.
2. What if we wish to be sure that the sum is finite?
Variance improvement

Proposition

If \((y_j)_j\) is an iid sequence with distribution \(q(y\mid z_i)\) and \((u_j)_j\) is an iid uniform sequence, then for any \(k \ge 0\) the quantity
\[ \xi_i^k = 1 + \sum_{j=1}^{\infty}\ \prod_{1\le\ell\le k\wedge j} \{1 - \alpha(z_i, y_\ell)\} \prod_{k+1\le\ell\le j} \mathbb{I}\{u_\ell \ge \alpha(z_i, y_\ell)\} \]
is an unbiased estimator of \(1/p(z_i)\) with an almost surely finite number of terms. Moreover, for \(k \ge 1\),
\[ \mathbb{V}\left[ \xi_i^k \,\middle|\, z_i \right]
   = \frac{1 - p(z_i)}{p^2(z_i)}
   - \frac{1 - (1 - 2p(z_i) + r(z_i))^k}{2p(z_i) - r(z_i)}
     \left( \frac{2 - p(z_i)}{p^2(z_i)} \right) (p(z_i) - r(z_i)) , \]
where \(p(z_i) := \int \alpha(z_i, y)\, q(y\mid z_i)\, dy\) and \(r(z_i) := \int \alpha^2(z_i, y)\, q(y\mid z_i)\, dy\). Therefore,
\[ \mathbb{V}[\xi_i \mid z_i] \le \mathbb{V}[\xi_i^k \mid z_i] \le \mathbb{V}[\xi_i^0 \mid z_i] = \mathbb{V}[n_i \mid z_i] . \]
[Diagram: the accepted values form the chain \(z_{i-1} \to z_i \to z_{i+1}\), with \(\xi^k_{i-1}\) attached to \(z_{i-1}\) and \(\xi^k_i\) attached to \(z_i\); unlike the \(n_i\)'s, the \(\xi^k_i\)'s are not independent of the rest of the graph given the \(z_i\)'s.]

\[ \xi_i^k = 1 + \sum_{j=1}^{\infty}\ \prod_{1\le\ell\le k\wedge j} \{1 - \alpha(z_i, y_\ell)\} \prod_{k+1\le\ell\le j} \mathbb{I}\{u_\ell \ge \alpha(z_i, y_\ell)\} \]

\[ \delta_M^k = \frac{\sum_{i=1}^{M} \xi_i^k\, h(z_i)}{\sum_{i=1}^{M} \xi_i^k} . \]
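The weighted estimator \(\delta_M^k\) itself is a one-liner: a self-normalised average of the \(h(z_i)\) with the \(\xi_i^k\) as weights. The weights and accepted values below are made up purely for illustration.

```python
def delta_k(xis, zs, h):
    """delta_M^k = sum_i xi_i^k h(z_i) / sum_i xi_i^k: a self-normalised
    importance-sampling average with the xi_i^k as weights."""
    num = sum(xi * h(z) for xi, z in zip(xis, zs))
    den = sum(xis)
    return num / den

# Hypothetical weights xi_i^k and accepted values z_i:
xis = [2.0, 1.0, 3.0]
zs = [0.0, 1.0, -1.0]
print(delta_k(xis, zs, lambda x: x))  # (2*0 + 1*1 + 3*(-1)) / 6 = -1/3
```

Taking \(\xi_i^0 = n_i\) recovers the usual ergodic average, which is why the construction nests the original estimator.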
Asymptotic results

Let
\[ \delta_M^k = \frac{\sum_{i=1}^{M} \xi_i^k\, h(z_i)}{\sum_{i=1}^{M} \xi_i^k} . \]
For any positive function \(\varphi\), we denote \(C_\varphi = \{h;\ |h/\varphi|_\infty < \infty\}\).
Assume that there exists a positive function \(\varphi \ge 1\) such that
\[ \forall h \in C_\varphi, \quad \frac{\sum_{i=1}^{M} h(z_i)/p(z_i)}{\sum_{i=1}^{M} 1/p(z_i)} \xrightarrow{P} \pi(h) . \]

Theorem

Under the assumption that \(\pi(p) > 0\), the following convergence property holds:

i) If \(h\) is in \(C_\varphi\), then
\[ \delta_M^k \xrightarrow[M\to\infty]{P} \pi(h) \quad (\blacktriangleright\ \text{consistency}) . \]
Assume further that there exists a positive function \(\psi\) such that
\[ \forall h \in C_\psi, \quad \sqrt{M}\left( \frac{\sum_{i=1}^{M} h(z_i)/p(z_i)}{\sum_{i=1}^{M} 1/p(z_i)} - \pi(h) \right) \xrightarrow{L} \mathcal{N}(0, \Gamma(h)) . \]

Theorem

Under the assumption that \(\pi(p) > 0\), the following convergence property holds:

ii) If, in addition, \(h^2/p \in C_\varphi\) and \(h \in C_\psi\), then
\[ \sqrt{M}\left( \delta_M^k - \pi(h) \right) \xrightarrow[M\to\infty]{L} \mathcal{N}(0, V_k[h - \pi(h)]) \quad (\blacktriangleright\ \text{CLT}) , \]
where
\[ V_k(h) := \pi(p) \int \tilde\pi(dz)\, \mathbb{V}\left[ \xi_i^k \,\middle|\, z \right] h^2(z)\, p(z) + \Gamma(h) . \]
We need some additional assumptions. Assume a maximal inequality for the Markov chain \((z_i)_i\): there exists a measurable function \(\zeta\) such that, for any starting point \(x\),
\[ \forall h \in C_\zeta, \quad P_x\left( \sup_{0\le i\le N} \left| \sum_{j=0}^{i} \left[ h(z_j) - \tilde\pi(h) \right] \right| > \epsilon \right) \le \frac{N\, C\, h(x)}{\epsilon^2} . \]
Moreover, assume that there exists \(\phi \ge 1\) such that, for any starting point \(x\),
\[ \forall h \in C_\phi, \quad \tilde Q^n(x, h) \xrightarrow{P} \tilde\pi(h) = \pi(ph)/\pi(p) . \]

Theorem

Assume that \(h\) is such that \(h/p \in C_\zeta\) and \(\{C h/p,\ h^2/p^2\} \subset C_\phi\). Assume moreover that
\[ \sqrt{M}\left( \delta_M^0 - \pi(h) \right) \xrightarrow{L} \mathcal{N}(0, V_0[h - \pi(h)]) . \]
Then, for any starting point \(x\),
\[ \sqrt{M_N}\left( \frac{\sum_{t=1}^{N} h(x^{(t)})}{N} - \pi(h) \right) \xrightarrow[N\to\infty]{L} \mathcal{N}(0, V_0[h - \pi(h)]) , \]
where \(M_N\) is defined by
\[ \sum_{i=1}^{M_N} \xi_i^0 \le N < \sum_{i=1}^{M_N+1} \xi_i^0 . \]
Variance gain (1)

 h(x)      x      x^2    I_{X>0}   p(x)
 τ = .1    0.971  0.953  0.957     0.207
 τ = 2     0.965  0.942  0.875     0.861
 τ = 5     0.913  0.982  0.785     0.826
 τ = 7     0.899  0.982  0.768     0.820

Ratios of the empirical variances of δ^∞ and δ estimating E[h(X)]: 100 MCMC iterations over 10^3 replications of a random walk Gaussian proposal with scale τ.
Illustration (1)
Figure: Overlay of the variations of 250 iid realisations of the estimates δ (gold) and δ^∞ (grey) of E[X] = 0 for 1000 iterations, along with the 90% interquantile range for the estimates δ (brown) and δ^∞ (pink), in the setting of a random walk Gaussian proposal with scale τ = 10.
Extra computational effort

           median   mean   q.8   q.9   time
 τ = .25    0.0     8.85   4.9   13    4.2
 τ = .50    0.0     6.76   4     11    2.25
 τ = 1.0    0.25    6.15   4     10    2.5
 τ = 2.0    0.20    5.90   3.5   8.5   4.5

Additional computing effort: median and mean numbers of additional iterations, 80% and 90% quantiles of the additional iterations, and ratio of the average R computing times, obtained over 10^5 simulations.
Illustration (2)

Figure: Overlay of the variations of 500 iid realisations of the estimates δ (deep grey), δ^∞ (medium grey) and of the importance sampling version (light grey) of E[X] = 10 when X ∼ Exp(.1) for 100 iterations, along with the 90% interquantile ranges (same colour code), in the setting of an independent exponential proposal with scale µ = 0.02.