Markov Chain Monte Carlo Methods
Vol. III, 2008

Contents

1  Introduction
   1.1  Monte Carlo integration
   1.2  Organization of the paper
2  Markov chains
   2.1  Definitions
   2.2  Convergence
   2.3  Detailed balance
3  The Metropolis-Hastings algorithm
   3.1  The algorithm
   3.2  Properties of the MH chain
   3.3  Combining MH kernels
4  The Gibbs sampler
   4.1  The algorithm
   4.2  The multiple-block MH algorithm
   4.3  Auxiliary variables
5  Practical issues
   5.1  Convergence diagnostics
   5.2  Variance of estimates
   5.3  Improving mixing
6  Applications
   6.1  A generalized linear mixed model
   6.2  A Markov mixture model
1  Introduction

The Markov chain Monte Carlo (MCMC) method is a simulation technique for sampling from complex probability distributions by running a Markov chain whose limiting distribution is the distribution of interest. This section first reviews ordinary Monte Carlo integration and importance sampling, whose limitations motivate MCMC.
1.1  Monte Carlo integration

Let x be a random variable with probability density π(x), and let h(x) be a function of x. Consider evaluating the integral

(1)    I = E_π[h(x)] = ∫_X h(x)π(x)dx,

where x takes values in X ⊆ R^d and E_π denotes expectation with respect to π(x). When the integral cannot be evaluated analytically but random draws from π(x) are available, I can be approximated by simulation. Given independent draws (x^(1), ..., x^(n)) from π(x), the integral (1) is estimated by the sample mean

(2)    Î_M = (1/n) Σ_{i=1}^n h(x^(i)).

Generating random draws from a distribution is referred to as sampling; see Devroye (1986), Ripley (1987), and Gentle (2003) for methods of random variate generation. By the law of large numbers, Î_M converges to I as n → ∞, so for large n the estimate can be made as accurate as desired. This approximation is called Monte Carlo integration.
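As an illustration (not part of the original text), the estimator (2) can be sketched in a few lines of Python; here the target π(x) is taken to be the standard normal and h(x) = x², so the true value is I = E[x²] = 1:

```python
import random

def monte_carlo_estimate(h, sampler, n):
    """Estimate I = E_pi[h(x)] by the sample mean (2) over n draws from pi."""
    return sum(h(sampler()) for _ in range(n)) / n

random.seed(1)
# Target pi(x): standard normal; h(x) = x^2, so the true value is I = 1.
i_hat = monte_carlo_estimate(lambda x: x * x, lambda: random.gauss(0.0, 1.0), 100_000)
print(round(i_hat, 3))  # close to the true value I = 1
```

The error of Î_M shrinks at the rate n^(-1/2) regardless of the dimension d, which is the main appeal of Monte Carlo integration.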
Direct sampling from π(x) is not always possible. Importance sampling instead evaluates I using draws from another density q(x) whose support contains that of π(x). Rewriting (1),

    I = ∫_X h(x) [π(x)/q(x)] q(x)dx = E_q[ h(x)π(x)/q(x) ],

so that, given draws (x^(1), ..., x^(n)) from q(x), I is estimated by

(3)    Î_IS = (1/n) Σ_{i=1}^n h(x^(i))w(x^(i)),

where w(x^(i)) = π(x^(i))/q(x^(i)) is the importance weight of x^(i). The estimator (3) is the importance sampling estimator, and q(x) is called the importance (or proposal) density. As n → ∞, Î_IS converges to I. The efficiency of the estimator depends critically on the choice of q(x); roughly speaking, q(x) should be close in shape to |h(x)|π(x). See Rubinstein (1981), Geweke (1989), and Evans and Swartz (2000).
In many applications π(x) is known only up to a normalizing constant: π(x) = p(x)/∫_X p(x)dx, where the kernel p(x) can be evaluated but the constant ∫_X p(x)dx cannot. In that case (1) becomes

    I = ∫_X h(x)p(x)dx / ∫_X p(x)dx,

and importance sampling with density q(x) gives

(4)    I = E_q[ h(x)p(x)/q(x) ] / E_q[ p(x)/q(x) ],

which is estimated from draws (x^(1), ..., x^(n)) from q(x) by

    Î_IS = [Σ_{i=1}^n h(x^(i))w(x^(i))] / [Σ_{j=1}^n w(x^(j))] = Σ_{i=1}^n h(x^(i))w̃(x^(i)),

where w(x^(i)) = p(x^(i))/q(x^(i)) and w̃(x^(i)) = w(x^(i))/Σ_{j=1}^n w(x^(j)) are the normalized weights, with Σ_{i=1}^n w̃(x^(i)) = 1. Unlike Î_M and the unnormalized estimator (3), this self-normalized estimator is biased in finite samples, but it remains consistent: Î_IS converges to I as n → ∞ (Geweke, 1989). For the asymptotic variance of Î_IS and further details, see Robert and Casella (2004).
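A minimal sketch of the self-normalized estimator (4), added for illustration: the target kernel is p(x) = exp(-x²/2) (a standard normal whose normalizing constant we pretend not to know), the proposal is q = N(0, 2²), and h(x) = x², so the true value is again I = 1:

```python
import math
import random

random.seed(2)

def p(x):
    """Unnormalized target kernel: standard normal up to a constant."""
    return math.exp(-0.5 * x * x)

def q_pdf(x):
    """Proposal density: N(0, 2^2)."""
    return math.exp(-0.5 * (x / 2.0) ** 2) / (2.0 * math.sqrt(2.0 * math.pi))

n = 100_000
draws = [random.gauss(0.0, 2.0) for _ in range(n)]
w = [p(x) / q_pdf(x) for x in draws]              # importance weights p/q
w_sum = sum(w)
# Self-normalized importance sampling estimate of I = E_pi[x^2] (true value 1)
i_hat = sum(x * x * wi for x, wi in zip(draws, w)) / w_sum
print(round(i_hat, 3))
```

Note that the unknown normalizing constant of p(x) cancels between numerator and denominator, which is exactly what makes (4) usable.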
1.2  Organization of the paper

The Monte Carlo methods of Section 1.1 require either direct draws from π(x) or an importance density q(x) that approximates π(x) well. When x is high dimensional, such a q(x) is hard to construct, and both approaches become impractical. Since around 1990, the Markov chain Monte Carlo (MCMC) method, which instead generates dependent draws from a Markov chain whose limiting distribution is π(x), has come into wide use; book-length treatments include Liu (2001), Robert and Casella (2004), and Gamerman and Lopes (2006).

The rest of the paper is organized as follows. Section 2 reviews the properties of Markov chains on which MCMC rests. Section 3 describes the Metropolis-Hastings (MH) algorithm and Section 4 the Gibbs sampler. Section 5 discusses practical issues such as convergence diagnostics and efficiency, and Section 6 illustrates MCMC with two applications.
2  Markov chains

MCMC methods generate a Markov chain whose invariant distribution is the target distribution. This section reviews the basic properties of Markov chains needed in the sequel; for general treatments, see Karlin and Taylor (1975) and Ross (1995).
2.1  Definitions

Let x^(t) denote the state of a stochastic process at time t = 0, 1, ..., and consider the sequence (x^(0), x^(1), ...). The set X of values that x^(t) can take is called the state space. For simplicity, suppose first that the state space is finite, X = {1, ..., k}. The sequence (x^(0), x^(1), ...) is a Markov chain if for all i, j ∈ X and all t ≥ 0

(5)    Pr(x^(t+1) = j | x^(0) = i_0, x^(1) = i_1, ..., x^(t-1) = i_{t-1}, x^(t) = i)
           = Pr(x^(t+1) = j | x^(t) = i).

That is, given the history {x^(0) = i_0, ..., x^(t) = i} up to time t, the distribution of the state at time t + 1 depends only on the current state i; this is the Markov property.

When the right-hand side of (5) does not depend on t, p(i, j) = Pr(x^(t+1) = j | x^(t) = i) is called the transition probability from state i to state j. The k × k matrix with (i, j) element p(i, j),

        T = | p(1,1)  p(1,2)  ...  p(1,k) |
            | p(2,1)  p(2,2)  ...  p(2,k) |
            |   ...     ...   ...    ...  |
            | p(k,1)  p(k,2)  ...  p(k,k) |,

is the transition matrix. Each p(i, j) satisfies p(i, j) ≥ 0 and Σ_{j=1}^k p(i, j) = 1, so every row of T sums to one.
Example 2.1. Let X = {1, 2, 3} and

        T = | 1/2  1/3  1/6 |
            | 3/4   0   1/4 |
            |  0    1    0  |.

From state 1 the chain stays at 1 with probability 1/2, moves to 2 with probability 1/3, and moves to 3 with probability 1/6. From state 2 it moves to 1 with probability 3/4 and to 3 with probability 1/4, and from state 3 it moves to 2 with probability 1.
Let π^(0) = (π^(0)_1, ..., π^(0)_k) = (Pr(x^(0) = 1), ..., Pr(x^(0) = k)) denote the initial distribution of x^(0), and let π^(t) = (π^(t)_1, ..., π^(t)_k) = (Pr(x^(t) = 1), ..., Pr(x^(t) = k)) denote the marginal distribution of x^(t). The distribution of x^(1) follows from

    π^(1)_j = Pr(x^(1) = j) = Σ_{i=1}^k Pr(x^(0) = i, x^(1) = j)
            = Σ_{i=1}^k Pr(x^(0) = i) Pr(x^(1) = j | x^(0) = i) = Σ_{i=1}^k π^(0)_i p(i, j),

that is, π^(1) = π^(0)T. Similarly π^(2) = π^(1)T = π^(0)T², and in general π^(t) = π^(0)T^t.
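The recursion π^(t) = π^(0)T^t is easy to verify numerically. The following illustrative sketch (not from the original text) iterates it for the transition matrix of Example 2.1; the limit (1/2, 1/3, 1/6) is the invariant distribution identified in Example 2.4 below:

```python
# Transition matrix of Example 2.1 and an initial distribution concentrated on state 1.
T = [[1/2, 1/3, 1/6],
     [3/4, 0.0, 1/4],
     [0.0, 1.0, 0.0]]
pi_t = [1.0, 0.0, 0.0]   # pi^(0)

def step(pi, T):
    """One update pi^(t+1) = pi^(t) T (row vector times matrix)."""
    k = len(pi)
    return [sum(pi[i] * T[i][j] for i in range(k)) for j in range(k)]

for _ in range(50):      # pi^(50) = pi^(0) T^50
    pi_t = step(pi_t, T)

print([round(p, 6) for p in pi_t])   # approaches (1/2, 1/3, 1/6)
```

Starting from any other initial distribution gives the same limit, which previews the convergence theorem of the next subsection.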
2.2  Convergence

Given an initial distribution π^(0) and a transition matrix T, the chain is simulated as follows:

(1) Draw x^(0) from π^(0).
(2) For t = 0, 1, ..., draw x^(t+1) from the distribution given by the x^(t)-th row of T.

The resulting sequence (x^(0), x^(1), ...) is a realization of the Markov chain, and x^(t) is a draw from π^(t). MCMC rests on the limiting behavior of π^(t) as t → ∞, which is governed by three properties of the chain: (1) irreducibility, (2) aperiodicity, and (3) the existence of an invariant distribution.

A transition matrix T is irreducible if for every pair of states i, j ∈ X there exists an n such that (T^n)_{ij} > 0, where (T^n)_{ij} denotes the (i, j) element of T^n. In words, every state can be reached from every other state in a finite number of steps.
Example 2.2. Unlike Example 2.1, the chain with

        T = | 0.6  0.4   0  |
            | 0.3  0.7   0  |
            | 0.2  0.2  0.6 |

is not irreducible: starting from state 3 the chain can reach states 1 and 2, but starting from state 1 or 2 it can never reach state 3.

The period of a state i ∈ X is the greatest common divisor of the set {n ≥ 1 : (T^n)_{ii} > 0}. If every state has period 1, the chain is called aperiodic.
Example 2.3. For

        T = |  0   0.5   0   0.5 |
            | 0.5   0   0.5   0  |
            |  0   0.5   0   0.5 |
            | 0.5   0   0.5   0  |,

we have {n ≥ 1 : (T^n)_{ii} > 0} = {2, 4, 6, ...} for every i = 1, ..., 4, so each state has period 2 and the chain is not aperiodic.

A probability distribution π = (π_1, ..., π_k) is an invariant (stationary) distribution of T if (1) π_i ≥ 0 for all i ∈ X with Σ_{i=1}^k π_i = 1, and (2) π = πT: once the marginal distribution of the chain equals π, it remains π forever.

Example 2.4. For the transition matrix T of Example 2.1, π = (1/2, 1/3, 1/6) satisfies Σ_{i=1}^3 π_i = 1 and π = πT, so π is an invariant distribution of T.

The following theorem connects these properties to the limiting behavior of π^(t); see Haggstrom (2002) for a proof.

Theorem (convergence). Let (x^(0), x^(1), ...) be a Markov chain with an irreducible and aperiodic transition matrix T, and let π be an invariant distribution of T. Then for any initial distribution π^(0), the total variation distance satisfies (1/2) Σ_{i=1}^k |π^(t)_i − π_i| → 0 as t → ∞.
2.3  Detailed balance

The convergence theorem suggests the strategy behind MCMC: construct a chain whose invariant distribution is the target distribution, run it, discard the first m draws, and treat (x^(m+1), x^(m+2), ...) as (dependent) draws from the target. The remaining question is how to construct transition probabilities with a prescribed invariant distribution π. A convenient sufficient condition is the detailed balance condition

(6)    π_i p(i, j) = π_j p(j, i)    (i, j ∈ X).

A chain satisfying (6) is called reversible. Summing (6) over i gives Σ_{i=1}^k π_i p(i, j) = Σ_{i=1}^k π_j p(j, i) = π_j, so a distribution satisfying detailed balance is invariant; most MCMC algorithms are built so that detailed balance holds for the target distribution.

These notions carry over to a general (continuous) state space; see Nummelin (1984) and Meyn and Tweedie (1993) for rigorous treatments. The transition probabilities are replaced by a transition kernel T(x, y) with

    Pr(x^(t+1) ∈ A | x^(t) = x) = ∫_A T(x, y)dy    (x ∈ X, A ⊆ X).

A density π(x) is invariant for T(x, y) if

    π(y) = ∫_X π(x)T(x, y)dx,

and the detailed balance condition becomes

    π(x)T(x, y) = π(y)T(y, x).
3  The Metropolis-Hastings algorithm

The most basic MCMC method is the Metropolis-Hastings (MH) algorithm, proposed by Metropolis et al. (1953) and generalized by Hastings (1970). Dongarra and Sullivan (2000) list the Metropolis algorithm among the top 10 algorithms of the 20th century. This section describes the MH algorithm and its main variants.

3.1  The algorithm

The distribution π(x) from which we wish to sample is called the target distribution. The MH algorithm generates candidate draws from a proposal distribution (also called a candidate generating distribution) with density q(y|x), which may depend on the current state x, and accepts or rejects each candidate with an acceptance probability chosen so that the resulting chain has π(x) as its invariant distribution. The algorithm proceeds as follows:

(1) Choose an initial value x^(0).
(2) For t = 0, 1, ...:
    (i)  Draw a candidate y from q(y|x^(t)).
    (ii) Draw u from U(0, 1), the uniform distribution on (0, 1), and set

             x^(t+1) = y        if u ≤ α(x^(t), y),
             x^(t+1) = x^(t)    otherwise,

         where

             α(x, y) = min{ 1, [π(y)q(x|y)] / [π(x)q(y|x)] }.

The function α(x, y) is the acceptance probability. When the candidate is rejected, the chain stays where it is, x^(t+1) = x^(t). Note that π(x) enters α(x, y) only through the ratio π(y)/π(x), so the MH algorithm requires the target density only up to its normalizing constant. The behavior of the MH chain depends heavily on the choice of the proposal density q(y|x); the following examples describe standard choices.
Example 3.1 (random walk chain). Given the current state x^(t) = x, propose

    y = x + ε,    ε ~ N(0, σ²I),

where N(μ, Σ) denotes the multivariate normal distribution: a random walk around the current state. Since q(y|x) = q(x|y), the acceptance probability reduces to α(x, y) = min{1, π(y)/π(x)}, which is the original algorithm of Metropolis et al. (1953); any symmetric proposal with q(y|x) = q(x|y) yields this simplified form. Other symmetric choices include a uniform distribution U(−δ, δ) around x and a multivariate t distribution T(0, σ²I). The scale σ acts as a step size and must be tuned: if σ is too small, candidates are almost always accepted but the chain moves through the support very slowly; if σ is too large, most candidates are rejected and the chain rarely moves. On the optimal acceptance rate and scaling of random walk chains, see Roberts et al. (1997) and Roberts and Rosenthal (2001).
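The random walk chain of Example 3.1 can be sketched as follows (an illustrative addition, with a standard normal target so that the long-run mean and variance should be near 0 and 1; σ = 2 is an arbitrary choice):

```python
import math
import random

random.seed(3)

def log_pi(x):
    """Log of the target density up to an additive constant (standard normal here)."""
    return -0.5 * x * x

def random_walk_mh(n, sigma, x0=0.0):
    """Random walk chain: propose y = x + eps with eps ~ N(0, sigma^2)."""
    x, chain = x0, []
    for _ in range(n):
        y = x + random.gauss(0.0, sigma)
        d = log_pi(y) - log_pi(x)
        if d >= 0 or random.random() < math.exp(d):   # accept w.p. min{1, pi(y)/pi(x)}
            x = y
        chain.append(x)                               # on rejection the chain stays put
    return chain

chain = random_walk_mh(20_000, sigma=2.0)[2_000:]     # discard burn-in draws
mean = sum(chain) / len(chain)
var = sum((v - mean) ** 2 for v in chain) / len(chain)
print(round(mean, 2), round(var, 2))                  # near 0 and 1
```

Working with log π avoids numerical underflow in the acceptance ratio, a standard implementation detail.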
Example 3.2 (Langevin chain). A proposal that exploits gradient information is

    y = x + (σ²/2) ∂log π(x)/∂x + ε,    ε ~ N(0, σ²I),

the Langevin chain (Roberts and Rosenthal, 1998; Christensen et al., 2001; Christensen and Waagepetersen, 2002). When x is away from a mode, ∂log π(x)/∂x ≠ 0 and the proposal drifts toward regions of higher density; near a mode, where ∂log π(x)/∂x = 0, the proposal is essentially a random walk. Because this proposal is not symmetric in x and y, the general MH acceptance probability must be used.
Example 3.3 (independence chain). When the proposal density does not depend on the current state, q(y|x) = q(y), the chain is called an independence chain, and

    α(x, y) = min{ 1, [π(y)/q(y)] / [π(x)/q(x)] },

so a candidate y is accepted with high probability when its importance weight π(y)/q(y) is large relative to that of the current state x. The independence chain works well when q(y) is a good approximation of π(x) with somewhat heavier tails. A standard construction takes the mode of π(x), i.e. the solution of ∂log π(x)/∂x = 0, as the location, and {−∂²log π(x)/∂x∂x′}⁻¹ evaluated at the mode as the scale matrix, of a normal or t proposal; see Chib and Greenberg (1995).
3.2  Properties of the MH chain

The MH chain (x^(0), x^(1), ...) has the transition kernel

(7)    T(x, y) = q(y|x)α(x, y) + r(x)δ_x(y),

the sum of a term for an accepted move to y and a term for a rejection that leaves the chain at x, where

    r(x) = ∫_X q(y|x){1 − α(x, y)}dy,    δ_x(y) = I(x = y),

are the total rejection probability at x and a point mass at x, and I(·) denotes the indicator function. The kernel (7) satisfies detailed balance with respect to π(x): from the definition of α one verifies q(y|x)α(x, y)π(x) = q(x|y)α(y, x)π(y), and trivially r(x)δ_x(y)π(x) = r(y)δ_y(x)π(y), so that

    π(x)T(x, y) = q(y|x)α(x, y)π(x) + r(x)δ_x(y)π(x)
                = q(x|y)α(y, x)π(y) + r(y)δ_y(x)π(y) = π(y)T(y, x).

Hence the target distribution π(x) is an invariant distribution of the MH chain. Under weak additional conditions the chain is irreducible and aperiodic and therefore converges to π(x); see Roberts and Smith (1994) and Tierney (1994). In practice one runs the MH chain, discards the first m draws, and treats (x^(m+1), x^(m+2), ...) as draws from π(x). The length m of the discarded initial segment is called the burn-in period.
3.3  Combining MH kernels

Several MH kernels can be combined into a single algorithm. Suppose T1(x, y) and T2(x, y) are two MH transition kernels, each with invariant distribution π(x). One combination draws, at every iteration, the update from T1 with probability w and from T2 with probability 1 − w, which corresponds to the kernel

(8)    T(x, y) = wT1(x, y) + (1 − w)T2(x, y),

called a mixture of transition kernels. Since each Ti(x, y) (i = 1, 2) has invariant distribution π(x), so does the mixture (8).

Another combination applies the kernels in sequence: first move from x to an intermediate state x′ with T1(x, x′), then from x′ to y with T2(x′, y). The kernel of the combined move is

(9)    T(x, y) = ∫_X T1(x, x′)T2(x′, y)dx′,

called a cycle of transition kernels. Its invariance follows from

    ∫_X π(x)T(x, y)dx = ∫_X ∫_X π(x)T1(x, x′)T2(x′, y)dx dx′
                      = ∫_X π(x′)T2(x′, y)dx′ = π(y),

so π(x) is invariant for T(x, y). The argument extends directly to cycles and mixtures of more than two MH kernels; see Tierney (1994). Cycles of kernels that each update only part of x underlie the MCMC algorithms of the next section.
4  The Gibbs sampler

MCMC algorithms that update one part of x at a time are often far easier to construct than a single MH update of the whole vector. The most important such algorithm is Gibbs sampling, introduced by Geman and Geman (1984) in image analysis and brought into mainstream statistics by Gelfand and Smith (1990).

4.1  The algorithm

Partition x into k blocks, x = (x1, ..., xk), and write x−i = (x1, ..., x_{i−1}, x_{i+1}, ..., xk) for x with the i-th block removed. The distribution π(xi|x−i) of xi given all the other blocks is called the full conditional distribution of xi. The Gibbs sampler draws each block in turn from its full conditional distribution:

(1) Choose an initial value x^(0) = (x1^(0), ..., xk^(0)).
(2) For t = 0, 1, ...:
    (i)  Draw x1^(t+1) from π(x1|x2^(t), ..., xk^(t)).
    (ii) Draw x2^(t+1) from π(x2|x1^(t+1), x3^(t), ..., xk^(t)).
         ...
    (k)  Draw xk^(t+1) from π(xk|x1^(t+1), ..., x_{k−1}^(t+1)).

As shown in the next subsection, the Gibbs sampler is a special case of the multiple-block MH algorithm in which every candidate is accepted. Its practical appeal is that it has no tuning parameters: all that is required is the ability to sample from each full conditional distribution π(xi|x−i). The order in which the blocks are updated can affect efficiency; see Liu et al. (1995).
Example 4.1. Consider the joint density

    π(x1, x2) ∝ nCx1 · x2^{x1+α−1}(1 − x2)^{n−x1+β−1},

where x1 ∈ {0, 1, ..., n} and x2 ∈ [0, 1]. Although the joint distribution of (x1, x2) is nonstandard, both full conditionals are standard:

    π(x1|x2) ∝ nCx1 · x2^{x1}(1 − x2)^{n−x1},    π(x2|x1) ∝ x2^{x1+α−1}(1 − x2)^{n−x1+β−1},

that is, x1|x2 ~ Bi(n, x2) and x2|x1 ~ Be(x1 + α, n − x1 + β), so the Gibbs sampler alternates a binomial draw and a beta draw. Here Bi(n, p) denotes the binomial distribution, with probability function proportional to p^x(1 − p)^{n−x}, and Be(a, b) the beta distribution, with density proportional to x^{a−1}(1 − x)^{b−1}.
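The two-block Gibbs sampler of Example 4.1 can be sketched as follows (an illustrative addition with arbitrary values n = 10, α = 2, β = 4; the marginal of x1 is then beta-binomial with mean nα/(α+β) = 10/3, which the simulation should reproduce):

```python
import random

random.seed(4)

def binomial(n, p):
    """Draw from Bi(n, p) as a sum of n Bernoulli(p) trials."""
    return sum(random.random() < p for _ in range(n))

n, alpha, beta = 10, 2.0, 4.0      # illustrative values for Example 4.1
x1, x2 = 0, 0.5                    # arbitrary starting point
xs1 = []
for t in range(20_000):
    x1 = binomial(n, x2)                                    # x1 | x2 ~ Bi(n, x2)
    x2 = random.betavariate(x1 + alpha, n - x1 + beta)      # x2 | x1 ~ Be(x1+a, n-x1+b)
    if t >= 1_000:                                          # discard burn-in
        xs1.append(x1)

mean_x1 = sum(xs1) / len(xs1)
# The marginal of x1 is beta-binomial with mean n*alpha/(alpha+beta) = 10/3.
print(round(mean_x1, 2))
```

Neither full conditional requires the normalizing constant of the joint density, which is the typical situation in which the Gibbs sampler shines.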
4.2  The multiple-block MH algorithm

The Gibbs sampler requires every full conditional distribution to be of a form that can be sampled directly. The multiple-block MH algorithm drops this requirement by updating each block with its own MH step. For notational simplicity take k = 2, and decompose the move from x = (x1, x2) to y = (y1, y2) into two steps, (x1, x2) → (y1, x2) → (y1, y2).

The first step, (x1, x2) → (y1, x2), is an MH update with target π(x1|x2): draw a candidate y1 from a proposal density q1(y1|x1, x2) and accept it with probability

    α1(x1, y1) = min{ 1, [π(y1|x2)q1(x1|y1, x2)] / [π(x1|x2)q1(y1|x1, x2)] }.

Denote the resulting transition kernel by T1(x1, y1|x2). The second step, (y1, x2) → (y1, y2), is an MH update with target π(x2|y1): draw y2 from q2(y2|y1, x2) and accept it with probability

    α2(x2, y2) = min{ 1, [π(y2|y1)q2(x2|y1, y2)] / [π(x2|y1)q2(y2|y1, x2)] },

with kernel T2(x2, y2|y1). One full iteration then has the cycle kernel

    T(x, y) = T1(x1, y1|x2)T2(x2, y2|y1).

Since T1(x1, y1|x2) has invariant distribution π(x1|x2),

    ∫ π(x)T(x, y)dx = ∫∫ π(x1, x2)T1(x1, y1|x2)T2(x2, y2|y1)dx1 dx2
        = ∫ [∫ π(x1|x2)T1(x1, y1|x2)dx1] π(x2)T2(x2, y2|y1)dx2
        = ∫ π(y1|x2)π(x2)T2(x2, y2|y1)dx2,

and using π(y1|x2)π(x2) = π(y1)π(x2|y1) together with the invariance of π(x2|y1) for T2,

    ∫ π(x)T(x, y)dx = π(y1) ∫ π(x2|y1)T2(x2, y2|y1)dx2 = π(y1)π(y2|y1) = π(y).

Hence the target distribution π(x) is invariant for the multiple-block MH kernel, and under weak conditions the chain converges to it; for convergence conditions see Chan (1993) and Roberts and Smith (1994).

The Gibbs sampler is the special case in which each proposal is the full conditional itself. Setting q1(y1|x1, x2) = π(y1|x2) gives

    α1(x1, y1) = min{ 1, [π(y1|x2)π(x1|x2)] / [π(x1|x2)π(y1|x2)] } = 1,

and likewise q2(y2|y1, x2) = π(y2|y1) gives α2(x2, y2) = 1: every candidate is accepted with probability 1. In practice the two approaches are freely combined, sampling the tractable full conditionals directly and using MH updates for the remaining blocks; this hybrid is often called Metropolis within Gibbs (Muller, 1991).
4.3  Auxiliary variables

It is often easier to sample from an augmented distribution than from the target itself. Suppose an auxiliary variable z ∈ Z is introduced with joint density π(x, z) satisfying

    ∫_Z π(x, z)dz = π(x).

If an MCMC algorithm is run on π(x, z), the x components of the draws are draws from the target π(x). When z is chosen so that the full conditionals π(x|z) and π(z|x) are easy to sample, the augmented Gibbs sampler can be simple even though direct sampling from π(x) is hard. This device is the data augmentation method (Tanner and Wong, 1987); well-known applications include the tobit model (Chib, 1992) and the binary probit model (Albert and Chib, 1993).

The auxiliary variable z is often interpretable as missing data, and sampling from π(z|x) corresponds to the imputation of Rubin (1987). The construction parallels the EM algorithm (Dempster et al., 1977): drawing z from π(z|x) plays the role of the E-step and drawing x from π(x|z) the role of the M-step; see Liu (2002) for connections between accelerating EM and accelerating the Gibbs sampler.

A general-purpose auxiliary variable scheme is slice sampling. Suppose the target factors as π(x) ∝ p(x)l(x), where p(x) is easy to sample from but the factor l(x) makes direct sampling difficult. Introduce z with joint density

(10)    π(x, z) ∝ I[z < l(x)]p(x).

Integrating (10) over z recovers the target, since ∫ I[z < l(x)]dz = l(x). The full conditionals of (10) are simple: z|x ~ U(0, l(x)), and x|z follows p(x) truncated to the set {x : l(x) > z}. Alternating these two draws is the slice sampler (Damien et al., 1999). For auxiliary variable methods more generally, see Besag and Green (1993), Higdon (1998), Damien and Walker (2001), and Neal (2003).
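A minimal illustrative sketch of the slice sampler for (10), with the assumed factorization p(x) = U(−5, 5) and l(x) = exp(−x²/2), so that the target is a standard normal truncated to (−5, 5); here the slice {x : l(x) > z} is the interval (−s, s) with s = √(−2 log z):

```python
import math
import random

random.seed(5)

# Target pi(x) proportional to p(x) l(x), with p(x) = U(-5, 5) and
# l(x) = exp(-x^2/2): a standard normal truncated to (-5, 5).
def l(x):
    return math.exp(-0.5 * x * x)

x, chain = 0.0, []
for _ in range(20_000):
    z = random.uniform(0.0, l(x))                    # z | x ~ U(0, l(x))
    z = max(z, 1e-300)                               # guard against z == 0.0
    s = min(5.0, math.sqrt(-2.0 * math.log(z)))      # slice {x : l(x) > z} = (-s, s)
    x = random.uniform(-s, s)                        # x | z: p(x) restricted to slice
    chain.append(x)

mean = sum(chain) / len(chain)
var = sum((v - mean) ** 2 for v in chain) / len(chain)
print(round(mean, 2), round(var, 2))                 # near 0 and 1
```

Both conditional draws are exact, so the sampler involves no rejections and no tuning, which is the practical appeal of slice sampling when the slice set can be found in closed form.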
5  Practical issues

5.1  Convergence diagnostics

MCMC theory guarantees convergence of the chain (x^(0), x^(1), ...) to the target distribution only in the limit; in practice one must judge from the output whether the simulated chain has effectively converged and how long the burn-in period should be. Many diagnostics have been proposed; see Cowles and Carlin (1996) and Mengersen et al. (1999) for comparative reviews, and Heidelberger and Welch (1983), Gelman and Rubin (1992), Geweke (1992), and Raftery and Lewis (1992) for widely used procedures, many of which are implemented in the R package CODA. Diagnostics based on several independent runs from dispersed starting values are called multiple chain methods, while those based on a single long run are called single chain methods.
5.2  Variance of estimates

Suppose the MCMC output (x^(0), x^(1), ..., x^(m+n)) is used, after discarding the burn-in draws, to estimate E[h(x)] by

(11)    Î = (1/n) Σ_{i=1}^n h(x^(m+i)).

As n → ∞, Î converges to E[h(x)] under mild conditions (Tierney, 1994). The variance of Î is

    Var(Î) = (σ²/n)[ 1 + 2 Σ_{j=1}^{n−1} (1 − j/n)ρ_j ] ≈ (σ²/n)( 1 + 2 Σ_{j=1}^∞ ρ_j ),

where σ² = Var[h(x)] and ρ_j is the autocorrelation between h(x^(t)) and h(x^(t+j)). Because MCMC draws are typically positively autocorrelated (ρ_j > 0), the estimator (11) is less precise than the corresponding estimator based on n independent draws, whose variance is σ²/n. The factor

    1 + 2 Σ_{j=1}^∞ ρ_j

is accordingly called the inefficiency factor: Var(Î) is approximately the inefficiency factor times σ²/n, so it measures how many times more MCMC draws are needed to match the precision of independent sampling. In practice the sum is truncated at some lag L and estimated by 1 + 2 Σ_{j=1}^L ρ̂_j.
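The truncated inefficiency factor is straightforward to estimate from a chain. As an illustration (not from the original text), the following sketch applies it to an AR(1) chain x_t = φx_{t−1} + e_t, for which ρ_j = φ^j and the true inefficiency factor is (1 + φ)/(1 − φ) = 3 when φ = 0.5:

```python
import random

random.seed(6)

def inefficiency_factor(xs, max_lag):
    """Estimate 1 + 2 * sum of autocorrelations up to max_lag from the chain."""
    n = len(xs)
    m = sum(xs) / n
    c0 = sum((x - m) ** 2 for x in xs) / n
    acf = [sum((xs[t] - m) * (xs[t + j] - m) for t in range(n - j)) / (n * c0)
           for j in range(1, max_lag + 1)]
    return 1.0 + 2.0 * sum(acf)

# AR(1) chain: rho_j = phi^j, so the true inefficiency factor is
# 1 + 2*phi/(1 - phi) = (1 + phi)/(1 - phi) = 3 for phi = 0.5.
phi, x, xs = 0.5, 0.0, []
for _ in range(20_000):
    x = phi * x + random.gauss(0.0, 1.0)
    xs.append(x)

if_hat = inefficiency_factor(xs, max_lag=50)
print(round(if_hat, 1))  # close to 3
```

Choosing the truncation lag involves a bias-variance trade-off: too small a lag misses long-range autocorrelation, while too large a lag accumulates noise from the estimated ρ̂_j.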
5.3  Improving mixing

When successive draws are highly autocorrelated, the chain is said to mix poorly: it moves slowly through the support of the target, the inefficiency factor is large, and reliable estimates require very long runs. For MH chains poor mixing typically stems from a badly scaled or badly shaped proposal; for the Gibbs sampler it stems from strong correlation between blocks. Several remedies are available.

One is blocking: draw strongly correlated components jointly rather than one at a time. A related idea is the collapsed Gibbs sampler of Liu (1994): with x = (x1, x2, x3), if (x1, x2) are strongly correlated, draw x1 from π(x1|x2) and x2 from π(x2|x1) with x3 integrated out, and then draw x3 from π(x3|x1, x2). Reparameterizing the model can also reduce posterior correlation; see Gelfand et al. (1995) and Roberts and Sahu (1997).

Methods originally developed to accelerate the EM algorithm carry over to MCMC, notably parameter expansion and conditional/marginal augmentation (Liu and Wu, 1999; Meng and van Dyk, 1999; van Dyk and Meng, 2001). Further approaches include the methods of Neal (1996), the generalized Gibbs sampler and multigrid Monte Carlo of Liu and Sabatti (2000), and the alternating subspace-spanning resampling of Liu (2003).

For multimodal targets, running chains at several "temperatures" helps the sampler escape local modes, as in parallel tempering (Geyer, 1991) and simulated tempering (Geyer and Thompson, 1995). Proposals can also be improved by generating several candidates at once, as in the multiple-try method (Liu et al., 2000) and the multiple-point method (Qin and Liu, 2001). Finally, sequential Monte Carlo samplers (Moral et al., 2006) propagate a population of draws through a sequence of distributions ending at the target.
6  Applications

This section illustrates MCMC with the Bayesian estimation of two models; for Bayesian inference in general, see Berger (1980).
6.1  A generalized linear mixed model

The first example is the generalized linear mixed model that Breslow and Clayton (1993) fit to the seed germination data of Crowder (1978), which record, for each of 21 plates i, the number ni of seeds and the number yi that germinated. The model is

(12)    yi ~ Bi(ni, pi),    log[pi/(1 − pi)] = xi′β + bi,    bi ~ N(0, σ²),

where xi is a vector of covariates, β the fixed effects, and bi a plate-specific random effect. The priors are β ~ N(β0, B0) and σ² ~ IG(n0/2, s0/2), where IG(a, b) denotes the inverse gamma distribution with density proportional to (1/x)^{a+1}exp(−b/x).

The parameters (β, σ², b1, ..., b21) are sampled by a multiple-block algorithm. The full conditional of σ² is inverse gamma,

    σ²|β, {bi} ~ IG( (21 + n0)/2, (Σ_{i=1}^{21} bi² + s0)/2 ),

and is sampled directly. The full conditional of β,

    π(β|σ², {bi}) ∝ Π_{i=1}^{21} pi^{yi}(1 − pi)^{ni−yi} exp{ −(1/2)(β − β0)′B0⁻¹(β − β0) },

is nonstandard, so an MH step is used. Following the approximation of the binomial logit model by a weighted linear regression (McCullagh and Nelder, 1989), define working observations and variances

(13)    ỹi = xi′β + bi + εi,    εi ~ N(0, σi²),

with ỹi = xi′β + bi + (yi − ni pi)/{ni pi(1 − pi)} and σi² = 1/{ni pi(1 − pi)}, evaluated at the current parameter values. Combining (13) with the normal prior gives the approximate posterior N(β̂, V), where

    V⁻¹ = Σ_{i=1}^{21} xi xi′/σi² + B0⁻¹,    β̂ = V{ Σ_{i=1}^{21} xi(ỹi − bi)/σi² + B0⁻¹β0 },

and a t proposal T(β̂, V) is used in an independence MH step for β. Each random effect bi has full conditional

    π(bi|β, σ²) ∝ pi^{yi}(1 − pi)^{ni−yi} exp(−bi²/2σ²),

which is likewise sampled by an MH step with a t proposal t(b̂i, vi²), where

    vi² = σ²σi²/(σ² + σi²),    b̂i = vi²(ỹi − xi′β)/σi².

The model (12) was estimated with the hyperparameters β0 = 0, B0 = 100I, n0 = 1, s0 = 0.01, and 20 degrees of freedom for the t proposals. After a burn-in of 5000 iterations, 10000 draws were used. Figure 1 reports the sample paths and autocorrelation functions of the draws, and Table 1 summarizes the posterior distributions.

Mixing can often be improved by reparameterization (Section 5.3). Writing ηi = xi′β + bi, the model (12) becomes

    yi ~ Bi(ni, pi),    log[pi/(1 − pi)] = ηi,    ηi ~ N(xi′β, σ²).
In the reparameterized model each ηi is drawn by an MH step analogous to that for bi, while β now has a normal full conditional and is sampled directly: β|σ², {ηi} ~ N(β̂, V) with

    V⁻¹ = Σ_{i=1}^{21} xi xi′/σ² + B0⁻¹,    β̂ = V( Σ_{i=1}^{21} xi ηi/σ² + B0⁻¹β0 ).

[Figure 1: sample paths (15000 iterations) and sample autocorrelation functions (up to lag 25) of the draws of β0, β1, β2, β3 under the two parameterizations.]

Table 1: Estimation results for the two parameterizations of model (12)

              model (12)         reparameterized model
            mean     s.d.          mean     s.d.
    β0    -0.546    0.177        -0.542    0.190
    β1     0.091    0.293         0.146    0.308
    β2     1.333    0.250         1.339    0.270
    β3    -0.803    0.399        -0.825    0.430
    σ      0.242    0.125         0.313    0.121

As Figure 1 and Table 1 show, the two parameterizations give broadly similar posterior estimates, while the autocorrelation of the draws differs between them.
6.2  A Markov mixture model

The second example is a Markov mixture (hidden Markov) model for the count data analyzed by Leroux and Puterman (1992). The count yt observed at time t is modeled as

    yt ~ Po(λ_{st})    (t = 1, ..., T),

where Po(λ) denotes the Poisson distribution with mean λ, and the latent state st ∈ {1, 2} selects which of the two means λ1, λ2 generates yt. The states follow a two-state Markov chain with transition matrix

        T = | p11  p12 |
            | p21  p22 |

and initial distribution π0 = (π01, π02), from which s1 is drawn. Leroux and Puterman (1992) estimated the model by maximum-penalized likelihood; here the unknowns ({λi}, {pii}, {st}) are sampled by the Gibbs sampler.

With independent priors λi ~ Ga(ai, bi), where Ga(a, b) denotes the gamma distribution with density proportional to x^{a−1}exp(−bx), the full conditionals of the means are gamma,

    λi|{pii}, {st} ~ Ga( ai + Σ_{t∈Si} yt, bi + ni )    (i = 1, 2),

where Si = {t : st = i} and ni is the number of elements of Si. With priors pii ~ Be(αi, βi), the full conditionals of the transition probabilities are beta,

    pii|{λi}, {st} ~ Be(αi + nii, βi + nij)    (i = 1, 2, j ≠ i),

where nij is the number of transitions from state i to state j in {st}. Finally, each state st can be updated one at a time (single-move sampling) from

    Pr(st = i|{λi}, {pii}, {st′}_{t′≠t}) ∝  f(yt|λi) π0i p_{i,s2}              (t = 1),
                                            f(yt|λi) p_{s_{t−1},i} p_{i,s_{t+1}}  (1 < t < T),
                                            f(yt|λi) p_{s_{t−1},i}              (t = T),

where f(y|λ) denotes the Poisson probability function. Alternatively, Chib (1996) showed how to sample the whole path {st} jointly from its full conditional distribution (multi-move sampling). Because neighboring states are strongly dependent, the choice between single-move and multi-move sampling of {st} affects the autocorrelation of the draws of the other parameters.
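The Gibbs sampler with single-move state updates can be sketched on simulated data. Everything below is an illustrative assumption, not taken from the original text: true values λ = (1, 6), p11 = 0.95, p22 = 0.90, T = 300 observations, Ga(1, 1) priors on the λi, Be(1, 1) priors on the pii, and a uniform initial state distribution π0 = (1/2, 1/2):

```python
import math
import random

random.seed(7)

def rpois(lam):
    """Poisson draw by Knuth's method (fine for the small means used here)."""
    thresh, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= random.random()
        if p <= thresh:
            return k
        k += 1

def fpois(y, lam):
    """Poisson probability f(y | lam)."""
    return math.exp(-lam) * lam ** y / math.factorial(y)

# Simulate data; true values (illustrative): lam = (1, 6), p11 = 0.95, p22 = 0.90.
T_len, lam_true = 300, (1.0, 6.0)
s = [1]
for t in range(1, T_len):
    stay = 0.95 if s[-1] == 1 else 0.90
    s.append(s[-1] if random.random() < stay else 3 - s[-1])
y = [rpois(lam_true[st - 1]) for st in s]

# Gibbs sampler: Ga(1,1) priors on lam_i, Be(1,1) priors on p_ii, pi0 = (1/2, 1/2).
lam, p, draws = [0.5, 5.0], [0.9, 0.9], []
for it in range(1_500):
    for i in (1, 2):               # lam_i | ... ~ Ga(a_i + sum_{t in S_i} y_t, b_i + n_i)
        yi = [y[t] for t in range(T_len) if s[t] == i]
        lam[i - 1] = random.gammavariate(1.0 + sum(yi), 1.0 / (1.0 + len(yi)))
    for i in (1, 2):               # p_ii | ... ~ Be(alpha_i + n_ii, beta_i + n_ij)
        nii = sum(s[t - 1] == i and s[t] == i for t in range(1, T_len))
        nij = sum(s[t - 1] == i and s[t] != i for t in range(1, T_len))
        p[i - 1] = random.betavariate(1.0 + nii, 1.0 + nij)
    trans = {(1, 1): p[0], (1, 2): 1 - p[0], (2, 1): 1 - p[1], (2, 2): p[1]}
    for t in range(T_len):         # single-move update of each s_t
        w = []
        for i in (1, 2):
            wi = fpois(y[t], lam[i - 1])
            wi *= 0.5 if t == 0 else trans[(s[t - 1], i)]
            if t < T_len - 1:
                wi *= trans[(i, s[t + 1])]
            w.append(wi)
        s[t] = 1 if random.random() < w[0] / (w[0] + w[1]) else 2
    draws.append(tuple(lam))

post = draws[300:]
lam1 = sum(d[0] for d in post) / len(post)
lam2 = sum(d[1] for d in post) / len(post)
print(round(lam1, 2), round(lam2, 2))     # roughly recovers (1, 6)
```

With well-separated means the labels are stable; in harder problems label switching between the two states would need to be handled explicitly.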
[Figure 2: sample paths (10000 iterations) and sample autocorrelation functions (up to lag 50) of the draws of λ1 and λ2 under the two sampling schemes for {st}.]

Table 2: Estimation results under the two sampling schemes for {st}

               scheme (i)                 scheme (ii)
           mean    s.d.     IF        mean    s.d.     IF
    λ1    0.220   0.050   14.346     0.220   0.048   19.289
    λ2    2.282   0.769   13.445     2.273   0.744   24.257
    p11   0.968   0.024   16.195     0.968   0.023   20.295
    p22   0.672   0.154    2.669     0.671   0.148    4.761

The two schemes give essentially identical posterior means and standard deviations, but their inefficiency factors (IF) differ, reflecting the different autocorrelation of the draws; see Chib (1996). After a burn-in of 5000 iterations, 10000 draws were used, and the IF is computed from autocorrelations up to lag 50.
References

[1] Albert, J. and Chib, S. (1993). Bayesian analysis of binary and polychotomous response data, Journal of the American Statistical Association 88, 669-679.
[2] Berger, J.O. (1980). Statistical Decision Theory and Bayesian Analysis (2nd ed.). Springer, New York.
[3] Besag, J. and Green, P. (1993). Spatial statistics and Bayesian computation, Journal of the Royal Statistical Society B55, 25-37.
[4] Breslow, N.E. and Clayton, D.G. (1993). Approximate inference in generalized linear mixed models, Journal of the American Statistical Association 88, 9-25.
[5] Chan, K.S. (1993). Asymptotic behavior of the Gibbs sampler, Journal of the American Statistical Association 88, 320-326.
[6] Chib, S. (1992). Bayes regression for the tobit censored regression model, Journal of Econometrics 51, 79-99.
[7] Chib, S. (1996). Calculating posterior distributions and modal estimates in Markov mixture models, Journal of Econometrics 75, 79-97.
[8] Chib, S. and Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm, American Statistician 49, 327-335.
[9] Christensen, O.F., Møller, J., and Waagepetersen, R.P. (2001). Geometric ergodicity of Metropolis-Hastings algorithms for conditional simulation in generalized linear mixed models, Methodology and Computing in Applied Probability 3, 309-327.
[10] Christensen, O.F. and Waagepetersen, R. (2002). Bayesian prediction of spatial count data using generalized linear mixed models, Biometrics 58, 280-286.
[11] Cowles, M.K. and Carlin, B.P. (1996). Markov chain Monte Carlo convergence diagnostics: A comparative review, Journal of the American Statistical Association 91, 883-904.
[12] Crowder, M.J. (1978). Beta-binomial ANOVA for proportions, Applied Statistics 27, 34-37.
[13] Damien, P., Wakefield, J., and Walker, S.G. (1999). Gibbs sampling for Bayesian non-conjugate and hierarchical models by using auxiliary variables, Journal of the Royal Statistical Society B61, 331-344.
[14] Damien, P. and Walker, S.G. (2001). Sampling truncated normal, beta, and gamma densities, Journal of Computational and Graphical Statistics 10, 206-215.
[15] Dempster, A.P., Laird, N.M., and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society B39, 1-38.
[16] Devroye, L. (1986). Non-Uniform Random Variate Generation. Springer-Verlag, New York.
[17] Dongarra, J. and Sullivan, F. (2000). Guest editors' introduction: The top 10 algorithms, Computing in Science and Engineering 2, 22-23.
[18] Evans, M. and Swartz, T. (2000). Approximating Integrals Via Monte Carlo and Deterministic Methods. Oxford University Press, Oxford.
[19] Gamerman, D. and Lopes, H. (2006). Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference (2nd ed.). Chapman & Hall/CRC, London.
[20] Gelfand, A.E., Sahu, S.K., and Carlin, B.P. (1995). Efficient parametrisations for normal linear mixed models, Biometrika 82, 479-488.
[21] Gelfand, A.E. and Smith, A.F.M. (1990). Sampling-based approaches to calculating marginal densities, Journal of the American Statistical Association 85, 398-409.
[22] Gelman, A. and Rubin, D.B. (1992). Inference from iterative simulation using multiple sequences (with discussion), Statistical Science 7, 457-511.
[23] Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721-741.
[24] Gentle, J.E. (2003). Random Number Generation and Monte Carlo Methods. Springer, New York.
[25] Geweke, J. (1989). Bayesian inference in econometric models using Monte Carlo integration, Econometrica 57, 1317-1340.
[26] Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments, in J.M. Bernardo et al. (eds.), Bayesian Statistics 4, 169-193, Oxford University Press, Oxford.
[27] Geyer, C.J. (1991). Markov chain Monte Carlo maximum likelihood, in E. Keramigas (ed.), Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface, Interface Foundation, Fairfax, 156-163.
[28] Geyer, C.J. and Thompson, E. (1995). Annealing Markov chain Monte Carlo with applications to ancestral inference, Journal of the American Statistical Association 90, 909-920.
[29] Hastings, W.K. (1970). Monte Carlo sampling methods using Markov chains and their applications, Biometrika 57, 97-109.
[30] Haggstrom, O. (2002). Finite Markov Chains and Algorithmic Applications. Cambridge University Press, Cambridge.
[31] Heidelberger, P. and Welch, P.D. (1983). Simulation run length control in the presence of an initial transient, Operations Research 31, 1109-1144.
[32] Higdon, D. (1998). Auxiliary variable methods for Markov chain Monte Carlo with applications, Journal of the American Statistical Association 93, 398-409.
[33] Karlin, S. and Taylor, H.M. (1975). A First Course in Stochastic Processes (2nd ed.). Academic Press, New York.
[34] Leroux, B.G. and Puterman, M.L. (1992). Maximum-penalized likelihood estimation for independent and Markov-dependent mixture models, Biometrics 48, 545-558.
[35] Liu, C. (2002). An example of algorithm mining: Covariance adjustment to accelerate EM and Gibbs, in J. Huang and H. Zhang (eds.), Development of Modern Statistics and Related Topics, 74-88, World Scientific, New Jersey.
[36] Liu, C. (2003). Alternating subspace-spanning resampling to accelerate Markov chain Monte Carlo simulation, Journal of the American Statistical Association 98, 110-117.
[37] Liu, J.S. (1994). The collapsed Gibbs sampler in Bayesian computations with applications to a gene regulation problem, Journal of the American Statistical Association 89, 958-966.
[38] Liu, J.S. (2001). Monte Carlo Strategies in Scientific Computing. Springer, New York.
[39] Liu, J.S., Liang, F., and Wong, W.H. (2000). The use of multiple-try method and local optimization in Metropolis sampling, Journal of the American Statistical Association 95, 121-134.
[40] Liu, J.S. and Sabatti, C. (2000). Generalized Gibbs sampler and multigrid Monte Carlo for Bayesian computation, Biometrika 87, 353-369.
[41] Liu, J.S., Wong, W.H., and Kong, A. (1995). Covariance structure and convergence rate of the Gibbs sampler with various scans, Journal of the Royal Statistical Society B57, 157-169.
[42] Liu, J.S. and Wu, Y.N. (1999). Parameter expansion for data augmentation, Journal of the American Statistical Association 94, 1264-1274.
[43] McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models (2nd ed.). Chapman & Hall, London.
[44] Meng, X.-L. and van Dyk, D.A. (1999). Seeking efficient data augmentation schemes via conditional and marginal augmentation, Biometrika 86, 301-320.
[45] Mengersen, K.L., Robert, C.P., and Guihenneuc-Jouyaux, C. (1999). MCMC convergence diagnostics: A review, in J.M. Bernardo et al. (eds.), Bayesian Statistics 6, 415-440, Clarendon Press, Oxford.
[46] Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., and Teller, E. (1953). Equations of state calculations by fast computing machines, Journal of Chemical Physics 21, 1087-1091.
[47] Meyn, S.P. and Tweedie, R.L. (1993). Markov Chains and Stochastic Stability. Springer-Verlag, London.
[48] Moral, P.D., Doucet, A., and Jasra, A. (2006). Sequential Monte Carlo samplers, Journal of the Royal Statistical Society B68, 411-436.
[49] Muller, P. (1991). A generic approach to posterior integration and Gibbs sampling, Technical Report 91-09, Institute of Statistics and Decision Sciences, Duke University.
[50] Neal, R.M. (1996). Bayesian Learning for Neural Networks. Lecture Notes in Statistics 118, Springer-Verlag, New York.
[51] Neal, R.M. (2003). Slice sampling, Annals of Statistics 31, 705-767.
[52] Nummelin, E. (1984). General Irreducible Markov Chains and Non-negative Operators. Cambridge University Press, Cambridge.
[53] Qin, Z. and Liu, J.S. (2001). Multi-point Metropolis method with application to hybrid Monte Carlo, Journal of Computational Physics 172, 827-840.
[54] Raftery, A.E. and Lewis, S. (1992). How many iterations in the Gibbs sampler? in J.M. Bernardo et al. (eds.), Bayesian Statistics 4, 763-773, Oxford University Press, Oxford.
[55] Ripley, B.D. (1987). Stochastic Simulation. Wiley, New York.
[56] Robert, C.P. and Casella, G. (2004). Monte Carlo Statistical Methods (2nd ed.). Springer-Verlag, New York.
[57] Roberts, G.O., Gelman, A., and Gilks, W.R. (1997). Weak convergence and optimal scaling of random walk Metropolis algorithms, Annals of Applied Probability 7, 110-120.
[58] Roberts, G.O. and Rosenthal, J.S. (1998). Optimal scaling of discrete approximations to Langevin diffusions, Journal of the Royal Statistical Society B60, 255-268.
[59] Roberts, G.O. and Rosenthal, J.S. (2001). Optimal scaling for various Metropolis-Hastings algorithms, Statistical Science 16, 351-367.
[60] Roberts, G.O. and Sahu, S.K. (1997). Updating schemes, correlation structure, blocking and parameterization for the Gibbs sampler, Journal of the Royal Statistical Society B56, 377-384.
[61] Roberts, G.O. and Smith, A.F.M. (1994). Simple conditions for the convergence of the Gibbs sampler and Metropolis-Hastings algorithms, Stochastic Processes and Their Applications 49, 207-216.
[62] Ross, S.M. (1995). Stochastic Processes (2nd ed.). Wiley, New York.
[63] Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. Wiley, New York.
[64] Rubinstein, R.Y. (1981). Simulation and the Monte Carlo Method. Wiley, New York.
[65] Tanner, M.A. and Wong, W.H. (1987). The calculation of posterior distributions by data augmentation, Journal of the American Statistical Association 82, 528-549.
[66] Tierney, L. (1994). Markov chains for exploring posterior distributions (with discussion), Annals of Statistics 22, 1701-1762.
[67] van Dyk, D.A. and Meng, X.-L. (2001). The art of data augmentation (with discussion), Journal of Computational and Graphical Statistics 10, 1-111.