21世紀の統計科学 III, Chapter 10: Markov Chain Monte Carlo Methods
(ebsa.ism.ac.jp/ebooks/sites/default/files/ebook/1881/pdf/vol3_ch10.pdf)


21世紀の統計科学 Vol. III (web edition, May 2008), Chapter 10.

Contents

1 Introduction
  1.1 Monte Carlo integration
  1.2 Limitations and the idea of MCMC
2 Markov chains
  2.1 Definition
  2.2 Convergence
  2.3 Reversibility
3 The Metropolis-Hastings algorithm
  3.1 The algorithm
  3.2 Properties of the MH kernel
  3.3 Combining MH kernels
4 The Gibbs sampler
  4.1 The algorithm
  4.2 The multiple-block MH algorithm
  4.3 Data augmentation
5 Implementation
  5.1 Convergence diagnostics
  5.2 Accuracy of estimates
  5.3 Improving mixing
6 Applications
  6.1 A logit model with random effects
  6.2 A Poisson hidden Markov model

1 Introduction

Markov chain Monte Carlo (MCMC) is a simulation method for carrying out integrations that cannot be done analytically. Before introducing MCMC itself, this section reviews classical Monte Carlo integration and importance sampling.

1.1 Monte Carlo integration

Let $\mathbf{x}$ be a random vector with probability density $\pi(\mathbf{x})$, and let $h(\mathbf{x})$ be a function of interest. Consider the expectation

(1)   $I = E_{\pi}[h(\mathbf{x})] = \int_{\mathcal{X}} h(\mathbf{x})\,\pi(\mathbf{x})\,d\mathbf{x}$,

where $\mathbf{x}$ takes values in $\mathcal{X} \subseteq \mathbf{R}^d$ and $E_{\pi}[\cdot]$ denotes expectation with respect to $\pi(\mathbf{x})$. When this integral cannot be evaluated analytically, it can be approximated by simulation: draw a sample $(\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(n)})$ from $\pi(\mathbf{x})$ and estimate (1) by the sample mean

(2)   $\hat{I}_M = \frac{1}{n}\sum_{i=1}^{n} h(\mathbf{x}^{(i)})$.

Generating such draws is called sampling; standard methods of random variate generation are covered by Devroye (1986), Ripley (1987), and Gentle (2003).


By the law of large numbers, $\hat{I}_M$ converges to $I$ as $n \to \infty$, so the estimator becomes accurate as $n$ grows. Approximating the integral $I$ by $\hat{I}_M$ in this way is called Monte Carlo integration.

When it is difficult to sample from $\pi(\mathbf{x})$ directly, one can instead sample from another density $q(\mathbf{x})$ whose support contains that of $\pi(\mathbf{x})$, rewriting (1) as

$I = \int_{\mathcal{X}} h(\mathbf{x})\,\frac{\pi(\mathbf{x})}{q(\mathbf{x})}\,q(\mathbf{x})\,d\mathbf{x} = E_q\!\left[h(\mathbf{x})\,\frac{\pi(\mathbf{x})}{q(\mathbf{x})}\right]$.

With a sample $(\mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(n)})$ drawn from $q(\mathbf{x})$, $I$ is estimated by

(3)   $\hat{I}_{IS} = \frac{1}{n}\sum_{i=1}^{n} h(\mathbf{x}^{(i)})\,w(\mathbf{x}^{(i)})$,

where $w(\mathbf{x}^{(i)}) = \pi(\mathbf{x}^{(i)})/q(\mathbf{x}^{(i)})$ is the importance weight of draw $\mathbf{x}^{(i)}$. The estimator (3) is called the importance sampling estimator and $q(\mathbf{x})$ the importance function. By the law of large numbers, $\hat{I}_{IS}$ converges to $I$ as $n \to \infty$, but its efficiency depends critically on the choice of $q(\mathbf{x})$: roughly speaking, a good $q(\mathbf{x})$ is close to proportional to $|h(\mathbf{x})|\,\pi(\mathbf{x})$. See Rubinstein (1981), Geweke (1989), and Evans and Swartz (2000).

In many applications the density $\pi(\mathbf{x})$ is known only up to a normalizing constant, $\pi(\mathbf{x}) = p(\mathbf{x})/\int_{\mathcal{X}} p(\mathbf{x})\,d\mathbf{x}$, where $p(\mathbf{x})$ can be evaluated but the constant $\int_{\mathcal{X}} p(\mathbf{x})\,d\mathbf{x}$ cannot. A typical example is a Bayesian posterior density, in which $p(\mathbf{x})$ is the prior times the likelihood and the normalizing constant does not depend on $\mathbf{x}$. In this case (1) becomes

$I = \int_{\mathcal{X}} h(\mathbf{x})\,\frac{p(\mathbf{x})}{\int_{\mathcal{X}} p(\mathbf{x})\,d\mathbf{x}}\,d\mathbf{x} = \frac{\int_{\mathcal{X}} h(\mathbf{x})\,p(\mathbf{x})\,d\mathbf{x}}{\int_{\mathcal{X}} p(\mathbf{x})\,d\mathbf{x}}$.

Applying importance sampling with density $q(\mathbf{x})$ to both numerator and denominator gives

(4)   $I = \frac{\int_{\mathcal{X}} h(\mathbf{x})\,\frac{p(\mathbf{x})}{q(\mathbf{x})}\,q(\mathbf{x})\,d\mathbf{x}}{\int_{\mathcal{X}} \frac{p(\mathbf{x})}{q(\mathbf{x})}\,q(\mathbf{x})\,d\mathbf{x}} = \frac{E_q\!\left[h(\mathbf{x})\,\frac{p(\mathbf{x})}{q(\mathbf{x})}\right]}{E_q\!\left[\frac{p(\mathbf{x})}{q(\mathbf{x})}\right]}$,

which suggests estimating (4) by

$\tilde{I}_{IS} = \frac{\frac{1}{n}\sum_{i=1}^{n} h(\mathbf{x}^{(i)})\,w(\mathbf{x}^{(i)})}{\frac{1}{n}\sum_{i=1}^{n} w(\mathbf{x}^{(i)})} = \sum_{i=1}^{n} h(\mathbf{x}^{(i)})\,\bar{w}(\mathbf{x}^{(i)})$,

where $w(\mathbf{x}^{(i)}) = p(\mathbf{x}^{(i)})/q(\mathbf{x}^{(i)})$ and $\bar{w}(\mathbf{x}^{(i)}) = w(\mathbf{x}^{(i)})/\sum_{j=1}^{n} w(\mathbf{x}^{(j)})$ are the normalized weights, which satisfy $\sum_{i=1}^{n} \bar{w}(\mathbf{x}^{(i)}) = 1$. Unlike $\hat{I}_M$ and $\hat{I}_{IS}$, the self-normalized estimator $\tilde{I}_{IS}$ is biased for finite $n$, but it remains consistent for $I$ (Geweke, 1989). For the variances of $\hat{I}_{IS}$ and $\tilde{I}_{IS}$, see Robert and Casella (2004).
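As an illustration, the following sketch (not part of the original chapter) estimates $E[h(\mathbf{x})]$ for a toy one-dimensional target both by plain Monte Carlo, as in (2), and by self-normalized importance sampling; the target, the proposal $q$, and $h$ are arbitrary choices for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Toy setting (our choice, not from the chapter): target pi = N(1, 1),
# h(x) = x^2, so the true value is E[h(x)] = 1^2 + 1 = 2.
h = lambda x: x**2

# (2) Plain Monte Carlo: sample directly from pi.
x = rng.normal(1.0, 1.0, size=n)
I_M = np.mean(h(x))

# Self-normalized importance sampling with p(x) known only up to a
# constant, using a heavier-tailed proposal q = N(0, 2^2).
log_p = lambda x: -0.5 * (x - 1.0) ** 2          # unnormalized log target
x_q = rng.normal(0.0, 2.0, size=n)
log_q = -0.5 * (x_q / 2.0) ** 2                  # log q, also up to a constant
log_w = log_p(x_q) - log_q
w = np.exp(log_w - log_w.max())                  # stabilize before exponentiating
w_bar = w / w.sum()                              # normalized weights, sum to 1
I_IS = np.sum(h(x_q) * w_bar)

print(I_M, I_IS)  # both should be close to 2
```

Because the weights are normalized, any constants dropped from the log target and log proposal cancel, which is exactly why (4) is usable when only $p(\mathbf{x})$ can be evaluated.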

1.2 Limitations and the idea of MCMC

Both methods of the previous subsection presuppose that we can generate independent draws from a suitable distribution: Monte Carlo integration requires draws from $\pi(\mathbf{x})$ itself, and importance sampling requires a density $q(\mathbf{x})$ that approximates $\pi(\mathbf{x})$ well. When $\mathbf{x}$ is high-dimensional, finding such a $q(\mathbf{x})$ is difficult and both approaches break down. Since the early 1990s, Markov chain Monte Carlo (MCMC) has become the standard alternative: instead of independent draws, it generates a Markov chain whose distribution converges to $\pi(\mathbf{x})$. Its two building blocks are the Metropolis-Hastings (MH) algorithm and the Gibbs sampler.

The remainder of the chapter is organized as follows. Section 2 reviews the theory of Markov chains on which MCMC rests. Section 3 describes the MH algorithm and Section 4 the Gibbs sampler. Section 5 discusses implementation issues such as convergence diagnostics and efficiency, and Section 6 presents applications of MCMC. Book-length treatments of MCMC include Liu (2001), Robert and Casella (2004), and Gamerman and Lopes (2006); see also the Japanese-language references [1]-[4].

2 Markov chains

MCMC is built on the theory of Markov chains, which we now review briefly; see Karlin and Taylor (1975) or Ross (1995) for details.

2.1 Definition

Let $\mathbf{x}^{(t)}$ denote the state of a stochastic process at time $t$, and consider the sequence $(\mathbf{x}^{(0)}, \mathbf{x}^{(1)}, \ldots)$. The set $\mathcal{X}$ of values that $\mathbf{x}^{(t)}$ $(t = 0, 1, \ldots)$ can take is called the state space. For simplicity of exposition we first take $\mathcal{X}$ to be the finite set $\{1, \ldots, k\}$.

The sequence $(\mathbf{x}^{(0)}, \mathbf{x}^{(1)}, \ldots)$ is called a Markov chain if, for all $i, j \in \mathcal{X}$, all $t \geq 0$, and all histories $(i_0, i_1, \ldots, i_{t-1})$,

(5)   $\Pr(\mathbf{x}^{(t+1)} = j \mid \mathbf{x}^{(0)} = i_0, \mathbf{x}^{(1)} = i_1, \ldots, \mathbf{x}^{(t-1)} = i_{t-1}, \mathbf{x}^{(t)} = i) = \Pr(\mathbf{x}^{(t+1)} = j \mid \mathbf{x}^{(t)} = i)$.


Condition (5) says that, given the history $\{\mathbf{x}^{(0)} = i_0, \ldots, \mathbf{x}^{(t-1)} = i_{t-1}, \mathbf{x}^{(t)} = i\}$ up to time $t$, the distribution of $\mathbf{x}^{(t+1)}$ depends only on the current state $i$; this is the Markov property. The quantity

$p(i, j) = \Pr(\mathbf{x}^{(t+1)} = j \mid \mathbf{x}^{(t)} = i)$

appearing in (5) is called the transition probability from $i$ to $j$. Arranging the $p(i, j)$ as the $(i, j)$ elements of a $k \times k$ matrix gives the transition matrix

$T = \begin{pmatrix} p(1,1) & p(1,2) & \cdots & p(1,k) \\ p(2,1) & p(2,2) & \cdots & p(2,k) \\ \vdots & \vdots & \ddots & \vdots \\ p(k,1) & p(k,2) & \cdots & p(k,k) \end{pmatrix}$.

Row $i$ of $T$ is the distribution of the next state when the current state is $i$; its entries satisfy $p(i, j) \geq 0$ and $\sum_{j=1}^{k} p(i, j) = 1$.

Example 2.1. Let $\mathcal{X} = \{1, 2, 3\}$ and

$T = \begin{pmatrix} 1/2 & 1/3 & 1/6 \\ 3/4 & 0 & 1/4 \\ 0 & 1 & 0 \end{pmatrix}$.

From state 1 the chain stays at 1 with probability 1/2, moves to 2 with probability 1/3, and moves to 3 with probability 1/6. From state 2 it moves to 1 with probability 3/4 and to 3 with probability 1/4, and from state 3 it always moves to 2.

Let $\pi^{(0)}$ denote the initial distribution of $\mathbf{x}^{(0)}$,

$\pi^{(0)} = (\pi^{(0)}_1, \pi^{(0)}_2, \ldots, \pi^{(0)}_k) = \big(\Pr(\mathbf{x}^{(0)} = 1), \Pr(\mathbf{x}^{(0)} = 2), \ldots, \Pr(\mathbf{x}^{(0)} = k)\big)$,

and let $\pi^{(1)}, \pi^{(2)}, \ldots$ denote the marginal distributions of $\mathbf{x}^{(1)}, \mathbf{x}^{(2)}, \ldots$, so that

$\pi^{(t)} = (\pi^{(t)}_1, \pi^{(t)}_2, \ldots, \pi^{(t)}_k) = \big(\Pr(\mathbf{x}^{(t)} = 1), \Pr(\mathbf{x}^{(t)} = 2), \ldots, \Pr(\mathbf{x}^{(t)} = k)\big)$.


The distribution $\pi^{(1)}$ of $\mathbf{x}^{(1)}$ follows from $\pi^{(0)}$ and $T$:

$\pi^{(1)}_j = \Pr(\mathbf{x}^{(1)} = j) = \sum_{i=1}^{k} \Pr(\mathbf{x}^{(0)} = i,\, \mathbf{x}^{(1)} = j) = \sum_{i=1}^{k} \Pr(\mathbf{x}^{(0)} = i)\,\Pr(\mathbf{x}^{(1)} = j \mid \mathbf{x}^{(0)} = i) = \sum_{i=1}^{k} \pi^{(0)}_i\, p(i, j)$.

In matrix form, $\pi^{(1)} = \pi^{(0)} T$. Similarly, the distribution of $\mathbf{x}^{(2)}$ is $\pi^{(2)} = \pi^{(1)} T = \pi^{(0)} T^2$, and in general $\pi^{(t)} = \pi^{(0)} T^t$.
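The relation $\pi^{(t)} = \pi^{(0)} T^t$ is easy to check numerically; the following sketch (our own, using NumPy) iterates the chain of Example 2.1 and also verifies the invariant distribution of Example 2.4 below.

```python
import numpy as np

# Transition matrix of Example 2.1.
T = np.array([[1/2, 1/3, 1/6],
              [3/4, 0.0, 1/4],
              [0.0, 1.0, 0.0]])

pi_t = np.array([1.0, 0.0, 0.0])    # pi^(0): start in state 1 with probability 1
for t in range(50):                 # pi^(t+1) = pi^(t) T
    pi_t = pi_t @ T

print(pi_t)                         # -> approximately [0.5, 0.3333, 0.1667]
print(np.allclose(pi_t @ T, pi_t))  # invariance check: pi = pi T
```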

2.2 Convergence

Given an initial distribution $\pi^{(0)}$ and a transition matrix $T$, a realization of the chain is simulated as follows:

(1) Draw $\mathbf{x}^{(0)}$ from $\pi^{(0)}$.
(2) For $t = 0, 1, \ldots$, draw $\mathbf{x}^{(t+1)}$ from the distribution given by row $\mathbf{x}^{(t)}$ of $T$.

The resulting sequence $(\mathbf{x}^{(0)}, \mathbf{x}^{(1)}, \ldots)$ is a Markov chain in which $\mathbf{x}^{(t)}$ has marginal distribution $\pi^{(t)}$. For MCMC we need conditions under which $\pi^{(t)}$ converges, regardless of $\pi^{(0)}$, to a unique limiting distribution. The key concepts are (1) irreducibility, (2) aperiodicity, and (3) the invariant distribution.

A chain with transition matrix $T$ is irreducible if for every pair $i, j \in \mathcal{X}$ there exists an $n$ such that $(T^n)_{ij} > 0$, where $(T^n)_{ij}$ denotes the $(i, j)$ element of $T^n$; in words, every state can be reached from every other state in a finite number of steps.


Example 2.2. Unlike Example 2.1, the chain with

$T = \begin{pmatrix} 0.6 & 0.4 & 0 \\ 0.3 & 0.7 & 0 \\ 0.2 & 0.2 & 0.6 \end{pmatrix}$

is not irreducible: state 3 cannot be reached from states 1 or 2.

For each $i \in \mathcal{X}$, the greatest common divisor of the set $\{n \geq 1 : (T^n)_{ii} > 0\}$ is called the period of state $i$. A chain is aperiodic if every state has period 1.

Example 2.3. For

$T = \begin{pmatrix} 0 & 0.5 & 0 & 0.5 \\ 0.5 & 0 & 0.5 & 0 \\ 0 & 0.5 & 0 & 0.5 \\ 0.5 & 0 & 0.5 & 0 \end{pmatrix}$

we have $\{n \geq 1 : (T^n)_{ii} > 0\} = \{2, 4, 6, \ldots\}$ for $i = 1, \ldots, 4$, so every state has period 2 and the chain is not aperiodic.

A distribution $\pi = (\pi_1, \ldots, \pi_k)$ is called an invariant (stationary) distribution of the chain with transition matrix $T$ if

(1) $\pi_i \geq 0$ $(i \in \mathcal{X})$ and $\sum_{i=1}^{k} \pi_i = 1$, and (2) $\pi = \pi T$.

If the chain is started from $\pi$, every subsequent state also has distribution $\pi$.

Example 2.4. For the transition matrix $T$ of Example 2.1, $\pi = (1/2, 1/3, 1/6)$ satisfies $\sum_{i=1}^{3} \pi_i = 1$ and $\pi = \pi T$, so it is an invariant distribution of the chain.

The following theorem connects these concepts with the convergence of $\pi^{(t)}$; for a proof see Häggström (2002).


Theorem (convergence). Suppose the Markov chain $(\mathbf{x}^{(0)}, \mathbf{x}^{(1)}, \ldots)$ with transition matrix $T$ is irreducible and aperiodic and has invariant distribution $\pi$. Then, for any initial distribution $\pi^{(0)}$,

$\frac{1}{2}\sum_{i=1}^{k} |\pi^{(t)}_i - \pi_i| \to 0$ as $t \to \infty$,

where the left-hand side is the total variation distance between $\pi^{(t)}$ and $\pi$; the limit does not depend on $\pi^{(0)}$.

2.3 Reversibility

The theorem suggests the strategy underlying MCMC: construct a chain whose invariant distribution is the target $\pi$, run it for an initial stretch of $m$ steps, and treat $(\mathbf{x}^{(m+1)}, \mathbf{x}^{(m+2)}, \ldots)$ as (approximate) draws from $\pi$. To verify that a given $\pi$ is invariant for a chain, the following detailed balance condition is convenient:

(6)   $\pi_i\, p(i, j) = \pi_j\, p(j, i)$   $(i, j \in \mathcal{X})$.

A chain satisfying (6) is called reversible with respect to $\pi$. Detailed balance implies invariance: summing (6) over $i$ gives $\sum_{i=1}^{k} \pi_i\, p(i, j) = \sum_{i=1}^{k} \pi_j\, p(j, i) = \pi_j$, that is, $\pi = \pi T$. Most MCMC algorithms are constructed so that detailed balance holds.


So far the state space has been finite, but MCMC is usually applied on continuous state spaces. The corresponding theory (Nummelin, 1984; Meyn and Tweedie, 1993) replaces the transition matrix by a transition kernel $T(\mathbf{x}, \mathbf{y})$ defined through

$\Pr(\mathbf{x}^{(t+1)} \in A \mid \mathbf{x}^{(t)} = \mathbf{x}) = \int_A T(\mathbf{x}, \mathbf{y})\,d\mathbf{y}$   $(\mathbf{x} \in \mathcal{X},\ A \subseteq \mathcal{X})$.

A density $\pi(\mathbf{x})$ is invariant for the kernel $T(\mathbf{x}, \mathbf{y})$ if

$\pi(\mathbf{y}) = \int_{\mathcal{X}} \pi(\mathbf{x})\,T(\mathbf{x}, \mathbf{y})\,d\mathbf{x}$,

and the detailed balance condition becomes

$\pi(\mathbf{x})\,T(\mathbf{x}, \mathbf{y}) = \pi(\mathbf{y})\,T(\mathbf{y}, \mathbf{x})$.

3 The Metropolis-Hastings algorithm

The most fundamental MCMC method is the Metropolis-Hastings (MH) algorithm, proposed by Metropolis et al. (1953) and generalized by Hastings (1970). (Dongarra and Sullivan (2000) list the Metropolis algorithm among the ten most influential algorithms of the 20th century.) This section describes the MH algorithm and its properties.

3.1 The algorithm

The distribution $\pi(\mathbf{x})$ from which we wish to sample is called the target distribution. The MH algorithm generates candidate moves from a proposal distribution (also called a candidate generating distribution) $q(\mathbf{y}|\mathbf{x})$, which may depend on the current state $\mathbf{x}$.


Each candidate is accepted with an acceptance probability chosen so that the resulting chain has $\pi(\mathbf{x})$ as its invariant distribution. The MH algorithm is:

(1) Choose an initial value $\mathbf{x}^{(0)}$.
(2) For $t = 0, 1, \ldots$:
  (i) Draw a candidate $\mathbf{y} \sim q(\mathbf{y}|\mathbf{x}^{(t)})$.
  (ii) Draw $u \sim U(0, 1)$, where $U(a, b)$ denotes the uniform distribution on $(a, b)$, and set

  $\mathbf{x}^{(t+1)} = \begin{cases} \mathbf{y} & \text{if } u \leq \alpha(\mathbf{x}^{(t)}, \mathbf{y}), \\ \mathbf{x}^{(t)} & \text{otherwise}, \end{cases}$

  where the acceptance probability is

  $\alpha(\mathbf{x}, \mathbf{y}) = \min\left\{1,\ \frac{\pi(\mathbf{y})\,q(\mathbf{x}|\mathbf{y})}{\pi(\mathbf{x})\,q(\mathbf{y}|\mathbf{x})}\right\}$.

Note that $\alpha(\mathbf{x}, \mathbf{y})$ involves $\pi$ only through the ratio $\pi(\mathbf{y})/\pi(\mathbf{x})$, so the MH algorithm can be used even when $\pi$ is known only up to its normalizing constant. When the candidate is rejected, the chain stays where it is: $\mathbf{x}^{(t+1)} = \mathbf{x}^{(t)}$. The behaviour of the MH chain depends strongly on the choice of $q(\mathbf{y}|\mathbf{x})$; the following are standard choices.

Example 3.1 (random walk chain). Given the current state $\mathbf{x}^{(t)} = \mathbf{x}$, propose

$\mathbf{y} = \mathbf{x} + \boldsymbol{\epsilon}, \quad \boldsymbol{\epsilon} \sim N(\mathbf{0}, \sigma^2 I)$,

where $N(\boldsymbol{\mu}, \Sigma)$ denotes the multivariate normal distribution. Since this proposal is symmetric, $q(\mathbf{y}|\mathbf{x}) = q(\mathbf{x}|\mathbf{y})$, the acceptance probability reduces to $\alpha(\mathbf{x}, \mathbf{y}) = \min\{1, \pi(\mathbf{y})/\pi(\mathbf{x})\}$, which is the original algorithm of Metropolis et al. (1953).

Any proposal with $q(\mathbf{y}|\mathbf{x}) = q(\mathbf{x}|\mathbf{y})$ yields the same simplification of the MH acceptance probability; besides the normal, common choices for the increment are the uniform distribution $U(-\delta, \delta)$ and the multivariate $t$ distribution $T(\mathbf{0}, \sigma^2 I)$, where $T(\boldsymbol{\mu}, \Sigma)$ denotes a multivariate $t$ distribution with the indicated location and scale. The scale $\sigma$ acts as a step size and must be tuned: if it is too small, candidates are almost always accepted but the chain moves in tiny steps and explores the target slowly; if it is too large, candidates fall in low-density regions and are rejected so often that the chain rarely moves. Guidelines for the optimal acceptance rate of random walk chains are given by Roberts et al. (1997) and Roberts and Rosenthal (2001).
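The following sketch implements the random walk chain of Example 3.1 for a one-dimensional target known only up to a constant; the bimodal target and the step size are illustrative choices of ours, not from the chapter.

```python
import numpy as np

rng = np.random.default_rng(1)

def log_p(x):
    # Unnormalized log target: an equal mixture of N(-2, 1) and N(2, 1).
    return np.logaddexp(-0.5 * (x + 2.0) ** 2, -0.5 * (x - 2.0) ** 2)

def random_walk_mh(n_iter, x0=0.0, sigma=1.0):
    x = x0
    lp_x = log_p(x)
    draws = np.empty(n_iter)
    for t in range(n_iter):
        y = x + sigma * rng.standard_normal()   # symmetric proposal
        lp_y = log_p(y)
        # alpha = min{1, pi(y)/pi(x)}: accept if log u < log pi(y) - log pi(x)
        if np.log(rng.uniform()) < lp_y - lp_x:
            x, lp_x = y, lp_y                   # accept the candidate
        draws[t] = x                            # on rejection the chain stays put
    return draws

draws = random_walk_mh(50_000, sigma=2.0)
print(draws[5000:].mean())   # approximately 0 by symmetry, after burn-in
```

Working with $\log \pi$ rather than $\pi$ avoids numerical underflow, and only the ratio $\pi(\mathbf{y})/\pi(\mathbf{x})$ is ever needed, as noted above.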

Example 3.2 (Langevin chain). A refinement of the random walk proposal uses gradient information on the target:

$\mathbf{y} = \mathbf{x} + \frac{\sigma^2}{2}\,\frac{\partial \log \pi(\mathbf{x})}{\partial \mathbf{x}} + \boldsymbol{\epsilon}, \quad \boldsymbol{\epsilon} \sim N(\mathbf{0}, \sigma^2 I)$.

This is called the Langevin chain (Roberts and Rosenthal, 1998; Christensen et al., 2001; Christensen and Waagepetersen, 2002). The drift term pushes candidates toward regions where $\pi(\mathbf{x})$ is high: when the gradient $\partial \log \pi(\mathbf{x})/\partial \mathbf{x}$ is far from $\mathbf{0}$ the proposal moves the chain uphill, while near a mode, where $\partial \log \pi(\mathbf{x})/\partial \mathbf{x} = \mathbf{0}$, it behaves like a random walk proposal. Since the proposal is not symmetric, the general acceptance probability $\alpha(\mathbf{x}, \mathbf{y})$ is used.

Example 3.3 (independent chain). If the proposal does not depend on the current state, $q(\mathbf{y}|\mathbf{x}) = q(\mathbf{y})$, the chain is called an independent (independence) chain, and the acceptance probability becomes

$\alpha(\mathbf{x}, \mathbf{y}) = \min\left\{1,\ \frac{\pi(\mathbf{y})/q(\mathbf{y})}{\pi(\mathbf{x})/q(\mathbf{x})}\right\}$,

the ratio of the importance weights of the candidate $\mathbf{y}$ and the current state $\mathbf{x}$. For the chain to mix well, $q(\mathbf{y})$ should approximate $\pi(\mathbf{y})$ closely and have heavier tails. A standard construction centres the proposal at the mode of $\pi(\mathbf{x})$ and uses $\{-\partial^2 \log \pi(\mathbf{x})/\partial \mathbf{x}\,\partial \mathbf{x}'\}^{-1}$, evaluated at the mode, as its scale matrix, taking $q$ to be a multivariate normal or, for heavier tails, a multivariate $t$ density; see Chib and Greenberg (1995).
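A sketch of the independent chain, using a heavier-tailed Student-$t$ proposal centred at the mode of an illustrative target; the concrete target, degrees of freedom, and the use of SciPy are our own choices.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)

log_p = lambda x: -0.5 * x**2 - 0.1 * x**4   # unnormalized log target (mode at 0)
q = stats.t(df=5)                            # independence proposal q(y), heavy tails

n_iter = 20_000
draws = np.empty(n_iter)
x = 0.0
lw_x = log_p(x) - q.logpdf(x)                # log importance weight of current state
for t in range(n_iter):
    y = q.rvs(random_state=rng)
    lw_y = log_p(y) - q.logpdf(y)
    # alpha = min{1, [pi(y)/q(y)] / [pi(x)/q(x)]}: ratio of importance weights
    if np.log(rng.uniform()) < lw_y - lw_x:
        x, lw_x = y, lw_y
    draws[t] = x
```

Tracking the log importance weight of the current state means each iteration evaluates the target and proposal densities only at the new candidate.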

3.2 Properties of the MH kernel

Why does the MH chain $(\mathbf{x}^{(0)}, \mathbf{x}^{(1)}, \ldots)$ have $\pi(\mathbf{x})$ as its invariant distribution? The transition kernel of the MH chain from $\mathbf{x}$ to $\mathbf{y}$ is

(7)   $T(\mathbf{x}, \mathbf{y}) = q(\mathbf{y}|\mathbf{x})\,\alpha(\mathbf{x}, \mathbf{y}) + r(\mathbf{x})\,\delta_{\mathbf{x}}(\mathbf{y})$,

the sum of two terms, one for moves to accepted candidates and one for the point mass at $\mathbf{x}$ arising from rejections, where $r(\mathbf{x}) = \int_{\mathcal{X}} q(\mathbf{y}|\mathbf{x})\{1 - \alpha(\mathbf{x}, \mathbf{y})\}\,d\mathbf{y}$ is the total rejection probability and $\delta_{\mathbf{x}}(\mathbf{y}) = I(\mathbf{x} = \mathbf{y})$, with $I(\cdot)$ the indicator function. For the kernel (7), the definition of $\alpha$ gives

$q(\mathbf{y}|\mathbf{x})\,\alpha(\mathbf{x}, \mathbf{y})\,\pi(\mathbf{x}) = q(\mathbf{x}|\mathbf{y})\,\alpha(\mathbf{y}, \mathbf{x})\,\pi(\mathbf{y})$,

and trivially $r(\mathbf{x})\,\delta_{\mathbf{x}}(\mathbf{y})\,\pi(\mathbf{x}) = r(\mathbf{y})\,\delta_{\mathbf{y}}(\mathbf{x})\,\pi(\mathbf{y})$, since both sides vanish unless $\mathbf{x} = \mathbf{y}$. Adding the two identities yields detailed balance:

$\pi(\mathbf{x})\,T(\mathbf{x}, \mathbf{y}) = q(\mathbf{x}|\mathbf{y})\,\alpha(\mathbf{y}, \mathbf{x})\,\pi(\mathbf{y}) + r(\mathbf{y})\,\delta_{\mathbf{y}}(\mathbf{x})\,\pi(\mathbf{y}) = \pi(\mathbf{y})\,T(\mathbf{y}, \mathbf{x})$.

Hence $\pi(\mathbf{x})$ is the invariant distribution of the MH chain. Under weak additional conditions guaranteeing irreducibility and aperiodicity, the chain converges to $\pi(\mathbf{x})$; see Roberts and Smith (1994) and Tierney (1994). In practice one runs the MH chain $(\mathbf{x}^{(0)}, \mathbf{x}^{(1)}, \ldots)$, discards the first $m$ draws as the burn-in period, and treats $(\mathbf{x}^{(m+1)}, \mathbf{x}^{(m+2)}, \ldots)$ as draws from $\pi(\mathbf{x})$.

3.3 Combining MH kernels

A single MH kernel rarely works well for a complicated target, and it is often useful to combine several. Suppose we have two MH transition kernels $T_1(\mathbf{x}, \mathbf{y})$ and $T_2(\mathbf{x}, \mathbf{y})$, both with invariant distribution $\pi(\mathbf{x})$. They can be combined in two ways.

The first is to pick one kernel at random at each step: with probability $w$ update the state with $T_1(\mathbf{x}, \mathbf{y})$, and with probability $1 - w$ with $T_2(\mathbf{x}, \mathbf{y})$, where $0 < w < 1$ is fixed. The resulting kernel

(8)   $T(\mathbf{x}, \mathbf{y}) = w\,T_1(\mathbf{x}, \mathbf{y}) + (1 - w)\,T_2(\mathbf{x}, \mathbf{y})$

is called a mixture of transition kernels. Since each $T_i(\mathbf{x}, \mathbf{y})$ $(i = 1, 2)$ has invariant distribution $\pi(\mathbf{x})$, so does the mixture (8).

The second is to apply the kernels in sequence: starting from $\mathbf{x}$, move to an intermediate state $\mathbf{x}'$ with the MH kernel $T_1(\mathbf{x}, \mathbf{x}')$, then move from $\mathbf{x}'$ to $\mathbf{y}$ with the MH kernel $T_2(\mathbf{x}', \mathbf{y})$. The composite kernel

(9)   $T(\mathbf{x}, \mathbf{y}) = \int_{\mathcal{X}} T_1(\mathbf{x}, \mathbf{x}')\,T_2(\mathbf{x}', \mathbf{y})\,d\mathbf{x}'$

is called a cycle of transition kernels. It also preserves $\pi(\mathbf{x})$:

$\int_{\mathcal{X}} \pi(\mathbf{x})\,T(\mathbf{x}, \mathbf{y})\,d\mathbf{x} = \int_{\mathcal{X}}\int_{\mathcal{X}} \pi(\mathbf{x})\,T_1(\mathbf{x}, \mathbf{x}')\,T_2(\mathbf{x}', \mathbf{y})\,d\mathbf{x}\,d\mathbf{x}' = \int_{\mathcal{X}} \pi(\mathbf{x}')\,T_2(\mathbf{x}', \mathbf{y})\,d\mathbf{x}' = \pi(\mathbf{y})$,

although, unlike a mixture, a cycle need not be reversible with respect to $\pi(\mathbf{x})$ even when its components are. Both constructions extend to more than two kernels; see Tierney (1994).
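In code the two constructions are one-liners. A minimal sketch, assuming two update functions `kernel1` and `kernel2` (hypothetical names) that each take a state and return the next state under an MH kernel preserving the same target:

```python
import numpy as np

rng = np.random.default_rng(3)

def mixture_step(x, kernel1, kernel2, w=0.5):
    # (8): with probability w apply T1, otherwise apply T2.
    return kernel1(x) if rng.uniform() < w else kernel2(x)

def cycle_step(x, kernel1, kernel2):
    # (9): apply T1 first, then T2, in sequence.
    return kernel2(kernel1(x))
```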


Cycles of kernels in particular are ubiquitous in MCMC; the leading example is the Gibbs sampler.

4 The Gibbs sampler

The Gibbs sampler is a special case of the MH algorithm in which every candidate is accepted. The name "Gibbs sampling" is due to Geman and Geman (1984), who used it for image restoration; Gelfand and Smith (1990) brought it to the attention of the mainstream statistical community.

4.1 The algorithm

Partition the random vector $\mathbf{x}$ into $k$ blocks, $\mathbf{x} = (\mathbf{x}_1, \ldots, \mathbf{x}_k)$, and suppose each block $\mathbf{x}_i$ can be sampled from its full conditional distribution $\pi(\mathbf{x}_i|\mathbf{x}_{-i})$, the conditional distribution of $\mathbf{x}_i$ given all remaining blocks $\mathbf{x}_{-i} = (\mathbf{x}_1, \ldots, \mathbf{x}_{i-1}, \mathbf{x}_{i+1}, \ldots, \mathbf{x}_k)$. The Gibbs sampler is:

(1) Choose an initial value $\mathbf{x}^{(0)} = (\mathbf{x}^{(0)}_1, \ldots, \mathbf{x}^{(0)}_k)$.
(2) For $t = 0, 1, \ldots$:
  (i) Draw $\mathbf{x}^{(t+1)}_1 \sim \pi(\mathbf{x}_1|\mathbf{x}^{(t)}_2, \ldots, \mathbf{x}^{(t)}_k)$.
  (ii) Draw $\mathbf{x}^{(t+1)}_2 \sim \pi(\mathbf{x}_2|\mathbf{x}^{(t+1)}_1, \mathbf{x}^{(t)}_3, \ldots, \mathbf{x}^{(t)}_k)$.
  ...
  (k) Draw $\mathbf{x}^{(t+1)}_k \sim \pi(\mathbf{x}_k|\mathbf{x}^{(t+1)}_1, \ldots, \mathbf{x}^{(t+1)}_{k-1})$.

As shown in Section 4.2, the Gibbs sampler is an MH algorithm whose proposals are the full conditionals themselves, so no draw is ever rejected. The deterministic sweep (i)-(k) through the blocks is called a systematic scan; other scan orders, such as random scans, are possible, and their effects on convergence are studied by Liu et al. (1995). The choice of blocks matters for efficiency and is discussed in Section 5.

Example 4.1. Consider the joint density

$\pi(x_1, x_2) \propto \binom{n}{x_1}\, x_2^{x_1+\alpha-1} (1 - x_2)^{n-x_1+\beta-1}$,

where $x_1 \in \{0, 1, \ldots, n\}$ and $x_2 \in [0, 1]$. Sampling $(x_1, x_2)$ jointly is awkward, but both full conditionals are standard:

$\pi(x_1|x_2) \propto \binom{n}{x_1} x_2^{x_1}(1 - x_2)^{n-x_1}, \qquad \pi(x_2|x_1) \propto x_2^{x_1+\alpha-1}(1 - x_2)^{n-x_1+\beta-1}$,

so $x_1|x_2 \sim Bi(n, x_2)$ and $x_2|x_1 \sim Be(x_1 + \alpha,\ n - x_1 + \beta)$, where $Bi(n, p)$ denotes the binomial distribution with probability function proportional to $p^x(1-p)^{n-x}$ and $Be(a, b)$ the beta distribution with density proportional to $x^{a-1}(1-x)^{b-1}$.
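A sketch of the Gibbs sampler for Example 4.1; the values of $n$, $\alpha$, $\beta$ and the chain length are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(4)
n, alpha, beta = 16, 2.0, 4.0

n_iter, burn_in = 20_000, 1_000
x1, x2 = 0, 0.5                                  # initial values
draws = np.empty((n_iter, 2))
for t in range(n_iter):
    x1 = rng.binomial(n, x2)                     # x1 | x2 ~ Bi(n, x2)
    x2 = rng.beta(x1 + alpha, n - x1 + beta)     # x2 | x1 ~ Be(x1+alpha, n-x1+beta)
    draws[t] = x1, x2

# Marginally x1 follows a beta-binomial distribution with mean n*alpha/(alpha+beta).
print(draws[burn_in:, 0].mean(), n * alpha / (alpha + beta))  # both approx 5.33
```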

4.2 The multiple-block MH algorithm

When some full conditionals cannot be sampled directly, the blocks can instead be updated by MH steps. For notational simplicity take $k = 2$ blocks; the general case is analogous. Write $\mathbf{x} = (\mathbf{x}_1, \mathbf{x}_2)$ and $\mathbf{y} = (\mathbf{y}_1, \mathbf{y}_2)$, and update the state in two stages, $(\mathbf{x}_1, \mathbf{x}_2) \to (\mathbf{y}_1, \mathbf{x}_2) \to (\mathbf{y}_1, \mathbf{y}_2)$; this is the multiple-block MH algorithm.

In the first stage, $(\mathbf{x}_1, \mathbf{x}_2) \to (\mathbf{y}_1, \mathbf{x}_2)$, block 1 is updated by an MH step with target $\pi(\mathbf{x}_1|\mathbf{x}_2)$: a candidate $\mathbf{y}_1$ is drawn from a proposal $q_1(\mathbf{y}_1|\mathbf{x}_1, \mathbf{x}_2)$ and accepted with probability

$\alpha_1(\mathbf{x}_1, \mathbf{y}_1) = \min\left\{1,\ \frac{\pi(\mathbf{y}_1|\mathbf{x}_2)\,q_1(\mathbf{x}_1|\mathbf{y}_1, \mathbf{x}_2)}{\pi(\mathbf{x}_1|\mathbf{x}_2)\,q_1(\mathbf{y}_1|\mathbf{x}_1, \mathbf{x}_2)}\right\}$.

Denote the resulting MH kernel by $T_1(\mathbf{x}_1, \mathbf{y}_1|\mathbf{x}_2)$. In the second stage, $(\mathbf{y}_1, \mathbf{x}_2) \to (\mathbf{y}_1, \mathbf{y}_2)$, block 2 is updated by an MH step with target $\pi(\mathbf{x}_2|\mathbf{y}_1)$: a candidate $\mathbf{y}_2$ is drawn from $q_2(\mathbf{y}_2|\mathbf{y}_1, \mathbf{x}_2)$ and accepted with probability

$\alpha_2(\mathbf{x}_2, \mathbf{y}_2) = \min\left\{1,\ \frac{\pi(\mathbf{y}_2|\mathbf{y}_1)\,q_2(\mathbf{x}_2|\mathbf{y}_1, \mathbf{y}_2)}{\pi(\mathbf{x}_2|\mathbf{y}_1)\,q_2(\mathbf{y}_2|\mathbf{y}_1, \mathbf{x}_2)}\right\}$,

with kernel $T_2(\mathbf{x}_2, \mathbf{y}_2|\mathbf{y}_1)$.

The two stages form a cycle, so the kernel of the multiple-block MH chain from $\mathbf{x}$ to $\mathbf{y}$ is

$T(\mathbf{x}, \mathbf{y}) = T_1(\mathbf{x}_1, \mathbf{y}_1|\mathbf{x}_2)\,T_2(\mathbf{x}_2, \mathbf{y}_2|\mathbf{y}_1)$.

Since $T_1(\mathbf{x}_1, \mathbf{y}_1|\mathbf{x}_2)$ has invariant distribution $\pi(\mathbf{x}_1|\mathbf{x}_2)$,

$\int_{\mathcal{X}} \pi(\mathbf{x})\,T(\mathbf{x}, \mathbf{y})\,d\mathbf{x} = \int\!\!\int \pi(\mathbf{x}_1, \mathbf{x}_2)\,T_1(\mathbf{x}_1, \mathbf{y}_1|\mathbf{x}_2)\,T_2(\mathbf{x}_2, \mathbf{y}_2|\mathbf{y}_1)\,d\mathbf{x}_1\,d\mathbf{x}_2$
$= \int \left[\int \pi(\mathbf{x}_1|\mathbf{x}_2)\,T_1(\mathbf{x}_1, \mathbf{y}_1|\mathbf{x}_2)\,d\mathbf{x}_1\right] \pi(\mathbf{x}_2)\,T_2(\mathbf{x}_2, \mathbf{y}_2|\mathbf{y}_1)\,d\mathbf{x}_2$
$= \int \pi(\mathbf{y}_1|\mathbf{x}_2)\,\pi(\mathbf{x}_2)\,T_2(\mathbf{x}_2, \mathbf{y}_2|\mathbf{y}_1)\,d\mathbf{x}_2$.

Using $\pi(\mathbf{y}_1|\mathbf{x}_2)\,\pi(\mathbf{x}_2) = \pi(\mathbf{y}_1)\,\pi(\mathbf{x}_2|\mathbf{y}_1)$ and the invariance of $\pi(\mathbf{x}_2|\mathbf{y}_1)$ under $T_2$,

$\int_{\mathcal{X}} \pi(\mathbf{x})\,T(\mathbf{x}, \mathbf{y})\,d\mathbf{x} = \pi(\mathbf{y}_1)\int \pi(\mathbf{x}_2|\mathbf{y}_1)\,T_2(\mathbf{x}_2, \mathbf{y}_2|\mathbf{y}_1)\,d\mathbf{x}_2 = \pi(\mathbf{y}_1)\,\pi(\mathbf{y}_2|\mathbf{y}_1) = \pi(\mathbf{y})$.

Thus the multiple-block MH chain has invariant distribution $\pi(\mathbf{x})$.

The Gibbs sampler is the special case in which each proposal is the full conditional itself. Taking $q_1(\mathbf{y}_1|\mathbf{x}_1, \mathbf{x}_2) = \pi(\mathbf{y}_1|\mathbf{x}_2)$ gives

$\alpha_1(\mathbf{x}_1, \mathbf{y}_1) = \min\left\{1,\ \frac{\pi(\mathbf{y}_1|\mathbf{x}_2)\,\pi(\mathbf{x}_1|\mathbf{x}_2)}{\pi(\mathbf{x}_1|\mathbf{x}_2)\,\pi(\mathbf{y}_1|\mathbf{x}_2)}\right\} = 1$,

and likewise $q_2(\mathbf{y}_2|\mathbf{y}_1, \mathbf{x}_2) = \pi(\mathbf{y}_2|\mathbf{y}_1)$ gives $\alpha_2(\mathbf{x}_2, \mathbf{y}_2) = 1$: every candidate is accepted with probability 1, so the Gibbs sampler is a multiple-block MH algorithm with invariant distribution $\pi(\mathbf{x})$.

Conditions under which the Gibbs sampler converges to $\pi(\mathbf{x})$ are given by Chan (1993) and Roberts and Smith (1994). In practice the two approaches are freely combined: blocks whose full conditionals are tractable are updated by Gibbs steps, and the remaining blocks by MH steps. Such hybrid samplers are often called "Metropolis within Gibbs" (Müller, 1991).

4.3 Data augmentation

Suppose that sampling from $\pi(\mathbf{x})$ is difficult, but that we can find an auxiliary variable $\mathbf{z} \in \mathcal{Z}$ and a joint density $\pi(\mathbf{x}, \mathbf{z})$ whose marginal is the target:

$\int_{\mathcal{Z}} \pi(\mathbf{x}, \mathbf{z})\,d\mathbf{z} = \pi(\mathbf{x})$.

If we run an MCMC sampler on the augmented space and retain only the $\mathbf{x}$ draws, these are draws from $\pi(\mathbf{x})$. This device is called the data augmentation method (Tanner and Wong, 1987); well-known applications include the tobit model (Chib, 1992) and the probit model (Albert and Chib, 1993). Typically the augmented sampler alternates between the full conditionals $\pi(\mathbf{x}|\mathbf{z})$ and $\pi(\mathbf{z}|\mathbf{x})$, both chosen to be easy to sample even though $\pi(\mathbf{x})$ is not.

The auxiliary variable $\mathbf{z}$ often has a natural interpretation as missing data, with $\mathbf{x}$ the parameters: drawing from $\pi(\mathbf{z}|\mathbf{x})$ imputes the missing data (Rubin, 1987), and drawing from $\pi(\mathbf{x}|\mathbf{z})$ updates the parameters given the completed data. The scheme is thus a stochastic counterpart of the EM algorithm (Dempster et al., 1977), whose E-step takes an expectation with respect to $\pi(\mathbf{z}|\mathbf{x})$ and whose M-step maximizes over $\mathbf{x}$; see Liu (2002) for connections between the two.

A general-purpose instance of data augmentation is the following. Suppose $\pi(\mathbf{x}) \propto p(\mathbf{x})\,l(\mathbf{x})$, where $p(\mathbf{x})$ is easy to sample but the factor $l(\mathbf{x})$ makes direct sampling of $\pi(\mathbf{x})$ difficult. Introduce a scalar auxiliary variable $z$ with joint density

(10)   $\pi(\mathbf{x}, z) \propto I[z < l(\mathbf{x})]\,p(\mathbf{x})$,

whose marginal in $\mathbf{x}$ is $\int_0^{l(\mathbf{x})} p(\mathbf{x})\,dz = p(\mathbf{x})\,l(\mathbf{x}) \propto \pi(\mathbf{x})$, as required. Gibbs sampling of (10) is known as slice sampling: $\pi(z|\mathbf{x})$ is $U(0, l(\mathbf{x}))$, and $\pi(\mathbf{x}|z)$ is $p(\mathbf{x})$ restricted to the slice $\{\mathbf{x} : l(\mathbf{x}) > z\}$. See Damien et al. (1999) for applications to Bayesian non-conjugate and hierarchical models, and Besag and Green (1993), Higdon (1998), Damien and Walker (2001), and Neal (2003) for further developments.
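A minimal sketch of the slice sampler for (10), taking $p = N(0, 1)$ and $l(x) = 1/(1 + x^2)$ as illustrative choices; the draw from $p(x)$ restricted to $\{x : l(x) > z\}$ is done here by simple rejection, which is valid though not the most efficient implementation.

```python
import numpy as np

rng = np.random.default_rng(5)

l = lambda x: 1.0 / (1.0 + x**2)      # awkward factor; target pi(x) ∝ p(x) l(x)

n_iter = 20_000
x = 0.0
draws = np.empty(n_iter)
for t in range(n_iter):
    z = rng.uniform(0.0, l(x))        # z | x ~ U(0, l(x))
    while True:                       # x | z: p = N(0,1) restricted to l(x) > z
        x = rng.standard_normal()
        if l(x) > z:
            break
    draws[t] = x

print(draws[1000:].mean())            # approximately 0 by symmetry
```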

5 Implementation

5.1 Convergence diagnostics

The theory above guarantees that an MCMC chain $(\mathbf{x}^{(0)}, \mathbf{x}^{(1)}, \ldots)$ converges to the target distribution, but not how many iterations are needed in practice, so the length of the burn-in period must be assessed empirically. Many diagnostics for monitoring convergence have been proposed; comparative reviews are given by Cowles and Carlin (1996) and Mengersen et al. (1999). Widely used diagnostics include those of Heidelberger and Welch (1983), Gelman and Rubin (1992), Geweke (1992), and Raftery and Lewis (1992); many are implemented in the R package CODA.

The diagnostics differ in whether they compare several chains started from dispersed initial values (the multiple-chain approach, as in Gelman and Rubin (1992)) or examine a single long run (the single-chain approach); each has advantages, and in practice it is common to use both.

5.2 Accuracy of estimates

Let $(\mathbf{x}^{(0)}, \mathbf{x}^{(1)}, \ldots, \mathbf{x}^{(m+n)})$ be MCMC draws, with the first $m$ discarded as burn-in. The expectation $E[h(\mathbf{x})]$ is estimated by

(11)   $\hat{I} = \frac{1}{n}\sum_{i=1}^{n} h(\mathbf{x}^{(m+i)})$,

which converges to $E[h(\mathbf{x})]$ as $n \to \infty$ under suitable conditions (Tierney, 1994). Because consecutive MCMC draws are correlated, the variance of $\hat{I}$ is

$\mathrm{Var}(\hat{I}) = \frac{\sigma^2}{n}\left\{1 + 2\sum_{j=1}^{n-1}\left(1 - \frac{j}{n}\right)\rho_j\right\} \approx \frac{\sigma^2}{n}\left(1 + 2\sum_{j=1}^{\infty}\rho_j\right)$,

where $\sigma^2 = \mathrm{Var}[h(\mathbf{x})]$ and $\rho_j$ is the lag-$j$ autocorrelation between $h(\mathbf{x}^{(t)})$ and $h(\mathbf{x}^{(t+j)})$. In MCMC the $\rho_j$ are typically positive, so (11) is less accurate than an average of the same number of independent draws. The factor

$1 + 2\sum_{j=1}^{\infty}\rho_j$

is called the inefficiency factor: it is the ratio of $\mathrm{Var}(\hat{I})$ to the variance $\sigma^2/n$ that independent sampling would attain, and so measures how many correlated draws are worth one independent draw. In practice it is estimated by truncating the sum at some lag $L$, as $1 + 2\sum_{j=1}^{L}\rho_j$ with sample autocorrelations.
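A sketch of how the inefficiency factor and the resulting Monte Carlo standard error of (11) might be estimated from a chain; the plain truncation at lag L, with no taper or windowing, is a simplification of our own.

```python
import numpy as np

def inefficiency_factor(h_draws, L=50):
    """Estimate 1 + 2 * sum_{j=1}^{L} rho_j from draws of h(x)."""
    x = np.asarray(h_draws, dtype=float)
    x = x - x.mean()
    n = len(x)
    acov = np.array([np.dot(x[: n - j], x[j:]) / n for j in range(L + 1)])
    rho = acov[1:] / acov[0]           # sample autocorrelations rho_1, ..., rho_L
    return 1.0 + 2.0 * np.sum(rho)

def mc_standard_error(h_draws, L=50):
    x = np.asarray(h_draws, dtype=float)
    # Var(I_hat) ≈ (sigma^2 / n) * inefficiency factor
    return np.sqrt(x.var() / len(x) * inefficiency_factor(x, L))
```

For independent draws the inefficiency factor is close to 1; values well above 1 indicate that the chain mixes slowly, which motivates the techniques of the next subsection.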

5.3 Improving mixing

A chain whose autocorrelations decay slowly is said to mix poorly. Poor mixing arises, for example, from MH proposals with badly tuned step sizes and from Gibbs samplers that update strongly correlated components one at a time. Several general remedies are available.

One is blocking: components that are highly correlated should be updated jointly in a single block, as in the multiple-block MH algorithm of Section 4.2. A second is collapsing, that is, integrating components out analytically. Liu (1994) considers $\mathbf{x} = (\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3)$: when $(\mathbf{x}_1, \mathbf{x}_2)$ are strongly dependent, one can work with the marginal of $(\mathbf{x}_1, \mathbf{x}_2)$, alternating draws from $\pi(\mathbf{x}_1|\mathbf{x}_2)$ and $\pi(\mathbf{x}_2|\mathbf{x}_1)$, and then draw $\mathbf{x}_3 \sim \pi(\mathbf{x}_3|\mathbf{x}_1, \mathbf{x}_2)$; this is the collapsed Gibbs sampler. A third is reparameterization, since the parameterization of a model affects the correlation between blocks; see Gelfand et al. (1995) and Roberts and Sahu (1997). Parameter-expansion and conditional/marginal augmentation schemes, developed by analogy with accelerated EM algorithms, are studied by Liu and Wu (1999), Meng and van Dyk (1999), and van Dyk and Meng (2001); related approaches to accelerating MCMC include Neal (1996), Liu and Sabatti (2000), and Liu (2003).

For multimodal targets, parallel tempering (Geyer, 1991) and simulated tempering (Geyer and Thompson, 1995) run chains on flattened versions of the target so that they can move between modes. The multiple-try method of Liu et al. (2000) and the multiple-point method of Qin and Liu (2001) generate several candidates at each step. Finally, sequential Monte Carlo samplers (Moral et al., 2006) propagate a whole population of draws through a sequence of distributions.

6 Applications

This section illustrates MCMC with two Bayesian applications (for Bayesian inference in general, see, e.g., Berger (1980)). In Bayesian analysis the posterior density is typically known only up to a normalizing constant, which is exactly the situation in which MCMC excels.

6.1 A logit model with random effects

The first example is the analysis of the seed germination data of Crowder (1978) by Breslow and Clayton (1993). The data consist of 21 plates; on plate $i$, $y_i$ out of $n_i$ seeds germinated. Following Breslow and Clayton (1993), consider the random-effects logit model

(12)   $y_i \sim Bi(n_i, p_i), \quad \log\frac{p_i}{1 - p_i} = \mathbf{x}_i'\boldsymbol{\beta} + b_i, \quad b_i \sim N(0, \sigma^2)$,

where $\mathbf{x}_i$ is a covariate vector, $\boldsymbol{\beta}$ the regression coefficients, and $b_i$ a plate-specific random effect. The priors are $\boldsymbol{\beta} \sim N(\boldsymbol{\beta}_0, B_0)$ and $\sigma^2 \sim IG(n_0/2, s_0/2)$, where $IG(a, b)$ denotes the inverse gamma distribution with density proportional to $(1/x)^{a+1}\exp(-b/x)$.

The unknowns $(\boldsymbol{\beta}, \sigma^2, b_1, \ldots, b_{21})$ are sampled by MCMC. The full conditional of $\sigma^2$ is inverse gamma,

$\sigma^2 \mid \boldsymbol{\beta}, \{b_i\} \sim IG\left(\frac{21 + n_0}{2},\ \frac{\sum_{i=1}^{21} b_i^2 + s_0}{2}\right)$,

and is sampled directly. The full conditional of $\boldsymbol{\beta}$,

$\pi(\boldsymbol{\beta} \mid \sigma^2, \{b_i\}) \propto \prod_{i=1}^{21} p_i^{y_i}(1 - p_i)^{n_i - y_i}\,\exp\left\{-\frac{1}{2}(\boldsymbol{\beta} - \boldsymbol{\beta}_0)'B_0^{-1}(\boldsymbol{\beta} - \boldsymbol{\beta}_0)\right\}$,

is not of standard form, so $\boldsymbol{\beta}$ is updated by an MH step. A good proposal can be built from the normal approximation

(13)   $\tilde{y}_i = \mathbf{x}_i'\boldsymbol{\beta} + b_i + \epsilon_i, \quad \epsilon_i \sim N(0, \sigma_i^2)$,


where, as in the iterative weighted least squares treatment of generalized linear models (McCullagh and Nelder, 1989), the working response and its variance are evaluated at the current values $(\hat{\boldsymbol{\beta}}, b_i)$:

$\tilde{y}_i = \mathbf{x}_i'\hat{\boldsymbol{\beta}} + b_i + \frac{y_i - n_i\hat{p}_i}{n_i\hat{p}_i(1 - \hat{p}_i)}, \qquad \sigma_i^2 = \frac{1}{n_i\hat{p}_i(1 - \hat{p}_i)}$.

Combining the normal approximation (13) with the normal prior gives an approximate conditional posterior $N(\bar{\boldsymbol{\beta}}, V_\beta)$ with

$V_\beta^{-1} = \sum_{i=1}^{21}\frac{\mathbf{x}_i\mathbf{x}_i'}{\sigma_i^2} + B_0^{-1}, \qquad \bar{\boldsymbol{\beta}} = V_\beta\left\{\sum_{i=1}^{21}\frac{\mathbf{x}_i(\tilde{y}_i - b_i)}{\sigma_i^2} + B_0^{-1}\boldsymbol{\beta}_0\right\}$,

and the MH candidate for $\boldsymbol{\beta}$ is drawn from a multivariate $t$ distribution $T(\bar{\boldsymbol{\beta}}, V_\beta)$ with these location and scale parameters.

The full conditional of each random effect $b_i$,

$\pi(b_i \mid \boldsymbol{\beta}, \sigma^2) \propto p_i^{y_i}(1 - p_i)^{n_i - y_i}\,\exp\left(-\frac{b_i^2}{2\sigma^2}\right)$,

is likewise nonstandard, and $b_i$ is updated by an MH step with a $t$ proposal $t(\hat{b}_i, v_i^2)$, where the same linearization gives

$v_i^2 = \frac{\sigma^2\sigma_i^2}{\sigma^2 + \sigma_i^2}, \qquad \hat{b}_i = \frac{v_i^2(\tilde{y}_i - \mathbf{x}_i'\boldsymbol{\beta})}{\sigma_i^2}$.
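To make the $\boldsymbol{\beta}$ step concrete, here is a sketch of our own (not the chapter's code) of the MH update with the working-observation proposal. All function and variable names are our choices, SciPy's multivariate_t is an assumed convenience, and the proposal moments are recomputed at the candidate so the reverse proposal density $q_1(\mathbf{x}_1|\mathbf{y}_1, \mathbf{x}_2)$ can be evaluated.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)

def log_target(beta, b, y, n, X, beta0, B0_inv):
    # log pi(beta | sigma^2, {b_i}) up to a constant
    eta = X @ beta + b
    p = 1.0 / (1.0 + np.exp(-eta))
    loglik = np.sum(y * np.log(p) + (n - y) * np.log(1.0 - p))
    d = beta - beta0
    return loglik - 0.5 * d @ B0_inv @ d

def proposal_moments(beta, b, y, n, X, beta0, B0_inv):
    # Working observations (13) evaluated at the current (beta, b).
    eta = X @ beta + b
    p = 1.0 / (1.0 + np.exp(-eta))
    var = 1.0 / (n * p * (1.0 - p))            # sigma_i^2
    y_tilde = eta + (y - n * p) * var          # x_i'beta + b_i + (y_i - n_i p_i)/(n_i p_i (1 - p_i))
    V = np.linalg.inv((X.T / var) @ X + B0_inv)
    mean = V @ ((X.T / var) @ (y_tilde - b) + B0_inv @ beta0)
    return mean, V

def mh_update_beta(beta, b, y, n, X, beta0, B0_inv, nu=20):
    m_x, V_x = proposal_moments(beta, b, y, n, X, beta0, B0_inv)
    cand = stats.multivariate_t.rvs(loc=m_x, shape=V_x, df=nu, random_state=rng)
    m_y, V_y = proposal_moments(cand, b, y, n, X, beta0, B0_inv)
    log_alpha = (log_target(cand, b, y, n, X, beta0, B0_inv)
                 - log_target(beta, b, y, n, X, beta0, B0_inv)
                 + stats.multivariate_t.logpdf(beta, loc=m_y, shape=V_y, df=nu)
                 - stats.multivariate_t.logpdf(cand, loc=m_x, shape=V_x, df=nu))
    return cand if np.log(rng.uniform()) < log_alpha else beta
```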

The model was fitted to the data of Breslow and Clayton (1993), with (12) containing four regression coefficients $\boldsymbol{\beta} = (\beta_0, \ldots, \beta_3)'$ including an intercept. The hyperparameters were $\boldsymbol{\beta}_0 = \mathbf{0}$, $B_0 = 100I$, $n_0 = 1$, and $s_0 = 0.01$, and the $t$ proposals used $\nu = 20$ degrees of freedom. Figure 1 shows the sample paths and sample autocorrelation functions of the draws, and Table 1 reports posterior means and standard deviations based on 10000 draws after a burn-in of 5000; the estimates are close to those reported by Breslow and Clayton (1993).

As an alternative, consider the reparameterized model

$y_i \sim Bi(n_i, p_i), \quad \log\frac{p_i}{1 - p_i} = \eta_i, \quad \eta_i \sim N(\mathbf{x}_i'\boldsymbol{\beta}, \sigma^2)$,

which moves the fixed effects $\mathbf{x}_i'\boldsymbol{\beta}$ into the distribution of the latent variable $\eta_i$.

[Figure 1: sample paths (15000 iterations) and sample autocorrelation functions (up to lag 25) of the coefficients $\beta_0, \ldots, \beta_3$ under model (12) (upper panels) and under the reparameterized model (lower panels).]

Table 1: Posterior means and standard deviations for the seed germination data.

               model (12)          reparameterized model
             mean      s.d.        mean      s.d.
  $\beta_0$  -0.546    0.177       -0.542    0.190
  $\beta_1$   0.091    0.293        0.146    0.308
  $\beta_2$   1.333    0.250        1.339    0.270
  $\beta_3$  -0.803    0.399       -0.825    0.430
  $\sigma$    0.242    0.125        0.313    0.121

In the reparameterized model the coefficients no longer require an MH step: given $\sigma^2$ and $\{\eta_i\}$, the full conditional of $\boldsymbol{\beta}$ is the normal distribution $N(\bar{\boldsymbol{\beta}}, V_\beta)$ with

$V_\beta^{-1} = \sum_{i=1}^{21}\frac{\mathbf{x}_i\mathbf{x}_i'}{\sigma^2} + B_0^{-1}, \qquad \bar{\boldsymbol{\beta}} = V_\beta\left(\sum_{i=1}^{21}\frac{\mathbf{x}_i\eta_i}{\sigma^2} + B_0^{-1}\boldsymbol{\beta}_0\right)$,

while each $\eta_i$ is updated by an MH step analogous to that for $b_i$ above. The resulting sample paths and autocorrelations appear in the lower panels of Figure 1 and the corresponding estimates in Table 1; the two parameterizations give similar posterior estimates.

6.2 A Poisson hidden Markov model

The second example is the count data analysed by Leroux and Puterman (1992): the number of movements of a fetal lamb in each of $T = 240$ consecutive 5-second intervals. Let $y_t$ denote the count in interval $t$ and assume

$y_t \sim Po(\lambda_{s_t}) \quad (t = 1, \ldots, T)$,

where $Po(\lambda)$ denotes the Poisson distribution with mean $\lambda$ and the hidden state $s_t \in \{1, 2\}$ indicates which of two intensity levels $\lambda_1, \lambda_2$ governs interval $t$. Following Leroux and Puterman (1992), $\{s_t\}$ is modelled as a two-state Markov chain with transition matrix

$T = \begin{pmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{pmatrix}$

and initial distribution $\pi_0 = (\pi_{01}, \pi_{02})$, from which $s_1$ is drawn. The unknowns $(\{\lambda_i\}, \{p_{ii}\}, \{s_t\})$ are sampled by the Gibbs sampler.

With independent gamma priors $\lambda_i \sim Ga(a_i, b_i)$, where $Ga(a, b)$ denotes the gamma distribution with density proportional to $x^{a-1}\exp(-bx)$, the full conditionals of the intensities are

$\lambda_i \mid \{p_{ii}\}, \{s_t\} \sim Ga\left(a_i + \sum_{t \in S_i} y_t,\ b_i + n_i\right) \quad (i = 1, 2)$,

where $S_i = \{t : s_t = i\}$ and $n_i$ is the number of elements of $S_i$. With beta priors $p_{ii} \sim Be(\delta_i, \gamma_i)$, the full conditionals of the transition probabilities are

$p_{ii} \mid \{\lambda_i\}, \{s_t\} \sim Be(\delta_i + n_{ii},\ \gamma_i + n_{ij}) \quad (i = 1, 2,\ j \neq i)$,

where $n_{ij}$ is the number of transitions from state $i$ to state $j$ in $\{s_t\}$. Finally, each hidden state $s_t$ has full conditional

$\pi(s_t = i \mid \{\lambda_i\}, \{p_{ii}\}, \{s_{t'}\}_{t' \neq t}) \propto \begin{cases} f(y_t|\lambda_i)\,\pi_{0i}\,p_{i,s_2} & t = 1, \\ f(y_t|\lambda_i)\,p_{s_{t-1},i}\,p_{i,s_{t+1}} & 1 < t < T, \\ f(y_t|\lambda_i)\,p_{s_{t-1},i} & t = T, \end{cases}$

where $f(y|\lambda)$ is the Poisson probability function. The simplest scheme, the single-move sampler, updates the states $s_t$ one at a time from these full conditionals; a code sketch follows.
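Here is a sketch of the single-move sampler, with states coded 0/1 and flat-ish priors, starting values, and hyperparameters as placeholder choices of ours.

```python
import numpy as np

rng = np.random.default_rng(7)

def single_move_gibbs(y, n_iter, a=(1.0, 1.0), b=(1.0, 1.0),
                      delta=(1.0, 1.0), gamma=(1.0, 1.0), pi0=(0.5, 0.5)):
    y = np.asarray(y)
    T = len(y)
    s = rng.integers(0, 2, size=T)               # states coded 0/1
    lam = np.array([0.5, 2.0])                   # placeholder starting intensities
    P = np.full((2, 2), 0.5)
    out = np.empty((n_iter, 4))                  # lam1, lam2, p11, p22
    for it in range(n_iter):
        # lambda_i | s : Ga(a_i + sum_{t in S_i} y_t, b_i + n_i)
        for i in range(2):
            idx = s == i
            lam[i] = rng.gamma(a[i] + y[idx].sum(), 1.0 / (b[i] + idx.sum()))
        # p_ii | s : Be(delta_i + n_ii, gamma_i + n_ij)
        for i in range(2):
            n_ii = np.sum((s[:-1] == i) & (s[1:] == i))
            n_ij = np.sum((s[:-1] == i) & (s[1:] != i))
            P[i, i] = rng.beta(delta[i] + n_ii, gamma[i] + n_ij)
            P[i, 1 - i] = 1.0 - P[i, i]
        # s_t one at a time from its full conditional
        for t in range(T):
            logp = y[t] * np.log(lam) - lam      # Poisson log f(y_t | lam_i), no constant
            if t == 0:
                logp += np.log(pi0) + np.log(P[:, s[1]])
            elif t < T - 1:
                logp += np.log(P[s[t - 1], :]) + np.log(P[:, s[t + 1]])
            else:
                logp += np.log(P[s[t - 1], :])
            prob = np.exp(logp - logp.max())
            s[t] = rng.choice(2, p=prob / prob.sum())
        out[it] = lam[0], lam[1], P[0, 0], P[1, 1]
    return out
```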


Chib (1996) proposed instead sampling the whole path $\{s_t\}$ in one block from its joint full conditional, using a forward filtering-backward sampling recursion; this is the multi-move sampler, designed to reduce the autocorrelation induced by updating the strongly dependent states one at a time. Figure 2 shows the sample paths and sample autocorrelation functions of $\lambda_1$ and $\lambda_2$ under the two schemes.

[Figure 2: sample paths (10000 iterations) and sample autocorrelation functions (up to lag 50) of $\lambda_1$ and $\lambda_2$ under the two sampling schemes for $\{s_t\}$.]

Table 2 reports the posterior means, standard deviations, and inefficiency factors (IF) of the parameters under the two schemes, computed as in Chib (1996) from 10000 draws after a burn-in of 5000, with the IF estimated from sample autocorrelations up to lag 50.

Table 2: Estimation results for the fetal lamb data.

                single-move                  multi-move
              mean     s.d.     IF        mean     s.d.     IF
  $\lambda_1$  0.220    0.050   14.346     0.220    0.048   19.289
  $\lambda_2$  2.282    0.769   13.445     2.273    0.744   24.257
  $p_{11}$     0.968    0.024   16.195     0.968    0.023   20.295
  $p_{22}$     0.672    0.154    2.6689    0.671    0.148    4.7609

[1] 大森裕浩 (2005). マルコフ連鎖モンテカルロ法の基礎, 計算統計 II (統計科学のフロンティア 12), 岩波書店, 3-106.
[2] 大森裕浩 (2001). マルコフ連鎖モンテカルロ法の最近の展開, 日本統計学会誌, 31, 305-344.
[3] (2003). MCMC.
[4] (2005).

[5] Albert, J. and Chib, S. (1993). Bayesian analysis of binary and polychotomous response data, Journal of the American Statistical Association 88, 669-679.
[6] Berger, J.O. (1980). Statistical Decision Theory and Bayesian Analysis (2nd ed.). Springer, New York.
[7] Besag, J. and Green, P. (1993). Spatial statistics and Bayesian computation, Journal of the Royal Statistical Society B55, 25-37.
[8] Breslow, N.E. and Clayton, D.G. (1993). Approximate inference in generalized linear mixed models, Journal of the American Statistical Association 88, 9-25.
[9] Chan, K.S. (1993). Asymptotic behavior of the Gibbs sampler, Journal of the American Statistical Association 88, 320-326.
[10] Chib, S. (1992). Bayes regression for the tobit censored regression model, Journal of Econometrics 51, 79-99.
[11] Chib, S. (1996). Calculating posterior distributions and modal estimates in Markov mixture models, Journal of Econometrics 75, 79-97.
[12] Chib, S. and Greenberg, E. (1995). Understanding the Metropolis-Hastings algorithm, American Statistician 49, 327-335.
[13] Christensen, O.F., Møller, J., and Waagepetersen, R.P. (2001). Geometric ergodicity of Metropolis-Hastings algorithms for conditional simulation in generalized linear mixed models, Methodology and Computing in Applied Probability 3, 309-327.
[14] Christensen, O.F. and Waagepetersen, R. (2002). Bayesian prediction of spatial count data using generalized linear mixed models, Biometrics 58, 280-286.
[15] Cowles, M.K. and Carlin, B.P. (1996). Markov chain Monte Carlo convergence diagnostics: A comparative review, Journal of the American Statistical Association 91, 883-904.
[16] Crowder, M.J. (1978). Beta-binomial ANOVA for proportions, Applied Statistics 27, 34-37.
[17] Damien, P., Wakefield, J., and Walker, S.G. (1999). Gibbs sampling for Bayesian non-conjugate and hierarchical models by using auxiliary variables, Journal of the Royal Statistical Society B61, 331-344.
[18] Damien, P. and Walker, S.G. (2001). Sampling truncated normal, beta, and gamma densities, Journal of Computational and Graphical Statistics 10, 206-215.
[19] Dempster, A.P., Laird, N.M., and Rubin, D.B. (1977). Maximum likelihood from incomplete data via the EM algorithm, Journal of the Royal Statistical Society B39, 1-38.
[20] Devroye, L. (1986). Non-Uniform Random Variate Generation. Springer-Verlag, New York.
[21] Dongarra, J. and Sullivan, F. (2000). Guest editors' introduction: The top 10 algorithms, Computing in Science and Engineering 2, 22-23.
[22] Evans, M. and Swartz, T. (2000). Approximating Integrals via Monte Carlo and Deterministic Methods. Oxford University Press, Oxford.
[23] Gamerman, D. and Lopes, H. (2006). Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference (2nd ed.). Chapman & Hall/CRC, London.
[24] Gelfand, A.E., Sahu, S.K., and Carlin, B.P. (1995). Efficient parametrisations for normal linear mixed models, Biometrika 82, 479-488.
[25] Gelfand, A.E. and Smith, A.F.M. (1990). Sampling-based approaches to calculating marginal densities, Journal of the American Statistical Association 85, 398-409.
[26] Gelman, A. and Rubin, D.B. (1992). Inference from iterative simulation using multiple sequences (with discussion), Statistical Science 7, 457-511.
[27] Geman, S. and Geman, D. (1984). Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images, IEEE Transactions on Pattern Analysis and Machine Intelligence 6, 721-741.
[28] Gentle, J.E. (2003). Random Number Generation and Monte Carlo Methods. Springer, New York.
[29] Geweke, J. (1989). Bayesian inference in econometric models using Monte Carlo integration, Econometrica 57, 1317-1340.
[30] Geweke, J. (1992). Evaluating the accuracy of sampling-based approaches to the calculation of posterior moments, in J.M. Bernardo et al. (eds.), Bayesian Statistics 4, 169-193, Oxford University Press, Oxford.
[31] Geyer, C.J. (1991). Markov chain Monte Carlo maximum likelihood, in E. Keramidas (ed.), Computing Science and Statistics: Proceedings of the 23rd Symposium on the Interface, Interface Foundation, Fairfax, 156-163.
[32] Geyer, C.J. and Thompson, E. (1995). Annealing Markov chain Monte Carlo with applications to ancestral inference, Journal of the American Statistical Association 90, 909-920.
[33] Hastings, W.K. (1970). Monte Carlo sampling methods using Markov chains and their applications, Biometrika 57, 97-109.
[34] Häggström, O. (2002). Finite Markov Chains and Algorithmic Applications. Cambridge University Press, Cambridge.
[35] Heidelberger, P. and Welch, P.D. (1983). Simulation run length control in the presence of an initial transient, Operations Research 31, 1109-1144.
[36] Higdon, D. (1998). Auxiliary variable methods for Markov chain Monte Carlo with applications, Journal of the American Statistical Association 93, 585-595.
[37] Karlin, S. and Taylor, H.M. (1975). A First Course in Stochastic Processes (2nd ed.). Academic Press, New York.
[38] Leroux, B.G. and Puterman, M.L. (1992). Maximum-penalized likelihood estimation for independent and Markov-dependent mixture models, Biometrics 48, 545-558.
[39] Liu, C. (2002). An example of algorithm mining: Covariance adjustment to accelerate EM and Gibbs, in J. Huang and H. Zhang (eds.), Development of Modern Statistics and Related Topics, 74-88, World Scientific, New Jersey.
[40] Liu, C. (2003). Alternating subspace-spanning resampling to accelerate Markov chain Monte Carlo simulation, Journal of the American Statistical Association 98, 110-117.
[41] Liu, J.S. (1994). The collapsed Gibbs sampler in Bayesian computations with applications to a gene regulation problem, Journal of the American Statistical Association 89, 958-966.
[42] Liu, J.S. (2001). Monte Carlo Strategies in Scientific Computing. Springer, New York.
[43] Liu, J.S., Liang, F., and Wong, W.H. (2000). The use of multiple-try method and local optimization in Metropolis sampling, Journal of the American Statistical Association 95, 121-134.
[44] Liu, J.S. and Sabatti, C. (2000). Generalized Gibbs sampler and multigrid Monte Carlo for Bayesian computation, Biometrika 87, 353-369.
[45] Liu, J.S., Wong, W.H., and Kong, A. (1995). Covariance structure and convergence rate of the Gibbs sampler with various scans, Journal of the Royal Statistical Society B57, 157-169.
[46] Liu, J.S. and Wu, Y.N. (1999). Parameter expansion for data augmentation, Journal of the American Statistical Association 94, 1264-1274.
[47] McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models (2nd ed.). Chapman & Hall, London.
[48] Meng, X.-L. and van Dyk, D.A. (1999). Seeking efficient data augmentation schemes via conditional and marginal augmentation, Biometrika 86, 301-320.
[49] Mengersen, K.L., Robert, C.P., and Guihenneuc-Jouyaux, C. (1999). MCMC convergence diagnostics: A review, in J.M. Bernardo et al. (eds.), Bayesian Statistics 6, 415-440, Clarendon Press, Oxford.
[50] Metropolis, N., Rosenbluth, A.W., Rosenbluth, M.N., Teller, A.H., and Teller, E. (1953). Equation of state calculations by fast computing machines, Journal of Chemical Physics 21, 1087-1091.
[51] Meyn, S.P. and Tweedie, R.L. (1993). Markov Chains and Stochastic Stability. Springer-Verlag, London.
[52] Moral, P.D., Doucet, A., and Jasra, A. (2006). Sequential Monte Carlo samplers, Journal of the Royal Statistical Society B68, 411-436.
[53] Müller, P. (1991). A generic approach to posterior integration and Gibbs sampling, Technical Report 91-09, Institute of Statistics and Decision Sciences, Duke University.
[54] Neal, R.M. (1996). Bayesian Learning for Neural Networks. Lecture Notes in Statistics 118, Springer-Verlag, New York.
[55] Neal, R.M. (2003). Slice sampling, Annals of Statistics 31, 705-767.
[56] Nummelin, E. (1984). General Irreducible Markov Chains and Non-negative Operators. Cambridge University Press, Cambridge.
[57] Qin, Z. and Liu, J.S. (2001). Multi-point Metropolis method with application to hybrid Monte Carlo, Journal of Computational Physics 172, 827-840.
[58] Raftery, A.E. and Lewis, S. (1992). How many iterations in the Gibbs sampler? in J.M. Bernardo et al. (eds.), Bayesian Statistics 4, 763-773, Oxford University Press, Oxford.
[59] Ripley, B.D. (1987). Stochastic Simulation. Wiley, New York.
[60] Robert, C.P. and Casella, G. (2004). Monte Carlo Statistical Methods (2nd ed.). Springer-Verlag, New York.
[61] Roberts, G.O., Gelman, A., and Gilks, W.R. (1997). Weak convergence and optimal scaling of random walk Metropolis algorithms, Annals of Applied Probability 7, 110-120.
[62] Roberts, G.O. and Rosenthal, J.S. (1998). Optimal scaling of discrete approximations to Langevin diffusions, Journal of the Royal Statistical Society B60, 255-268.
[63] Roberts, G.O. and Rosenthal, J.S. (2001). Optimal scaling for various Metropolis-Hastings algorithms, Statistical Science 16, 351-367.
[64] Roberts, G.O. and Sahu, S.K. (1997). Updating schemes, correlation structure, blocking and parameterization for the Gibbs sampler, Journal of the Royal Statistical Society B59, 291-317.
[65] Roberts, G.O. and Smith, A.F.M. (1994). Simple conditions for the convergence of the Gibbs sampler and Metropolis-Hastings algorithms, Stochastic Processes and Their Applications 49, 207-216.
[66] Ross, S.M. (1995). Stochastic Processes (2nd ed.). Wiley, New York.
[67] Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. Wiley, New York.
[68] Rubinstein, R.Y. (1981). Simulation and the Monte Carlo Method. Wiley, New York.
[69] Tanner, M.A. and Wong, W.H. (1987). The calculation of posterior distributions by data augmentation, Journal of the American Statistical Association 82, 528-549.
[70] Tierney, L. (1994). Markov chains for exploring posterior distributions (with discussion), Annals of Statistics 22, 1701-1762.
[71] van Dyk, D.A. and Meng, X.-L. (2001). The art of data augmentation (with discussions), Journal of Computational and Graphical Statistics 10, 1-111.
    289