
Bayesian Statistics:

An Introduction

PETER M. LEE
Formerly Provost of Wentworth College, University of York, England

Fourth Edition

John Wiley & Sons, Ltd

Appendix D

Answers to Exercises

D.1 Exercises on Chapter 1

1. Considering trumps and non-trumps separately, the required probability is
$$2\binom{3}{3}\binom{23}{10}\bigg/\binom{26}{13} = \frac{11}{50}.$$

The probability of a 2 : 1 split is 39/50, and the conditional probability that the king is the odd one is 1/3, so the probability that one player has the king and the other the remaining two is 13/50.
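The 11/50 and 13/50 values can be checked directly; this is a sketch using exact rational arithmetic:

```python
from fractions import Fraction
from math import comb

# Probability that the three outstanding trumps split 3:0 (either
# opponent may hold all three): the rest of that opponent's 13 cards
# are 10 of the 23 non-trumps.
p_30 = Fraction(2 * comb(3, 3) * comb(23, 10), comb(26, 13))

# A 2:1 split is the complement; the king is the singleton with
# conditional probability 1/3.
p_21 = 1 - p_30
p_king_alone = p_21 * Fraction(1, 3)
```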

2. (a) If P(A) = 0 or P(A) = 1.

(b) Take A as odd red die, B as odd blue die and C as odd sum.

(c) Take C as {HHH, THH, THT, TTH}.

3. (a) P(homozygous) = 1/3; P(heterozygous) = 2/3.

(b) By Bayes' Theorem
$$P(BB \mid 7\text{ black}) = \frac{(1/3)(1)}{(1/3)(1) + (2/3)(1/2^7)} = \frac{64}{65}.$$

(c) Similarly we see by induction (with the case $n = 0$ given by part (a)) that $P(BB \mid \text{first } n \text{ black}) = 2^{n-1}/(2^{n-1}+1)$ since
$$P(BB \mid \text{first } n+1 \text{ black}) = \frac{\{2^{n-1}/(2^{n-1}+1)\}(1)}{\{2^{n-1}/(2^{n-1}+1)\}(1) + \{1/(2^{n-1}+1)\}(1/2)} = \frac{2^n}{2^n+1}.$$
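Parts (b) and (c) of Exercise 3 can be checked numerically; a sketch with exact probabilities:

```python
from fractions import Fraction

def p_BB(n):
    """Posterior P(BB | first n draws black), starting from P(BB) = 1/3."""
    prior_bb, prior_bw = Fraction(1, 3), Fraction(2, 3)
    like_bb, like_bw = 1, Fraction(1, 2) ** n  # BW urn draws black w.p. 1/2
    num = prior_bb * like_bb
    return num / (num + prior_bw * like_bw)
```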

4. Use $P(GG) = P(M)P(GG\mid M) + P(D)P(GG\mid D)$ to get
$$P(M) = \frac{P(GG) - p^2}{p(1-p)}.$$

5. $p(3) = p(4) = 1/36$, $p(5) = p(6) = 2/36$, $p(7) = \cdots = p(14) = 3/36$, $p(15) = p(16) = 2/36$, $p(17) = p(18) = 1/36$. As for the distribution function,
$$F(x) = \begin{cases} 0 & x < 3;\\ 1/36 & 3 \leq x < 4;\\ 2/36 & 4 \leq x < 5;\\ 4/36 & 5 \leq x < 6;\\ 6/36 & 6 \leq x < 7;\\ (3[x]-12)/36 & n \leq x < n+1 \text{ where } 7 \leq n < 15;\\ 32/36 & 15 \leq x < 16;\\ 34/36 & 16 \leq x < 17;\\ 35/36 & 17 \leq x < 18;\\ 1 & x \geq 18 \end{cases}$$
where $[x]$ denotes the integer part of $x$.
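The probabilities above are consistent with the total $x + 2y$ for two fair dice (an assumption, since the exercise statement is not reproduced here); enumerating the 36 outcomes confirms both the pmf and the distribution function:

```python
from collections import Counter

# Hypothetical model matching the stated probabilities: w = x + 2y
# for two fair dice x and y.
pmf = Counter(x + 2*y for x in range(1, 7) for y in range(1, 7))

def F(x):
    """Distribution function P(w <= x) for w as above."""
    return sum(c for w, c in pmf.items() if w <= x) / 36
```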

6. $P(k=0) = (1-\pi)^n = (1-\lambda/n)^n \to e^{-\lambda}$. More generally
$$p(k) = \binom{n}{k}\pi^k(1-\pi)^{n-k} = \frac{\lambda^k}{k!}\left(1-\frac{\lambda}{n}\right)^{n}\,\frac{n}{n}\,\frac{n-1}{n}\cdots\frac{n-k+1}{n}\,\left(1-\frac{\lambda}{n}\right)^{-k} \to \frac{\lambda^k}{k!}\exp(-\lambda).$$
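The Poisson limit can be checked numerically; a sketch:

```python
from math import comb, exp, factorial

def binom_pmf(k, n, pi):
    # Exact binomial probability of k successes in n trials.
    return comb(n, k) * pi**k * (1 - pi)**(n - k)

def poisson_pmf(k, lam):
    # Limiting Poisson probability.
    return lam**k * exp(-lam) / factorial(k)
```

For fixed $\lambda = n\pi$ the binomial pmf approaches the Poisson pmf as $n$ grows.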

7. (a) $P(\tilde k = 0) = P(\tilde m = 0,\ \tilde n = 0) = P(\tilde m = 0)P(\tilde n = 0) = e^{-\lambda}e^{-\mu} = e^{-(\lambda+\mu)}$,
$$P(\tilde k = 1) = P(\{\tilde m = 1, \tilde n = 0\}\text{ or }\{\tilde m = 0, \tilde n = 1\}) = \lambda e^{-(\lambda+\mu)} + \mu e^{-(\lambda+\mu)} = (\lambda+\mu)e^{-(\lambda+\mu)}.$$

(b) More generally
$$P(\tilde k = k) = \sum_{m=0}^{k}P(\tilde m = m,\ \tilde n = k-m) = \sum_{m=0}^{k}P(\tilde m = m)P(\tilde n = k-m) = \sum_{m=0}^{k}\frac{\lambda^m}{m!}\exp(-\lambda)\,\frac{\mu^{k-m}}{(k-m)!}\exp(-\mu)$$
$$= \frac{1}{k!}\exp(-(\lambda+\mu))\sum_{m=0}^{k}\binom{k}{m}\lambda^m\mu^{k-m} = \frac{(\lambda+\mu)^k}{k!}\exp(-(\lambda+\mu)).$$

(c) By definition of conditional probability
$$P(\tilde m = m \mid \tilde k = k) = \frac{P(\tilde m = m,\ \tilde k = k)}{P(\tilde k = k)} = \frac{P(\tilde m = m,\ \tilde n = k-m)}{P(\tilde k = k)} = \frac{\dfrac{\lambda^m}{m!}\exp(-\lambda)\,\dfrac{\mu^{k-m}}{(k-m)!}\exp(-\mu)}{\dfrac{(\lambda+\mu)^k}{k!}\exp(-(\lambda+\mu))} = \binom{k}{m}\left(\frac{\lambda}{\lambda+\mu}\right)^m\left(\frac{\mu}{\lambda+\mu}\right)^{k-m}.$$
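The binomial form of the conditional distribution in (c) can be verified numerically; a sketch:

```python
from math import comb, exp, factorial

def pois(k, lam):
    return lam**k * exp(-lam) / factorial(k)

lam, mu, k = 1.5, 2.5, 4
# p(m | m + n = k) for independent Poissons, computed two ways.
direct = [pois(m, lam) * pois(k - m, mu) / pois(k, lam + mu)
          for m in range(k + 1)]
binom = [comb(k, m) * (lam/(lam+mu))**m * (mu/(lam+mu))**(k-m)
         for m in range(k + 1)]
```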

8. Let $\tilde y = \tilde x^2$ where $\tilde x \sim N(0,1)$. Then
$$P(\tilde y \leq y) = P(\tilde x^2 \leq y) = P(-\sqrt y \leq \tilde x \leq \sqrt y) = P(\tilde x \leq \sqrt y) - P(\tilde x < -\sqrt y)$$
so that (because $P(\tilde x = -\sqrt y) = 0$)
$$F_y(y) = F_x(\sqrt y) - F_x(-\sqrt y)$$
and on differentiation
$$p_y(y) = \tfrac12 y^{-\frac12}p_x(\sqrt y) + \tfrac12 y^{-\frac12}p_x(-\sqrt y).$$
Alternatively, you could argue that
$$P(y < \tilde y \leq y+\mathrm dy) = P(x < \tilde x \leq x+\mathrm dx) + P(-x-\mathrm dx \leq \tilde x < -x)$$
implying
$$p_y(y)\,\mathrm dy = p_x(x)\,\mathrm dx + p_x(-x)\,\mathrm dx,$$
which as $\mathrm dy/\mathrm dx = 2x = 2\sqrt y$ leads to the same conclusion.

In the case where $\tilde x \sim N(0,1)$ this gives
$$p_y(y) = (2\pi y)^{-\frac12}\exp(-\tfrac12 y),$$
which is the density of $\chi^2_1$.

9. By independence, since $M \leq M$ iff every one of the individual $X_i$ is less than or equal to $M$,
$$F_M(M) = P(X_i \leq M\ \forall i) = (F(M))^n;\qquad F_m(m) = 1-(1-F(m))^n;$$
$$p_M(M) = nf(M)(F(M))^{n-1};\qquad p_m(m) = nf(m)(1-F(m))^{n-1}.$$

10. Let $m$ be the minimum of $u$ and $v$, and $c$ be the length of the centre section. Then $p(m,c) = p_{u,v}(m,c) + p_{u,v}(c,m) = 2$ if $m+c \leq 1$, and 0 otherwise.

11. If $F(x,y) = F(x)F(y)$ then by differentiation with respect to $x$ and with respect to $y$ we have $p(x,y) = p(x)p(y)$. The converse follows by integration.

In discrete cases, if $F(x,y) = F(x)F(y)$ then $p(x,y) = F(x,y) - F(x-1,y) - F(x,y-1) + F(x-1,y-1) = (F_x(x)-F_x(x-1))(F_y(y)-F_y(y-1)) = p(x)p(y)$. Conversely, if $p(x,y) = p(x)p(y)$ then $F(x,y) = \sum_{\xi\leq x}\sum_{\eta\leq y}p_{x,y}(\xi,\eta)$, and so $F(x,y) = \sum_{\xi\leq x}p_x(\xi)\sum_{\eta\leq y}p_y(\eta) = F(x)F(y)$.

12. $EX = n(1-\pi)/\pi$; $VX = n(1-\pi)/\pi^2$.

13. $E\tilde X = \sum E\tilde Z_i^2 = \sum 1 = \nu$, while $E\tilde X^2 = \sum E\tilde Z_i^4 + \sum_{i\neq j}E\tilde Z_i^2\tilde Z_j^2 = 3\nu + \nu(\nu-1)$, so that $V\tilde X = E\tilde X^2 - (E\tilde X)^2 = 2\nu$. Similarly by integration.

14. $EX = n\pi$, $EX(X-1) = n(n-1)\pi^2$ and $EX(X-1)(X-2) = n(n-1)(n-2)\pi^3$, so $EX = n\pi$, $EX^2 = EX(X-1) + EX = n(n-1)\pi^2 + n\pi$ and
$$E(X-EX)^3 = EX^3 - 3(EX^2)(EX) + 2(EX)^3 = EX(X-1)(X-2) - 3(EX^2)(EX) + 3EX^2 + 2(EX)^3 - 2EX = n\pi(1-\pi)(1-2\pi),$$
and thus $\gamma_1 = (1-2\pi)/\sqrt{n\pi(1-\pi)}$.

15. $V\tilde x \geq \int_{\{x;\,|x-\mu|\geq c\}}c^2p(x)\,\mathrm dx = c^2P(|\tilde x-\mu| \geq c)$.

16. By symmetry $Ex = Ey = Exy = 0$, so $C(x,y) = Exy - ExEy = 0$. However $0 = P(x=0,\,y=0) \neq P(x=0)P(y=0) = \tfrac12\times\tfrac12 = \tfrac14$.

17. $p(y|x) = \{2\pi(1-\rho^2)\}^{-\frac12}\exp\{-\tfrac12(y-\rho x)^2/(1-\rho^2)\}$, so conditional on $\tilde x = x$ we have $y \sim N(\rho x, 1-\rho^2)$, so $E(y|x) = \rho x$. $E(y^2|x) = V(y|x) + \{E(y|x)\}^2 = 1-\rho^2+\rho^2x^2$, hence $E(xy|x) = \rho x^2$ and $E(x^2y^2|x) = x^2 - \rho^2x^2 + \rho^2x^4$. Therefore $Exy = \rho$ and (as $Ex^4 = 3$) $Ex^2y^2 = 1+2\rho^2$, so $Exy - (Ex)(Ey) = \rho$ and $Ex^2y^2 - (Ex^2)(Ey^2) = 2\rho^2$. As $Vx = 1$ and $Vx^2 = Ex^4 - (Ex^2)^2 = 3-1 = 2 = Vy^2$, the result follows.

18. (a) $p(x,y) = (\lambda^x e^{-\lambda}/x!)\binom{x}{y}\pi^y(1-\pi)^{x-y}$, so adding over $x = y, y+1, \dots$ and using $\sum\lambda^{x-y}(1-\pi)^{x-y}/(x-y)! = e^{\lambda(1-\pi)}$ we get $p(y) = (\lambda\pi)^ye^{-\lambda\pi}/y!$, so that $\tilde y \sim P(\lambda\pi)$. Now note that $E^{\tilde y|x}(\tilde y|x) = x\pi$, and this has expectation $\lambda\pi$.

(b) Note that $V^{\tilde y|x}(\tilde y|x) = x\pi(1-\pi)$, which has expectation $\lambda\pi(1-\pi)$, and that $E^{\tilde y|x}(\tilde y|x) = x\pi$, which has variance $\lambda\pi^2$, so that the right-hand side adds to $\lambda\pi$, the variance of $\tilde y$.

19. We note that
$$I = \int_0^\infty\exp(-\tfrac12 z^2)\,\mathrm dz = \int_0^\infty\exp(-\tfrac12(xy)^2)\,y\,\mathrm dx$$
for any $y$ (on setting $z = xy$). Putting $z$ in place of $y$, it follows that for any $z$
$$I = \int_0^\infty\exp(-\tfrac12(zx)^2)\,z\,\mathrm dx$$
so that
$$I^2 = \left(\int_0^\infty\exp(-\tfrac12 z^2)\,\mathrm dz\right)\left(\int_0^\infty\exp(-\tfrac12(zx)^2)\,z\,\mathrm dx\right) = \int_0^\infty\!\!\int_0^\infty\exp\{-\tfrac12(x^2+1)z^2\}\,z\,\mathrm dz\,\mathrm dx.$$
Now set $(1+x^2)z^2 = 2t$ so that $z\,\mathrm dz = \mathrm dt/(1+x^2)$ to get
$$I^2 = \int_0^\infty\!\!\int_0^\infty\exp(-t)\,\frac{\mathrm dt}{1+x^2}\,\mathrm dx = \left(\int_0^\infty\exp(-t)\,\mathrm dt\right)\left(\int_0^\infty\frac{\mathrm dx}{1+x^2}\right) = \left[-\exp(-t)\right]_0^\infty\left[\tan^{-1}x\right]_0^\infty = [1]\left[\tfrac12\pi\right] = \frac{\pi}{2}$$
and hence $I = \sqrt{\pi/2}$, so that the integral of $\phi$ from $-\infty$ to $\infty$ is 1, and hence $\phi$ is a probability density function. This method is apparently due to Laplace (1812, Section 24, pages 94–95 in the first edition).

D.2 Exercises on Chapter 2

1. $p(\pi) = \{B(k+1, n-k+1)\}^{-1}\pi^k(1-\pi)^{n-k}$, or $\pi \sim \mathrm{Be}(k+1, n-k+1)$.

2. $\bar x = 16.35525$, so assuming a uniform prior, the posterior is $N(16.35525, 1/12)$. A 90% HDR is $16.35525 \pm 1.6449/\sqrt{12}$, or $16.35525 \pm 0.47484$, that is, the interval $(15.88041, 16.83009)$.
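The arithmetic for this HDR can be verified directly; a sketch:

```python
from math import sqrt

xbar, precision = 16.35525, 12      # posterior is N(xbar, 1/12)
half = 1.6449 * sqrt(1 / precision) # 90% HDR half-width
lo, hi = xbar - half, xbar + half
```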

3. $x - \theta \sim N(0,1)$ and $\theta \sim N(16.35525, 1/12)$, so $\tilde x \sim N(16.35525, 1+1/12) = N(16.35525, 13/12)$.

4. Assuming a uniform prior, the posterior is $N(\bar x, \phi/n)$, so take $n = k$. If the prior variance is $\phi_0$, the posterior variance is $\{1/\phi_0 + n/\phi\}^{-1}$, so take $n$ the least integer such that $n \geq k - \phi/\phi_0$.

5. The posterior is $k(2\pi/25)^{-\frac12}\exp\{-\tfrac12(\theta-0.33)^2\times25\}$ for $\theta > 0$ and 0 for $\theta \leq 0$, where, writing $X \sim N(0.33, 1/25)$, $k = \{1-\Phi(-1.65)\}^{-1} = 1.052$. We now seek $\theta_1$ such that $p(\theta \leq \theta_1|x) = 0.95$, or equivalently $kP(0 < X \leq \theta_1) = 0.95$ with $k$ and $X$ as before. This results in $0.95 = 1.052\{\Phi(5\theta_1-1.65) - \Phi(-1.65)\}$, so $\Phi(5\theta_1-1.65) = 0.95/1.052 + 0.0495 = 0.9525$, leading to $5\theta_1-1.65 = 1.67$ and so to $\theta_1 = 0.664$. The required interval is thus $[0, 0.664)$.
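The truncated-normal calculation above can be reproduced numerically; a sketch using `math.erf` and bisection:

```python
from math import erf, sqrt

def Phi(z):
    # Standard normal distribution function.
    return 0.5 * (1 + erf(z / sqrt(2)))

k = 1 / (1 - Phi(-1.65))           # normalising constant, about 1.052
target = 0.95 / k + Phi(-1.65)     # Phi(5*theta1 - 1.65) must equal this

lo, hi = 0.0, 2.0                  # bisection bracket for theta1
for _ in range(60):
    mid = (lo + hi) / 2
    if Phi(5 * mid - 1.65) < target:
        lo = mid
    else:
        hi = mid
theta1 = (lo + hi) / 2
```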

6. From the point of view of someone starting from prior ignorance, my beliefs are equivalent to an observation of mean $\lambda$ and variance $\phi$ and yours to one of mean $\mu$ and variance $\psi$, so after taking both into account such a person is able to use Section 2.2 and conclude $\theta \sim N(\theta_1, \phi_1)$ where $1/\phi_1 = 1/\phi + 1/\psi$ and $\theta_1/\phi_1 = \lambda/\phi + \mu/\psi$.

7. The likelihood (after inserting a convenient constant) is
$$l(x|\theta) = (2\pi\phi/n)^{-\frac12}\exp\{-\tfrac12(\bar x-\theta)^2/(\phi/n)\}.$$
Hence by Bayes' Theorem, within $I$
$$Ac(1-\epsilon)(2\pi\phi/n)^{-\frac12}\exp\{-\tfrac12(\bar x-\theta)^2/(\phi/n)\} \leq p(\theta|x) \leq Ac(1+\epsilon)(2\pi\phi/n)^{-\frac12}\exp\{-\tfrac12(\bar x-\theta)^2/(\phi/n)\}$$
and outside $I$
$$0 \leq p(\theta|x) \leq AMc(2\pi\phi/n)^{-\frac12}\exp\{-\tfrac12(\bar x-\theta)^2/(\phi/n)\},$$
where $A$ is a constant equal to $p(x)^{-1}$. Using the right-hand inequality for the region inside $I$ we get
$$\int_I p(\theta|x)\,\mathrm d\theta \leq Ac(1+\epsilon)\int_I(2\pi\phi/n)^{-\frac12}\exp\{-\tfrac12(\bar x-\theta)^2/(\phi/n)\}\,\mathrm d\theta = Ac(1+\epsilon)\int_{-\lambda}^{\lambda}(2\pi)^{-\frac12}\exp(-\tfrac12 t^2)\,\mathrm dt,$$
where $t = (\bar x-\theta)/\sqrt{\phi/n}$, so that
$$\int_I p(\theta|x)\,\mathrm d\theta \leq Ac(1+\epsilon)[\Phi(\lambda)-\Phi(-\lambda)] = Ac(1+\epsilon)(1-\eta)$$
since $\Phi(-\lambda) = 1-\Phi(\lambda)$, writing $\eta = 2\Phi(-\lambda)$. Similarly, the same integral exceeds $Ac(1-\epsilon)(1-\eta)$, and, if $J$ is the outside of $I$,
$$0 \leq \int_J p(\theta|x)\,\mathrm d\theta \leq AMc\,\eta.$$
Combining these results we have, since $\int_{I\cup J}p(\theta|x)\,\mathrm d\theta = 1$,
$$Ac(1-\epsilon)(1-\eta) \leq 1 \leq Ac[(1+\epsilon)(1-\eta) + M\eta],$$
and hence
$$\frac{1}{(1+\epsilon)(1-\eta)+M\eta} \leq Ac \leq \frac{1}{(1-\epsilon)(1-\eta)}.$$
The result now follows on remarking that the maximum value of the exponential in $J$ occurs at the end-points $\bar x \pm \lambda\sqrt{\phi/n}$, where it has the value $\exp(-\tfrac12\lambda^2)$.

8. The likelihood is
$$l(\theta|x) \propto h(\theta)^n\exp\left\{\sum t(x_i)\,\psi(\theta)\right\} = h(\theta)^n\exp\left\{\exp\left[\log\psi(\theta) - \log\left(1\big/\sum t(x_i)\right)\right]\right\}.$$
This is of the data-translated form $g(\psi(\theta) - t(x))$ if the function $h(\theta) \equiv 1$.

9. Take prior $S_0\chi_\nu^{-2}$ where $\nu = 8$ and $S_0/(\nu-2) = 100$, so $S_0 = 600$, and take $n = 30$ and $S = 13.2^2(n-1) = 5052.96$. Then (cf. Section 2.7) the posterior is $(S_0+S)\chi_{\nu+n}^{-2}$, where $\nu+n = 38$ and $S_0+S = 5652.96$. Values of $\chi^2$ corresponding to a 90% HDR for $\log\chi^2_{38}$ are (interpolating between 35 and 40 degrees of freedom) 25.365 and 54.269, so the 90% interval is $(104, 223)$.

10. $n = 10$, $\bar x = 5.032$, $S = 3.05996$, $s/\sqrt n = 0.1844$. The posterior for $\theta$ is such that $(\theta-5.032)/0.1844 \sim t_9$, so as $t_{9,0.95} = 1.833$ a 90% HDR is $(4.69, 5.37)$. The posterior for $\phi$ is $S\chi_9^{-2}$, so as the values of $\chi^2$ corresponding to a 90% HDR for $\log\chi^2_9$ are 3.628 and 18.087, the required interval is $(0.17, 0.84)$.

11. The sufficient statistic is $\sum_{i=1}^n x_i$, or equivalently $\bar x$.

12. The sufficient statistic is $\left(\prod_{i=1}^n x_i,\ \sum_{i=1}^n x_i\right)$, or equivalently $(\check x, \bar x)$ where $\check x$ is the geometric mean.

13. p() 0 exp(/).

14. If $\theta \sim U(-\pi/2, \pi/2)$ and $x = \tan\theta$, then by the usual change-of-variable rule $p(x) = \pi^{-1}|\mathrm d\theta/\mathrm dx| = \pi^{-1}(1+x^2)^{-1}$, and similarly if $\theta \sim U(\pi/2, 3\pi/2)$. The result follows.

15. Straightforwardly
$$p(x|\pi,\rho) = \frac{n!}{x!\,y!\,z!}\pi^x\rho^y\sigma^z = \left(\frac{n!}{x!\,y!\,(n-x-y)!}\right)\exp\{n\log(1-\pi-\rho)\}\exp[x\log\{\pi/(1-\pi-\rho)\} + y\log\{\rho/(1-\pi-\rho)\}]$$
$$= g(x,y)\,h(\pi,\rho)\exp[t(x,y)\,\psi(\pi,\rho) + u(x,y)\,\chi(\pi,\rho)].$$

16. $\nu_1 = \nu_0 + n = 104$; $n_1 = n_0 + n = 101$; $\theta_1 = (n_0\theta_0 + n\bar x)/n_1 = 88.96$; $S = (n-1)s^2 = 2970$; $S_1 = S_0 + S + (n_0^{-1}+n^{-1})^{-1}(\theta_0-\bar x)^2 = 3336$; $s_1/\sqrt{n_1} = \sqrt{3336/(101\times104)} = 0.56$. It follows that a posteriori
$$\frac{\theta-\theta_1}{s_1/\sqrt{n_1}} \sim t_{\nu_1}, \qquad \phi \sim S_1\chi_{\nu_1}^{-2}.$$
For the prior, find the 75% point of $t_4$ from, e.g., Neave's Table 3.1 as 0.741. For the posterior, as the degrees of freedom are large, we can approximate the $t$ by a normal, noting that the 75% point of the standard normal distribution is 0.6745. Hence a 50% prior HDR for $\theta$ is approximately $(75.6, 94.4)$, while a 50% posterior HDR for $\theta$ is $88.96 \pm 0.6745\times0.56$, that is, $(88.58, 89.34)$.

17. With $p_1(\theta)$ being $N(0,1)$ we see that $p_1(\theta|x)$ is $N(\tfrac12 x, \tfrac12)$, and with $p_2(\theta)$ being $N(1,1)$ we see that $p_2(\theta|x)$ is $N(\tfrac12(x+1), \tfrac12)$. As
$$\int p(x|\theta)p_1(\theta)\,\mathrm d\theta = \int\frac{1}{2\pi}\exp\{-\tfrac12(x-\theta)^2 - \tfrac12\theta^2\}\,\mathrm d\theta = \frac{1}{2^{\frac12}\sqrt{2\pi}}\exp\{-\tfrac14 x^2\}\int(2\pi\tfrac12)^{-\frac12}\exp\{-\tfrac12(\theta-\tfrac12 x)^2/\tfrac12\}\,\mathrm d\theta = \frac{1}{2\sqrt\pi}\exp\{-\tfrac14 x^2\}$$
and similarly
$$\int p(x|\theta)p_2(\theta)\,\mathrm d\theta = \frac{1}{2\sqrt\pi}\exp\{-\tfrac14(x-1)^2\},$$
just as in Section 2.13 we see that the posterior is an $\alpha$ to $\beta$ mixture of $N(\tfrac12 x, \tfrac12)$ and $N(\tfrac12(x+1), \tfrac12)$, where $\alpha \propto \tfrac23 e^{-2} = 0.09$ and $\beta \propto \tfrac13 e^{-1/4} = 0.26$, so that $\alpha = 0.26$ and $\beta = 0.74$. It follows that
$$P(\theta > 1) = 0.26\times0.1587 + 0.74\times0.0228 = 0.058.$$

18. Elementary manipulation gives
$$n\bar X^2 + n_0\theta_0^2 - (n+n_0)\left(\frac{n\bar X+n_0\theta_0}{n+n_0}\right)^2 = \frac{1}{n+n_0}\left[\{n(n+n_0)-n^2\}\bar X^2 + \{n_0(n+n_0)-n_0^2\}\theta_0^2 - 2(nn_0)\bar X\theta_0\right]$$
$$= \frac{nn_0}{n+n_0}\left[\bar X^2 + \theta_0^2 - 2\bar X\theta_0\right] = (n_0^{-1}+n^{-1})^{-1}(\bar X-\theta_0)^2.$$

D.3 Exercises on Chapter 3

1. Using Bayes' postulate $p(\pi) = 1$ for $0 \leq \pi \leq 1$ we get a posterior $(n+1)\pi^n$, which has mean $(n+1)/(n+2)$.

2. From Table B.5, $\underline F_{40,24} = 0.55$ and $\overline F_{40,24} = 1.87$, so for Be(20, 12) take lower limit $20\times0.55/(12+20\times0.55) = 0.48$ and upper limit $20\times1.87/(12+20\times1.87) = 0.76$, so the 90% HDR is $(0.48, 0.76)$. Similarly by interpolation $\underline F_{41,25} = 0.56$ and $\overline F_{41,25} = 1.85$, so for Be(20.5, 12.5) quote $(0.48, 0.75)$. Finally by interpolation $\underline F_{42,26} = 0.56$ and $\overline F_{42,26} = 1.83$, so for Be(21, 13) quote $(0.47, 0.75)$. It does not seem to make much difference whether we use a Be(0, 0), a Be($\tfrac12$, $\tfrac12$) or a Be(1, 1) prior.

3. Take $\alpha/(\alpha+\beta) = 1/3$, so $\beta = 2\alpha$ and
$$V\pi = \frac{\alpha\beta}{(\alpha+\beta)^2(\alpha+\beta+1)} = \frac{2}{3^2(3\alpha+1)}$$
so $\alpha = 55/27 \cong 2$ and $\beta = 4$. The posterior is then Be(2+8, 4+12), that is, Be(10, 16). The 95% values for $F_{32,20}$ are 0.45 and 2.30 by interpolation, so those for $F_{20,32}$ are 0.43 and 2.22. An appropriate interval for $\pi$ is from $10\times0.43/(16+10\times0.43)$ to $10\times2.22/(16+10\times2.22)$, that is, $(0.21, 0.58)$.

4. Take $\alpha/(\alpha+\beta) = 0.4$ and $\alpha+\beta = 12$, so approximately $\alpha = 5$, $\beta = 7$. The posterior is then Be(5+12, 7+13), that is, Be(17, 20).

5. Take a beta prior with $\alpha/(\alpha+\beta) = \tfrac13$ and $\alpha+\beta = \tfrac14\times11 = 2.75$. It is convenient to take integer $\alpha$ and $\beta$, so take $\alpha = 1$ and $\beta = 2$, giving $\alpha/(\alpha+\beta) = \tfrac13$ and $(\alpha+\beta)/11 = 0.273$. The variance of the prior is $2/36 = 0.0555$, so its standard deviation is 0.236. The data are such that $n = 11$ and $X = 3$, so the posterior is Be(4, 10). From tables, the values of $F$ corresponding to a 50% HDR for $\log F_{20,8}$ are $\underline F = 0.67$ and $\overline F = 1.52$. It follows that the appropriate interval of values of $F_{8,20}$ is $(1/\overline F, 1/\underline F)$, that is, $(0.66, 1.49)$. Consequently an appropriate interval for the required proportion $\pi$ is $4\times0.66/(10+4\times0.66) \leq \pi \leq 4\times1.49/(10+4\times1.49)$, that is, $(0.21, 0.37)$. The posterior mean is $4/(4+10) = 0.29$, the posterior mode is $(4-1)/(4+10-2) = 0.25$, and using the relationship median $\cong$ (2 $\times$ mean $+$ mode)/3 the posterior median is approximately 0.28. The actual overall proportion $86/433 = 0.20$ is consistent with the above. [Data from the York A.U.T. Membership List for 1990/91.]

6. By Appendix A.13, $E\tilde x = n(1-\pi)/\pi$ and $V\tilde x = n(1-\pi)/\pi^2$, so with $g(x) = \sinh^{-1}\sqrt{x/n}$ we find $g'(E\tilde x) = \tfrac12 n^{-1}[(1-\pi)/\pi^2]^{-\frac12}$, so that $Vg(\tilde x) \cong g'(E\tilde x)^2\,V\tilde x = 1/4n$ (cf. Section 1.5). By analogy with the argument that an arc-sine prior for the binomial distribution is approximately data-translated, this suggests a prior uniform in $\sinh^{-1}\sqrt\pi$, so with density $\tfrac12\pi^{-\frac12}(1+\pi)^{-\frac12}$. But note that Jeffreys' Rule suggests Be(0, $\tfrac12$), as remarked in Section 7.4.

7. As this is a rare event, we use the Poisson distribution. We have $n = 280$, $T = \sum x_i = 196$. The posterior is $S_1^{-1}\chi^2_{\nu'}$, where $S_1 = S_0 + 2n$ and $\nu' = \nu + 2T$. For the reference prior take $\nu = 1$, $S_0 = 0$; we get a posterior which is $560^{-1}\chi^2_{393}$. Using the approximation in Appendix A.3, we find that a 95% interval is $\tfrac12(\sqrt{785}\pm1.96)^2/560$, that is, $(0.61, 0.80)$.

8. The prior is such that $\nu/S_0 = 0.66$ and $2\nu/S_0^2 = 0.115^2$, so $\nu = 66$, $S_0 = 100$, so the posterior has $S_1 = 660$ and $\nu' = 458$. This gives a posterior 95% interval $\tfrac12(\sqrt{915}\pm1.96)^2/660$, that is, $(0.61, 0.79)$.

9. $\partial\log p(x|\theta)/\partial\theta = -3/2\theta + \tfrac12 x^2/\theta^2$ and $\partial^2\log p(x|\theta)/\partial\theta^2 = 3/2\theta^2 - x^2/\theta^3$, so $I(\theta|x) = 3/2\theta^2$ and we take $p(\theta) \propto 1/\theta$, or equivalently a uniform prior in $\psi = \log\theta$.

10. We have $\partial^2 L/\partial\pi^2 = -x/\pi^2 - z/(1-\pi-\rho)^2$, etc., so
$$I(\pi,\rho\,|\,x,y,z) = \begin{pmatrix} n/\pi + n/(1-\pi-\rho) & n/(1-\pi-\rho)\\ n/(1-\pi-\rho) & n/\rho + n/(1-\pi-\rho)\end{pmatrix}$$
and so after a little manipulation $\det I(\pi,\rho|x,y,z) = n^2\{\pi\rho(1-\pi-\rho)\}^{-1}$, suggesting a prior
$$p(\pi,\rho) \propto \pi^{-\frac12}\rho^{-\frac12}(1-\pi-\rho)^{-\frac12},$$
which is evidently related to the arc-sine distribution.

11. $\partial\log p(x|\gamma)/\partial\gamma = 1/\gamma + \log\xi - \log x$ and $\partial^2\log p(x|\gamma)/\partial\gamma^2 = -1/\gamma^2$, so $I(\gamma|x) = 1/\gamma^2$ and we take $p(\gamma) \propto 1/\gamma$, or equivalently a uniform prior in $\psi = \log\gamma$.

12. Using Appendix A.16, the coefficient of variation is $\sqrt{2/\{(\nu+1)(\nu-2)\}}$. This is less than 0.01 if $\tfrac12(\nu+1)(\nu-2) > 1/0.01^2$, that is, if $\nu^2-\nu-20{,}002 > 0$, so if $\nu > \tfrac12(1+\sqrt{80{,}009}) = 141.9$. Taking the usual reference prior (cf. Section 3.6), we need $\nu = n-1 > 141.9$, that is, $n$ at least 143.
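The threshold on $n$ can be checked by direct search; a sketch:

```python
# Smallest degrees of freedom nu with coefficient of variation below 1%,
# i.e. with (nu + 1)(nu - 2)/2 > 1/0.01**2.
nu = 3
while (nu + 1) * (nu - 2) / 2 <= 1 / 0.01**2:
    nu += 1
n = nu + 1   # since nu = n - 1
```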

13. Take the prior $p(\theta) = (d-1)\theta^{-d}$ for $1 < \theta < \infty$; the posterior is then of the same form.

14. The posterior is approximately
$$p(N \mid 71, 100) \propto N^{-3} \qquad (N > 100).$$
Approximating sums by integrals, the posterior probability is $P(N > N_0 \mid 71, 100) = 100^2/N_0^2$, and in particular the posterior median is $100\sqrt2$, or about 140.

15. We have $p(\phi) \propto 1/\phi$, and setting $\psi = \log\phi$ we get $p(\psi) = p(\phi)\,|\mathrm d\phi/\mathrm d\psi| \propto 1$. Thus we expect all pages to be more or less equally dirty.

16. The group operation is $(x,y) \mapsto x+y \bmod 2\pi$ and the appropriate Haar measure is Lebesgue measure on the circle, that is, arc length around the circle.

17. Table of $c^2\{c^2+(x_1-\theta)^2\}^{-1}\{c^2+(x_2-\theta)^2\}^{-1}$:

θ\c     1     2     3     4     5
0    0.00  0.01  0.01  0.01  0.01
2    0.06  0.05  0.04  0.03  0.02
4    0.04  0.06  0.05  0.04  0.03
6    0.06  0.05  0.04  0.03  0.02
8    0.00  0.01  0.01  0.01  0.01

Integrating over $c$ using Simpson's Rule (and ignoring the constant) for the above values of $\theta$ we get 0.11, 0.48, 0.57, 0.48 and 0.11 respectively. Integrating again we get for the intervals shown:

(−1, 1)  (1, 3)  (3, 5)  (5, 7)  (7, 9)  Total
 0.92     2.60    3.24    2.60    0.92   10.28

so the required posterior probability is $3.24/10.28 = 0.31$.

18. The first part follows as a sum of concave functions is concave, and a concave function has a unique maximum. For the example, note that
$$p(x|\theta) = \exp(\theta-x)/\{1+\exp(\theta-x)\}^2 \qquad (-\infty < x < \infty).$$
Hence
$$L'(\theta|x) = 1 - 2\exp(\theta-x)/\{1+\exp(\theta-x)\} = \{1-\exp(\theta-x)\}/\{1+\exp(\theta-x)\} = 1 - 2/\{1+\exp(x-\theta)\},$$
$$L''(\theta|x) = -2\exp(x-\theta)/\{1+\exp(x-\theta)\}^2 = -2\exp(\theta-x)/\{1+\exp(\theta-x)\}^2.$$
As this is clearly always negative, the log-likelihood is concave. Also
$$L'(\theta|x)/L''(\theta|x) = \tfrac12\{\exp(\theta-x) - \exp(x-\theta)\},$$
$$I(\theta|x) = 2\int\exp(2(\theta-x))/\{1+\exp(\theta-x)\}^4\,\mathrm dx = \tfrac18\int\mathrm{sech}^4\left(\tfrac12(\theta-x)\right)\mathrm dx = \tfrac14\int\mathrm{sech}^4 y\,\mathrm dy = \tfrac1{12}\left[\sinh y\,\mathrm{sech}^3 y + 2\tanh y\right]_{-\infty}^{\infty} = \tfrac13.$$
(The integration can be checked by differentiation of the result.) Now proceed as in Section 3.10.

D.4 Exercises on Chapter 4

1. $1 - (1-p_0) = p_0 = \left[1 + (1-\pi_0)\pi_0^{-1}B^{-1}\right]^{-1} = 1 - (1-\pi_0)\pi_0^{-1}B^{-1} + (1-\pi_0)^2\pi_0^{-2}B^{-2} - \cdots$. The result then follows on noting that $\pi_0^{-1} = \{1-(1-\pi_0)\}^{-1} = 1 + (1-\pi_0) + \cdots$.

2. Substituting in the formulae in the text we get
$$\phi_1 = (0.9^{-2}+1.8^{-2})^{-1} = 0.648 = 0.80^2;\qquad \theta_1 = 0.648(93.3/0.9^2 + 93.0/1.8^2) = 93.2;$$
$$\pi_0 = \Phi((93.0-93.3)/0.9) = \Phi(-0.333) = 0.3696;\qquad \pi_0/(1-\pi_0) = 0.59;$$
$$p_0 = \Phi((93.0-93.2)/0.8) = \Phi(-0.25) = 0.4013;\qquad p_0/(1-p_0) = 0.67;\qquad B = 0.67/0.59 = 1.14.$$

3. $n = 12$, $\bar x = 118.58$, $S = 12969$, $s/\sqrt n = 9.91$, $(\bar x - 100)/(s/\sqrt n) = 1.875$. Taking a normal approximation, this gives a $P$-value of $1-\Phi(1.875) = 0.0303$.
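These numbers can be reproduced with `math.erf`; a sketch:

```python
from math import erf, sqrt

def Phi(z):
    # Standard normal distribution function.
    return 0.5 * (1 + erf(z / sqrt(2)))

n, xbar, S = 12, 118.58, 12969
s = sqrt(S / (n - 1))
se = s / sqrt(n)
z = (xbar - 100) / se
p_value = 1 - Phi(z)
```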

4. $n = 300$, $n\pi = 900/16 = 56.25$, $n\pi(1-\pi) = 45.70$, $z = (56-56.25)/\sqrt{45.70} = -0.037$, so certainly not significant.

5. The posterior with the reference prior is $S_1^{-1}\chi^2_{\nu'}$ with $S_1 = 12$ and $\nu' = 31$. Values of $\chi^2$ corresponding to a 90% HDR for $\log\chi^2_{31}$ are (by interpolation) 19.741 and 45.898, so a 90% posterior HDR is from 1.65 to 3.82, which includes 3. So it is not appropriate to reject the null hypothesis.

6. If $k = 0.048$ then $\exp(2k) = 1.101 = 1/0.908$, so take $\epsilon$ within $k\sqrt{(\phi/n)}/z = 0.048\sqrt{(\phi/10)}/2.5 = 0.006\sqrt\phi$.

7. Using standard power series expansions
$$B = (1+1/\epsilon)^{\frac12}\exp[-\tfrac12 z^2(1+\epsilon)^{-1}]$$
$$= \epsilon^{-\frac12}(1+\epsilon)^{\frac12}\exp(-\tfrac12 z^2)\exp[\tfrac12 z^2\epsilon(1+\epsilon)^{-1}]$$
$$= \epsilon^{-\frac12}(1+\tfrac12\epsilon+\dots)\exp(-\tfrac12 z^2)[1+\tfrac12 z^2\epsilon(1+\epsilon)^{-1}+\dots]$$
$$= \epsilon^{-\frac12}\exp(-\tfrac12 z^2)(1+\tfrac12\epsilon(z^2+1)+\dots).$$

8. The likelihood ratio is
$$\frac{\{2\pi(\phi+\epsilon)\}^{-n/2}\exp[-\tfrac12\sum x_i^2/(\phi+\epsilon)]}{\{2\pi(\phi-\epsilon)\}^{-n/2}\exp[-\tfrac12\sum x_i^2/(\phi-\epsilon)]} = \left(\frac{\phi-\epsilon}{\phi+\epsilon}\right)^{n/2}\exp\left[\epsilon\sum x_i^2/\phi^2 + O(\epsilon^2)\right]$$
$$\cong \exp\left[\epsilon\left(\sum x_i^2 - n\phi\right)\big/\phi^2\right] = \exp\left[\frac{n\epsilon}{\phi}\left(\frac{\sum x_i^2/n}{\phi} - 1\right)\right].$$

9. $\partial p_1(x)/\partial\psi = \{-\tfrac12(\psi+\phi/n)^{-1} + \tfrac12(x-\theta)^2/(\psi+\phi/n)^2\}p_1(x)$, which vanishes if $\psi+\phi/n = (x-\theta)^2 = z^2\phi/n$, so if $\psi = (z^2-1)\phi/n$. Hence $p_1(x) \leq (2\pi\phi/n)^{-\frac12}z^{-1}\exp(-\tfrac12)$. Consequently
$$B = (2\pi\phi/n)^{-\frac12}\exp(-\tfrac12 z^2)\big/p_1(x) \geq \sqrt e\,z\exp(-\tfrac12 z^2).$$
For the last part use $p_0 = \left[1+(\pi_1/\pi_0)B^{-1}\right]^{-1}$.

10. The posterior probability is a minimum when $B$ is a minimum, hence when $2\log B$ is a minimum, and
$$\mathrm d(2\log B)/\mathrm dn = \{1+n\}^{-1} - z^2\{1+1/n\}^{-2}(1/n^2),$$
which vanishes when $n = z^2-1$. Since $z^2-1$ is not necessarily an integer, this answer is only approximate.

11. Test against B(7324, 0.75) with mean 5493 and variance 1373, so $z = (5474-5493)/\sqrt{1373} = -0.51$. Not significant, so the theory is confirmed.

12. Likelihood ratio is

(2pi2)12 exp[ 12u2/(2)](2pi)

12 exp[ 12 (z )/]

(2pi2)12 exp[ 12u2/(2)](2pi/2)

12 exp[ 12 (z )/(/2)]

= (/2)12 exp[ 12u2(1//2 1/2) + 12 (z )2/]

= (/2) 12 exp[ 12u2/(2) + 12 (z )2/]

With

(/) = 100, u = 2(2) and z = , we get B = (100/2) exp(2) =9.57, although 2 standard deviation is beyond 1.96, the 5% level.

13. $p_1(x) = \int\rho_1(\theta)p(x|\theta)\,\mathrm d\theta$, which is $1/\tau$ times the integral between $\mu+\tau/2$ and $\mu-\tau/2$ of an $N(\theta, \phi/n)$ density, and so as $\phi/n \to 0$ this is nearly the whole integral. Hence $p_1(x) = 1/\tau$, from which it easily follows that $B = \tau(2\pi\phi/n)^{-\frac12}\exp(-\tfrac12 z^2)$. In Section 4.5 we found $B = (1+n\psi/\phi)^{\frac12}\exp[-\tfrac12 z^2(1+\phi/n\psi)^{-1}]$, which for large $n$ is about $B = (n\psi/\phi)^{\frac12}\exp(-\tfrac12 z^2)$. This agrees with the first form found here if $\tau^2 = 2\pi\psi$. As the variance of a uniform distribution on $(\mu-\tau/2, \mu+\tau/2)$ is $\tau^2/12$, this may be better expressed as $\tau^2/12 = (\pi/6)\psi = 0.52\psi$.

14. Jeffreys considers the case where both the mean $\theta$ and the variance $\phi$ are unknown, and wishes to test H0: $\theta = 0$ versus H1: $\theta \neq 0$ using the conventional choice of prior odds $\pi_0/\pi_1 = 1$, so that $B = p_0/p_1$. Under both hypotheses he uses the standard conventional prior for $\phi$, namely $p(\phi) \propto 1/\phi$. He assumes that the prior for $\theta$ depends on $\sigma = \sqrt\phi$ as a scale factor. Thus if $\zeta = \theta/\sigma$ he assumes that $\pi(\zeta,\sigma) = p(\zeta)p(\sigma)$, so that $\zeta$ and $\sigma$ are independent. He then assumes that

(i) if the sample size $n = 1$ then $B = 1$, and

(ii) if the sample size $n \geq 2$ and $S = \sum(X_i-\bar X)^2 = 0$, then $B = 0$, that is, $p_1 = 1$.

From (i) he deduces that $p(\zeta)$ must be an even function with integral 1, while he shows that (ii) is equivalent to the condition
$$\int_0^\infty p(\zeta)\,\zeta^{n-2}\,\mathrm d\zeta = \infty.$$
He then shows that the simplest function satisfying these conditions is $p(\zeta) = \pi^{-1}(1+\zeta^2)^{-1}$, from which it follows that $p(\theta) = p(\zeta)|\mathrm d\zeta/\mathrm d\theta| = \pi^{-1}\sigma(\sigma^2+\theta^2)^{-1}$. Putting $\sigma = s$ and generalizing to the case where H0 is that $\theta = \theta_0$, we get the distribution in the question.

There is some arbitrariness in Jeffreys' "simplest function"; if instead he had taken $p(\zeta) = \pi^{-1}\kappa(\kappa^2+\zeta^2)^{-1}$ he would have ended up with $p(\theta) = \pi^{-1}\tau\{\tau^2+(\theta-\theta_0)^2\}^{-1}$ where $\tau = \kappa\sigma$. However, even after this generalization, the argument is not overwhelmingly convincing.

15. (a) The maximum likelihood estimator of $\pi$ is $\hat\pi = x/n$, so
$$B \geq \left(\frac{\pi_0}{\hat\pi}\right)^x\left(\frac{1-\pi_0}{1-\hat\pi}\right)^{n-x};\qquad p_0 \geq \left[1 + \frac{1-\pi_0}{\pi_0}\left(\frac{\hat\pi}{\pi_0}\right)^x\left(\frac{1-\hat\pi}{1-\pi_0}\right)^{n-x}\right]^{-1}.$$

(b) From tables (e.g. D. V. Lindley and W. F. Scott, New Cambridge Elementary Statistical Tables, Cambridge: University Press 1995 [1st edn (1984)], Table 1, or H. R. Neave, Statistics Tables for mathematicians, engineers, economists and the behavioural and management sciences, London: George Allen & Unwin (1978), Table 1.1) the probability of a binomial observation $\leq 14$ from B(20, 0.5) is 0.9793, so the appropriate (two-tailed) $P$-value is $2(1-0.9793) = 0.0414 \cong 1/24$. The maximum likelihood estimate is $\hat\pi = 15/20 = 0.75$, so the lower bound on $B$ is $(0.5/0.75)^{15}(0.5/0.25)^5 = 2^{20}/3^{15} = 0.0731$, implying a lower bound on $p_0$ of 0.0681, or just over 1/15.
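Part (b) can be verified directly from the binomial distribution; a sketch:

```python
from math import comb

n, x, pi0 = 20, 15, 0.5
# Two-tailed P-value for 15 successes out of 20 at pi0 = 1/2.
tail = sum(comb(n, k) for k in range(x, n + 1)) / 2**n
p_value = 2 * tail
# Lower bound on the Bayes factor, attained at the MLE pi_hat = 0.75.
pi_hat = x / n
B_low = (pi0 / pi_hat)**x * ((1 - pi0) / (1 - pi_hat))**(n - x)
p0_low = 1 / (1 + 1 / B_low)     # with prior odds 1
```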

16. (a) $n = 12$, $\nu = 11$, $t = \bar x/(s/\sqrt n) = 1.2/\sqrt{(1.1/12)} = 3.96$, and if we take $k = 1$ the Bayes factor $B$ is
$$\frac{(1+t^2/\nu)^{-(\nu+1)/2}}{(1+nk)^{-\frac12}\{1+t^2(1+nk)^{-1}/\nu\}^{-(\nu+1)/2}} = \frac{0.004910}{(0.2774)(0.5356)} = 0.033.$$

(b) $z = \bar x/\sqrt{(\phi/n)} = 1.2/\sqrt{(1/12)} = 4.16$ and (taking $\psi = \phi$ as usual)
$$B = (1+n)^{\frac12}\exp[-\tfrac12 z^2(1+1/n)^{-1}] = (1+12)^{\frac12}\exp[-\tfrac12(4.16)^2(1+1/12)^{-1}] = 0.001.$$

17. The two-tailed $P$-value is 0.0432 (cf. D. V. Lindley and W. F. Scott, New Cambridge Elementary Statistical Tables, Cambridge: University Press 1995 [1st edn (1984)], Table 9), while the Bayes factor is
$$\frac{(1+t^2/\nu)^{-(\nu+1)/2}}{(1+nk)^{-\frac12}\{1+t^2(1+nk)^{-1}/\nu\}^{-(\nu+1)/2}} = \frac{0.08712}{(0.3162)(0.7313)} = 0.377,$$
so $F = 1/B = 2.65$. The range $(1/30P, 3/10P)$ is $(0.77, 6.94)$, so $F$ is inside it and we do not need to think again.

18. For $P$-value 0.1, think again if $F$ is not in $(\tfrac13, 3)$. As $p_0 = [1+F]^{-1}$ this means if $p_0$ is not in $(0.250, 0.750)$, so if $n = 1000$. Similarly if the $P$-value is 0.05, if $p_0$ is not in $(0.143, 0.600)$, so if $n = 1000$ (and the case $n = 100$ is marginal); if the $P$-value is 0.01, if $p_0$ is not in $(0.032, 0.231)$, so if $n = 100$ or $n = 1000$; if the $P$-value is 0.001, if $p_0$ is not in $(0.003, 0.029)$, so if $n = 50$, $n = 100$ or $n = 1000$.

D.5 Exercises on Chapter 5

1. The mean difference is $\bar w = 0.053$, $S = 0.0498$, $s = 0.0789$. Assuming a standard reference prior for $\delta$ and a variance known equal to $0.0789^2$, the posterior distribution of the effect $\delta$ of Analyst A over Analyst B is $N(0.053, 0.0789^2/9)$, leading to a 90% interval $0.053 \pm 1.6449\times0.0789/3$, that is, $(0.010, 0.097)$. If the variance is not assumed known then the normal distribution should be replaced by $t_8$.
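The interval in Exercise 1 can be checked numerically; a sketch:

```python
from math import sqrt

S, nu, n = 0.0498, 8, 9            # paired differences: S, d.f., number of pairs
s = sqrt(S / nu)
half = 1.6449 * s / sqrt(n)        # 90% half-width with known variance
lo, hi = 0.053 - half, 0.053 + half
```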

2. If the variance is assumed known, $z = 0.053/(0.0789/3) = 2.028$, so with $\psi = \phi$
$$B = (1+n)^{\frac12}\exp[-\tfrac12 z^2(1+1/n)^{-1}] = (1+9)^{\frac12}\exp[-\tfrac12(2.028)^2(1+\tfrac19)^{-1}] = 0.497$$
and $p_0 = \left[1+0.497^{-1}\right]^{-1} = 0.33$ with the usual assumptions. If the variance is not assumed known,
$$B = \frac{\{1+2.028^2/8\}^{-9/2}}{10^{-\frac12}\{1+2.028^2\,10^{-1}/8\}^{-9/2}} = \frac{0.1546}{(0.3162)(0.7980)} = 0.613$$
and $p_0 = \left[1+0.613^{-1}\right]^{-1} = 0.38$.

3. We know that $x_i+y_i \sim N(2\theta, 2\phi)$ and that $x_i-y_i$ is independently $N(0, 2\phi)$, so that $\sum(x_i-y_i)^2/2\phi \sim \chi^2_n$, and hence if $\theta = 0$ then
$$\sum(x_i+y_i)\Big/\sqrt{\sum(x_i-y_i)^2} \sim t_n.$$
Hence test whether $\theta = 0$. If $\theta \neq 0$, it can be estimated as $\tfrac12(\bar x+\bar y)$.

4. In that case
$$B = (1+440.5/99)^{\frac12}\exp[-\tfrac12(1.91)^2\{1+99/440.5\}^{-1}] = 5.4495^{\frac12}\exp(-1.4893) = 0.53.$$
If the prior probability of the null hypothesis is taken as $\pi_0 = \tfrac12$, then this gives a posterior probability of $p_0 = (1+0.53^{-1})^{-1} = 0.35$.

5. $m = 9$, $n = 12$, $\bar x = 12.42$, $\bar y = 12.27$, $s_x = 0.1054$, $s_y = 0.0989$.

(a) The posterior of $\delta$ is $N(12.42-12.27,\ 0.1^2(1/9+1/12))$, so a 90% interval is $0.15 \pm 1.6449\sqrt{0.00194}$, that is, $(0.077, 0.223)$.

(b) $S = 8\times0.1054^2 + 11\times0.0989^2 = 0.1965$, $s = \sqrt{0.1965/19} = 0.102$, so $s\sqrt{(m^{-1}+n^{-1})} = 0.045$, so from tables of $t_{19}$ a 90% interval is $0.15 \pm 1.729\times0.045$, that is, $(0.072, 0.228)$.

6. With independent priors uniform in $\lambda$, $\mu$ and $\log\phi$, that is, $p(\lambda,\mu,\phi) \propto 1/\phi$,
$$p(\lambda,\mu,\phi|x,y) \propto p(\lambda,\mu,\phi)\,p(x|\lambda,\phi)\,p(y|\mu,2\phi) \propto (1/\phi)(2\pi\phi)^{-(m+n)/2}\exp[-\tfrac12\{\textstyle\sum(x_i-\lambda)^2 + \tfrac12\sum(y_i-\mu)^2\}/\phi]$$
$$\propto \phi^{-(m+n)/2-1}\exp[-\tfrac12\{S_x + m(\bar x-\lambda)^2 + \tfrac12 S_y + \tfrac12 n(\bar y-\mu)^2\}/\phi].$$
Writing $S = S_x + \tfrac12 S_y$ and $\nu = m+n-2$ we get
$$p(\lambda,\mu,\phi|x,y) \propto \phi^{-\nu/2-1}\exp[-\tfrac12 S/\phi]\;(2\pi\phi/m)^{-\frac12}\exp[-\tfrac12 m(\lambda-\bar x)^2/\phi]\;(2\pi\cdot2\phi/n)^{-\frac12}\exp[-\tfrac12 n(\mu-\bar y)^2/2\phi]$$
$$\propto p(\phi|S)\,p(\lambda|\phi,x)\,p(\mu|\phi,y)$$
where
$p(\phi|S)$ is an $S\chi_\nu^{-2}$ density,
$p(\lambda|\phi,x)$ is an $N(\bar x, \phi/m)$ density,
$p(\mu|\phi,y)$ is an $N(\bar y, 2\phi/n)$ density.
It follows that, for given $\phi$, the parameters $\lambda$ and $\mu$ have independent normal distributions, and so the joint density of $\delta = \lambda-\mu$ and $\phi$ is
$$p(\delta,\phi|x,y) = p(\phi|S)\,p(\delta|\bar x-\bar y, \phi),$$
where $p(\delta|\bar x-\bar y,\phi)$ is an $N(\bar x-\bar y, \phi(m^{-1}+2n^{-1}))$ density. This variance can now be integrated out just as in Sections 2.12 and 5.2, giving a very similar conclusion, that is, that if
$$t = \frac{\delta - (\bar x-\bar y)}{s\{m^{-1}+2n^{-1}\}^{\frac12}},$$
where $s^2 = S/\nu$, then $t \sim t_\nu$.

7. We find that
$$S_1 = S_0 + S_x + S_y + m_0\theta_0^2 + m\bar x^2 + n_0\theta_0^2 + n\bar y^2 - m_1\lambda_1^2 - n_1\mu_1^2$$
(where $\lambda_1$ and $\mu_1$ are the updated means), and then proceed as in Exercise 18 on Chapter 2 to show that
$$S_1 = S_0 + S_x + S_y + (m_0^{-1}+m^{-1})^{-1}(\bar x-\theta_0)^2 + (n_0^{-1}+n^{-1})^{-1}(\bar y-\theta_0)^2.$$

8. $m = 10$, $n = 9$, $\bar x = 22.2$, $\bar y = 23.1$, $s_x = 1.253$, $s_y = 0.650$. Consequently $\sqrt{(s_x^2/m + s_y^2/n)} = 0.452$ and $\tan\theta = (s_y/\sqrt n)/(s_x/\sqrt m) = 0.547$, so $\theta = 30^\circ$. Interpolating in Table B.1 with $\nu_1 = 8$ and $\nu_2 = 9$, the 90% point of the Behrens distribution is 1.42, so a 90% HDR is $22.2 - 23.1 \pm 1.42\times0.452$, that is, $(-1.54, -0.26)$.

9. Evidently $f_1 = (m-1)/(m-3)$ and
$$f_2 = \frac{(m-1)^2}{(m-3)^2(m-5)}(\sin^4\theta + \cos^4\theta).$$
Also $1 = (\sin^2\theta+\cos^2\theta)^2 = \sin^4\theta+\cos^4\theta+2\sin^2\theta\cos^2\theta$, so that $\sin^4\theta+\cos^4\theta = 1-\tfrac12\sin^22\theta = \tfrac12(1+\cos^22\theta)$. The result follows.

10. As $T_x \sim t_{\nu(x)}$ and $T_y \sim t_{\nu(y)}$ we have
$$p(T_x, T_y|x,y) \propto (1+T_x^2/\nu(x))^{-(\nu(x)+1)/2}(1+T_y^2/\nu(y))^{-(\nu(y)+1)/2}.$$
The Jacobian is trivial, and the result follows (cf. Appendix A.18).

11. Set $y = \nu_1x/(\nu_2+\nu_1x)$ so that $1-y = \nu_2/(\nu_2+\nu_1x)$ and $\mathrm dy/\mathrm dx = \nu_1\nu_2/(\nu_2+\nu_1x)^2$. Then using Appendix A.19 for the density of $x$,
$$p(y) = p(x)\left|\frac{\mathrm dx}{\mathrm dy}\right| \propto x^{\nu_1/2-1}(\nu_2+\nu_1x)^{-(\nu_1+\nu_2)/2}\frac{(\nu_2+\nu_1x)^2}{\nu_1\nu_2} \propto y^{\nu_1/2-1}(1-y)^{\nu_2/2-1},$$
from which the result follows.

12. $k = s_1^2/s_2^2 = 6.77$, so $k/\kappa = 6.77/\kappa \sim F_{24,14}$. A 95% interval for $F_{24,14}$ from Table B.5 is $(0.40, 2.73)$, so one for $\kappa$ is $(2.5, 16.9)$.

13. $k = 3$ with $\nu_x = \nu_y = 9$. An interval corresponding to a 99% HDR for $\log F$ for $F_{9,9}$ is $(0.15, 6.54)$, so a 99% interval for $\sqrt\kappa$ is from $\sqrt{3\times0.15}$ to $\sqrt{3\times6.54}$, that is, $(0.67, 4.43)$.

14. Take $\alpha_0+\beta_0 = 6$, $\alpha_0/(\alpha_0+\beta_0) = \tfrac12$, $\gamma_0+\delta_0 = 6$, $\gamma_0/(\gamma_0+\delta_0) = \tfrac23$, so $\alpha_0 = 3$, $\beta_0 = 3$, $\gamma_0 = 4$, $\delta_0 = 2$, and hence $\alpha = 3+a = 11$, $\beta = 3+b = 15$, $\gamma = 4+c = 52$, $\delta = 2+d = 64$, so that
$$\log\{(\alpha-\tfrac12)(\delta-\tfrac12)/(\beta-\tfrac12)(\gamma-\tfrac12)\} = \log 0.893 = -0.113$$
while $\alpha^{-1}+\beta^{-1}+\gamma^{-1}+\delta^{-1} = 0.192$. Hence the posterior of the log-odds ratio is $N(-0.113, 0.192)$. The posterior probability that $\pi > \rho$ is
$$\Phi(-0.113/\sqrt{0.192}) = \Phi(-0.257) = 0.3986.$$

15. The cross-ratio is $(45\times29)/(28\times30) = 1.554$, so its logarithm is 0.441. More accurately, the adjusted value is $(44.5\times28.5)/(27.5\times29.5) = 1.563$, so its logarithm is 0.446. The value of $a^{-1}+b^{-1}+c^{-1}+d^{-1}$ is 0.126, and so the posterior distribution of the log-odds ratio is $N(0.441, 0.126)$, or more accurately $N(0.446, 0.126)$. The posterior probability that the log-odds ratio is positive is $\Phi(0.441/\sqrt{0.126}) = \Phi(1.242) = 0.8929$, or more accurately $\Phi(0.446/\sqrt{0.126}) = \Phi(1.256) = 0.8955$.

With the same data $\sin^{-1}\sqrt{(45/73)} = 0.903$ radians and $\sin^{-1}\sqrt{(30/59)} = 0.794$ radians, and $1/4m + 1/4n = 0.00766$, so the posterior probability that $\pi > \rho$ is $\Phi((0.903-0.794)/\sqrt{0.00766}) = \Phi(1.245) = 0.8934$.
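Both approaches in Exercise 15 can be reproduced numerically; a sketch:

```python
from math import asin, erf, log, sqrt

def Phi(z):
    # Standard normal distribution function.
    return 0.5 * (1 + erf(z / sqrt(2)))

a, b, c, d = 45, 28, 30, 29
log_cross = log((a * d) / (b * c))
var = 1/a + 1/b + 1/c + 1/d
p_pos = Phi(log_cross / sqrt(var))

# Arc-sine transformation with row totals m = 73 and n = 59.
z1, z2 = asin(sqrt(45/73)), asin(sqrt(30/59))
p_arc = Phi((z1 - z2) / sqrt(1/(4*73) + 1/(4*59)))
```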

16. $\Lambda = \log(\pi/\rho) - \log\{(1-\pi)/(1-\rho)\}$, so if $\rho = \pi - \epsilon$ we get
$$\Lambda = \log\{\pi/(\pi-\epsilon)\} + \log\{(1-\pi+\epsilon)/(1-\pi)\} = -\log\{1-\epsilon/\pi\} + \log\{1+\epsilon/(1-\pi)\} \cong \epsilon/\pi + \epsilon/(1-\pi) = \epsilon/\{\pi(1-\pi)\}.$$

17. With conventional priors, the posteriors are nearly $\pi \sim \mathrm{Be}(56, 252)$ and $\rho \sim \mathrm{Be}(34, 212)$, so approximating by normals of the same means and variances, $\pi \sim N(0.1818, 0.0004814)$ and $\rho \sim N(0.1382, 0.0004822)$, so $\pi-\rho \sim N(0.0436, 0.000963)$, so $P(\pi-\rho > 0.01) = \Phi((0.0436-0.01)/\sqrt{0.000963}) = \Phi(1.083) = 0.8606$, and so the posterior odds are $0.8606/(1-0.8606) = 6.174$.

18. Using a normal approximation, $x \sim N(8.5, 8.5)$ and $y \sim N(11.0, 11.0)$, so that $x-y \sim N(-2.5, 19.5)$.

D.6 Exercises on Chapter 6

1. Straightforward substitution gives

Sample             1       2       3       4       5    Total
n                 12      45      23      19      30      129
r              0.631   0.712   0.445   0.696   0.535
z = tanh⁻¹ r   0.743   0.891   0.478   0.860   0.597
n z            8.916  40.095  10.994  16.340  17.910   94.255

The posterior for $\zeta$ is $N(94.255/129, 1/129)$ and a 95% HDR is $0.7307 \pm 1.96\times0.0880$, that is, $(0.5582, 0.9032)$. A corresponding interval for $\rho$ is $(0.507, 0.718)$.
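The tanh⁻¹ (Fisher $z$) calculation can be verified numerically; a sketch:

```python
from math import atanh, sqrt, tanh

n = [12, 45, 23, 19, 30]
r = [0.631, 0.712, 0.445, 0.696, 0.535]
num = sum(ni * atanh(ri) for ni, ri in zip(n, r))
N = sum(n)
zeta_hat, sd = num / N, 1 / sqrt(N)
lo, hi = zeta_hat - 1.96 * sd, zeta_hat + 1.96 * sd
rho_lo, rho_hi = tanh(lo), tanh(hi)   # transform the HDR back to rho
```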

2. Another straightforward substitution gives

n                 45      34      49
1/n           0.0222  0.0294  0.0204
r              0.489   0.545   0.601
z = tanh⁻¹ r   0.535   0.611   0.695

Hence $\zeta_1-\zeta_2 \sim N(0.535-0.611,\ 0.0222+0.0294)$, that is, $N(-0.076, 0.227^2)$. Similarly $\zeta_2-\zeta_3 \sim N(-0.084, 0.223^2)$ and $\zeta_3-\zeta_1 \sim N(0.160, 0.206^2)$. It follows without detailed examination that there is no evidence of difference.

3. $\zeta \sim N\left(\sum(n_i\tanh^{-1}r_i)\big/\sum n_i,\ 1\big/\sum n_i\right)$, so the required interval is
$$\frac{\sum n_i\tanh^{-1}r_i}{\sum n_i} \pm \frac{1.96}{\sqrt{\sum n_i}}.$$

4. We found in Section 6.1 that
$$p(\rho|x,y) \propto p(\rho)\,\frac{(1-\rho^2)^{(n-1)/2}}{(1-\rho r)^{n-(3/2)}}.$$
As $p(\rho)(1-\rho^2)^{-\frac12}(1-\rho r)^{3/2}$ does not depend on $n$, for large $n$
$$p(\rho|x,y) \propto \frac{(1-\rho^2)^{n/2}}{(1-\rho r)^n}$$
so that $\log l(\rho|x,y) = c + \tfrac12 n\log(1-\rho^2) - n\log(1-\rho r)$, and hence
$$(\partial/\partial\rho)\log l(\rho|x,y) = -n\rho(1-\rho^2)^{-1} + nr(1-\rho r)^{-1},$$
implying that the maximum likelihood estimator is $\hat\rho = r$. Further
$$(\partial^2/\partial\rho^2)\log l(\rho|x,y) = -n(1-\rho^2)^{-1} - 2n\rho^2(1-\rho^2)^{-2} + nr^2(1-\rho r)^{-2},$$
so that if $\rho = r$ we have $(\partial^2/\partial\rho^2)\log l(\rho|x,y) = -n(1-\rho^2)^{-2}$. This implies that the information should be about $I(\rho|x,y) = n(1-\rho^2)^{-2}$, and so leads to a prior $p(\rho) \propto (1-\rho^2)^{-1}$.

5. We have
$$p(\rho|x,y) \propto p(\rho)(1-\rho^2)^{(n-1)/2}\int_0^\infty(\cosh t - \rho r)^{-(n-1)}\,\mathrm dt.$$
Now write $-\rho r = \cos\theta$, so that
$$p(\rho|x,y) \propto p(\rho)(1-\rho^2)^{(n-1)/2}\int_0^\infty(\cosh t + \cos\theta)^{-(n-1)}\,\mathrm dt,$$
and set
$$I_k = \int_0^\infty(\cosh t+\cos\theta)^{-k}\,\mathrm dt.$$
We know that $I_1 = \theta/\sin\theta$ (cf. J. A. Edwards, A Treatise on the Integral Calculus (2 vols), London: Macmillan (1921) [reprinted New York: Chelsea (1955)], art. 180). Moreover
$$(\partial/\partial\theta)(\cosh t+\cos\theta)^{-k} = k\sin\theta(\cosh t+\cos\theta)^{-(k+1)}$$
and by induction
$$\left(\frac{\partial}{\sin\theta\,\partial\theta}\right)^k(\cosh t+\cos\theta)^{-1} = k!(\cosh t+\cos\theta)^{-(k+1)}.$$
Differentiating under the integral sign, we conclude that
$$\left(\frac{\partial}{\sin\theta\,\partial\theta}\right)^kI_1 = k!\,I_{k+1} \qquad (k \geq 0).$$
Taking $k = n-2$, or $k+1 = n-1$, we get
$$I_{n-1} = \int_0^\infty(\cosh t+\cos\theta)^{-(n-1)}\,\mathrm dt \propto \left(\frac{\partial}{\sin\theta\,\partial\theta}\right)^{n-2}\left(\frac{\theta}{\sin\theta}\right)$$
(ignoring the factorial). Consequently
$$p(\rho|x,y) \propto p(\rho)(1-\rho^2)^{(n-1)/2}\left(\frac{\partial}{\sin\theta\,\partial\theta}\right)^{n-2}\left(\frac{\theta}{\sin\theta}\right).$$
Since $\mathrm d(\rho r)/\mathrm d\theta = \mathrm d(-\cos\theta)/\mathrm d\theta = \sin\theta$ and so $\partial/\sin\theta\,\partial\theta = \mathrm d/\mathrm d(\rho r)$, we could alternatively write
$$p(\rho|x,y) \propto p(\rho)(1-\rho^2)^{(n-1)/2}\frac{\mathrm d^{n-2}}{\mathrm d(\rho r)^{n-2}}\left(\frac{\arccos(-\rho r)}{(1-\rho^2r^2)^{\frac12}}\right).$$

6. Supposing that the prior is
$$p(\rho) \propto (1-\rho^2)^{k/2}$$
and $r = 0$, then $p(\rho|x,y) \propto p(\rho)(1-\rho^2)^{(n-1)/2}$, so with the prior as in the question we get
$$p(\rho|x,y) \propto (1-\rho^2)^{(k+n-1)/2}.$$
If we define
$$t = \frac{\sqrt{k+n+1}\,\rho}{\sqrt{1-\rho^2}}, \qquad \rho = \frac{t}{\sqrt{(k+n+1)+t^2}},$$
so that
$$1-\rho^2 = \frac{k+n+1}{(k+n+1)+t^2}, \qquad \frac{\mathrm d\rho}{\mathrm dt} = \frac{k+n+1}{\{(k+n+1)+t^2\}^{3/2}},$$
then
$$p(t|X,Y) \propto p(\rho|X,Y)\left|\frac{\mathrm d\rho}{\mathrm dt}\right| \propto \left[\frac{k+n+1}{(k+n+1)+t^2}\right]^{(k+n-1)/2}\frac{k+n+1}{\{(k+n+1)+t^2\}^{3/2}} \propto \left[1+\frac{t^2}{k+n+1}\right]^{-(k+n+2)/2}.$$
This can be recognized as Student's $t$ distribution on $k+n+1$ degrees of freedom (see Section 2.12).

7. See G. E. P. Box and G. C. Tiao, Bayesian Inference in Statistical Analysis, Reading, MA: Addison-Wesley (1973), Section 8.5.4; the equation given in the question is implicit in their equation (8.5.49).

8. See G. E. P. Box and G. C. Tiao, Bayesian Inference in Statistical Analysis, Reading, MA: Addison-Wesley (1973), Section 8.5.4, and in particular equation (8.5.43).

9. For a bivariate normal distribution
$$\log l(\alpha,\beta,\gamma|x,y) = -\log 2\pi + \tfrac12\log\delta - \tfrac12\alpha(x-\lambda)^2 - \gamma(x-\lambda)(y-\mu) - \tfrac12\beta(y-\mu)^2,$$
where $\delta = \alpha\beta-\gamma^2$. Then
$$(\partial/\partial\alpha)\log l = \tfrac12\beta/\delta - \tfrac12(x-\lambda)^2,\quad (\partial/\partial\beta)\log l = \tfrac12\alpha/\delta - \tfrac12(y-\mu)^2,\quad (\partial/\partial\gamma)\log l = -\gamma/\delta - (x-\lambda)(y-\mu).$$
Consequently the information matrix, that is, minus the matrix of second derivatives (taking expectations is trivial as all the relevant elements are constant), is
$$I = \frac{1}{2\delta^2}\begin{pmatrix}\beta^2 & \gamma^2 & -2\beta\gamma\\ \gamma^2 & \alpha^2 & -2\alpha\gamma\\ -2\beta\gamma & -2\alpha\gamma & 2(\alpha\beta+\gamma^2)\end{pmatrix}$$
so that its determinant is
$$\det I = \frac{1}{8\delta^6}\begin{vmatrix}\beta^2 & \gamma^2 & -2\beta\gamma\\ \gamma^2 & \alpha^2 & -2\alpha\gamma\\ -2\beta\gamma & -2\alpha\gamma & 2(\alpha\beta+\gamma^2)\end{vmatrix}.$$
Adding $\beta/2\gamma$ times the last column to the first and $\alpha/2\gamma$ times the last column to the second, the determinant reduces to $2\delta^3$, so that
$$\det I = \frac{2\delta^3}{8\delta^6} = \frac{1}{4\delta^3},$$
implying a reference prior
$$p(\alpha,\beta,\gamma) \propto \delta^{-3/2}.$$

Rather similarly, transforming to the variances and correlation $(\phi,\psi,\rho)$, for which $\alpha = 1/\phi(1-\rho^2)$, $\beta = 1/\psi(1-\rho^2)$ and $\gamma = -\rho/\sqrt{\phi\psi}(1-\rho^2)$, the Jacobian determinant $\partial(\alpha,\beta,\gamma)/\partial(\phi,\psi,\rho)$ evaluates (up to sign) to $1/\Delta^3$, where $\Delta = \phi\psi(1-\rho^2) = 1/\delta$. It then follows that
$$p(\phi,\psi,\rho) \propto |1/\Delta^3|\,\Delta^{3/2} = \Delta^{-3/2} = \{\phi\psi(1-\rho^2)\}^{-3/2}.$$

10. Age is $y$, weight is $x$. $n = 12$, $\bar x = 2911$, $\bar y = 38.75$, $S_{xx} = 865{,}127$, $S_{yy} = 36.25$, $S_{xy} = 4727$. Hence $a = \bar y = 38.75$, $b = S_{xy}/S_{xx} = 0.005464$, $r = S_{xy}/\sqrt{(S_{xx}S_{yy})} = 0.8441$, $S_{ee} = S_{yy} - S_{xy}^2/S_{xx} = S_{yy}(1-r^2) = 10.422$, and $s^2 = S_{ee}/(n-2) = 1.0422$. Take $x_0 = 3000$ and get $a + b(x_0-\bar x) = 39.24$ as the mean age for weight 3000. For a particular baby, note that the 95% point of $t_{10}$ is 1.812 and $s\sqrt{\{1+n^{-1}+(x_0-\bar x)^2/S_{xx}\}} = 1.067$, so a 90% interval is $39.24 \pm 1.812\times1.067$, that is, $(37.31, 41.17)$. For the mean age of all such babies, note that $s\sqrt{\{n^{-1}+(x_0-\bar x)^2/S_{xx}\}} = 0.310$, so the interval is $(38.68, 39.80)$.

11. From the formulae in the text

See = Syy − S²xy/Sxx = Syy − 2S²xy/Sxx + S²xy/Sxx = Syy − 2bSxy + b²Sxx

where b = Sxy/Sxx. Hence

See = Σ(yᵢ − ȳ)² − 2b Σ(yᵢ − ȳ)(xᵢ − x̄) + b² Σ(xᵢ − x̄)²
    = Σ{(yᵢ − ȳ) − b(xᵢ − x̄)}².

The result now follows as ŷᵢ = a + b(xᵢ − x̄) with a = ȳ.
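The arithmetic in question 10 above can be checked directly from the quoted summary statistics; a minimal sketch in Python (rather than the book's own R), taking only the summary values as given:

```python
from math import sqrt

# Check of the question-10 calculations from the quoted summary statistics
# (the raw data are not reproduced here).
n, xbar, ybar = 12, 2911, 38.75
Sxx, Syy, Sxy = 865127, 36.25, 4727

b = Sxy / Sxx                      # slope, about 0.005464
r = Sxy / sqrt(Sxx * Syy)          # correlation, about 0.8441
See = Syy - Sxy**2 / Sxx           # residual sum of squares, about 10.42
s = sqrt(See / (n - 2))            # s^2 is about 1.042

x0, t10 = 3000, 1.812              # 95% point of t on 10 d.f.
mean_age = ybar + b * (x0 - xbar)  # about 39.24
sd_single = s * sqrt(1 + 1/n + (x0 - xbar)**2 / Sxx)  # about 1.067
sd_mean = s * sqrt(1/n + (x0 - xbar)**2 / Sxx)        # about 0.310
single_interval = (mean_age - t10 * sd_single, mean_age + t10 * sd_single)
mean_interval = (mean_age - t10 * sd_mean, mean_age + t10 * sd_mean)
```

The two intervals reproduce (37.31, 41.17) and (38.68, 39.80) to rounding.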

12. Clearly (as stated in the hint) Σuᵢ = Σvᵢ = Σuᵢvᵢ = 0, hence ū = v̄ = 0 and Suv = 0. We now proceed on the lines of Section 6.3, redefining See as Syy − S²uy/Suu − S²vy/Svv and noting that

Σ(yᵢ − α − βuᵢ − γvᵢ)² = Σ{(yᵢ − ȳ) + (ȳ − α) − βuᵢ − γvᵢ}²
= Syy + n(ȳ − α)² + β²Suu + γ²Svv − 2βSuy − 2γSvy
= Syy − S²uy/Suu − S²vy/Svv + n(ȳ − α)² + Suu(β − Suy/Suu)² + Svv(γ − Svy/Svv)²
= See + n(ȳ − α)² + Suu(β − Suy/Suu)² + Svv(γ − Svy/Svv)².


We consequently get a density p(α, β, γ, φ | x, y) from which we integrate out both β and γ to get

p(α, φ | x, y) ∝ φ^(−n/2) exp[−½{See + n(α − a)²}/φ].

The result now follows.

13. I = 4, ΣKᵢ = 28, G = 1779, Σx²ᵢₖ = 114,569, C = G²/N = 113,030. Further, ST = 1539, St = (426² + 461² + 450² + 442²)/7 − C = 93 and hence the analysis of variance table is as follows:

ANOVA Table
Source       Sum of squares   Degrees of freedom   Mean square   Ratio
Treatments         93                 3                 31       (< 1)
Error            1446                24                 60
TOTAL            1539                27

We conclude that the results from the four samples agree.
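The table above can be reconstructed from the quoted summary figures; a quick Python check (treatment totals 426, 461, 450 and 442 over 7 observations each, as given in the answer):

```python
# One-way ANOVA table of question 13 rebuilt from the summary statistics.
totals = [426, 461, 450, 442]
K, I = 7, 4
N = I * K                                # 28 observations in all
G = sum(totals)                          # grand total, 1779
C = G**2 / N                             # correction factor, about 113,030
ST = 114569 - C                          # total sum of squares, about 1539
St = sum(t**2 for t in totals) / K - C   # treatments SS, about 93
Se = ST - St                             # error SS, about 1446
ratio = (St / (I - 1)) / (Se / (N - I))  # F ratio, well below 1
```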

14. d = 2 + 4 + 6 3 5 7 with d = 1.4 and Kd = (6/4)1 so that0.82(d+ 1.4)/2.74 t25.

15. The analysis of variance table is as follows:

ANOVA Table
Source       Sum of squares   Degrees of freedom   Mean square   Ratio
Treatments       49,884               2               24,942      13.3
Blocks          149,700               5               29,940      16.0
Error            18,725              10                1,872
TOTAL           218,309              17

We conclude that the treatments differ significantly (an F2,10 variable exceeds 9.43 with probability 0.5%).

16. This is probably seen most clearly by example. When r = t = 2 we take

x =

x111x112x121x122x211x212x221x222

, A =

1 1 0 1 0 01 1 0 1 0 01 1 0 0 1 11 1 0 0 1 11 0 1 1 0 11 0 1 1 1 11 0 1 0 1 01 0 1 0 1 0

, =

121212

.


17. Evidently A⁺A = I, from which (a), (b) and (d) are immediate. For (c), note AA⁺ = A(AᵀA)⁻¹Aᵀ, which is clearly symmetric.

18. We take

A = ( 1  x1
      1  x2
      . . .
      1  xn ),    AᵀA = ( n     Σxᵢ
                          Σxᵢ   Σx²ᵢ )

so that writing S = Σ(xᵢ − x̄)² we see that det(AᵀA) = nS and hence

(AᵀA)⁻¹ = (1/nS) (  Σx²ᵢ   −Σxᵢ
                    −Σxᵢ     n  ),    Aᵀy = ( Σyᵢ
                                              Σxᵢyᵢ )

from which θ̂ = (AᵀA)⁻¹Aᵀy is easily found. In particular

θ̂1 = (1/nS)(−Σxᵢ Σyᵢ + n Σxᵢyᵢ) = Σ(yᵢ − ȳ)(xᵢ − x̄) / Σ(xᵢ − x̄)².
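The equivalence of the matrix solution and the covariance formula for the slope can be illustrated numerically; a small Python sketch with made-up data (the data values here are purely hypothetical):

```python
# Check that (A^T A)^{-1} A^T y gives the same slope as the familiar
# formula sum(y - ybar)(x - xbar) / sum(x - xbar)^2.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.1]
n = len(xs)
sx, sy = sum(xs), sum(ys)
sxx = sum(x * x for x in xs)
sxy = sum(x * y for x, y in zip(xs, ys))

det = n * sxx - sx * sx                  # equals n * S
slope_matrix = (n * sxy - sx * sy) / det # second component of theta-hat

xbar, ybar = sx / n, sy / n
slope_formula = sum((y - ybar) * (x - xbar) for x, y in zip(xs, ys)) \
    / sum((x - xbar) ** 2 for x in xs)
```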

D.7 Exercises on Chapter 7

1. Recall that t is sufficient for θ given x if p(x|θ) = f(t, θ)g(x) (see Section 2.9). Now

p(t|θ) = { p(x|θ) + p(y|θ) = p(z|θ)   if t = t(z)
           p(x|θ)                     if t = t(x) for x ≠ y

so that

p(t|θ) = { 0        if x = y
           p(x|θ)   if x = t.

Setting f(t, θ) = p(t|θ) and g(x) = 1 (x ≠ y), g(y) = 0, it follows that t is sufficient for θ given x. It then follows from a naïve application of the weak sufficiency principle that Ev(E, y, θ) = Ev(E, z, θ). However if x is a continuous random variable, then pₓ(y|θ) = 0 for all y, so we may take y and z as any two possible values of x.

2. l(θ | g(x)) = l(θ | h(x)), from which the result follows.

3. The likelihood function is easily seen to be

l(θ|x) = 1/3

for

θ = { x/2, 2x, 2x + 1          when x is even
      (x − 1)/2, 2x, 2x + 1    when x ≠ 1 is odd
      1, 2, 3                  when x = 1.

The estimators d1, d2 and d3 corresponding to the smallest, middle and largest possible θ are

d1(x) = { x/2         when x is even
          (x − 1)/2   when x ≠ 1 is odd
          1           when x = 1,


and d2(x) = 2x, d3(x) = 2x + 1. The values of p(d1 = θ) given in the question now follow. However, a Bayesian analysis leads to a posterior

p(θ|x) = l(θ|x)π(θ)/p(x) = π(θ)I_A(θ)/{π(d1(x)) + π(d2(x)) + π(d3(x))}

where π(θ) is the prior, A = {d1(x), d2(x), d3(x)} and I_A(θ) is the indicator function of A. Thus, indeed, the data conveys nothing to the Bayesian except that θ is d1(x), d2(x) or d3(x). However, Bayesians are indifferent between d1(x), d2(x) and d3(x) only if they have equal prior probabilities, which cannot hold for all x if the prior is proper. For a discussion, see J. O. Berger and R. L. Wolpert, The Likelihood Principle, Hayward, CA: Institute of Mathematical Statistics (1984 and 1988, Example 34).

4. Computation of the posterior is straightforward. For a discussion, see J. O. Berger and R. L. Wolpert, The Likelihood Principle, Hayward, CA: Institute of Mathematical Statistics (1984 and 1988, Example 35).

5. Rules (a) and (b) are stopping times and rule (c) is not.

6. n = 4, x̄ = 1.25, z = x̄√n = 1.25√4 = 2.50.

(a) Reject at the 5% level (and indeed possibly do so on the basis of the first observation alone).

(b) Since B = (1 + n/λ)^(1/2) exp[−½z²(1 + λ/n)^(−1)] = 1.07, with π0 = ½ we get p0 = (1 + B⁻¹)⁻¹ = 0.52. The null hypothesis is still more probable than not.

7. As we do not stop the first four times but do the fifth time,

p(x|λ) = [λ^(3+1+2+5+7) / (3! 1! 2! 5! 7!)] exp(−5λ) × (a product of stopping-rule factors which do not involve λ)

so that

l(λ|x) ∝ λ^18 exp(−5λ).

8. E S = E(S + 1) − 1, and after re-arrangement E(S + 1) is (s + 1)(R − 2)/(r − 2) times a sum of probabilities for the beta-Pascal distribution with S replaced by (S + 1), with s replaced by (s + 1), and with r replaced by (r − 1). As probabilities sum to unity, the result follows.

9. Up to a constant

L(π|x, y) = log l(π|x, y) = (x + n) log π + (n − x + y) log(1 − π)

so ∂²L(π|x, y)/∂π² = −(x + n)/π² − (n − x + y)/(1 − π)². Because the expectations of x and y are nπ and n(1 − π)/π respectively, we get I(π|x, y) = n(1 + π)/{π²(1 − π)}, so that Jeffreys' rule leads to

p(π|x, y) ∝ (1 + π)^(1/2) π^(−1) (1 − π)^(−1/2).

10. E u(x) = Σ u(x)p(x) = Σ u(x)2^(−x), suggesting you would enter provided e < E u(x). If u(x) ∝ 2ˣ then E u(x) ∝ Σ 2ˣ 2^(−x) = ∞, resulting in the implausible proposition that you would pay an arbitrarily large entry fee e.

11. By differentiation of the log-likelihood L(θ|x) = x log θ + (n − x) log(1 − θ) with respect to θ we see that x/n is the maximum likelihood estimator.

Because the prior for θ is uniform, that is, Be(1, 1), the posterior is Be(x + 1, n − x + 1). The question deals with a particular case of weighted quadratic loss with w(θ) = θ⁻¹(1 − θ)⁻¹, so we find d(x) as

d(x) = E(θw(θ)|x)/E(w(θ)|x) = E((1 − θ)⁻¹|x)/E(θ⁻¹(1 − θ)⁻¹|x)
     = [B(x + 1, n − x)/B(x + 1, n − x + 1)] × [B(x + 1, n − x + 1)/B(x, n − x)]
     = x/n.

If x = 0 or x = n then the posterior loss ρ(a, x) is infinite for all a because the integral diverges at θ = 0 or at θ = 1, so the minimum is not well-defined.

12. The minimum of E(π − a)² = E(π²) − 2aE(π) + a² clearly occurs when a = E(π). But since the prior for π is uniform, that is, Be(1, 1), its posterior is Be(x + 1, n − x + 1) and so its posterior mean is (x + 1)/(n + 2); similarly for y. We conclude that the Bayes rule is

d(x, y) = (x y)/(n+ 2).

13. The posterior mean (resulting from quadratic loss) is a weighted mean of the component means, so with the data in Section 2.10 it is

(115/129) × 10/(10 + 20) + (14/129) × 20/(10 + 20) = 0.370.

The posterior median (resulting from absolute error loss) can be found by computing the weighted mean of the distribution functions of the two beta distributions for various values and honing in. The result is 0.343. The posterior mode (resulting from zero-one loss) is not very meaningful in a bimodal case, and even in a case where the posterior is not actually bimodal it is not very useful.

14. If we take as loss function

L(θ, a) = { v(θ − a)   if a ≤ θ
            u(a − θ)   if a > θ

suppose m(x) is a v/(u + v) fractile, so that

P(θ ≤ m(x)) ≥ v/(u + v),    P(θ ≥ m(x)) ≥ u/(u + v).

Suppose further that d(x) is any other rule and, for definiteness, that d(x) > m(x) for some particular x (the proof is similar if d(x) < m(x)). Then

L(θ, m(x)) − L(θ, d(x)) = { u[m(x) − d(x)]                  if θ ≤ m(x)
                            (u + v)θ − [ud(x) + vm(x)]      if m(x) < θ < d(x)
                            v[d(x) − m(x)]                  if θ ≥ d(x)

while for m(x) < θ < d(x)

(u + v)θ − [ud(x) + vm(x)] < v[θ − m(x)] < v[d(x) − m(x)]

so that

L(θ, m(x)) − L(θ, d(x)) ≤ { u[m(x) − d(x)]   if θ ≤ m(x)
                            v[d(x) − m(x)]   if θ > m(x)

and hence on taking expectations over θ

ρ(m(x), x) − ρ(d(x), x) ≤ {u[m(x) − d(x)]}P(θ ≤ m(x) | x) + {v[d(x) − m(x)]}P(θ > m(x) | x)
= {d(x) − m(x)}{−u P(θ ≤ m(x) | x) + v P(θ > m(x) | x)}
≤ {d(x) − m(x)}{−uv/(u + v) + uv/(u + v)} = 0

from which it follows that taking a v/(u + v) fractile of the posterior distribution does indeed result in the appropriate Bayes rule for this loss function.

15. By integration by parts, if π ∼ Be(2, k) then

P(π < η) = ∫₀^η k(k + 1) π (1 − π)^(k−1) dπ
= [−(k + 1) π (1 − π)^k]₀^η + ∫₀^η (k + 1)(1 − π)^k dπ
= −(k + 1) η (1 − η)^k − (1 − η)^(k+1) + 1
= 1 − (1 − η)^k (1 + kη).

In this case, the prior for π is Be(2, 12) so that the prior probability of H0, that is, that π < 0.1, is 0.379, while the posterior is Be(2, 18) so that the posterior probability that π < 0.1 is 0.580.
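The closed form just derived can be checked against both quoted probabilities; a short Python sketch (the function name is ours):

```python
# P(pi < eta) = 1 - (1 - eta)^k (1 + k*eta) for pi ~ Be(2, k),
# checked against the Be(2, 12) prior and Be(2, 18) posterior figures.
def be2_cdf(eta, k):
    return 1 - (1 - eta) ** k * (1 + k * eta)

prior_prob = be2_cdf(0.1, 12)       # about 0.379
posterior_prob = be2_cdf(0.1, 18)   # about 0.580
```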

(a) With 0–1 loss we accept the hypothesis with the greater posterior probability, in this case H1.

(b) The second suggested loss function is of the 0–Ki form and the decision depends on the relative sizes of p1 and 2p0. Again this results in a decision in favour of H1.

16. Posterior expected losses are

ρ(a0, x) = 10p1,    ρ(a1, x) = 10p0,    ρ(a2, x) = 3p0 + 3p1 = 3.

Choose a1 if 0 ≤ p0 ≤ 0.3, choose a2 if 0.3 ≤ p0 ≤ 0.7 and choose a0 if 0.7 ≤ p0 ≤ 1.

17. The posterior variance is (225⁻¹ + 100⁻¹)⁻¹ = 69.23 = 8.32², and the posterior mean is 69.23 × (100/225 + 115/100) = 110.38. Posterior expected losses are

ρ(a1, x) = ∫₉₀¹¹⁰ (θ − 90) π(θ|x) dθ + ∫₁₁₀^∞ 2(θ − 90) π(θ|x) dθ.

Now note that (with φ(z) the N(0, 1) density function)

∫₉₀¹¹⁰ (θ − 90) π(θ|x) dθ = √69.23 ∫ from −2.450 to −0.046 of zφ(z) dz + 20.38 ∫ from −2.450 to −0.046 of φ(z) dz
= −√(69.23/2π){exp(−½ × 0.046²) − exp(−½ × 2.450²)} + 20.38{Φ(−0.046) − Φ(−2.450)}
= −3.15 + 9.66 = 6.51.

By similar manipulations we find that ρ(a1, x) = 34.3, ρ(a2, x) = 3.6 and ρ(a3, x) = 3.3, and thus conclude that a3 is the Bayes decision.

18. In the negative binomial case

p(π|x) = p(π) p(x|π)/pₓ(x) = C(n + x − 1, x) (1 − π)ⁿ πˣ p(π)/pₓ(x).

It follows that the posterior mean is

E(π|x) = ∫ π C(n + x − 1, x) (1 − π)ⁿ πˣ p(π) dπ / pₓ(x)
= [(x + 1)/(n + x)] ∫ C(n + x, x + 1) (1 − π)ⁿ π^(x+1) p(π) dπ / pₓ(x)
= [(x + 1)/(n + x)] pₓ(x + 1)/pₓ(x).

This leads in the same way as in Section 7.5 to the estimate

π̂ₙ(x) = (x + 1) fₙ(x + 1) / {(n + x)(fₙ(x) + 1)}.


D.8 Exercises on Chapter 8

1. Write ξ = α/(α + β) and η = (α + β)^(−1/2). The Jacobian is

∂(ξ, η)/∂(α, β) = | β/(α + β)²         −α/(α + β)²
                    −½(α + β)^(−3/2)   −½(α + β)^(−3/2) | = −½(α + β)^(−5/2)

from which the result follows.

2. p(λ, μ|x) ∝ p(μ) p(λ|μ) p(x|λ). So

p(λ, μ|x) ∝ μⁿ ∏ λᵢ^(xᵢ) exp(−(μ + 1)λᵢ)

and by the usual change-of-variable rule with z = 1/(1 + μ) (using |dz/dμ| = 1/(1 + μ)²)

p(z, λ|x) ∝ z^(−n−2)(1 − z)ⁿ ∏ λᵢ^(xᵢ) exp(−λᵢ/z).

Integrating with respect to all the λᵢ using ∫ λˣ exp(−λ/z) dλ ∝ z^(x+1) we get

p(z|x) ∝ z^(nx̄−2)(1 − z)ⁿ

or z ∼ Be(nx̄ − 1, n + 1), from which it follows that Ez = (nx̄ − 1)/(nx̄ + n). Now note that

p(λᵢ | xᵢ, μ) ∝ p(xᵢ|λᵢ) p(λᵢ|μ) ∝ λᵢ^(xᵢ) exp(−λᵢ) exp(−μλᵢ) ∝ λᵢ^(xᵢ) exp(−λᵢ/z)

which is a G(xᵢ + 1, z) or ½z χ²(2xᵢ + 2) distribution, from which it follows that E(λᵢ | xᵢ, μ) = (xᵢ + 1)z and so

E(λᵢ|x) = (xᵢ + 1)E(z|x) = (xᵢ + 1)(nx̄ − 1)/(nx̄ + n).

3. (a) Use same estimator = B . Expression for (0,X) has S1 replaced by aweighted sum of squares about .

(b) Use estimator with i the posterior median of the ith component.

4. We find the mean square error of the transformed data is 3.14 times smaller and of the equivalent probabilities is 3.09 (as opposed to corresponding figures of 3.50 and 3.49 for the Efron–Morris estimator). For a complete analysis, see the program

http://www-users.york.ac.uk/~pml1/bayes/rprogs/baseball.txt

or

http://www-users.york.ac.uk/~pml1/bayes/cprogs/baseball.cpp


5. Evidently AᵀA = I or, equivalently, the columns αⱼ of A satisfy αⱼᵀαₖ = 1 if j = k and 0 if j ≠ k. It follows that Wⱼ ∼ N(0, 1) and that C(Wⱼ, Wₖ) = 0 if j ≠ k which, since the Wⱼ have a multivariate normal distribution, implies that Wⱼ and Wₖ are independent. Further, as AᵀA = I we also have AAᵀ = I and so

WᵀW = (X − θ)ᵀAAᵀ(X − θ) = (X − θ)ᵀ(X − θ).

For the last part, normalize the λᵢⱼ for fixed j to produce a unit vector and the rest follows as above.

6. We know that

R(θ, θ̂ᴶˢ) − R(θ, θ̂ᴶˢ⁺) = (1/r)[E(‖θ̂ᴶˢ‖² − ‖θ̂ᴶˢ⁺‖²) − 2 Σᵢ θᵢ E(θ̂ᵢᴶˢ − θ̂ᵢᴶˢ⁺)].

To show that the expression in brackets is always ≥ 0, calculate the expectation by first conditioning on S1. For any value S1 ≥ r − 2 we have θ̂ᵢᴶˢ = θ̂ᵢᴶˢ⁺, so that it is enough to show that the right hand side is positive when conditional on any value S1 = s1 < r − 2. Since in that case θ̂ᵢᴶˢ⁺ = 0, it is enough to show that for any s1 < r − 2

Σᵢ θᵢ E[θ̂ᵢᴶˢ | S1 = s1] = Σᵢ θᵢ (1 − (r − 2)/s1) E(Xᵢ | S1 = s1) ≤ 0

and hence it is enough to show that Σᵢ θᵢ E(Xᵢ | S1 = s1) ≥ 0. Now S1 = s1 is equivalent to X1² = s1 − {X2² + · · · + Xn²}. Conditioning further on X2, ..., Xn and noting that

P(X1 = y | S1 = s1, X2 = x2, ..., Xn = xn) ∝ exp(−½(y − θ1)²)
P(X1 = −y | S1 = s1, X2 = x2, ..., Xn = xn) ∝ exp(−½(y + θ1)²)

we find that

E[θ1X1 | S1 = s1, X2 = x2, ..., Xn = xn] = θ1y(e^(θ1y) − e^(−θ1y)) / (e^(θ1y) + e^(−θ1y))

where y = √(s1 − x2² − · · · − xn²). The right hand side is an increasing function of |θ1y| which is zero when θ1y = 0, and this completes the proof.

7. Note that, as AᵀA + kI and AᵀA commute,

θ̂ − θ̂ₖ = {(AᵀA + kI) − (AᵀA)}(AᵀA)⁻¹(AᵀA + kI)⁻¹Aᵀx = k(AᵀA)⁻¹θ̂ₖ.

The bias is

b(k) = {(AᵀA + kI)⁻¹AᵀA − I}θ,


and the sum of the squares of the biases is

G(k) = b(k)ᵀb(k) = (Eθ̂ₖ − θ)ᵀ(Eθ̂ₖ − θ).

The variance-covariance matrix is

V(k) = φ(AᵀA + kI)⁻¹AᵀA(AᵀA + kI)⁻¹

so that the sum of the variances of the regression coefficients is

F(k) = E(θ̂ₖ − Eθ̂ₖ)ᵀ(θ̂ₖ − Eθ̂ₖ) = Trace(V(k)) = Trace{φ(AᵀA + kI)⁻¹AᵀA(AᵀA + kI)⁻¹}

and the residual sum of squares is

RSSₖ = (x − Aθ̂ₖ)ᵀ(x − Aθ̂ₖ) = {(x − Aθ̂) + A(θ̂ − θ̂ₖ)}ᵀ{(x − Aθ̂) + A(θ̂ − θ̂ₖ)}.

Because Aᵀ(x − Aθ̂) = Aᵀx − AᵀA(AᵀA)⁻¹Aᵀx = 0, this can be written as

RSSₖ = (x − Aθ̂)ᵀ(x − Aθ̂) + (θ̂ − θ̂ₖ)ᵀAᵀA(θ̂ − θ̂ₖ) = RSS + k²θ̂ₖᵀ(AᵀA)⁻¹θ̂ₖ

while the mean square error is

MSEₖ = E(θ̂ₖ − θ)ᵀ(θ̂ₖ − θ)
= E{(θ̂ₖ − Eθ̂ₖ) + (Eθ̂ₖ − θ)}ᵀ{(θ̂ₖ − Eθ̂ₖ) + (Eθ̂ₖ − θ)}
= E(θ̂ₖ − Eθ̂ₖ)ᵀ(θ̂ₖ − Eθ̂ₖ) + (Eθ̂ₖ − θ)ᵀ(Eθ̂ₖ − θ)
= F(k) + G(k)

using the fact that E(θ̂ₖ − Eθ̂ₖ) = 0.

It can be shown that F(k) is a continuous monotonic decreasing function of k with F′(0) < 0, while G(k) is a continuous monotonic increasing function with G(0) = G′(0) = 0 which approaches θᵀθ as an upper limit as k → ∞. It follows that there always exists a k > 0 such that MSEₖ < MSE₀. In fact, this is always true when k < 2φ/θᵀθ (cf. C. M. Theobald, 'Generalizations of mean square error applied to ridge regression', Journal of the Royal Statistical Society Series B, 36 (1974), 103–106).

8. We defined

H⁻¹ = Σ⁻¹ − Σ⁻¹B(BᵀΣ⁻¹B)⁻¹BᵀΣ⁻¹

so that

BᵀH⁻¹B = BᵀΣ⁻¹B − BᵀΣ⁻¹B = 0.

If B is square and non-singular then

H⁻¹ = Σ⁻¹ − Σ⁻¹B B⁻¹Σ(Bᵀ)⁻¹ BᵀΣ⁻¹ = Σ⁻¹ − Σ⁻¹ = 0.

9. It is easily seen that

BT1 =(1

1 0 0

0 0 1 1

),

BT1B =(

21 00 21

)so that

H1 = 1 1B (BT1B)BT1=

121 121 0 0 121 121 0 0

0 0 121 121

0 0 121 121

while

ATA =

4 0 2 20 4 2 22 2 4 02 2 0 4

.Now

K1 = AT1A+H1

=

4+ 12

1 121 2 2 121 4+ 121 2 2

2 2 4+ 121 121

2 2 121 4+ 121

10. See D. V. Lindley and A. F. M. Smith, 'Bayes estimates for the linear model (with discussion)', Journal of the Royal Statistical Society Series B, 34 (1972), 1–41 [reprinted in N. Polson and G. C. Tiao, Bayesian Inference (2 vols), (The International Library of Critical Writings in Econometrics, No. 7), Aldershot: Edward Elgar (1995)].

11. We have all aᵢ = a so uᵀu = mb/a, so that 1 + uᵀu = (a + mb)/a and thus

Σᵢᵢ = (1/a)(1 − [1/(1 + uᵀu)](b/a)) = (1/a)(1 − b/(a + mb)) = {a + (m − 1)b}/{a(a + mb)}

Σᵢⱼ = −b/{a(a + mb)}   (i ≠ j)

where a = n/φ + 1/ψ and b = 1/r.


D.9 Exercises on Chapter 9

1. A simulation in R, the program for which can be seen on

http://www-users.york.ac.uk/~pml1/bayes/tprogs/integral.txt

resulted in values 1.684199, 1.516539, 1.7974, 1.921932, 1.595924, 1.573164, 1.812976, 1.880691, 1.641073, 1.770603 with a mean of 1.719450, as opposed to the theoretical value of e − 1 = 1.71828. The theoretical variance of a single value of e^X where X ∼ U(0, 1) is ½(e − 1)(3 − e) = 0.24204, so that the standard deviation of the mean of 10 values is √(0.24204/10) = 0.15557, which compares with a sample standard deviation of 0.1372971.

A simulation in C++, the program for which can be seen on

http://www-users.york.ac.uk/~pml1/bayes/cprogs/integral.cpp

resulted in values 1.74529, 1.86877, 1.68003, 1.95945, 1.62928, 1.62953, 1.84021,1.87704, 1.49146, 1.55213 with a mean of 1.72732 and a sample standard deviation of0.155317.
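The same experiment is easy to reproduce; a minimal Python analogue of the simulations described above (seed and run counts are our own choices):

```python
import random
from math import e, exp, sqrt

# Estimate the integral of e^x over (0, 1) by averaging e^X for
# X ~ U(0, 1): ten runs of ten values each, as in the exercise.
random.seed(1)
runs = [sum(exp(random.random()) for _ in range(10)) / 10 for _ in range(10)]
overall_mean = sum(runs) / 10

true_value = e - 1                     # 1.71828...
var_single = (e - 1) * (3 - e) / 2     # 0.24204...
sd_of_mean = sqrt(var_single / 10)     # 0.15557...
```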

2. The easiest way to check the value of Aᵗ is by induction. Then note that for large t the matrix Aᵗ is approximately

( 2/5  3/5
  2/5  3/5 )

from which the result follows.

3. In this case

Q = E[(y1 + y2 − 1) log η + (y3 + y4 − 1) log(1 − η)]

giving

η⁽ᵗ⁺¹⁾ = [E(y1 | η⁽ᵗ⁾, x) + x2 − 1] / [E(y1 | η⁽ᵗ⁾, x) + x2 + x3 + E(y4 | η⁽ᵗ⁾, x) − 2]

where y1 ∼ B(x1, η/(η + 1)) and y4 ∼ B(x4, (1 − η)/(2 − η)), so that

η⁽ᵗ⁺¹⁾ = [x1η⁽ᵗ⁾/(η⁽ᵗ⁾ + 1) + x2 − 1] / [x1η⁽ᵗ⁾/(η⁽ᵗ⁾ + 1) + x2 + x3 + x4(1 − η⁽ᵗ⁾)/(2 − η⁽ᵗ⁾) − 2]
= [461η⁽ᵗ⁾/(η⁽ᵗ⁾ + 1) + 130 − 1] / [461η⁽ᵗ⁾/(η⁽ᵗ⁾ + 1) + 130 + 161 + 515(1 − η⁽ᵗ⁾)/(2 − η⁽ᵗ⁾) − 2].

Starting with η⁽⁰⁾ = 0.5, successive iterations give 0.4681, 0.4461, 0.4411, 0.4394, 0.4386, 0.4385, and thereafter 0.4384. This compares with a value of 0.439 found by maximum likelihood in C. A. B. Smith, op. cit.
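The iteration just displayed can be run to convergence in a few lines; a Python sketch of the same EM update:

```python
# EM iteration for the linkage model, run from eta = 0.5 to convergence.
x1, x2, x3, x4 = 461, 130, 161, 515
eta = 0.5
for _ in range(200):
    ey1 = x1 * eta / (eta + 1)          # E(y1 | eta, x)
    ey4 = x4 * (1 - eta) / (2 - eta)    # E(y4 | eta, x)
    eta = (ey1 + x2 - 1) / (ey1 + x2 + x3 + ey4 - 2)
# eta has now settled at about 0.4384
```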


4. The proof that p(θ⁽ᵗ⁺¹⁾ | x) ≥ p(θ⁽ᵗ⁾ | x) in the subsection 'Why the EM algorithm works' only uses the properties of a GEM algorithm. As for the last part, if convergence has occurred there is no further room for increasing Q, so we must have reached a maximum.

5. Take the prior for the variance to be S0χν0⁻² with S0 = 350, and the prior for the mean to be N(θ0, φ0) where θ0 = 85 and φ0 = S0/{n0(ν0 − 2)} = 350/2 = 175. We then have data with n = 100, x̄ = 89 and S1 = (n − 1)s² = 2970. Successive iterations using the EM algorithm give the posterior mean as 88.9793 and at all subsequent stages 88.9862. This compares with 88.96 as found in Exercise 16 of Chapter 2.

6. Means for the four looms are 97.3497, 91.7870, 95.7272 and 96.8861, and the overall mean is 95.4375. The variance of observations from the same loom is 1.6250 and the variance of means from different looms in the population is 5.1680.

7. (a) For the example on genetic linkage in Section 9.3, see

http://www-users.york.ac.uk/~pml1/bayes/rprogs/dataaug.txt

or

http://www-users.york.ac.uk/~pml1/bayes/cprogs/dataaug.cpp

(b) For the example on chained data sampling due to Casella and George, see a similarfile with chained in place of dataaug.

(c) For the example on the semi-conjugate prior with a normal likelihood (using both the EM algorithm and chained data augmentation), see a similar file with semiconj in place of dataaug.

(d) For the example on a change-point analysis of data on coal mining disasters, see similar files with coalnr.cpp or coal.cpp in place of dataaug.cpp (the difference is that the former uses only routines in W. H. Press et al., Numerical Recipes in C (2nd edn), Cambridge: University Press 1992, whereas the latter program, which is in many ways preferable, uses gamma variates with non-integer parameters).

8. See the program referred to in part (a) of the previous answer.

9. See the program referred to in part (b) of the answer to the question before last.

10. See the program


http://www-users.york.ac.uk/~pml1/bayes/rprogs/semicon2.txt

or

http://www-users.york.ac.uk/~pml1/bayes/cprogs/semicon2.cpp

11. B. P. Carlin and T. A. Louis, Bayes and Empirical Bayes Methods for Data Analysis (2nd edn), Chapman and Hall 2000, p. 149, remark that:

'In our dataset, each rat was weighed once a week for five consecutive weeks. In fact the rats were all the same age at each weighing (xi1 = 8, xi2 = 15, xi3 = 22, xi4 = 29, and xi5 = 36 for all i). As a result we may simplify our computations by rewriting the likelihood as

Yij ∼ N(αi + βi(xij − x̄), σ²),   i = 1, ..., k,   j = 1, ..., ni,

so that it is now reasonable to think of αi and βi as independent a priori. Thus we may set Σ = Diag(σα², σβ²), and replace the Wishart prior with a product of independent inverse gamma priors, say IG(aα, bα) and IG(aβ, bβ).'

This being so, it probably suffices to proceed as in the Rats example supplied with WinBUGS.

A more detailed description of the general set-up is to be found in 'Illustration of Bayesian Inference in Normal Data Models Using Gibbs Sampling', Alan E. Gelfand, Susan E. Hills, Amy Racine-Poon, and Adrian F. M. Smith, Journal of the American Statistical Association 85 (412) (1990), 972–985, Section 6, pp. 978–979, which can be found at

http://www-users.york.ac.uk/~pml1/bayes/rprogs/gelfand.pdf

(LaTeX source at gelfand.htm). A program for generating random matrices with a Wishart distribution based on the algorithm of Odell and Feiveson described in W. J. Kennedy and J. E. Gentle, Statistical Computing, Marcel Dekker 1980, pp. 231–232 can be found at

http://www-users.york.ac.uk/~pml1/bayes/rprogs/wishart.txt

while the relevant extract from Kennedy and Gentle is at

http://www-users.york.ac.uk/~pml1/bayes/rprogs/rwishart.pdf

12. See the program at

http://www-users.york.ac.uk/~pml1/bayes/rprogs/linkagemh.txt

13. See the program at


http://www-users.york.ac.uk/~pml1/bayes/winbugs/wheat.txt

or

http://www-users.york.ac.uk/~pml1/bayes/rprogs/wheat.txt

14. See the program at

http://www-users.york.ac.uk/~pml1/bayes/winbugs/logisti2.txt

or

http://www-users.york.ac.uk/~pml1/bayes/rprogs/logisti2.txt

D.10 Exercises on Chapter 10

1. We note that, since EX² ≥ (EX)²,

E w(x)² = ∫ (f(x)q(x)/p(x))² p(x) dx ≥ ( ∫ (|f(x)|q(x)/p(x)) p(x) dx )² = ( ∫ |f(x)|q(x) dx )².

If we take

p(x) = |f(x)|q(x) / ∫ |f(ξ)|q(ξ) dξ

then

E w(x)² = ( ∫ |f(ξ)|q(ξ) dξ )² ∫ p(x) dx = ( ∫ |f(x)|q(x) dx )²

so that this choice of p(x) minimizes E w(x)² and, since we always have E w(x) = ∫ f(x)q(x) dx, consequently minimizes V w(x).

2. For sampling with the Cauchy distribution we proceed thus:

(a) Substituting x = tan θ we find

η = P(x ≥ 2) = ∫₂^∞ (1/π) 1/(1 + x²) dx = ½ − tan⁻¹(2)/π = tan⁻¹(½)/π = 0.147 583 6.

The variance is η(1 − η)/n = 0.125 802 7/n.

(b) Use the usual change of variable rule to deduce the density of y. Then note that

q(yᵢ)/p(yᵢ) = [1/{π(1 + yᵢ²)}] / (2/yᵢ²) = (1/2π) yᵢ²/(1 + yᵢ²)

and that we can take f(y) = 1 as all values of y satisfy y ≥ 2.

(c) Substitute xᵢ = 2/yᵢ in the above to get

(1/2π) 4/(4 + xᵢ²).

(d) On writing x = 2 tan θ and θ0 = tan⁻¹(½) we find

Eη̃ = ∫₀¹ (1/2π) 4/(4 + x²) dx = ∫₀^θ0 (1/2π)(1/sec²θ) 2 sec²θ dθ = θ0/π = η.

Similarly, noting that sin(2θ0) = 4/5, the integral for Eη̃² transforms to

∫₀^θ0 (1/4π²)(1/sec⁴θ) 2 sec²θ dθ = ∫₀^θ0 (1/4π²) 2 cos²θ dθ
= ∫₀^θ0 (1/4π²){1 + cos(2θ)} dθ
= [θ + ½ sin(2θ)]₀^θ0 / (4π²)
= {tan⁻¹(½) + 2/5}/(4π²)
= 0.021 876 4

so that

V η̃ = Eη̃² − (Eη̃)² = 0.000 095 5.
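The variance reduction claimed here is easy to confirm numerically; a Python check of parts (a) and (d), with the second moment evaluated by a simple midpoint rule:

```python
from math import atan, pi

# Crude Monte Carlo variance eta(1 - eta) per observation, versus the
# variance of the estimator after the 2/y transformation.
eta = atan(0.5) / pi                   # about 0.147 583 6
var_crude = eta * (1 - eta)            # about 0.125 802 7

m = 100000
h = 1.0 / m
vals = [(1 / (2 * pi)) * 4 / (4 + ((i + 0.5) * h) ** 2) for i in range(m)]
e1 = sum(vals) * h                     # E eta-tilde, equals eta
e2 = sum(v * v for v in vals) * h      # about 0.021 876 4
var_transformed = e2 - e1 * e1         # about 0.000 095 5
```

The transformed estimator's variance is three orders of magnitude smaller than the crude one.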

3. A suitable program is

n

A run of this program resulted in a value of the mean of 0.399 as compared with thetrue value of 0.4 and a value of the mean of 0.0399 compared with a true value of 0.04.

4. Continue the program (avoiding ambiguity in the definition of ) by

theta

- for (i in 2:r) {phi[i]
I(q : p) = ½ log(φ/ψ) − ½E(x − ν)²/ψ + ½E(x − μ)²/φ

where x ∼ N(ν, ψ). By writing (x − μ)² = {(x − ν) + (ν − μ)}² it is easily concluded that

2I(q : p) = log(φ/ψ) + (ψ − φ)/φ + (ν − μ)²/φ.

In particular if ψ = φ then

I(q : p) = ½(ν − μ)²/φ.

8. We find

I(q : p) = ∫ q(x) log{q(x)/p(x)} dx
= ∫₀^∞ β⁻¹ exp(−x/β) log[{β⁻¹ exp(−x/β)} / {½(2π)^(−1/2) exp(−½x²)}] dx
= ½ log(8π) − log β − Ex/β + ½Ex²

where x ∼ E(β). It follows that

I(q : p) = ½ log(8π) − log β − 1 + β²

and hence

∂I(q : p)/∂β = −1/β + 2β,    ∂²I(q : p)/∂β² = 1/β² + 2.

It follows that I(q : p) is a minimum when β = 1/√2.
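The minimizing β can be confirmed with a crude grid search; a Python sketch of the divergence just derived (the function name is ours):

```python
from math import log, pi, sqrt

# I(beta) = 0.5*log(8*pi) - log(beta) - 1 + beta^2, minimized over a grid.
def divergence(beta):
    return 0.5 * log(8 * pi) - log(beta) - 1 + beta ** 2

grid = [b / 1000.0 for b in range(200, 2001)]
best_beta = min(grid, key=divergence)   # close to 1/sqrt(2) = 0.7071...
```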

9. See the paper by Corduneanu and Bishop (2001).

10. The model is

(¼η + ¼, ¼η, ¼(1 − η), ¼(1 − η) + ¼)

and the values quoted are x1 = 461, x2 = 130, x3 = 161 and x4 = 515. An R program for the ABC-REJ algorithm is

N

- for (j in 1:N) {etatrial
Let Lⱼ(θⱼ) denote the likelihood of θⱼ given model Mⱼ. Then the acceptance probability for the inclusion of a parameter (either γ or δ set equal to u) is

min{ 1, [Lᵢ(θᵢ)/Lⱼ(θⱼ)] × [π(θᵢ, Mᵢ)/π(θⱼ, Mⱼ)] × [(1/2)/(1/2)] × 1/[exp(−½u²/σ²)/√(2πσ²)] }.
