
Bayesian tutorials


1. Outline

• We treat clinical trial examples and problems in biostatistics and epidemiology, analyzed in a Bayesian way using First Bayes, R, and WinBUGS (with CODA and BOA).

• Contents:

• Section 2: First Bayes applications

• Section 3: Applications in R or S+

• Section 4: WinBUGS applications


2. First Bayes


2.1. Notes on using First Bayes:

• Click on First Bayes icon.

• Select Data from File menu.

• Select file Firenze 2003.1bd.

• Select strokepilot, strokeinterim1, plaque, cholesterol and pancreas, and load them into memory.

• Quit Data window.

• Use the Analysis menu to select an appropriate likelihood for the data.

• Enter a prior, followed by Next and then fill out the parameter values for the prior.


2.2. Analyzing pilot stroke data (strokepilot)

1. In the pilot study on stroke, 10 of 30 patients on active treatment suffered from a severe bleeding accident (SBA).

2. Use a vague prior to derive the posterior distribution.

3. Summarize the posterior information by the posterior mean, median, mode and SD.

Also, find the 95% HPD interval.

4. What is the posterior probability that the incidence of SBAs is lower than 0.5?

5. Plot the posterior distribution, together with the prior and the likelihood.

6. What is the probability of seeing an SBA in more than 50% of the patients on active treatment at the first interim analysis?
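The exercise is meant for First Bayes, but the numbers can be cross-checked with a few lines of Python. The sketch below assumes a uniform Beta(1, 1) prior as the "vague" prior (other vague priors are possible) and takes 20 patients for the interim analysis, the sample size quoted in Section 2.3. The helper names are illustrative only; the HPD interval is not computed here.

```python
import math

# Posterior for the SBA incidence after 10 events in 30 patients,
# assuming a uniform Beta(1, 1) prior (one possible "vague" prior).
a, b = 1 + 10, 1 + 20  # Beta(11, 21)

post_mean = a / (a + b)
post_mode = (a - 1) / (a + b - 2)
post_sd = math.sqrt(a * b / ((a + b) ** 2 * (a + b + 1)))

def beta_cdf_int(x, a, b):
    """Beta(a, b) CDF for integer a and b, via the binomial-tail identity."""
    n = a + b - 1
    return sum(math.comb(n, j) * x**j * (1 - x) ** (n - j) for j in range(a, n + 1))

def log_beta(p, q):
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)

def beta_binomial_pmf(k, n, a, b):
    """Posterior predictive probability of k events among n future patients."""
    return math.comb(n, k) * math.exp(log_beta(a + k, b + n - k) - log_beta(a, b))

prob_below_half = beta_cdf_int(0.5, a, b)

# Predictive probability that more than 50% of 20 interim patients have an SBA.
n_future = 20
prob_majority = sum(beta_binomial_pmf(k, n_future, a, b) for k in range(11, n_future + 1))

print(f"mean={post_mean:.3f} mode={post_mode:.3f} sd={post_sd:.3f}")
print(f"P(theta < 0.5) = {prob_below_half:.3f}")
print(f"P(more than 10 of {n_future} with SBA) = {prob_majority:.3f}")
```

The predictive probability uses the beta-binomial distribution, which averages the binomial over the posterior rather than plugging in a point estimate.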


2.3. Analyzing 1st interim stroke data (strokeinterim1)

• At the first interim analysis of the main study 5 out of 20 stroke patients on active

treatment suffered from a severe bleeding accident.

1. Use the pilot stroke data as a prior to derive the posterior distribution.

2. Summarize the posterior information.

3. Assume that the pilot data actually come from 2 pilot studies, each with 15 patients. One study showed 8 patients on active treatment with a SBA, the second study showed 2 patients on active treatment with a SBA. Further, it is believed that both sources of prior information are equally valid for the 1st interim analysis. Specify the prior in First Bayes and derive the posterior distribution given the 1st interim analysis results.
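As a cross-check outside First Bayes, the following Python sketch performs the conjugate update for item 1 and the mixture update for item 3. It assumes that each pilot study is combined with a uniform prior (giving Beta(9, 8) and Beta(3, 14)) and that "equally valid" means prior weights of 0.5 each; both are assumptions, not statements from the course text.

```python
import math

def log_beta(p, q):
    return math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)

def update(a, b, events, n):
    """Conjugate update of a Beta(a, b) prior with binomial data."""
    return a + events, b + n - events

y, n = 5, 20  # first interim analysis: 5 SBAs in 20 patients

# Item 1: the pilot posterior Beta(11, 21) serves as the prior.
single_post = update(11, 21, y, n)

# Item 3: equal-weight mixture of the two pilot studies (assumed uniform base prior).
components = [(0.5, 9, 8), (0.5, 3, 14)]

# Posterior weight is proportional to prior weight times the marginal likelihood
# B(a + y, b + n - y) / B(a, b); the binomial coefficient cancels.
unnorm = [w * math.exp(log_beta(a + y, b + n - y) - log_beta(a, b))
          for w, a, b in components]
total = sum(unnorm)
post_mixture = [(u / total, *update(a, b, y, n))
                for u, (_, a, b) in zip(unnorm, components)]

print("single prior posterior: Beta%s" % (single_post,))
for weight, a_post, b_post in post_mixture:
    print(f"weight {weight:.3f}: Beta({a_post}, {b_post})")
```

The posterior is again a mixture of Beta distributions; the component whose prior agrees better with the interim data receives the larger weight.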


2.4. A clinical trial on a new mouthwash to reduce plaque (plaque)

• Consider a trial that tests whether daily use of a new mouthwash before tooth brushing reduces plaque when compared to using tap water only. The majority of the previous trials (say 95%) on similar products showed an overall reduction in plaque between 5 and 15%. A small study is completed and the results are summarized in plaque.

1. Use the above information to specify a prior for the percent reductions. Check that

the prior distribution satisfies the desired property.

2. Summarize the posterior information.
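One way to translate the prior information into a distribution is sketched below in Python. It assumes a normal prior centred midway between 5% and 15%, with the central 95% mass on that range; the choice of a normal family is an assumption, and First Bayes offers other options.

```python
from statistics import NormalDist

# Normal prior for the percent reduction with 95% of its mass on [5, 15].
z = NormalDist().inv_cdf(0.975)      # about 1.96
prior_mean = (5 + 15) / 2
prior_sd = (15 - 5) / (2 * z)
prior = NormalDist(prior_mean, prior_sd)

# Check the desired property: P(5 < reduction < 15) should equal 0.95.
mass = prior.cdf(15) - prior.cdf(5)
print(f"prior: N({prior_mean}, {prior_sd:.2f}^2), mass on [5, 15] = {mass:.3f}")
```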


2.5. IBBENS study (cholesterol)

• Replay the analyses of Chapter 3.


2.6. PANCREAS study (pancreas)

• Replay the analyses of Chapter 3.


3. R or S+


3.1. Risk of leukaemia following Hodgkin’s disease (example Chapter 3)

• Write a program (or apply the program) in R (or S+) to obtain posterior information

on the log odds ratio.

• Apply the two approaches.

• Sample also from the posterior distribution to obtain the posterior information of the

odds ratio.


3.2. Use of prior information in the analysis of a clinical trial: predictive power and Bayesian stopping rules

3.2.1 The problem

• DerSimonian (SIM, 1996, 15, 1237-1248) considers using prior information from a meta-analysis on the use of calcium supplementation for the prevention of pre-eclampsia in pregnant women for the planning of a new trial (CEP).

• Pre-eclampsia: a serious complication of pregnancy that occurs late in gestation and is characterized by an elevation of blood pressure and the development of proteinuria. In the US it occurs in 5-10% of all pregnancies.


• Meta-analysis: two previous trials (MEDLINE) with similar conditions as the clinical trial yield a median odds-ratio of 0.51 with 99% C.I. of [0.13, 2.01], corresponding to a mean log odds-ratio ($\theta$) of $\mu_0 = -0.68$ and a variance of $\sigma_0^2 = 0.54^2$.

• Clinical trial (CEP):

• Placebo-controlled RCT

• Nulliparous women, gestational age 13-21 wks

• Expected average dietary calcium intake about 1200mg/day,

• Calcium supplementation of 2000mg/day

• 5 centres, from 1-4-1992 to 31-3-1995 and F.U. until 30-10-1995

• 4% pre-eclampsia assumed in placebo, assumed effect: θ = -0.45

• α = 0.05 (2-sided), 1-β = 0.85

• 4500 patients are needed

• With 4500 patients (and assuming 4% in each treatment group): $\sigma^2 \equiv \mathrm{var}(\hat\theta) = 0.023$
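The value 0.023 can be reproduced with the usual large-sample variance of a log odds ratio (the sum of the reciprocal cell counts). The Python sketch below assumes equal allocation, i.e. 2250 women per arm; the function name is illustrative.

```python
def var_log_odds_ratio(events_trt, n_trt, events_ctl, n_ctl):
    """Large-sample variance of the log odds ratio from a 2x2 table."""
    return (1 / events_trt + 1 / (n_trt - events_trt)
            + 1 / events_ctl + 1 / (n_ctl - events_ctl))

n_per_arm = 4500 // 2            # assumed equal allocation
events = 0.04 * n_per_arm        # 4% pre-eclampsia in each arm -> 90 events
sigma2 = var_log_odds_ratio(events, n_per_arm, events, n_per_arm)
print(f"var(theta_hat) = {sigma2:.4f}")
```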


3.2.2 Bayesian inference

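• Prior distribution: $p(\theta) = \phi(\theta \mid \mu_0, \sigma_0^2)$

• Likelihood: $p(\hat\theta \mid \theta) = \phi(\hat\theta \mid \theta, \sigma^2)$ ($\sigma^2 = 0.023$)

• Posterior distribution: $p(\theta \mid \hat\theta) = \phi\left(\theta \,\middle|\, \frac{\sigma^2\mu_0 + \sigma_0^2\hat\theta}{\sigma_0^2 + \sigma^2}, \frac{\sigma_0^2\sigma^2}{\sigma_0^2 + \sigma^2}\right) \equiv \phi(\theta \mid \bar\mu, \bar\sigma^2)$

• Prior predictive distribution: $p(\theta_f) = \phi(\theta_f \mid \mu_0, \sigma_0^2 + \sigma_f^2)$ ($\sigma_f^2$ = variance of future estimate of $\theta$)

• Posterior predictive distribution: $p(\theta_f \mid \hat\theta) = \phi\left(\theta_f \,\middle|\, \frac{\sigma^2\mu_0 + \sigma_0^2\hat\theta}{\sigma_0^2 + \sigma^2}, \frac{\sigma_0^2\sigma^2}{\sigma_0^2 + \sigma^2} + \sigma_f^2\right)$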
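The normal-normal updating rules of this subsection can be written as two small Python functions. This is a sketch rather than the course's R/S+ program; the function names and the interim estimate of -0.30 used in the example are hypothetical.

```python
def posterior(mu0, var0, theta_hat, var):
    """Posterior mean and variance of theta: normal prior, normal likelihood."""
    mean = (var * mu0 + var0 * theta_hat) / (var0 + var)
    variance = var0 * var / (var0 + var)
    return mean, variance

def posterior_predictive(mu0, var0, theta_hat, var, var_f):
    """Mean and variance of a future estimate with sampling variance var_f."""
    mean, variance = posterior(mu0, var0, theta_hat, var)
    return mean, variance + var_f

# Enthusiastic prior N(-0.68, 0.54^2) and a hypothetical estimate theta_hat = -0.30.
post_mean, post_var = posterior(-0.68, 0.54**2, -0.30, 0.023)
print(f"posterior: N({post_mean:.3f}, {post_var:.4f})")
```

The posterior mean is a precision-weighted average of the prior mean and the estimate, so with $\sigma^2 = 0.023$ the data dominate the prior.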


3.2.3 Bayesian prediction assuming enthusiastic and skeptical priors

• Two priors can be used to predict the outcome of the study:

• Enthusiastic prior: use results of meta-analysis, i.e. N(-0.68,0.54²)

• Skeptical prior: assume N(0,0.54²)

[Figure: prior distribution of the log odds ratio (range -3 to 2) under the enthusiastic prior and the skeptical prior.]


Predictive power:

• The result will be significantly in favor of a beneficial calcium effect if the 95% interval $[\hat\theta - 1.96\sigma,\ \hat\theta + 1.96\sigma]$ is on the negative side and excludes 0.

• The probability that $\hat\theta + 1.96\sigma < 0$ is calculated under the prior predictive distribution to yield the predictive power at the start of the trial. Thus the predictive power is actually the expected power at the start of the trial using the prior distribution expressing the uncertainty of the true log odds-ratio. Thus in our calculations $\sigma_f^2 \equiv \sigma^2$.

• The predictive power is in our case $\Phi\left(\frac{-1.96\sigma - \mu_0}{\sqrt{\sigma_0^2 + \sigma^2}}\right)$.

• Problem: the variance of θ̂ depends on the obtained results and is not constant.

However, we will ignore this problem here.


Under the enthusiastic prior:

• $P_0 = p_0(\theta < 0) = 0.90$

• For the CEP trial $\sigma_f^2 = 0.023$. Using the prior predictive distribution, the predictive power of a “firm” positive result is $p(\theta_f + 1.96\sigma < 0) = 0.75$.

Under the skeptical prior:

• $P_0 = p_0(\theta < 0) = 0.50$

• The predictive power of a “firm” positive result reduces to 0.30.
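The predictive powers quoted for the two priors can be reproduced with the formula of this section, evaluated with $\sigma_f^2 = \sigma^2 = 0.023$. The Python sketch below uses only the standard library; the function name is illustrative.

```python
import math
from statistics import NormalDist

def predictive_power(mu0, sd0, sigma2, z=1.96):
    """P(theta_hat + z*sigma < 0) under the prior predictive N(mu0, sd0^2 + sigma2)."""
    sigma = math.sqrt(sigma2)
    return NormalDist().cdf((-z * sigma - mu0) / math.sqrt(sd0**2 + sigma2))

sigma2 = 0.023
power_enthusiastic = predictive_power(-0.68, 0.54, sigma2)
power_skeptical = predictive_power(0.0, 0.54, sigma2)
prior_prob_benefit = NormalDist(-0.68, 0.54).cdf(0.0)   # P0 under the enthusiastic prior

print(f"enthusiastic: {power_enthusiastic:.2f}, skeptical: {power_skeptical:.2f}")
print(f"P0 (enthusiastic) = {prior_prob_benefit:.2f}")
```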


3.2.4 Bayesian stopping rules assuming enthusiastic and skeptical priors

• Stopping rule: lower and upper boundaries for stopping the (CEP) trial specified

prior to running the trial

• In a Bayesian context we could specify stopping the CEP trial

• For efficacy, when $P(\theta > 0 \mid \text{data}) = 0.025$ (1)

• For inferiority, when $P(\theta < 0 \mid \text{data}) = 0.025$ (2)

• For each prior we can then determine the lower and upper limits such that (1) and (2) apply, respectively.

• For the lower limit we obtain $\hat\theta_{\mathrm{und}} = \left[-\sigma^2\mu_0 - 1.96\,\sigma_0\sigma\sqrt{\sigma_0^2 + \sigma^2}\right]/\sigma_0^2$ and for the upper limit $\hat\theta_{\mathrm{upp}} = \left[-\sigma^2\mu_0 + 1.96\,\sigma_0\sigma\sqrt{\sigma_0^2 + \sigma^2}\right]/\sigma_0^2$.
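The boundaries follow from setting the posterior mean equal to minus or plus 1.96 posterior standard deviations. A Python sketch is given below; it assumes that the variance of the estimate after n completed pregnancies scales as 0.023 × 4500 / n, which is an approximation introduced here for illustration and not taken from the course text.

```python
import math
from statistics import NormalDist

Z = 1.96

def sigma2_at(n, sigma2_full=0.023, n_full=4500):
    """Assumed variance of the estimate after n completed pregnancies."""
    return sigma2_full * n_full / n

def bayes_boundaries(mu0, var0, sigma2):
    """Estimates at which P(theta > 0 | data) or P(theta < 0 | data) equals 0.025."""
    half_width = Z * math.sqrt(var0 * sigma2 * (var0 + sigma2))
    lower = (-sigma2 * mu0 - half_width) / var0
    upper = (-sigma2 * mu0 + half_width) / var0
    return lower, upper

def prob_theta_positive(mu0, var0, theta_hat, sigma2):
    """Posterior probability that theta > 0 given the estimate theta_hat."""
    mean = (sigma2 * mu0 + var0 * theta_hat) / (var0 + sigma2)
    sd = math.sqrt(var0 * sigma2 / (var0 + sigma2))
    return 1 - NormalDist(mean, sd).cdf(0.0)

for n in (1000, 2000, 3000, 4000):
    s2 = sigma2_at(n)
    enth = bayes_boundaries(-0.68, 0.54**2, s2)
    skep = bayes_boundaries(0.0, 0.54**2, s2)
    print(n, [round(v, 2) for v in enth], [round(v, 2) for v in skep])
```

Under the skeptical prior the boundaries are symmetric around 0, whereas the enthusiastic prior shifts both boundaries upward, so efficacy is declared for a less extreme estimate.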


• We show below the Bayesian boundaries under the enthusiastic and the skeptical

prior and additionally the O’Brien-Fleming (group-sequential) boundaries.

[Figure: stopping boundaries for the estimate of the log odds ratio (-1.5 to 1.5) against the number of completed pregnancies (1000 to 4000). Curves labelled BE, BS and OBF.]


3.2.5 Exercises

• Explain why the skeptical prior N(0,0.54²) is equivalent to having already conducted a trial in which about 8% of the total planned sample size for CEP had been entered and no treatment effect was observed.

• Spiegelhalter, Freedman and Parmar (JRSSA, 1994, 157, 357-416) advise using a community of priors in monitoring and reporting a clinical trial. These should include, for monitoring, a reference (non-informative), a skeptical and an enthusiastic prior. Specify a skeptical prior with mean 0 such that P(θ < -0.45) = 0.05, and an enthusiastic prior with the same variance such that P(θ > -0.45) = 0.05. Repeat the above analyses with the three priors.
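A hint for the arithmetic behind these exercises, written as a short Python sketch. It assumes that the variance of the estimate is inversely proportional to the number of patients, so that a prior with variance $\sigma_0^2$ carries the information of a fraction $\sigma^2/\sigma_0^2$ of the planned trial; only the skeptical prior of the second exercise is worked out.

```python
from statistics import NormalDist

sigma2 = 0.023       # variance of the estimate with all 4500 patients
var0 = 0.54**2       # variance of the skeptical prior

# Fraction of the planned sample size that yields a variance equal to var0.
fraction = sigma2 / var0
print(f"prior worth about {100 * fraction:.0f}% of the planned trial")

# Skeptical prior with mean 0 and P(theta < -0.45) = 0.05.
skeptical_sd = 0.45 / NormalDist().inv_cdf(0.95)
print(f"skeptical prior: N(0, {skeptical_sd:.3f}^2)")
```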


3.3. Bleeding accidents in a stroke trial: sampling exercise

• Write a program in R (or S+) to obtain samples from the posterior distribution Beta(11,21) after the pilot stroke data have been observed. Sample also the odds.

• Sample first from the prior distribution of the pilot stroke data (uniform prior) and use

the rejection/acceptance method to sample from the posterior.

• Suppose the prior distribution for the stroke trial data should be Beta(5,5). Sample from the corresponding posterior distribution by reusing the sample from the previous posterior distribution.
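The exercise asks for R or S+; the same steps are sketched here in Python to show the logic. The rejection step proposes from the uniform prior and accepts with probability L(θ)/L(θ_max). For the Beta(5,5) prior, the sketch reweights the existing posterior sample by the ratio of the new to the old prior and resamples (sampling-importance-resampling), which is one possible reading of "using the sample of the previous posterior". Sample sizes and the seed are arbitrary.

```python
import random
from statistics import fmean

random.seed(1)
events, n = 10, 30   # pilot stroke data

def likelihood(theta):
    return theta**events * (1 - theta) ** (n - events)

max_lik = likelihood(events / n)   # the likelihood is maximal at the MLE

# Rejection sampling: propose from the uniform prior, accept with prob. L / L_max.
proposals = (random.random() for _ in range(50_000))
posterior = [t for t in proposals if random.random() < likelihood(t) / max_lik]
odds = [t / (1 - t) for t in posterior]

# Switch to a Beta(5, 5) prior: weight each draw by the prior ratio
# theta^4 * (1 - theta)^4 (up to a constant) and resample.
weights = [t**4 * (1 - t) ** 4 for t in posterior]
resampled = random.choices(posterior, weights=weights, k=5_000)

print(f"accepted {len(posterior)} draws, mean {fmean(posterior):.3f} (Beta(11,21): {11 / 32:.3f})")
print(f"mean odds {fmean(odds):.3f}")
print(f"Beta(5,5) prior: mean {fmean(resampled):.3f} (Beta(15,25): {15 / 40:.3f})")
```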


3.4. Cross-over trial for 2 antihypertensive drugs

• Replay the analysis in Chapter 5 using the program “bivariate figure posterior cross-over.SSC”.


3.5. Bayesian hierarchical modeling (Chapter 6)

1. Replay the Bayesian hierarchical model of the tumor data using the program “tumor.SSC”.

2. Replay the Bayesian meta-analysis of 22 clinical trials on beta-blockers using the

program “meta-analysis on beta-blockers.ssc”.

3. Perform a similar meta-analysis (by changing the previous S+ program) on the following study results collected by Sillero-Arenas et al. (Obstetrics & Gynecology, 79, 2, 286-294, 1992) on the relationship between hormone replacement therapy and breast cancer. A table of the results is given below. For this analysis import the SAS data set “sillero.SD2”.


4. Perform a sensitivity analysis by slightly varying the prior distribution of τ in the

above meta-analysis. This could be done by taking inverse gamma distributions.

Study  Log-odds ratio  Variance    Study  Log-odds ratio  Variance
  1       0.10436      0.299111     19       0.01980      0.024611
  2      -0.03046      0.121392     20       0            0.002890
  3       0.76547      0.319547     21      -0.04082      0.015863
  4      -0.19845      0.025400     22       0.02956      0.067069
  5      -0.10536      0.025041     23       0.18232      0.010677
  6      -0.11653      0.040469     24       0.26236      0.017918
  7       0.09531      0.026399     25       0.32208      0.073896
  8       0.26236      0.017918     26       0.67803      0.489415
  9      -0.26136      0.020901     27      -0.96758      0.194768
 10       0.45742      0.035877     28       0.91629      0.051846
 11      -0.59784      0.076356     29       0.32208      0.110179
 12      -0.35667      0.186879     30      -1.13943      0.086173
 13      -0.10536      0.089935     31      -0.47804      0.103522
 14      -0.31471      0.013772     32       0.16551      0.004152
 15      -0.10536      0.089935     33       0.46373      0.023150
 16       0.02956      0.004738     34      -0.52763      0.050384
 17       0.60977      0.035781     35       0.10436      0.003407
 18      -0.30111      0.036069     36       0.55389      0.054740


3.6. Gibbs sampling the beta-binomial distribution (Chapter 7)

• Draw a sample from the beta-binomial distribution:

• using the Method of Composition of Chapter 5.

• using the Gibbs approach of Chapter 7. Show also the trace plot of the sampled

values. Compare the sample to the true beta-binomial distribution.
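Both sampling schemes can be sketched in a few lines of Python (the course itself uses R or S+). The parameter values n = 16, a = 2, b = 4 are hypothetical and chosen only for illustration. The Gibbs sampler alternates between θ | x ~ Beta(a + x, b + n − x) and x | θ ~ Binomial(n, θ); the list returned by `gibbs`, read in order, is what a trace plot would display.

```python
import math
import random
from statistics import fmean

random.seed(2)
n_trials, a, b = 16, 2, 4   # hypothetical beta-binomial parameters

def draw_binomial(n, p):
    """Binomial draw as a sum of Bernoulli trials (standard library only)."""
    return sum(random.random() < p for _ in range(n))

def composition(size):
    """Method of composition: theta ~ Beta(a, b), then x | theta ~ Bin(n, theta)."""
    return [draw_binomial(n_trials, random.betavariate(a, b)) for _ in range(size)]

def gibbs(size, burn_in=500):
    """Gibbs sampler alternating between the two full conditionals."""
    x, draws = 0, []
    for iteration in range(burn_in + size):
        theta = random.betavariate(a + x, b + n_trials - x)
        x = draw_binomial(n_trials, theta)
        if iteration >= burn_in:
            draws.append(x)
    return draws

def exact_pmf(k):
    """True beta-binomial probability of k successes."""
    log_b = lambda p, q: math.lgamma(p) + math.lgamma(q) - math.lgamma(p + q)
    return math.comb(n_trials, k) * math.exp(log_b(a + k, b + n_trials - k) - log_b(a, b))

comp_sample = composition(10_000)
gibbs_sample = gibbs(10_000)
for k in range(0, n_trials + 1, 4):
    print(k, round(exact_pmf(k), 3),
          round(comp_sample.count(k) / len(comp_sample), 3),
          round(gibbs_sample.count(k) / len(gibbs_sample), 3))
```

Draws from the method of composition are independent, whereas successive Gibbs draws are correlated, so the Gibbs sample carries less information per draw.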


4. WINBUGS


4.1. Notes on using WINBUGS

• Click on the WinBUGS14 icon.

• WINBUGS files have extension odc.

• Clicking on Help on the WINBUGS menu bar you will find Examples Vol I and Examples Vol II. These are compound documents with worked-out examples on a variety of problems. Whenever you have a new problem it is advisable to search these 2 documents for an example treating a similar problem and modify it to your needs.


4.2. The osteoporosis regression example

• Open the file “osteo.odc”.

• If necessary look at Section 7. Tutorial, start at 7.3 to run this model in WINBUGS.

• Before you run the program, switch off the blocking mode for fixed effects. In a second run switch on the blocking mode again and observe the difference.

• Change the normal distribution of the response into a t-distribution with 4 degrees of freedom and reanalyze the model. Try out also the logistic distribution. This is an example of a Bayesian sensitivity analysis.

• Open file “osteo with doodle.odc”. Run the regression model now from the Doodle.


• The file “osteomul.odc” allows giving an informative prior for one of the regression

coefficients. Try it out.

• The data file “osteop.txt” contains more regressors than the previous WB-programs. Construct a linear regression model in WB with all these regressors to predict BMCTB.


4.3. Beta-blocker meta-analysis

• Open the file “beta-blocker.odc”.

• Carry out a WB run for this model and record the posterior estimate and 95% C.I. for the mean log odds-ratio. Further, do the study results appear homogeneous?

• Check the cross-correlation of some of the parameters. If necessary use overrelaxation to improve the convergence of the chain.

• Create 2 extra chains with different initial values and check convergence with the Brooks-Gelman-Rubin diagnostics.

• Examine the posterior estimates of the baseline risks and study odds ratios graphically.


• Rank the studies according to their baseline risk.

• Suppose a new study with 100 patients in each of two arms is planned. What is the predictive distribution of the number of deaths in each arm?

• Replace the normal distribution of the baseline risks and of the log odds ratios by a t-distribution with 4 degrees of freedom. Compare the models with the Deviance Information Criterion.

• Write a WB model for the absolute risk reduction (ar) assuming that the variance is known and equal to the sample variance, and make posterior inferences on the number needed to treat (NNT = 1/ar).
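Posterior inference on the NNT is most easily obtained by transforming the sampled values of ar. The Python sketch below uses simulated normal draws as a stand-in for MCMC output exported from WinBUGS; the mean 0.05 and standard deviation 0.01 are hypothetical values, not results from the beta-blocker data.

```python
import random

random.seed(3)

# Hypothetical stand-in for posterior draws of the absolute risk reduction.
ar_draws = [random.gauss(0.05, 0.01) for _ in range(20_000)]

# NNT = 1/ar is only meaningful for ar > 0; draws at or below 0 are dropped here.
nnt_draws = sorted(1 / ar for ar in ar_draws if ar > 0)

def quantile(sorted_values, prob):
    """Simple empirical quantile of an already sorted list."""
    index = min(len(sorted_values) - 1, int(prob * len(sorted_values)))
    return sorted_values[index]

nnt_median = quantile(nnt_draws, 0.5)
nnt_interval = (quantile(nnt_draws, 0.025), quantile(nnt_draws, 0.975))
print(f"median NNT = {nnt_median:.1f}, 95% interval = "
      f"({nnt_interval[0]:.1f}, {nnt_interval[1]:.1f})")
```

Quantiles are preferable to the posterior mean here: 1/ar has a heavy tail whenever the posterior of ar puts mass near zero.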


4.4. Survival analyses from WB manual

4.4.1 Mice: Weibull regression (WB Manual I)

• Observe that the manual states that for a censored observation the survival distribution is a truncated Weibull. This statement is misleading since the authors actually mean that a censored observation contributes $S(t_i)$ to the likelihood.
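The distinction can be made concrete with a small Python function for the Weibull log-likelihood under right censoring. It assumes the parametrization f(t) = μ r t^(r−1) exp(−μ t^r), so that S(t) = exp(−μ t^r); this is believed to match the WinBUGS `dweib` convention, but check the manual before relying on it. The data in the example are made up.

```python
import math

def weibull_loglik(times, observed, r, mu):
    """Weibull log-likelihood with right censoring.

    Assumes f(t) = mu * r * t**(r - 1) * exp(-mu * t**r) and S(t) = exp(-mu * t**r).
    """
    total = 0.0
    for t, is_event in zip(times, observed):
        log_surv = -mu * t**r
        if is_event:
            # Observed death: the density f(t_i) contributes.
            total += math.log(mu * r) + (r - 1) * math.log(t) + log_surv
        else:
            # Censored observation: only S(t_i) contributes.
            total += log_surv
    return total

# Made-up example: two events and one censored time.
example = weibull_loglik([12.0, 20.0, 40.0], [True, True, False], r=1.5, mu=0.01)
print(f"log-likelihood = {example:.3f}")
```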


4.4.2 Kidney: Weibull regression with random effects (WB Manual I)

• The difference between this model and the previous model is that there are two survival times for each individual. A common assumption is that, conditional on the individual’s (unmeasured) characteristics, the two observation times are independent. The unobserved characteristics are represented by a random effect $b_i$ (inside the exponent). In survival analysis $\exp(b_i)$ is called the frailty parameter.

• Examine:

• The change in the parameter estimates, the 95% C.I.’s and the model evaluation when the survival times are assumed to be independent.

• How the parameter estimates change if the normal distribution for the frailty

component is changed to, say, a t-distribution.

• The sensitivity of the outcome when the prior distribution for τ is changed to a

uniform.


4.4.3 Leuk: Cox regression (WB Manual I)

• A counting process approach is taken to perform a Cox regression. In a frequentist approach the estimates of the regression coefficients are obtained by maximizing a partial likelihood (which avoids the specification of the baseline hazard function), which is not an ordinary likelihood. WINBUGS, however, needs a likelihood, which implies that the baseline hazard function needs to be specified.

• The hazard function is specified non-parametrically as a step function where jumps are made at a (fixed) set of time points (observe that in the WB program the time points actually coincide with the observed time points). The prior distribution can then be taken in a non-informative manner on the size of the jumps (see WB manual I).

• Observe that in the WB program the use of “eps” is not necessary when defining Y[i,j].


• Examine:

• The change in the regression estimates when different time points for the baseline hazard are taken.

• The change in the regression estimates when the prior distribution on the

baseline hazard varies.


4.4.4 LeukFr: Cox regression with random effects (WB Manual I)

• The previous WB program is modified by adding a random effect to the regression

part of the model.

• Examine:

• The change in the regression estimates when Clayton’s approach is taken (by

assuming that the frailty has a Gamma distribution).


4.5. Performance of ecmo-machines

4.5.1 Description of the problem and the data

• An extracorporeal membrane oxygenation machine (ecmo-machine) works as an artificial lung, i.e. a circuit is created outside the body to oxygenate the blood of an individual who undergoes an operation. The purpose is to help the lungs. This oxygenation procedure is typically administered to patients in a bad physical condition and for whom there is no other option.

• The normal usage is 6-8 hrs at maximum, but here we are looking at patients who

need to be on the machine for several days, even weeks.


• Use of an ecmo-machine can end for the following reasons:

• the time on the artificial lung was sufficient for the healing process: censoring (by good health status) (leak/fail = 0);

• the patient got worse and died: censoring (by poor health status) (leak/fail = 0);

• the machine started leaking: event (leak/fail = 1).

• More than one ecmo-machine might be needed for one patient: the standard routine is to employ one machine but replace it by another when it starts leaking (defect).

• There are 6 different types of machines, but we have classified them into two types: medos or others.

• The above implies that for each patient there are recurrent events: each time the ecmo machine fails there is an event.


• The question is which of the two classes of ecmo machines performs better: medos or the other types?


4.5.2. WINBUGS program for lognormal frailty

• The program “oxygen survival lognormal.odc” is based on a Cox regression approach with random effects. The counting process approach of the previous example is used; the random effect has a normal distribution, hence the frailty component has a log-normal distribution. The hazard function is a step function with jumps when events occur. Therefore the two survival functions, for medos (S.medos) and for the other machines (S.others), are also step functions.

• Two chains were run. After 10,000 burn-in iterations in each chain (convergence monitored graphically with the G-R diagnostics), estimation was based on a further 10,000 iterations in each chain.

• The program stopped with an error-message but restarting it resulted in the following

estimates.


• The DIC equals 154.94, pD = 116.5 taking into account that the random effects are

estimated too.

• The posterior summary statistics of beta.medos (regression coefficient of medos) show that the medos machine has a significantly lower risk of failing than the other ecmo machines.


4.5.3. WINBUGS output for lognormal frailty

• The following table was obtained:

node mean sd MC error 2.5% median 97.5% start sample

beta.medos -5.405 1.869 0.07207 -9.61 -5.138 -2.472 10001 19998

S.medos[1] 1.0 7.577E-5 1.273E-6 0.9999 1.0 1.0 10001 19998

S.medos[2] 1.0 1.834E-4 4.099E-6 0.9997 1.0 1.0 10001 19998

S.medos[16] 0.9958 0.008664 2.549E-4 0.973 0.9989 1.0 10001 19998

S.medos[17] 0.9935 0.0122 3.516E-4 0.9598 0.998 1.0 10001 19998

S.others[1] 0.9997 9.627E-4 2.534E-5 0.9975 1.0 1.0 10001 19998

S.others[2] 0.9989 0.002489 8.494E-5 0.9923 0.9998 1.0 10001 19998

S.others[16] 0.7796 0.1639 0.005828 0.3836 0.8154 0.9871 10001 19998

S.others[17] 0.6545 0.2216 0.006264 0.1439 0.6903 0.9702 10001 19998



node mean sd MC error 2.5% median 97.5% start sample

b[1] -0.04833 3.108 0.02197 -6.434 -0.02426 6.212 10001 19998

b[2] -0.4596 2.792 0.02222 -6.661 -0.2624 4.547 10001 19998

b[57] -0.2951 2.921 0.02337 -6.559 -0.1753 5.336 10001 19998

b[58] -0.05781 3.087 0.02299 -6.389 -0.0316 6.14 10001 19998

b[59] -0.28 2.954 0.02061 -6.471 -0.1931 5.394 10001 19998

b[60] -0.09909 3.088 0.02251 -6.44 -0.0561 6.034 10001 19998

b[61] -0.3057 2.965 0.02328 -6.596 -0.2024 5.276 10001 19998

sigma 2.995 1.007 0.05052 1.478 2.837 5.328 10001 19998


4.5.4. WINBUGS program & output for gamma frailty

• The program “oxygen survival gamma.odc” replaces the previous lognormal distribution with a gamma(1/γ,1/γ) distribution according to Clayton (Biometrics, 1991). Clayton suggested taking a non-informative gamma(µ,η) prior with µ=0, η=0. We replaced this by a uniform prior on [0,100], because the WB-program reported an error when generating initial values for the frailties.

• Now the posterior mean for beta.medos (+ 95% C.I.) becomes -3.51, [-6.27, -1.41].

The posterior median for γ is 5.8 with 95% C.I. equal to [0.93, 21.24].

• The deviance information criterion is now 140.4, considerably lower than that of the previous analysis. Now pD = -6.0, indicating non-logconcavity of the likelihood.


• Write the program “oxygen survival gamma.odc” yourself, starting from “oxygen survival lognormal.odc”.


4.6. Hormone-replacement meta-analysis

• Run the WB-program “sillero meta-analysis with doodle.odc” from the Doodle.

• Run 5 chains with overdispersed initial values. This can be done by first looking at a run with one chain and taking extreme values (judged from the posterior density plots). Apply BOA on the five chains to assess convergence.

• For installation of BOA, look at the BOA manual. For S+-users an S+-program has

been written assuming five chains (in this case for the osteoporosis example, but this

can be easily modified) which produces all BOA output to check convergence.

• Extend the WB-program “sillero meta-analysis.odc” to include the diagnostics of

Chapter 8.
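To see what a multi-chain convergence check computes, the Python sketch below implements the basic Gelman-Rubin potential scale reduction factor for chains of equal length. It is a simplified illustration, not a reproduction of the BOA output (BOA reports corrected and additional diagnostics), and the simulated chains are artificial.

```python
import random
from statistics import fmean, variance

def potential_scale_reduction(chains):
    """Basic Gelman-Rubin R-hat for a list of equal-length chains."""
    n = len(chains[0])
    chain_means = [fmean(chain) for chain in chains]
    within = fmean([variance(chain) for chain in chains])   # W
    between = n * variance(chain_means)                     # B
    pooled = (n - 1) / n * within + between / n
    return (pooled / within) ** 0.5

random.seed(5)
# Five artificial chains sampling the same distribution: R-hat should be close to 1.
mixed = [[random.gauss(0, 1) for _ in range(2_000)] for _ in range(5)]
# Five artificial chains stuck around different values: R-hat is clearly above 1.
stuck = [[random.gauss(shift, 1) for _ in range(2_000)] for shift in range(5)]

print(f"mixed chains: {potential_scale_reduction(mixed):.3f}")
print(f"stuck chains: {potential_scale_reduction(stuck):.3f}")
```

Overdispersed starting values matter because the diagnostic can only detect lack of convergence if the chains start far enough apart.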


4.7. Re-analyses of the prostate cancer data

4.7.1 A random effects model with independent normal random effects

• In the prostate cancer example, 54 subjects were followed over a period varying from a couple of years to more than 25 years. Looking at the evolutions of the cancer patients below and reflecting on the nature of cancer, one could wonder whether the evolution of log(PSA+1) for these patients can be modeled with a broken line linear regression model.

WINBUGS program

• The WB program “prostate change point independent normal.odc” estimates a

single change-point for all cancer cases. The random effects model assumes the

random intercept to be independent of the random slope.


[Figure: individual profiles of ln(1+PSA) against years before diagnosis (0 to 30), in four panels: Controls, BPH cases, L/R cancer cases, Metastatic cancer cases.]


WINBUGS output

• We ran 3 chains with a burn-in of 10,000 iterations in total. An extra 20,000 iterations deliver the estimates.

• Most of the parameters show good convergence with the Brooks-Gelman-Rubin convergence diagnostic. However, some of the change points are very hard to determine.

• The posterior estimates are shown below.


node mean sd MC error 2.5% median 97.5% start sample

alpha.b 0.581 0.175 0.008701 0.2352 0.5844 0.9119 10001 30000

alpha.c 0.4401 0.2223 0.01168 0.01579 0.4395 0.8641 10001 30000

b0.m 0.8217 0.1302 0.006028 0.5772 0.8178 1.089 10001 30000

b1.m -0.264 0.07373 0.0022 -0.4074 -0.2643 -0.118 10001 30000

beta.age 0.02085 0.01093 4.819E-4 3.572E-4 0.02034 0.04362 10001 30000

beta2 -0.7929 0.1218 0.006017 -1.02 -0.7933 -0.5498 10001 30000

beta[1] -0.6779 0.1283 0.001833 -0.9293 -0.6774 -0.4271 10001 30000

beta[50] 0.1228 0.2125 0.006048 -0.2905 0.1229 0.5384 10001 30000

beta[51] -1.547 0.1894 0.00528 -1.924 -1.546 -1.182 10001 30000

beta[52] -1.33 0.2138 0.005599 -1.758 -1.33 -0.9109 10001 30000

beta[53] -0.448 0.2282 0.005663 -0.8942 -0.4474 2.932E-4 10001 30000

beta[54] -0.4747 0.237 0.005813 -0.9481 -0.4718 -0.01411 10001 30000


node mean sd MC error 2.5% median 97.5% start sample

k[37] 7.801 4.271 0.0515 2.0 7.0 15.0 10001 30000

k[38] 5.799 1.718 0.05471 2.0 6.0 8.0 10001 30000

k[39] 7.341 4.425 0.04893 1.0 6.0 14.0 10001 30000

k[49] 6.387 2.672 0.04096 2.0 6.0 13.0 10001 30000

k[50] 3.91 1.686 0.0522 1.0 4.0 7.0 10001 30000

k[51] 5.788 3.781 0.1978 1.0 9.0 9.0 10001 30000

k[52] 1.45 0.9166 0.03258 1.0 1.0 4.0 10001 30000

k[53] 3.902 1.71 0.05268 1.0 4.0 7.0 10001 30000

k[54] 1.874 1.267 0.05066 1.0 1.0 5.0 10001 30000

sigma2.b0 0.2062 0.06102 0.001905 0.1146 0.197 0.35 10001 30000

sigma2.b1 0.1731 0.04256 6.804E-4 0.1063 0.1676 0.2729 10001 30000

sigma2.r 0.05115 0.003939 3.842E-5 0.04404 0.05093 0.05941 10001 30000


• The parameter beta2 pertains to the part after the change point k (thus most likely when the cancer has initiated). For the interpretation of the regression coefficient one must realize that the time-scale has been reversed: we talk here about the years before diagnosis. Therefore, a negative regression coefficient implies that there is an increase of the log(PSA+1) towards the end.

• For the first 36 cases the change point has a uniform posterior distribution over the number of observed time-points. For the 18 cancer cases (shown in the table) the change-point has a more peaked posterior distribution, which is sometimes bimodal. See also the densities below.


• Some posterior densities of the change point:

[Figure: posterior densities of k[43] and k[44] (chains 1:3, sample: 30000).]


4.7.2 A random effects model with correlated normal random effects

• Random effects are most often assumed to be correlated. In the next WB program

this assumption is built-in.

WINBUGS program

• The WB program “prostate change point correlated normal.odc” estimates a single change-point for all cancer cases assuming that the random effects are correlated.


WINBUGS output

• We ran 3 chains with a burn-in of 10,000 iterations in total. An extra 20,000 iterations deliver the estimates.

• The posterior estimates are shown below.

• There are some differences compared to the previous analysis. For instance, the regression coefficient for benign hyperplasia cases decreases but this is compensated by an almost comparable increase of the mean for the random intercept.

• Also some of the posterior distributions of the change points undergo a change.


Node mean sd MC error 2.5% median 97.5% start sample

alpha.b 0.3233 0.1262 0.005821 0.07905 0.3228 0.577 10001 30000

alpha.c 0.4217 0.1555 0.00754 0.1308 0.4145 0.7462 10001 30000

beta.age 0.01781 0.007564 2.742E-4 0.003072 0.01782 0.03258 10001 30000

beta2 -0.6182 0.1088 0.004296 -0.8332 -0.6158 -0.4105 10001 30000

mub[1] 0.9914 0.1288 0.004636 0.7357 0.9912 1.244 10001 30000

mub[2] -0.3193 0.07282 0.00157 -0.4623 -0.3195 -0.1769 10001 30000

k[37] 8.258 4.882 0.03948 1.0 8.0 15.0 10001 30000

k[38] 4.464 1.828 0.05366 1.0 4.0 8.0 10001 30000

k[39] 7.655 4.679 0.04148 1.0 7.0 14.0 10001 30000


k[49] 6.259 2.636 0.03211 2.0 6.0 13.0 10001 30000

k[50] 2.688 1.553 0.04466 1.0 2.0 6.0 10001 30000

k[51] 5.821 1.811 0.05432 2.0 6.0 8.0 10001 30000

k[52] 4.467 1.766 0.04731 1.0 5.0 7.0 10001 30000

k[53] 4.896 1.627 0.0379 2.0 5.0 7.0 10001 30000

k[54] 2.098 1.359 0.0411 1.0 2.0 6.0 10001 30000

sigma2b[1,1] 0.3956 0.09892 0.002175 0.2387 0.3832 0.6249 10001 30000

sigma2b[1,2] -0.2362 0.05989 9.47E-4 -0.3735 -0.2291 -0.1398 10001 30000

sigma2b[2,1] -0.2362 0.05989 9.47E-4 -0.3735 -0.2291 -0.1398 10001 30000

sigma2b[2,2] 0.1858 0.04356 3.898E-4 0.1169 0.1803 0.2854 10001 30000

sigmab[1] 0.6242 0.07689 0.00172 0.4886 0.6191 0.7905 10001 30000

sigmab[2] 0.4282 0.04938 4.463E-4 0.3419 0.4246 0.5342 10001 30000

corrb -0.8701 0.04772 0.001092 -0.9416 -0.8775 -0.7566 10001 30000

sigma2.r 0.05059 0.003854 2.938E-5 0.04361 0.0504 0.05873 10001 30000


[Figure: posterior densities of k[43] and k[44] under the correlated random effects model (chains 1:3, sample: 30000).]


4.7.3 A random effects model with correlated random effects having a t-distribution

• A simple change in the WB-program turns the normal assumptions of the measurement error and of the random effects into the assumption of a t-distribution.

• Most of the posterior estimates stay the same, but there are changes, e.g. the posterior mean becomes -0.43 with 95% C.I. equal to [-0.63, -0.27].

• Write this WB-program and put it in “prostate change point correlated t.odc”.


4.8. Relating baseline risk to treatment benefit in a meta-analysis

4.8.1 The problem

• Arends et al. (SIM, 2000, 19, 3497-3518) examined some frequentist methods to analyze the relationship between the baseline risk for morbidity (or mortality) and the effect of an active treatment in the context of a meta-analysis. More specifically, they looked at three published studies which related the estimated treatment effects to the estimated measures of risk in the control groups when performing a meta-analysis.


4.8.2 Meta-analysis 1

• Here we look at a meta-analysis on the effect of tocolysis therapy on pre-term birth. Fourteen (14) placebo-controlled trials evaluating the effect of tocolysis with β-mimetics to delay pre-term deliveries in high-risk mothers were evaluated in a meta-analysis.

   y     n     x     m        y     n     x     m
  0.0   14    6.0   16       5.0   49   16.0   50
  6.0   14   11.0   15       4.0   37    9.0   39
  0.0   12    8.0   13       1.0   15    5.0   17
  2.0   16    3.0   15       6.0   33   15.0   30
  6.0   19   11.0   19      15.0   54   25.0   52
  2.0   15    2.0   18      11.0  131    6.0   45
  0.0   15   10.0   15       0.0   14    0.0    5

• Each study is described by four values: y = number of pre-term births in the treatment group out of n treated women, and x = number of pre-term births in the control group out of m control women.


• Treatment effect was measured by the log odds ratio $\log\left[\dfrac{y\,(m-x)}{x\,(n-y)}\right]$. When there is heterogeneity of the treatment effect among the different studies, one needs to examine the determinants of this lack of homogeneity. An important question is whether the treatment effect depends on the baseline risk in the population under investigation in the respective study. The latter could be estimated by the proportion of pre-term births in the control group, on the original scale or on, say, the logit scale. The following scattergram depicts the log odds ratio and the logit of the baseline risk. There seems to be a clear negative trend, as seen from the weighted least-squares regression analysis.
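As a small illustration (not part of the original course material), the observed log odds ratios and baseline logits of the 14 trials could be computed along the following lines. The tuples are the (y, n, x, m) values as read from the table above; the continuity correction `cc` of 0.5 is an assumption introduced here so that trials with a zero cell give finite values, and the function names are hypothetical.

```python
import math

# (y, n, x, m) for each trial, as read from the table in 4.8.2.
TRIALS = [
    (0, 14, 6, 16), (6, 14, 11, 15), (0, 12, 8, 13), (2, 16, 3, 15),
    (6, 19, 11, 19), (2, 15, 2, 18), (0, 15, 10, 15),
    (5, 49, 16, 50), (4, 37, 9, 39), (1, 15, 5, 17), (6, 33, 15, 30),
    (15, 54, 25, 52), (11, 131, 6, 45), (0, 14, 0, 5),
]


def log_odds_ratio(y, n, x, m, cc=0.5):
    """Observed log odds ratio log[y(m-x) / (x(n-y))].

    cc is an assumed continuity correction added to every cell.
    """
    return math.log((y + cc) * (m - x + cc) / ((x + cc) * (n - y + cc)))


def baseline_logit(x, m, cc=0.5):
    """Observed logit of the control-group (baseline) risk."""
    return math.log((x + cc) / (m - x + cc))


# One (baseline logit, log odds ratio) point per trial, as in the scattergram.
points = [(baseline_logit(x, m), log_odds_ratio(y, n, x, m))
          for y, n, x, m in TRIALS]
for xi_hat, eta_hat in points:
    print(f"{xi_hat:7.3f} {eta_hat:7.3f}")
```

Without the correction (`cc=0`) the trials with y = 0 or x = 0 have no finite log odds ratio, which is one practical difficulty of working with the observed values directly.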

[Figure: scattergram of the log odds ratio (vertical axis) against the logit of the baseline risk (horizontal axis), with the fitted lines from the weighted LS regression and from the Bayesian analysis.]

• To establish the negative trend one often applies a weighted least squares analysis,

like the one shown in the figure. There are several problems with this approach:

• The baseline risk is measured with error, and this will bias the slope towards zero.

• The regressor and the response are functionally related to each other, so that, even when there is no relationship between treatment effect and baseline risk, a relationship could appear.
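The second problem can be made concrete with a small simulation. The sketch below is an illustration under assumed settings (5000 hypothetical trials of 15 + 15 patients, a constant true log odds ratio of -1, and a 0.5 continuity correction); none of these numbers come from the course material. Although the true treatment effect does not depend on the baseline risk, the observed log odds ratio and the observed baseline logit come out negatively correlated, because the control-group sampling error enters both quantities with opposite signs.

```python
import math
import random


def expit(z):
    return 1.0 / (1.0 + math.exp(-z))


def rbinom(rng, n, p):
    """Binomial draw as a sum of Bernoulli trials (adequate for small n)."""
    return sum(rng.random() < p for _ in range(n))


def pearson(u, v):
    """Sample Pearson correlation of two equally long lists."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    suu = sum((a - mu) ** 2 for a in u)
    svv = sum((b - mv) ** 2 for b in v)
    return suv / math.sqrt(suu * svv)


def simulate(n_trials=5000, size=15, true_log_or=-1.0, seed=1):
    """Correlation between observed baseline logit and observed log OR
    when the TRUE log odds ratio is the same in every trial."""
    rng = random.Random(seed)
    baseline, effect = [], []
    for _ in range(n_trials):
        xi = rng.gauss(-0.5, 0.3)                      # true baseline logit
        x = rbinom(rng, size, expit(xi))               # control events
        y = rbinom(rng, size, expit(xi + true_log_or)) # treatment events
        logit_c = math.log((x + 0.5) / (size - x + 0.5))
        logit_t = math.log((y + 0.5) / (size - y + 0.5))
        baseline.append(logit_c)
        effect.append(logit_t - logit_c)               # observed log OR
    return pearson(baseline, effect)


r = simulate()
print(f"correlation under no true relationship: {r:.2f}")
```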

• We look here at a Bayesian approach. It will boil down to specifying a hierarchical

model and analyzing it with WINBUGS. The following 3 model components are

needed:


4.8.3 The model

1. Underlying regression model: Model for the regression of the true treatment effect $\eta_i$ on the true baseline measure $\xi_i$:

$$\eta_i = \alpha + \beta\,\xi_i + \varepsilon_i, \quad \text{where } \varepsilon_i \sim N(0, \tau^2).$$

The residual variance $\tau^2$ describes the heterogeneity in true treatment effects (or true risks under treatment) in populations with the same true baseline risk.

2. Baseline risks model: Model for the distribution of the true baseline risk measures $\xi_i$:

$$\xi_i \sim G, \quad \text{with } G \text{ a parametric model.}$$

3. Measurement errors model: Model for the “measurement errors”:

$$(\hat{\xi}_i, \hat{\eta}_i) \text{ given } (\xi_i, \eta_i) \sim F,$$

with F a parametric model.


• For the present problem the hierarchical model is given by:

• For the ith trial, Xi and Yi are the numbers of pre-term deliveries out of mi placebo

women and ni women treated with the active treatment, respectively.

• The ξi stand for the true placebo logits and the ηi stand for the true log odds ratios.

• The distribution of the logit of the baseline risks is often taken to be the normal, but

other distributions are certainly possible, e.g. a mixture of normal distributions was

considered by Arends et al. (2000).


• The exact measurement model is taken to be the Binomial; more specifically, we assume that:

$$X_i \sim \mathrm{Bin}\!\left(m_i,\ \frac{\exp(\xi_i)}{1+\exp(\xi_i)}\right), \qquad Y_i \sim \mathrm{Bin}\!\left(n_i,\ \frac{\exp(\eta_i+\xi_i)}{1+\exp(\eta_i+\xi_i)}\right).$$

• It is of interest to know the importance of the regression coefficient β; β = 0 would imply no dependence on the baseline risk.


WINBUGS program

• The WB program “meta-analysis 1 taking into account baseline risk normal dist.odc” estimates:

• The regression line expressing the relationship between baseline risk and

effect of treatment.

• The value of the baseline risk for which there is no treatment effect

(eqpoint).

• A confidence and prediction band for the expected treatment effect given a

true baseline risk.

• Observe that in the WB program the response is logit(ηi) instead of the log odds ratio.

• We have taken 1 chain with a burn-in sample of 10,000 iterations. An extra 10,000 iterations deliver the estimates.


WINBUGS output

• Some of the posterior estimates are:

node mean sd MC error 2.5% median 97.5% start sample

alpha.real -1.509 0.3936 0.01352 -2.398 -1.475 -0.8383 20001 20000

beta -0.2629 0.4577 0.0172 -1.131 -0.2792 0.7623 20001 20000

eqpoint 1.12 446.2 3.14 -36.7 -3.328 32.99 20001 20000

sigma.eta 0.4113 0.3845 0.01534 0.0323 0.2991 1.392 20001 20000

sigma.ksi 0.8364 0.276 0.006358 0.4248 0.7936 1.49 20001 20000

• Observe that the posterior mean of eqpoint (1.12) is very different from the posterior median (-3.328), since beta is sometimes sampled close to zero. Also, the MC error (3.14) is very large.
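The instability of eqpoint can be mimicked with a short simulation. The sketch below assumes that eqpoint is the root of α + βξ = 0, i.e. -α/β, and replaces the actual posterior by independent normal draws with roughly the posterior means and standard deviations of alpha.real and beta from the table; both are simplifying assumptions, so the numbers will not reproduce the WinBUGS output. The point is only that a ratio with a denominator that can come close to zero has a well-behaved median but an erratic mean and standard deviation.

```python
import random
import statistics

rng = random.Random(2003)
draws = []
for _ in range(20000):
    alpha = rng.gauss(-1.5, 0.39)   # stand-in for the posterior of alpha.real
    beta = rng.gauss(-0.26, 0.46)   # stand-in for the posterior of beta
    draws.append(-alpha / beta)     # assumed definition of eqpoint

print("mean          :", statistics.fmean(draws))
print("median        :", statistics.median(draws))
print("sd            :", statistics.stdev(draws))
print("max |eqpoint| :", max(abs(d) for d in draws))
```

For such a quantity the posterior median and percentiles are the more useful summaries.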


• The figure in 4.8.2 shows that the regression line obtained from a Weighted Least Squares analysis is biased away from 0 (as expected since a negative correlation is created). The Bayesian estimated (mean) slope is shrunk towards zero (not significant anymore, in a Bayesian sense). The shrinkage is large due to some small trials with relatively large within-study variance compared to the between-study variance.

• The two standard deviations, sigma.eta and sigma.ksi, express the within-study and between-study variations. The standard deviation of the observed baseline risk (log odds) is 1.22, while the standard deviation of the true baseline risk is 0.84.

• DIC = 131.4 and pD = 17.2.


4.8.4 Meta-analysis 1: sensitivity analysis

• The last study was not considered by Arends et al. Can you think of a reason for this?

• Replace the normal distribution of the baseline risk by a t(4)-distribution or by a mixture of (2) normal distributions. Do the conclusions change?


4.8.5 Meta-analysis 2

• Arends et al. (SIM, 2000, 19, 3497-3518) also looked at a meta-analysis where the number of events and the total number of person-years per group are given instead of the number of events and the sample size.

• A published meta-analysis of clinical trials compared drug treatment to placebo or no

treatment with respect to (cardiovascular) mortality in middle-aged patients with mild

to moderate hypertension. Twelve trials, which showed considerable variation in the

risk of mortality in the control groups, were included in the meta-analysis.

• In this case the research question was whether drug treatment prevents death in mild to moderate hypertensive patients and whether the size of the treatment effect depends on the event rate in the control group.

• The data are shown on the next page.


Study   Treatment group                                  Control group
        Deaths/person-years   Rate/1000 person-years     Deaths/person-years   Rate/1000 person-years
1       10/595.2              16.8                       21/640.2              32.8
2       2/762.0               2.6                        0/756.0               0.0
3       54/5635.0             9.6                        70/5600.0             12.5
4       47/5135.0             9.2                        63/4960.0             12.7
5       53/3760.0             14.1                       62/4210.0             14.7
6       10/2233.0             4.5                        9/2084.5              4.3
7       25/7056.1             3.6                        35/6824.0             5.1
8       47/8099.0             5.8                        31/8267.0             3.7
9       43/5810.0             7.4                        39/5922.0             6.6
10      25/5397.0             4.6                        45/5173.0             8.7
11      157/22162.7           7.1                        182/22172.5           8.2
12      92/20885.0            4.4                        72/20645.0            3.5


• Now, the ξi stand for the true control log-mortality rates while the ηi stand for the true

treatment log-mortality rates.

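• The exact measurement model is now:

$$X_i \sim \mathrm{Poisson}\big(m_i \exp(\xi_i)\big), \qquad Y_i \sim \mathrm{Poisson}\big(n_i \exp(\eta_i)\big),$$

with $m_i$ and $n_i$ now the numbers of person-years in the control and treatment group of the ith trial.

• In this case, it is of interest to know whether the regression coefficient β is equal to 1: since $\eta_i$ is now the treatment log-mortality rate rather than a log ratio, β = 1 means that the log rate ratio $\eta_i - \xi_i$ does not depend on the baseline risk.

• Observe that the above model assumes that the events occur over the individuals in an independent way, which is often crudely assumed in epidemiology (see e.g. Clayton, Statistical models in epidemiology, 1993, pp. 40-43).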


WINBUGS program

• The WB program “meta-analysis 2 taking into account baseline riskpoisson.odc” estimates this model.

• We have taken 1 chain with a burn-in sample of 10,000 iterations. An extra 10,000 iterations deliver the estimates.


WINBUGS output

• Some of the posterior estimates are:

node mean sd MC error 2.5% median 97.5% start sample

alpha.real -1.634 0.6681 0.02119 -2.927 -1.641 -0.2685 10001 10000

beta 0.6864 0.1361 0.004374 0.423 0.6847 0.9659 10001 10000

sigma.eta 0.1335 0.08709 0.002905 0.02607 0.1161 0.3495 10001 10000

sigma.ksi 0.8009 0.2307 0.006495 0.4758 0.7625 1.346 10001 10000

• DIC = 170.4 and pD = 17.2.

• Now the within-study variability (sigma.eta) is much smaller than the between-study

variability (sigma.ksi).


4.8.6 Meta-analysis 2: sensitivity analysis

• Perform a sensitivity analysis varying the different aspects of the model. Evaluate the

models with the Deviance Information Criterion.

• Check the convergence with the BOA program.


4.9. Estimating disease prevalence (in the absence of a gold standard)

4.9.1 One diagnostic test

4.9.1.1. Description of the problem

• The goal is to estimate the prevalence of a disease using one diagnostic test.

• Suppose (in theory) one would have the table (D = disease, T = test)

         D+      D-      Total
T+       a       b       a+b
T-       c       d       c+d
Total    a+c     b+d     N

• We consider three cases, but only the last is practically relevant.


Three cases

• Case 1: diagnostic test is a gold standard; this implies that sensitivity (S) = 1 & specificity (C) = 1.

• Case 2: diagnostic test is not a gold standard, but a gold standard is available.

• Case 3: diagnostic test is not a gold standard & no gold standard test is available.


Case 1: diagnostic test is a gold standard (S = 1 & C = 1)

• Estimate of $\pi \equiv P(D+)$ and of $P(D-)$ is immediately available from an epidemiological study or a field experiment. Namely, $\hat{\pi} = (a + c)/N \equiv a/N$ (since c = 0 when S = 1).

• Problem: gold standard is rarely available.


Case 2: diagnostic test is not a gold standard, but a gold standard is available

• Estimate sensitivity and specificity of a diagnostic test by relating to a gold standard

in a controlled experiment, then estimate S by a/(a+c) and C by d/(b+d).

• In an epidemiological study, use estimates of S and C to calculate an estimate of the prevalence

$$\hat{\pi} = \frac{P(T+) + C - 1}{S + C - 1}, \qquad (1)$$

whereby $P(T+) = (a + b)/N$.

• Problem: Gold standard is rarely available.

• Observe that sensitivity and specificity must be the same in the controlled experiment

as in the epidemiological study.
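Formula (1) follows from P(T+) = πS + (1 - π)(1 - C), solved for π. A minimal sketch of the calculation is given below; the function name and the example values are hypothetical and only serve as illustration.

```python
def prevalence_from_test(p_test_pos, sens, spec):
    """Formula (1): (P(T+) + C - 1) / (S + C - 1).

    p_test_pos: observed proportion of positive tests, (a + b) / N.
    sens, spec: sensitivity S and specificity C, estimated elsewhere.
    """
    denom = sens + spec - 1.0
    if denom <= 0:
        raise ValueError("formula (1) requires S + C > 1")
    return (p_test_pos + spec - 1.0) / denom


# Hypothetical example: 30% positive tests, S = 0.80, C = 0.90.
print(prevalence_from_test(0.30, 0.80, 0.90))
```

The sketch applies no truncation, so with poorly matching values of S and C the estimate can fall outside [0, 1].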


Case 3: diagnostic test is not a gold standard & no gold standard test is available

• S & C are not known and only the marginal table is available:

Total

Test+ a+b ≡ m

Test- c+d ≡ n

Total N

• Implication: more parameters to estimate than data available.

• A Bayesian approach is necessary to get estimates for the prevalence (and sensitivity

& specificity).


4.9.1.2. Bayesian approach

• Assume that the following table underlies the data, with Y1 the missing number of

true positives and Y2 the missing number of false negatives:

D+ D- Total

T+ Y1 m-Y1 m

T- Y2 n-Y2 n

Total Y1+Y2 N-( Y1+Y2) N

• The Y1 and Y2 are called latent data and an analysis using such data is called a “latent class analysis”.


• To write down the likelihood (of the observed and latent data) we express the probabilities in terms of the prevalence (π), sensitivity (S) and specificity (C):

$$L(m, n, Y_1, Y_2 \mid \pi, S, C) \propto [\pi S]^{Y_1}\,[\pi(1-S)]^{Y_2}\,[(1-\pi)(1-C)]^{m-Y_1}\,[(1-\pi)C]^{n-Y_2}.$$

• Prior information on the parameters π, S and C is absolutely necessary since there are more parameters than data. Prior information can be represented by a beta density Beta(α,β).


4.9.1.3. Example

• Prevalence of Strongyloides infection among Cambodian refugees who arrived in Montreal, Canada, during an 8-month period (Joseph, Gyorkos, Coupal, Am J Epidemiol 1995; 141:263-72).

• Diagnostic test = stool examination (standard test in parasitology).

• Observed data: marginal totals

T+ m = 40

T- n = 122

Total 162


4.9.1.4. Bayesian analysis

Prior information

• Prevalence: very little known, a uniform prior was chosen (Beta(1,1)).

• A panel of experts from the McGill Centre for Tropical Diseases determined equal-tailed 95% probability intervals for sensitivity and specificity (from a review of the literature and clinical opinion), see below (with associated Beta parameters):

      95% range (%)    α        β
S     5-45             4.44     13.31
C     90-100           71.25    3.75

• Beta densities can be obtained by matching the endpoints of the 95%-ranges to the

theoretical ranges from the Beta density.
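One simple way to turn such a range into Beta parameters is moment matching: take the midpoint of the range as the mean and a quarter of its width as the standard deviation. This rule is an assumption made here for illustration, and not necessarily the matching procedure meant in the slide; it does, however, reproduce the α and β listed above for S and C to two decimals. The function name is hypothetical.

```python
def beta_from_range(lower, upper):
    """Beta(a, b) with mean at the midpoint of [lower, upper] and
    standard deviation equal to a quarter of the range width
    (an assumed moment-matching rule)."""
    mean = (lower + upper) / 2.0
    sd = (upper - lower) / 4.0
    total = mean * (1.0 - mean) / sd ** 2 - 1.0   # a + b
    return mean * total, (1.0 - mean) * total


for label, lo, hi in [("S", 0.05, 0.45), ("C", 0.90, 1.00)]:
    a, b = beta_from_range(lo, hi)
    print(f"{label}: alpha = {a:.2f}, beta = {b:.2f}")
```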


WINBUGS program

• Joseph et al. (1995) have written an S+ program based on the full conditional distributions of the parameters, i.e. Y1, Y2, π, S and C.

• In the file “prevalence 1 test a.odc” an equivalent WB program is given.

• In the file “prevalence 1 test b.odc” the latent (missing) data are also determined.

• The WB program also has a goodness-of-fit statistic G0, and a Bayesian P-value which compares the fit of the observed data to the data generated under the assumed model.

• Three chains were run; in each chain a burn-in sample of 1000 iterations was taken and the estimation was done on 2000 further iterations.
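To show what a sampler based on the full conditional distributions looks like, here is a minimal Gibbs sampler sketch in Python. It is not the S+ program of Joseph et al. nor the WB program; it only uses the full conditionals that follow from the likelihood of 4.9.1.2 with Beta priors (binomial for Y1 and Y2, Beta for π, S and C). It runs a single chain of 6000 retained iterations rather than three chains, and the function names, starting values and seed are arbitrary choices.

```python
import random


def rbinom(rng, n, p):
    """Binomial draw as a sum of Bernoulli trials (adequate for small n)."""
    return sum(rng.random() < p for _ in range(n))


def gibbs_one_test(m, n, prior_prev, prior_sens, prior_spec,
                   n_burn=1000, n_keep=2000, seed=1):
    """Gibbs sampler for prevalence, sensitivity and specificity.

    m, n: numbers of test-positive and test-negative subjects.
    prior_*: (alpha, beta) parameters of the Beta priors.
    Returns a list of (prev, sens, spec) draws after burn-in.
    """
    rng = random.Random(seed)
    prev, sens, spec = 0.5, 0.5, 0.9          # arbitrary starting values
    kept = []
    for it in range(n_burn + n_keep):
        # Y1: latent number of true positives among the m test-positives.
        p1 = prev * sens / (prev * sens + (1 - prev) * (1 - spec))
        y1 = rbinom(rng, m, p1)
        # Y2: latent number of false negatives among the n test-negatives.
        p2 = prev * (1 - sens) / (prev * (1 - sens) + (1 - prev) * spec)
        y2 = rbinom(rng, n, p2)
        # Conjugate Beta updates given the completed 2x2 table.
        prev = rng.betavariate(prior_prev[0] + y1 + y2,
                               prior_prev[1] + m + n - y1 - y2)
        sens = rng.betavariate(prior_sens[0] + y1, prior_sens[1] + y2)
        spec = rng.betavariate(prior_spec[0] + n - y2,
                               prior_spec[1] + m - y1)
        if it >= n_burn:
            kept.append((prev, sens, spec))
    return kept


# Strongyloides data (m = 40, n = 122) with the priors of 4.9.1.4.
draws = gibbs_one_test(40, 122, (1.0, 1.0), (4.44, 13.31), (71.25, 3.75),
                       n_burn=1000, n_keep=6000)
mean_prev = sum(d[0] for d in draws) / len(draws)
mean_sens = sum(d[1] for d in draws) / len(draws)
mean_spec = sum(d[2] for d in draws) / len(draws)
print(f"prev {mean_prev:.3f}  sens {mean_sens:.3f}  spec {mean_spec:.3f}")
```

Because the chain and seed differ from the WinBUGS runs, the posterior means will agree with the WB output only up to Monte Carlo error.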


WINBUGS output

• The following table was obtained:

node mean sd MC error 2.5% median 97.5% start sample

prev 0.7424 0.16 0.00519 0.4133 0.7553 0.9866 1001 6000

sens 0.3188 0.06658 0.00213 0.2241 0.3060 0.4815 1001 6000

spec 0.9469 0.027 4.988E-4 0.8825 0.9510 0.9863 1001 6000

y1 37.58 2.797 0.06611 30.0 38.0 40.0 1001 6000

y2 82.91 24.35 0.7814 34.0 85.0 120.0 1001 6000

bayesp 0.5602 0.4964 0.006921 0.0 1.0 1.0 1001 6000


4.9.2 Two or more diagnostic tests

4.9.2.1. Description of the problem & example

• Usually more than one diagnostic test is available. The purpose is then to get an improved estimate of the prevalence using all tests.

• For example in the Cambodian refugees example, the following table was available:

         T1+     T1-     Total
T2+      38      87      125
T2-      2       35      37
Total    40      122     162

• The first test is the examination of the stool; the second test is a serology test. Neither

of the tests can be considered as a gold standard.


4.9.2.2. Bayesian approach

• As for one diagnostic test, one can assume that there are latent data completing the 2x2-table to the 2x2x2-table. Assuming conditional independence (conditional on the disease) of the tests, namely

$$P(T_1+ \,\&\, T_2+ \mid D+) = P(T_1+ \mid D+)\,P(T_2+ \mid D+),$$

one derives again the likelihood of the (observed and latent) data.

• For the prevalence, again a uniform prior was chosen. Further, for the sensitivity and specificity the experts derived a similar table as for the first test.

      95% range (%)    α        β
S     65-95            21.96    5.49
C     35-100           4.1      1.76


• Joseph et al. (1995) have written an S+ program based on the full conditional distributions of the parameters. In the file “prevalence 2 tests_1.odc” an equivalent WB program is given.

• Three chains were run; in each chain a burn-in sample of 1000 iterations was taken and the estimation was done on 2000 further iterations. The following table is obtained (next page).

• It is clearly seen that the second test increases the information regarding the prevalence considerably, but it also sharpens the posterior information regarding the sensitivity and specificity of the first diagnostic test.


• WB results from both diagnostic tests combined:

node mean sd MC error 2.5% median 97.5% start sample

prev 0.7262 0.1108 0.003242 0.467 0.7444 0.9016 2001 6000

sens1 0.3008 0.05664 0.001411 0.2085 0.2950 0.4282 2001 6000

sens2 0.8803 0.04259 8.418E-4 0.7848 0.8839 0.9503 2001 6000

spec1 0.9578 0.02096 3.218E-4 0.909 0.9610 0.9887 2001 6000

spec2 0.6793 0.1625 0.004227 0.3723 0.6856 0.9557 2001 6000


4.9.2.3. An alternative Bayesian approach

There are problems with the previous approaches:

• The sensitivity and specificity as determined in a controlled experiment cannot be used to estimate the prevalence of the disease since they are not intrinsic values, but depend on e.g. the geographical area. However, specificity can be estimated in real life by performing an epidemiological study in an area which is known to be disease-free. But, even if sensitivity could be estimated from a controlled experiment, say by randomly infecting the subjects (when dealing with animals), such experiments must be done on a small scale since they are often quite costly.

• Conditional independence of diagnostic tests almost never holds.

• Since no hard data are available on sensitivity, prior information on this parameter

must be obtained from eliciting experts, but that is not trivial. For all of these reasons

another approach is taken now. We start with an example in veterinary medicine.


4.9.2.4. Example in veterinary medicine

• Porcine cysticercosis is a major problem in many areas, being a debilitating and potentially lethal zoonosis (Garcia & Del Brutto, 2000; Phiri et al., 2002).

• Relatively accurate estimates of prevalence of cysticercae in fattening pigs are essential to appraise the risk for human infection.

• Several diagnostic tests are used, but none is a gold standard and exact information about test sensitivity and specificity is unavailable.

• A total of 868 traditionally kept pigs from Zambia were tested with 2 diagnostic tests: palpation of the tongue (TONG) and visual inspection of the carcass (VISUAL).


• The following table of marginal totals is obtained

TONG    VISUAL    Number of pigs
-       -         744
-       +         9
+       -         3
+       +         112


4.9.2.5. A different parameterization

• In theory the following 2 tables are available, one for the diseased (D+) and one for the non-diseased (D-) animals:

D+                      D-
TONG    VISUAL          TONG    VISUAL
-       -               -       -
-       +               -       +
+       -               +       -
+       +               +       +

• These tables imply probabilities $P(T_1+ \,\&\, T_2+ \mid D+)$, $P(T_1+ \,\&\, T_2- \mid D+)$, …, $P(T_1- \,\&\, T_2- \mid D-)$. There are 8 - 1 = 7 parameters to estimate with only 4 cell frequencies. Hence 3 parameters need to be fixed or, in a Bayesian context, prior distributions could be given.


• It is advantageous to reformulate these probabilities in terms of: (1) $P(D+)$, (2) $P(T_1+ \mid D+)$, (3) $P(T_1- \mid D-)$, (4) $P(T_2+ \mid D+ \,\&\, T_1+)$, (5) $P(T_2+ \mid D+ \,\&\, T_1-)$, (6) $P(T_2- \mid D- \,\&\, T_1-)$, (7) $P(T_2- \mid D- \,\&\, T_1+)$.

• The prevalence of D but also the sensitivities and specificities can be formulated in these probabilities (the expressions for Se2 and Sp2 follow from the law of total probability):

p = (1), Se1 = (2), Sp1 = (3), Se2 = (2)⋅(4) + [1-(2)]⋅(5), Sp2 = (3)⋅(6) + [1-(3)]⋅(7)

• Put now probabilistic constraints (prior distributions) using expert knowledge on

some of the parameters to obtain parameter estimates.
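The parameterization can be written out in a few lines of code. The sketch below assumes that probabilities (1) to (7) are the conditional probabilities P(D+), P(T1+|D+), P(T1-|D-), P(T2+|D+ & T1+), P(T2+|D+ & T1-), P(T2-|D- & T1-) and P(T2-|D- & T1+), with T1 = TONG and T2 = VISUAL. It returns the derived test characteristics and the four observable cell probabilities; the function names and the example values are hypothetical.

```python
def derived_accuracy(theta):
    """theta: dict with keys 1..7 holding probabilities (1)-(7)."""
    se2 = theta[2] * theta[4] + (1 - theta[2]) * theta[5]
    sp2 = theta[3] * theta[6] + (1 - theta[3]) * theta[7]
    return {"prev": theta[1], "Se1": theta[2], "Sp1": theta[3],
            "Se2": se2, "Sp2": sp2}


def cell_probabilities(theta):
    """Probabilities of the four observable (T1, T2) outcomes."""
    p, se1, sp1 = theta[1], theta[2], theta[3]
    return {
        ("+", "+"): p * se1 * theta[4] + (1 - p) * (1 - sp1) * (1 - theta[7]),
        ("+", "-"): p * se1 * (1 - theta[4]) + (1 - p) * (1 - sp1) * theta[7],
        ("-", "+"): p * (1 - se1) * theta[5] + (1 - p) * sp1 * (1 - theta[6]),
        ("-", "-"): p * (1 - se1) * (1 - theta[5]) + (1 - p) * sp1 * theta[6],
    }


# Hypothetical values: both tests perfectly specific, strongly dependent tests.
theta = {1: 0.3, 2: 0.5, 3: 1.0, 4: 0.95, 5: 0.05, 6: 1.0, 7: 1.0}
acc = derived_accuracy(theta)
cells = cell_probabilities(theta)
print(acc)
print(cells)
```

Under conditional independence, (4) = (5) and (6) = (7), so that Se2 and Sp2 reduce to these common values.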


Problem: on which parameters to put a prior distribution?

• Two cases:

• Expert(s) 1: Se1 between 0.3 and 0.7, Sp1 = 1; Se2 more than 0.8, Sp2 = 1; and assuming conditional independence.

• Expert(s) 2: no opinion on Se1, Sp1 = 1; no opinion on Se2, Sp2 = 1; probability (4) more than 0.9, probability (5) less than 0.1 (visual carcass inspection result is highly correlated with the tongue palpation).

• WINBUGS programs: prevalence 2 tests_2.odc & prevalence 2 tests_3.odc


Remarks

• The constraints Sp1 = 1, Sp2 = 1 imply that certain probabilities drop from the model.

• To verify which of the two “experts’ opinions” are consistent with the data, the Deviance Information Criterion (DIC) can be used.

• The effective number of parameters that are estimated when the parameters are subject to probabilistic constraints could be obtained from pD.


WINBUGS program

• To compare the two models (expert 1 & expert 2), we invoke the DIC option for each WB run. This allows us to choose the best model (prior + likelihood) and to estimate the effective number of parameters.

• Three chains were run; estimation was done on 4000 iterations in each chain, after a burn-in of 10,000 iterations per chain (G-R diagnostics were monitored graphically).


WINBUGS output

• The following tables were obtained:

Expert 1

node mean sd MC error 2.5% median 97.5% start sample

prev 0.1455 0.01198 1.111E-4 0.1229 0.1452 0.1698 10001 12000

sens1 0.6926 0.007363 1.14E-4 0.673 0.6949 0.6998 10001 12000

sens2 0.9548 0.02186 2.518E-4 0.9043 0.9583 0.9871 10001 12000

Expert 2

node mean sd MC error 2.5% median 97.5% start sample

prev 0.3417 0.1807 0.007197 0.1692 0.2699 0.8597 10001 12000

sens1 0.4727 0.1787 0.006841 0.1532 0.4913 0.7681 10001 12000

sens2 0.4938 0.1833 0.007082 0.1611 0.5167 0.7838 10001 12000


• Results on DIC and pD:

            Dbar      Dhat      pD        DIC
Expert 1    54.912    53.125    1.787     56.698
Expert 2    16.415    21.498    -5.083    11.332

• The Bayesian P-value was 0 for the first model; for the second model P = 0.51.


Remarks

• The value of pD is negative for the second model. A possible reason is the non-log-concavity of the likelihood. However, when looking at the expression for pD one can see that only the posterior mean of the multinomial probabilities is needed. Actually, these probabilities are non-linear functions of the parameters. Therefore, one should evaluate DIC and pD at the posterior mean of the multinomial probabilities and not at the posterior mean of the parameters of the model. However, this calculation needs to be done outside of WB, using the deviance obtained from WB. We have written an S+-program “DIC and pD for 2 diagnostic tests.ssc” to do this.

• We have recalculated the DIC and pD for the two models. For the first model we obtain: DIC = 56.71 and pD = 1.78. For the second model we obtain: DIC = 19.16 and pD = 2.78.
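For reference, the standard definitions are pD = Dbar - Dhat and DIC = Dbar + pD, with Dbar the posterior mean of the deviance and Dhat the deviance evaluated at a plug-in estimate. The short sketch below (hypothetical function name) applies them to the Dbar and Dhat values reported by WB; it reproduces the WB table, including the negative pD for expert 2, but not the recalculated values, which require Dhat at the posterior mean of the multinomial probabilities.

```python
def dic_summary(dbar, dhat):
    """Return (pD, DIC) from the posterior mean deviance Dbar and the
    plug-in deviance Dhat."""
    p_d = dbar - dhat
    return p_d, dbar + p_d


for label, dbar, dhat in [("Expert 1", 54.912, 53.125),
                          ("Expert 2", 16.415, 21.498)]:
    p_d, dic = dic_summary(dbar, dhat)
    print(f"{label}: pD = {p_d:.3f}, DIC = {dic:.3f}")
```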


• The results imply the following:

• Models 1 and 2 show a clear difference in the estimates for the prevalence

and sensitivity & specificity.

• The value of DIC is much higher for the first model than for the second model,

which implies that the second model fits the data better. This is also reflected

in the Bayesian P-value.


4.9.2.6. More than 2 diagnostic tests

• The above approaches can be extended to more than 2 diagnostic tests.

• In the previous example two other tests, Ag-ELISA and Ab-ELISA, were also performed. The model that uses expert 2 prior information and some strong priors on the specificity of the latter two tests has the lowest DIC (= 70.25) with a reasonable pD = 9.98 (using the S+-program “DIC and pD for 4 diagnostic tests.ssc”) compared to six other models, among which were the models with a prior belief as expert 1.


Advantages of using a Bayesian approach

• There are more parameters than the data allow to be estimated. Therefore constraints on the parameters are needed. The constraints can be deterministic (like S = 1), but prior information is rarely clear-cut and therefore probabilistic constraints (like those encountered above) are more realistic.

• In conclusion, the Bayesian approach is a logical choice.


4.9.2.7. Exercises

• Replay the analyses using the above indicated WB-programs.


4.10. Modeling extra-Poisson variation

4.10.1. Description of the problem

• A longitudinal dental study, performed in Flanders, examined the teeth of about 5000

schoolchildren. At yearly intervals, starting at the age of seven, the dmft-score,

which is the sum of the number of decayed, missing and filled primary teeth, was

determined.

• Here we examine this score in the first year and for a subgroup of 276 children, randomly selected from the total study group.

• The dmft-score is a count and thus a candidate for having a Poisson distribution. However, when one tooth is decayed there is a high probability that the neighbouring teeth also become decayed when not treated quickly. This creates a correlation between the 0s and 1s that are added up to create the count.


• The mean dmft-score is equal to 2.56, the variance equals 10.0. Hence there is clearly extra-Poisson variation (see also figure below).

[Figure: histogram of the dmft-scores; horizontal axis dmft (0 to 20), vertical axis from 0.0 to 0.5.]


4.10.2. Candidate models

• With WINBUGS it is relatively easy to fit a variety of models:

• Model 1: Poisson model with θ the mean of the distribution.

• Model 2: Poisson-Gamma model (see WB-Manual Examples – Vol 1, example 2). Hence θ has a Gamma(α,β) distribution.

• Model 3: Zero-inflated Poisson model, which is a mixture of a Poisson model and a degenerate distribution at 0.

• Model 4: mixture of 4 Poisson distributions. This model is inspired by a

C.A.M.A.N. analysis showing the need for 4 Poisson distributions.

• Model 5: mixture of 4 Poisson-Gamma models. This is the generalization of

the previous mixture.


4.10.3. WINBUGS programs

• The file “poisson model x on dmft scores.odc” contains the WB program for model x.

• For all models we calculated the deviance, and when possible also DIC and pD.

• For all models we took 3 parallel chains with a burn-in of at least 2000 iterations for each chain. 2000 extra iterations were run to estimate the parameters and to calculate DIC and pD. The Brooks-Gelman-Rubin diagnostics were used to assess convergence.


4.10.4. Results

Model 1: Poisson model

node mean sd MC error 2.5% median 97.5% start sample

deviance 1594.0 1.439 0.02037 1593.0 1593.0 1598.0 2001 6000

theta 2.564 0.09815 0.001194 2.372 2.562 2.761 2001 6000

Dbar Dhat pD DIC

1593.610 1592.620 0.997 1594.610

• The Poisson models corresponding to values from the 95% C.I. show a bad fit to the

histogram.


Model 2: Poisson-Gamma model

node mean sd MC error 2.5% median 97.5% start sample

deviance 753.9 23.88 0.6142 709.5 753.2 802.4 4001 6000

alpha 0.5486 0.07184 0.003158 0.4195 0.5452 0.6992 4001 6000

beta 0.2137 0.03407 0.001417 0.1522 0.2117 0.2845 4001 6000

Dbar Dhat pD DIC

753.901 613.968 139.933 893.834

• The gamma distribution G(0.55,0.21) is shown on next page. Further, a sample of

the corresponding Poisson-Gamma distribution shows a fairly good fit to the original

histogram of dmft-scores.
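A quick plausibility check of the fitted Poisson-Gamma model is to compute its marginal mean and variance. The sketch below assumes the BUGS parameterization Gamma(shape α, rate β), for which the marginal mean is α/β and the marginal variance is α/β + α/β²; the function name is hypothetical and the posterior means of alpha and beta are plugged in for simplicity.

```python
def poisson_gamma_moments(alpha, beta):
    """Marginal mean and variance of Y when Y | theta ~ Poisson(theta)
    and theta ~ Gamma(shape=alpha, rate=beta)."""
    mean = alpha / beta
    variance = mean + alpha / beta ** 2   # E[Var(Y|theta)] + Var(E[Y|theta])
    return mean, variance


# Posterior means of alpha and beta from the Model 2 table.
mean, var = poisson_gamma_moments(0.5486, 0.2137)
print(f"mean = {mean:.2f}, variance = {var:.2f}, ratio = {var / mean:.2f}")
```

The implied mean is close to the observed mean of 2.56, and the implied variance is far above the mean, in line with the extra-Poisson variation seen in the data (observed variance 10.0).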

[Figure, three panels: (1) posterior Gamma distribution, density against Poisson mean; (2) observed dmft-scores; (3) sampled dmft-scores.]

Model 3: Zero-inflated Poisson model

node mean sd MC error 2.5% median 97.5% start sample

deviance 840.2 13.2 0.1775 820.6 838.5 872.7 2001 6000

p 0.4133 0.02975 3.967E-4 0.3553 0.4135 0.4726 2001 6000

theta 4.369 0.1678 0.00213 4.043 4.368 4.705 2001 6000

• Observe that DIC and pD cannot be determined for a mixture distribution.

• According to the deviance this model is inferior to the Gamma-Poisson mixture model.
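The fitted zero-inflated Poisson model can be checked in the same way. The sketch below assumes that p is the probability of a structural zero (the degenerate component) and θ the mean of the Poisson component, so that the mean is (1 - p)θ, the variance (1 - p)θ(1 + pθ) and P(Y = 0) = p + (1 - p)exp(-θ). The function name is hypothetical and the posterior means are plugged in.

```python
import math


def zip_summary(p, theta):
    """Mean, variance and P(Y = 0) of a zero-inflated Poisson model.

    p: assumed probability of a structural zero; theta: Poisson mean.
    """
    mean = (1 - p) * theta
    variance = mean * (1 + p * theta)
    p_zero = p + (1 - p) * math.exp(-theta)
    return mean, variance, p_zero


# Posterior means of p and theta from the Model 3 table.
zip_mean, zip_var, zip_p0 = zip_summary(0.4133, 4.369)
print(f"mean = {zip_mean:.2f}, variance = {zip_var:.2f}, P(Y=0) = {zip_p0:.3f}")
```

Under this reading the model reproduces the observed mean of 2.56 but implies a variance below the observed 10.0, which fits with its larger deviance.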


Model 4: mixture of 4 Poisson distributions

node mean sd MC error 2.5% median 97.5% start sample

deviance 750.7 26.74 0.8008 701.1 749.6 805.0 8001 12000

theta[1] 0.1577 0.09243 0.004099 0.009939 0.1524 0.3549 8001 12000

theta[2] 1.558 0.9824 0.06113 0.1602 1.516 3.742 8001 12000

theta[3] 3.989 1.515 0.095 1.423 3.929 6.539 8001 12000

theta[4] 6.546 0.614 0.02318 5.53 6.483 7.965 8001 12000

• Observe that DIC and pD cannot be determined for a mixture distribution.

• According to the deviance this model is a competitor to the Gamma-Poisson mixture

model. This also shows when the histogram of dmft-scores is compared to a sample

of this distribution.


• Based on the posterior median values, the four Poisson components are given below:

[Figure, four panels (1st to 4th component): probability against dmft score for each Poisson component.]

Model 5: mixture of 4 Poisson-Gamma distributions

node mean sd MC error 2.5% median 97.5% start sample

deviance 750.7 26.45 0.6036 699.5 750.3 803.3 20001 12000

theta[1] 0.1693 0.0915 0.003451 0.01351 0.1666 0.3587 20001 12000

theta[2] 1.769 0.9542 0.05261 0.2249 1.701 3.843 20001 12000

theta[3] 4.107 1.521 0.08626 1.45 4.067 6.724 20001 12000

theta[4] 6.597 0.6411 0.02383 5.589 6.52 8.125 20001 12000

• Observe that DIC and pD cannot be determined for a mixture distribution.

• The improvement is little with respect to the Poisson-Gamma model.


4.10.5. Advantages of using a Bayesian approach

• The advantage of using a Bayesian approach lies here more in the use of a flexible software package which allows fitting different and sometimes complex models relatively easily.

• In conclusion, the Bayesian approach is partly a practical choice.


4.10.6. Exercises

• Replay the analyses using the above indicated WB-programs.


4.11. Examining a geographical trend in caries experience

4.11.1. Description of the problem

• In the Signal Tandmobiel study of 4.10 we examined the geographical spread of caries experience as measured by the dmft-score of the first year’s results.

• Now we evaluate a subgroup of 600 children, randomly selected from the total study group.

• Besides the epidemiological dental data measured in schools, 3 calibration exercises were also done in separate meetings. Sixteen dental examiners were involved in the total study. In the calibration exercises the examiners assigned dmft-scores to children involved in the exercise and these scores can be compared to the scores of the reference examiner (≈ gold standard).


• The caries experience in Flanders (1st year’s results) can be geographically represented by the following map (based on ordinal score of dmft (split up in 3 classes)):

[Figure: map of Flanders showing West Flanders, East Flanders, Brabant, Antwerp and Limburg. Key: [0.49, 1.10], (1.10, 1.40], (1.40, 2.25].]

• An East-West gradient in caries experience seems to be present, with more caries in

the East.

• However, when we geographically present the scoring behavior of the dental examiners according to the schools in which they examined, we see the same trend:

[Figure: map of Flanders showing West Flanders, East Flanders, Brabant, Antwerp and Limburg. Key: (-15%, -5%], (-5%, 5%], (5%, 18%).]

Question:

Is the East-West trend due to a genuine geographical trend in caries experience?

Or is it due to the bias in scoring of the dental examiners?


4.11.2. Considered ordinal logistic regression models

• Split dmft-score into 4 ordinal classes of Y: Y = 1 if dmft = 0, Y = 2 if dmft = 1, Y = 3 if 1 < dmft < 4, Y = 4 if dmft > 4.

• Ordinal logistic model with Y as response and x- & y-co-ordinate of the school (and other child-specific variables) as regressors. The result is a significant trend (in Bayesian sense) in the x-direction. School is also included as a random effect.

• Previous ordinal logistic model supplemented with indicator variable for each examiner. The result is a much reduced effect in the x-direction.

• First ordinal logistic model whereby the score of the dental examiner is corrected for possible examiner bias with the correction factor estimated from the calibration data.


Logistic regression model corrected for examiner bias

Binary case (Y = 0 or 1 (no caries / caries))

• Denote the score from the dental examiner by $Y_E$ and the (true) score by $Y_G$, then

$P(Y_E = 1 \mid x) = \gamma_0 + (1 - \gamma_0 - \gamma_1)\, P(Y_G = 1 \mid x)$, (1)

whereby $\gamma_0 = P(Y_E = 1 \mid Y_G = 0)$ and $\gamma_1 = P(Y_E = 0 \mid Y_G = 1)$. Thus $\gamma_0$ is (1-specificity) and $\gamma_1$ is (1-sensitivity) for the dental examiner (vis-à-vis the gold standard).

• If for the gold standard $P(Y_G = 1 \mid x)$ is a (binary) logistic regression model, then for the dental examiner the relationship is $P(Y_E = 1 \mid x) = \gamma_0 + (1 - \gamma_0 - \gamma_1)\, F(x^T \beta)$, whereby

$F(x) = \dfrac{\exp(x)}{1 + \exp(x)}$.
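Equation (1) can be evaluated directly once γ0, γ1 and the regression coefficients are given. The Python sketch below is illustrative only: the function names, the covariate vector and all parameter values are hypothetical and are not taken from the WB programs.

```python
import math

def logistic(z):
    """Standard logistic function F(z) = exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))

def examiner_prob(x, beta, gamma0, gamma1):
    """P(Y_E = 1 | x) = gamma0 + (1 - gamma0 - gamma1) * F(x' beta)."""
    linear = sum(xi * bi for xi, bi in zip(x, beta))
    return gamma0 + (1.0 - gamma0 - gamma1) * logistic(linear)

# Hypothetical values, for illustration only.
x = [1.0, 0.5]       # intercept and one covariate
beta = [-1.0, 2.0]   # x' beta = 0, so the gold-standard probability is 0.5
print(round(examiner_prob(x, beta, gamma0=0.05, gamma1=0.20), 4))
```

With γ0 = γ1 = 0 the function reduces to the ordinary logistic model for the gold standard.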


• The parameters γ0 and γ1 can be estimated from a table like

                 gold standard
                 0      1
examiner   0     m00    m01
           1     m10    m11
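As a rough illustration of how such a table yields γ0 and γ1, the sketch below uses simple column proportions, reading the first index of m as the examiner's score and the second as the gold standard. This is a simplification assumed here for illustration; the tutorial estimates the correction parameters within the Bayesian model, and the counts are hypothetical.

```python
def misclassification_rates(m00, m01, m10, m11):
    """Rows: examiner score (first index); columns: gold standard (second index)."""
    gamma0 = m10 / (m00 + m10)  # P(Y_E = 1 | Y_G = 0), i.e. 1 - specificity
    gamma1 = m01 / (m01 + m11)  # P(Y_E = 0 | Y_G = 1), i.e. 1 - sensitivity
    return gamma0, gamma1

# Hypothetical calibration counts.
gamma0, gamma1 = misclassification_rates(m00=45, m01=4, m10=5, m11=16)
print(round(gamma0, 3), round(gamma1, 3))
```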


Ordinal case (Y = 1, 2, 3, 4)

• Replace binary logistic regression model by ordinal logistic regression model.

• The corrected model becomes

$P(Y_E = y_E \mid x) = \sum_{y_G = 1}^{4} \pi(y_E \mid y_G, \phi)\, F_{y_G}(x^T \beta)$,

whereby $F_{y_G}(x^T \beta)$ is the ordinal logistic regression model for the gold standard predicting $Y_G = y_G$, and $\pi(y_E \mid y_G, \phi)$ the probability of scoring $Y_E = y_E$ given $Y_G = y_G$.

• Typically validation sets (and calibration exercises) are rather small in nature, yielding sparse data. That is why the probability $\pi(y_E \mid y_G, \phi)$ is also modelled.
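The corrected ordinal model is a weighted sum over the gold-standard classes. The sketch below shows that sum in Python under stated assumptions: the cumulative-logit parameterization P(Y_G <= k | x) = F(c_k - x'β), the cutpoints, the linear predictor and the misclassification matrix are all hypothetical choices, since the slides do not fix them.

```python
import math

def logistic(z):
    """F(z) = exp(z) / (1 + exp(z))."""
    return 1.0 / (1.0 + math.exp(-z))

def gold_class_probs(eta, cutpoints):
    """P(Y_G = k | x) for k = 1..K under P(Y_G <= k | x) = F(c_k - eta)."""
    cumulative = [logistic(c - eta) for c in cutpoints] + [1.0]
    probs = [cumulative[0]]
    for k in range(1, len(cumulative)):
        probs.append(cumulative[k] - cumulative[k - 1])
    return probs

def examiner_class_probs(gold_probs, pi):
    """P(Y_E = e | x) = sum over g of pi[g][e] * P(Y_G = g | x)."""
    n = len(gold_probs)
    return [sum(pi[g][e] * gold_probs[g] for g in range(n)) for e in range(n)]

# Hypothetical inputs for 4 ordinal classes.
cutpoints = [-1.0, 0.0, 1.5]   # increasing thresholds c_1 < c_2 < c_3
eta = 0.3                      # linear predictor x' beta
pi = [                         # pi[g][e] = P(Y_E = e+1 | Y_G = g+1); rows sum to 1
    [0.90, 0.10, 0.00, 0.00],
    [0.10, 0.80, 0.10, 0.00],
    [0.00, 0.10, 0.80, 0.10],
    [0.00, 0.00, 0.10, 0.90],
]
gold = gold_class_probs(eta, cutpoints)
scored = examiner_class_probs(gold, pi)
print([round(p, 4) for p in gold])
print([round(p, 4) for p in scored])
```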


4.11.3. Bayesian approach to estimate the regression parameters in the geographical dental study

• Our model consists of two parts:

1. Ordinal logistic regression model for the ordinal caries scores of the 600 children participating in the main part of the study.

2. A model for the (sparse) calibration data. Several models were tried out. The model used here assumes that the dental examiner’s score is located for all values of the score in-between the worst scorer and the gold standard and that in the same way. That is: dental examiner j = w_j gold standard + (1-w_j) worst scorer.

• The prior distributions for the parameters were taken as non-informative.
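One possible reading of the rule "dental examiner j = w_j gold standard + (1-w_j) worst scorer" is a convex combination of misclassification matrices, with the gold standard represented by the identity matrix. That reading, the worst-scorer matrix and the weight used below are assumptions made for illustration; the WB program may encode the rule differently.

```python
def examiner_matrix(w, worst):
    """Mix the identity (gold standard) with the worst scorer's matrix.

    worst[g][e] is a hypothetical P(Y_E = e | Y_G = g) for the worst scorer.
    """
    n = len(worst)
    return [
        [w * (1.0 if g == e else 0.0) + (1.0 - w) * worst[g][e] for e in range(n)]
        for g in range(n)
    ]

# Hypothetical worst-scorer matrix for 4 ordinal classes (rows sum to 1).
worst = [
    [0.4, 0.3, 0.2, 0.1],
    [0.3, 0.4, 0.2, 0.1],
    [0.1, 0.2, 0.4, 0.3],
    [0.1, 0.2, 0.3, 0.4],
]
pi_j = examiner_matrix(0.6, worst)
for row in pi_j:
    print([round(p, 3) for p in row])
```

A weight w_j close to 1 places the examiner near the gold standard; a weight close to 0 places the examiner near the worst scorer.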


4.11.4. WINBUGS programs

• In the file “geographical dental study.odc” two WB programs are given:

1. Bayesian ordinal random logistic regression model with covariates age and gender of the child, the x- and y-co-ordinates of the child’s school and a normal random school effect.

2. Previous model augmented with the correction for examiner bias, whereby the correction parameters are estimated from the calibration exercises taking into account the variability with which the correction parameters are determined. Actually 2 parallel Markov chains are run, one for the epidemiological data and one for the calibration data.

• Observe that the WB function cut has been used to ensure that the γ-parameters are only estimated from the calibration data.


4.11.5. Results

• Ordinal logistic regression without correction:

• 5 chains, with burn-in part of 500x5 iterations. Convergence diagnostics (using BOA) were applied on the next 500x5 iterations. Estimation was done on next 1000x5 iterations.

• Ordinal logistic regression with correction:

• 5 chains, with burn-in part of 1000x5 iterations. Convergence diagnostics (using BOA) were applied on next 1000x5 iterations. Estimation was done on next 2000x5 iterations.

• DIC and pD were calculated for all models. Bayesian P-values to indicate the significance of each regressor in the Bayesian sense were also calculated.
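A Bayesian P-value of this kind can be estimated from MCMC output as the share of posterior draws on one side of zero. The sketch below assumes the definition P(parameter < 0 | data); that definition and the simulated draws are assumptions for illustration, not output of the WB programs.

```python
import random

def tail_probability(draws):
    """Fraction of posterior draws below zero: an estimate of P(parameter < 0 | data)."""
    return sum(1 for d in draws if d < 0) / len(draws)

# Hypothetical stand-in for MCMC output; real draws would come from WinBUGS/CODA.
random.seed(1)
draws = [random.gauss(0.2, 0.085) for _ in range(10000)]
p_value = tail_probability(draws)
print(round(p_value, 4))
```

Values close to 0 or to 1 indicate that nearly all posterior mass lies on one side of zero.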


Ordinal logistic regression without correction:

Table of estimates

• Logistic regression parameters

node       mean       sd       MC error   2.5%      median     97.5%    start  sample
lambda[1]  -0.2024    0.1195   0.004108   -0.4304   -0.2023    0.03483  5001   10000
pvalue     0.9545     0.2084   0.004954   0.0       1.0        1.0      5001   10000
lambda[2]  0.2403     0.1188   0.004059   0.01441   0.2394     0.4783   5001   10000
pvalue     0.0183     0.134    0.002541   0.0       0.0        0.0      5001   10000
lambda[3]  1.312      0.1316   0.003908   1.059     1.311      1.576    5001   10000
pvalue     0.0        0.0      4.472E-13  0.0       0.0        0.0      5001   10000


node       mean       sd       MC error   2.5%      median     97.5%    start  sample
x-coord    0.2023     0.0849   0.001173   0.03893   0.2        0.3724   5001   10000
pvalue     0.0082     0.09018  8.914E-4   0.0       0.0        0.0      5001   10000
y-coord    0.00372    0.08503  0.001159   -0.162    0.002493   0.1724   5001   10000
pvalue     0.4884     0.4999   0.006431   0.0       0.0        1.0      5001   10000
gender     -0.02628   0.1549   0.004162   -0.3295   -0.02743   0.2766   5001   10000
pvalue     0.5673     0.4955   0.01141    0.0       1.0        1.0      5001   10000
age        0.2522     0.1981   0.002203   -0.1351   0.2503     0.6359   5001   10000
pvalue     0.1033     0.3044   0.003283   0.0       0.0        1.0      5001   10000
sigma²     0.08498    0.08339  0.004742   0.008016  0.05602    0.3146   5001   10000
pvalue     0.0        0.0      4.472E-13  0.0       0.0        0.0      5001   10000

• We observe that there is a significant increase in caries experience in the x-direction

implying an increase of caries towards the east of Flanders. However, it is not clear whether this

is due to a genuine trend or due to the scoring behavior of the dental examiners.


• Model diagnostics

         Dbar       Dhat       pD       DIC
lambda   17.604     17.600     0.004    17.608
z        1498.480   1478.890   19.598   1518.080
total    1516.090   1496.490   19.602   1535.690
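The columns of the DIC output are linked by pD = Dbar - Dhat and DIC = Dbar + pD. The short Python check below reproduces the lambda row from its Dbar and Dhat values; the function name is hypothetical, and small differences in other rows can arise because the printed inputs are rounded.

```python
def dic_summary(dbar, dhat):
    """Return (pD, DIC) with pD = Dbar - Dhat and DIC = Dbar + pD."""
    p_d = dbar - dhat
    return p_d, dbar + p_d

# Dbar and Dhat of the lambda row in the model diagnostics above.
p_d, dic = dic_summary(17.604, 17.600)
print(round(p_d, 3), round(dic, 3))
```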


Ordinal logistic regression with correction:

Table of estimates

• Logistic regression parameters

node       mean       sd       MC error   2.5%      median     97.5%    start  sample
lambda[1]  -0.2381    0.1667   0.003723   -0.5726   -0.2367    0.08698  10001  20000
pvalue     0.927      0.2602   0.003799   0.0       1.0        1.0      10001  20000
lambda[2]  0.2466     0.1577   0.003576   -0.06584  0.2475     0.5539   10001  20000
pvalue     0.0596     0.2367   0.00389    0.0       0.0        1.0      10001  20000
lambda[3]  1.281      0.1817   0.003544   0.9266    1.28       1.642    10001  20000
pvalue     0.0        0.0      3.162E-13  0.0       0.0        0.0      10001  20000


node       mean       sd       MC error   2.5%      median     97.5%    start  sample
x-coord    0.2199     0.09172  0.001731   0.0455    0.2193     0.3996   10001  20000
pvalue     0.0056     0.07462  8.824E-4   0.0       0.0        0.0      10001  20000
y-coord    -0.002996  0.0935   0.001782   -0.1864   -0.004425  0.1772   10001  20000
pvalue     0.5185     0.4997   0.00954    0.0       1.0        1.0      10001  20000
gender     -0.02264   0.1897   0.004897   -0.3945   -0.0213    0.3538   10001  20000
pvalue     0.5461     0.4979   0.01134    0.0       1.0        1.0      10001  20000
age        0.2939     0.2188   0.003699   -0.1356   0.2958     0.7282   10001  20000
pvalue     0.08665    0.2813   0.004691   0.0       0.0        1.0      10001  20000
sigma2     0.09045    0.1015   0.005273   0.00354   0.05258    0.3679   10001  20000
pvalue     0.0        0.0      3.162E-13  0.0       0.0        0.0      10001  20000


Calibration parameters (first 8 dental examiners)

node   mean     sd       MC error   2.5%      median   97.5%    start  sample
w[1]   0.3838   0.1751   0.001604   0.05077   0.3864   0.7147   10001  20000
w[2]   0.5591   0.1717   0.001522   0.1802    0.5768   0.8416   10001  20000
w[3]   0.2977   0.1938   0.001606   0.0146    0.2723   0.7111   10001  20000
w[4]   0.4359   0.1777   0.00149    0.07907   0.4426   0.7567   10001  20000
w[5]   0.3566   0.1768   0.001738   0.038     0.3552   0.6991   10001  20000
w[6]   0.7325   0.1427   0.001323   0.3901    0.7548   0.9419   10001  20000
w[7]   0.6989   0.1885   0.001884   0.2448    0.7343   0.9585   10001  20000
w[8]   0.6325   0.1635   0.001467   0.2597    0.654    0.8884   10001  20000


• Model diagnostics

         Dbar       Dhat       pD       DIC
M        163.002    148.499    14.503   177.504
lambda   17.602     17.590     0.013    17.615
z        1505.840   1484.610   21.232   1527.070
total    1686.440   1650.690   35.748   1722.190

• The significant increase in caries experience in the x-direction remains after correction for examiner bias. The model expresses now the relationship between caries experience (as if scored only by the gold standard) and x-coordinate, y-coordinate, gender and age + random effect for school.


4.11.6. Advantages of using a Bayesian approach

• There were two reasons to choose a Bayesian approach here:

• The sixteen dental examiners will also be involved in a subsequent dental study. The Bayesian approach yields posterior distributions of the correction terms which can be used in a subsequent study as priors.

• The WINBUGS program allows taking into account the variability with which

the correction parameters are estimated in a relatively simple manner, without

analytic calculations.


4.11.7. Exercises

• Replay the analyses using the above indicated WB-program.


4.12. A spatial epidemiological analysis on lip cancer in East-Germany

4.12.1. Description of the problem

• The risk factors for lip cancer are: smoking, exposure to sunlight and excessive consumption of alcohol.

• Mortality from lip cancer in former East Germany has been under surveillance since 1961. The data for the male population has been described in Möhner, Stabenow and Eisinger (Atlas der Krebsinzidenz in der DDR 1961-1989). We analyze the data from the period 1980-1989.

• In total there were 219 sub-regions (see Chapter 7) but here some have been combined to give 195 sub-regions.


• It has been hypothesized that the mortality rates tend to cluster geographically and

that the risk of the disease among those engaged in occupations such as farming,

fishing and forestry is twice as high as the risk of those not engaged in such occupa-

tions. The variable AFF represents the percentage of the population in the sub-region

involved in farming, fishing and forestry.


4.12.2. Descriptive maps

When a map has been created before (see Geobugs Manual) observed data as well

as output from a Bayesian analysis can be displayed in a spatial manner. The map

“Germany.map” is needed here, and should be included in the directory of maps.

• We show on the next page the map of East-Germany displaying the standardized

mortality rates according to the 195 sub-regions.

• Other maps, e.g. based on measured covariates can also be displayed.

• From the disease map a clear North-South gradient in relative risk is observed.


[Map of East-Germany: values for smr by sub-region. Legend (number of sub-regions in brackets): (49) < 0.79; (50) 0.79 - 1.05; (48) 1.05 - 1.58; (48) >= 1.58. Scale bar: 100.0 km.]


4.12.3. Candidate models for disease mapping

• As seen in Chapter 7, the disease map based on the SMRs does not provide a good basis for spatial inference. The reason for this is that the SMRs are based on the data of the sub-region only and hence show too much variability. For this reason one needs

to base the disease map on a statistical model which takes into account that the SMR

of a sub-region is related to the SMRs of other (neighboring) sub-regions.

• On the next page we consider some possible models.
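The variability of the raw SMRs can be made concrete with a small calculation. The sketch below assumes the usual Poisson approximation, in which SMR = O/E has standard error of about sqrt(O)/E; the counts are hypothetical and the function name is illustrative.

```python
import math

def smr_with_se(observed, expected):
    """Return SMR = O / E and its approximate standard error sqrt(O) / E."""
    smr = observed / expected
    se = math.sqrt(observed) / expected
    return smr, se

# Hypothetical sub-regions with the same SMR but different sizes.
for observed, expected in [(4, 2.0), (40, 20.0), (400, 200.0)]:
    smr, se = smr_with_se(observed, expected)
    print(observed, expected, round(smr, 2), round(se, 3))
```

The three sub-regions share the same SMR, yet the smallest one has a standard error ten times larger than the largest, which is why small sub-regions tend to dominate the extremes of an SMR map.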


• The following models will be considered:

• Model 1: Poisson-Gamma model: this model has been considered in Chapter 7, but for the 219 sub-regions.

• Model 2: Poisson-lognormal model: this model allows including covariate information in an easier way.

• Model 3: Besag, York and Mollié model: in this model for relative risks, the area-specific random effects are decomposed into correlated and uncorrelated heterogeneity


4.12.4. Model 1: Poisson-Gamma model

• The model is the same as given in Chapter 7, i.e. for the jth sub-region we assume that

$O_j \sim \mathrm{Poisson}(E_j \theta_j)$ (Poisson part of the model)

with $O_j$ the observed counts, $E_j$ the expected counts (calculated from a reference distribution), and $\theta_j$ the true (underlying) relative risks.

Further, it is assumed that

$\theta_j \sim \mathrm{Gamma}(\alpha, \beta)$ (Gamma part of the model)

with α and β the parameters of the prior distribution of the relative risks.
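For fixed α and β (shape and rate), the Poisson-Gamma model is conjugate: the posterior of the relative risk is Gamma(α + O_j, β + E_j), and its mean is a weighted average of the SMR and the prior mean α/β. The sketch below illustrates this under that simplifying assumption; in the WB program α and β receive their own priors, and the numbers used here are hypothetical.

```python
def posterior_mean_risk(observed, expected, alpha, beta):
    """Posterior mean of theta_j given fixed alpha (shape) and beta (rate)."""
    return (alpha + observed) / (beta + expected)

def smr_weight(expected, beta):
    """Weight given to the SMR in the posterior mean; the prior mean gets 1 - weight."""
    return expected / (beta + expected)

# Hypothetical sub-region and hyperparameter values.
observed, expected, alpha, beta = 3, 1.5, 6.0, 5.0
w = smr_weight(expected, beta)
mean = posterior_mean_risk(observed, expected, alpha, beta)
print(round(observed / expected, 3), round(alpha / beta, 3), round(w, 3), round(mean, 3))
```

Sub-regions with small expected counts get a small weight on their own SMR and are pulled more strongly towards the overall mean.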


• The WB program is in “East-Germany Lip Mortality males Poisson-Gamma model.odc”. Three chains were initiated and convergence was checked after 10,000 iterations per chain. The Brooks-Gelman-Rubin diagnostic showed convergence. The estimates are based on extra 5,000 iterations per chain.

• Some of the output is given in the table below:

node    mean     sd        MC error   2.5%     median   97.5%    start  sample
mean    1.191    0.04566   7.505E-4   1.105    1.19     1.285    10001  15000
var     0.2468   0.04506   0.002104   0.1719   0.2425   0.3466   10001  15000
alpha   5.908    0.9032    0.04491    4.347    5.837    7.859    10001  15000
beta    4.97     0.8101    0.0402     3.577    4.908    6.707    10001  15000

• The value of pD is equal to 122.356 and DIC = 1125.810.


• The disease map based on the posterior mean of the relative risks (in quartiles) is

given below.


• As seen in Chapter 7, there is shrinkage of the relative risks compared to the SMRs.

• Further, the shrinkage implies a smoother disease map.


4.12.5. Model 2: Poisson-lognormal model

• The Poisson part of the model remains, thus $O_j \sim \mathrm{Poisson}(E_j \theta_j)$.

• But now it is assumed that

$\log(\theta_j) = \alpha + v_j$ (Lognormal part of the model),

with $v_j \sim N(0, \sigma_v^2)$.

• Including covariates into the Poisson-lognormal model is easily done. For instance, one could assume that the mean of $\log(\theta_j)$ varies with the covariate AFF. Namely,

$\alpha_j = \beta_0 + \beta_1 \times AFF_j$.

This is called an ecological analysis.
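On the log scale the covariate acts additively, so on the relative-risk scale it acts multiplicatively. The sketch below shows the implied relative risk and the risk ratio between two AFF levels; the coefficient values are hypothetical, and entering AFF as a proportion rather than a percentage is an assumption.

```python
import math

def relative_risk(aff, beta0, beta1, v=0.0):
    """theta_j = exp(beta0 + beta1 * AFF_j + v_j)."""
    return math.exp(beta0 + beta1 * aff + v)

def risk_ratio(aff_a, aff_b, beta1):
    """Ratio of relative risks for two AFF levels, all else being equal."""
    return math.exp(beta1 * (aff_a - aff_b))

# Hypothetical coefficients; AFF entered as a proportion (assumption).
beta0, beta1 = -0.3, 2.2
print(round(relative_risk(0.10, beta0, beta1), 3))
print(round(risk_ratio(0.20, 0.10, beta1), 3))
```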


• The map of AFF can also be produced with WB (see next page). A similar North-South trend in AFF is observed.

• We analyze here:

The impact of AFF on the relative risk.

The North-South gradient after taking into account AFF.

• The WB program is in “East-Germany Lip Mortality males Poisson-lognormal model with AFF.odc”. Three chains were initiated and convergence was checked after 10,000 iterations per chain. The Brooks-Gelman-Rubin diagnostic showed convergence. The estimates are based on extra 5,000 iterations per chain.

• The WB program slightly deviates from the model specified on the previous page in the sense that $v_j \sim N(\alpha, \sigma_v^2)$ or $v_j \sim N(\alpha_j, \sigma_v^2)$ to improve convergence.


• On the next two pages we show the map of the posterior mean of θj and of the

Bayesian residual subtracting the impact of AFF on θj.

• Here are some of the posterior estimates:

node       mean      sd        MC error   2.5%      median    97.5%     start  sample
beta0      -0.2868   0.06964   0.001309   -0.4254   -0.2866   -0.1514   10001  15000
beta1      2.234     0.3545    0.006719   1.537     2.232     2.935     10001  15000
sigmatau   0.3829    0.03103   7.798E-4   0.3251    0.3822    0.4456    10001  15000

• The second map shows that there is still a North-South gradient (although less pro-

nounced) after having taken into account the impact of AFF.


4.12.6. Model 3: Besag, York and Mollié model

• This model for relative risks decomposes the area-specific random effects into a

component that takes into account the effects that vary in a structured manner in

space (correlated heterogeneity) and a component that models the effects that

vary in an unstructured way between areas (uncorrelated heterogeneity).

• The model introduced by Clayton and Kaldor (1987) and developed by Besag et al.

(1991) is called the CAR model (conditional autoregressive model) and is given by

$O_j \sim \mathrm{Poisson}(E_j \theta_j)$,

$\log(\theta_j) = \alpha + u_j + v_j$,

where α is an overall level of the relative risk, uj is the correlated heterogeneity and

vj is the uncorrelated heterogeneity. The u and v parameters are surrogates for un-

known or unobserved covariates.


• The assumptions are:

for the uncorrelated heterogeneity: $v_j \sim N(0, \tau_v^2)$

for the correlated heterogeneity:

$[u_j \mid u_k, k \neq j, \tau_u^2] \sim N(\bar{u}_j, \tau_j^2)$

$\bar{u}_j = \dfrac{1}{\sum_k \omega_{kj}} \sum_k u_k\, \omega_{kj}$

$\omega_{kj} = 1$, if k, j are adjacent and 0, if not

• Parameters $\tau_v^2$ and $\tau_u^2$ control the variability of v and u. When $\tau_v^2$ is close to zero there is practically no unstructured variability, a high value means substantial but unstructured extra-Poisson variability. When $\tau_u^2$ is close to zero there is practically no spatial variation, a high value means a large structured variation.
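With 0/1 adjacency weights the conditional mean of a spatial effect is simply the average of its neighbours' effects. The sketch below computes that mean from an adjacency list; the conditional variance τ_u² divided by the number of neighbours is the usual intrinsic CAR choice and is an assumption here, since the slide only writes τ_j². All values and names are hypothetical.

```python
def car_conditional(j, u, neighbours, tau2_u):
    """Conditional mean and variance of u[j] given its neighbours (0/1 weights)."""
    nbrs = neighbours[j]
    mean = sum(u[k] for k in nbrs) / len(nbrs)
    variance = tau2_u / len(nbrs)  # assumed form of tau_j^2
    return mean, variance

# Hypothetical map with three sub-regions; region 1 borders regions 2 and 3.
u = {1: 0.2, 2: -0.1, 3: 0.5}
neighbours = {1: [2, 3], 2: [1], 3: [1]}
mean_1, var_1 = car_conditional(1, u, neighbours, tau2_u=0.3)
print(round(mean_1, 3), round(var_1, 3))
```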


• As prior distributions for $\tau_v^2$ and $\tau_u^2$ gamma distributions were taken.

• To fit the model in WINBUGS it is necessary to include a file containing an adjacency matrix which can be generated using the Adjacency Tool from the Map menu. The weights must also be entered as data.

• The WB function to specify the structured heterogeneity is the car.normal function. A more robust version of this function (using the double exponential function) is given by the car.l1 function.

• The CAR model is improper (the overall mean of the uj’s is not defined). Therefore one must impose a constraint to ensure that the model is identifiable; Besag and Kooperberg (1995) show that constraining the random effects to sum to zero and specifying a separate intercept term with a location invariant Uniform[-∞,∞] prior is equivalent to the unconstrained parameterization with no intercept. The improper (flat) prior is obtained from the dflat() function.

• The WB program is in “East-Germany Lip Mortality males CAR model.odc”. Two chains were initiated and convergence was checked after 10,000 iterations per chain. The Brooks-Gelman-Rubin diagnostic showed convergence. The estimates are based on extra 5,000 iterations per chain.

• Here are some of the posterior estimates:

node                mean       sd       MC error   2.5%      median     97.5%    start  sample
deviance            92.66      8.558    0.106      76.64     92.39      110.2    5001   10000
sigma2.area.struct  0.2853     0.1295   0.008849   0.08086   0.269      0.5579   5001   10000
sigma2.area.unstr   0.05421    0.02989  0.002096   0.006869  0.05293    0.1152   5001   10000


• From the above output we can conclude that $\tau_v^2$ is much smaller than $\tau_u^2$, indicating little unstructured variation and a much clearer spatial dependence of the sub-regions.

• The value of pD is equal to 105.4 and DIC = 1091.450.

On the next page we show the map of the posterior mean of θj. A similar smoothing

as with the previous models is observed.


• As for the Poisson-lognormal model we can also evaluate the impact of AFF on

the relative risks. This can be done as in the Poisson-lognormal model, i.e.:

$\log(\theta_j) = \alpha_j + u_j + v_j$, (1)

with $\alpha_j = \beta_0 + \beta_1 \times AFF_j$.

• Another way is to assume that

$\log(\theta_j) = \alpha_j$, (2)

with $\alpha_j = \beta_0 + \beta_j \times AFF_j$, where the $\beta_j$ are specific to the sub-region with a spatial structure like the $u_j$ terms.


• The WB program for the first approach is in “East-Germany Lip Mortality males CAR model with AFF.odc”. Two chains were initiated and convergence was checked after 10,000 iterations per chain. The Brooks-Gelman-Rubin diagnostic showed convergence. The estimates are based on extra 10,000 iterations per chain.

• Here are some of the posterior estimates:

node                mean       sd       MC error   2.5%      median     97.5%    start  sample
beta                0.973      0.4338   0.0178     0.1355    0.9722     1.832    10001  20000
deviance            93.73      8.749    0.09018    77.37     93.41      111.7    10001  20000
sigma2.area.struct  0.2343     0.1387   0.008434   0.04442   0.2057     0.552    10001  20000
sigma2.area.unstr   0.05974    0.03381  0.00204    0.00113   0.06034    0.1246   10001  20000

• Also now AFF seems to be of importance. However, the inclusion of AFF did not remove much of the random spatial structure; the unstructured random part remained about the same.


• The value of pD is equal to 104.1 and DIC = 1092.270.

• On the next page we show the fitted map and the map of residuals, i.e. of $O_j - E_j \hat{\theta}_j$.


• The WB program for the second approach is in “East-Germany Lip Mortality males AFF CAR model.odc”. Two chains were initiated and convergence was

checked after 10,000 iterations per chain. The Brooks-Gelman-Rubin diagnostic

showed convergence. The estimates are based on extra 10,000 iterations per chain.

• Here are some of the posterior estimates:

node                mean       sd       MC error   2.5%      median     97.5%    start  sample
deviance            177.0      11.19    0.193      155.7     176.8      199.7    10001  20000
sigma2.area.struct  18.85      4.639    0.1099     11.15     18.45      29.22    10001  20000

• The importance of AFF is now seen in $\tau_u^2$ and in the map illustrating the impact of

AFF in each sub-region (see below).


• The value of pD is equal to 83.71 and DIC = 1238.430. Hence DIC is much higher

than in the previous model. However, observe that this model does not contain un-

structured variability.


• The WB program including unstructured variability is found in “East-Germany Lip Mortality males AFF CAR + unstr model.odc”. Two chains were initiated and

convergence was checked after 10,000 iterations per chain. The Brooks-Gelman-

Rubin diagnostic showed convergence. The estimates are based on extra 10,000 it-

erations per chain.

• Here are some of the posterior estimates:

node                mean       sd       MC error   2.5%      median     97.5%    start  sample
deviance            95.05      8.758    0.07839    78.66     94.82      112.9    10001  20000
sigma2.area.struct  3.393      1.496    0.06237    1.142     3.149      6.837    10001  20000
sigma2.area.unstr   0.1033     0.02125  5.279E-4   0.06724   0.1015     0.1503   10001  20000

• Compared to the previous model the deviance has decreased considerably. The

value of pD is equal to 109.23 and DIC = 1100.0. Hence by including the unstruc-

tured variability also the value of DIC has decreased considerably.
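The pD and DIC values reported for the lip cancer models in this section can be collected and ranked in a few lines. The Python sketch below does only that; the short model labels are introduced here for illustration, and no DIC is listed for the Poisson-lognormal model because none is reported above.

```python
# (pD, DIC) as reported in this section; labels are illustrative shorthand.
models = {
    "Poisson-Gamma": (122.356, 1125.810),
    "CAR": (105.4, 1091.450),
    "CAR with AFF": (104.1, 1092.270),
    "AFF CAR": (83.71, 1238.430),
    "AFF CAR + unstr": (109.23, 1100.0),
}

# Smaller DIC indicates a better trade-off between fit and complexity.
ranked = sorted(models.items(), key=lambda item: item[1][1])
for name, (p_d, dic) in ranked:
    print(f"{name:18s} pD = {p_d:8.3f}  DIC = {dic:9.3f}")
best = ranked[0][0]
```

The two CAR models with a common AFF coefficient or without AFF differ by less than one DIC unit, so the ranking between them should not be over-interpreted.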


4.12.7. Exercises

• Replay the analyses using the available WB-programs.

• Write the program “East-Germany Lip Mortality males Poisson-lognormal model with AFF.odc”.

• Write the program “East-Germany Lip Mortality males AFF CAR model.odc”.

• Use GeoBugs to display your maps.
