Putting Background Information about Relative Risks into Conjugate Prior Distributions. Author(s): Sander Greenland. Source: Biometrics, Vol. 57, No. 3 (Sep., 2001), pp. 663-670. Published by: International Biometric Society. Stable URL: http://www.jstor.org/stable/3068401


BIOMETRICS 57, 663-670

September 2001

Putting Background Information About Relative Risks into Conjugate Prior Distributions

Sander Greenland

Department of Epidemiology, UCLA School of Public Health, and

Department of Statistics, UCLA College of Letters and Science, 22333 Swenson Drive, Topanga, California 90290, U.S.A.

SUMMARY. In Bayesian and empirical Bayes analyses of epidemiologic data, the most easily implemented prior specifications use a multivariate normal distribution for the log relative risks or a conjugate distribution for the discrete response vector. This article describes problems in translating background information about relative risks into conjugate priors and a solution. Traditionally, conjugate priors have been specified through flattening constants, an approach that leads to conflicts with the true prior covariance structure for the log relative risks. One can, however, derive a conjugate prior consistent with that structure by using a data-augmentation approximation to the true log relative-risk prior, although a rescaling step is needed to ensure the accuracy of the approximation. These points are illustrated with a logistic regression analysis of neonatal-death risk.

KEY WORDS: Bayesian analysis; Data augmentation; Epidemiologic methods; Exponential regression; Interpretation; Logistic regression; Log-linear models; Odds ratio; Poisson regression; Relative risk; Risk assessment; Risk regression.

1. Introduction

Although Bayesian methods have become reestablished in general statistics (e.g., Gelman et al., 1995; Carlin and Louis, 1996; Leonard and Hsu, 1999; Lindley, 2000), they remain uncommon in epidemiologic applications, in part because epidemiologists perceive specification of the prior distribution as impractical and in part because few epidemiologists will employ methods not available in the leading software packages. These obstacles are easily overcome provided both the prior and the likelihood for the parameters can be reasonably approximated by multivariate normal (MVN) distributions (Witte et al., 2000). With very small or sparse data sets, however, better approximations or exact computations may be needed. For the latter purpose, a conjugate prior distribution is much easier to use than an MVN prior; in fact, no special software is needed for exact conjugate analysis (Clogg et al., 1991; Bedrick, Christensen, and Johnson, 1996, 1997). It might then seem that the only obstacle is prior specification.

The present article concerns accurate use of prior information about relative risks in the type of discrete-data regressions common in epidemiology. Classical use of conjugate priors in these regressions treats them as adding pseudo-observations (flattening constants) to observed cell counts (Good, 1983, Chapter 9; Clogg et al., 1991). I argue that this approach will usually conflict with true relative-risk priors and is impractical for incorporating prior information about relative risks into the analysis. This leaves the MVN prior for log relative risks as the simplest defensible option for epidemiologic regression. This prior can be approximated by a conjugate data-augmentation prior (DAP) and so is easily implemented with ordinary software (Bedrick et al., 1996); this approach should not be confused with data augmentation for posterior simulation (Tanner and Wong, 1987). It will be shown, however, that this DAP approximation requires a rescaling step to ensure its accuracy. Improved approximate marginal posterior intervals can then be obtained using penalized-likelihood profiles. These points are illustrated in the analysis of a very sparse data set on hospital neonatal mortality.

To establish notation, suppose there are $N$ possible distinct covariate patterns indexed by $i = 1, \ldots, N$, with $y_i$ cases (failures) observed out of a total $n_i$ at each pattern. The $n_i$ are total counts in logistic regression and are the time-on-test totals in Poisson regression (Breslow and Day, 1980, 1987). Let $y$ and $n$ be the column vectors of $y_i$ and $n_i$, respectively, and $X$ the $N \times J$ design matrix with rows $x_i$; the first component of $x_i$ may be identically one. For any pair of $N$-vectors $u$ and $v$ and scalar function $f$, let $uv$, $u^v$, and $f(u)$ be the $N$-vectors of elementwise products $u_i v_i$, powers $u_i^{v_i}$, and functions $f(u_i)$; $u'v$ is the usual inner product $\sum_i u_i v_i$. Finally, let $\pi(u)$ be the vector logistic transform $(1 + e^{-u})^{-1}$; note that $1 - \pi(u) = \pi(-u)$. The logistic model assumes that $E(y) = \pi(X\beta)n$, with the $y_i$ independent binomial given $\beta$; the Poisson-regression model assumes that $E(y) = \exp(X\beta)n$, with the $y_i$ independent Poisson given $\beta$. These models and the methods below are quite flexible in that $X$ may contain functions of more fundamental covariates (e.g., products, powers, or spline terms).


Table 1
Summary data and ordinary maximum-likelihood odds-ratio estimates (MLE) from the first year of a study of neonatal mortality (Neutra et al., 1977); 17 deaths out of 2992 total

| Regressor x (a) | No. deaths with x ≠ 0 | No. subjects with x ≠ 0 | exp(β_j) | 95% LR limits (b) |
|---|---|---|---|---|
| Nonwhite | 5 | 556 | 1.9 | .51, 6.3 |
| Young age | 3 | 356 | 1.6 | .38, 6.6 |
| Nulliparity | 8 | 1532 | 1.5 | .50, 4.9 |
| Prematurity | 10 | 592 | 4.9 | 2.4, 10 |
| Isoimmunization | 1 | 45 | 3.0 | .62, 8.5 |
| Previous abortion | 2 | 468 | .72 | .12, 2.3 |
| Hydramnios | 1 | 10 | 60 | 2.8, 478 |
| Dysfunctional labor | 2 | 518 | .50 | .035, 2.8 |
| Placental/cord abnormality | 1 | 30 | 3.1 | .15, 20 |
| No monitor | 3 | 694 | 1.2 | .35, 5.9 |
| Multiple birth | 3 | 47 | 8.2 | 1.5, 33 |
| Public ward | 6 | 866 | .86 | .25, 2.8 |
| Premature rupture | 1 | 96 | .54 | .027, 3.2 |
| Malpresentation | 3 | 117 | 3.9 | .73, 15 |

(a) Codes: All variables are indicators except young age (0 = 20+, 1 = 15-19, 2 = under 15), prematurity (0 = no, 1 = 36-38 weeks, 2 = 33-36 weeks; under 33 weeks excluded), isoimmunization (0 = no, 1 = Rh, 2 = ABO), dysfunctional labor (0 = no, .33 = prolonged, .67 = protracted, 1 = arrested).
(b) 95% confidence limits computed from profile-likelihood ratio.

2. A Sparse-Data Regression Problem

Table 1 summarizes data from the first year of a study of neonatal care at a teaching hospital (Neutra et al., 1977). The problem considered here is use of these data to predict neonatal deaths in the ensuing year. These data, though old, combine several features of special value for illustrating the points made here. First, they are sparse and unbalanced in the extreme: They contain only 17 deaths ($\sum_i y_i = 17$) among 2992 subjects in 455 covariate patterns scattered across the 110,592 possible combinations, so one should expect inferences to be very sensitive to choice of model and fitting method. Second, despite the sparsity, ordinary maximum-likelihood logistic regression with 14 covariates plus an intercept (shown in the table) converges readily, thus inviting naive use of the ordinary asymptotic results. Third, the time period at issue involved a rapid, unexplained decline in the death rate, and so individual covariates were of interest as possible explanatory factors in the decline. Finally, prior clinical information indicated strongly that at least one coefficient (hydramnios) was badly overestimated, and predictions of total deaths were very sensitive to its value (Greenland, 1993).

Most analysts would reduce the model with some sort of variable-selection algorithm, and typical algorithms select or retain only three variables for the model (prematurity, hydramnios, multiple birth). Consonant with simulation studies (e.g., Roecker, 1991), this reduced model provided no better mortality predictions in the ensuing year than the full maximum likelihood (ML) model, and the hydramnios coefficient was even more inflated in the reduced model (Greenland, 1993). Thus, Bayesian analysis is used to see if improvement is possible.

2.1 Specifying the Prior

One computationally easy option is the multivariate normal prior $\beta \sim \mathrm{MVN}(\mu, T)$. An even easier option for the logistic model is the conjugate prior density

$$f(\beta; X, a, b) \propto [\pi(X\beta)^{a}]'\,\pi(-X\beta)^{b} = \prod_i \frac{\exp(a_i x_i \beta)}{[1 + \exp(x_i \beta)]^{c_i}}, \tag{1}$$

where $a$ and $b$ are $N$-vectors of user-specified flattening constants $a_i$, $b_i$, with $c = a + b$ (Clogg et al., 1991); it yields the posterior density

$$f(\beta \mid y, n; X, a, b) \propto [\pi(X\beta)^{y+a}]'\,\pi(-X\beta)^{n-y+b}, \tag{2}$$

whose mode and profiles can be found from standard logistic-regression software by entering $y + a$ and $n + c$ as the observed case and total count vectors, respectively. For Poisson regression, the conjugate prior becomes

$$f(\beta; X, a, c) \propto \exp\Big\{\sum_i \big(a_i x_i \beta - c_i e^{x_i \beta}\big)\Big\}, \tag{3}$$

which yields the posterior density

$$f(\beta \mid y, n; X, a, c) \propto \exp\Big\{\sum_i \big[(y_i + a_i) x_i \beta - (n_i + c_i) e^{x_i \beta}\big]\Big\}. \tag{4}$$

Note that the conjugate priors (1) and (3), and hence the prior mean $\mu$ and covariance matrix $T$ of $\beta$, are functions of $X$ as well as $a$ and $b$ or $c$.
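The practical meaning of (1) and (2) can be checked numerically: on the log scale, the conjugate prior kernel plus the binomial likelihood kernel equals the likelihood kernel evaluated at the augmented counts $y + a$ and $n + c$. The following is a minimal sketch of that identity in the product form on the right of (1), not code from the article; the function names and all toy values of `X`, `y`, `n`, `a`, `b`, and `beta` are hypothetical and chosen only for illustration.

```python
import math


def log_pi(u):
    """Log of the logistic transform pi(u) = 1 / (1 + exp(-u)), computed stably."""
    if u >= 0:
        return -math.log1p(math.exp(-u))
    return u - math.log1p(math.exp(u))


def binomial_log_kernel(beta, X, cases, totals):
    """Sum over patterns of cases_i*log pi(x_i beta) + (totals_i - cases_i)*log pi(-x_i beta)."""
    value = 0.0
    for x_i, y_i, n_i in zip(X, cases, totals):
        eta = sum(x * b for x, b in zip(x_i, beta))
        value += y_i * log_pi(eta) + (n_i - y_i) * log_pi(-eta)
    return value


# Hypothetical toy inputs (not from the article): intercept plus one indicator.
X = [[1.0, 0.0], [1.0, 1.0]]
y, n = [1.0, 3.0], [40.0, 25.0]
a, b = [0.5, 0.5], [4.5, 1.5]              # flattening constants a_i, b_i
c = [a_i + b_i for a_i, b_i in zip(a, b)]  # prior totals c = a + b
beta = [-3.0, 1.2]                         # arbitrary evaluation point

log_prior = binomial_log_kernel(beta, X, a, c)   # log kernel of density (1)
log_lik = binomial_log_kernel(beta, X, y, n)     # binomial log-likelihood kernel
y_aug = [y_i + a_i for y_i, a_i in zip(y, a)]
n_aug = [n_i + c_i for n_i, c_i in zip(n, c)]
log_post = binomial_log_kernel(beta, X, y_aug, n_aug)  # log kernel of density (2)

print(log_prior + log_lik, log_post)  # the two values agree up to rounding
```

Because the identity holds at every $\beta$, any routine that maximizes or profiles a binomial likelihood can be applied to the augmented counts unchanged.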

To complete the prior specification, suppose we have prior information in the form of 95% prior intervals $(w_{Lj}, w_{Uj}) = (e^{\beta_{Lj}}, e^{\beta_{Uj}})$ for each odds ratio $e^{\beta_j}$ conditional on the remaining odds ratios. At the time of the example study, perhaps the best one could do was to assume independent $\beta_j$ and classify the conditional odds ratios into four categories:

(1) probably near null: $(w_{Lj}, w_{Uj}) = (1/4, 4)$, comprising previous abortion;

(2) probably moderately positive: $(w_{Lj}, w_{Uj}) = (1/2, 8)$, comprising nonwhite, younger age, null parity, arrested labor, placental/cord abnormality, no monitor, public ward, PROM;

(3) probably strongly positive: $(w_{Lj}, w_{Uj}) = (1, 16)$, comprising preterm delivery, isoimmunization, hydramnios, multiple birth, malpresentation;

(4) the intercept: $(w_{L1}, w_{U1}) = (1/10{,}000, 1/200)$, based on prior hospital experiences with neonates that lacked all of the risk factors in Table 1.

Information from these prior intervals can be incorporated into independent normal priors by reversing the usual steps for interval estimation (Cox and Hinkley, 1974, p. 384; Greenland, 1992; Faraggi and Simon, 1997): Under normality, the prior mean and standard deviation for $\beta_j$ are

$$\mu_j = (\beta_{Lj} + \beta_{Uj})/2 = \ln[(w_{Lj} w_{Uj})^{1/2}]$$

and

$$\tau_j = (\beta_{Uj} - \beta_{Lj})/[2(1.96)] = \ln(w_{Uj}/w_{Lj})/3.92.$$

We thus obtain $\mu_j = 0$ for category 1, $\mu_j = \ln(2)$ for category 2, $\mu_j = \ln(4)$ for category 3, $\mu_1 \approx \ln(1/1400)$ for the intercept, $\tau_j^2 = [\ln(16)/3.92]^2 = 0.50$ for categories 1-3, $\tau_1^2 = [\ln(50)/3.92]^2 \approx 1$, and $T = \mathrm{diag}(\tau_j^2)$.
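The arithmetic above is easy to reproduce. The sketch below applies the two formulas for $\mu_j$ and $\tau_j$ to the four prior intervals listed in the text; the function name `normal_prior_from_interval` and the dictionary keys are illustrative choices, not part of the article.

```python
import math


def normal_prior_from_interval(w_lower, w_upper, z=1.96):
    """Return (mu, tau^2) of the normal prior on the log odds ratio whose
    central 95% interval is (ln w_lower, ln w_upper)."""
    mu = 0.5 * (math.log(w_lower) + math.log(w_upper))
    tau = (math.log(w_upper) - math.log(w_lower)) / (2.0 * z)
    return mu, tau ** 2


# 95% prior intervals for the odds ratios, as given in the text.
prior_intervals = {
    "category 1": (1 / 4, 4),
    "category 2": (1 / 2, 8),
    "category 3": (1, 16),
    "intercept": (1 / 10000, 1 / 200),
}

priors = {name: normal_prior_from_interval(*limits)
          for name, limits in prior_intervals.items()}

for name, (mu, tau2) in priors.items():
    print(f"{name}: exp(mu) = {math.exp(mu):.5g}, tau^2 = {tau2:.3f}")
```

The intercept mean is the log of the geometric mean of 1/10,000 and 1/200, which is about 1/1414; the text rounds this to 1/1400.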

Now consider the conjugate prior. To ensure that the flattening constants $a$ and $b$ are consistent with prior information about $\beta$, we should at least require that they reflect the prior predicted risks $\pi(X\mu)$ or rates $\exp(X\mu)$, e.g., by imposing the constraint $a = \pi(X\mu)c$ in logistic regression and $a = \exp(X\mu)c$ in Poisson regression. Although this provides only a rough translation of the prior information because the conjugate prior is asymmetric, the chief problem is determining the prior totals $c_i$. Some authors use a simple constraint such as $c_i$ = constant (Clogg et al., 1991) or $c_i \propto n_i$ to fix relative values for the $c_i$ and then fix the total number of observations added by the prior, $c_+ = \sum_i c_i$. Unfortunately, the distribution of the $c_i$ implies a covariance structure for $\beta$ that would only fortuitously cohere with true prior information about $\beta$. To see this, suppose $c_+$ was large enough so that the conjugate prior could be approximated by a multivariate normal distribution. The prior covariance matrix for $\beta$ would then be $P \approx (X'W_cX)^{-1}$, where $W_c$ is a function of $c$. If $P$ is not diagonal, the conjugate prior will not yield independent $\beta_j$, and if $J < N$, $\mu \neq 0$, and $c_i$ is not constant, $P$ will be far from diagonal for most choices of flattening constants. Taking $c_i$ constant creates other problems: Because subjects rarely have more than two risk factors, over 99.5% of the covariate patterns were never observed and there is no significant prior information about risk at those patterns; also, adding them to the data creates a huge computing burden (110,592 versus 455 data records in the analysis).
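A small numerical sketch may make the covariance problem concrete. It assumes the usual logistic information weights, $W_c = \mathrm{diag}\{c_i\,\pi(x_i\mu)\,\pi(-x_i\mu)\}$; the article states only that $W_c$ is a function of $c$, so this form is an assumption. For simplicity it also uses a saturated two-pattern design (intercept plus one indicator, so $J = N = 2$), where the inverse of $X'W_cX$ can be written in closed form. The function name and input values are hypothetical.

```python
import math


def pi(u):
    """Logistic transform."""
    return 1.0 / (1.0 + math.exp(-u))


def conjugate_prior_cov(mu_intercept, mu_slope, c_unexposed, c_exposed):
    """Approximate prior covariance (X' W_c X)^{-1} for X = [[1, 0], [1, 1]].

    Assumes W_c = diag(c_i * p_i * (1 - p_i)) with p_i = pi(x_i mu).
    Here X' W X = [[w0 + w1, w1], [w1, w1]], whose inverse is
    [[1/w0, -1/w0], [-1/w0, 1/w0 + 1/w1]].
    """
    p0 = pi(mu_intercept)
    p1 = pi(mu_intercept + mu_slope)
    w0 = c_unexposed * p0 * (1.0 - p0)
    w1 = c_exposed * p1 * (1.0 - p1)
    return [[1.0 / w0, -1.0 / w0],
            [-1.0 / w0, 1.0 / w0 + 1.0 / w1]]


# Hypothetical flattening totals; prior means borrowed from the example above.
P = conjugate_prior_cov(math.log(1 / 1400), math.log(2), 100.0, 100.0)
correlation = P[0][1] / math.sqrt(P[0][0] * P[1][1])
print(P, correlation)
```

Even in this smallest case the off-diagonal element is nonzero for every choice of positive $c_i$, so the implied prior for the intercept and the log odds ratio is correlated rather than independent.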

In light of the preceding problems, the flattening-constant approach can be regarded as unworkable in this setting, despite its computational simplicity. It will be shown next, however, how a closely related approach can overcome these problems.

2.2 A Data-Augmentation Prior

Starting with an MVN prior, one can stay within the conjugate computational framework by transforming the prior into pseudo-data that augment the observed data (Landaw, Sampson, and Toporek, 1982; Bedrick et al., 1996). Let $X_o$, $y_o$, and $n_o$ be the design matrix, response vector, and total vector for observed covariate patterns, respectively. The following development will focus on the special case in which $T$ is diagonal (independent $\beta_j$). The general case of arbitrary $T$ follows immediately by initially reparameterizing the model to $\beta^* = T^{-1/2}\beta$, $\mu^* = T^{-1/2}\mu$, and transforming the covariates to $X_o^* = X_oT^{1/2}$, which yield the model $E(y) = \pi(X_o^*\beta^*)n$ and prior $\beta^* \sim \mathrm{MVN}(\mu^*, I)$, where $T^{1/2}$ is the Cholesky square root of $T$ (Bedrick et al., 1996). Alternatively, one may hierarchically model the sources of prior dependence to produce an independence prior (Kass and Steffey, 1989; Greenland, 1992, 2000).

The posterior density based on independent prior densities $f_j(\beta_j)$ for the $\beta_j$ is proportional to

$$[\pi(X_o\beta)^{y_o}]'\,\pi(-X_o\beta)^{n_o - y_o} \prod_j f_j(\beta_j). \tag{5}$$

The key step for data augmentation is to numerically approximate each $f_j(\beta_j)$ by a density that is functionally identical to a binomial likelihood contribution and that has approximately the correct prior mean $\mu_j$ and variance $\tau_j^2$. Given appropriate choices for $y_{aj}$ and $n_{aj}$, one such density is the logistic-beta density

$$g_j(\beta_j) \propto \pi(\beta_j)^{y_{aj}}\,\pi(-\beta_j)^{n_{aj} - y_{aj}} \tag{6}$$

derived by the change of variable $\beta_j = \mathrm{logit}(p_j)$ from a beta$(y_{aj}, n_{aj} - y_{aj})$ density for $p_j = \pi(\beta_j)$,

$$h_j(p_j) \propto p_j^{y_{aj} - 1}(1 - p_j)^{n_{aj} - y_{aj} - 1}, \tag{7}$$

using the relation $d\pi(\beta)/d\beta = \pi(\beta)\pi(-\beta)$. From the standard beta-distribution formulas,

$$E(p_j) = y_{aj}/n_{aj}, \qquad \mathrm{var}(p_j) = \frac{y_{aj}(n_{aj} - y_{aj})}{n_{aj}^2(n_{aj} + 1)}; \tag{8}$$

hence, by setting $E(p_j) = \pi(\mu_j)$ and $\mathrm{var}(p_j) = \pi(\mu_j)^2\pi(-\mu_j)^2\tau_j^2$, we obtain $y_{aj} = \pi(\mu_j)n_{aj}$, $n_{aj} = [\pi(\mu_j)\pi(-\mu_j)\tau_j^2]^{-1} - 1$, which yield the linear approximations

$$E(\beta_j) \approx \mathrm{logit}[E(p_j)] = \mu_j \tag{9}$$

(which is the exact $\beta_j$ prior mean if $\mu_j = 0$) and

$$\mathrm{var}(\beta_j) \approx [\pi(\mu_j)\pi(-\mu_j)]^{-2}\,\mathrm{var}(p_j) = \tau_j^2. \tag{10}$$

A parallel derivation using a log-gamma prior for $\beta_j$ yields $y_{aj} = \exp(\mu_j)n_{aj}$ and $n_{aj} = \tau_j^{-2}$. In either case, it follows that the posterior distribution is approximately proportional to the likelihood from the augmented data $[X_o', X_a']'$, $[y_o', y_a']'$, $[n_o', n_a']'$, where $X_a$ is a $J \times J$ identity matrix.

Each row of the augmenting data corresponds to a model coefficient. Thus, if the model has an intercept, it must be fit to the augmented data using the no-intercept or no-constant option, and a column of ones must be included in $X_o$; that column will be augmented by the column in $X_a$ that is one for the row corresponding to the intercept parameter and is zero elsewhere. For semi-Bayes (partial-Bayes or mixed-model) analysis, one may delete the augmented-data rows corresponding to parameters one wants estimated from the data alone (Bedrick et al., 1996).

Table 2
Posterior geometric modes, estimated medians, and approximate 95% intervals for odds ratios $\exp(\beta_j)$ from data in Table 1, using independent normal priors for each $\beta_j$ (see text)

| Regressor x | Geometric mode | Estimated median (a) | Posterior interval from PPL (b) | Posterior interval from Metropolis (c) |
|---|---|---|---|---|
| Nonwhite | 1.8 | 1.7 | .71, 4.2 | .69, 4.5 |
| Young age | 1.6 | 1.7 | .63, 4.1 | .64, 4.0 |
| Nulliparity | 1.5 | 1.6 | .67, 3.6 | .65, 3.7 |
| Prematurity | 4.5 | 4.6 | 2.5, 8.1 | 2.5, 8.3 |
| Isoimmunization | 2.4 | 2.2 | .85, 5.7 | .80, 5.6 |
| Previous abortion | .83 | .78 | .31, 1.9 | .27, 1.8 |
| Hydramnios | 6.1 | 5.9 | 1.6, 22 | 1.5, 22 |
| Dysfunctional labor | 1.2 | 1.2 | .41, 3.3 | .44, 3.2 |
| Placental/cord abnormality | 2.3 | 2.3 | .65, 7.2 | .61, 7.2 |
| No monitor | 1.7 | 1.8 | .68, 4.8 | .73, 5.3 |
| Multiple birth | 5.2 | 5.3 | 1.8, 14 | 1.8, 15 |
| Public ward | 1.3 | 1.3 | .53, 3.0 | .55, 3.0 |
| Premature rupture | 1.2 | 1.2 | .41, 3.3 | .42, 3.1 |
| Malpresentation | 3.9 | 3.8 | 1.4, 10 | 1.3, 10 |

(a) Median of Metropolis-sampler output.
(b) Profile penalized likelihood (profile posterior density).
(c) 2.5th and 97.5th percentiles of Metropolis-sampler output.

Note that, unlike the flattening-constant approach, the above DAP may introduce rows in $X_a$ that contain impossible covariate values (e.g., zero age), reflecting the fact that the DAP is a prior for the coefficients rather than the conditional response means.
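The construction in this section can be sketched in a few lines. The code below computes the logistic-model augmenting counts from the formulas $y_{aj} = \pi(\mu_j)n_{aj}$ and $n_{aj} = [\pi(\mu_j)\pi(-\mu_j)\tau_j^2]^{-1} - 1$, then stacks an identity block under a design matrix. It covers only the unscaled, independent-prior case; the function names and the three-row toy data set are hypothetical, and the stacked data would still need to be passed to a logistic-regression routine fit without an automatic intercept.

```python
import math


def pi(u):
    """Logistic transform."""
    return 1.0 / (1.0 + math.exp(-u))


def augmenting_counts(mu, tau2):
    """Pseudo-cases y_a and pseudo-total n_a matching prior mean mu and variance tau2."""
    p = pi(mu)
    n_a = 1.0 / (p * (1.0 - p) * tau2) - 1.0
    return p * n_a, n_a


def augment(X_o, y_o, n_o, mu, tau2):
    """Stack one prior record per coefficient under the observed records."""
    J = len(mu)
    X_a = [[1.0 if k == j else 0.0 for k in range(J)] for j in range(J)]
    counts = [augmenting_counts(m, t) for m, t in zip(mu, tau2)]
    y_a = [y for y, _ in counts]
    n_a = [n for _, n in counts]
    return X_o + X_a, y_o + y_a, n_o + n_a


# Hypothetical observed data: a column of ones plus one indicator.
X_o = [[1.0, 0.0], [1.0, 1.0], [1.0, 1.0]]
y_o = [2.0, 1.0, 0.0]
n_o = [500.0, 40.0, 10.0]
mu = [math.log(1 / 1400), math.log(4)]   # prior means from the example
tau2 = [1.0, 0.5]                        # prior variances from the example

X_aug, y_aug, n_aug = augment(X_o, y_o, n_o, mu, tau2)
for row, y, n in zip(X_aug, y_aug, n_aug):
    print(row, round(y, 3), round(n, 3))
```

The last two printed records are the prior: each has a single one in the column of the coefficient it informs, which is why the intercept must be carried as an explicit column.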

2.3 Accuracy of the Data Augmentation

There are two approximations used in the augmentation: First, a logistic-beta distribution is used to approximate a normal prior; second, a linearization of the logit transform is used to match the mean and variance of this approximating distribution to the normal prior. It turns out the error in both steps can be made arbitrarily small. First, from the central-limit effect on binomial likelihoods, it can be seen that the logistic-beta density $\pi(\beta_j)^{y_{aj}}\pi(-\beta_j)^{n_{aj} - y_{aj}}$ will approach a normal($\mu_j$, $\tau_j^2$) form as $n_{aj}$ increases and hence as $\tau_j^2$ shrinks. Second, the logit transform is most linear within small neighborhoods. The accuracy of both approximations can be improved by rescaling the $\beta_j$ prior transform to $p_j = \pi(\beta_j/s)$ so that $p_j$ is highly concentrated. This leads to using $X_a/s$, $\mu/s$, and $\tau/s$ in place of $X_a$, $\mu$, and $\tau$ in the above expressions for the augmenting data $X_a$, $y_a$, $n_a$, with $s$ large enough so that all $y_{aj}$ and $n_{aj} - y_{aj}$ are large (e.g., above 20) and so that each $\pi(\beta_j/s)$ is close to linear over most of the $\beta_j$ prior mass (e.g., over the interval $\mu_j \pm 1.96\tau_j$). The only limit on $s$ is machine (numeric) precision. In the example, $s = 20$ meets both criteria and is well within the numeric limit. Note that $n_{aj}$ is an increasing function of $s$ and so does not measure the amount of information in $f_j(\beta_j)$.
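The effect of the rescaling factor on the augmenting records can be illustrated as follows. This sketch substitutes $\mu_j/s$ and $\tau_j/s$ into the count formulas of Section 2.2 and sets the augmenting covariate value to $1/s$; the function name and the chosen values of $s$ are illustrative, and the sketch does not attempt to reproduce the totals reported for the example.

```python
import math


def pi(u):
    """Logistic transform."""
    return 1.0 / (1.0 + math.exp(-u))


def rescaled_record(mu, tau2, s):
    """Augmenting record (x_a, y_a, n_a) for one coefficient after rescaling by s.

    Uses mu/s and tau/s in place of mu and tau; x_a = 1/s replaces the
    unit entry of the identity matrix X_a.
    """
    p = pi(mu / s)
    n_a = 1.0 / (p * (1.0 - p) * tau2 / s ** 2) - 1.0
    return 1.0 / s, p * n_a, n_a


# Category 3 prior from the example: mu = ln(4), tau^2 = 0.5.
records = {s: rescaled_record(math.log(4), 0.5, s) for s in (1, 10, 20)}
for s, (x_a, y_a, n_a) in records.items():
    smallest_cell = min(y_a, n_a - y_a)
    print(f"s = {s:2d}: x_a = {x_a:.3f}, y_a = {y_a:.1f}, n_a = {n_a:.1f}, "
          f"smaller cell = {smallest_cell:.1f}")
```

Without rescaling ($s = 1$) the smaller pseudo-cell for this prior is only 2.3, far below the suggested threshold of 20, whereas $s = 20$ gives pseudo-cells in the thousands while leaving the prior mean and variance for $\beta_j$ unchanged.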

To extend data augmentation to conditional logistic and proportional-hazards regression, Greenland and Christensen (2001) apply the above approximations in the reverse order, first recentering as well as rescaling $\beta_j$ to $(\beta_j - \mu_j)/\tau_j s$ and applying a beta instead of logistic-beta approximation to the normal prior density. This standardization of $\beta_j$ further improves both approximations and also simplifies the form of the augmenting counts (to $y_{aj} = n_{aj}/2 = 2s^2 - 3/2$). However, it also requires use of an offset column $[f', (-X_a\mu/\tau s)']'$, where $f$ is either the column of offsets for the actual data or a column of $N$ zeros if there are no offsets in those data (Greenland and Christensen, 2001). Because the standardization did not noticeably improve the results below, it is omitted here.

2.4 Results

Table 2 presents two sets of results from the example using the independent normal priors directly, with two extremes of computational intensity. The first set is from an approach that makes use of the proportionality of the posterior density $f(\beta; y)$ to a penalized likelihood whose logarithm is

$$PL(\beta; y) = L(\beta; y) + \ln\{f(\beta)\}, \tag{11}$$

where $L(\beta; y)$ is the ordinary log likelihood and $f(\beta)$ is the prior density for $\beta$. Assuming that $PL(\beta; y)$ is smooth and concave downward, as in the present example, let $\hat\beta$ be the posterior mode, also known as the maximum a posteriori (MAP) or maximum penalized-likelihood (MPL) estimate, let $\hat\beta(\beta_j)$ be the restricted MPL estimate when component $j$ of $\beta$ is held fixed at $\beta_j$, and let $PPL(\beta_j; y) = PL[\hat\beta(\beta_j); y]$ be the corresponding restricted maximum. The PPL column in Table 2 gives the profile penalized likelihood (profile posterior) intervals found by solving

$$-2[PPL(\beta_j; y) - PL(\hat\beta; y)] = S_\alpha, \tag{12}$$

where $S_\alpha$ is chosen so that the posterior probability content of the resulting interval approximates $1 - \alpha$; $S_\alpha = 3.84$, the 95th percentile of the $\chi_1^2$ distribution, was used here.
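The interval-solving step in (12) can be sketched for a deliberately simplified case: a one-parameter (intercept-only) logistic model with a normal prior, where there are no nuisance coefficients and the profile penalized likelihood is just the penalized likelihood itself. This is an illustrative assumption, not the 15-parameter analysis of the article; with more coefficients, each evaluation would require maximizing over the remaining components. The function names are hypothetical, and the inputs reuse the 17 deaths among 2992 subjects and the intercept prior from the text purely as convenient numbers.

```python
import math


def log_pi(u):
    """Log of the logistic transform, computed stably."""
    if u >= 0:
        return -math.log1p(math.exp(-u))
    return u - math.log1p(math.exp(u))


def penalized_loglik(beta, y, n, mu, tau2):
    """PL = binomial log likelihood + log normal prior (constants dropped), as in (11)."""
    return y * log_pi(beta) + (n - y) * log_pi(-beta) - (beta - mu) ** 2 / (2.0 * tau2)


def score(beta, y, n, mu, tau2):
    """Derivative of PL with respect to beta; decreasing in beta since PL is concave."""
    return y - n / (1.0 + math.exp(-beta)) - (beta - mu) / tau2


def bisect(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    f_lo = f(lo)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        f_mid = f(mid)
        if (f_mid > 0) == (f_lo > 0):
            lo, f_lo = mid, f_mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def ppl_interval(y, n, mu, tau2, s_alpha=3.84, width=20.0):
    """Posterior mode and the two solutions of -2[PL(beta) - PL(mode)] = s_alpha."""
    mode = bisect(lambda b: score(b, y, n, mu, tau2), mu - width, mu + width)
    pl_max = penalized_loglik(mode, y, n, mu, tau2)

    def excess_drop(b):
        return -2.0 * (penalized_loglik(b, y, n, mu, tau2) - pl_max) - s_alpha

    return mode, bisect(excess_drop, mode - width, mode), bisect(excess_drop, mode, mode + width)


mode, lower, upper = ppl_interval(17, 2992, math.log(1 / 1400), 1.0)
print(mode, lower, upper)
```

The search brackets (`width`) are an arbitrary choice that is wide enough for this example; the mode falls between the prior mean and the maximum-likelihood estimate, as expected for a shrinkage estimate.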


Figure 1. Plot of ordinary (unpenalized) profile likelihood (dashed curve), prior density (sparse dots), marginal posterior density estimate from Metropolis sampler (solid curve), and profile penalized likelihood (profile posterior density) (close dots) for the hydramnios coefficient, using independent normal priors. (All curves rescaled to have maximum of one.)

Less accurately, but even more simply, one could set intervals as $\hat\beta_j \pm Z_\alpha\hat\sigma_j$, where $\hat\sigma_j^2$ is a marginal posterior variance estimate for $\beta_j$ (e.g., diagonal element $j$ of the inverse negative Hessian of $PL(\beta; y)$ evaluated at $\hat\beta$) and $Z_\alpha$ is the appropriate normal percentile. Because $PL(\beta; y)$ has the form of the log likelihood for a random-coefficient model with coefficient distribution $f(\beta)$, under an MVN prior these intervals can be computed from standard penalized-likelihood or iterative generalized-least-squares algorithms for fitting generalized-linear mixed models (Breslow and Clayton, 1993; Wolfinger and O'Connell, 1993; Goldstein, 1995; Greenland, 1997), which are implemented in popular software; one must, however, take care to set the random-effects variances to the actual prior variances (Witte et al., 2000). On the other hand, approximations more accurate than (12) can be obtained using a Laplacian method (Leonard and Hsu, 1999, Section 5.1) but require considerably more computation. Use of (12) avoids determinants as well as integrals; with a notebook computer, it took only 10 seconds to get all the PPL intervals in Table 2.

The second set of results in Table 2 are the 50th and 2.5th, 97.5th marginal percentiles from a set of 50,000 draws from the posterior distribution of $\beta$ using a Metropolis sampler (Gelman et al., 1995), which took 49 minutes to generate on the same computer. Five chains of length 11,000 each were started at overdispersed random draws from the approximate (penalized-likelihood) posterior density; the first 1000 draws of each chain were discarded. Convergence was checked using scale-reduction diagnostics (Gelman et al., 1995, pp. 331-332), which were 1.02 or less for each coefficient (1.00 is the limiting value). Despite the extreme data sparsity, there was no practical difference between the PPL and Metropolis intervals.
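For readers unfamiliar with the sampler, the following is a minimal random-walk Metropolis sketch that mirrors the chain layout described above (five chains of 11,000 draws with the first 1000 of each discarded). It is not the article's implementation: it targets a hypothetical one-parameter posterior (intercept-only logistic model with 17 deaths among 2992 subjects and a normal prior with mean $\ln(1/1400)$ and variance 1), and the proposal step size, starting distribution, and seed are arbitrary assumptions. The scale-reduction diagnostic is omitted.

```python
import math
import random


def log_pi(u):
    """Log of the logistic transform, computed stably."""
    if u >= 0:
        return -math.log1p(math.exp(-u))
    return u - math.log1p(math.exp(u))


def log_posterior(beta, y=17, n=2992, mu=math.log(1 / 1400), tau2=1.0):
    """Unnormalized log posterior: binomial log likelihood plus log normal prior."""
    return y * log_pi(beta) + (n - y) * log_pi(-beta) - (beta - mu) ** 2 / (2.0 * tau2)


def metropolis_chain(start, n_draws, step, rng):
    """Random-walk Metropolis with a symmetric normal proposal."""
    draws = []
    current, current_lp = start, log_posterior(start)
    for _ in range(n_draws):
        proposal = current + rng.gauss(0.0, step)
        proposal_lp = log_posterior(proposal)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, proposal_lp - current_lp)):
            current, current_lp = proposal, proposal_lp
        draws.append(current)
    return draws


def percentile(sorted_values, q):
    """Simple order-statistic percentile of an already sorted list."""
    return sorted_values[int(round(q * (len(sorted_values) - 1)))]


rng = random.Random(2001)              # fixed seed so the run is repeatable
pooled = []
for _ in range(5):
    start = rng.gauss(-5.0, 1.0)       # dispersed, arbitrary starting values
    chain = metropolis_chain(start, 11000, step=0.5, rng=rng)
    pooled.extend(chain[1000:])        # discard the first 1000 draws
pooled.sort()

median = percentile(pooled, 0.5)
lower, upper = percentile(pooled, 0.025), percentile(pooled, 0.975)
print(median, lower, upper)
```

In a multiparameter problem the same accept/reject step applies to the whole coefficient vector, and the marginal percentiles are read off one component at a time.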

The marginal posterior distributions appeared moderately skewed but clearly unimodal. Figure 1 shows the normal prior, ordinary profile likelihood, profile penalized likelihood, and a kernel-density estimate from the Metropolis sample for the marginal posterior density of the hydramnios coefficient. It is notable how closely the profile and marginal posterior densities agree for this extreme coefficient. Some of the agreement comes from using (12) rather than a normal($\hat\beta_j$, $\hat\sigma_j^2$) approximation to the marginal posterior density; the largest improvement from PPL is for the isoimmunization coefficient, for which the normal 95% posterior limits are $\exp(\hat\beta_j \pm 1.96\hat\sigma_j)$ = .94, 6.2, as opposed to the PPL limits of .85, 5.7 and the Metropolis limits of .80, 5.6 in Table 2.

Table 3 summarizes two conjugate analyses. The first uses flattening constants $a = \pi(X\mu)c$ and $b = c - a$ with $c = n/2$; this $c$ was chosen because it made the average prior variance for $\beta$ equal that for the MVN prior. Although most of the results resemble those in Table 2, neither this analysis nor others with simple $c_i$ (e.g., constant) could shrink the hydramnios coefficient adequately without appearing to overshrink other key coefficients.

The second analysis in Table 3 used augmenting data from the first-order transform of the MVN($\mu$, $T$) prior; the results are very similar to those in Table 2, but the hydramnios limits are much wider. This divergence from Table 2 appears due entirely to approximation error: When this analysis was repeated after rescaling the transform using $s = 10$, all estimates agreed to two digits with those from PPL; $s = 20$ produced agreement to three digits. An analysis based on a second-order approximation without rescaling (not shown), though also closer to PPL, was not as close as the rescaled first-order analysis. Although based on the same prior for $\beta$, the analyses without and with rescaling ($s = 20$) added $\sum_j n_{aj}$ = 1516 and 46,439 prior observations to the actual data, illustrating that the number of observations added by a DAP does not measure the prior information.

As an endnote to the example, in the year after the above data were collected, there were 13 neonatal deaths among 2561 births in the hospital. From the covariate distribution of these births, both the full ML and reduced (three-variable) fits predicted a total of 17 deaths; the PPL and Metropolis fits in Table 2 predicted 13 deaths; the fit in Table 3 using flattening constants predicted 19 deaths; the data-augmentation fits predicted 14 deaths before rescaling and 13 deaths after; and the crude (null) prediction was (17/2992)(2561) = 15 deaths. Hydramnios was the major determinant of these results in that the large estimates of its coefficient were most responsible for overestimates of ensuing mortality. While this is only one unstable example, the improved prediction afforded by shrinkage of $\beta$ toward $\mu$ conforms to expectations from theory, simulations, and other examples (Gelman et al., 1995; Carlin and Louis, 1996).

Table 3
Posterior geometric modes and approximate 95% posterior intervals for odds ratios $\exp(\beta_j)$ from data in Table 1, using conjugate priors (see text)

| Regressor x | Geometric mode, $c_i = n_i/2$ | Geometric mode, data aug. (b) | Posterior interval (a), $c_i = n_i/2$ | Posterior interval (a), data aug. (b) |
|---|---|---|---|---|
| Nonwhite | 2.0 | 1.8 | .82, 4.8 | .68, 4.6 |
| Young age | 1.8 | 1.6 | .65, 4.4 | .55, 4.3 |
| Nulliparity | 1.7 | 1.5 | .77, 4.0 | .65, 3.7 |
| Prematurity | 4.4 | 4.6 | 2.7, 7.3 | 2.5, 8.3 |
| Isoimmunization | 2.5 | 2.5 | .74, 6.0 | .80, 6.3 |
| Previous abortion | .87 | .80 | .30, 1.9 | .25, 2.0 |
| Hydramnios | 26 | 7.6 | 1.6, 155 | 1.6, 58 |
| Dysfunctional labor | 1.1 | 1.1 | .29, 3.3 | .33, 3.3 |
| Placental/cord abnormality | 2.6 | 2.4 | .27, 12 | .57, 9.5 |
| No monitor | 1.5 | 1.7 | .61, 4.5 | .67, 4.8 |
| Multiple birth | 6.2 | 5.6 | 1.7, 18 | 1.8, 18 |
| Public ward | 1.3 | 1.2 | .55, 3.0 | .48, 3.0 |
| Premature rupture | 1.1 | 1.1 | .27, 3.4 | .32, 3.3 |
| Malpresentation | 3.8 | 3.8 | 1.3, 11 | 1.3, 11 |

(a) Derived from profile posterior density.
(b) Data augmentation without rescaling; upon rescaling by $s \ge 10$, results agree with PPL (Table 2) to the precision displayed.

3. Discussion

Bedrick et al. (1996, 1997) employed conjugate priors for analyses in which prior information comes in the form of distributions for the conditional means (absolute risks) $\pi(X\beta)$. They commented that "it is inherently easier to think about conditional means of observables given the regression variables than it is to think about model-dependent regression coefficients" and "it is extremely difficult to directly specify a prior distribution on regression coefficients" (Bedrick et al., 1996, p. 1450). These comments do not apply in typical epidemiologic problems, where much and sometimes all the available prior information comes from case-control studies analyzed with odds-ratio estimators or logistic regression (Rothman and Greenland, 1998). Such studies rarely supply information about absolute risks. Even cohort studies focus on relative-risk estimation, such as SMR or proportional-hazards regression (Breslow and Day, 1987). As a consequence, epidemiologists often have much information about relative risks but only vague ideas about absolute risks, especially if a large number of regressors are present.

I believe the comments of Bedrick et al. (1996) arise from the difficulty of specifying priors for coefficients that lack simple contextual interpretations, such as coefficients of probit models for disease risk. This difficulty can be avoided by starting with a model that has interpretable parameters, then extending that model in an interpretable fashion. For example, interpretable extensions of a first-order logistic model include models with product terms, whose coefficients can be interpreted as log ratios of odds ratios. Uninterpretable extensions include mixtures of generalized linear models with different link functions, which introduce abstruse mixture parameters into the problem (Moolgavkar and Venzon, 1987).

Noninformative or reference priors do not solve the problem of prior specification because they often conflict with background information and usually entail epidemiologic absurdities (Greenland, 1998). In the above example, the standard (improper) noninformative prior has $f(\beta)$ constant, which implies that a log odds ratio of $10^{10}$ is as probable as a log odds ratio of zero. But if a log odds ratio were $10^{10}$, every birth involving the factor would be followed by death of the neonate. Because ample clinical experience shows that death is uncommon even in the presence of any one of the factors in Table 1, we can be certain that each $\beta_j$ is under $10^{10}$; in fact, we can be reasonably sure that every $\beta_j$ is below $\ln(50) \approx 4$.
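The arithmetic behind this argument can be sketched numerically. The snippet below is a minimal illustration, not part of the original analysis: the baseline risk of 0.01 is a hypothetical value chosen only for the example, and the function names are made up here.

```python
import math

def expit(x: float) -> float:
    """Numerically stable inverse logit."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def risk_with_factor(baseline_risk: float, log_odds_ratio: float) -> float:
    """Risk implied by a baseline risk and a log odds ratio for one factor."""
    baseline_logit = math.log(baseline_risk / (1.0 - baseline_risk))
    return expit(baseline_logit + log_odds_ratio)

baseline = 0.01  # hypothetical baseline risk, for illustration only
for beta in (0.0, math.log(50), 1e10):
    print(f"beta = {beta:.3g}: risk = {risk_with_factor(baseline, beta):.4f}")
```

Under these assumptions, a log odds ratio of $10^{10}$ pushes the implied risk to 1 in floating-point arithmetic, whereas a log odds ratio of $\ln(50)$ already corresponds to a very large increase over the baseline.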

Concerns about robustness and sensitivity to priors can be addressed without recourse to improper priors, e.g., by sensitivity analysis (Leamer, 1985; Carlin and Louis, 1996, Section 6.1.1) and robust Bayesian analysis (Berger et al., 1996; Rios Insua and Ruggeri, 2000). One can also minimize scientifically important sensitivity by using priors with mass spread across the extant range of substantive opinions about the parameters under study, rather than just the range specified by coinvestigators, and by using a hierarchical structure to incorporate qualitative prior information or exchangeability judgments (Deely and Lindley, 1981; Good, 1983; Kass and Steffey, 1989; Greenland, 1992, 2000; Draper et al., 1993). Of course, when the data are sparse, inferences will be sensitive to all aspects of the analysis model, not just the prior. That sensitivity is all the more reason for choosing models and priors that properly reflect background information.

The derivation of $Y_{aj}$ and $n_{aj}$ uses only the prior mean and variance of $\beta_j$; normality is not assumed. Examples such as that above and in Greenland and Christensen (2001) suggest that, given adequate rescaling, the resulting DAP approximation works well with normal $\beta_j$ priors. The approximation is, however, based on a logistic-beta prior for $\beta_j$, which has heavier tails than the normal and which need not be symmetric. This suggests that the approximation can be used with a broader range of $\beta_j$ priors than the normal as long as care is taken in matching the moments of the true prior and the approximating logistic-beta distribution. It also suggests that its accuracy will be excellent if the true priors are in fact logistic-beta, for then the only approximate step will be use of the profile likelihood to approximate the marginal posterior density. The performance of the approximation with nonnormal priors would thus be a worthwhile topic for investigation.
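The role of moment matching and rescaling can be illustrated with a small numerical sketch. It relies on the standard fact that if $p$ follows a Beta($a$, $b$) distribution, then logit($p$) has variance trigamma($a$) + trigamma($b$). The matching rule used below (a mean-zero prior with $a = b = 2s^2/v$, from the first-order approximation $1/a + 1/b$ to the variance) is an assumption made for this illustration and is not a restatement of the derivation of the prior data; all function names are hypothetical.

```python
import math

def trigamma(x: float) -> float:
    """Trigamma function for x > 0 (recurrence plus asymptotic series)."""
    if x <= 0:
        raise ValueError("x must be positive")
    total = 0.0
    # Shift upward using trigamma(x) = trigamma(x + 1) + 1/x^2.
    while x < 10.0:
        total += 1.0 / (x * x)
        x += 1.0
    inv = 1.0 / x
    inv2 = inv * inv
    # 1/x + 1/(2x^2) + 1/(6x^3) - 1/(30x^5) + 1/(42x^7) - 1/(30x^9)
    series = inv + 0.5 * inv2 + inv * inv2 * (
        1 / 6 - inv2 * (1 / 30 - inv2 * (1 / 42 - inv2 / 30))
    )
    return total + series

def logit_beta_variance(a: float, b: float) -> float:
    """Exact variance of logit(p) when p ~ Beta(a, b)."""
    return trigamma(a) + trigamma(b)

def prior_counts(prior_var: float, s: float = 1.0) -> float:
    """Pseudo-count a = b under the assumed first-order matching rule."""
    return 2.0 * s * s / prior_var

def implied_variance(prior_var: float, s: float = 1.0) -> float:
    """Variance of the coefficient implied by the matched logistic-beta prior."""
    a = prior_counts(prior_var, s)
    # The rescaled coefficient (beta / s) has variance 2 * trigamma(a);
    # multiplying by s^2 returns to the original scale.
    return s * s * logit_beta_variance(a, a)

if __name__ == "__main__":
    target = 0.5  # hypothetical prior variance for a log odds ratio
    for s in (1.0, 10.0):
        print(f"s = {s:g}: implied variance = {implied_variance(target, s):.6f}")
```

Under this assumed rule, the implied variance moves closer to the target as the rescaling factor s grows, which is one way to see why rescaling can matter when a logistic-beta prior stands in for a normal prior.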


ACKNOWLEDGEMENTS

I would like to thank Rob Weiss and the referees for helpful comments on this manuscript.

SUMMARY

Bayesian and empirical-Bayes analyses of epidemiologic data generally use the prior distributions that are easiest to implement: a multivariate normal distribution for the log relative risks or a conjugate distribution for the vector of discrete responses. This article describes the problems of translating knowledge about relative risks into a conjugate distribution and proposes a solution. Traditionally, conjugate priors are specified through flattening constants (pseudo-observations), an approach that conflicts with the true prior covariance structure of the log relative risks. One can, however, derive a conjugate prior consistent with that structure by using data augmentation to approximate the true prior for the log relative risks, provided that a rescaling step is included. A logistic model of neonatal-death risk illustrates these points.

REFERENCES

Bedrick, E. J., Christensen, R., and Johnson, W. (1996). A new perspective on generalized linear models. Journal of the American Statistical Association 91, 1450-1460.

Bedrick, E. J., Christensen, R., and Johnson, W. (1997). Bayesian binomial regression: Predicting survival at a trauma center. American Statistician 51, 211-218.

Berger, J. O., Betrò, B., Moreno, E., Pericchi, L. R., Ruggeri, F., Salinetti, G., and Wasserman, L. (eds). (1996). Bayesian Robustness. Hayward, California: Institute of Mathematical Statistics.

Breslow, N. E. and Day, N. E. (1980). Statistical Methods in Cancer Research, Volume I. The Analysis of Case-Control Studies. Lyon: IARC.

Breslow, N. E. and Day, N. E. (1987). Statistical Methods in Cancer Research, Volume II. The Design and Analysis of Cohort Studies. Lyon: IARC.

Breslow, N. E. and Clayton, D. G. (1993). Approximate inference in generalized linear mixed models. Journal of the American Statistical Association 88, 9-25.

Carlin, B. P. and Louis, T. A. (1996). Bayes and Empirical-Bayes Data Analysis. New York: Chapman and Hall.

Clogg, C. C., Rubin, D. B., Schenker, N., Schultz, B., and Weidman, L. (1991). Multiple imputation of industry and occupation codes in census public-use samples using Bayesian logistic regression. Journal of the American Statistical Association 86, 68-78.

Cox, D. R. and Hinkley, D. V. (1974). Theoretical Statistics. New York: Chapman and Hall.

Deely, J. E. and Lindley, D. V. (1981). Bayes empirical Bayes. Journal of the American Statistical Association 76, 833-841.

Draper, D., Hodges, J. S., Mallows, C. L., and Pregibon, D. (1993). Exchangeability and data analysis. Journal of the Royal Statistical Society, Series A 156, 9-37.

Faraggi, D. and Simon, R. (1997). Large sample Bayesian inference on the parameters of the proportional hazards model. Statistics in Medicine 16, 2573-2585.

Gelman, A., Carlin, J. B., Stern, H. A., and Rubin, D. B. (1995). Bayesian Data Analysis. New York: Chapman and Hall.

Goldstein, H. (1995). Multilevel Statistical Models, 2nd edition. London: Edward Arnold.

Good, I. J. (1983). Good Thinking. Minneapolis: University of Minnesota Press.

Greenland, S. (1992). A semi-Bayes approach to the analysis of correlated associations, with an application to an occupational cancer-mortality study. Statistics in Medicine 11, 219-230.

Greenland, S. (1993). Methods for epidemiologic analyses of multiple exposures: A review and comparative study of maximum-likelihood, preliminary-testing, and empirical-Bayes regression. Statistics in Medicine 12, 717-736.

Greenland, S. (1997). Second-stage least squares versus penalized quasi-likelihood for fitting hierarchical models in epidemiologic analyses. Statistics in Medicine 16, 515-526.

Greenland, S. (1998). Probability logic and probabilistic induction. Epidemiology 9, 322-332.

Greenland, S. (2000). When should epidemiologic regressions use random coefficients? Biometrics 56, 915-921.

Greenland, S. and Christensen, R. (2001). Data augmentation priors for Bayesian and semi-Bayes analyses of conditional-logistic and proportional hazards regression. Statistics in Medicine 20, in press.

Kass, R. E. and Steffey, E. (1989). Approximate Bayesian inference in conditionally independent hierarchical models. Journal of the American Statistical Association 84, 717-726.

Landaw, E. M., Sampson, P. F., and Toporek, J. D. (1982). Advanced nonlinear regression in BMDP. In Proceedings of the Statistical Computing Section, 228-233. Washington, D.C.: American Statistical Association.

Leamer, E. E. (1985). Sensitivity analyses would help. American Economic Review 75, 308-313.

Leonard, T. and Hsu, J. S. J. (1999). Bayesian Methods. Cambridge: Cambridge University Press.

Lindley, D. V. (2000). The philosophy of statistics (with discussion). The Statistician 49, 293-337.

Moolgavkar, S. H. and Venzon, D. J. (1987). General relative-risk regression models for epidemiologic studies. American Journal of Epidemiology 126, 949-961.

Neutra, R. R., Fienberg, S. E., Greenland, S., and Friedman, E. A. (1977). The effect of fetal monitoring on neonatal death rates. New England Journal of Medicine 299, 324-326.

Rios Insua, D. and Ruggeri, F. (2000). Robust Bayesian Analysis. New York: Springer-Verlag.

Roecker, E. B. (1991). Prediction error and its estimation for subset-selected models. Technometrics 33, 459-468.

Rothman, K. J. and Greenland, S. (1998). Modern Epidemiology, 2nd edition. Philadelphia: Lippincott-Raven.

Tanner, M. A. and Wong, W. H. (1987). The calculation of posterior distributions by data augmentation (with discussion). Journal of the American Statistical Association 82, 528-550.

Witte, J. S., Greenland, S., Kim, L. L., and Arab, L. K. (2000). Multilevel modeling in epidemiology with GLIMMIX. Epidemiology 11, 684-688.


Wolfinger, R. and O'Connell, M. (1993). Generalized linear mixed models: A pseudo-likelihood approach. Journal of Statistical Computing and Simulation 48, 223-243.

Received January 2000. Revised November 2000. Accepted January 2001.
