
    Bayesian Statistics without Tears: A Sampling-Resampling Perspective

Author(s): A. F. M. Smith and A. E. Gelfand
Source: The American Statistician, Vol. 46, No. 2 (May, 1992), pp. 84-88
Published by: American Statistical Association
Stable URL: http://www.jstor.org/stable/2684170



A. F. M. SMITH and A. E. GELFAND*

*A. F. M. Smith is Professor, Department of Mathematics, Imperial College of Science, Technology and Medicine, London SW7 2BZ, England. A. E. Gelfand is Professor, Department of Statistics, University of Connecticut, Storrs, CT 06269. The authors are grateful to David Stephens for assistance with computer experiments. His work and a visit to the United Kingdom by the second author were supported by the U.K. Science and Engineering Research Council Complex Stochastic Systems Initiative.

Even to the initiated, statistical calculations based on Bayes's Theorem can be daunting because of the numerical integrations required in all but the simplest applications. Moreover, from a teaching perspective, introductions to Bayesian statistics, if they are given at all, are circumscribed by these apparent calculational difficulties. Here we offer a straightforward sampling-resampling perspective on Bayesian inference, which both has pedagogic appeal and suggests easily implemented calculation strategies.

KEY WORDS: Bayesian inference; Exploratory data analysis; Graphical methods; Influence; Posterior distribution; Prediction; Prior distribution; Random variate generation; Sampling-resampling techniques; Sensitivity analysis; Weighted bootstrap.

1. INTRODUCTION

Given data $x$ obtained under a parametric model indexed by finite-dimensional $\theta$, the Bayesian learning process is based on

$$p(\theta \mid x) = \frac{\ell(\theta; x)\,p(\theta)}{\int \ell(\theta; x)\,p(\theta)\,d\theta}, \qquad (1.1)$$

the familiar form of Bayes's Theorem, relating the posterior distribution $p(\theta \mid x)$ to the likelihood $\ell(\theta; x)$ and the prior distribution $p(\theta)$. If $\theta = (\psi, \phi)$, with interest centering on $\psi$, the joint posterior distribution is marginalized to give the posterior distribution for $\psi$,

$$p(\psi \mid x) = \int p(\psi, \phi \mid x)\,d\phi. \qquad (1.2)$$

If summary inferences in the form of posterior expectations are required (e.g., posterior means and variances), these are based on

$$E[m(\theta) \mid x] = \int m(\theta)\,p(\theta \mid x)\,d\theta, \qquad (1.3)$$

for suitable choices of $m(\theta)$.

Thus, in the continuous case, the integration operation plays a fundamental role in Bayesian statistics, whether it is for calculating the normalizing constant in (1.1), the marginal distribution in (1.2), or the expectation in (1.3). However, except in simple cases, explicit evaluation of such integrals will rarely be possible, and realistic choices of likelihood and prior will necessitate the use of sophisticated numerical integration or analytic approximation techniques (see, for example, Smith et al. 1985, 1987; Tierney and Kadane 1986). This can pose problems for the applied practitioner seeking routine, easily implemented procedures. For the student, who may already be puzzled and discomforted by the intrusion of too much calculus into what ought surely to be a simple, intuitive, statistical learning process, this can be totally off-putting.

In the following sections, we address this problem by taking a new look at Bayes's Theorem from a sampling-resampling perspective. This will open the way both to easily implemented calculations and to essentially calculus-free insight into the mechanics and uses of Bayes's Theorem.

2. FROM DENSITIES TO SAMPLES

As a first step, we note the essential duality between a sample and the density (distribution) from which it is generated. Clearly, the density generates the sample; conversely, given a sample we can approximately re-create the density (as a histogram, a kernel density estimate, an empirical cdf, or whatever; a small sketch at the end of this section illustrates this).

Suppose we now shift the focus in (1.1) from densities to samples. In terms of densities, the inference process is encapsulated in the updating of the prior density $p(\theta)$ to the posterior density $p(\theta \mid x)$ through the medium of the likelihood function $\ell(\theta; x)$. Shifting to samples, this corresponds to the updating of a sample from $p(\theta)$ to a sample from $p(\theta \mid x)$ through the likelihood function $\ell(\theta; x)$.

In Section 3, we examine two resampling ideas that provide techniques whereby samples from one distribution may be modified to form samples from another distribution. In Section 4, we illustrate how these ideas may be utilized to modify prior samples to posterior samples, as well as to modify posterior samples arising under one model specification to posterior samples arising under another. An illustrative example is provided in Section 5.
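To make the sample-density duality concrete, here is a minimal sketch (ours, not the authors'; it assumes NumPy and SciPy are available) that generates a sample from a known density and then approximately re-creates the density from the sample alone:

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

rng = np.random.default_rng(0)

# Density -> sample: draw from a standard normal.
sample = rng.normal(size=2000)

# Sample -> density: re-create the density as a histogram and a kernel estimate.
hist, edges = np.histogram(sample, bins=30, density=True)
kde = gaussian_kde(sample)

# The kernel estimate tracks the true density closely on a grid of points.
grid = np.linspace(-3, 3, 7)
print(np.round(kde(grid), 3))        # approximate density from the sample
print(np.round(norm.pdf(grid), 3))   # true density, for comparison
```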

3. TWO RESAMPLING METHODS

Suppose that a sample of random variates is easily generated, or has already been generated, from a continuous density $g(\theta)$, but that what is really required is a sample from a density $h(\theta)$ absolutely continuous with respect to $g(\theta)$.



Can we somehow utilize the sample from $g(\theta)$ to form a sample from $h(\theta)$? Slightly more generally, given a positive function $f(\theta)$ that is normalizable to such a density $h(\theta) = f(\theta)/\int f(\theta)\,d\theta$, can we form a sample from the latter given only a sample from $g(\theta)$ and the functional form of $f(\theta)$?

3.1 Random Variates via the Rejection Method

In the case where there exists an identifiable constant $M > 0$ such that $f(\theta)/g(\theta) \le M$ for all $\theta$, the answer is yes to both questions, and the procedure is as follows:

1. Generate $\theta$ from $g(\theta)$.
2. Generate $u$ from Uniform(0, 1).
3. If $u \le f(\theta)/Mg(\theta)$, accept $\theta$; otherwise, repeat Steps 1-3.

Any accepted $\theta$ is then a random variate from $h(\theta) = f(\theta)/\int f(\theta)\,d\theta$. The proof (see also Ripley 1986, p. 60) is straightforward. Let

$$S_0 = \left\{(\theta, u): \theta \le \theta_0,\; u \le f(\theta)/Mg(\theta)\right\},$$

and let

$$S = \left\{(\theta, u): u \le f(\theta)/Mg(\theta)\right\}.$$

Then the cdf of accepted $\theta$, according to the preceding procedure, is

$$\Pr(\theta \le \theta_0 \mid \theta \text{ accepted}) = \frac{\iint_{S_0} g(\theta)\,du\,d\theta}{\iint_{S} g(\theta)\,du\,d\theta} = \frac{\int_{-\infty}^{\theta_0} f(\theta)\,d\theta}{\int f(\theta)\,d\theta}.$$

It follows that accepted $\theta$ have density $h(\theta) \propto f(\theta)$. Hence, for a sample $\theta_i$, $i = 1, \ldots, n$, from $g(\theta)$, in resampling to obtain a sample from $h(\theta)$ we will tend to retain those $\theta_i$ for which the ratio of $f$ relative to $g$ is large, in agreement with intuition. The resulting sample size is, of course, random, and the probability that an individual item is accepted is given by

$$\Pr(\theta \text{ accepted}) = \iint_{S} g(\theta)\,du\,d\theta = M^{-1}\int f(\theta)\,d\theta.$$

The expected sample size for the resampled $\theta_i$'s is therefore $nM^{-1}\int f(\theta)\,d\theta$.
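The three-step procedure translates directly into code. The following is a minimal sketch of the rejection method, not taken from the paper, assuming NumPy; the target $f$, the sampler and density for $g$, and the bound $M$ in the illustration are our own choices:

```python
import numpy as np

rng = np.random.default_rng(0)

def rejection_sample(f, g_sample, g_pdf, M, n):
    """Draw n variates from h = f / integral(f), given a sampler and density
    for g and a constant M with f(theta)/g(theta) <= M for all theta."""
    accepted = []
    while len(accepted) < n:
        theta = g_sample()                        # Step 1: theta ~ g
        u = rng.uniform()                         # Step 2: u ~ Uniform(0, 1)
        if u <= f(theta) / (M * g_pdf(theta)):    # Step 3: accept or repeat
            accepted.append(theta)
    return np.array(accepted)

# Illustration: f proportional to a Beta(3, 2) density, g = Uniform(0, 1).
# sup f/g is the maximum of t^2 (1 - t) on [0, 1], attained at t = 2/3: M = 4/27.
f = lambda t: t**2 * (1 - t)
draws = rejection_sample(f, rng.uniform, lambda t: 1.0, M=4 / 27, n=1000)
```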

3.2 Random Variates via a Weighted Bootstrap

In cases where the bound $M$ required in the preceding procedure is not readily available, we may still approximately resample from $h(\theta) = f(\theta)/\int f(\theta)\,d\theta$ as follows. Given $\theta_i$, $i = 1, \ldots, n$, a sample from $g$, calculate $\omega_i = f(\theta_i)/g(\theta_i)$ and then

$$q_i = \omega_i \Big/ \sum_{j=1}^{n} \omega_j.$$

Draw $\theta^*$ from the discrete distribution over $\{\theta_1, \ldots, \theta_n\}$ placing mass $q_i$ on $\theta_i$. Then $\theta^*$ is approximately distributed according to $h$, with the approximation "improving" as $n$ increases. We provide a justification for this claim in a moment. However, first note that this procedure is a variant of the by now familiar bootstrap resampling procedure (Efron 1982). The usual bootstrap provides equally likely resampling of the $\theta_i$, while here we have weighted resampling with weights determined by the ratio of $f$ to $g$, again in agreement with intuition. See also Rubin (1988), who referred to this procedure as SIR (sampling/importance resampling).

Returning to our claim, suppose for convenience that $\theta$ is univariate. Under the customary bootstrap, $\theta^*$ has cdf

$$\Pr(\theta^* \le a) = \frac{1}{n}\sum_{i=1}^{n} 1_{(-\infty,a]}(\theta_i) \;\to\; E_g\!\left[1_{(-\infty,a]}(\theta)\right] = \int_{-\infty}^{a} g(\theta)\,d\theta,$$

so that $\theta^*$ is approximately distributed as an observation from $g(\theta)$. Similarly, under the weighted bootstrap, $\theta^*$ has cdf

$$\Pr(\theta^* \le a) = \sum_{i=1}^{n} q_i\, 1_{(-\infty,a]}(\theta_i) = \frac{n^{-1}\sum_{i=1}^{n} \dfrac{f(\theta_i)}{g(\theta_i)}\, 1_{(-\infty,a]}(\theta_i)}{n^{-1}\sum_{i=1}^{n} \dfrac{f(\theta_i)}{g(\theta_i)}} \;\to\; \frac{\int_{-\infty}^{a} f(\theta)\,d\theta}{\int f(\theta)\,d\theta} = \int_{-\infty}^{a} h(\theta)\,d\theta,$$

so that $\theta^*$ is approximately distributed as an observation from $h$. Note that the sample size under such resampling can be as large as desired. We mention one important caveat: the less $h$ resembles $g$, the larger the sample size $n$ will need to be in order that the distribution of $\theta^*$ well approximates $h$.

Finally, the fact that either resampling method allows $h$ to be known only up to a proportionality constant (i.e., only through $f$) is crucial, since in our Bayesian applications we wish to avoid the integration required to standardize $f$.


(Although, from the preceding, we can see that

$$\frac{1}{n}\sum_{i=1}^{n} \frac{f(\theta_i)}{g(\theta_i)}$$

provides a consistent estimator of the normalizing constant $\int f(\theta)\,d\theta$, if such be required.)
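A matching sketch of the weighted bootstrap (again ours rather than the paper's), reusing the illustrative $f$ and $g$ from the rejection sketch; it also returns the normalizing-constant estimate $n^{-1}\sum_i f(\theta_i)/g(\theta_i)$ noted parenthetically above:

```python
import numpy as np

rng = np.random.default_rng(0)

def weighted_bootstrap(theta, f, g_pdf, size):
    """Resample approximately from h = f / integral(f), given theta_i ~ g,
    placing mass q_i on theta_i proportional to f(theta_i)/g(theta_i)."""
    omega = f(theta) / g_pdf(theta)
    q = omega / omega.sum()
    resampled = rng.choice(theta, size=size, replace=True, p=q)
    return resampled, omega.mean()   # omega.mean() estimates integral(f)

theta = rng.uniform(size=20000)               # theta_i ~ g = Uniform(0, 1)
f = lambda t: t**2 * (1 - t)                  # unnormalized Beta(3, 2) density
resampled, z_hat = weighted_bootstrap(theta, f, lambda t: np.ones_like(t), 5000)
# z_hat should be close to the true normalizing constant, B(3, 2) = 1/12.
```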

4. BAYESIAN CALCULATIONS VIA SAMPLING-RESAMPLING

Both methods of the previous section may be used to resample the posterior ($h$) from the prior ($g$), and also to resample a second posterior ($h$) from a first ($g$). In this section we give details of both applications.

4.1 Prior to Posterior

How does Bayes's Theorem generate a posterior sample from a prior sample? For fixed $x$, define $f(\theta) = \ell(\theta; x)p(\theta)$. If $\hat\theta$ maximizes $\ell(\theta; x)$, let $M = \ell(\hat\theta; x)$. Then, with $g(\theta) = p(\theta)$, we may immediately apply the rejection method of Section 3.1 to obtain samples from the density corresponding to the standardized $f$, which, from (1.1), is precisely the posterior density $p(\theta \mid x)$. Thus we see that Bayes's Theorem, as a mechanism for generating a posterior sample from a prior sample, takes the following simple form: for each $\theta$ in the prior sample, accept $\theta$ into the posterior sample with probability

$$\frac{f(\theta)}{Mp(\theta)} = \frac{\ell(\theta; x)}{\ell(\hat\theta; x)};$$

otherwise reject it. The likelihood therefore acts as a resampling probability; those $\theta$ in the prior sample having high likelihood are more likely to be retained in the posterior sample. Of course, since $p(\theta \mid x) \propto \ell(\theta; x)p(\theta)$, we can also straightforwardly resample using the weighted bootstrap with

$$q_i = \ell(\theta_i; x) \Big/ \sum_{j=1}^{n} \ell(\theta_j; x).$$

Several obvious uses of this sampling-resampling perspective are immediate. Using large prior samples and iterating the resampling process for successive individual data elements (for two-dimensional $\theta$, say) provides a simple pedagogic tool for illustrating the sequential Bayesian learning process, as well as the increasing concentration of the posterior as the amount of data increases. In addition, the approach provides natural links with elementary graphical displays (e.g., histograms, stem-and-leaf displays, and boxplots to summarize univariate marginal posterior distributions; scatterplots to summarize bivariate posteriors). In general, the translation from functions to samples provides a wealth of opportunities for creative exploration of Bayesian ideas and calculations in the setting of computer graphical and exploratory data analysis (EDA) tools.
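Both routes from prior sample to posterior sample fit in a few lines. The sketch below is our illustration, not the paper's: a hypothetical binomial model with $y$ successes in $m$ trials and a Uniform(0, 1) prior, so the rejection probability is $\ell(\theta; x)/\ell(\hat\theta; x)$ with $\hat\theta = y/m$:

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)

m, y = 20, 14                                # hypothetical data
prior_sample = rng.uniform(size=50000)       # sample from the prior p(theta)
lik = binom.pmf(y, m, prior_sample)          # l(theta; x) at each prior draw

# Rejection method: accept theta with probability l(theta; x) / l(theta_hat; x).
M = binom.pmf(y, m, y / m)                   # likelihood at its maximizer y/m
u = rng.uniform(size=prior_sample.size)
posterior_rejection = prior_sample[u <= lik / M]

# Weighted bootstrap: resampling probabilities q_i proportional to l(theta_i; x).
q = lik / lik.sum()
posterior_bootstrap = rng.choice(prior_sample, size=5000, replace=True, p=q)
```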

4.2 Posterior to Posterior

An important issue in Bayesian inference is the sensitivity of inferences to model specification. In particular, we might ask:

1. How does the posterior change if we change the prior?
2. How does the posterior change if we change the likelihood?

In the density function/numerical integration setting, such sensitivity studies are rather off-putting, in that each change of a functional input typically requires one to carry out new calculations from scratch. This is not the case with the sampling-resampling approach, as we now illustrate in relation to the questions posed above.

In comparing two models in relation to the second question, we note that a change in likelihood may arise in terms of:

1. a change in distributional specification, with $\theta$ retaining the same interpretation (for example, a location);
2. a change in data: to a larger data set (prediction), a smaller data set (diagnostics), or a different data set (validation).

To unify notation, we shall in either case denote the two likelihoods by $\ell_1(\theta)$ and $\ell_2(\theta)$. We denote two different priors to be compared in relation to the first question by $p_1(\theta)$ and $p_2(\theta)$. For complete generality, we shall consider changes to both $\ell$ and $p$, although in any particular application we would not typically change both. Denoting the corresponding posterior densities by $\bar p_1(\theta)$ and $\bar p_2(\theta)$, we easily see that

$$\bar p_2(\theta) \propto \frac{\ell_2(\theta)\,p_2(\theta)}{\ell_1(\theta)\,p_1(\theta)}\;\bar p_1(\theta). \qquad (4.1)$$

Letting $v(\theta) = \ell_2(\theta)p_2(\theta)/\ell_1(\theta)p_1(\theta)$, we note that implementing the rejection method for (4.1) requires

$$\sup_{\theta} v(\theta).$$

In many examples this will simplify to an easy calculation. Alternatively, we may directly apply the weighted bootstrap method, taking $g = \bar p_1(\theta)$, $f = v(\theta)\bar p_1(\theta)$, and $\omega_i = v(\theta_i)$. Resampled $\theta^*$ will then be approximately distributed according to $f$ standardized, which is precisely $\bar p_2(\theta)$. Again, different aspects of the sensitivity of the posteriors to changes in inputs are easily studied by graphical examination of the posterior samples.
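A sketch of the posterior-to-posterior step, continuing our hypothetical binomial example: the likelihood is unchanged ($\ell_2 = \ell_1$) and the prior moves from Uniform(0, 1) to Beta(2, 2), so $v(\theta)$ reduces to $p_2(\theta)/p_1(\theta) = p_2(\theta)$:

```python
import numpy as np
from scipy.stats import beta, binom

rng = np.random.default_rng(0)

# First posterior, obtained as in Section 4.1 (hypothetical binomial data).
m, y = 20, 14
prior1 = rng.uniform(size=50000)
l1 = binom.pmf(y, m, prior1)
post1 = rng.choice(prior1, size=10000, replace=True, p=l1 / l1.sum())

# Second posterior via the weighted bootstrap with omega_i = v(theta_i).
# Here l2 = l1 and p1 is Uniform(0, 1), so v(theta) is just the Beta(2, 2) pdf.
v = beta.pdf(post1, 2, 2)
post2 = rng.choice(post1, size=5000, replace=True, p=v / v.sum())
```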

5. AN ILLUSTRATIVE EXAMPLE

To illustrate the passage, via Bayes's Theorem, from a prior sample to a posterior sample, we consider a two-parameter problem first considered by McCullagh and Nelder (1989, sec. 9.3.3).


For $i = 1, 2, 3$, suppose that $X_{i1} \sim \text{Binomial}(n_{i1}, \theta_1)$ and $X_{i2} \sim \text{Binomial}(n_{i2}, \theta_2)$, conditionally independent given $\theta_1, \theta_2$, with $n_{i1}, n_{i2}$ specified. Suppose further that the observed random variables are $Y_i = X_{i1} + X_{i2}$, $i = 1, 2, 3$, so that the likelihood for $\theta_1, \theta_2$ given $Y_1 = y_1$, $Y_2 = y_2$, and $Y_3 = y_3$ takes the form

$$\prod_{i=1}^{3} \sum_{j_i} \binom{n_{i1}}{j_i}\binom{n_{i2}}{y_i - j_i}\, \theta_1^{j_i}(1-\theta_1)^{n_{i1}-j_i}\, \theta_2^{y_i - j_i}(1-\theta_2)^{n_{i2}-y_i+j_i},$$

where $\max\{0,\, y_i - n_{i2}\} \le j_i \le \min\{n_{i1},\, y_i\}$.
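The convolution likelihood above can be evaluated by direct summation over the admissible $j_i$, after which a prior sample is resampled to the posterior exactly as in Section 4.1. The sketch below is ours: the $n_{ij}$ and $y_i$ values are placeholders (the paper's data fall in a portion of the text not reproduced here), and a uniform prior on the unit square is assumed purely for illustration:

```python
import numpy as np
from scipy.stats import binom

rng = np.random.default_rng(0)

# Placeholder design and data; the paper's actual values are not reproduced here.
n1 = np.array([5, 6, 4])
n2 = np.array([5, 4, 6])
y = np.array([7, 5, 6])

def likelihood(t1, t2):
    """Convolution-of-binomials likelihood for (theta1, theta2)."""
    value = 1.0
    for ni1, ni2, yi in zip(n1, n2, y):
        j = np.arange(max(0, yi - ni2), min(ni1, yi) + 1)   # admissible j_i
        value *= np.sum(binom.pmf(j, ni1, t1) * binom.pmf(yi - j, ni2, t2))
    return value

# Uniform prior sample on the unit square, resampled by the weighted bootstrap.
prior = rng.uniform(size=(20000, 2))
w = np.array([likelihood(t1, t2) for t1, t2 in prior])
idx = rng.choice(len(prior), size=2000, replace=True, p=w / w.sum())
posterior = prior[idx]
```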


... marginal posteriors for the parameters $\log[\theta_i/(1-\theta_i)]$, $i = 1, 2$. The sample for $\theta_1, \theta_2$ translates directly into a sample from the logit-transformed parameters and results in the summary picture given in Figure 3. We note that the forms of joint posterior revealed in Figures 2 and 3 are far from "nice" and would require subtle numerical handling by the nonsampling approaches cited in the Introduction.

So far as the choice of sample sizes for initial and resamples is concerned, this will typically be a matter of experimentation with particular applications, having in mind the level of "precision" required from pictorial or numerical summaries. For example, in Figure 1 we display 1,000 sample points as an effective pictorial representation. However, the resampling ratio needs to be at least 1 in 10, with some 2,000 points plotted in Figures 2 and 3 to convey these awkward posterior forms adequately. The actual generated sample from the prior thus needed to be in excess of 20,000 points.

Clearly, there is considerable scope for more refined graphical output in terms of joint density contours, kernel density curves, and so on. We encourage readers to be creative in fusing EDA and graphical techniques with the sample-based approach to formal inference presented here.

[Received May 1990. Revised January 1991.]

REFERENCES

Efron, B. (1982), The Jackknife, the Bootstrap and Other Resampling Plans, Philadelphia: Society for Industrial and Applied Mathematics.

McCullagh, P., and Nelder, J. A. (1989), Generalized Linear Models (2nd ed.), London: Chapman and Hall.

Ripley, B. (1986), Stochastic Simulation, New York: John Wiley.

Rubin, D. B. (1988), "Using the SIR Algorithm to Simulate Posterior Distributions," in Bayesian Statistics 3, eds. J. M. Bernardo, M. H. DeGroot, D. V. Lindley, and A. F. M. Smith, Oxford: Oxford University Press, pp. 395-402.

Smith, A. F. M., Skene, A. M., Shaw, J. E. H., and Naylor, J. C. (1987), "Progress with Numerical and Graphical Methods for Bayesian Statistics," The Statistician, 36, 75-82.

Smith, A. F. M., Skene, A. M., Shaw, J. E. H., Naylor, J. C., and Dransfield, M. (1985), "The Implementation of the Bayesian Paradigm," Communications in Statistics, Part A - Theory and Methods, 14, 1079-1102.

Tierney, L., and Kadane, J. (1986), "Accurate Approximations for Posterior Moments and Marginal Densities," Journal of the American Statistical Association, 81, 82-86.
