On Plotting Renovated Samples


  • On Plotting Renovated Samples
    Author(s): Peter J. Smith
    Source: Biometrics, Vol. 51, No. 3 (Sep., 1995), pp. 1147-1151
    Published by: International Biometric Society
    Stable URL: http://www.jstor.org/stable/2533014
    Accessed: 25/06/2014 09:07


    This content downloaded from 185.44.78.105 on Wed, 25 Jun 2014 09:07:05 AM. All use subject to JSTOR Terms and Conditions.


  • BIOMETRICS 51, 1147-1151 September 1995

    On Plotting Renovated Samples

    Peter J. Smith

    Department of Statistics and Operations Research, Royal Melbourne Institute of Technology,

    G.P.O. Box 2476V, Melbourne, Victoria, Australia 3001

    SUMMARY

    In this note we use the Buckley-James method for censored regression in the p sample problem where the samples are subject to right-censoring. The samples are reconstructed so as to remove the effect of censoring, and graphical procedures based on quantiles (such as boxplots) may then be used as a standard data-analytic tool to describe the variable being measured.

    1. Introduction

    When two censored samples are to be compared, a commonly used initial graphical approach is to place the product-limit survival curves on the same diagram. In such plots, it may be difficult to separate the curves visually. To facilitate direct visual communication of the information contained in censored data, Gentleman and Crowley (1992) show how to construct rank plots, QQ plots and comparative boxplots; the focus is on the plot as a functional of the product-limit estimator.

    In this paper, we take the approach of renovating the data to provide a view of what the response would be like had it been unaffected by censoring. We apply renovated scatterplots (Smith and Zhang, 1995) to the p sample problem using the Buckley-James method (Buckley and James, 1979; Miller and Halpern, 1982; James and Smith, 1984; Lai and Ying, 1991; Lin and Wei, 1992a; Hillis, 1993) for regression with censored data. We note that the linear model is often appropriate when the response is measured on the logarithm scale (Buckley and James, 1979).

    2. Right-censoring

    Suppose that the outcomes $y_1, y_2, \ldots, y_n$ of a positive-valued right-censored response variable $Y$ comprise $p = 2$ samples, of combined size $n$, with group membership held by the covariate

    $$x_i = \begin{cases} 1 & \text{if } z_i \text{ is from Sample 1;} \\ 0 & \text{if } z_i \text{ is from Sample 2.} \end{cases}$$

    This means that instead of observing the outcome $y_i$ directly, we observe the data

    $$(x_1, z_1, \delta_1), (x_2, z_2, \delta_2), \ldots, (x_n, z_n, \delta_n), \tag{1}$$

    where $z_i = \min\{y_i, t_i\}$ denote the observed responses defined in terms of $t_i$, the censor time associated with $y_i$, and the censor indicators

    $$\delta_i = \begin{cases} 1 & \text{if } y_i < t_i; \\ 0 & \text{otherwise} \end{cases}$$

    return the value 1 only when $y_i$ is observed exactly (uncensored). We assume that survival is independent of the causes of censoring and that the censor times are fixed (as is likely the case for an experiment termination date). Our methods are also applicable to left-censoring, which is a special case of right-censoring with the response axis reversed (Turnbull, 1974).
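    The observation scheme in (1) can be sketched in a few lines of Python with NumPy. This is only an illustration: the function name `censor` and the numerical values are hypothetical and do not come from the paper.

```python
import numpy as np

def censor(y, t):
    """Return z_i = min(y_i, t_i) and delta_i = 1 if y_i < t_i, else 0."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    z = np.minimum(y, t)            # observed responses
    delta = (y < t).astype(int)     # 1 only when y_i is observed exactly
    return z, delta

# Hypothetical outcomes with a common fixed censor time of 10.
y = [5.0, 12.0, 7.0, 20.0]
t = [10.0, 10.0, 10.0, 10.0]
z, delta = censor(y, t)
```

    Here the second and fourth outcomes exceed the censor time, so they are recorded as the censor time itself with indicator 0.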

    Key words: Buckley-James estimator; Censored regression; p sample problem; Right-censoring.

    3. Data Renovation

    When the censor indicator $\delta$ is interpreted as a plot symbol, the data $(x_i, z_i, \delta_i)$, $i = 1, 2, 3, \ldots, n$, may be depicted in a scatterplot composed of censored points $(x_i, z_i, \circ)$ and uncensored points $(x_i, z_i, \ )$ in two lines of dots standing above the respective covariate values. When the scatterplot is used as a guide for the effect of the explanatory variable on the response, the points are in the "wrong place": the plotted points are lower than they would likely be in the absence of censoring. We lift the positions of these points by using the Buckley-James method, which we now describe in an easily programmed matrix formulation.

    For the p sample problem we fit the model

    $$Y = X\beta + R \tag{2}$$

    in terms of: $(p - 1)$ parameters $\beta$; matrix $X$ of order $n \times (p - 1)$; residual vector $R = (R_1, R_2, \ldots, R_n)^T$, where the $R_i$ are independent and identically distributed with mean $\alpha$, finite variance, and common unknown distribution function $F = 1 - S$. Censored points in the scatterplot are replaced by their estimated conditional expected values by using a weighted linear combination of observed ranked residuals $E(b) = (e_1(b), e_2(b), \ldots, e_n(b))^T = Z - Xb$ from a fitted line of "slope" $b$ from the observed data $Z = (z_1, z_2, \ldots, z_n)^T$. Let $\hat{F} = 1 - \hat{S}$ denote the product-limit estimator based on the observed residual vector $E(b)$. The weights (Buckley and James, 1979) used in the linear combination are defined by

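    $$w_{ik}(b) = \begin{cases} \dfrac{d\hat{F}(e_k(b))\,\delta_k(1 - \delta_i)}{\hat{S}(e_i(b))} & \text{if } e_k(b) > e_i(b); \\ 0 & \text{otherwise.} \end{cases} \tag{3}$$

    If the true parameter $\beta$ were known, these weights would estimate $dF(r)/S(e_i(\beta))$ in the equation

    $$E[R_i \mid R_i > e_i(\beta)] = \int_{e_i(\beta)}^{\infty} r \, \frac{dF(r)}{S(e_i(\beta))},$$

    so that the conditional expectation $E(R_i \mid R_i > e_i(\beta))$ is estimated as the linear combination $\sum_{k=1}^{n} e_k(b)\, w_{ik}(b)$ when $b$ is near $\beta$.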
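    As a rough sketch of how the weights might be programmed, the following Python function builds a weight matrix from a residual vector and its censor indicators, with the censor indicators on the diagonal and the weights $w_{ik}$ in the censored rows. Several choices here are assumptions rather than statements from the text: the function name `renovation_weights` is hypothetical, the largest residual is treated as uncensored so that the product-limit mass sums to one, tied residuals are ranked with uncensored points first, and the example residuals are invented.

```python
import numpy as np

def renovation_weights(e, delta):
    """Return the weight matrix for residuals e, rows and columns in input order."""
    e = np.asarray(e, dtype=float)
    d = np.asarray(delta, dtype=int)
    n = len(e)
    # Rank the residuals; among ties, uncensored points are placed first.
    order = np.lexsort((-d, e))
    ds = d[order].copy()
    ds[-1] = 1  # assumption: the largest residual is treated as uncensored
    at_risk = n - np.arange(n)
    surv = np.cumprod(1.0 - ds / at_risk)             # S-hat just after each ranked residual
    mass = np.concatenate(([1.0], surv[:-1])) - surv  # dF-hat: jump at each ranked residual
    Ws = np.diag(ds.astype(float))                    # censor indicators on the diagonal
    for i in np.flatnonzero(ds == 0):
        # Censored row: share out the estimated mass lying beyond residual i.
        Ws[i, i + 1:] = mass[i + 1:] / surv[i]
    W = np.zeros((n, n))
    W[np.ix_(order, order)] = Ws                      # undo the ranking
    return W

e = np.array([0.5, -1.0, 2.0, 1.0])   # hypothetical residuals
delta = np.array([1, 0, 1, 0])        # second and fourth are censored
W = renovation_weights(e, delta)
renovated = W @ e                     # approximately [0.5, 1.5, 2.0, 2.0]
```

    Each row of `W` sums to one, so every censored residual is replaced by a weighted average of the uncensored residuals ranked above it, while uncensored residuals are left unchanged.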

    In a multivariate setting, such as for comparing $p > 2$ samples, the Buckley-James method consists of determining an iterative solution $b = \hat{\beta}_n$ to the equation

    $$(X^T X)^{-1} X^T Y^*(b) - b = 0 \tag{4}$$

    through the renovated responses $Y^*(b) = Xb + W(b)(Z - Xb)$, where

    $$W(b) = \begin{pmatrix} \delta_1 & w_{12}(b) & w_{13}(b) & \cdots & w_{1n}(b) \\ 0 & \delta_2 & w_{23}(b) & \cdots & w_{2n}(b) \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & \delta_n \end{pmatrix}$$

    is the renovation weight matrix containing the censor indicators on the main diagonal (Smith and Zhang, 1995). A "solution," $b = \hat{\beta}_n$, is reached iteratively when the norm of the left side of (4) is minimum.

    In a univariate setting for comparing $p = 2$ samples, the equation in (4) may be easily solved iteratively: $\hat{\beta}_n$ is the solution to

    $$\frac{\sum_{i=1}^{n} (x_i - \bar{x})\, y_i^*(b)}{\sum_{i=1}^{n} (x_i - \bar{x})^2} - b = 0;$$

    $\hat{\alpha}_n$ is then the mean of the resulting partial residuals $y_i^*(\hat{\beta}_n) - \hat{\beta}_n x_i$. This gives $\hat{\beta}_n$ as a Buckley-James estimator of $\beta$. Distributional properties of $\hat{\beta}_n$ are succinctly outlined in Lin and Wei (1992b).
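    The two-sample iteration can be sketched as follows. This is a minimal illustration under stated assumptions, not the author's program: the function names are hypothetical, the largest residual is treated as uncensored, the slope equation uses the centred covariate, and the loop simply stops at a tolerance or after `max_iter` passes (Buckley-James iterations can oscillate, in which case the last iterate is returned rather than the minimum-norm one). The data are invented.

```python
import numpy as np

def renovation_weights(e, delta):
    """Weight matrix for residuals e; largest residual treated as uncensored (assumption)."""
    e, d = np.asarray(e, float), np.asarray(delta, int)
    n = len(e)
    order = np.lexsort((-d, e))            # rank residuals, uncensored first among ties
    ds = d[order].copy()
    ds[-1] = 1
    surv = np.cumprod(1.0 - ds / (n - np.arange(n)))
    mass = np.concatenate(([1.0], surv[:-1])) - surv
    Ws = np.diag(ds.astype(float))
    for i in np.flatnonzero(ds == 0):
        Ws[i, i + 1:] = mass[i + 1:] / surv[i]
    W = np.zeros((n, n))
    W[np.ix_(order, order)] = Ws
    return W

def buckley_james_two_sample(x, z, delta, max_iter=100, tol=1e-8):
    """Iterate b <- sum((x - xbar) * y*(b)) / sum((x - xbar)**2) for a 0/1 covariate x."""
    x, z = np.asarray(x, float), np.asarray(z, float)
    xc = x - x.mean()
    sxx = xc @ xc

    def renovate(b):
        e = z - x * b                      # residuals from the line of slope b
        return x * b + renovation_weights(e, delta) @ e

    b = xc @ z / sxx                       # starting value: slope ignoring censoring
    for _ in range(max_iter):
        b_new = xc @ renovate(b) / sxx
        converged = abs(b_new - b) < tol
        b = b_new
        if converged:
            break
    y_star = renovate(b)
    a = np.mean(y_star - b * x)            # mean of the partial residuals
    return a, b, y_star

# Hypothetical data: one censored point in the group with x = 1.
x = np.array([1, 1, 1, 0, 0, 0])
z = np.array([1.0, 2.0, 3.0, 0.5, 1.5, 2.5])
delta = np.array([1, 0, 1, 1, 1, 1])
a, b, y_star = buckley_james_two_sample(x, z, delta)
```

    With no censoring the weight matrix is the identity, the renovated responses equal the observed ones, and the slope reduces to the difference between the two sample means.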

    4. Plots of Renovated Data

    Once a Buckley-James solution $b = \hat{\beta}_n$ has been found in the least squares two-sample problem, then the data may be "renovated": by a renovated dotplot we mean a plot of $(X, Y^*, \delta)$, where

    $$Y^* = Y^*(\hat{\beta}_n) = X\hat{\beta}_n + W(Z - X\hat{\beta}_n), \tag{5}$$

    $W = W(\hat{\beta}_n)$, and $\delta$ contains the plot symbols for uncensored points and renovated points.


    Notice that when the Buckley-James method provides a unique solution, $\hat{\beta}_n$ may then be written as $\hat{\beta}_n = (X^T W X)^{-1} X^T W Y^*$ (Smith and Zhang, 1995) and is the least squares estimator of $\beta$ using data from the renovated dotplot. Standard boxplot comparisons may then take place on the renovated data $(X, Y^*, \delta)$.

    The renovated data may be usefully employed in QQ plots to detect an underlying distribution for the response. Importantly, after renovation, for each of the p samples we may produce plots of the empirical survivor function $S_m(u)$, defined as the fraction of the $m$ within-sample data exceeding $u$. When the linear model is appropriate, the consistency of the Buckley-James estimators implies that the renovated points will provide a guide to the shape of the survival function for each group.
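    The empirical survivor function described above is simple to compute once a sample has been renovated. A minimal sketch, with a hypothetical function name and made-up values standing in for one renovated sample:

```python
import numpy as np

def empirical_survivor(sample, u):
    """Fraction of the within-sample values strictly exceeding each point in u."""
    sample = np.asarray(sample, dtype=float)
    u = np.atleast_1d(np.asarray(u, dtype=float))
    # Compare every evaluation point with every sample value, then average.
    return (sample[None, :] > u[:, None]).mean(axis=1)

renovated_sample = [1.0, 2.0, 3.0, 4.0]   # hypothetical renovated responses
grid = [0.0, 2.0, 4.0]
surv = empirical_survivor(renovated_sample, grid)   # [1.0, 0.5, 0.0]
```

    Evaluating the function on a fine grid for each group gives step curves that can be drawn on one diagram, or the renovated samples can be passed directly to a standard boxplot routine.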

    5. Comparative Boxplots: An Example

    Lawless (1982), Gehan (1965), and others have discussed data from a clinical trial examining steroid-induced remission times (weeks) for leukemia patients. One group of 21 patients were given 6-mercaptopurine (6-MP); a second group of 21 patients were given a placebo. Since the trial lasted 1 year and patients were admitted to the trial during the year, right-censoring occurred at the cut-off date when some patients were still in remission. Observations $\log_e Z$ on log remission time $\log_e Y$ are as follows:

    6-MP (Group 1):
        1.79   1.79   1.79   1.79+  1.95   2.20+  2.30   2.30+  2.40+  2.56   2.77
        2.83+  2.94+  3.00+  3.09   3.14   3.22+  3.47+  3.47+  3.52+  3.56+

    Placebo (Group 0):
        .00    .00    .69    .69    1.10   1.39   1.39   1.61   1.61   2.08   2.08
        2.08   2.08   2.40   2.40   2.48   2.48   2.71   2.83   3.09   3.14

    The '+' denotes right censoring in the 6-MP group, so that 6+ represents an observed 6-week remission which was still in effect at the closure of the trial.
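    To work with these observations in the $(x_i, z_i, \delta_i)$ form of Section 2, the listing can be parsed so that a trailing '+' sets the censor indicator to 0. The sketch below is illustrative only: the helper name `parse` is hypothetical, and the strings are a transcription of the log remission times for the two groups.

```python
import numpy as np

six_mp = ("1.79 1.79 1.79 1.79+ 1.95 2.20+ 2.30 2.30+ 2.40+ 2.56 2.77 "
          "2.83+ 2.94+ 3.00+ 3.09 3.14 3.22+ 3.47+ 3.47+ 3.52+ 3.56+")
placebo = (".00 .00 .69 .69 1.10 1.39 1.39 1.61 1.61 2.08 2.08 "
           "2.08 2.08 2.40 2.40 2.48 2.48 2.71 2.83 3.09 3.14")

def parse(listing, group):
    """Split a listing into covariate x, observed response z and censor indicator delta."""
    tokens = listing.split()
    z = np.array([float(tok.rstrip("+")) for tok in tokens])
    delta = np.array([0 if tok.endswith("+") else 1 for tok in tokens])
    x = np.full(len(tokens), group)
    return x, z, delta

x1, z1, d1 = parse(six_mp, 1)    # x = 1 for 6-MP
x0, z0, d0 = parse(placebo, 0)   # x = 0 for placebo
x = np.concatenate([x1, x0])
z = np.concatenate([z1, z0])
delta = np.concatenate([d1, d0])
```

    The combined arrays hold 42 observations, of which the 12 marked '+' in the 6-MP group are censored.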

    Figure 1. Dotplots of original and renovated data for leukemia log remission times. (Two panels; horizontal axis: Group, with values 0 and 1.)

    The Buckley-James method provides an exact solution to the model parameters in this two-sample problem on the logarithm scale with covariate $x = 1$ for 6-MP; $x = 0$ for placebo. The logarithmic trans