On Plotting Renovated Samples

Author(s): Peter J. Smith
Source: Biometrics, Vol. 51, No. 3 (Sep., 1995), pp. 1147-1151
Published by: International Biometric Society
Stable URL: http://www.jstor.org/stable/2533014



BIOMETRICS 51, 1147-1151 September 1995

On Plotting Renovated Samples

Peter J. Smith

Department of Statistics and Operations Research, Royal Melbourne Institute of Technology,

G.P.O. Box 2476V, Melbourne, Victoria, Australia 3001

SUMMARY

In this note we use the Buckley-James method for censored regression in the p sample problem where the samples are subject to right-censoring. The samples are reconstructed so as to remove the effect of censoring, and graphical procedures based on quantiles (such as boxplots) may then be used as a standard data-analytic tool to describe the variable being measured.

1. Introduction

When two censored samples are to be compared, a commonly used initial graphical approach is to place the product-limit survival curves on the same diagram. In such plots, it may be difficult to visually separate the curves. To facilitate direct visual communication of the information contained in censored data, Gentleman and Crowley (1992) show how to construct rank plots, QQ plots and comparative boxplots; the focus is on the plot as a functional of the product-limit estimator.

In this paper, we take the approach of renovating the data to provide a view of what the response would be like had it been unaffected by censoring. We apply renovated scatterplots (Smith and Zhang, 1995) to the p sample problem using the Buckley-James method (Buckley and James, 1979; Miller and Halpern, 1982; James and Smith, 1984; Lai and Ying, 1991; Lin and Wei, 1992a; Hillis, 1993) for regression with censored data. We note that the linear model is often appropriate when the response is measured on the logarithm scale (Buckley and James, 1979).

2. Right-censoring

Suppose that the outcomes $y_1, y_2, \ldots, y_n$ of a positive-valued right-censored response variable $Y$ comprise $p = 2$ samples, of combined size $n$, with group membership held by the covariate

$$x_i = \begin{cases} 1 & \text{if } z_i \text{ is from Sample 1;} \\ 0 & \text{if } z_i \text{ is from Sample 2.} \end{cases}$$

This means that instead of observing the outcome $y_i$ directly, we observe the data

$$(x_1, z_1, \delta_1), (x_2, z_2, \delta_2), \ldots, (x_n, z_n, \delta_n), \tag{1}$$

where $z_i = \min\{y_i, t_i\}$ denote the observed responses defined in terms of $t_i$, the censor time associated with $y_i$, and the censor indicators

$$\delta_i = \begin{cases} 1 & \text{if } y_i < t_i; \\ 0 & \text{otherwise} \end{cases}$$

return the value 1 only when $y_i$ is observed exactly (uncensored). We assume that survival is independent of the causes of censoring and that the censor times are fixed (as is likely to be the case for an experiment termination date). Our methods are also applicable to left-censoring, which is a special case of right-censoring with the response axis reversed (Turnbull, 1974).
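As an illustration of the observation scheme in (1), the following Python sketch forms the observed responses and censor indicators from lifetimes and fixed censor times. It assumes NumPy; the function name `observe` and the example values are hypothetical and are not taken from the paper.

```python
import numpy as np

def observe(y, t):
    """Apply fixed right-censoring to lifetimes y with censor times t."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    z = np.minimum(y, t)           # observed response z_i = min{y_i, t_i}
    delta = (y < t).astype(int)    # 1 when y_i is observed exactly, 0 when censored
    return z, delta

# Hypothetical lifetimes and censor times, for illustration only.
z, delta = observe([2.0, 5.0, 3.5], [4.0, 4.0, 3.0])
print(z, delta)
```

Only the first of the three hypothetical lifetimes falls before its censor time, so only that response is recorded as uncensored.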

Key words. Buckley-James estimator; Censored regression; p sample problem; Right-censoring.

3. Data Renovation

When the censor indicator $\delta$ is interpreted as a plot symbol, the data $(x_i, z_i, \delta_i)$, $i = 1, 2, 3, \ldots, n$, may be depicted in a scatterplot composed of censored points $(x_i, z_i, \circ)$ and uncensored points $(x_i, z_i, \bullet)$ in two lines of dots standing above the respective covariate values. When the scatterplot is used as a guide for the effect of the explanatory variable on the response, the points are in the "wrong place": the plotted points are lower than they would likely be in the absence of censoring. We lift the positions of these points by using the Buckley-James method, which we now describe in an easily programmed matrix formulation.

For the p sample problem we fit the model

$$Y = X\beta + R \tag{2}$$

in terms of: $(p - 1)$ parameters $\beta$; matrix $X$ of order $n \times (p - 1)$; residual vector $R = (R_1, R_2, \ldots, R_n)^T$, where the $R_i$ are independent and identically distributed with mean $\alpha$, finite variance, and common unknown distribution function $F = 1 - S$. Censored points in the scatterplot are replaced by their estimated conditional expected values by using a weighted linear combination of observed ranked residuals $E(b) = (e_1(b), e_2(b), \ldots, e_n(b))^T = Z - Xb$ from a fitted line of "slope" $b$ from the observed data $Z = (Z_1, Z_2, \ldots, Z_n)^T$. Let $\hat{F} = 1 - \hat{S}$ denote the product-limit estimator based on the observed residual vector $E(b)$. The weights (Buckley and James, 1979) used in the linear combination are defined by

$$w_{ik}(b) = \begin{cases} \dfrac{d\hat{F}(e_k(b))\,\delta_k(1 - \delta_i)}{\hat{S}(e_i(b))} & \text{if } e_k(b) > e_i(b); \\ 0 & \text{otherwise.} \end{cases} \tag{3}$$

If the true parameter were known, these weights would estimate $dF(r)/S(r_i)$ in the equation

$$E[R_i \mid R_i > r_i] = \int_{r_i}^{\infty} r \, \frac{dF(r)}{S(r_i)},$$

so that the conditional expectation $E(R_i \mid R_i > e_i(\beta))$ is estimated by the linear combination $\sum_{k=1}^{n} e_k(b) w_{ik}(b)$ when $b$ is near $\beta$.
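The weights in (3) can be computed directly from a vector of residuals and censor indicators. The Python sketch below is one possible reading of (3), assuming NumPy. The function name `bj_weights`, the handling of tied residuals, and the convention of treating observations at the largest residual as uncensored (so that the product-limit estimator reaches zero) are assumptions made here for illustration; the example residuals are hypothetical.

```python
import numpy as np

def bj_weights(e, delta):
    """Matrix with entry [i, k] equal to w_ik of equation (3).

    Assumption: observations tied at the largest residual are treated as
    uncensored, so that each censored row of weights sums to one.
    """
    e = np.asarray(e, dtype=float)
    d = np.asarray(delta, dtype=int).copy()
    d[e == e.max()] = 1
    n = len(e)
    # Sort by residual; among tied residuals, uncensored points come first.
    order = np.lexsort((1 - d, e))
    mass = np.zeros(n)   # product-limit jump dF-hat carried by each uncensored point
    surv = np.zeros(n)   # product-limit survivor estimate S-hat at each residual
    s = 1.0
    for rank, i in enumerate(order):
        if d[i] == 1:
            mass[i] = s / (n - rank)   # n - rank observations are still at risk
            s -= mass[i]
        surv[i] = s
    w = np.zeros((n, n))
    for i in np.flatnonzero(d == 0):   # only censored rows carry weights
        larger = e > e[i]
        w[i, larger] = mass[larger] / surv[i]
    return w

# Hypothetical residuals; the second observation is censored.
e = np.array([1.0, 2.0, 3.0, 4.0])
w = bj_weights(e, [1, 0, 1, 1])
print(w @ e)   # estimated conditional expectation for the censored residual
```

In this small example the censored residual at 2 receives weight 1/2 on each of the two larger uncensored residuals, so its estimated conditional expectation is 3.5.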

In a multivariate setting, such as for comparing $p > 2$ samples, the Buckley-James method consists of determining an iterative solution $b = \hat{\beta}_n$ to the equation

$$(X^T X)^{-1} X^T Y^*(b) - b = 0 \tag{4}$$

through the renovated responses $Y^*(b) = Xb + W(b)(Z - Xb)$, where

$$W(b) = \begin{pmatrix} \delta_1 & w_{12}(b) & w_{13}(b) & \cdots & w_{1n}(b) \\ 0 & \delta_2 & w_{23}(b) & \cdots & w_{2n}(b) \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & 0 & \cdots & w_{n-1,n}(b) \\ 0 & 0 & 0 & \cdots & \delta_n \end{pmatrix}$$

is the renovation weight matrix containing the censor indicators on the main diagonal (Smith and Zhang, 1995). A "solution," $b = \hat{\beta}_n$, is reached iteratively when the norm of the left side of (4) is minimum.

In a univariate setting for comparing $p = 2$ samples, the equation in (4) may be easily solved iteratively: $\hat{\beta}_n$ is the solution to

$$\frac{\sum_{i=1}^{n} (x_i - \bar{x})\, y_i^*(b)}{\sum_{i=1}^{n} (x_i - \bar{x})^2} - b = 0;$$

$\hat{\alpha}_n$ is then the mean of the resulting partial residuals $y_i^*(\hat{\beta}_n) - \hat{\beta}_n x_i$. This gives $\hat{\beta}_n$ as a Buckley-James estimator of $\beta$. Distributional properties of $\hat{\beta}_n$ are succinctly outlined in Lin and Wei (1992b).
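The univariate iteration may be sketched in Python as follows, assuming NumPy. This is an illustrative implementation rather than the author's program: the function names `renovate` and `buckley_james_two_sample`, the starting value (naive least squares on the observed data), the iteration limit and the tolerance are all choices made here, as is the convention that observations at the largest residual are treated as uncensored. The example data are hypothetical.

```python
import numpy as np

def renovate(x, z, delta, b):
    """Renovated responses y*(b) = b x + W(b)(z - b x) for a single covariate x."""
    x, z = np.asarray(x, dtype=float), np.asarray(z, dtype=float)
    e = z - b * x
    d = np.asarray(delta, dtype=int).copy()
    d[e == e.max()] = 1                  # assumption: largest residual is uncensored
    n = len(e)
    order = np.lexsort((1 - d, e))       # by residual, uncensored first among ties
    mass, surv, s = np.zeros(n), np.zeros(n), 1.0
    for rank, i in enumerate(order):     # product-limit estimator of the residuals
        if d[i] == 1:
            mass[i] = s / (n - rank)
            s -= mass[i]
        surv[i] = s
    e_star = e.copy()
    for i in np.flatnonzero(d == 0):     # lift each censored residual
        larger = e > e[i]
        e_star[i] = np.sum(mass[larger] * e[larger]) / surv[i]
    return b * x + e_star

def buckley_james_two_sample(x, z, delta, max_iter=50, tol=1e-8):
    """Fixed-point iteration on the univariate equation; since the iteration may
    oscillate, keep the b at which the left side is smallest in absolute value."""
    x = np.asarray(x, dtype=float)
    xc = x - x.mean()
    sxx = np.sum(xc ** 2)
    b = np.sum(xc * np.asarray(z, dtype=float)) / sxx   # naive starting value
    best_b, best_gap = b, np.inf
    for _ in range(max_iter):
        b_new = np.sum(xc * renovate(x, z, delta, b)) / sxx
        gap = abs(b_new - b)             # absolute value of the left side at b
        if gap < best_gap:
            best_b, best_gap = b, gap
        if gap < tol:
            break
        b = b_new
    y_star = renovate(x, z, delta, best_b)
    alpha = np.mean(y_star - best_b * x)  # mean of the partial residuals
    return best_b, alpha, y_star

# Hypothetical two-sample data; the second observation is censored.
x = [1, 1, 1, 0, 0, 0]
z = [1.0, 2.0, 3.0, 0.5, 1.5, 2.5]
delta = [1, 0, 1, 1, 1, 1]
beta_hat, alpha_hat, y_star = buckley_james_two_sample(x, z, delta)
print(beta_hat, alpha_hat, y_star)
```

With no censoring the renovated responses equal the observed responses and the iteration stops at the difference of the two sample means; with censoring, only the censored responses are moved, and they are moved upwards.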

4. Plots of Renovated Data

Once a Buckley-James solution $b = \hat{\beta}_n$ has been found in the least squares two-sample problem, then the data may be "renovated": by a renovated dotplot we mean a plot of $(X, Y^*, \delta)$, where

$$Y^* = Y^*(\hat{\beta}_n) = X\hat{\beta}_n + W(Z - X\hat{\beta}_n), \tag{5}$$

$W = W(\hat{\beta}_n)$, and $\delta$ contains the plot symbols for uncensored points and renovated points.




Notice that when the Buckley-James method provides a unique solution, $\hat{\beta}_n$ may then be written as $\hat{\beta}_n = (X^T W X)^{-1} X^T W Y^*$ (Smith and Zhang, 1995) and is the least squares estimator of $\beta$ using data from the renovated dotplot. Standard boxplot comparisons may then take place on the renovated data $(X, Y^*, \delta)$.

The renovated data may be usefully employed in QQ plots to detect an underlying distribution for the response. Importantly, after renovation, for each of the $p$ samples we may produce plots of the empirical survivor function $S_m(u)$, defined as the fraction of the $m$ within-sample data exceeding $u$. When the linear model is appropriate, the consistency of the Buckley-James estimators implies that the renovated points will provide a guide to the shape of the survival function for each group.
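The empirical survivor function is simple to compute once a sample has been renovated. A minimal Python sketch, assuming NumPy, is given below; the function name `empirical_survivor` and the example sample are hypothetical.

```python
import numpy as np

def empirical_survivor(sample, u):
    """Fraction of the m within-sample values strictly exceeding each u."""
    sample = np.asarray(sample, dtype=float)
    u = np.atleast_1d(np.asarray(u, dtype=float))
    # Compare every evaluation point with every sample value, then average.
    return (sample[None, :] > u[:, None]).mean(axis=1)

# Hypothetical renovated sample of size m = 4.
values = empirical_survivor([1.0, 2.0, 3.0, 4.0], [0.0, 2.0, 2.5, 4.0])
print(values)
```

The function is a step function that drops by 1/m at each sample value (or by a multiple of 1/m at tied values) and equals zero at and beyond the largest value.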

5. Comparative Boxplots: An Example

Lawless (1982), Gehan (1965), and others have discussed data from a clinical trial examining steroid induced remission times (weeks) for leukemia patients. One group of 21 patients was given 6-mercaptopurine (6-MP); a second group of 21 patients was given a placebo. Since the trial lasted 1 year and patients were admitted to the trial during the year, right-censoring occurred at the cut-off date when some patients were still in remission. Observations $\log_e Z$ on log remission time $\log_e Y$ are as follows:

6-MP:      1.79  1.79  1.79  1.79+ 1.95  2.20+ 2.30  2.30+ 2.40+ 2.56  2.77
(Group 1)  2.83+ 2.94+ 3.00+ 3.09  3.14  3.22+ 3.47+ 3.47+ 3.52+ 3.56+
Placebo:    .00   .00   .69   .69  1.10  1.39  1.39  1.61  1.61  2.08  2.08
(Group 0)  2.08  2.08  2.40  2.40  2.48  2.48  2.71  2.83  3.09  3.14

The '+' denotes right censoring in the 6-MP group, so that 6+ (listed above as 1.79+ on the log scale) represents an observed 6-week remission which was still in effect at the closure of the trial.
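For computation, the listing can be converted into the triples $(x_i, z_i, \delta_i)$ of (1). The Python sketch below assumes NumPy; the function name `parse_censored` is hypothetical, and the values are transcribed from the listing above.

```python
import numpy as np

def parse_censored(text):
    """Split whitespace-separated values; a trailing '+' marks right censoring."""
    tokens = text.split()
    z = np.array([float(tok.rstrip("+")) for tok in tokens])
    delta = np.array([0 if tok.endswith("+") else 1 for tok in tokens])
    return z, delta

six_mp = ("1.79 1.79 1.79 1.79+ 1.95 2.20+ 2.30 2.30+ 2.40+ 2.56 2.77 "
          "2.83+ 2.94+ 3.00+ 3.09 3.14 3.22+ 3.47+ 3.47+ 3.52+ 3.56+")
placebo = (".00 .00 .69 .69 1.10 1.39 1.39 1.61 1.61 2.08 2.08 "
           "2.08 2.08 2.40 2.40 2.48 2.48 2.71 2.83 3.09 3.14")

z1, d1 = parse_censored(six_mp)
z0, d0 = parse_censored(placebo)

# Covariate x = 1 for 6-MP and x = 0 for placebo, stacked in the same order.
x = np.concatenate([np.ones_like(z1), np.zeros_like(z0)])
z = np.concatenate([z1, z0])
delta = np.concatenate([d1, d0])
print(len(z), int(d1.sum()), int(d0.sum()))
```

The 6-MP group contains 9 uncensored and 12 censored responses, while all 21 placebo responses are uncensored.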

Figure 1. Dotplots of original and renovated data for leukemia log remission times. [Two panels, each with horizontal axis "Group" (0, 1).]

The Buckley-James method provides an exact solution for the model parameters in this two-sample problem on the logarithm scale with covariate x = 1 for 6-MP; x = 0 for placebo. The logarithmic transformation has the effect of stabilising the variance in the two groups being compared. In regression terminology, the appropriateness of the linear model is important since, for an exact solution, the least squares line for the renovated data and the Buckley-James line for the pre-renovation data coincide (Smith and Zhang, 1995). The renovated $\log_e Y$ data are:

6-MP:      1.79  1.79  1.79  3.37  1.95  3.50  2.30  3.53  3.53  2.56  2.77
(Group 1)  3.72  3.79  3.79  3.09  3.14  3.87  4.02  4.02  4.02  4.12
Placebo:    .00   .00   .69   .69  1.10  1.39  1.39  1.61  1.61  2.08  2.08
(Group 0)  2.08  2.08  2.40  2.40  2.48  2.48  2.71  2.83  3.09  3.14

Dotplots of the original data and the renovated data are given in Figure 1, where multiple points have been overwritten by a single plot symbol. Notice that it is the smallest censor times which receive the greatest renovation; the censored observation at 1.79 log weeks is renovated to over 3.37 log weeks. In general, censored points with the most negative residuals are renovated towards the mean. The renovated dotplot is a view of the patients' survival had no censoring been present. Such renovation produces a change of rank order in the data set, with wider spread apparent in the 6-MP data in Figure 1.

Figure 2. Boxplots of renovated data for leukemia log remission times for 'Placebo' and '6MP-Renovate'; for comparison, '6MP-PL' is a Gentleman and Crowley (1992) boxplot of the 6-MP data based on the PL-estimator. [Panels: Placebo, 6MP-PL, 6MP-Renovate.]

Having established the new ranks in the renovation process, boxplot comparisons of the two groups may proceed. In particular, in Figure 2, we concentrate on the 6-MP data where the right-censoring occurs. The boxplot method of Gentleman and Crowley (1992), which is based on inverting the product-limit estimator to locate appropriate quantiles for the boxplot display labelled "6MP-PL", is compared directly with the renovated boxplot labelled "6MP-Renovate". The difference between the two plots is partially caused by the large proportion of censored data at the top of the 6-MP distribution leaving the product-limit estimator apparently "hanging" before its conventional assignment to zero beyond the largest observation (Efron, 1967; Miller, 1981).
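The quantiles underlying the 'Placebo' and '6MP-Renovate' boxplots can be recovered from the renovated data listed earlier in this section. The Python sketch below assumes NumPy and uses the default linear-interpolation quantile rule of `numpy.percentile`, which is an assumption made here and may differ from the hinge convention used to draw Figure 2; the function name `five_number_summary` is hypothetical.

```python
import numpy as np

# Renovated log remission times, transcribed from the listing in this section.
placebo = np.array([0.00, 0.00, 0.69, 0.69, 1.10, 1.39, 1.39, 1.61, 1.61, 2.08, 2.08,
                    2.08, 2.08, 2.40, 2.40, 2.48, 2.48, 2.71, 2.83, 3.09, 3.14])
renovated_6mp = np.array([1.79, 1.79, 1.79, 3.37, 1.95, 3.50, 2.30, 3.53, 3.53, 2.56, 2.77,
                          3.72, 3.79, 3.79, 3.09, 3.14, 3.87, 4.02, 4.02, 4.02, 4.12])

def five_number_summary(sample):
    """Minimum, lower quartile, median, upper quartile and maximum of a sample."""
    return np.percentile(sample, [0, 25, 50, 75, 100])

for label, sample in [("Placebo", placebo), ("6MP-Renovate", renovated_6mp)]:
    print(label, np.round(five_number_summary(sample), 2))
```

Under this quantile rule the medians are 2.08 log weeks for the placebo group and 3.50 log weeks for the renovated 6-MP group.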

Figure 3. Comparisons for the 6-MP group: empirical survivor function for renovated log lifetime (dashed lines); product-limit estimator for log lifetime (dotted lines). [Horizontal axis: Log remission, 0 to 5.]

This effect is demonstrated in Figure 3, where, on the log scale, the product-limit estimator is represented by the dotted line and is an estimator of the survival function for remission. On the same log scale, the empirical survivor function of the renovated data is represented by dashed lines. As a consequence of the final residual rankings of both groups combined, the renovation process moves some censored times beyond the largest pre-renovation observation in the 6-MP group, thereby reducing the "hanging effect" of the estimated survival function at the top of the distribution. For the empirical survivor function, the points of discontinuity occur at every distinct renovated data point. In comparison, because of the redistribute-to-the-right algorithm (Efron, 1967), the jump sizes at the points of discontinuity of the product-limit estimator increase towards the top of the distribution.

Notice that generally the empirical survivor function of the renovated data depends on the censoring pattern in both the samples being compared, whereas the product-limit estimator is determined from a single sample. However, when the linear model is appropriate, the consistency of the Buckley-James estimators implies that, provided that the expected number of censored observations and uncensored observations is large over the support of the survival distribution (Meier, 1975), both are uniformly consistent estimators of the same survival function. For moderate sample sizes, when the linear model is appropriate, the graph of the empirical survivor function of the renovated data provides an alternative to the graph of the product-limit estimator on the observed data.
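The two estimators compared in Figure 3 can be evaluated from the 6-MP listings in Section 5. The Python sketch below assumes NumPy; the function names `product_limit` and `empirical_survivor` are hypothetical, and the product-limit estimator is computed in its usual form without the conventional assignment to zero beyond the largest observation.

```python
import numpy as np

def product_limit(z, delta):
    """Distinct uncensored times and the product-limit estimate just after each."""
    z = np.asarray(z, dtype=float)
    delta = np.asarray(delta, dtype=int)
    times = np.unique(z[delta == 1])
    surv, s = [], 1.0
    for t in times:
        at_risk = np.sum(z >= t)                   # still under observation at t
        events = np.sum((z == t) & (delta == 1))   # uncensored responses equal to t
        s *= 1.0 - events / at_risk
        surv.append(s)
    return times, np.array(surv)

def empirical_survivor(sample, u):
    """Fraction of the sample strictly exceeding u."""
    return float(np.mean(np.asarray(sample, dtype=float) > u))

# 6-MP log remission times (delta = 0 marks a '+' entry) and their renovated
# values, transcribed from the listings in Section 5.
z = np.array([1.79, 1.79, 1.79, 1.79, 1.95, 2.20, 2.30, 2.30, 2.40, 2.56, 2.77,
              2.83, 2.94, 3.00, 3.09, 3.14, 3.22, 3.47, 3.47, 3.52, 3.56])
delta = np.array([1, 1, 1, 0, 1, 0, 1, 0, 0, 1, 1, 0, 0, 0, 1, 1, 0, 0, 0, 0, 0])
renovated = np.array([1.79, 1.79, 1.79, 3.37, 1.95, 3.50, 2.30, 3.53, 3.53, 2.56, 2.77,
                      3.72, 3.79, 3.79, 3.09, 3.14, 3.87, 4.02, 4.02, 4.02, 4.12])

times, pl = product_limit(z, delta)
print("product-limit estimate after the last uncensored time:", round(pl[-1], 3))
print("empirical survivor function of renovated data at 3.56:",
      round(empirical_survivor(renovated, 3.56), 3))
```

The product-limit estimate remains at about 0.448 after the last uncensored log time of 3.14, which is the "hanging" behaviour described above, whereas the empirical survivor function of the renovated data is 8/21 at 3.56 and falls to zero at the largest renovated value, 4.12.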

ACKNOWLEDGEMENTS

Part of the research in this work was undertaken while the author was visiting the Department of Statistics, University of Auckland, New Zealand. The author thanks the referees for their editorial comments and suggestions.

RESUME

In this note, we use the Buckley-James method for censored regression in a p sample problem where the samples are right-censored. The samples are reconstructed to remove the effect of censoring, and graphical methods based on quantiles (such as boxplots) can be used as standard data-analytic methods to describe the variable being measured.

REFERENCES

Buckley, J. J. and James, I. R. (1979). Linear regression with censored data. Biometrika 66, 429-436.

Efron, B. (1967). The two sample problem with censored data. Proceedings of the Fifth Berkeley Symposium 4, 831-853.

Gehan, E. A. (1965). A generalised Wilcoxon test for comparing arbitrarily singly-censored samples. Biometrika 52, 203-223.

Gentleman, R. and Crowley, J. J. (1992). A graphical approach to the analysis of censored data. Breast Cancer Research and Treatment 22, 229-240.

Hillis, S. L. (1993). A comparison of three Buckley-James variance estimators. Communications in Statistics B 22, 955-973.

James, I. R. and Smith, P. J. (1984). Consistency results for linear regression with censored data. Annals of Statistics 12, 590-600.

Lai, T. L. and Ying, Z. (1991). Large sample theory of a modified Buckley-James estimator for regression analysis with censored data. Annals of Statistics 19, 1370-1402.

Lawless, J. F. (1982). Statistical Models and Methods for Lifetime Data. New York: Wiley.

Lin, J. S. and Wei, L. J. (1992a). Linear regression analysis for multivariate failure time observations. Journal of the American Statistical Association 87, 1091-1097.

Lin, J. S. and Wei, L. J. (1992b). Regression analysis based on Buckley-James estimating equation. Biometrics 48, 679-681.

Meier, P. (1975). Estimation of a distribution function from incomplete observations. In Perspectives in Probability and Statistics, J. Gani (ed), 67-87. New York: Academic Press.

Miller, R. G. (1981). Survival Analysis. New York: Wiley.

Miller, R. G. and Halpern, J. (1982). Regression with censored data. Biometrika 69, 521-531.

Smith, P. J. and Zhang, J. (1995). Renovated scatterplots for censored data. Biometrika 82, 447-452.

Turnbull, B. W. (1974). Nonparametric estimation of a survivorship function with doubly censored data. Journal of the American Statistical Association 69, 169-173.

Received November 1993; revised February 1995; accepted March 1995.


