Richard E Kennedy, Kofi P Adragni, Hemant K Tiwari, Jenifer H Voeks, Thomas G Brott and George Howard. Risk-stratified imputation in survival analysis. Clin Trials, published online 1 July 2013. DOI: 10.1177/1740774513493150. The online version of this article can be found at: http://ctj.sagepub.com/content/early/2013/07/01/1740774513493150. Published by: http://www.sagepublications.com, on behalf of The Society for Clinical Trials. OnlineFirst Version of Record: Jul 1, 2013; Version of Record: Jul 29, 2013.



ARTICLE Clinical Trials 2013; 0: 1–10

Risk-stratified imputation in survival analysis

Richard E Kennedy(a), Kofi P Adragni(b), Hemant K Tiwari(a), Jenifer H Voeks(c), Thomas G Brott(d) and George Howard(a)

Background Censoring that is dependent on covariates associated with survival can arise in randomized trials due to changes in recruitment and eligibility criteria to minimize withdrawals, potentially leading to biased treatment effect estimates. Imputation approaches have been proposed to address censoring in survival analysis; while these approaches may provide unbiased estimates of treatment effects, imputation of a large number of outcomes may over- or underestimate the associated variance based on the imputation pool selected.

Purpose We propose an improved method, risk-stratified imputation, as an alternative to address withdrawal related to the risk of events in the context of time-to-event analyses.

Methods Our algorithm performs imputation from a pool of replacement subjects with similar values of both treatment and covariate(s) of interest, that is, from a risk-stratified sample. This stratification prior to imputation addresses the requirement of time-to-event analysis that censored observations are representative of all other observations in the risk group with similar exposure variables. We compared our risk-stratified imputation to case deletion and bootstrap imputation in a simulated dataset in which the covariate of interest (study withdrawal) was related to treatment. A motivating example from a recent clinical trial is also presented to demonstrate the utility of our method.

Results In our simulations, risk-stratified imputation gives estimates of treatment effect comparable to bootstrap and auxiliary variable imputation while avoiding inaccuracies of the latter two in estimating the associated variance. Similar results were obtained in analysis of clinical trial data.

Limitations Risk-stratified imputation has little advantage over other imputation methods when covariates of interest are not related to treatment. Risk-stratified imputation is intended for categorical covariates and may be sensitive to the width of the matching window if continuous covariates are used.

Conclusions The use of the risk-stratified imputation should facilitate the analysis of many clinical trials, in which one group has a higher withdrawal rate that is related to treatment. Clinical Trials 2013; 0: 1–10. http://ctj.sagepub.com

Introduction

A central assumption of survival analysis is that censoring is independent of exposure variables associated with survival, implying that the exposure variable for individuals who are censored is representative of all other individuals in the risk group [1]. This assumption is not always met in the analysis of time-to-event randomized clinical trials, in which censoring is dependent on covariates that are also associated with survival, such as symptomatic status.

(a) Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, Birmingham, AL, USA; (b) Department of Mathematics and Statistics, University of Maryland, Baltimore, MD, USA; (c) Department of Neurosciences, Medical University of South Carolina, Charleston, SC, USA; (d) Department of Neurology, Mayo Clinic, Jacksonville, FL, USA

Author for correspondence: George Howard, Department of Biostatistics, School of Public Health, University of Alabama at Birmingham, 1665 University Boulevard, Birmingham, AL 35294-0022, USA. Email: [email protected]

© The Author(s), 2013. Reprints and permissions: http://www.sagepub.co.uk/journalsPermissions.nav. DOI: 10.1177/1740774513493150

Censored data in survival analysis are comparable to missing (or coarsened) data in other types of analyses. Multiple imputation is a powerful method for dealing with missing or coarsened data and is applicable in a variety of contexts [2]. Taylor et al. [3] developed a modification of multiple imputation specific to survival analysis. Under their method, the follow-up time and outcome (event/non-event at that follow-up time) for a censored individual are randomly selected from other individuals in the risk set at the time of censoring, a form of hot-deck imputation [4]. Imputation proceeds from the participant with the smallest censoring time to the largest censoring time, so that an observation that is imputed as censored is not subsequently imputed again (regardless of whether the selected observation in the hot-deck process was censored at a later time), and imputed values are not included in the risk set for imputation of other censored individuals. Multiple imputation is achieved by repeatedly generating bootstrap samples from the original dataset, with each bootstrap sample undergoing the imputation procedure. The multiple datasets are then combined using standard formulae. Hsu et al. [5] modified this approach to include auxiliary variables in the imputation process. Their method focuses on continuous auxiliary variables, defining two risk scores based on the relationship of the auxiliary variables to the failure time and the censoring time. These risk scores are then used to define a neighborhood around the individual to be imputed, from which the imputation values are randomly selected using the method of Taylor et al. [3].

Multiple imputation also depends critically on the data coarsening mechanism, which is usually assumed to be coarsened completely at random (CCAR) or coarsened at random (CAR) using the terminology of Heitjan and Rubin [6] and Tsiatis [7]. Violations of these assumptions would presumably lead to biased estimates under multiple imputation, although it is often difficult to identify such violations in practice [8,9]. In this article, we describe a motivating example from the Carotid Revascularization Endarterectomy versus Stenting Trial (CREST) [10], where the withdrawal mechanism was related to measured exposure variables, corresponding to the CAR mechanism. In contrast to the approach of Taylor et al. [3], we distinguish between censored observations, in which the event does not occur during the study period (type 1 censoring), and withdrawals, in which the subject terminates participation prior to the end of the study period (type 2 and random censoring). We then consider two factors as potentially influencing the likelihood of censoring: (1) the factor of interest (i.e., the 'treatment') that is the focus of the randomized trial, and (2) a 'nuisance' covariate that is not the focus of the randomized trial, but which may be associated with the risk of outcome events.

We envision five potential scenarios:

1. Neither differences in outcome nor the risk of withdrawal was related to treatment or the covariate.

2. Treatment, but not the covariate, is related to outcome events, but neither the treatment nor covariate affects the likelihood of withdrawal.

3. Both treatment and the covariate are related to outcome events, but the likelihood of withdrawal is not related to either treatment or the covariate.

4. Both treatment and the covariate are related to the outcome, and withdrawal was related to treatment but not the covariate.

5. Both the treatment and the covariate are related to both the outcome and withdrawal.

The first three scenarios correspond to a CCAR mechanism, while the last two correspond to a CAR mechanism, so that sensitivity of the methods to the analytical assumptions can be evaluated. Standard analytical approaches for survival analysis in clinical trials assume one of the first three cases, specifically that censoring is not related to the treatment. Imputation methods such as those of Taylor et al. [3] are readily applied to the first four cases. However, the last case was not addressed by the approach of Taylor et al., and their proposed bootstrap imputation may not be suitable. In general, the use of hot-deck imputation for a large number of individuals can underestimate the variance, as the range of sample variability is restricted by the variability in the replacement set [11]. This underestimation would be accentuated in a hot-deck procedure that does not properly account for the effects of the covariate on the outcome and censoring. Alternatively, the imputation of a large number of outcomes may overestimate the variance if the pool from which the imputed values are drawn is not sufficiently similar to the individual whose values are being imputed.

Herein, we describe the properties of a multiple imputation approach, risk-stratified imputation, that we demonstrate to have the potential to reduce or remove the bias arising from withdrawals related to measured exposure variables, as well as the over- or underestimation of the variance in other imputation approaches. This is a specific implementation, focusing on categorical covariates, of the multiple imputation procedures described in the recent National Research Council report on missing data [12] applied to withdrawals in clinical trials. In addition, because randomization implies equal exposure to the changing risk of events introduced through changes in eligibility, the approach can remove the bias introduced by both measured and unmeasured characteristics of eligibility (i.e., it is not required to identify symptomatic status as the factor that potentially changes the risk of events over the course of the trial). Hence, as shown in later sections, the use of our approach can potentially correct for biases introduced by desired changes to reduce withdrawals, and changes in eligibility criteria when the association with the risk of events is not fully described, and thereby reduce or avoid a systematic bias frequently ignored in randomized trials. We illustrate the properties of the risk-stratified imputation through a series of simulation studies. These methods were also applied to the CREST data and demonstrate that the magnitude of this particular bias in this particular trial was minimal (perhaps because of a lack of differences in treatment efficacy).

Methods and simulation studies

Survival model

Let $T_1, T_2, \ldots, T_n$ denote the event times for the $n$ subjects in the study. In the survival model, an observation will be considered censored if the event does not occur during the study period, that is, the subject completes the study without an event. An observation will be considered withdrawn if the subject terminates participation prior to the end of the study period and prior to the occurrence of an event. (This is in contrast to many survival analyses, where withdrawals are counted as part of the censoring process.) An observation will be considered a failure if an event occurs prior to the censoring and withdrawal times. Let $C_1, C_2, \ldots, C_n$ and $W_1, W_2, \ldots, W_n$ denote the corresponding potential censoring and withdrawal times, respectively. The observed data for subject $i$ are $Y_i = \min(T_i, C_i, W_i)$. Thus, the classification of an individual as failure, withdrawn, or censored is as in Table 1.
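The classification rule can be sketched in a few lines of Python. This is a minimal illustration only: the function name `classify` and the status labels are hypothetical, and the handling of exact ties (an event is counted as a failure when it coincides with the censoring time, and a withdrawal takes precedence when it coincides with the event time) is an assumption, since ties have probability zero for continuous times.

```python
def classify(t_event, t_censor, t_withdraw):
    """Return (observed_time, status) for one subject.

    The observed time is Y = min(T, C, W); the status records which of
    the three potential times came first.
    """
    observed = min(t_event, t_censor, t_withdraw)
    # Failure: the event precedes both censoring and withdrawal.
    if t_event <= t_censor and t_withdraw > t_event:
        return observed, "failure"
    # Withdrawn: withdrawal precedes both the event and censoring.
    if t_withdraw <= t_event and t_withdraw <= t_censor:
        return observed, "withdrawn"
    # Otherwise the subject reached the end of the study period event-free.
    return observed, "censored"
```

For example, `classify(6.0, 5.0, 3.0)` describes a subject who would have had an event after the study ended but withdrew at time 3, so the observation is a withdrawal at time 3.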

The survival data were generated using a model with a separate event hazard $h(t)$ and withdrawal hazard $w(t)$ as a function of time $t$. The hazard functions incorporated both a treatment $x_1$ and a covariate $x_2$ given by the functions $c$ and $f$ as

$$c(x_1, x_2) = a_0 + a_1 x_1 + a_2 x_2$$

$$f(x_1, x_2) = b_0 + b_1 x_1 + b_2 x_2$$

where $a_0$, $a_1$, and $a_2$ are the coefficients associated with events, and $b_0$, $b_1$, and $b_2$ are the coefficients associated with withdrawals. The treatment and covariate were categorical variables with two levels (0, 1) and three levels (−1, 0, 1), respectively. The event times and withdrawal times were generated using exponential hazard functions

$$h(t) = h_0(t) \exp[t_1 c(x)]$$

$$w(t) = U[0, t_2] \times \exp[f(x)]$$

The tuning parameters $t_1$ and $t_2$ were chosen so that about 10%–15% of the subjects would have failure, about 10% would withdraw, and the remaining 75%–80% would be censored by the end of the study, where these parameter values were chosen to provide proportions that are similar to the proportions in our motivating example.

The five scenarios discussed above can be implemented by appropriate choices of the $a$ and $b$ parameters, specifically:

1. Both the event hazard and withdrawal hazard were independent of the treatment and of the covariate, so that $a_1 = 0$, $a_2 = 0$, $b_1 = 0$, and $b_2 = 0$.

2. The event hazard was dependent on the treatment, while the withdrawal hazard was independent of the treatment and the covariate, so that $a_1 = 1$, $a_2 = 0$, $b_1 = 0$, and $b_2 = 0$.

3. The event hazard was dependent on the treatment and the covariate, while the withdrawal hazard was independent of the treatment and the covariate, so that $a_1 = 1$, $a_2 = 1$, $b_1 = 0$, and $b_2 = 0$.

4. The event hazard was dependent on the treatment and the covariate, while the withdrawal hazard was dependent on the treatment, so that $a_1 = 1$, $a_2 = 1$, $b_1 = 1$, and $b_2 = 0$.

5. Both the event hazard and the withdrawal hazard were dependent on the treatment and the covariate, so that $a_1 = 1$, $a_2 = 1$, $b_1 = 1$, and $b_2 = 1$.

Table 1. Classification of individuals within the survival model

| $T_i$ versus $C_i$ | $W_i$ versus $T_i$ | $W_i$ versus $C_i$ | Event type |
|---|---|---|---|
| $T_i \le C_i$ | $W_i > T_i$ | | Failure |
| $T_i \le C_i$ | $W_i \le T_i$ | | Withdraw |
| $T_i > C_i$ | $W_i \le T_i$ | $W_i \le C_i$ | Withdraw |
| $T_i > C_i$ | $W_i \le T_i$ | $W_i > C_i$ | Censored |
| $T_i > C_i$ | $W_i > T_i$ | | Censored |

A series of r = 10,000 replicates were generated for each scenario. There were 200 observations (subjects) for each level of the treatment and covariate for a total sample size of 1200 for each replication.
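To make the simulation design concrete, the following Python sketch generates one replicate with 200 subjects in each of the 2 × 3 treatment-by-covariate cells. It is a simplified illustration rather than the authors' generator: the function name, the constant baseline rates `event_scale` and `withdraw_scale`, the fixed administrative censoring time `study_end`, and the use of plain exponential draws in place of the tuned hazards are all assumptions made for this example.

```python
import math
import random


def simulate_replicate(a, b, n_per_cell=200, study_end=1.0,
                       event_scale=0.15, withdraw_scale=0.05, seed=0):
    """Simulate one replicate; a = (a0, a1, a2), b = (b0, b1, b2).

    event_scale, withdraw_scale and study_end are hypothetical stand-ins
    for the tuning parameters, not values taken from the article.
    """
    rng = random.Random(seed)
    rows = []
    for x1 in (0, 1):              # treatment levels
        for x2 in (-1, 0, 1):      # covariate levels
            c = a[0] + a[1] * x1 + a[2] * x2   # linear predictor for events
            f = b[0] + b[1] * x1 + b[2] * x2   # linear predictor for withdrawals
            for _ in range(n_per_cell):
                # Assumed: exponential times with log-linear rates.
                t_event = rng.expovariate(event_scale * math.exp(c))
                t_withdraw = rng.expovariate(withdraw_scale * math.exp(f))
                time = min(t_event, t_withdraw, study_end)
                if time == t_event:
                    status = "failure"
                elif time == t_withdraw:
                    status = "withdrawn"
                else:
                    status = "censored"
                rows.append({"x1": x1, "x2": x2, "time": time, "status": status})
    return rows
```

Under these assumptions, scenario 5 would correspond to a call such as `simulate_replicate(a=(0, 1, 1), b=(0, 1, 1))`. The resulting failure and withdrawal proportions depend on the illustrative rates and are not expected to reproduce the proportions reported in the article.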

Imputation

In addition to straightforward analyses of the resulting data, three different imputation strategies were used. The first was the bootstrap imputation described by Taylor et al. [3], where a bootstrap sample (of the same size as the original data) was created by selecting with replacement from observations in the original simulated data. For each censored subject (either from withdrawal or lack of an event during the study period), a pool of observations was created consisting of all individuals experiencing failure or censoring with time greater than the censoring time for the subject to be imputed. A single subject was randomly selected from the pool, and the values for the subject to be imputed were then set to the outcome (observed event versus censored outcome) and follow-up time (at the time of the outcome) for the selected subject. In the bootstrap imputation, bootstrapping was used to introduce uncertainty to the observed data being employed for imputation. This differs from the typical application of bootstrapping to approximate a distribution nonparametrically and requires only a small number of bootstrap samples. For this analysis, the bootstrapping was performed m = 10 times.

The second strategy was the auxiliary variable imputation described by Hsu et al. [5]. Using a proportional hazards (PH) model, two risk scores were defined based on the relationship of the auxiliary variables to the failure time and the censoring time, respectively. These risk scores are weighted and then used to define a nearest neighborhood (NN = 10 based on the original description of the algorithm) around the individual to be imputed, from which the imputation values are randomly selected using the method of Taylor et al. [3].

The third strategy was the risk-stratified imputation, our proposed new approach. For this approach, a pool of observations was created consisting of all individuals with the same treatment and covariate values as the subject being imputed and experiencing failure or censoring with an event or censoring time greater than the withdrawal time for the subject being imputed to condition on survival up to the time of withdrawal. A single subject was randomly selected from the pool, and the values for the subject to be imputed were then set to the values for the selected subject. The risk-stratified imputation was performed m = 10 times. Hence, one difference between our approach and that of Taylor et al. is that rather than selecting a random person from the treatment group, we select a random person from the treatment group with a similar value of the covariate associated with both outcome and censoring (in the motivating example, a person with the same treatment and symptomatic status). The bootstrap, auxiliary variable, and risk-stratified imputation are depicted in Figure 1(a) to (c) to illustrate these differences.
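The pool construction for risk-stratified imputation can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' R code: the record layout (dictionaries with keys `x1`, `x2`, `time`, `status`), the function name, and the choice to leave a withdrawal unchanged when its pool is empty are all hypothetical. Donors are drawn from the original observations only, so imputed values never enter another subject's pool.

```python
import random


def risk_stratified_impute(rows, rng):
    """Return a new list in which each withdrawal takes a donor's outcome.

    rows: list of dicts with keys "x1" (treatment), "x2" (covariate),
          "time" (observed time) and "status" ("failure", "censored",
          or "withdrawn").
    rng:  a random.Random instance controlling donor selection.
    """
    imputed = []
    for row in rows:
        if row["status"] != "withdrawn":
            imputed.append(dict(row))
            continue
        # Donor pool: same treatment and covariate values, an observed
        # failure or censoring, and a time beyond the withdrawal time.
        pool = [d for d in rows
                if d["status"] in ("failure", "censored")
                and d["x1"] == row["x1"]
                and d["x2"] == row["x2"]
                and d["time"] > row["time"]]
        if not pool:
            # Assumption: keep the record as withdrawn if no donor exists.
            imputed.append(dict(row))
            continue
        donor = rng.choice(pool)
        imputed.append({**row, "time": donor["time"], "status": donor["status"]})
    return imputed
```

Repeating the call with different random states, for example `[risk_stratified_impute(rows, random.Random(k)) for k in range(10)]`, would give m = 10 imputed datasets to analyze separately.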

Analysis

Data were analyzed using a Cox PH model [13] with the treatment and covariate. Model fitting was performed using the survival package version 2.35 [14] in the R programming environment version 2.10 [15]. For the three imputation methods, each imputed dataset was analyzed separately. The m imputed datasets were then combined using Rubin's formulae [16]. For comparison, a fourth analysis was conducted on the original simulated data by removing subjects who withdrew (case-wise deletion).
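As an illustration of the combination step, the sketch below applies the standard form of Rubin's rules to m coefficient estimates and their squared standard errors: the pooled estimate is the mean of the estimates, and the total variance is the mean within-imputation variance plus (1 + 1/m) times the between-imputation variance. The function name and inputs are hypothetical, and the degrees-of-freedom adjustment used when forming confidence intervals is omitted for brevity.

```python
import math


def pool_rubin(estimates, variances):
    """Combine m estimates and their variances (squared SEs) by Rubin's rules.

    Returns the pooled estimate and its pooled standard error.
    """
    m = len(estimates)
    pooled = sum(estimates) / m
    within = sum(variances) / m                                     # mean within-imputation variance
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)   # between-imputation variance
    total = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total)
```

In this setting, `estimates` would hold the treatment coefficient from the Cox model fitted to each of the m imputed datasets, and `variances` the corresponding squared standard errors.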

Performance comparison

Performance of the four analytic approaches was evaluated using the root mean square error (RMSE), percent coverage for the treatment effect, and the average length of the 95% confidence interval. The RMSE is a standardized metric for assessing the performance of an estimate relative to a known value, computed as

$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{r} (\hat{a}_i - a_1)^2}{r}}$$

where $a_1$ is the true value of the coefficient for treatment effect, and $\hat{a}_i$ is the estimated coefficient on the $i$th of $r$ replications. Smaller values of the RMSE indicate statistics with reduced bias compared to larger values. A second measure of performance was the coverage. For each of the $r$ replicates, the mean and standard error for the estimated $\hat{a}_i$ in the Cox model were used to construct a 95% confidence interval across the m imputations using Rubin's formulae [16]. The percent coverage, which assesses the accuracy of the variance estimate, was the percentage of confidence intervals containing the true value of $a_1$, computed as


$$\mathrm{Coverage} = \frac{\sum_{i=1}^{r} \#\{C_{L,i} \le a_1 \le C_{U,i}\}}{r}$$

where # is the count of members in the set, and $C_{L,i}$ and $C_{U,i}$ are the lower and upper boundaries of the 95% confidence interval for $a_1$, respectively, on the $i$th replicate, so that the coverage should be 95% for our analyses. Lower coverage would indicate falsely reduced variance estimates, resulting in an excessive (inappropriate) number of positive test results, while higher coverage would indicate overestimation of the variance, resulting in a loss of power to detect significant treatment differences. The average length of the confidence interval was calculated as

$$\text{Mean CI length} = \frac{\sum_{i=1}^{r} |C_{U,i} - C_{L,i}|}{r}$$

with smaller values for the average interval indicating tighter confidence bounds and increased statistical power to detect significant treatment differences.
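The three performance measures can be computed together from the per-replicate estimates and confidence limits. The sketch below is illustrative only; the function name and argument layout are hypothetical, and coverage is returned as a percentage.

```python
import math


def performance(estimates, lowers, uppers, true_value):
    """Return (RMSE, percent coverage, mean CI length) over r replicates.

    estimates: estimated treatment coefficient for each replicate
    lowers, uppers: 95% confidence limits for each replicate
    true_value: the known treatment coefficient used in the simulation
    """
    r = len(estimates)
    rmse = math.sqrt(sum((e - true_value) ** 2 for e in estimates) / r)
    covered = sum(1 for lo, hi in zip(lowers, uppers) if lo <= true_value <= hi)
    coverage = 100.0 * covered / r
    mean_length = sum(abs(hi - lo) for lo, hi in zip(lowers, uppers)) / r
    return rmse, coverage, mean_length
```

With four replicates whose intervals are [0.7, 1.1], [0.9, 1.3], [0.8, 1.2], and [1.1, 1.5] and a true value of 1, three of the four intervals contain the true value, so the coverage is 75%.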

Results

Results for each of the four analytic strategies (case-wise deletion, Taylor's bootstrap imputation, Hsu's auxiliary variable imputation, and our proposed risk-stratified imputation) for each of the five simulation scenarios are given in Table 2. For the first three scenarios (where the assumptions of survival analysis are uniformly met), the rate of withdrawal did not depend on the treatment or the covariate, corresponding to a CCAR pattern of withdrawals. As expected, all four analytic strategies gave accurate estimates of the treatment effect. When the rate of withdrawal depended on the treatment or on the treatment and the covariate, analysis using only the complete data (after removing withdrawals) led to bias in estimation of the treatment effect, which would also be expected. All three imputation strategies led to accurate estimates of the treatment effect, although the RMSE statistic indicates that the risk-stratified imputation slightly outperformed the bootstrap and auxiliary variable imputation in accuracy. Of concern, the bootstrap imputation consistently had larger than expected coverage and average confidence interval length across all five scenarios, implying that Taylor's approach appears to be uniformly conservative in testing. This appears to be due to a combination of greater error in estimating the mean and overestimation of the variance using this method. The performance of auxiliary variable imputation was more complex, with larger than expected coverage in the first two scenarios, but smaller than expected coverage for the last two scenarios where withdrawal was related to treatment and the covariate. This appears to be due to greater error in estimating the mean, together with overestimation of the variance in the first two scenarios and excessive reduction in variance in the last two scenarios.

Figure 1. (a) Graphical flowchart of the bootstrap imputation, illustrating the imputation of all censored observations (including withdrawals) and lack of stratification. (b) Graphical flowchart of the auxiliary variable imputation, illustrating the imputation of all censored observations (including withdrawals) and lack of stratification. (c) Graphical flowchart of the risk-stratified imputation, illustrating the imputation of only withdrawals after stratification based on covariate(s) of interest.

Application of method to true CREST data

We applied the multiple imputation approach developed herein to the CREST, which compared carotid artery stenting (CAS) to carotid endarterectomy (CEA) to reduce cardiovascular outcomes in 2502 patients with high-grade carotid stenosis [10]. In this study, the proportion of patients withdrawing was related to treatment (p = 0.0019), with 68 (5.4%) of the 1262 patients randomized to receive stenting (CAS) withdrawing consent or being lost to follow-up, compared to 106 (8.6%) of the 1240 assigned to CEA doing so [17]. Enrollment was initially limited to symptomatic patients; after approximately 30% of the patients had been recruited, eligibility was broadened to enrollment of asymptomatic patients to reduce withdrawals. This implied that asymptomatic patients were recruited disproportionately during the period in which withdrawals from the study had been minimized, creating a covariate (symptomatic vs asymptomatic status) that was associated with a lower (p = 0.0043) proportion of withdrawal for the asymptomatic patients (5.4%) than for the symptomatic patients (8.3%). The combined effect of a higher withdrawal rate in the CEA treatment group and a higher withdrawal of symptomatic patients implies that a higher proportion of high-risk patients have been removed from the CEA group relative to the CAS group, thereby systematically biasing the study to the benefit of the CEA group. Under the alternative hypothesis that treatment affected outcome, this would be similar to the fifth scenario in our simulations. In the original analysis of the CREST dataset, the Kaplan–Meier estimate of the hazard ratio was not significantly different between the two groups (hazard ratio for stenting versus endarterectomy = 1.11; 95% confidence interval = 0.81–1.51; p = 0.51). Reanalysis of the CREST data using risk-stratified imputation with symptomatic status as the covariate for stratification gave an estimated hazard ratio associated with treatment of 1.10 (95% confidence limits about this estimated hazard were 0.54–2.23), while bootstrap imputation gave a slightly higher hazard ratio of 1.16 (and wider 95% confidence limits of 0.49–2.77). As such, while withdrawals related to symptomatic status could potentially introduce bias, these findings support that this is not the case for the CREST study. In this motivating example, the differences in treatment efficacy were not apparent, and this may contribute to the lack of differences in results from the alternative approaches. However, this lack of difference provided support for the claim made in the primary study paper that the treatments did not differ.

Table 2. Estimated values for the treatment effect ($a_1$) and performance measures under each of the five simulation scenarios

| Scenario | Measure | Complete data | Risk-stratified imputation | Bootstrap imputation (Taylor et al. [3]) | Auxiliary variable imputation (Hsu et al. [5]) |
|---|---|---|---|---|---|
| Scenario 1: $a_0 = 1$, $a_1 = 0$, $a_2 = 0$; $b_0 = 1$, $b_1 = 0$, $b_2 = 0$. Failure: 19%; withdraw: 7%; censored: 75% | Mean (SE) | 0.00042 (0.0949) | 0.00049 (0.0950) | 0.00057 (0.1007) | 0.00074 (0.0921) |
| | RMSE | 0.0949 | 0.0950 | 0.1006 | 0.0921 |
| | Coverage | 95.0 | 94.9 | 99.2 | 98.8 |
| | Mean CI length | 0.3692 | 0.3696 | 0.5746 | 0.4954 |
| Scenario 2: $a_0 = 0$, $a_1 = 1$, $a_2 = 0$; $b_0 = 1$, $b_1 = 0$, $b_2 = 0$. Failure: 13%; withdraw: 7%; censored: 80% | Mean (SE) | 1.0039 (0.0013) | 1.0058 (0.0013) | 1.0098 (0.0013) | 0.9534 (0.0013) |
| | RMSE | 0.1261 | 0.1265 | 0.1336 | 0.1357 |
| | Coverage | 95.0 | 94.9 | 99.2 | 97.6 |
| | Mean CI length | 0.4928 | 0.4934 | 0.7565 | 0.6671 |
| Scenario 3: $a_0 = 0$, $a_1 = 1$, $a_2 = 1$; $b_0 = 1$, $b_1 = 0$, $b_2 = 0$. Failure: 17%; withdraw: 7%; censored: 76% | Mean (SE) | 0.9983 (0.0011) | 1.0019 (0.0011) | 1.0056 (0.0012) | 0.9081 (0.0011) |
| | RMSE | 0.1102 | 0.1103 | 0.1180 | 0.1416 |
| | Coverage | 95.1 | 95.2 | 99.3 | 94.7 |
| | Mean CI length | 0.4344 | 0.4349 | 0.6780 | 0.5811 |
| Scenario 4: $a_0 = 0$, $a_1 = 1$, $a_2 = 1$; $b_0 = 0$, $b_1 = 1$, $b_2 = 0$. Failure: 16%; withdraw: 13%; censored: 71% | Mean (SE) | 0.9230 (0.00114) | 0.9968 (0.00114) | 0.9335 (0.00122) | 0.8532 (0.00113) |
| | RMSE | 0.1373 | 0.1143 | 0.1386 | 0.1850 |
| | Coverage | 88.8 | 94.5 | 98.2 | 88.2 |
| | Mean CI length | 0.4451 | 0.4449 | 0.6916 | 0.6050 |
| Scenario 5: $a_0 = 0$, $a_1 = 1$, $a_2 = 1$; $b_0 = 0$, $b_1 = 1$, $b_2 = 1$. Failure: 16%; withdraw: 18%; censored: 66% | Mean (SE) | 0.9383 (0.00114) | 0.9912 (0.00115) | 0.9522 (0.00122) | 0.8757 (0.00114) |
| | RMSE | 0.1293 | 0.1150 | 0.1306 | 0.1683 |
| | Coverage | 90.8 | 94.9 | 98.5 | 91.5 |
| | Mean CI length | 0.4419 | 0.4412 | 0.6849 | 0.6065 |

RMSE: root mean square error; SE: standard error; CI: confidence interval.

$a_0$, $a_1$, and $a_2$ represent the intercept (base rate), treatment effect, and nuisance covariate effect on the rate of events, respectively, while $b_0$, $b_1$, and $b_2$ represent the corresponding effects on the rate of withdrawals.

Discussion

In clinical trials (and other longitudinal studies), a substantial number of subjects may withdraw prior to the conclusion of the study. Such subjects must be properly dealt with in the analytic phase to prevent biased estimates of the treatment effect. In addition, the issue could be raised that not including withdrawals from trials in the analysis is a violation of the intention to treat principle (where all randomized patients should be included in the analysis in their assigned treatment group). We have raised the issue of changing enrollment criteria specifically in the CREST, but would suggest that most trials have policies that could naturally lead to changing withdrawal rates over time. It is common for trials to strive to improve methods to reduce withdrawals; as such, as a study progresses, a goal is to constantly work to have the withdrawal (censoring) rates change to lower levels. The alternative, not constantly striving to reduce the proportion of patients withdrawing from a study, is simply not consistent with good trial conduct. However, maintaining a constant high withdrawal rate across the duration of the study would avoid the issue addressed herein. In addition, it is common (as in the CREST) that eligibility criteria will change, reflecting experience in recruitment as the trial progresses. In the CREST, the broadened eligibility criteria were important to ensure the generalizability of the results (approximately 80% of revascularizations performed in the United States are done in asymptomatic patients, who were initially excluded from our trial) and to assist in meeting recruitment goals (recruitment was lagging substantially when eligibility was broadened to include asymptomatic patients). To the extent that the eligibility criteria are associated with event rates, the combination of the successful changes during the conduct of the trial to reduce withdrawals and changes also during the conduct of the trial in eligibility criteria holds the possibility to bias many randomized trials.

In survival analysis, withdrawals are often viewed as a form of censoring. If the withdrawal mechanism is not related to survival, the usual methods may be applied to the data, but withdrawal that is dependent on covariates related to survival requires adjustments to reduce bias due to differential censoring among groups. Robins and Finkelstein [18] proposed a method for censoring that depends on covariates related to survival, the inverse probability of censoring weighted (IPCW) method, in which each observation is weighted by the inverse of the probability of the individual remaining uncensored at time t. Weights are constructed using a proportional hazards (PH) model of the relationship between censoring and auxiliary variables and appear more sensitive to misspecification of the censoring model than other methods. An alternative approach is that of multiple imputation, which is commonly used in fields outside of survival analysis. Taylor et al. [3] proposed a bootstrap imputation procedure to replace censored and withdrawn observations in survival analysis. For this and other imputation methods, it should not be assumed that subjects drop out for reasons unrelated to the treatment; we have provided the example of the CREST study to demonstrate how this assumption may be violated. We also describe a modified imputation procedure, the risk-stratified imputation, to address the problem that arises in studies where withdrawal depends on the treatment.
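The weighting idea behind IPCW can be written compactly. The notation below is illustrative and is not taken from [18]: $C_i$ (the censoring time of subject $i$), $V_i(t)$ (the auxiliary covariate history of subject $i$ up to time $t$), $K_i(t)$, and $w_i(t)$ are symbols introduced here as assumptions for the sketch.

```latex
K_i(t) = \Pr\{\, C_i > t \mid V_i(t) \,\}, \qquad
w_i(t) = \frac{1}{K_i(t)}
```

For example, a subject whose estimated probability of remaining uncensored at time $t$ is 0.5 receives weight 2 in the risk set at $t$, standing in for a comparable subject who was lost to follow-up. Because $K_i(t)$ must be estimated from a model of the censoring process, errors in that model carry directly into the weights.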

Our imputation procedure extends the approach of Taylor et al. [3] in two critical ways:

1. The bootstrap imputation imputes values for all censored patients, corresponding to both censored and withdrawal groups in our classification scheme. In contrast, the risk-stratified imputation imputes values only for the withdrawal group; there is no imputation for those in the censored group. The former imputes a substantial portion of the individuals, requiring a large pool from which imputed values may be drawn. An insufficiently large pool could falsely reduce the variance estimates and give an inflated amount of power. Because the latter imputes values only for subjects who withdraw, and not for subjects who are censored, the pool of potential subjects from which the imputed value may be drawn is much larger; as seen in our simulations, this would potentially avoid the false reduction in variance that can occur with the bootstrap imputation. This modification also makes the bootstrapping step of Taylor et al. [3] unnecessary.

2. The bootstrap imputation imputes by randomly selecting a subject from the study who is still active at the time of censoring. Our procedure stratifies the risk of the subjects as a function of the covariate and imputes by randomly selecting a subject with a similar value of the covariate. If the covariate is related to the risk of events, this procedure imputes data for the withdrawn person using a person who is at similar risk, so that we are using a risk-stratified random person. In addition, by stratifying based on treatment, the risk-stratified imputation only selects from a pool of individuals within the same treatment group to estimate the survival time if a subject had not withdrawn but without consideration of whether the individual remained on his or her assigned treatment. In contrast, the bootstrap imputation selects from a pool of individuals at risk at the time a subject withdrew, which may include individuals from different treatment groups. Although the former pool would naturally seem to be more representative of the withdrawn subject than the latter pool, there is no empirical data to justify one over the other.
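The two points above can be sketched as a small hot-deck routine. This is a minimal illustration under stated assumptions, not the implementation used in our analyses: the `Subject` record, the status labels, and the function name are hypothetical; requiring donors to be still at risk at the withdrawal time is carried over from the bootstrap imputation as an assumption; and leaving a subject censored at the withdrawal time when no donor exists is an arbitrary fallback chosen for the sketch.

```python
import random
from dataclasses import dataclass, replace
from typing import List

@dataclass(frozen=True)
class Subject:
    sid: int
    treatment: str  # e.g. "CAS" or "CEA"
    stratum: str    # categorical risk covariate, e.g. symptomatic status
    time: float     # observed follow-up time
    status: str     # "event", "censored" (administrative), or "withdrawn"

def impute_withdrawals(subjects: List[Subject], rng: random.Random) -> List[Subject]:
    """Return one completed dataset; only withdrawn subjects are imputed."""
    completed = []
    for s in subjects:
        if s.status != "withdrawn":
            completed.append(s)  # events and administrative censoring are kept
            continue
        # Donor pool: same treatment arm, same risk stratum, not withdrawn,
        # and still under observation after the withdrawal time (assumption).
        pool = [d for d in subjects
                if d.sid != s.sid
                and d.status != "withdrawn"
                and d.treatment == s.treatment
                and d.stratum == s.stratum
                and d.time > s.time]
        if not pool:
            # Fallback for the sketch: treat as censored at withdrawal.
            completed.append(replace(s, status="censored"))
            continue
        donor = rng.choice(pool)
        completed.append(replace(s, time=donor.time, status=donor.status))
    return completed

# Toy cohort (fabricated values, for illustration only).
cohort = [
    Subject(1, "CEA", "symptomatic", 2.0, "withdrawn"),
    Subject(2, "CEA", "symptomatic", 3.5, "event"),
    Subject(3, "CEA", "symptomatic", 4.0, "censored"),
    Subject(4, "CEA", "asymptomatic", 5.0, "event"),
    Subject(5, "CAS", "symptomatic", 6.0, "event"),
    Subject(6, "CEA", "symptomatic", 1.0, "event"),
]

rng = random.Random(0)
imputed_sets = [impute_withdrawals(cohort, rng) for _ in range(5)]
```

In the toy cohort, only subjects 2 and 3 are eligible donors for subject 1: subject 4 is in a different stratum, subject 5 is in a different treatment group, and subject 6 had an event before subject 1 withdrew. Each completed dataset would then be analyzed separately and the results combined across imputations, a step not shown here.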

Hsu et al. [5] proposed modifications to the approach of Taylor et al. [3] to include auxiliary variables in the imputation process. Although categorical variables can be used, their method primarily focuses on continuous auxiliary variables. This differs from our risk-stratified imputation focused on categorical auxiliary variables, in which the pool of potential values consists of all individuals within the stratum. Furthermore, Hsu et al. [5] perform imputation for all censored patients, while the risk-stratified imputation only imputes values for the withdrawal group. Wang et al. [19] propose an alternative approach for hot-deck imputation using predictive mean matching (PMM) to include covariates in the imputation process, albeit in the context of imputing recurrent unobserved events rather than a single terminal event. As with Hsu et al. [5], this method focuses primarily on continuous covariates and performs imputation for all missing events.

As seen in Table 2, all three methods for dealing with coarsened data perform well when the withdrawal is unrelated to the treatment. In such circumstances, it is well known that analysis of complete cases is sufficient for obtaining unbiased estimates of the treatment effect [18]. Analysis of complete cases is also well known to be inappropriate when withdrawal is related to treatment or to a related covariate [20]. However, the appropriateness of various imputation methods is unclear. In our simulations, all three imputation methods lead to unbiased estimates of the treatment effect when the withdrawal depends on treatment. The RMSE statistic favors estimates from the risk-stratified imputation over the bootstrap imputation and auxiliary variable imputation, although the difference between the first two is not large. However, the bootstrap imputation and auxiliary variable imputation also lead to problems in estimating the variance, reflected by excessively large or excessively small coverage. In the former case, the imputation of a large number of individuals (particularly without stratification) may lead to pools for imputation containing individuals sufficiently dissimilar to the one being imputed to affect results. In the latter case, the imputation of a large number of individuals in these procedures may falsely reduce the variance of the sample, and the bootstrapping step may not be sufficient to address this problem. Although the latter was only observed with auxiliary variable imputation in our simulations, it has been reported with bootstrap imputation with a censoring rate lower than we used [3]. In contrast, our risk-stratified imputation imputes only the much smaller number of individuals who withdrew, so that the variance and coverage remain appropriate. The use of risk stratification in selecting the imputed values addresses the requirement for survival analysis that the individuals who withdrew are representative of all other individuals in the risk group with similar exposure variables.
However, as with other uses of stratification, our risk-stratified imputation is appropriate for a small number of categorical covariates; use of high-dimensional covariates would naturally lead to excessively small pools for imputation and a corresponding increase in the variance. Conversely, use of a small number of covariates could potentially lead to biased estimates due to large pools for imputation that contain individuals dissimilar to the subject whose values are to be imputed, but the low values of the RMSE and appropriate coverage in our simulations suggest this effect is minimal. Reanalysis of the CREST data shows that both the bootstrap imputation and the risk-stratified imputation give hazard ratio estimates similar to those of the original analysis. It is likely that the imputation techniques do not show a strong advantage over the original analytic method due to the lack of differences in efficacy between the two treatments. This is similar to Hsu et al. [5], who found no difference between treatment and placebo in AIDS data from the AIDS Clinical Trial Group–019 (ACTG-019) clinical trial using multiple imputation. However, as with our simulations, the bootstrap imputation leads to wider confidence limits than the risk-stratified imputation in the CREST data. This provides evidence favoring the risk-stratified imputation in a real dataset and further strengthens the conclusion drawn from our simulation studies.
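The performance measures referred to above (bias, RMSE, and coverage) can be computed from simulation replicates as in the following sketch. It is a generic illustration, not the code used for our simulations: the function name is hypothetical and the toy replicate values are fabricated solely to exercise the function.

```python
import math
from typing import List, Tuple

def summarize(estimates: List[float],
              intervals: List[Tuple[float, float]],
              truth: float) -> Tuple[float, float, float]:
    """Bias, RMSE, and interval coverage across simulation replicates."""
    n = len(estimates)
    bias = sum(e - truth for e in estimates) / n
    rmse = math.sqrt(sum((e - truth) ** 2 for e in estimates) / n)
    # Coverage: fraction of replicate intervals that contain the true value.
    coverage = sum(lo <= truth <= hi for lo, hi in intervals) / n
    return bias, rmse, coverage

# Toy replicates on the log hazard ratio scale (fabricated values).
truth = 0.0
estimates = [-0.1, 0.0, 0.1, 0.2]
intervals = [(-0.25, 0.05), (-0.15, 0.15), (-0.05, 0.25), (0.05, 0.35)]

bias, rmse, coverage = summarize(estimates, intervals, truth)
print(f"bias={bias:.3f}, rmse={rmse:.3f}, coverage={coverage:.2f}")
```

Coverage well above the nominal level points to overestimated variance and coverage well below it points to underestimated variance, which is how the variance problems of the bootstrap and auxiliary variable imputations show up in the simulations.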

Conclusion

We propose a risk-stratified imputation procedure for addressing the problem of nonrandom withdrawals in the context of survival analysis. This approach eliminates the bias present in analyses that assume random withdrawal, while avoiding the over- or underestimation of the variance that can occur with bootstrap and auxiliary variable imputation. The risk-stratified imputation will facilitate the analysis of many clinical trials involving time-to-event data, in which one group experiences a higher withdrawal rate and the withdrawal rate is related to treatment.

Funding

This work was partially supported by the National Institute of Neurological Disorders and Stroke (NINDS; grant number RO1 NS 038384), by supplemental funding from Abbott Vascular Solutions (formerly Guidant) including donations of Accunet and Acculink Systems equivalent to approximately 15% of the total study costs, and by the National Heart, Lung, and Blood Institute (NHLBI; grant number T32 HL 072757).

Conflict of interest

The authors declare that there is no conflict of interest.

References

1. Collett D. Modelling Survival Data in Medical Research. Chapman & Hall/CRC, Boca Raton, FL, 2003.

2. Rubin D. Multiple imputation after 18+ years. J Am Stat Assoc 1996; 91: 473–89.

3. Taylor JMG, Murray S, Hsu C-H. Survival estimation and testing via multiple imputation. Stat Probabil Lett 2002; 58: 221–32.

4. Little RJA, Rubin DB. Statistical Analysis with Missing Data. Wiley, New York, 1987.

5. Hsu C-H, Taylor JM, Murray S, Commenges D. Survival analysis using auxiliary variables via non-parametric multiple imputation. Stat Med 2006; 25: 3503–17.

6. Heitjan D, Rubin D. Ignorability and coarse data. Ann Stat 1991; 19: 2444–53.

7. Tsiatis AA. Semiparametric Theory and Missing Data, ch. 7. Springer, New York, 2006.

8. Little R. A test of missing completely at random for multivariate data with missing values. J Am Stat Assoc 1988; 83: 1198–202.

9. Potthoff RF, Tudor GE, Pieper KS, Hasselblad V. Can one assess whether missing data are missing at random in medical studies? Stat Methods Med Res 2006; 15: 213–34.

10. Sheffet AJ, Roubin G, Howard G, et al. Design of the Carotid Revascularization Endarterectomy vs. Stenting Trial (CREST). Int J Stroke 2010; 5: 40–46.

11. Perez A, Dennis RJ, Gil JFA, Rondon MA, Lopez A. Use of the mean, hot deck and multiple imputation techniques to predict outcome in intensive care unit patients in Colombia. Stat Med 2002; 21: 3885–96.

12. National Research Council. The Prevention and Treatment of Missing Data in Clinical Trials. Panel on Handling Missing Data in Clinical Trials. Committee on National Statistics, Division of Behavioral and Social Sciences and Education. The National Academies Press, Washington, DC, 2010.

13. Cox DR. Regression models and life-tables. J Roy Stat Soc B 1972; 34: 187–220.

14. Therneau TM, Grambsch PM. Modeling Survival Data. Springer-Verlag, New York, 2000.

15. R Development Core Team. R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, 2009.

16. Rubin DB. Multiple Imputation for Nonresponse in Surveys. John Wiley & Sons, New York, 1987.

17. Brott TG, Hobson RW, Howard G, et al. Stenting versus endarterectomy for treatment of carotid-artery stenosis. N Engl J Med 2010; 363: 11–23.

18. Robins JM, Finkelstein DM. Correcting for noncompliance and dependent censoring in an AIDS clinical trial with inverse probability of censoring weighted (IPCW) log-rank tests. Biometrics 2000; 56: 779–88.

19. Wang C-N, Little R, Nan B, Harlow SD. A hot-deck multiple imputation procedure for gaps in longitudinal recurrent event histories. Biometrics 2011; 67: 1573–82.

20. Enders CK. A primer on the use of modern missing-data methods in psychosomatic medicine research. Psychosom Med 2006; 68: 427–36.
